pydeseq2.DeseqStats.DeseqStats

class pydeseq2.DeseqStats.DeseqStats(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)

Bases: object

PyDESeq2 statistical tests for differential expression.

Implements p-value estimation for differential gene expression according to the DESeq2 pipeline [1].

Also supports apeGLM log-fold change shrinkage [2].

Parameters
  • dds (DeseqDataSet) – DeseqDataSet for which dispersion and LFCs were already estimated.

  • contrast (list[str] or None) – A list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'reference_level']. Names must correspond to the clinical data passed to the DeseqDataSet. E.g., ['condition', 'B', 'A'] will measure the LFC of 'condition B' compared to 'condition A'. If None, the last variable from the design matrix is chosen as the variable of interest, and the reference level is picked alphabetically. (default: None).

  • alpha (float) – P-value and adjusted p-value significance threshold (usually 0.05). (default: 0.05).

  • cooks_filter (bool) – Whether to filter p-values based on Cook's distance outliers. (default: True).

  • independent_filter (bool) – Whether to perform independent filtering to correct p-value trends. (default: True).

  • n_cpus (int) – Number of CPUs to use for multiprocessing. If None, all available CPUs will be used. (default: None).

  • prior_disp_var (ndarray) – Prior variance for LFCs, used for ridge regularization. (default: None).

  • batch_size (int) – Number of tasks to allocate to each joblib parallel worker. (default: 128).

  • joblib_verbosity (int) – The verbosity level for joblib tasks. The higher the value, the more updates are reported. (default: 0).
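The three-string contrast format described above can be checked before it is passed to DeseqStats. The following is a minimal sketch, not part of PyDESeq2: `check_contrast` and the `levels_by_variable` dict (variable name mapped to its observed levels) are hypothetical names introduced only for illustration.

```python
def check_contrast(contrast, levels_by_variable):
    """Validate a [variable, tested_level, reference_level] contrast.

    `levels_by_variable` is a hypothetical dict mapping each clinical
    variable name to the levels observed for it (not a PyDESeq2 object).
    """
    if len(contrast) != 3 or not all(isinstance(c, str) for c in contrast):
        raise ValueError("contrast must be a list of three strings")

    variable, tested, reference = contrast
    if variable not in levels_by_variable:
        raise ValueError(f"unknown variable: {variable!r}")

    # Both levels must actually occur for the chosen variable.
    levels = set(levels_by_variable[variable])
    for level in (tested, reference):
        if level not in levels:
            raise ValueError(f"{level!r} is not a level of {variable!r}")

    if tested == reference:
        raise ValueError("tested and reference levels must differ")
    return variable, tested, reference
```

For example, `check_contrast(["condition", "B", "A"], {"condition": ["A", "B"]})` passes the contrast through unchanged, while a misspelled level raises a `ValueError`.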

base_mean

Genewise means of normalized counts.

Type

pandas.Series

contrast_idx

Index of the LFC column corresponding to the variable being tested.

Type

int

design_matrix

A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Depending on the contrast that is provided to the DeseqStats object, it may differ from the DeseqDataSet design matrix, as the reference level may need to be adapted.

Type

pandas.DataFrame

LFCs

Estimated log-fold change between conditions and intercept, in natural log scale.

Type

pandas.DataFrame
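Because the LFCs attribute is stored in natural log scale, values must be divided by ln(2) before they can be compared with log2 fold changes. A minimal sketch of that conversion follows; `ln_to_log2` is a hypothetical helper name and works on a single float.

```python
import math


def ln_to_log2(lfc_ln):
    """Convert a natural-log fold change to a log2 fold change."""
    # log2(x) = ln(x) / ln(2)
    return lfc_ln / math.log(2.0)
```

The same division applies element-wise to a pandas DataFrame or Series of natural-log LFCs.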

SE

Standard errors of the LFC estimates.

Type

pandas.Series

statistics

Wald statistics.

Type

pandas.Series

p_values

P-values estimated from Wald statistics.

Type

pandas.Series

padj

P-values adjusted for multiple testing.

Type

pandas.Series
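The page does not name the adjustment method. Assuming the Benjamini-Hochberg procedure (the DESeq2 default), adjusted p-values can be sketched as below. `benjamini_hochberg` is a hypothetical helper written for illustration; it does not handle missing values such as those left by filtering.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is
    # never larger than the adjusted value of a larger raw p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be called significant when its adjusted p-value falls below alpha.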

results_df

Summary of the statistical analysis.

Type

pandas.DataFrame

shrunk_LFCs

Whether LFCs are shrunk.

Type

bool

n_processes

Number of threads to use for multiprocessing.

Type

int

References

[1]

Love, M. I., Huber, W., & Anders, S. (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology, 15(12), 1-21. <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8>

[2]

Zhu, A., Ibrahim, J. G., & Love, M. I. (2019). “Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences.” Bioinformatics, 35(12), 2084-2092. <https://academic.oup.com/bioinformatics/article/35/12/2084/5159452>

__init__(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)

Methods

__init__(dds[, contrast, alpha, ...])

lfc_shrink()

LFC shrinkage with an apeGLM prior [2].

run_wald_test()

Perform a Wald test.

summary()

Run the statistical analysis.

lfc_shrink()

LFC shrinkage with an apeGLM prior [2].

Shrinks LFCs using a heavy-tailed Cauchy prior, leaving p-values unchanged.

Returns

If p-values were already computed, returns the results DataFrame with MAP (maximum a posteriori) LFCs, leaving the statistics and p-values unmodified. Otherwise, returns None.

Return type

pandas.DataFrame or None

run_wald_test()

Perform a Wald test.

Computes gene-wise p-values for gene over- or under-expression.
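For a single gene, the idea behind the test can be sketched as follows, assuming (as in DESeq2) that the Wald statistic is the LFC divided by its standard error and is compared against a standard normal distribution. `wald_test` is a hypothetical helper, not the library's implementation.

```python
import math


def wald_test(lfc, se):
    """Return (Wald statistic, two-sided p-value) for one LFC estimate."""
    z = lfc / se
    # Two-sided tail probability of a standard normal:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A large positive statistic points to over-expression relative to the reference level, a large negative one to under-expression.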

summary()

Run the statistical analysis.

The results are stored in the results_df attribute.
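A typical call sequence, using only the constructor, methods, and attributes documented on this page, might look like the sketch below. `run_differential_expression` is a hypothetical wrapper, and `dds` is assumed to be a DeseqDataSet whose dispersions and LFCs have already been estimated.

```python
def run_differential_expression(dds, contrast=None, alpha=0.05, shrink=False):
    """Run the Wald-test analysis on a fitted DeseqDataSet (sketch).

    Returns the results DataFrame, with shrunk LFCs if `shrink` is True.
    """
    # Imported lazily so the sketch can be loaded without PyDESeq2 installed.
    from pydeseq2.DeseqStats import DeseqStats

    stats = DeseqStats(dds, contrast=contrast, alpha=alpha)

    # summary() runs the statistical analysis and stores it in results_df.
    stats.summary()
    results = stats.results_df

    if shrink:
        # lfc_shrink() is documented to return a DataFrame or None.
        shrunk = stats.lfc_shrink()
        if shrunk is not None:
            results = shrunk
    return results
```

Calling it with `contrast=["condition", "B", "A"]` would report the LFC of condition B relative to condition A.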