pydeseq2.DeseqStats.DeseqStats

class pydeseq2.DeseqStats.DeseqStats(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)

Bases: object

PyDESeq2 statistical tests for differential expression.

Implements p-value estimation for differential gene expression according to the DESeq2 pipeline [1].

Also supports apeGLM log-fold change shrinkage [2].

Parameters

dds (DeseqDataSet) – DeseqDataSet for which dispersion and LFCs were already estimated.
contrast (list[str] or None) – A list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'reference_level']. Names must correspond to the clinical data passed to the DeseqDataSet. E.g., ['condition', 'B', 'A'] will measure the LFC of 'condition B' compared to 'condition A'. If None, the last variable from the design matrix is chosen as the variable of interest, and the reference level is picked alphabetically. (default: None).
alpha (float) – P-value and adjusted p-value significance threshold (usually 0.05). (default: 0.05).
cooks_filter (bool) – Whether to filter p-values based on Cook's distance outliers. (default: True).
independent_filter (bool) – Whether to perform independent filtering to correct p-value trends. (default: True).
n_cpus (int) – Number of CPUs to use for multiprocessing. If None, all available CPUs will be used. (default: None).
prior_disp_var (ndarray) – Prior variance for LFCs, used for ridge regularization. (default: None).
batch_size (int) – Number of tasks to allocate to each joblib parallel worker. (default: 128).
joblib_verbosity (int) – The verbosity level for joblib tasks. The higher the value, the more updates are reported. (default: 0).
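The three-element contrast format described above can be checked before it is handed to the class. The following is a minimal sketch, not part of PyDESeq2: `check_contrast` and the `metadata` dictionary are hypothetical names standing in for whatever clinical data was passed to the DeseqDataSet.

```python
def check_contrast(contrast, metadata):
    """Validate a [variable, tested_level, reference_level] contrast.

    `metadata` is a hypothetical dict mapping each design variable to the
    list of per-sample levels; it stands in for the clinical data.
    """
    if len(contrast) != 3:
        raise ValueError("contrast must hold exactly three strings")
    variable, tested, reference = contrast
    if variable not in metadata:
        raise KeyError(f"unknown variable: {variable!r}")
    levels = set(metadata[variable])
    for level in (tested, reference):
        if level not in levels:
            raise ValueError(f"{level!r} is not a level of {variable!r}")
    if tested == reference:
        raise ValueError("tested and reference levels must differ")
    return variable, tested, reference


# Toy metadata: four samples, two per condition.
metadata = {"condition": ["A", "A", "B", "B"]}
print(check_contrast(["condition", "B", "A"], metadata))
```

Here `['condition', 'B', 'A']` passes because both levels occur in the `condition` column, matching the example given for the `contrast` parameter.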

base_mean

Genewise means of normalized counts.

Type: pandas.Series

contrast_idx

Index of the LFC column corresponding to the variable being tested.

Type: int

design_matrix

A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Depending on the contrast that is provided to the DeseqStats object, it may differ from the DeseqDataSet design matrix, as the reference level may need to be adapted.

Type: pandas.DataFrame

LFCs

Estimated log-fold change between conditions and intercept, in natural log scale.

Type: pandas.DataFrame
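Because these LFCs are on the natural log scale, comparing them with log2 fold changes requires dividing by ln(2). A minimal sketch of that conversion, using a hypothetical helper name (`ln_to_log2`) that is not part of PyDESeq2:

```python
import math


def ln_to_log2(lfc_ln):
    """Convert a natural-log fold change to a log2 fold change."""
    return lfc_ln / math.log(2)


# A natural-log LFC of ln(2) corresponds to a two-fold increase.
lfc_ln = math.log(2)
print(ln_to_log2(lfc_ln))
```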

SE

Standard error of the LFC estimates.

Type: pandas.Series

statistics

Wald statistics.

Type: pandas.Series

p_values

P-values estimated from Wald statistics.

Type: pandas.Series

padj

P-values adjusted for multiple testing.

Type: pandas.Series
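As an illustration of what adjusting for multiple testing involves, here is a minimal sketch of the Benjamini-Hochberg procedure, which is the default adjustment method in DESeq2. It assumes a plain list of p-values with no missing entries and ignores the Cook's and independent filtering steps, so it is not a reproduction of how this class computes `padj`. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Genes whose adjusted value falls below `alpha` would then be called significant at that false discovery rate.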

results_df

Summary of the statistical analysis.

Type: pandas.DataFrame

shrunk_LFCs

Whether LFCs are shrunk.

Type: bool

n_processes

Number of processes to use for multiprocessing.

Type: int

References

[1] Love, M. I., Huber, W., & Anders, S. (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology, 15(12), 1-21. <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8>
[2] Zhu, A., Ibrahim, J. G., & Love, M. I. (2019). “Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences.” Bioinformatics, 35(12), 2084-2092. <https://academic.oup.com/bioinformatics/article/35/12/2084/5159452>

__init__(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)

Methods

`__init__`(dds[, contrast, alpha, ...])
`lfc_shrink`()	LFC shrinkage with an apeGLM prior [2].
`run_wald_test`()	Perform a Wald test.
`summary`()	Run the statistical analysis.

lfc_shrink()

LFC shrinkage with an apeGLM prior [2].

Shrinks LFCs using a heavy-tailed Cauchy prior, leaving p-values unchanged.

Returns: If p-values were already computed, the results DataFrame with MAP LFCs but unmodified statistics and p-values.
Return type: pandas.DataFrame or None

run_wald_test()

Perform a Wald test.

Get gene-wise p-values for gene over/under-expression.
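For a single gene, a Wald test of this kind can be sketched as follows, assuming the standard DESeq2 formulation in which the statistic is the LFC divided by its standard error and is compared against a standard normal distribution. The function name is hypothetical, and the sketch leaves out outlier and independent filtering.

```python
import math


def wald_test(lfc, se):
    """Return the Wald statistic and its two-sided p-value.

    Assumes the statistic follows a standard normal distribution
    under the null hypothesis of no differential expression.
    """
    z = lfc / se
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = wald_test(1.2, 0.4)
print(f"statistic={z:.2f}, p-value={p:.4f}")
```

A positive statistic indicates over-expression in the tested level relative to the reference level, and a negative one indicates under-expression.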

summary()

Run the statistical analysis.

The results are stored in the results_df attribute.