pydeseq2.DeseqStats.DeseqStats
- class pydeseq2.DeseqStats.DeseqStats(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)
Bases: object
PyDESeq2 statistical tests for differential expression.
Implements p-value estimation for differential gene expression according to the DESeq2 pipeline [1].
Also supports apeGLM log-fold change (LFC) shrinkage [2].
- Parameters
dds (DeseqDataSet) – DeseqDataSet for which dispersion and LFCs were already estimated.
contrast (list[str] or None) – A list of three strings in the format ['variable_of_interest', 'tested_level', 'reference_level']. Names must correspond to the clinical data passed to the DeseqDataSet. E.g., ['condition', 'B', 'A'] measures the LFC of 'condition B' relative to 'condition A'. If None, the last variable from the design matrix is chosen as the variable of interest, and the reference level is picked alphabetically. (default: None).
alpha (float) – P-value and adjusted p-value significance threshold (usually 0.05). (default: 0.05).
cooks_filter (bool) – Whether to filter p-values based on Cook's distance outliers. (default: True).
independent_filter (bool) – Whether to perform independent filtering to correct p-value trends. (default: True).
n_cpus (int) – Number of CPUs to use for multiprocessing. If None, all available CPUs will be used. (default: None).
prior_disp_var (ndarray) – Prior variance for LFCs, used for ridge regularization. (default: None).
batch_size (int) – Number of tasks to allocate to each joblib parallel worker. (default: 128).
joblib_verbosity (int) – The verbosity level for joblib tasks. The higher the value, the more updates are reported. (default: 0).
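The direction of a contrast can be illustrated with a minimal sketch. It uses a naive ratio of group means rather than the model fit that DeseqStats relies on, and the `naive_lfc` helper and the sample layout are hypothetical, introduced only for illustration. It shows that swapping the tested and reference levels flips the sign of the LFC.

```python
import math


def naive_lfc(samples, contrast):
    """Natural-log fold change of mean counts: tested level vs. reference level.

    `samples` is a list of (metadata_dict, normalized_count) pairs.
    This layout is an assumption made for the example, not the PyDESeq2 format.
    """
    variable, tested, reference = contrast

    def mean_for(level):
        values = [count for meta, count in samples if meta[variable] == level]
        return sum(values) / len(values)

    return math.log(mean_for(tested) / mean_for(reference))


# Toy data for a single gene: two samples per condition.
samples = [
    ({"condition": "A"}, 10.0),
    ({"condition": "A"}, 14.0),
    ({"condition": "B"}, 40.0),
    ({"condition": "B"}, 56.0),
]

# 'B' is the tested level, 'A' the reference level.
lfc_b_vs_a = naive_lfc(samples, ["condition", "B", "A"])
# Swapping the last two entries reverses the comparison.
lfc_a_vs_b = naive_lfc(samples, ["condition", "A", "B"])
```

Here the mean count is four times higher in condition B, so the first contrast gives a positive LFC and the second gives the same magnitude with a negative sign.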
- base_mean
Genewise means of normalized counts.
- Type
- design_matrix
A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Depending on the contrast that is provided to the DeseqStats object, it may differ from the DeseqDataSet design matrix, as the reference level may need to be adapted.
- Type
- LFCs
Estimated log-fold changes (LFCs) between conditions, along with the intercept, on the natural log scale.
- Type
- SE
Standard error of the LFCs.
- Type
- statistics
Wald statistics.
- Type
- p_values
P-values estimated from Wald statistics.
- Type
- padj
P-values adjusted for multiple testing.
- Type
- results_df
Summary of the statistical analysis.
- Type
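Because the LFCs attribute is documented as being on the natural log scale, values may need converting before comparison with log2 fold changes. The following is a minimal sketch of that conversion on plain floats; the `to_log2` helper and the numbers are hypothetical and not part of the library.

```python
import math


def to_log2(value_ln):
    """Convert a natural-log quantity to the log2 scale (divide by ln 2)."""
    return value_ln / math.log(2)


# Hypothetical values for one gene: a 4-fold change and its standard error,
# both expressed on the natural log scale.
lfc_ln = math.log(4.0)
se_ln = 0.25

lfc_log2 = to_log2(lfc_ln)  # a 4-fold change is 2 on the log2 scale
se_log2 = to_log2(se_ln)    # standard errors rescale by the same factor
```

Since the estimate and its standard error are divided by the same constant, their ratio is unaffected by the change of scale.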
References
- 1
Love, M. I., Huber, W., & Anders, S. (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology, 15(12), 1-21. <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8>
- 2
Zhu, A., Ibrahim, J. G., & Love, M. I. (2019). “Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences.” Bioinformatics, 35(12), 2084-2092. <https://academic.oup.com/bioinformatics/article/35/12/2084/5159452>
- __init__(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, n_cpus=None, prior_disp_var=None, batch_size=128, joblib_verbosity=0)
Methods
__init__(dds[, contrast, alpha, ...])
lfc_shrink() – LFC shrinkage with an apeGLM prior [2].
run_wald_test() – Perform a Wald test.
summary() – Run the statistical analysis.
- lfc_shrink()
LFC shrinkage with an apeGLM prior [2].
Shrinks LFCs using a heavy-tailed Cauchy prior, leaving p-values unchanged.
- Returns
If p-values were already computed, returns the results DataFrame with MAP LFCs but unmodified statistics and p-values.
- Return type
pandas.DataFrame or None
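The effect of a heavy-tailed prior can be sketched in a few lines. This is not the apeGLM procedure: it swaps in a normal approximation to the likelihood, a fixed prior scale, and a plain grid search, and the `map_lfc_cauchy` helper is hypothetical. It only illustrates why a Cauchy prior shrinks small, noisy LFCs strongly while leaving large ones nearly intact.

```python
import math


def map_lfc_cauchy(lfc_hat, se, prior_scale=1.0, grid_points=20001):
    """Grid-search MAP estimate of an LFC.

    Assumes a Normal(lfc_hat, se^2) approximation to the likelihood and a
    Cauchy(0, prior_scale) prior. Both are simplifications for illustration.
    """
    # The likelihood pulls toward lfc_hat and the prior toward 0,
    # so the posterior mode lies between the two.
    lo, hi = sorted((0.0, lfc_hat))
    best_beta, best_logpost = 0.0, -math.inf
    for i in range(grid_points):
        beta = lo + (hi - lo) * i / (grid_points - 1)
        log_lik = -0.5 * ((beta - lfc_hat) / se) ** 2
        log_prior = -math.log1p((beta / prior_scale) ** 2)
        logpost = log_lik + log_prior
        if logpost > best_logpost:
            best_beta, best_logpost = beta, logpost
    return best_beta


# Same standard error, different effect sizes (hypothetical values).
small = map_lfc_cauchy(lfc_hat=0.5, se=1.0)
large = map_lfc_cauchy(lfc_hat=5.0, se=1.0)
```

In this toy setting the small estimate is pulled most of the way to zero, whereas the large estimate keeps most of its magnitude. Only the point estimate changes; nothing here touches the test statistics or p-values.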
- run_wald_test()
Perform a Wald test.
Get gene-wise p-values for gene over/under-expression.
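As a rough illustration of how a Wald test relates the LFC, its standard error, and the p-value, here is a minimal sketch for a single gene. It assumes a two-sided test against a standard normal distribution; the `wald_test` helper and the input values are hypothetical, not the library's internals.

```python
import math


def wald_test(lfc, se):
    """Return the Wald statistic and a two-sided p-value.

    Assumes the statistic is standard normal under the null hypothesis.
    """
    statistic = lfc / se
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


# Hypothetical LFC and standard error for one gene.
stat, p = wald_test(lfc=1.2, se=0.4)
```

An LFC three standard errors away from zero gives a statistic of about 3 and a p-value of roughly 0.0027.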
- summary()
Run the statistical analysis.
The results are stored in the results_df attribute.
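The adjusted p-values reported by the analysis can be approximated with a short sketch. It assumes a Benjamini-Hochberg correction, which the text above does not state explicitly, and it omits the Cook's outlier filter and independent filtering. The `benjamini_hochberg` helper and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
p_values = [0.001, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(p_values)

# Apply the significance threshold (alpha defaults to 0.05 above).
alpha = 0.05
significant = [p < alpha for p in padj]
```

In this example only the first gene stays below the threshold after adjustment, even though three raw p-values are below 0.05.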