pydeseq2.ds.DeseqStats

class DeseqStats(dds, contrast=None, alpha=0.05, cooks_filter=True, independent_filter=True, prior_LFC_var=None, lfc_null=0.0, alt_hypothesis=None, inference=None, quiet=False)

Bases: object

PyDESeq2 statistical tests for differential expression.

Implements p-value estimation for differential gene expression according to the DESeq2 pipeline [LHA14].

Also supports apeGLM log-fold change shrinkage [ZIL19].

Parameters:
  • dds (DeseqDataSet) – DeseqDataSet for which dispersion and LFCs were already estimated.

  • contrast (list or None) – A list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'ref_level']. Names must correspond to the metadata passed to the DeseqDataSet. E.g., ['condition', 'B', 'A'] will measure the LFC of ‘condition B’ compared to ‘condition A’. For continuous variables, the last two strings should be left empty, e.g. ['measurement', '', '']. If None, the last variable from the design matrix is chosen as the variable of interest, and the reference level is picked alphabetically. (default: None).

  • alpha (float) – P-value and adjusted p-value significance threshold (usually 0.05). (default: 0.05).

  • cooks_filter (bool) – Whether to filter p-values based on outliers flagged by Cook's distance. (default: True).

  • independent_filter (bool) – Whether to perform independent filtering to correct p-value trends. (default: True).

  • prior_LFC_var (ndarray) – Prior variance for LFCs, used for ridge regularization. (default: None).

  • lfc_null (float) – The (log2) log fold change under the null hypothesis. (default: 0).

  • alt_hypothesis (str or None) – The alternative hypothesis for computing Wald p-values. One of ["greaterAbs", "lessAbs", "greater", "less"] or None. By default (None), the normal Wald test assesses deviation of the estimated log fold change from the null hypothesis, as given by lfc_null. The alternative hypothesis corresponds to what the user wants to find rather than the null hypothesis. (default: None).

  • inference (Inference) – Instance of an object implementing the inference routines. (default: DefaultInference).

  • quiet (bool) – Suppress deseq2 status updates during fit. (default: False).
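The contrast format described above can be checked before a DeseqStats object is built. The following is a minimal sketch using only the standard library; describe_contrast is a hypothetical helper written for illustration and is not part of PyDESeq2. It only checks the shape of the list, not whether the names exist in the metadata.

```python
def describe_contrast(contrast):
    """Validate a three-string contrast and describe what it measures.

    Hypothetical helper for illustration only (not a PyDESeq2 function).
    """
    if len(contrast) != 3 or not all(isinstance(c, str) for c in contrast):
        raise ValueError("contrast must be a list of three strings")
    variable, tested, ref = contrast
    if tested == "" and ref == "":
        # Continuous variable: the last two strings are left empty.
        return f"LFC for continuous variable '{variable}'"
    if tested == "" or ref == "":
        raise ValueError("tested and reference levels must both be set or both be empty")
    if tested == ref:
        raise ValueError("tested and reference levels must differ")
    return f"LFC of '{variable}' level '{tested}' relative to '{ref}'"


print(describe_contrast(["condition", "B", "A"]))
print(describe_contrast(["measurement", "", ""]))
```

A check of this kind catches a swapped or half-filled contrast early, before any statistics are computed.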

base_mean

Genewise means of normalized counts.

Type:

pandas.Series

lfc_null

The (log2) log fold change under the null hypothesis.

Type:

float

alt_hypothesis

The alternative hypothesis for computing Wald p-values.

Type:

str or None

contrast_vector

Vector encoding the contrast (variable being tested).

Type:

ndarray

contrast_idx

Index of the LFC column corresponding to the variable being tested.

Type:

int

design_matrix

A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Depending on the contrast that is provided to the DeseqStats object, it may differ from the DeseqDataSet design matrix, as the reference level may need to be adapted.

Type:

pandas.DataFrame

LFC

Estimated log-fold change between conditions and intercept, in natural log scale.

Type:

pandas.DataFrame

SE

Standard error of the LFC estimates.

Type:

pandas.Series

statistics

Wald statistics.

Type:

pandas.Series

p_values

P-values estimated from Wald statistics.

Type:

pandas.Series

padj

P-values adjusted for multiple testing.

Type:

pandas.Series

results_df

Summary of the statistical analysis.

Type:

pandas.DataFrame

shrunk_LFCs

Whether LFCs are shrunk.

Type:

bool

n_processes

Number of threads to use for multiprocessing.

Type:

int

quiet

Suppress deseq2 status updates during fit.

Type:

bool
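Note that the LFC attribute is documented as being in natural log scale, while lfc_null is expressed in log2. A minimal sketch of the conversion between the two scales follows; the function names are hypothetical and chosen for illustration, not taken from PyDESeq2.

```python
import math


def natural_to_log2(lfc_natural):
    """Convert a natural-log fold change to a log2 fold change."""
    return lfc_natural / math.log(2)


def log2_to_natural(lfc_log2):
    """Convert a log2 fold change to a natural-log fold change."""
    return lfc_log2 * math.log(2)


# A two-fold increase is ln(2) in natural log scale and 1 in log2 scale.
doubling_natural = math.log(2)
print(natural_to_log2(doubling_natural))
```

Keeping the scale explicit avoids comparing a natural-log estimate against a log2 threshold by mistake.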

References

[LHA14]

Michael I Love, Wolfgang Huber, and Simon Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12):1–21, 2014. doi:10.1186/s13059-014-0550-8.

[ZIL19]

Anqi Zhu, Joseph G Ibrahim, and Michael I Love. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics, 35(12):2084–2092, 2019. doi:10.1093/bioinformatics/bty895.

Methods

lfc_shrink([coeff, adapt])

LFC shrinkage with an apeGLM prior [ZIL19].

plot_MA([log, save_path])

Create a log ratio (M)-average (A) plot using matplotlib.

run_wald_test()

Perform a Wald test.

summary(**kwargs)

Run the statistical analysis.

lfc_shrink(coeff=None, adapt=True)

LFC shrinkage with an apeGLM prior [ZIL19].

Shrinks LFCs using a heavy-tailed Cauchy prior, leaving p-values unchanged.

Parameters:
  • coeff (str or None) – The LFC coefficient to shrink. If set to None, the method will try to shrink the coefficient corresponding to the contrast attribute. If the desired coefficient is not available, it may be set from the pydeseq2.dds.DeseqDataSet argument ref_level. (default: None).

  • adapt (bool) – Whether to use the MLE estimates of LFC to adapt the prior. If False, the prior scale is set to 1. (default: True).

Return type:

None
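To illustrate why a heavy-tailed prior shrinks small LFCs while preserving large ones, the sketch below compares the log-density of a Cauchy prior with that of a Gaussian (ridge-like) prior. This is only an illustration of the prior shapes under a unit scale; it is not the apeGLM fitting procedure, and the function names are hypothetical.

```python
import math


def cauchy_logpdf(x, scale=1.0):
    """Log-density of a zero-centred Cauchy distribution."""
    return -math.log(math.pi * scale * (1.0 + (x / scale) ** 2))


def normal_logpdf(x, sd=1.0):
    """Log-density of a zero-centred normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd**2) - 0.5 * (x / sd) ** 2


# The Gaussian penalty grows quadratically with |x|, the Cauchy penalty
# only logarithmically, so large effects are penalised far less.
for lfc in (0.0, 1.0, 5.0):
    print(lfc, cauchy_logpdf(lfc), normal_logpdf(lfc))
```

Near zero the two priors behave similarly, but at large values the Cauchy log-density stays much higher, which is the sense in which large differences are preserved.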

plot_MA(log=True, save_path=None, **kwargs)

Create a log ratio (M)-average (A) plot using matplotlib.

Useful for looking at log fold-change versus mean expression between two groups/samples/etc. Uses matplotlib to emulate the plotMA() function in DESeq2 in R.

Parameters:
  • log (bool) – Whether or not to log scale x and y axes. (default: True).

  • save_path (str or None) – The path where to save the plot. If left None, the plot won’t be saved. (default: None).

  • **kwargs – Matplotlib keyword arguments for the scatter plot.

run_wald_test()

Perform a Wald test.

Get gene-wise p-values for gene over/under-expression.

Return type:

None
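As a rough illustration of what a Wald test computes, the sketch below derives a statistic and p-value from a single LFC estimate, its standard error, and a null value. It assumes textbook one- and two-sided tests with the estimate and null on the same log scale; the "two-sided" label and the function names are hypothetical, and PyDESeq2's handling of alt_hypothesis (in particular "greaterAbs" and "lessAbs") may differ.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_p_value(lfc, se, lfc_null=0.0, alternative="two-sided"):
    """Return (statistic, p-value) for a textbook Wald test.

    Illustrative only: not PyDESeq2's implementation.
    """
    stat = (lfc - lfc_null) / se
    if alternative == "two-sided":
        p_value = 2.0 * normal_sf(abs(stat))
    elif alternative == "greater":
        p_value = normal_sf(stat)
    elif alternative == "less":
        p_value = normal_sf(-stat)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return stat, p_value


print(wald_p_value(1.2, 0.4))
print(wald_p_value(1.2, 0.4, lfc_null=0.5, alternative="greater"))
```

Raising lfc_null above zero asks whether the effect exceeds a threshold rather than whether it is merely non-zero, which typically gives larger p-values for the same estimate.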

summary(**kwargs)

Run the statistical analysis.

The results are stored in the results_df attribute.

Parameters:

**kwargs – Keyword arguments: providing new values for lfc_null or alt_hypothesis will override the corresponding DeseqStats attributes.

Return type:

None
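The padj values reported in results_df are p-values adjusted for multiple testing. The page above does not name the adjustment method; the sketch below assumes the Benjamini-Hochberg procedure, which is the default in DESeq2 [LHA14], and ignores independent filtering and missing values. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch: assumes no missing values and no filtering.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


alpha = 0.05
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < alpha for p in padj]
print(padj, significant)
```

Comparing the adjusted values against alpha, as in the last lines, mirrors how the significance threshold from the constructor is meant to be read.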