pydeseq2.inference.Inference

class Inference

Bases: ABC

Abstract class with DESeq2-related inference methods.

Methods

`lin_reg_mu`(counts, size_factors, ...)	Estimate mean of negative binomial model using a linear regression.
`irls`(counts, size_factors, design_matrix, ...)	Fit a NB GLM with log-link to predict counts from the design matrix.
`alpha_mle`(counts, design_matrix, mu, ...[, ...])	Estimate the dispersion parameter of a negative binomial GLM.
`wald_test`(design_matrix, disp, lfc, mu, ...)	Run Wald test for differential expression.
`fit_rough_dispersions`(normed_counts, ...)	'Rough dispersion' estimates from linear model, as per the R code.
`fit_moments_dispersions`(normed_counts, ...)	Dispersion estimates based on moments, as per the R code.
`dispersion_trend_gamma_glm`(covariates, targets)	Fit a gamma GLM on gene dispersions.
`lfc_shrink_nbinom_glm`(design_matrix, counts, ...)	Fit a negative binomial MAP LFC using an apeGLM prior.

abstract alpha_mle(counts, design_matrix, mu, alpha_hat, min_disp, max_disp, prior_disp_var=None, cr_reg=True, prior_reg=False, optimizer='L-BFGS-B')

Estimate the dispersion parameter of a negative binomial GLM.

Parameters:

counts (ndarray) – Raw counts.
design_matrix (ndarray) – Design matrix.
mu (ndarray) – Mean estimation for the NB model.
alpha_hat (ndarray) – Initial dispersion estimate.
min_disp (float) – Lower threshold for dispersion parameters.
max_disp (float) – Upper threshold for dispersion parameters.
prior_disp_var (float) – Prior dispersion variance.
cr_reg (bool) – Whether to use Cox-Reid regularization. (default: True).
prior_reg (bool) – Whether to use prior log-residual regularization. (default: False).
optimizer (str) – Optimizing method to use. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. (default: 'L-BFGS-B').

Return type:

Tuple[ndarray, ndarray]

Returns:

ndarray – Dispersion estimate.
ndarray – Whether L-BFGS-B converged. If not, dispersion is estimated using grid search.
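To illustrate the grid-search fallback mentioned in the return values, here is a minimal single-gene sketch: a negative binomial negative log-likelihood evaluated on a log-spaced grid between min_disp and max_disp. The names `nb_nll`, `grid_search_alpha` and `grid_length` are hypothetical and not part of pydeseq2, and Cox-Reid and prior regularization are left out for brevity.

```python
import math

import numpy as np


def nb_nll(counts, mu, alpha):
    """Negative log-likelihood of one gene under NB(mean=mu, dispersion=alpha)."""
    r = 1.0 / alpha  # NB "size" parameter (inverse dispersion)
    ll = 0.0
    for y, m in zip(counts, mu):
        ll += (
            math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + m))
            + y * math.log(m / (r + m))
        )
    return -ll


def grid_search_alpha(counts, mu, min_disp, max_disp, grid_length=100):
    """Return the grid point in [min_disp, max_disp] with the lowest NLL."""
    grid = np.exp(np.linspace(np.log(min_disp), np.log(max_disp), grid_length))
    losses = [nb_nll(counts, mu, a) for a in grid]
    return float(grid[int(np.argmin(losses))])


# Two toy genes with the same mean but very different spread.
overdispersed = [1, 50, 3, 80, 2, 40]
tight = [28, 30, 29, 31, 30, 28]
alpha_over = grid_search_alpha(overdispersed, [np.mean(overdispersed)] * 6, 1e-4, 10.0)
alpha_tight = grid_search_alpha(tight, [np.mean(tight)] * 6, 1e-4, 10.0)
```

The widely spread gene ends up with a much larger dispersion estimate than the tightly clustered one, which is pushed towards the lower bound.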

abstract dispersion_trend_gamma_glm(covariates, targets)

Fit a gamma GLM on gene dispersions.

The intercept should be concatenated in this method and the first returned coefficient should be the intercept.

Parameters:

covariates (pd.Series) – Covariates for the regression (num_genes,).
targets (pd.Series) – Targets for the regression (num_genes,).

Return type:

Tuple[ndarray, ndarray, bool]

Returns:

coeffs (ndarray) – Coefficients of the regression.
predictions (ndarray) – Predictions of the regression.
converged (bool) – Whether the optimization converged.
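The contract described above (intercept concatenated inside the method, intercept returned first) could be sketched as follows. This is an illustration only: it assumes an identity-link gamma GLM fitted by iteratively reweighted least squares, with the covariate being the inverse of the mean normalized count as in the DESeq2 parametric trend. It takes plain arrays rather than `pd.Series`, and the function name is hypothetical. A concrete implementation may well use a different optimizer.

```python
import numpy as np


def gamma_glm_identity(covariates, targets, maxiter=50, tol=1e-10):
    """Fit targets ~ a0 + a1 * covariates with a gamma GLM (identity link)."""
    x = np.asarray(covariates, dtype=float)
    y = np.asarray(targets, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept concatenated first
    coeffs = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    converged = False
    for _ in range(maxiter):
        mu = np.maximum(X @ coeffs, 1e-12)  # keep fitted means positive
        w = 1.0 / mu**2  # gamma variance is proportional to mu^2
        XtW = X.T * w
        new_coeffs = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(new_coeffs - coeffs)) < tol:
            coeffs, converged = new_coeffs, True
            break
        coeffs = new_coeffs
    predictions = X @ coeffs
    return coeffs, predictions, converged


# Toy data lying exactly on the trend 0.05 + 2 / mean.
means = np.array([10.0, 50.0, 100.0, 500.0, 1000.0])
trend_targets = 0.05 + 2.0 / means
coeffs, predictions, converged = gamma_glm_identity(1.0 / means, trend_targets)
```

Here `coeffs[0]` is the intercept and `coeffs[1]` the slope on the inverse mean.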

abstract fit_moments_dispersions(normed_counts, size_factors)

Dispersion estimates based on moments, as per the R code.

Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().

Parameters:

normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.
size_factors (ndarray) – DESeq2 normalization factors.

Returns:

Estimated dispersion parameter for each gene.

Return type:

ndarray
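A method-of-moments estimate follows from the negative binomial variance, \(\mathrm{Var} = \mu + \alpha \mu^2\), solved for \(\alpha\). The sketch below assumes the correction used in the R code, where the Poisson term is scaled by the mean inverse size factor. The function name is hypothetical, and the handling of all-zero genes or zero variance is omitted.

```python
import numpy as np


def moments_dispersions(normed_counts, size_factors):
    """Per-gene dispersion from the sample mean and variance (rows: samples)."""
    normed_counts = np.asarray(normed_counts, dtype=float)
    s_mean_inv = np.mean(1.0 / np.asarray(size_factors, dtype=float))
    mean = normed_counts.mean(axis=0)
    var = normed_counts.var(axis=0, ddof=1)  # unbiased variance
    return (var - s_mean_inv * mean) / mean**2


# Two genes, three samples, all size factors equal to 1.
normed = np.array([[2.0, 0.0], [4.0, 10.0], [6.0, 20.0]])
alpha = moments_dispersions(normed, np.ones(3))
```

For the first gene the variance equals the mean, so the estimate is 0. For the second, (100 - 10) / 10² = 0.9.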

abstract fit_rough_dispersions(normed_counts, design_matrix)

‘Rough dispersion’ estimates from linear model, as per the R code.

Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().

Parameters:

normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.
design_matrix (pandas.DataFrame) – A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Unexpanded, with intercept.

Returns:

Estimated dispersion parameter for each gene.

Return type:

ndarray
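As a rough illustration of a linear-model-based estimate, the sketch below regresses normalized counts on the design matrix, then averages the excess of squared residuals over the fitted mean, using the residual degrees of freedom. The function name is hypothetical, the design matrix is passed as a plain array rather than a DataFrame, and the floor of 1 on fitted values is an assumption borrowed from the R code.

```python
import numpy as np


def rough_dispersions(normed_counts, design_matrix):
    """Per-gene rough dispersion from ordinary least-squares residuals."""
    Y = np.asarray(normed_counts, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    num_samples, num_vars = X.shape  # requires num_samples > num_vars
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    y_hat = np.maximum(X @ coef, 1.0)  # avoid dividing by tiny fitted values
    alpha = (
        ((Y - y_hat) ** 2 - y_hat) / ((num_samples - num_vars) * y_hat**2)
    ).sum(axis=0)
    return np.maximum(alpha, 0.0)  # dispersions cannot be negative


# Intercept-only design, two genes, three samples.
normed = np.array([[2.0, 0.0], [4.0, 10.0], [6.0, 20.0]])
alpha_rough = rough_dispersions(normed, np.ones((3, 1)))
```

The first gene has less spread than a Poisson variable would, so its raw estimate is negative and gets clipped to 0.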

abstract irls(counts, size_factors, design_matrix, disp, min_mu, beta_tol, min_beta=-30, max_beta=30, optimizer='L-BFGS-B', maxiter=250)

Fit a NB GLM with log-link to predict counts from the design matrix.

See equations (1-2) in the DESeq2 paper.

Parameters:

counts (ndarray) – Raw counts.
size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).
design_matrix (ndarray) – Design matrix.
disp (ndarray) – Gene-wise dispersion prior.
min_mu (float) – Lower bound on estimated means, to ensure numerical stability. (default: 0.5).
beta_tol (float) – Stopping criterion for IRWLS: \(\vert dev - dev_{old}\vert / \vert dev + 0.1 \vert < \beta_{tol}\). (default: 1e-8).
min_beta (float) – Lower-bound on LFC. (default: -30).
max_beta (float) – Upper-bound on LFC. (default: 30).
optimizer (str) – Optimizing method to use in case IRLS starts diverging. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. NB: only ‘L-BFGS-B’ ensures that LFCs will lie in the [min_beta, max_beta] range. (default: 'L-BFGS-B').
maxiter (int) – Maximum number of IRLS iterations to perform before switching to L-BFGS-B. (default: 250).

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

beta (ndarray) – Fitted (basemean, lfc) coefficients of negative binomial GLM.
mu (ndarray) – Means estimated from size factors and beta: \(\mu = s_{ij} \exp(\beta^t X)\).
H (ndarray) – Diagonal of the \(W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}\) covariance matrix.
converged (ndarray) – Whether IRLS or the optimizer converged. If not and if dimension allows it, perform grid search.
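The iteration and stopping rule described above can be sketched for a single gene as follows. This is a simplified illustration, not the library's implementation: the function name is hypothetical, the starting point (least squares on log counts) is an assumption, the deviance drops terms that do not depend on \(\mu\), and the [min_beta, max_beta] bounds and the optimizer fallback are omitted.

```python
import numpy as np


def irls_sketch(counts, size_factors, design_matrix, disp,
                min_mu=0.5, beta_tol=1e-8, maxiter=250):
    y = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    r = 1.0 / disp

    def deviance(mu):
        # -2 * NB log-likelihood, without the terms that are constant in mu.
        return -2.0 * np.sum(y * np.log(mu / (r + mu)) + r * np.log(r / (r + mu)))

    # Assumed initialisation: least squares on log-normalized counts.
    beta = np.linalg.lstsq(X, np.log(y / s + 0.1), rcond=None)[0]
    mu = np.maximum(s * np.exp(X @ beta), min_mu)
    dev = deviance(mu)
    converged = False
    for _ in range(maxiter):
        W = mu / (1.0 + disp * mu)           # working weights
        z = np.log(mu / s) + (y - mu) / mu   # working response
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        mu = np.maximum(s * np.exp(X @ beta), min_mu)
        dev_old, dev = dev, deviance(mu)
        if abs(dev - dev_old) / abs(dev + 0.1) < beta_tol:
            converged = True
            break
    # Diagonal of W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}.
    W = mu / (1.0 + disp * mu)
    H = W * np.einsum("ij,ij->i", X @ np.linalg.inv((X.T * W) @ X), X)
    return beta, mu, H, converged


# Two groups of three samples, unit size factors.
X = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
counts = np.array([8, 12, 10, 35, 45, 40], dtype=float)
beta, mu, H, converged = irls_sketch(counts, np.ones(6), X, disp=0.1)
```

With equal size factors and a two-group design, the fitted means coincide with the group means, so `beta[0]` is close to log(10) and `beta[0] + beta[1]` is close to log(40). Coefficients are on the natural-log scale.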

abstract lfc_shrink_nbinom_glm(design_matrix, counts, size, offset, prior_no_shrink_scale, prior_scale, optimizer, shrink_index)

Fit a negative binomial MAP LFC using an apeGLM prior.

Only the LFC is shrunk, and not the intercept.

Parameters:

design_matrix (ndarray) – Design matrix.
counts (ndarray) – Raw counts.
size (ndarray) – Size parameter of NB family (inverse of dispersion).
offset (ndarray) – Natural logarithm of size factor.
prior_no_shrink_scale (float) – Prior variance for the intercept.
prior_scale (float) – Prior variance for the LFC parameter.
optimizer (str) – Optimizing method to use in case IRLS starts diverging. Accepted values: ‘L-BFGS-B’, ‘BFGS’ or ‘Newton-CG’.
shrink_index (int) – Index of the LFC coordinate to shrink. (default: 1).

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

beta (ndarray) – 2-element array, containing the intercept (first) and the LFC (second).
inv_hessian (ndarray) – Inverse of the Hessian of the objective at the estimated MAP LFC.
converged (ndarray) – Whether L-BFGS-B converged for each optimization problem.
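To make the MAP formulation concrete, here is a sketch of an objective of this kind for one gene: the negative binomial log-likelihood plus a wide normal prior on the unshrunk coefficients and a Cauchy prior on the shrunk LFC, as in apeGLM. The function name is hypothetical, and treating `prior_no_shrink_scale` and `prior_scale` as scale (not variance) parameters is an assumption. The minimization itself (for example with L-BFGS-B) is not shown.

```python
import numpy as np


def ape_objective(beta, design_matrix, counts, size, offset,
                  prior_no_shrink_scale, prior_scale, shrink_index=1):
    """Negative log-posterior (up to constants) of the coefficients beta."""
    beta = np.asarray(beta, dtype=float)
    eta = design_matrix @ beta + offset  # log of the NB mean
    # NB log-likelihood, dropping terms that do not depend on beta.
    loglik = np.sum(counts * eta - (counts + size) * np.logaddexp(eta, np.log(size)))
    no_shrink = np.delete(beta, shrink_index)
    log_prior = (
        -np.sum(no_shrink**2) / (2.0 * prior_no_shrink_scale**2)  # normal prior
        - np.log1p((beta[shrink_index] / prior_scale) ** 2)       # Cauchy prior
    )
    return -(loglik + log_prior)


X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
counts = np.array([10.0, 12.0, 40.0, 38.0])
args = dict(design_matrix=X, counts=counts, size=10.0, offset=np.zeros(4),
            prior_no_shrink_scale=15.0)
beta_test = np.array([np.log(11.0), 1.0])
loss_tight = ape_objective(beta_test, prior_scale=0.1, **args)
loss_wide = ape_objective(beta_test, prior_scale=10.0, **args)
```

A tighter prior scale penalizes the same nonzero LFC more heavily, which is what pulls the MAP estimate towards zero. The intercept is only affected by the wide normal prior.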

abstract lin_reg_mu(counts, size_factors, design_matrix, min_mu)

Estimate mean of negative binomial model using a linear regression.

Used to initialize genewise dispersion models.

Parameters:

counts (ndarray) – Raw counts.
size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).
design_matrix (ndarray) – Design matrix.
min_mu (float) – Lower threshold for fitted means, for numerical stability. (default: 0.5).

Returns:

Estimated mean.

Return type:

ndarray
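A minimal sketch of this initialization for one gene, assuming an ordinary least-squares fit on size-factor-normalized counts followed by rescaling and thresholding. The function name is hypothetical.

```python
import numpy as np


def lin_reg_mu_sketch(counts, size_factors, design_matrix, min_mu=0.5):
    """Fitted NB means from a linear regression on normalized counts."""
    y = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    coef = np.linalg.lstsq(X, y / s, rcond=None)[0]
    mu_hat = s * (X @ coef)            # back to the raw-count scale
    return np.maximum(mu_hat, min_mu)  # threshold for numerical stability


X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
mu_hat = lin_reg_mu_sketch([10, 20, 0, 0], [1.0, 2.0, 1.0, 1.0], X)
```

The second group has only zero counts, so its fitted means are raised to `min_mu`.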

abstract wald_test(design_matrix, disp, lfc, mu, ridge_factor, contrast, lfc_null, alt_hypothesis=None)

Run Wald test for differential expression.

Computes Wald statistics, standard error and p-values from dispersion and LFC estimates.

Parameters:

design_matrix (ndarray) – Design matrix.
disp (float) – Dispersion estimate.
lfc (ndarray) – Log-fold change estimate (in natural log scale).
mu (float) – Mean estimation for the NB model.
ridge_factor (ndarray) – Regularization factors.
contrast (ndarray) – Vector encoding the contrast that is being tested.
lfc_null (float) – The (log2) log fold change under the null hypothesis.
alt_hypothesis (str or None) – The alternative hypothesis for computing Wald p-values.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

wald_p_value (ndarray) – Estimated p-value.
wald_statistic (ndarray) – Wald statistic.
wald_se (ndarray) – Standard error of the Wald statistic.
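For a single gene, the computation could look like the sketch below. It assumes a two-sided test against a null LFC of zero (so `lfc_null` and `alt_hypothesis` are left out), takes `mu` as the per-sample fitted means, and uses the sandwich form of the standard error with the ridge term. The function name is hypothetical.

```python
import math

import numpy as np


def wald_test_sketch(design_matrix, disp, lfc, mu, ridge_factor, contrast):
    X = np.asarray(design_matrix, dtype=float)
    c = np.asarray(contrast, dtype=float)
    W = mu / (1.0 + mu * disp)           # NB GLM working weights
    M = (X.T * W) @ X                    # X^t W X
    H = np.linalg.inv(M + ridge_factor)  # regularized inverse
    Hc = H @ c
    wald_se = math.sqrt(Hc @ M @ Hc)
    wald_statistic = float(c @ np.asarray(lfc, dtype=float)) / wald_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    wald_p_value = math.erfc(abs(wald_statistic) / math.sqrt(2.0))
    return wald_p_value, wald_statistic, wald_se


# Intercept-only toy case: 4 samples, mean 10, dispersion 0.1, no ridge.
p, stat, se = wald_test_sketch(
    np.ones((4, 1)), 0.1, [1.0], np.full(4, 10.0), np.zeros((1, 1)), [1.0]
)
```

In this toy case each weight is 10 / (1 + 1) = 5, so \(X^t W X = 20\) and the standard error is \(\sqrt{1/20}\).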