pydeseq2.inference.Inference

class Inference

Bases: ABC

Abstract class with DESeq2-related inference methods.

Methods

lin_reg_mu(counts, size_factors, ...)

Estimate mean of negative binomial model using a linear regression.

irls(counts, size_factors, design_matrix, ...)

Fit an NB GLM with log link to predict counts from the design matrix.

alpha_mle(counts, design_matrix, mu, ...[, ...])

Estimate the dispersion parameter of a negative binomial GLM.

wald_test(design_matrix, disp, lfc, mu, ...)

Run Wald test for differential expression.

fit_rough_dispersions(normed_counts, ...)

'Rough dispersion' estimates from linear model, as per the R code.

fit_moments_dispersions(normed_counts, ...)

Dispersion estimates based on moments, as per the R code.

dispersion_trend_gamma_glm(covariates, targets)

Fit a gamma glm on gene dispersions.

lfc_shrink_nbinom_glm(design_matrix, counts, ...)

Fit a negative binomial MAP LFC using an apeGLM prior.

abstract alpha_mle(counts, design_matrix, mu, alpha_hat, min_disp, max_disp, prior_disp_var=None, cr_reg=True, prior_reg=False, optimizer='L-BFGS-B')

Estimate the dispersion parameter of a negative binomial GLM.

Parameters:
  • counts (ndarray) – Raw counts.

  • design_matrix (ndarray) – Design matrix.

  • mu (ndarray) – Mean estimation for the NB model.

  • alpha_hat (ndarray) – Initial dispersion estimate.

  • min_disp (float) – Lower threshold for dispersion parameters.

  • max_disp (float) – Upper threshold for dispersion parameters.

  • prior_disp_var (float) – Prior dispersion variance.

  • cr_reg (bool) – Whether to use Cox-Reid regularization. (default: True).

  • prior_reg (bool) – Whether to use prior log-residual regularization. (default: False).

  • optimizer (str) – Optimizing method to use. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. (default: 'L-BFGS-B').

Return type:

Tuple[ndarray, ndarray]

Returns:

  • ndarray – Dispersion estimate.

  • ndarray – Whether L-BFGS-B converged. If not, dispersion is estimated using grid search.
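The grid-search fallback mentioned above can be sketched for a single gene as follows. The sketch scores a log-spaced grid of dispersions between `min_disp` and `max_disp` by the NB log-likelihood. When `cr_reg` is set, it adds the Cox-Reid term \(-\tfrac{1}{2}\log\det(X^t W X)\) with \(W = \mathrm{diag}(\mu / (1 + \alpha\mu))\). The helper names and `n_grid` are hypothetical. This is not the pydeseq2 implementation, which also handles `prior_reg` and the BFGS/L-BFGS-B optimizers.

```python
import math

import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def nb_log_likelihood(counts, mu, alpha):
    """NB log-likelihood with mean mu and dispersion alpha (variance mu + alpha * mu^2)."""
    r = 1.0 / alpha  # NB size parameter
    return np.sum(
        _lgamma(counts + r) - _lgamma(r) - _lgamma(counts + 1.0)
        + r * np.log(r / (r + mu)) + counts * np.log(mu / (r + mu))
    )


def grid_search_dispersion(counts, design_matrix, mu, min_disp, max_disp,
                           cr_reg=True, n_grid=200):
    """Hypothetical helper: maximize the (Cox-Reid adjusted) NB likelihood
    over a log-spaced grid of dispersion values for one gene."""
    counts = np.asarray(counts, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    mu = np.asarray(mu, dtype=float)
    grid = np.exp(np.linspace(np.log(min_disp), np.log(max_disp), n_grid))
    scores = []
    for alpha in grid:
        score = nb_log_likelihood(counts, mu, alpha)
        if cr_reg:
            # Cox-Reid adjustment: -0.5 * log det(X^t W X)
            w = mu / (1.0 + alpha * mu)
            _, logdet = np.linalg.slogdet((X.T * w) @ X)
            score -= 0.5 * logdet
        scores.append(score)
    return grid[int(np.argmax(scores))]
```

With simulated NB counts of known dispersion and an intercept-only design, the grid maximum should land close to the true value.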

abstract dispersion_trend_gamma_glm(covariates, targets)

Fit a gamma glm on gene dispersions.

Implementations should add the intercept column inside this method, and the first returned coefficient should be the intercept.

Parameters:
  • covariates (pd.Series) – Covariates for the regression (num_genes,).

  • targets (pd.Series) – Targets for the regression (num_genes,).

Return type:

Tuple[ndarray, ndarray, bool]

Returns:

  • coeffs (ndarray) – Coefficients of the regression.

  • predictions (ndarray) – Predictions of the regression.

  • converged (bool) – Whether the optimization converged.
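The sketch below follows this contract: the intercept is concatenated inside the function and returned first. It fits \(\text{targets} \approx a_0 + a_1 \cdot \text{covariates}\) with a gamma GLM and identity link, solved by IRLS. It assumes the covariates are inverse mean normalized counts, matching DESeq2's parametric trend \(\alpha(\bar\mu) = a_0 + a_1/\bar\mu\). The function name, `maxiter` and `tol` are hypothetical, and an actual subclass may use a different solver.

```python
import numpy as np


def gamma_glm_identity(covariates, targets, maxiter=100, tol=1e-10):
    """Hypothetical sketch: gamma GLM (identity link) fitted by IRLS.

    Accepts arrays or pandas Series of shape (num_genes,).
    """
    x = np.asarray(covariates, dtype=float)
    y = np.asarray(targets, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept concatenated first
    mu = y.copy()  # start from the observed (positive) targets
    coeffs = np.zeros(2)
    converged = False
    for _ in range(maxiter):
        # Gamma variance is proportional to mu^2; with the identity link the
        # working response is y itself and the IRLS weights are 1 / mu^2.
        w = 1.0 / mu**2
        new_coeffs = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
        mu = X @ new_coeffs
        if np.any(mu <= 0):
            break  # the identity link left the valid (positive) range
        if np.max(np.abs(new_coeffs - coeffs)) < tol:
            coeffs = new_coeffs
            converged = True
            break
        coeffs = new_coeffs
    return coeffs, X @ coeffs, converged
```

For noise-free targets that lie exactly on a trend, the fit recovers the trend coefficients.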

abstract fit_moments_dispersions(normed_counts, size_factors)

Dispersion estimates based on moments, as per the R code.

Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().

Parameters:
  • normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.

  • size_factors (ndarray) – DESeq2 normalization factors.

Returns:

Estimated dispersion parameter for each gene.

Return type:

ndarray
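The sketch below assumes the DESeq2 moments estimator. For normalized counts, \(\mathrm{Var} \approx \overline{1/s}\,\mu + \alpha\mu^2\), which gives \(\hat\alpha = (\hat\sigma^2 - \overline{1/s}\,\hat\mu)/\hat\mu^2\). Here \(\hat\mu\) and \(\hat\sigma^2\) are the per-gene mean and unbiased variance, and \(\overline{1/s}\) is the mean inverse size factor. The function name is hypothetical.

```python
import numpy as np


def moments_dispersions(normed_counts, size_factors):
    """Hypothetical sketch: method-of-moments dispersion per gene.

    normed_counts: rows are samples, columns are genes.
    """
    normed_counts = np.asarray(normed_counts, dtype=float)
    s_mean_inv = np.mean(1.0 / np.asarray(size_factors, dtype=float))
    mean = normed_counts.mean(axis=0)
    var = normed_counts.var(axis=0, ddof=1)  # unbiased per-gene variance
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha = (var - s_mean_inv * mean) / mean**2
    return np.nan_to_num(alpha)  # all-zero genes give 0/0, mapped to 0
```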

abstract fit_rough_dispersions(normed_counts, design_matrix)

‘Rough dispersion’ estimates from linear model, as per the R code.

Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().

Parameters:
  • normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.

  • design_matrix (pandas.DataFrame) – A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Unexpanded, with intercept.

Returns:

Estimated dispersion parameter for each gene.

Return type:

ndarray
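A sketch of the rough estimator, assuming it mirrors R's `roughDispEstimate`. Each gene's normalized counts \(y\) are regressed on the design by least squares. The fitted values \(\hat y\) are floored at 1, and then \(\hat\alpha = \max\bigl(0, \tfrac{1}{m - p}\sum_j [(y_j - \hat y_j)^2 - \hat y_j]/\hat y_j^2\bigr)\), with \(m\) samples and \(p\) design columns. The function name is hypothetical.

```python
import numpy as np


def rough_dispersions(normed_counts, design_matrix):
    """Hypothetical sketch: rough dispersion estimate per gene.

    normed_counts: rows are samples, columns are genes.
    design_matrix: array or DataFrame of shape (num_samples, num_vars).
    """
    y = np.asarray(normed_counts, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    num_samples, num_vars = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # one linear fit per gene
    y_hat = np.maximum(X @ coef, 1.0)  # fitted means floored at 1
    alpha = (((y - y_hat) ** 2 - y_hat) / y_hat**2).sum(axis=0) / (num_samples - num_vars)
    return np.maximum(alpha, 0.0)
```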

abstract irls(counts, size_factors, design_matrix, disp, min_mu, beta_tol, min_beta=-30, max_beta=30, optimizer='L-BFGS-B', maxiter=250)

Fit an NB GLM with log link to predict counts from the design matrix.

See equations (1-2) in the DESeq2 paper.

Parameters:
  • counts (ndarray) – Raw counts.

  • size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).

  • design_matrix (ndarray) – Design matrix.

  • disp (ndarray) – Gene-wise dispersion prior.

  • min_mu (float) – Lower bound on estimated means, to ensure numerical stability. (default: 0.5).

  • beta_tol (float) – Stopping criterion for IRWLS: \(\vert dev - dev_{old}\vert / \vert dev + 0.1 \vert < \beta_{tol}\). (default: 1e-8).

  • min_beta (float) – Lower bound on LFC. (default: -30).

  • max_beta (float) – Upper bound on LFC. (default: 30).

  • optimizer (str) – Optimizing method to use in case IRLS starts diverging. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. NB: only ‘L-BFGS-B’ ensures that LFCs will lie in the [min_beta, max_beta] range. (default: 'L-BFGS-B').

  • maxiter (int) – Maximum number of IRLS iterations to perform before switching to L-BFGS-B. (default: 250).

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

  • beta (ndarray) – Fitted (basemean, lfc) coefficients of negative binomial GLM.

  • mu (ndarray) – Means estimated from size factors and beta: \(\mu = s_{ij} \exp(\beta^t X)\).

  • H (ndarray) – Diagonal of the hat matrix \(W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}\), i.e. the per-sample leverages.

  • converged (ndarray) – Whether IRLS or the optimizer converged. If not, and if the dimension allows it, a grid search is performed.
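The IRLS loop can be sketched for a single gene under stated assumptions. The weights are \(w = \mu/(1 + \alpha\mu)\) and the working response is \(z = \log(\mu/s) + (y - \mu)/\mu\). Each step solves \((X^t W X + \lambda I)\beta = X^t W z\). The loop stops on the relative deviance criterion given for `beta_tol`. The small ridge \(\lambda\) (`ridge`) and the function name are assumptions, and the optimizer fallback, grid search and `min_beta`/`max_beta` clipping are omitted.

```python
import numpy as np


def irls_nb_sketch(counts, size_factors, design_matrix, disp,
                   min_mu=0.5, beta_tol=1e-8, maxiter=250, ridge=1e-6):
    """Hypothetical single-gene IRLS for an NB GLM with log link."""
    y = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    p = X.shape[1]
    r = 1.0 / disp
    # Initialize with a least-squares fit on log normalized counts.
    beta = np.linalg.lstsq(X, np.log(y / s + 0.1), rcond=None)[0]
    mu = np.maximum(s * np.exp(X @ beta), min_mu)
    dev = np.inf
    converged = False
    for _ in range(maxiter):
        w = mu / (1.0 + disp * mu)          # IRLS weights for the log link
        z = np.log(mu / s) + (y - mu) / mu  # working response
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X + ridge * np.eye(p), XtW @ z)
        mu = np.maximum(s * np.exp(X @ beta), min_mu)
        # -2 * NB log-likelihood, dropping terms that do not depend on mu
        new_dev = -2.0 * np.sum(r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))
        if abs(new_dev - dev) / (abs(new_dev) + 0.1) < beta_tol:
            converged = True
            break
        dev = new_dev
    # Leverages: diagonal of W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}
    xw = X * np.sqrt(mu / (1.0 + disp * mu))[:, None]
    m = np.linalg.solve(xw.T @ xw + ridge * np.eye(p), xw.T)
    h = np.sum(xw * m.T, axis=1)
    return beta, mu, h, converged
```

For a two-group design with counts equal to the group means, the fit recovers the log of the first group's mean and the natural-log fold change. Every leverage is then 1/2.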

abstract lfc_shrink_nbinom_glm(design_matrix, counts, size, offset, prior_no_shrink_scale, prior_scale, optimizer, shrink_index)

Fit a negative binomial MAP LFC using an apeGLM prior.

Only the LFC is shrunk, not the intercept.

Parameters:
  • design_matrix (ndarray) – Design matrix.

  • counts (ndarray) – Raw counts.

  • size (ndarray) – Size parameter of NB family (inverse of dispersion).

  • offset (ndarray) – Natural logarithm of the size factors.

  • prior_no_shrink_scale (float) – Prior variance for the intercept.

  • prior_scale (float) – Prior variance for the LFC parameter.

  • optimizer (str) – Optimizing method used to find the MAP estimate. Accepted values: ‘L-BFGS-B’, ‘BFGS’ or ‘Newton-CG’.

  • shrink_index (int) – Index of the LFC coordinate to shrink. (default: 1).

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

  • beta (ndarray) – 2-element array, containing the intercept (first) and the LFC (second).

  • inv_hessian (ndarray) – Inverse of the Hessian of the objective at the estimated MAP LFC.

  • converged (ndarray) – Whether L-BFGS-B converged for each optimization problem.
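A single-gene sketch of the MAP problem, assuming an apeGLM-style prior. The shrunk coordinate gets a Cauchy prior with scale `prior_scale`, penalized by \(\log(1 + (\beta_k/S)^2)\). The other coordinates get a normal prior with scale `prior_no_shrink_scale`. The sketch minimizes NB negative log-likelihood plus these penalties with `scipy.optimize.minimize` and returns the inverse of the analytic Hessian at the optimum. The function name, the zero initialization and the scalar `size` are assumptions, not the library's code.

```python
import numpy as np
from scipy.optimize import minimize


def shrink_lfc_sketch(design_matrix, counts, size, offset, prior_no_shrink_scale,
                      prior_scale, shrink_index=1, optimizer="L-BFGS-B"):
    """Hypothetical MAP fit of an NB GLM with a Cauchy prior on one coefficient."""
    X = np.asarray(design_matrix, dtype=float)
    y = np.asarray(counts, dtype=float)
    off = np.asarray(offset, dtype=float)
    p = X.shape[1]
    no_shrink = np.arange(p) != shrink_index

    def objective(beta):
        eta = X @ beta + off
        # NB negative log-likelihood in beta (constant terms dropped)
        nll = -np.sum(y * eta - (y + size) * np.logaddexp(np.log(size), eta))
        prior = 0.5 * np.sum(beta[no_shrink] ** 2) / prior_no_shrink_scale**2
        prior += np.log1p((beta[shrink_index] / prior_scale) ** 2)  # Cauchy
        return nll + prior

    def gradient(beta):
        mu = np.exp(X @ beta + off)
        g = -X.T @ (size * (y - mu) / (size + mu))
        g[no_shrink] += beta[no_shrink] / prior_no_shrink_scale**2
        b = beta[shrink_index]
        g[shrink_index] += 2.0 * b / (prior_scale**2 + b**2)
        return g

    def hessian(beta):
        mu = np.exp(X @ beta + off)
        w = size * mu * (size + y) / (size + mu) ** 2
        h = (X.T * w) @ X
        h = h + np.diag(np.where(no_shrink, 1.0 / prior_no_shrink_scale**2, 0.0))
        b = beta[shrink_index]
        h[shrink_index, shrink_index] += 2.0 * (prior_scale**2 - b**2) / (prior_scale**2 + b**2) ** 2
        return h

    res = minimize(objective, np.zeros(p), jac=gradient,
                   hess=hessian if optimizer == "Newton-CG" else None,
                   method=optimizer)
    return res.x, np.linalg.inv(hessian(res.x)), res.success
```

With a wide prior the LFC stays near its maximum-likelihood value. A narrow prior pulls it toward zero.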

abstract lin_reg_mu(counts, size_factors, design_matrix, min_mu)

Estimate mean of negative binomial model using a linear regression.

Used to initialize genewise dispersion models.

Parameters:
  • counts (ndarray) – Raw counts.

  • size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).

  • design_matrix (ndarray) – Design matrix.

  • min_mu (float) – Lower threshold for fitted means, for numerical stability. (default: 0.5).

Returns:

Estimated mean.

Return type:

ndarray
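A sketch of this initialization, assuming a plain least-squares fit on size-factor-normalized counts. The fitted values are rescaled by the size factors and floored at `min_mu`. The function name is hypothetical.

```python
import numpy as np


def lin_reg_mu_sketch(counts, size_factors, design_matrix, min_mu=0.5):
    """Hypothetical sketch: NB means from a linear fit on normalized counts."""
    counts = np.asarray(counts, dtype=float)  # rows: samples, columns: genes
    s = np.asarray(size_factors, dtype=float)
    X = np.asarray(design_matrix, dtype=float)
    normed = counts / s[:, None]
    coef, *_ = np.linalg.lstsq(X, normed, rcond=None)
    mu = s[:, None] * (X @ coef)  # back to the raw-count scale
    return np.maximum(mu, min_mu)  # floor for numerical stability
```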

abstract wald_test(design_matrix, disp, lfc, mu, ridge_factor, contrast, lfc_null, alt_hypothesis=None)

Run Wald test for differential expression.

Computes Wald statistics, standard error and p-values from dispersion and LFC estimates.

Parameters:
  • design_matrix (ndarray) – Design matrix.

  • disp (float) – Dispersion estimate.

  • lfc (ndarray) – Log-fold change estimate (in natural log scale).

  • mu (float) – Mean estimation for the NB model.

  • ridge_factor (ndarray) – Regularization factors.

  • contrast (ndarray) – Vector encoding the contrast that is being tested.

  • lfc_null (float) – The log2 fold change under the null hypothesis.

  • alt_hypothesis (str or None) – The alternative hypothesis for computing Wald p-values.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

  • wald_p_value (ndarray) – Estimated p-value.

  • wald_statistic (ndarray) – Wald statistic.

  • wald_se (ndarray) – Standard error of the Wald statistic.
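A sketch of the two-sided test (`alt_hypothesis=None`) for a single gene. It assumes the sandwich standard error \(\sqrt{c^t H M H c}\) with \(M = X^t W X\), \(W = \mathrm{diag}(\mu/(1 + \alpha\mu))\) and \(H = (M + \text{ridge\_factor})^{-1}\). The statistic is then \((c^t \beta - \text{lfc\_null})/\text{se}\) with a standard normal p-value. The function name is hypothetical. The sketch assumes `lfc_null` is already on the same (natural log) scale as `lfc`, and alternative hypotheses are not handled.

```python
import math

import numpy as np


def wald_test_sketch(design_matrix, disp, lfc, mu, ridge_factor, contrast, lfc_null=0.0):
    """Hypothetical two-sided Wald test for one gene."""
    X = np.asarray(design_matrix, dtype=float)
    mu = np.asarray(mu, dtype=float)
    lfc = np.asarray(lfc, dtype=float)
    contrast = np.asarray(contrast, dtype=float)
    w = mu / (1.0 + disp * mu)            # NB GLM weights
    M = (X.T * w) @ X                     # Fisher information X^t W X
    H = np.linalg.inv(M + ridge_factor)   # regularized inverse
    hc = H @ contrast
    wald_se = math.sqrt(hc @ M @ hc)      # sandwich variance c^t H M H c
    wald_statistic = (contrast @ lfc - lfc_null) / wald_se
    # Two-sided p-value: 2 * P(Z > |stat|) = erfc(|stat| / sqrt(2))
    wald_p_value = math.erfc(abs(wald_statistic) / math.sqrt(2.0))
    return wald_p_value, wald_statistic, wald_se
```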