pydeseq2.inference.Inference
- class Inference
Bases: ABC
Abstract class with DESeq2-related inference methods.
Methods
lin_reg_mu(counts, size_factors, ...) – Estimate mean of negative binomial model using a linear regression.
irls(counts, size_factors, design_matrix, ...) – Fit a NB GLM with log-link to predict counts from the design matrix.
alpha_mle(counts, design_matrix, mu, ...[, ...]) – Estimate the dispersion parameter of a negative binomial GLM.
wald_test(design_matrix, disp, lfc, mu, ...) – Run Wald test for differential expression.
fit_rough_dispersions(normed_counts, ...) – 'Rough dispersion' estimates from linear model, as per the R code.
fit_moments_dispersions(normed_counts, ...) – Dispersion estimates based on moments, as per the R code.
dispersion_trend_gamma_glm(covariates, targets) – Fit a gamma GLM on gene dispersions.
lfc_shrink_nbinom_glm(design_matrix, counts, ...) – Fit a negative binomial MAP LFC using an apeGLM prior.
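Because Inference is an abstract base class, it cannot be instantiated directly: a concrete backend has to override the abstract methods. The sketch below illustrates that mechanism only. `SketchInference` and `ConstantInference` are hypothetical stand-ins that declare a single method; they are not the pydeseq2 classes.

```python
from abc import ABC, abstractmethod


class SketchInference(ABC):
    """Hypothetical stand-in for an abstract inference interface."""

    @abstractmethod
    def lin_reg_mu(self, counts, size_factors, design_matrix, min_mu):
        """Estimate NB means; concrete subclasses must implement this."""


class ConstantInference(SketchInference):
    """Toy backend (illustration only): returns min_mu for every entry."""

    def lin_reg_mu(self, counts, size_factors, design_matrix, min_mu):
        return [[min_mu for _ in row] for row in counts]


try:
    SketchInference()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

backend = ConstantInference()
toy_mu = backend.lin_reg_mu([[1, 2], [3, 4]], [1.0, 1.0], None, 0.5)
```

A real backend would implement every abstract method listed above with the documented signatures.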
- abstract alpha_mle(counts, design_matrix, mu, alpha_hat, min_disp, max_disp, prior_disp_var=None, cr_reg=True, prior_reg=False, optimizer='L-BFGS-B')
Estimate the dispersion parameter of a negative binomial GLM.
- Parameters:
counts (ndarray) – Raw counts.
design_matrix (ndarray) – Design matrix.
mu (ndarray) – Mean estimation for the NB model.
alpha_hat (ndarray) – Initial dispersion estimate.
min_disp (float) – Lower threshold for dispersion parameters.
max_disp (float) – Upper threshold for dispersion parameters.
prior_disp_var (float) – Prior dispersion variance.
cr_reg (bool) – Whether to use Cox-Reid regularization. (default: True)
prior_reg (bool) – Whether to use prior log-residual regularization. (default: False)
optimizer (str) – Optimizing method to use. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. (default: 'L-BFGS-B')
- Return type:
- Returns:
ndarray – Dispersion estimate.
ndarray – Whether L-BFGS-B converged. If not, dispersion is estimated using grid search.
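To make the grid-search fallback concrete, here is a minimal single-gene sketch. It assumes the plain negative binomial log-likelihood with mean `mu` and dispersion `alpha`, and leaves out the Cox-Reid and prior regularization terms. The helper names `nb_nll` and `grid_search_alpha` and the `grid_length` parameter are hypothetical, not part of the library.

```python
import math


def nb_nll(counts, mu, alpha):
    """Negative log-likelihood of NB(mean=mu, dispersion=alpha) for one gene."""
    r = 1.0 / alpha  # NB "size" parameter (inverse dispersion)
    ll = 0.0
    for y, m in zip(counts, mu):
        ll += (
            math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + m))
            + y * math.log(m / (r + m))
        )
    return -ll


def grid_search_alpha(counts, mu, min_disp, max_disp, grid_length=100):
    """Pick the dispersion that minimises nb_nll on a log-spaced grid."""
    log_lo, log_hi = math.log(min_disp), math.log(max_disp)
    step = (log_hi - log_lo) / (grid_length - 1)
    grid = [math.exp(log_lo + i * step) for i in range(grid_length)]
    return min(grid, key=lambda a: nb_nll(counts, mu, a))


counts = [3, 12, 7, 25, 1, 9]
mu = [9.5] * len(counts)  # assumed fitted means for this toy gene
alpha = grid_search_alpha(counts, mu, min_disp=1e-8, max_disp=10.0)
```

The grid is log-spaced because plausible dispersions span several orders of magnitude between `min_disp` and `max_disp`.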
- abstract dispersion_trend_gamma_glm(covariates, targets)
Fit a gamma GLM on gene dispersions.
The intercept should be concatenated in this method and the first returned coefficient should be the intercept.
- Parameters:
covariates (pd.Series) – Covariates for the regression (num_genes,).
targets (pd.Series) – Targets for the regression (num_genes,).
- Return type:
- Returns:
coeffs (ndarray) – Coefficients of the regression.
predictions (ndarray) – Predictions of the regression.
converged (bool) – Whether the optimization converged.
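The intercept convention and the three return values can be sketched as follows. This example assumes an identity link and fits by iteratively reweighted least squares with gamma weights 1/mu^2. That is one possible fitting scheme, chosen for brevity, and not necessarily what a given backend does. The function name and the `maxiter` and `tol` parameters are hypothetical.

```python
import numpy as np


def gamma_glm_identity_sketch(covariates, targets, maxiter=50, tol=1e-10):
    """Fit targets ~ a0 + a1 * covariates with a gamma variance function."""
    x = np.asarray(covariates, dtype=float)
    y = np.asarray(targets, dtype=float)
    # The intercept column is concatenated here, so coeffs[0] is the intercept.
    X = np.column_stack([np.ones_like(x), x])
    coeffs = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    converged = False
    for _ in range(maxiter):
        mu = np.clip(X @ coeffs, 1e-12, None)  # keep fitted means positive
        w = 1.0 / mu**2  # gamma variance is proportional to mu^2
        XtW = X.T * w
        new_coeffs = np.linalg.solve(XtW @ X, XtW @ y)
        done = np.max(np.abs(new_coeffs - coeffs)) < tol
        coeffs = new_coeffs
        if done:
            converged = True
            break
    predictions = X @ coeffs
    return coeffs, predictions, converged


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
coeffs, predictions, converged = gamma_glm_identity_sketch(x, 0.1 + 2.0 * x)
```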
- abstract fit_moments_dispersions(normed_counts, size_factors)
Dispersion estimates based on moments, as per the R code.
Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().
- Parameters:
normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.
size_factors (ndarray) – DESeq2 normalization factors.
- Returns:
Estimated dispersion parameter for each gene.
- Return type:
ndarray
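A method-of-moments estimate of this kind can be sketched as below. The formula follows the usual moments estimator, (variance - mean(1/s) * mean) / mean^2, and treating it as what a backend computes is an assumption. The function name is hypothetical.

```python
import numpy as np


def moments_dispersions_sketch(normed_counts, size_factors):
    """Per-gene moments estimate: (var - mean(1/s) * mean) / mean**2."""
    normed_counts = np.asarray(normed_counts, dtype=float)  # (samples, genes)
    size_factors = np.asarray(size_factors, dtype=float)    # (samples,)
    s_mean_inv = np.mean(1.0 / size_factors)
    mean = normed_counts.mean(axis=0)
    var = normed_counts.var(axis=0, ddof=1)  # unbiased sample variance
    with np.errstate(divide="ignore", invalid="ignore"):
        disp = (var - s_mean_inv * mean) / mean**2
    # Genes with zero mean give 0/0; map those NaNs to 0.
    return np.nan_to_num(disp)


disp = moments_dispersions_sketch([[1, 10], [3, 30]], [1.0, 1.0])
```

For the first toy gene the sample variance equals the mean, so the estimate is 0; the second gene is overdispersed and gets a positive value.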
- abstract fit_rough_dispersions(normed_counts, design_matrix)
‘Rough dispersion’ estimates from linear model, as per the R code.
Used as initial estimates in DeseqDataSet.fit_genewise_dispersions().
- Parameters:
normed_counts (ndarray) – Array of DESeq2-normalized read counts. Rows: samples, columns: genes.
design_matrix (pandas.DataFrame) – A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. Unexpanded, with intercept.
- Returns:
Estimated dispersion parameter for each gene.
- Return type:
ndarray
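A rough estimate from a linear model can be sketched like this: regress the normalized counts on the design matrix, floor the fitted values at 1, and average the excess squared residuals over the residual degrees of freedom. This mirrors the commonly described rough estimator, but the details are assumptions and the function name is hypothetical.

```python
import numpy as np


def rough_dispersions_sketch(normed_counts, design_matrix):
    """Per-gene rough dispersion from least-squares fitted means."""
    y = np.asarray(normed_counts, dtype=float)  # (samples, genes)
    X = np.asarray(design_matrix, dtype=float)  # (samples, variables)
    num_samples, num_vars = X.shape
    if num_samples <= num_vars:
        raise ValueError("Need more samples than design variables.")
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = np.maximum(X @ coef, 1.0)  # floor fitted means at 1
    excess = ((y - y_hat) ** 2 - y_hat) / y_hat**2
    est = excess.sum(axis=0) / (num_samples - num_vars)
    return np.maximum(est, 0.0)  # dispersions cannot be negative


toy_counts = np.array([[2.0, 1.0], [4.0, 10.0], [6.0, 19.0]])
intercept_only = np.ones((3, 1))
rough = rough_dispersions_sketch(toy_counts, intercept_only)
```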
- abstract irls(counts, size_factors, design_matrix, disp, min_mu, beta_tol, min_beta=-30, max_beta=30, optimizer='L-BFGS-B', maxiter=250)
Fit a NB GLM with log-link to predict counts from the design matrix.
See equations (1-2) in the DESeq2 paper.
- Parameters:
counts (ndarray) – Raw counts.
size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).
design_matrix (ndarray) – Design matrix.
disp (ndarray) – Gene-wise dispersion prior.
min_mu (ndarray) – Lower bound on estimated means, to ensure numerical stability. (default: 0.5)
beta_tol (float) – Stopping criterion for IRWLS: \(\vert dev - dev_{old}\vert / \vert dev + 0.1 \vert < \beta_{tol}\). (default: 1e-8)
min_beta (float) – Lower bound on LFC. (default: -30)
max_beta (float) – Upper bound on LFC. (default: 30)
optimizer (str) – Optimizing method to use in case IRLS starts diverging. Accepted values: ‘BFGS’ or ‘L-BFGS-B’. NB: only ‘L-BFGS-B’ ensures that LFCs will lie in the [min_beta, max_beta] range. (default: 'L-BFGS-B')
maxiter (int) – Maximum number of IRLS iterations to perform before switching to L-BFGS-B. (default: 250)
- Return type:
- Returns:
beta (ndarray) – Fitted (basemean, lfc) coefficients of negative binomial GLM.
mu (ndarray) – Means estimated from size factors and beta: \(\mu = s_{ij} \exp(\beta^t X)\).
H (ndarray) – Diagonal of the \(W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}\) covariance matrix.
converged (ndarray) – Whether IRLS or the optimizer converged. If not and if dimension allows it, perform grid search.
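Two pieces of this description lend themselves to a short illustration: the log-link mean, and the relative-deviance stopping rule. The sketch below works on a single gene and transcribes the formulas as written in this section. The function names are hypothetical, and the full IRLS update loop is deliberately left out.

```python
import numpy as np


def nb_glm_means(size_factors, design_matrix, beta, min_mu=0.5):
    """mu = s * exp(X @ beta), floored at min_mu for numerical stability."""
    X = np.asarray(design_matrix, dtype=float)  # (samples, variables)
    s = np.asarray(size_factors, dtype=float)   # (samples,)
    mu = s * np.exp(X @ np.asarray(beta, dtype=float))
    return np.maximum(mu, min_mu)


def has_converged(dev, dev_old, beta_tol=1e-8):
    """Stopping rule: |dev - dev_old| / |dev + 0.1| < beta_tol."""
    return abs(dev - dev_old) / abs(dev + 0.1) < beta_tol


X = np.array([[1.0, 0.0], [1.0, 1.0]])        # intercept plus one condition
beta = np.array([np.log(10.0), np.log(2.0)])  # coefficients on the natural-log scale
mu = nb_glm_means([1.0, 0.5], X, beta)
```

The 0.1 term in the denominator keeps the ratio well defined when the deviance is close to zero.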
- abstract lfc_shrink_nbinom_glm(design_matrix, counts, size, offset, prior_no_shrink_scale, prior_scale, optimizer, shrink_index)
Fit a negative binomial MAP LFC using an apeGLM prior.
Only the LFC is shrunk, not the intercept.
- Parameters:
design_matrix (ndarray) – Design matrix.
counts (ndarray) – Raw counts.
size (ndarray) – Size parameter of NB family (inverse of dispersion).
offset (ndarray) – Natural logarithm of size factor.
prior_no_shrink_scale (float) – Prior variance for the intercept.
prior_scale (float) – Prior variance for the LFC parameter.
optimizer (str) – Optimizing method to use in case IRLS starts diverging. Accepted values: ‘L-BFGS-B’, ‘BFGS’ or ‘Newton-CG’.
shrink_index (int) – Index of the LFC coordinate to shrink. (default: 1)
- Return type:
- Returns:
beta (ndarray) – 2-element array, containing the intercept (first) and the LFC (second).
inv_hessian (ndarray) – Inverse of the Hessian of the objective at the estimated MAP LFC.
converged (ndarray) – Whether L-BFGS-B converged for each optimization problem.
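The statement that only the LFC is shrunk can be illustrated with the penalty added to the negative log-likelihood in a MAP fit. This sketch assumes an apeGLM-style prior: a heavy-tailed Cauchy on the shrunk coordinate and a wide zero-centered normal on the others. It also assumes both prior parameters act as scales. The function name is hypothetical.

```python
import math


def apeglm_style_penalty(beta, prior_no_shrink_scale, prior_scale, shrink_index=1):
    """Negative log-prior (up to a constant) for a coefficient vector."""
    penalty = 0.0
    for i, b in enumerate(beta):
        if i == shrink_index:
            # Cauchy prior: heavy tails shrink small LFCs, spare large ones.
            penalty += math.log1p((b / prior_scale) ** 2)
        else:
            # Wide normal prior: the intercept is essentially left alone.
            penalty += b**2 / (2.0 * prior_no_shrink_scale**2)
    return penalty


no_effect = apeglm_style_penalty([0.0, 0.0], 15.0, 1.0)
lfc_only = apeglm_style_penalty([0.0, 2.0], 15.0, 1.0)
intercept_only = apeglm_style_penalty([3.0, 0.0], 15.0, 1.0)
```

With these toy scales, an intercept of 3 is penalized far less than an LFC of 2, which is the asymmetry described above.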
- abstract lin_reg_mu(counts, size_factors, design_matrix, min_mu)
Estimate mean of negative binomial model using a linear regression.
Used to initialize genewise dispersion models.
- Parameters:
counts (ndarray) – Raw counts.
size_factors (ndarray) – Sample-wise scaling factors (obtained from median-of-ratios).
design_matrix (ndarray) – Design matrix.
min_mu (float) – Lower threshold for fitted means, for numerical stability. (default: 0.5)
- Returns:
Estimated mean.
- Return type:
ndarray
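As an illustration, a linear-regression mean estimate for one gene might look like the sketch below: regress the size-factor-normalized counts on the design matrix, scale the fitted values back by the size factors, and floor them at `min_mu`. The function name is hypothetical and the exact steps are an assumption.

```python
import numpy as np


def lin_reg_mu_sketch(counts, size_factors, design_matrix, min_mu=0.5):
    """Least-squares estimate of NB means for a single gene."""
    counts = np.asarray(counts, dtype=float)    # (samples,)
    s = np.asarray(size_factors, dtype=float)   # (samples,)
    X = np.asarray(design_matrix, dtype=float)  # (samples, variables)
    normed = counts / s  # work on the normalized scale
    coef, *_ = np.linalg.lstsq(X, normed, rcond=None)
    mu = s * (X @ coef)  # back to the raw-count scale
    return np.maximum(mu, min_mu)


two_groups = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
mu_hat = lin_reg_mu_sketch([10, 20, 30, 40], [1.0, 1.0, 1.0, 1.0], two_groups)
```

With a two-group design, the fitted means are simply the group averages of the normalized counts.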
- abstract wald_test(design_matrix, disp, lfc, mu, ridge_factor, contrast, lfc_null, alt_hypothesis=None)
Run Wald test for differential expression.
Computes Wald statistics, standard error and p-values from dispersion and LFC estimates.
- Parameters:
design_matrix (ndarray) – Design matrix.
disp (float) – Dispersion estimate.
lfc (ndarray) – Log-fold change estimate (in natural log scale).
mu (float) – Mean estimation for the NB model.
ridge_factor (ndarray) – Regularization factors.
contrast (ndarray) – Vector encoding the contrast that is being tested.
lfc_null (float) – The (log2) log fold change under the null hypothesis.
alt_hypothesis (str or None) – The alternative hypothesis for computing Wald p-values.
- Return type:
- Returns:
wald_p_value (ndarray) – Estimated p-value.
wald_statistic (ndarray) – Wald statistic.
wald_se (ndarray) – Standard error of the Wald statistic.
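A two-sided Wald test for one gene can be sketched as follows. It covers only the default case (`alt_hypothesis=None` with a null LFC of 0). It assumes NB working weights mu / (1 + mu * disp), a ridge-regularized inverse for the coefficient covariance, and one mean per sample. The function name is hypothetical, and the normal tail probability is computed with `math.erfc`.

```python
import math

import numpy as np


def wald_test_sketch(design_matrix, disp, lfc, mu, ridge_factor, contrast):
    """Return (p-value, statistic, standard error) for one gene."""
    X = np.asarray(design_matrix, dtype=float)  # (samples, variables)
    mu = np.asarray(mu, dtype=float)            # (samples,) fitted means
    lfc = np.asarray(lfc, dtype=float)          # (variables,)
    c = np.asarray(contrast, dtype=float)       # (variables,)
    W = mu / (1.0 + mu * disp)                  # NB working weights
    M = (X.T * W) @ X                           # X^t W X
    H = np.linalg.inv(M + np.asarray(ridge_factor, dtype=float))
    Hc = H @ c
    wald_se = float(np.sqrt(Hc @ M @ Hc))
    wald_statistic = float(c @ lfc) / wald_se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)).
    wald_p_value = math.erfc(abs(wald_statistic) / math.sqrt(2.0))
    return wald_p_value, wald_statistic, wald_se


X = np.ones((4, 1))
no_ridge = np.zeros((1, 1))
p_null, stat_null, se = wald_test_sketch(X, 0.1, [0.0], [10.0] * 4, no_ridge, [1.0])
p_alt, stat_alt, _ = wald_test_sketch(X, 0.1, [1.0], [10.0] * 4, no_ridge, [1.0])
```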