Simultaneous inference for generalized linear models with unmeasured confounders

Tens of thousands of simultaneous hypothesis tests are routinely performed in
genomic studies to identify differentially expressed genes. However, due to
unmeasured confounders, many standard statistical approaches may be
substantially biased. This paper investigates the large-scale hypothesis
testing problem for multivariate generalized linear models in the presence of
confounding effects. Under arbitrary confounding mechanisms, we propose a
unified statistical estimation and inference framework that harnesses
orthogonal structures and integrates linear projections into three key stages.
It first leverages multivariate responses to separate marginal and uncorrelated
confounding effects, recovering the confounding coefficients' column space.
Subsequently, latent factors and primary effects are jointly estimated,
utilizing $\ell_1$-regularization for sparsity while imposing orthogonality
onto confounding coefficients. Finally, we incorporate projected and weighted
bias-correction steps for hypothesis testing. Theoretically, we establish
identification conditions for the various effects and non-asymptotic error bounds. We
show effective Type-I error control of asymptotic $z$-tests as the sample size and
the number of responses approach infinity. Numerical experiments demonstrate that the
proposed method, combined with the Benjamini-Hochberg procedure, controls the
false discovery rate and is more powerful than alternative methods. By comparing
single-cell RNA-seq counts from two groups of samples, we demonstrate the
suitability of adjusting for confounding effects when significant covariates are
absent from the model.
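
To make the final testing step concrete, the following is a minimal sketch of how two-sided $z$-test p-values can be passed through the Benjamini-Hochberg step-up procedure. It is not the paper's implementation: the z-scores are made-up illustrative values, not output of the proposed bias-corrected estimator, and the names `two_sided_p_value` and `benjamini_hochberg` are hypothetical helpers introduced here.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a standard normal z-statistic."""
    # P(|N(0,1)| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Illustrative z-scores only (assumed values, one per response/gene).
z_scores = [4.2, -3.5, 0.3, 1.1, -0.7, 2.9, 0.1, -1.6]
p_values = [two_sided_p_value(z) for z in z_scores]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

In this toy input, only the three largest statistics in absolute value fall below their step-up thresholds. In practice the z-scores would come from a confounder-adjusted estimator such as the one described in the abstract.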
arxiv.org