Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

This paper is concerned with estimating the column subspace of a low-rank
matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated
data. How to obtain optimal statistical accuracy while accommodating the widest
range of signal-to-noise ratios (SNRs) becomes particularly challenging in the
presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg
n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ emerges as a
powerful solution for solving this problem, it suffers from "the curse of
ill-conditioning," namely, its performance degrades as the condition number of
$\boldsymbol{X}^\star$ grows. In order to overcome this critical issue without
compromising the range of allowable SNRs, we propose a novel algorithm, called
$\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and
condition-number-free theoretical guarantees in terms of both $\ell_2$ and
$\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the
spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually
well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each
subblock successively. Further, applying our algorithm and theory to two
canonical examples -- the factor model and tensor PCA -- yields remarkable
improvements in both settings.
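
To make the algorithmic description concrete, here is a minimal NumPy sketch of one plausible reading of the abstract: run HeteroPCA repeatedly with an increasing working rank, one spectral subblock at a time, warm-starting each stage from the previous diagonal estimate. The `hetero_pca` routine follows the standard diagonal-imputation recipe of HeteroPCA, while `split_ranks` and its `kappa_max` threshold are hypothetical placeholders for the paper's actual rule for selecting well-conditioned, mutually well-separated subblocks; this is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def hetero_pca(G_init, r, num_iters=50):
    # One run of HeteroPCA: alternate between taking the best rank-r
    # approximation of the current matrix and copying its diagonal back in,
    # so that the noise-inflated diagonal is gradually imputed.
    G = G_init.copy()
    for _ in range(num_iters):
        vals, vecs = np.linalg.eigh(G)
        top = np.argsort(np.abs(vals))[::-1][:r]
        G_r = (vecs[:, top] * vals[top]) @ vecs[:, top].T  # best rank-r approximation
        np.fill_diagonal(G, np.diag(G_r))                  # impute the diagonal
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(np.abs(vals))[::-1][:r]
    return vecs[:, top], G            # subspace estimate, imputed Gram matrix

def split_ranks(G, r, kappa_max=4.0):
    # Hypothetical splitting rule (not the paper's): grow each spectral
    # subblock until its internal condition number would exceed kappa_max,
    # producing cumulative ranks r_1 < r_2 < ... < r_K = r.
    s = np.sort(np.abs(np.linalg.eigvalsh(G)))[::-1][:r]
    ranks, start = [], 0
    for k in range(1, r + 1):
        if k == r or s[start] / s[k] > kappa_max:
            ranks.append(k)
            start = k
    return ranks

def deflated_hetero_pca(Y, r, kappa_max=4.0):
    # Deflated-HeteroPCA sketch: apply HeteroPCA successively with increasing
    # rank, one well-conditioned subblock at a time, warm-starting each stage
    # from the diagonal estimate of the previous stage.
    G = Y @ Y.T / Y.shape[1]          # sample Gram matrix, Y in R^{n1 x n2}
    ranks = split_ranks(G, r, kappa_max)
    np.fill_diagonal(G, 0.0)          # discard the noise-corrupted diagonal up front
    U = None
    for r_k in ranks:
        U, G = hetero_pca(G, r_k)     # U: current n1 x r_k subspace estimate
    return U
```

The point of the staged structure, as the abstract indicates, is that each HeteroPCA stage only ever works with a well-conditioned portion of the spectrum, which is what removes the dependence on the overall condition number of $\boldsymbol{X}^\star$.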