Likelihood adjusted semidefinite programs for clustering heterogeneous dataClustering is a widely deployed unsupervised learning tool. Model-based
clustering is a flexible framework to tackle data heterogeneity when the
clusters have different shapes. Likelihood-based inference for mixture
distributions often involves non-convex and high-dimensional objective
functions, imposing difficult computational and statistical challenges. The
classic expectation-maximization (EM) algorithm is a computationally thrifty
iterative method that maximizes a surrogate function minorizing the
log-likelihood of observed data in each iteration, which however suffers from
bad local maxima even in the special case of the standard Gaussian mixture
model with common isotropic covariance matrices. On the other hand, recent
studies reveal that the unique global solution of a semidefinite programming
(SDP) relaxed $K$-means achieves the information-theoretically sharp threshold
for perfectly recovering the cluster labels under the standard Gaussian mixture
model. In this paper, we extend the SDP approach to a general setting by
integrating cluster labels as model parameters and propose an iterative
likelihood adjusted SDP (iLA-SDP) method that directly maximizes the
\emph{exact} observed likelihood in the presence of data heterogeneity. By
lifting the cluster assignment to group-specific membership matrices, iLA-SDP
avoids centroids estimation -- a key feature that allows exact recovery under
well-separateness of centroids without being trapped by their adversarial
configurations. Thus iLA-SDP is less sensitive than EM to initialization and
more stable on high-dimensional data. Our numeric experiments demonstrate that
iLA-SDP can achieve lower mis-clustering errors over several widely used
clustering methods including $K$-means, SDP and EM algorithms.
arxiv.org