An Approximate Bayesian Approach to Covariate-dependent Graphical Modeling. (arXiv:2303.08979v1 [stat.ME]) arxiv.org/abs/2303.08979

Gaussian graphical models typically assume a homogeneous graph structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach to graphical modeling that allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution with a product of the conditional distributions of each variable. We cast each conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, to enable information borrowing directly from the data rather than through a hierarchical framework. This allows independent graphical modeling for each subject while retaining the benefits of a hierarchical Bayes model and remaining computationally tractable. An efficient, embarrassingly parallel variational algorithm is developed to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $\alpha$-Rényi divergence. We theoretically demonstrate the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure among protein expression levels in breast cancer patients, using CNV information as covariates.
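
At its core, the pseudo-likelihood factorization turns graph estimation into one regression per variable, and covariate dependence can enter by weighting subjects according to how close their covariates are to a target subject's. The sketch below illustrates that idea with a Gaussian kernel weight and a lasso neighborhood regression; the kernel form, the lasso penalty, and the OR-rule symmetrization are illustrative assumptions, not the paper's algorithm (which uses heteroscedastic regressions and a variational posterior).

```python
import numpy as np
from sklearn.linear_model import Lasso

def covariate_weighted_neighborhoods(X, Z, z0, bandwidth=1.0, alpha=0.1):
    """Estimate a subject-specific graph at covariate value z0 via weighted
    node-wise lasso regressions (a generic stand-in for the paper's weighted
    pseudo-likelihood; the weighting scheme and penalty are assumptions).

    X : (n, p) data matrix; Z : (n,) covariates; z0 : target covariate value.
    """
    n, p = X.shape
    # Gaussian kernel weights: subjects with covariates near z0 count more.
    w = np.exp(-0.5 * ((Z - z0) / bandwidth) ** 2)
    sw = np.sqrt(w / w.sum())
    edges = np.zeros((p, p), dtype=bool)
    for j in range(p):
        # Regress variable j on all others, rows rescaled by sqrt-weights.
        y = sw * X[:, j]
        A = sw[:, None] * np.delete(X, j, axis=1)
        beta = Lasso(alpha=alpha, fit_intercept=False).fit(A, y).coef_
        nbrs = np.flatnonzero(beta)
        nbrs = nbrs + (nbrs >= j)          # undo the column-deletion offset
        edges[j, nbrs] = True
    return edges | edges.T                 # OR-rule symmetrization
```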

Generalized Score Matching: Beyond The IID Case. (arXiv:2303.08987v1 [stat.ME]) arxiv.org/abs/2303.08987

Score matching is an estimation procedure that has been developed for statistical models whose probability density function is known up to proportionality but whose normalizing constant is intractable. For such models, maximum likelihood estimation will be difficult or impossible to implement. To date, nearly all applications of score matching have focused on continuous IID (independent and identically distributed) models. Motivated by various data modelling problems for which the continuity assumption and/or the IID assumption are not appropriate, this article proposes three novel extensions of score matching: (i) to univariate and multivariate ordinal data (including count data); (ii) to INID (independent but not necessarily identically distributed) data models, including regression models with either a continuous or a discrete ordinal response; and (iii) to a class of dependent data models known as auto models. Under the INID assumption, a unified asymptotic approach to settings (i) and (ii) is developed and, under mild regularity conditions, it is proved that the proposed score matching estimators are consistent and asymptotically normal. These theoretical results provide a sound basis for score-matching-based inference and are supported by strong performance in simulation studies and a real data example involving doctoral publication data. Regarding (iii), motivated by a spatial geochemical dataset, we develop a novel auto model for spatially dependent spherical data and propose a score-matching-based Wald statistic to test for the presence of spatial dependence. Our proposed auto model provides a way to model spatial dependence among directions, is computationally convenient to use, and is expected to be superior to composite likelihood approaches for reasons that we explain.
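
For orientation, here is the classical continuous IID case that the paper generalizes: Hyvärinen's (2005) score matching objective for a zero-mean Gaussian, where the estimator even admits a closed form. This is a minimal sketch of the baseline procedure, not of the paper's ordinal, INID, or auto-model extensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A positive-definite ground-truth precision matrix and Gaussian samples.
p = 4
A = rng.normal(size=(p, p))
K_true = A @ A.T + p * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=20000)

# For p(x) proportional to exp(-x'Kx/2), the model score is psi(x) = -Kx,
# so Hyvarinen's objective E[||psi(x)||^2 / 2 + div psi(x)] reduces to
# J(K) = tr(K^2 S)/2 - tr(K), with S the sample second-moment matrix.
S = X.T @ X / len(X)

def J(K):
    return 0.5 * np.trace(K @ K @ S) - np.trace(K)

# The gradient (KS + SK)/2 - I vanishes at K = S^{-1}, so the score
# matching estimator is available in closed form in this special case.
K_hat = np.linalg.inv(S)
print(np.max(np.abs(K_hat - K_true)))   # small estimation error for large n
```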

A Spatially Varying Hierarchical Random Effects Model for Longitudinal Macular Structural Data in Glaucoma Patients. (arXiv:2303.09018v1 [stat.AP]) arxiv.org/abs/2303.09018

We model longitudinal macular thickness measurements to monitor the course of glaucoma and prevent vision loss due to disease progression. The macular thickness varies over a 6$\times$6 grid of locations on the retina with additional variability arising from the imaging process at each visit. Currently, ophthalmologists estimate slopes using repeated simple linear regression for each subject and location. To estimate slopes more precisely, we develop a novel Bayesian hierarchical model for multiple subjects with spatially varying population-level and subject-level coefficients, borrowing information across subjects and measurement locations. We augment the model with visit effects to account for observed spatially correlated visit-specific errors. We model spatially varying (a) intercepts, (b) slopes, and (c) log residual standard deviations (SD) with multivariate Gaussian process priors with Matérn cross-covariance functions. Each marginal process assumes an exponential kernel with its own SD and spatial correlation matrix. We develop our models for, and apply them to, data from the Advanced Glaucoma Progression Study. We show that including visit effects in the model reduces error in predicting future thickness measurements and greatly improves model fit.
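
The building block for the spatially varying coefficients is a Gaussian process over the 36 grid locations with an exponential covariance kernel. The sketch below constructs such a kernel and draws one coefficient surface; the unit grid spacing and the SD and range values are illustrative assumptions, and the paper's full model couples several such processes through a Matérn cross-covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Locations on the 6x6 macular grid (unit spacing is an assumption).
xs, ys = np.meshgrid(np.arange(6.0), np.arange(6.0))
loc = np.column_stack([xs.ravel(), ys.ravel()])       # (36, 2)

# Exponential kernel: cov(s, s') = sd^2 * exp(-||s - s'|| / range_).
# The sd and range_ values are illustrative, not estimates from the paper.
sd, range_ = 1.0, 2.0
D = np.linalg.norm(loc[:, None, :] - loc[None, :, :], axis=-1)
Sigma = sd**2 * np.exp(-D / range_)

# One prior draw of a spatially varying coefficient surface (e.g., slopes).
slopes = rng.multivariate_normal(np.zeros(36), Sigma).reshape(6, 6)
print(slopes.round(2))
```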

High-Dimensional Penalized Bernstein Support Vector Machines. (arXiv:2303.09066v1 [stat.ML]) arxiv.org/abs/2303.09066

The support vector machine (SVM) is a powerful binary classifier used to improve prediction accuracy. However, the non-differentiability of the SVM hinge loss can lead to computational difficulties in high-dimensional settings. To overcome this problem, we rely on Bernstein polynomials and propose a new smoothed version of the SVM hinge loss called the Bernstein support vector machine (BernSVM), which is suitable for the high-dimensional $p \gg n$ regime. As the BernSVM objective loss function is of class $C^2$, we propose two efficient algorithms for computing the solution of the penalized BernSVM. The first is based on coordinate descent with the majorization-minimization (MM) principle, and the second is an IRLS-type (iteratively reweighted least squares) algorithm. Under standard assumptions, we derive a cone condition and a restricted strong convexity property to establish an upper bound for the weighted lasso BernSVM estimator. Using a local linear approximation, we extend the latter result to penalized BernSVM with the nonconvex penalties SCAD and MCP. Our bound holds with high probability and achieves the rate $\sqrt{s\log(p)/n}$, where $s$ is the number of active features. Simulation studies compare the prediction accuracy of BernSVM with that of its competitors and also compare the two algorithms in terms of computational timing and estimation error. The use of the proposed method is illustrated through the analysis of three large-scale real data examples.
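
To make the smoothing idea concrete, here is a penalized SVM with a smoothed hinge loss fit by proximal gradient descent. The softplus smoother and the ISTA solver are generic stand-ins chosen for brevity: the paper's Bernstein-polynomial loss and its MM/IRLS algorithms differ, so treat this as a sketch of the problem class only.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_hinge_grad(t, beta=10.0):
    # Derivative of the softplus-smoothed hinge log(1+exp(beta(1-t)))/beta,
    # an infinitely differentiable stand-in for the Bernstein smoother.
    return -1.0 / (1.0 + np.exp(beta * (t - 1.0)))

def lasso_smoothed_svm(X, y, lam=0.02, iters=2000, beta=10.0):
    """Lasso-penalized smoothed-SVM via proximal gradient (ISTA); y in {-1,+1}.
    A generic solver sketch, not the paper's MM or IRLS algorithm."""
    n, p = X.shape
    # Step size from the smoothness of the loss: max |l''| = beta / 4.
    L = (beta / 4.0) * np.linalg.norm(X, 2) ** 2 / n
    w = np.zeros(p)
    for _ in range(iters):
        t = y * (X @ w)
        grad = X.T @ (y * smooth_hinge_grad(t, beta)) / n
        w = w - grad / L
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-threshold
    return w

# Toy p >> n example with 5 active features.
n, p = 100, 500
X = rng.standard_normal((n, p))
w_true = np.zeros(p); w_true[:5] = 1.0
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))
w_hat = lasso_smoothed_svm(X, y)
print(np.flatnonzero(w_hat))   # support should concentrate on {0,...,4}
```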

On a fundamental problem in the analysis of cancer registry data. (arXiv:2303.09141v1 [stat.ME]) arxiv.org/abs/2303.09141

In epidemiological research with cancer registry data, it is often of primary interest to make inference on death from cancer, not overall survival. Since the cause of death is not easy to collect, or is not necessarily reliable, in cancer registries, special methodologies based on the concepts of the relative survival ratio and the net survival have been introduced and are widely used. In making inference on these measures, external life tables for the general population are used to adjust for the impact of non-cancer death on overall survival. The validity of this adjustment relies on the assumption that mortality in the external life table approximates the non-cancer mortality of cancer patients. However, the population used to calculate a life table may include cancer deaths and cancer patients. The sensitivity analysis proposed by Talbäck and Dickman to address this issue requires additional information that is often not readily available. We propose a method to make inference on the net survival that accounts for the potential presence of cancer patients and cancer deaths in the life table for the general population. The idea behind the adjustment is to exploit the correspondence between cancer mortality in the life table and cancer mortality in the cancer registry. This yields a novel method that performs the adjustment without any information beyond that used in standard analyses of cancer registries. Our simulation study shows that the proposed method successfully removes the bias. We illustrate the proposed method with cancer registry data from England.
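
A quick worked sketch of the quantity at stake may help: the relative survival ratio divides the cohort's observed survival by the expected survival implied by an external life table. All numbers below are purely illustrative, and the final comment marks the bias mechanism the proposed method addresses.

```python
import numpy as np

# Relative survival ratio: observed survival of the cancer cohort divided
# by the expected survival of demographically matched individuals from an
# external life table. Numbers below are illustrative only.
years = np.arange(1, 6)
observed = np.array([0.90, 0.82, 0.76, 0.71, 0.67])       # cohort survival
life_table_hazard = np.array([0.012, 0.013, 0.014, 0.015, 0.016])
expected = np.exp(-np.cumsum(life_table_hazard))          # expected survival
relative_survival = observed / expected
# If the life table itself includes cancer deaths, `expected` is too low
# and the ratio is biased upward -- the issue the paper's adjustment targets.
for y, r in zip(years, relative_survival):
    print(f"year {y}: relative survival {r:.3f}")
```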

Bayesian Generalization Error in Linear Neural Networks with Concept Bottleneck Structure and Multitask Formulation. (arXiv:2303.09154v1 [stat.ML]) arxiv.org/abs/2303.09154

The concept bottleneck model (CBM) is a widely used method for interpreting neural networks via concepts. In a CBM, concepts are inserted between the output layer and the last intermediate layer as observable values. This helps in understanding the reasons behind the network's outputs: the weights from the last hidden layer to the output layer correspond to the concepts. However, the behavior of the generalization error in CBMs has not yet been understood, since a neural network is in general a singular statistical model. When a model is singular, the map from parameters to probability distributions is not one-to-one. This non-identifiability makes it difficult to analyze the generalization performance. In this study, we mathematically clarify the Bayesian generalization error and free energy of the CBM when its architecture is a three-layered linear neural network. We also consider a multitask formulation in which the network outputs not only the original output but also the concepts. The results show that, compared with the standard version, the CBM drastically changes the parameter region and the behavior of the Bayesian generalization error in three-layered linear neural networks, whereas the multitask formulation does not.
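
To fix ideas, a concept bottleneck in the three-layered linear case looks as follows. This is a hypothetical least-squares sketch of the architecture and its multitask variant, with illustrative dimensions and noise levels, not the Bayesian analysis carried out in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three-layered linear CBM: x -> h = Bx (bottleneck), y = Ah, with the
# concepts observed at the bottleneck. Dimensions are illustrative.
d_x, d_c, d_y, n = 8, 3, 2, 1000
B0 = rng.normal(size=(d_c, d_x))
A0 = rng.normal(size=(d_y, d_c))
X = rng.normal(size=(n, d_x))
C = X @ B0.T + 0.1 * rng.normal(size=(n, d_c))    # observed concepts
Y = C @ A0.T + 0.1 * rng.normal(size=(n, d_y))    # observed outputs

# CBM training in the linear case: fit the input-to-concept map, then the
# concept-to-output map on top of the fitted bottleneck.
B_hat = np.linalg.lstsq(X, C, rcond=None)[0].T
A_hat = np.linalg.lstsq(X @ B_hat.T, Y, rcond=None)[0].T

# Multitask variant: the network predicts concepts and outputs jointly,
# i.e., the targets are stacked (the shared rank-d_c bottleneck constraint
# is omitted here for brevity).
W_hat = np.linalg.lstsq(X, np.hstack([C, Y]), rcond=None)[0].T
```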

Identifiability Results for Multimodal Contrastive Learning. (arXiv:2303.09166v1 [cs.LG]) arxiv.org/abs/2303.09166

Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
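
For readers unfamiliar with the objective being analyzed, here is a minimal numpy sketch of a symmetric InfoNCE loss on paired embeddings from two modalities. The temperature value and the toy data are assumptions; the paper studies what such objectives can identify, not this particular implementation.

```python
import numpy as np
from scipy.special import logsumexp

def symmetric_info_nce(z_a, z_b, tau=0.1):
    """Symmetric InfoNCE on matched pairs of L2-normalized embeddings
    (n, d) from two modalities; a generic sketch of the contrastive
    objective, with an assumed temperature tau."""
    logits = z_a @ z_b.T / tau                       # pairwise similarities
    log_p_ab = np.diag(logits - logsumexp(logits, axis=1, keepdims=True))
    log_p_ba = np.diag(logits - logsumexp(logits, axis=0, keepdims=True))
    # Matched pairs sit on the diagonal; average both contrast directions.
    return -0.5 * (log_p_ab.mean() + log_p_ba.mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))
norm = lambda v: v / np.linalg.norm(v, axis=1, keepdims=True)
# Aligned pairs (shared factor plus modality-specific noise) score lower
# than unrelated pairs, as the loss rewards recovering shared content.
loss_aligned = symmetric_info_nce(norm(z), norm(z + 0.1 * rng.normal(size=z.shape)))
loss_random = symmetric_info_nce(norm(z), norm(rng.normal(size=z.shape)))
print(loss_aligned, loss_random)
```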

Adaptive Testing for High-dimensional Data. (arXiv:2303.08197v1 [math.ST]) arxiv.org/abs/2303.08197

In this article, we propose a class of $L_q$-norm based U-statistics for a family of global testing problems related to high-dimensional data. This includes testing of the mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing of component-wise independence for high-dimensional observations, among others. Under the null hypothesis, we derive asymptotic normality and independence between $L_q$-norm based U-statistics for several values of $q$ under mild moment and cumulant conditions. A simple combination of two studentized $L_q$-based test statistics via their $p$-values is proposed and is shown to attain substantial power against alternatives of different sparsity. Our work is a substantial extension of He et al. (2021), which is mostly focused on mean and covariance testing, and we manage to provide a general treatment of asymptotic independence of $L_q$-norm based U-statistics for a wide class of kernels. To alleviate the computational burden, we introduce a variant of the proposed U-statistics by using the monotone indices in the summation, resulting in a U-statistic with an asymmetric kernel. A dynamic programming method is introduced to reduce the computational cost from $O(n^{qr})$, which is required for the calculation of the full U-statistic, to $O(n^r)$, where $r$ is the order of the kernel. Numerical studies further corroborate the advantage of the proposed adaptive test as compared to some existing competitors.
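
The flavor of the combination step can be illustrated with a stylized example: compute studentized $L_q$ statistics for two values of $q$, take the minimum of their $p$-values, and calibrate. The sign-flip calibration below is a finite-sample stand-in of my own choosing; the paper instead relies on the asymptotic normality and independence of the statistics, and on U-statistics rather than simple means.

```python
import numpy as np

rng = np.random.default_rng(0)

def lq_stat(X, q):
    # L_q aggregation of studentized coordinate-wise mean statistics.
    t = np.sqrt(len(X)) * X.mean(0) / X.std(0, ddof=1)
    return np.sum(np.abs(t) ** q)

def adaptive_test(X, qs=(2, 6), n_perm=1000):
    """Min-p combination of L_q statistics calibrated by joint sign flips:
    a stylized stand-in for the paper's asymptotics-based combination."""
    obs = np.array([lq_stat(X, q) for q in qs])
    null = np.empty((n_perm, len(qs)))
    for b in range(n_perm):
        Xb = X * rng.choice([-1.0, 1.0], size=(len(X), 1))  # joint sign flip
        null[b] = [lq_stat(Xb, q) for q in qs]
    obs_p = (null >= obs).mean(0)                 # per-q p-values
    perm_p = (n_perm - null.argsort(0).argsort(0)) / n_perm
    return (perm_p.min(1) <= obs_p.min()).mean()  # calibrated min-p p-value

X = np.random.default_rng(1).normal(size=(50, 200))
X[:, 0] += 1.0                                    # one sparse, strong signal
print(adaptive_test(X))                           # small p-value expected
```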

Spatial causal inference in the presence of unmeasured confounding and interference. (arXiv:2303.08218v1 [stat.ME]) arxiv.org/abs/2303.08218

Causal inference in spatial settings is met with unique challenges and opportunities. On one hand, a unit's outcome can be affected by the exposure at many locations, leading to interference. On the other hand, unmeasured spatial variables can confound the effect of interest. Our work has two overarching goals. First, using causal diagrams, we illustrate that spatial confounding and interference can manifest as each other, meaning that investigating the presence of one can lead to wrongful conclusions in the presence of the other, and that statistical dependencies in the exposure variable can render standard analyses invalid. This can have crucial implications for analyzing data with spatial or other dependencies, and for understanding the effect of interventions on dependent units. Second, we propose a parametric approach to mitigate bias from local and neighborhood unmeasured spatial confounding and account for interference simultaneously. This approach is based on simultaneous modeling of the exposure and the outcome while accounting for the presence of spatially-structured unmeasured predictors of both variables. We illustrate our approach with a simulation study and with an analysis of the local and interference effects of sulfur dioxide emissions from power plants on cardiovascular mortality.

Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data. (arXiv:2303.08242v1 [stat.ML]) arxiv.org/abs/2303.08242

Internet of Things (IoT) systems generate massive, high-speed, temporally correlated streaming data and are often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series often faces a trade-off between statistical efficiency and computational cost. One important approach to balancing this trade-off is sampling, where only a small portion of the sample is selected for model fitting and updating. Motivated by the demands of dynamic relationship analysis of IoT systems, we study the data-dependent sample selection and online inference problem for a multi-dimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by the D-optimality criterion in the design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of the online analysis. We show that the optimal solution amounts to a strategy that mixes Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates. Theoretical properties of these auxiliary estimations are also discussed. When applied to European power grid consumption data, the proposed leverage-score-based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.
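
A static-data sketch of the mixture idea is below: inclusion probabilities blend a uniform (Bernoulli) component with leverage scores, and the downsampled regression is reweighted by inverse probabilities to stay unbiased. The mixture weight, budget handling, and batch computation of leverage scores are illustrative assumptions; the paper's streaming setting replaces the exact scores with cheap auxiliary estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_sampling_probs(X, budget, eps=0.1):
    """Inclusion probabilities mixing uniform (Bernoulli) and leverage-score
    sampling; `eps` and the capping rule are illustrative choices."""
    n = len(X)
    # Statistical leverage scores: diagonal of the hat matrix X(X'X)^{-1}X'.
    Q, _ = np.linalg.qr(X)
    lev = np.sum(Q**2, axis=1)
    probs = (1 - eps) * lev / lev.sum() + eps / n
    return np.minimum(budget * probs, 1.0)

# Keep row i with probability pi[i]; weight the least squares fit by 1/pi[i]
# so the subsampled estimator remains (approximately) unbiased.
n, p = 5000, 5
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)
pi = mixed_sampling_probs(X, budget=500)
keep = rng.random(n) < pi
Xs, ys, w = X[keep], y[keep], 1.0 / pi[keep]
beta = np.linalg.solve(Xs.T @ (w[:, None] * Xs), Xs.T @ (w * ys))
print(beta)   # close to the all-ones truth despite using ~10% of the rows
```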

Estimating Parameters of Large CTMP from Single Trajectory with Application to Stochastic Network Epidemics Models. (arXiv:2303.08323v1 [stat.AP]) arxiv.org/abs/2303.08323

Graph dynamical systems (GDS) model dynamic processes on a (static) graph. Stochastic GDS have been used for network-based epidemic models such as the contact process and the reversible contact process. In this paper, we consider stochastic GDS that are also continuous-time Markov processes (CTMP), whose transition rates are linear functions of certain dynamics parameters $\theta$ of interest (i.e., healing, exogenous, and endogenous infection rates). Our goal is to estimate $\theta$ from a single, finite-time, continuously observed trajectory of the CTMP. Parameter estimation of a CTMP is challenging when the state space is large; for GDS, the number of Markov states is \emph{exponential} in the number of nodes of the graph. We show that holding classes (i.e., Markov states with the same holding time distribution) give efficient partitions of the state space of GDS. We derive an upper bound on the number of holding classes for the contact process, which is polynomial in the number of nodes. We utilize holding classes to solve a smaller system of linear equations to find $\theta$. Experimental results show that reasonable estimates can be achieved even for short trajectories, particularly for the contact process. In fact, trajectory length does not significantly affect estimation error.
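
As a point of reference, the standard continuous-time MLE for the contact process from a fully observed trajectory is "event count divided by total exposure time" for each rate. The sketch below implements that baseline for the healing and endogenous infection rates only; it is not the paper's holding-class method, which avoids this event-level bookkeeping, and the event encoding is an assumption.

```python
import numpy as np

def estimate_contact_process_rates(events, adjacency):
    """Generic continuous-time MLE for the contact process from one fully
    observed trajectory: rate = event count / total exposure time.

    events: list of (time, node, new_state), new_state in {0, 1}; the
            initial condition is encoded as events at time 0.
    adjacency: (N, N) symmetric 0/1 array.
    Returns (delta_hat, beta_hat): healing and per-neighbor infection rates.
    """
    N = len(adjacency)
    state = np.zeros(N, dtype=int)
    t_prev = 0.0
    n_heal = n_infect = 0.0
    exp_heal = exp_infect = 0.0            # accumulated exposure times
    for t, node, new_state in events:
        dt = t - t_prev
        # Exposure accrued while the state was constant on [t_prev, t):
        # every infected node is at risk of healing; every susceptible node
        # is infected at rate beta times its number of infected neighbors.
        exp_heal += dt * state.sum()
        exp_infect += dt * (adjacency @ state)[state == 0].sum()
        if t > 0:                          # time-0 entries set the initial state
            if new_state == 0:
                n_heal += 1
            else:
                n_infect += 1
        state[node] = new_state
        t_prev = t
    return n_heal / exp_heal, n_infect / exp_infect
```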

The Benefits of Mixup for Feature Learning. (arXiv:2303.08433v1 [cs.LG]) arxiv.org/abs/2303.08433

Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to achieve better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using different linear interpolation parameters for features and labels can still achieve performance similar to that of standard Mixup. This indicates that the intuitive linearity explanation in Zhang et al. (2018) may not fully explain the success of Mixup. We then perform a theoretical study of Mixup from the feature learning perspective. We consider a feature-noise data model and show that Mixup training can effectively learn the rare features (appearing in a small fraction of the data) from their mixture with the common features (appearing in a large fraction of the data). In contrast, standard training can only learn the common features and fails to learn the rare features, thus suffering from poor generalization performance. Moreover, our theoretical analysis shows that the benefits of Mixup for feature learning are mostly gained in the early training phase, based on which we propose to apply early stopping in Mixup. Experimental results verify our theoretical findings and demonstrate the effectiveness of early-stopped Mixup training.
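
For concreteness, here is standard Mixup (Zhang et al., 2018) alongside the decoupled variant the paper discusses, where features and labels get different interpolation weights. The Beta parameter and the way the two weights are drawn in the variant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(X, Y, alpha=0.2):
    """Standard Mixup: convex-combine random pairs of inputs and (one-hot)
    labels with a shared lambda ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    return lam * X + (1 - lam) * X[perm], lam * Y + (1 - lam) * Y[perm]

def mixup_decoupled(X, Y, alpha=0.2):
    """Variant discussed in the paper: independent interpolation weights for
    features and labels (drawing both from the same Beta is an assumption)."""
    lam_x, lam_y = rng.beta(alpha, alpha), rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    return (lam_x * X + (1 - lam_x) * X[perm],
            lam_y * Y + (1 - lam_y) * Y[perm])

# Toy usage: 8 samples, 4 features, 3 one-hot classes.
X = rng.normal(size=(8, 4))
Y = np.eye(3)[rng.integers(0, 3, size=8)]
X_mix, Y_mix = mixup_batch(X, Y)
```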
