"Stochastic Inverse Problems" and Changes-of-Variables. (arXiv:2211.15730v1 [stat.ME]) arxiv.org/abs/2211.15730

"Stochastic Inverse Problems" and Changes-of-Variables

Over the last decade, a series of applied mathematics papers have explored a type of inverse problem--called by a variety of names including "inverse sensitivity", "pushforward based inference", "consistent Bayesian inference", or "data-consistent inversion"--wherein a solution is a probability density whose pushforward takes a given form. The formulation of such a stochastic inverse problem can be unexpected or confusing to those familiar with traditional Bayesian or otherwise statistical inference. To date, two classes of solutions have been proposed, and these have only been justified through applications of measure theory and its disintegration theorem. In this work we show that, under mild assumptions, the formulation of and solution to all stochastic inverse problems can be more clearly understood using basic probability theory: a stochastic inverse problem is simply a change-of-variables or an approximation thereof. For the two existing classes of solutions, we derive the relationship to change(s)-of-variables and illustrate with analytic examples where none previously existed. Our derivations use neither Bayes' theorem nor the disintegration theorem explicitly. Our final contribution is a careful comparison of changes-of-variables to more traditional statistical inference. While we take stochastic inverse problems at face value for the majority of the paper, the final comparative discussion also offers a critique of the framework.
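
A minimal sketch of the connection described above, in notation of my own choosing rather than the paper's: for an invertible, differentiable map $Q$ from parameters $\lambda$ to observables, the classical change-of-variables formula gives the parameter density

    $\pi_\Lambda(\lambda) = \pi_{\mathcal{D}}(Q(\lambda))\,\lvert\det \nabla Q(\lambda)\rvert$,

while the pushforward-based ("data-consistent") update from the literature the abstract refers to takes the ratio form

    $\pi^{\mathrm{up}}_\Lambda(\lambda) = \pi^{\mathrm{init}}_\Lambda(\lambda)\,\frac{\pi^{\mathrm{obs}}_{\mathcal{D}}(Q(\lambda))}{\pi^{\mathrm{pred}}_{\mathcal{D}}(Q(\lambda))}$,

where $\pi^{\mathrm{pred}}_{\mathcal{D}}$ is the pushforward of the initial density through $Q$. The paper's point is that solutions of this kind can be understood as changes-of-variables, or approximations thereof, without invoking the disintegration theorem.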

Towards Reliable Item Sampling for Recommendation Evaluation. (arXiv:2211.15743v1 [cs.IR]) arxiv.org/abs/2211.15743

Towards Reliable Item Sampling for Recommendation Evaluation

Since Rendle and Krichene argued that commonly used sampling-based evaluation metrics are "inconsistent" with respect to the global metrics (even in expectation), there have been a few studies on sampling-based recommender system evaluation. Existing methods try either mapping the sampling-based metrics to their global counterparts or, more generally, learning the empirical rank distribution to estimate the top-$K$ metrics. However, despite these efforts, there is still a lack of rigorous theoretical understanding of the proposed metric estimators, and basic item sampling also suffers from the "blind spot" issue, i.e., the error in recovering the top-$K$ metrics when $K$ is small can still be rather substantial. In this paper, we provide an in-depth investigation into these problems and make two innovative contributions. First, we propose a new item-sampling estimator that explicitly optimizes the error with respect to the ground truth, and theoretically highlight its subtle difference from prior work. Second, we propose a new adaptive sampling method that aims to deal with the "blind spot" problem, and also demonstrate that the expectation-maximization (EM) algorithm can be generalized to such a setting. Our experimental results confirm our statistical analysis and the superiority of the proposed methods. This study helps lay the theoretical foundation for adopting item-sampling metrics in recommendation evaluation, and provides strong evidence towards making item sampling a powerful and reliable tool for recommendation evaluation.
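
A small toy sketch of the setting (my own illustration, not the estimators proposed in the paper): under uniform negative sampling, the rank of the target item among the sampled negatives is binomial in the item's global relative rank, which is why naively computed sampled Hit@K disagrees with the global metric and why rank-distribution estimators try to invert this relationship.

    import numpy as np

    rng = np.random.default_rng(0)

    def sampled_rank(global_rank, n_items, n_sampled):
        """Rank of the target among n_sampled uniformly sampled negatives.

        If the target is globally ranked global_rank (1 = best) out of n_items,
        each sampled negative beats it independently with probability
        p = (global_rank - 1) / (n_items - 1).
        """
        p = (global_rank - 1) / (n_items - 1)
        return 1 + rng.binomial(n_sampled, p)

    def hit_at_k(ranks, k):
        """Hit@K over a collection of per-user ranks."""
        return float(np.mean(np.asarray(ranks) <= k))

    # Toy comparison on hypothetical true ranks: global Hit@10 vs. the naive
    # sampled Hit@10 computed from 100 sampled negatives per user.
    global_ranks = rng.integers(1, 10001, size=5000)
    sampled = [sampled_rank(r, n_items=10000, n_sampled=100) for r in global_ranks]
    print(hit_at_k(global_ranks, 10), hit_at_k(sampled, 10))  # the two differ markedly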

Understanding the Impact of Adversarial Robustness on Accuracy Disparity. (arXiv:2211.15762v1 [cs.LG]) arxiv.org/abs/2211.15762

Understanding the Impact of Adversarial Robustness on Accuracy Disparity

While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and what role class imbalance plays. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that degrades the standard accuracy on all classes, and another caused by the class imbalance ratio, which increases the accuracy disparity compared to standard training. Furthermore, we extend our model to the general family of stable distributions. We demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tails of stable distributions. We additionally perform experiments on both synthetic and real-world datasets. The empirical results not only corroborate our theoretical findings, but also suggest that the implications may extend to nonlinear models over real-world datasets.
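
For concreteness, the standard calculation underlying this kind of analysis (in my notation; the paper's exact model may differ): if class $y = \pm 1$ has conditional distribution $\mathcal{N}(y\mu, \sigma^2 I)$ and a linear classifier predicts $\mathrm{sign}(w^\top x + b)$, the class-conditional errors are

    $\mathrm{err}_{+} = \Phi\left(-\frac{w^\top\mu + b}{\sigma\lVert w\rVert}\right), \qquad \mathrm{err}_{-} = \Phi\left(-\frac{w^\top\mu - b}{\sigma\lVert w\rVert}\right)$,

and the accuracy disparity is $\lvert\mathrm{err}_{+} - \mathrm{err}_{-}\rvert$, which vanishes exactly when the offset $b$ is zero. The paper's analysis concerns how adversarial training and the class imbalance ratio move this decision boundary, and hence the disparity.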

Unraveling heterogeneity of ADNI's time-to-event data using conditional entropy Part-I: Cross-sectional study. (arXiv:2211.15763v1 [stat.AP]) arxiv.org/abs/2211.15763

Unraveling heterogeneity of ADNI's time-to-event data using conditional entropy Part-I: Cross-sectional study

Through the Alzheimer's Disease Neuroimaging Initiative (ADNI), time-to-event data from the pre-dementia state of mild cognitive impairment (MCI) to the diagnosis of Alzheimer's disease (AD) are collected and analyzed by explicitly unraveling prognostic heterogeneity among 346 uncensored and 557 right-censored subjects under structural dependency among covariate features. The non-informative censoring mechanism is tested and confirmed based on conditional-vs-marginal entropies evaluated on contingency tables built by the Redistribute-to-the-right algorithm. The Categorical Exploratory Data Analysis (CEDA) paradigm is applied to evaluate conditional entropy-based associative patterns between the categorized response variable and 16 categorized covariate variables, each with 4 categories. Two order-1 global major factors, V9 (MEM-mean) and V8 (ADAS13.bl), are selected as sharing the highest amounts of mutual information with the response variable. This heavily censored data set is also analyzed with Cox's proportional hazards (PH) modeling. Comparisons of PH and CEDA results on a global scale are complicated by the structural dependency of covariate features. To alleviate such complications, V9 and V8 are taken as two potential perspectives of heterogeneity, and the entire collection of subjects is divided into two sets of four sub-collections. The CEDA major factor selection protocol is applied to all sub-collections to determine which features provide extra information. Graphic displays are developed to explicitly unravel conditional entropy expansions along these perspectives of heterogeneity in the ADNI data. On the local scale, PH analysis is carried out and its results are compared with CEDA's. We conclude that, when facing structural dependency among covariates and heterogeneity in data, CEDA and its major factor selection offer significant merits for manifesting the data's multiscale information content.
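
The quantities behind CEDA's major factor selection, as described above, are conditional entropies and mutual information computed from contingency tables of the categorized variables. A minimal sketch of those computations (my own helper functions, not the CEDA implementation):

    import numpy as np

    def conditional_entropy(table):
        """H(Y | X) in nats from a contingency table of counts
        (rows = categories of X, columns = categories of Y)."""
        table = np.asarray(table, dtype=float)
        joint = table / table.sum()                      # P(X, Y)
        px = joint.sum(axis=1, keepdims=True)            # P(X)
        cond = np.divide(joint, px, out=np.zeros_like(joint), where=px > 0)  # P(Y|X)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(cond > 0, joint * np.log(cond), 0.0)
        return float(-terms.sum())

    def mutual_information(table):
        """I(X; Y) = H(Y) - H(Y | X); higher values flag stronger factors."""
        table = np.asarray(table, dtype=float)
        py = table.sum(axis=0) / table.sum()
        hy = float(-np.sum(py[py > 0] * np.log(py[py > 0])))
        return hy - conditional_entropy(table)

    # Example: a 4x4 table of (categorized covariate) x (categorized response) counts.
    counts = np.array([[30, 10,  5,  2],
                       [12, 25, 10,  4],
                       [ 5, 12, 28,  9],
                       [ 2,  6, 11, 32]])
    print(conditional_entropy(counts), mutual_information(counts))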

Accelerated Nonnegative Tensor Completion via Integer Programming. (arXiv:2211.15770v1 [cs.LG]) arxiv.org/abs/2211.15770

Accelerated Nonnegative Tensor Completion via Integer Programming

The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical algorithms. A recent approach, based on integer programming, resolves this tension for nonnegative tensor completion. It achieves the information-theoretic sample complexity rate and deploys the Blended Conditional Gradients algorithm, which requires a linear (in numerical tolerance) number of oracle steps to converge to the global optimum. The tradeoff in this approach is that, in the worst case, the oracle step requires solving an integer linear program. Despite this theoretical limitation, numerical experiments show that this algorithm can, on certain instances, scale up to 100 million entries while running on a personal computer. The goal of this paper is to further enhance this algorithm, with the intention of expanding both the breadth and scale of instances that can be solved. We explore several variants that maintain the same theoretical guarantees as the original algorithm but offer potentially faster computation. We consider different data structures, acceleration of gradient descent steps, and the use of the Blended Pairwise Conditional Gradients algorithm. We describe the original approach and these variants, and conduct numerical experiments in order to explore various tradeoffs in these algorithmic design choices.
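
For orientation, the conditional-gradient template that the Blended Conditional Gradients and Blended Pairwise Conditional Gradients variants build on looks roughly as follows (a generic sketch, not the paper's algorithm; in the paper's setting the linear minimization oracle is an integer linear program):

    def conditional_gradient(grad_f, lmo, x0, n_iters=100):
        """Vanilla Frank-Wolfe / conditional gradient template.

        grad_f : callable returning the gradient of the objective at x
        lmo    : linear minimization oracle returning the minimizer of <g, v>
                 over the feasible set; this is the step that becomes an
                 integer linear program in nonnegative tensor completion.
        """
        x = x0
        for t in range(n_iters):
            v = lmo(grad_f(x))           # expensive oracle call
            gamma = 2.0 / (t + 2.0)      # standard open-loop step size
            x = x + gamma * (v - x)      # convex combination stays feasible
            # Blended variants cut down on oracle calls by interleaving cheaper
            # descent steps over the currently active vertices.
        return x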

Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data. (arXiv:2211.15771v1 [cs.LG]) arxiv.org/abs/2211.15771

Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data

Hierarchical Bayesian Poisson regression models (HBPRMs) provide a flexible approach to modeling the relationship between predictors and count response variables. Applying HBPRMs to large-scale datasets requires efficient inference algorithms due to the high computational cost of inferring many model parameters via random sampling. Although Markov chain Monte Carlo (MCMC) algorithms have been widely used for Bayesian inference, sampling with this class of algorithms is time-consuming for applications with large-scale data and time-sensitive decision-making, partially due to the non-conjugacy of many models. To overcome this limitation, this research develops an approximate Gibbs sampler (AGS) that efficiently learns HBPRMs while maintaining inference accuracy. In the proposed sampler, the data likelihood is approximated with a Gaussian distribution so that the conditional posterior of the coefficients is available in closed form. Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS in comparison to a state-of-the-art sampling algorithm, especially for large datasets.
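
As an illustration of the kind of closed-form update described above, here is a sketch of a Gaussian-approximate coefficient draw for Poisson regression; the particular log-scale approximation below (pseudo-observations $\log(y_i + 0.5)$ with precision $y_i + 0.5$) is a common device and my own assumption, not necessarily the approximation used in the paper.

    import numpy as np

    def gaussian_approx_coef_draw(X, y, prior_mean, prior_prec, rng):
        """One Gibbs-style draw of Poisson-regression coefficients under a
        Gaussian approximation of the likelihood.

        Model: y_i ~ Poisson(exp(x_i' beta)), prior beta ~ N(prior_mean, prior_prec^{-1}).
        Approximation: log(y_i + 0.5) ~ N(x_i' beta, 1 / (y_i + 0.5)),
        which makes the conditional posterior of beta Gaussian.
        """
        z = np.log(y + 0.5)                    # pseudo-observations on the log scale
        w = y + 0.5                            # approximate precisions (1 / variance)
        post_prec = prior_prec + X.T @ (w[:, None] * X)
        post_cov = np.linalg.inv(post_prec)
        post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ (w * z))
        return rng.multivariate_normal(post_mean, post_cov)

    # Usage sketch (p = number of coefficients):
    # beta = gaussian_approx_coef_draw(X, y, np.zeros(p), np.eye(p),
    #                                  np.random.default_rng(0))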

Information Geometry of Dynamics on Graphs and Hypergraphs. (arXiv:2211.14455v1 [cs.IT]) arxiv.org/abs/2211.14455

Information Geometry of Dynamics on Graphs and Hypergraphs

We introduce a new information-geometric structure of dynamics on discrete objects such as graphs and hypergraphs. The setup consists of two dually flat structures built on the vertex and edge spaces, respectively. The former is the conventional duality between density and potential, e.g., the probability density and its logarithmic form induced by a convex thermodynamic function. The latter is the duality between flux and force induced by a convex and symmetric dissipation function, which drives the dynamics on the manifold. These two are connected topologically by the homological algebraic relation induced by the underlying discrete objects. The generalized gradient flow in this doubly dual flat structure is an extension of the gradient flows on Riemannian manifolds, which include Markov jump processes and nonlinear chemical reaction dynamics as well as the natural gradient and mirror descent. The information-geometric projections on this doubly dual flat structure lead to the information-geometric generalizations of Helmholtz-Hodge-Kodaira decomposition and Otto structure in $L^{2}$ Wasserstein geometry. The structure can be extended to non-gradient nonequilibrium flow, from which we also obtain the induced dually flat structure on cycle spaces. This abstract but general framework can extend the applicability of information geometry to various problems of linear and nonlinear dynamics.
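
As a concrete instance of the density-potential duality mentioned above (standard information geometry, written in my own notation): taking the convex function $\phi(p) = \sum_v p_v(\ln p_v - 1)$ on the vertex space, the dual coordinates and Legendre transform are

    $\theta_v = \partial\phi/\partial p_v = \ln p_v, \qquad \phi^*(\theta) = \sum_v e^{\theta_v}, \qquad p_v = \partial\phi^*/\partial\theta_v = e^{\theta_v}$,

i.e., the probability density and its logarithmic form induced by a convex thermodynamic function. The flux-force duality on the edge space is built analogously from a convex dissipation function, and the two are connected through the incidence (boundary) structure of the underlying graph or hypergraph.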

Multiple imputation for logistic regression models: incorporating an interaction. (arXiv:2211.14556v1 [stat.ME]) arxiv.org/abs/2211.14556

Multiple imputation for logistic regression models: incorporating an interaction

Background: Multiple imputation is often used to reduce bias and gain efficiency when data are missing. The most appropriate imputation method depends on the model the analyst is interested in fitting. Several imputation approaches have been proposed for the case where this model is a logistic regression model with an interaction term that contains a binary partially observed variable; however, it is not clear which performs best under certain parameter settings. Methods: Using 1000 simulations, each with 10,000 observations, under six data-generating mechanisms (DGMs), we investigate the performance of four methods: (i) 'passive imputation', (ii) 'just another variable' (JAV), (iii) 'stratify-impute-append' (SIA), and (iv) 'substantive model compatible fully conditional specification' (SMCFCS). The application of each method is shown in an empirical example using England-based cancer registry data. Results: SMCFCS and SIA showed the least biased estimates of the coefficients for the fully observed variable, the partially observed variable, and the interaction term. SMCFCS and SIA showed good coverage and low relative error for all DGMs. SMCFCS had a large bias when there was a low prevalence of the fully observed variable in the interaction. SIA performed poorly when the fully observed variable in the interaction had a continuous underlying form. Conclusion: SMCFCS and SIA give consistent estimation for logistic regression models with an interaction term when data are missing at random, and either can be used in most analyses. SMCFCS performed better than SIA when the fully observed variable in the interaction had an underlying continuous form. Researchers should be cautious when using SMCFCS when there is a low prevalence of the fully observed variable in the interaction.
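
The substantive model in question can be written, in a generic notation of my own (with $X_1$ the binary partially observed variable and $X_2$ the fully observed one), as

    $\operatorname{logit} \Pr(Y = 1 \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$,

and, roughly speaking, the four methods differ in how the product term $X_1 X_2$ is handled: passive imputation imputes $X_1$ and then recomputes the product, JAV imputes the product as if it were just another variable, SIA imputes within strata defined by the fully observed variable and appends the results, and SMCFCS draws imputations from a model constructed to be compatible with this substantive model.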

Valid and efficient imprecise-probabilistic inference with partial priors, II. General framework. (arXiv:2211.14567v1 [stat.ME]) arxiv.org/abs/2211.14567

Valid and efficient imprecise-probabilistic inference with partial priors, II. General framework

Bayesian inference requires specification of a single, precise prior distribution, whereas frequentist inference only accommodates a vacuous prior. Since virtually every real-world application falls somewhere in between these two extremes, a new approach is needed. This series of papers develops a new framework that provides valid and efficient statistical inference, prediction, etc., while accommodating partial prior information and imprecisely-specified models more generally. This paper fleshes out a general inferential model construction that not only yields tests, confidence intervals, etc. with desirable error rate control guarantees, but also facilitates valid probabilistic reasoning with de Finetti-style no-sure-loss guarantees. The key technical novelty here is a so-called outer consonant approximation of a general imprecise probability which returns a data- and partial prior-dependent possibility measure to be used for inference and prediction. Despite some potentially unfamiliar imprecise-probabilistic concepts in the development, the result is an intuitive, likelihood-driven framework that will, as expected, agree with the familiar Bayesian and frequentist solutions in the respective extreme cases. More importantly, the proposed framework accommodates partial prior information where available and, therefore, leads to new solutions that were previously out of reach for both Bayesians and frequentists. Details are presented here for a wide range of practical situations, including cases involving nuisance parameters and non-/semi-parametric structure, along with a number of numerical illustrations.
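
For readers new to the series, the error rate control being targeted can be stated, roughly and in the vacuous-prior special case (my paraphrase of the framework's usual validity property, not a formula taken from this paper), as follows: the data-dependent possibility contour $\pi_X(\theta)$ satisfies

    $\sup_{\theta} \mathsf{P}_{X \mid \theta}\{\pi_X(\theta) \le \alpha\} \le \alpha$ for all $\alpha \in [0,1]$,

so that rejecting a value $\theta$ whenever $\pi_X(\theta) \le \alpha$ controls the error rate at level $\alpha$, and the regions $\{\theta : \pi_X(\theta) > \alpha\}$ have coverage at least $1 - \alpha$. As I understand the series, the partial-prior version replaces the supremum over $\theta$ with a supremum over the joint distributions compatible with the partial prior.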
