
An $\mathcal{O}(\log_2N)$ SMC$^2$ Algorithm on Distributed Memory with an Approx. Optimal L-Kernel. (arXiv:2311.12973v1 [stat.AP]) arxiv.org/abs/2311.12973


Calibrating statistical models using Bayesian inference often requires both accurate and timely estimates of the parameters of interest. Particle Markov Chain Monte Carlo (p-MCMC) and Sequential Monte Carlo Squared (SMC$^2$) are two methods that use an unbiased estimate of the log-likelihood obtained from a particle filter (PF) to evaluate the target distribution. P-MCMC constructs a single Markov chain, which is sequential by nature and so cannot be readily parallelized on Distributed Memory (DM) architectures. In contrast, SMC$^2$ includes processes, such as importance sampling, that are embarrassingly parallel, although difficulties arise when attempting to parallelize resampling. Nonetheless, the choice of backward kernel, the recycling scheme, and compatibility with DM architectures make SMC$^2$ an attractive option compared with p-MCMC. In this paper, we present an SMC$^2$ framework with the following features: an optimal (in terms of time complexity) $\mathcal{O}(\log_2 N)$ parallelization for DM architectures, an approximately optimal (in terms of accuracy) backward kernel, and an efficient recycling scheme. On a cluster of $128$ DM processors, results on a biomedical application show that SMC$^2$ achieves up to a $70\times$ speed-up over its sequential implementation. It is also more accurate and roughly $54\times$ faster than p-MCMC. A GitHub link to the code is provided.
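
For orientation, the quantity both methods build on is the particle-filter estimate of the log-likelihood of the data given a parameter value. Below is a minimal Python sketch of such an estimator; the model functions init_fn, transition_fn and loglik_fn are placeholders, and the multinomial resampling step written serially here is exactly the part the paper parallelizes in $\mathcal{O}(\log_2 N)$ time on distributed memory. This is an illustrative sketch, not the authors' code (their implementation is the one linked from the paper).

    import numpy as np

    def pf_loglik(y, n_particles, init_fn, transition_fn, loglik_fn, rng):
        # Bootstrap particle filter estimate of log p(y_{1:T} | theta).
        # init_fn, transition_fn, loglik_fn are user-supplied model functions (placeholders).
        x = init_fn(n_particles, rng)                 # initial particles
        log_lik = 0.0
        for t in range(len(y)):
            x = transition_fn(x, rng)                 # propagate through the state model
            logw = loglik_fn(y[t], x)                 # per-particle observation log-likelihoods
            m = np.max(logw)
            w = np.exp(logw - m)                      # stabilised weights
            log_lik += m + np.log(np.mean(w))         # log of the average weight
            idx = rng.choice(n_particles, size=n_particles, p=w / np.sum(w))
            x = x[idx]                                # multinomial resampling (serial here)
        return log_lik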


Physics-Informed Priors with Application to Boundary Layer Velocity. (arXiv:2311.12978v1 [physics.flu-dyn]) arxiv.org/abs/2311.12978


One of the most popular recent areas of machine learning builds on neural networks augmented with information about the underlying process in the form of Partial Differential Equations (PDEs). These physics-informed neural networks are obtained by penalizing the inference with a PDE and are cast as a minimization problem that currently lacks a formal approach to quantifying uncertainty. In this work, we propose a novel model-based framework which regards the PDE as prior information for a deep Bayesian neural network. The prior is calibrated without data so that its mean resembles the PDE solution, while our degree of confidence in the PDE relative to the data is expressed through the prior variance. The information embedded in the PDE is then propagated to the posterior, yielding physics-informed forecasts with uncertainty quantification. We apply our approach to a simulated viscous fluid and to experimentally obtained turbulent boundary layer velocity in a wind tunnel, using an appropriately simplified Navier-Stokes equation. Our approach requires very few observations to produce physically consistent forecasts, as opposed to the non-physical forecasts stemming from non-informed priors, thereby allowing the forecasting of complex systems where some data and some contextual knowledge are available.
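
One way to read the construction (notation mine, not the authors'): place a Gaussian prior on the unknown velocity field, $u(\cdot) \sim \mathcal{GP}\big(m_{\mathrm{PDE}}(\cdot),\, \sigma^2_{\mathrm{prior}}\, k(\cdot,\cdot)\big)$, where the mean $m_{\mathrm{PDE}}$ is fit, without data, to a numerical solution of the simplified PDE and $\sigma^2_{\mathrm{prior}}$ encodes how much the PDE is trusted relative to the observations $y_i = u(x_i) + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2_y)$. Conditioning on the few available observations then yields a posterior whose mean and variance are the physics-informed forecast and its uncertainty.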


Beyond Catoni: Sharper Rates for Heavy-Tailed and Robust Mean Estimation. (arXiv:2311.13010v1 [math.ST]) arxiv.org/abs/2311.13010


We study the fundamental problem of estimating the mean of a $d$-dimensional distribution with covariance $\Sigma \preccurlyeq \sigma^2 I_d$ given $n$ samples. When $d = 1$, Catoni \cite{catoni} showed an estimator with error $(1+o(1)) \cdot \sigma\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$, with probability $1 - \delta$, matching the Gaussian error rate. For $d>1$, a natural estimator outputs the center of the minimum enclosing ball of one-dimensional confidence intervals to achieve a $1-\delta$ confidence radius of $\sqrt{\frac{2 d}{d+1}} \cdot \sigma\left(\sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right)$, incurring a $\sqrt{\frac{2d}{d+1}}$-factor loss over the Gaussian rate. When the $\sqrt{\frac{d}{n}}$ term dominates by a $\sqrt{\log \frac{1}{\delta}}$ factor, \cite{lee2022optimal-highdim} showed an improved estimator matching the Gaussian rate. This raises a natural question: is the Gaussian rate achievable in general? Or is the $\sqrt{\frac{2 d}{d+1}}$ loss \emph{necessary} when the $\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$ term dominates? We show that the answer to both questions is \emph{no}: \emph{some} constant-factor loss over the Gaussian rate is necessary, but we construct an estimator that improves over the above naive estimator by a constant factor. We also consider robust estimation, where an adversary is allowed to corrupt an $\epsilon$-fraction of samples arbitrarily: in this case, we show that the above strategy of combining one-dimensional estimates and incurring the $\sqrt{\frac{2d}{d+1}}$-factor loss \emph{is} optimal in the infinite-sample limit.
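
As a quick check of the size of the loss in question (my arithmetic, not the paper's): $\sqrt{2d/(d+1)}$ equals $\sqrt{4/3} \approx 1.15$ at $d=2$ and increases toward $\sqrt{2} \approx 1.41$ as $d \to \infty$, so the naive estimator is at most about $41\%$ worse than the Gaussian rate, and the paper's results pin down how much of that bounded gap is unavoidable.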


W-kernel and essential subspace for frequentist's evaluation of Bayesian estimators. (arXiv:2311.13017v1 [stat.ME]) arxiv.org/abs/2311.13017


The posterior covariance matrix W defined by the log-likelihood of each observation plays important roles both in sensitivity analysis and in the frequentist evaluation of Bayesian estimators. This study focuses on the matrix W and its principal space, which we term the essential subspace. First, we show that they appear in various statistical settings, such as evaluating posterior sensitivity, assessing frequentist uncertainty from posterior samples, and stochastic expansion of the loss; a key tool for treating frequentist properties is the recently proposed Bayesian infinitesimal jackknife approximation (Giordano and Broderick (2023)). We then show that the matrix W can be interpreted as a reproducing kernel, which we name the W-kernel. Using the W-kernel, the essential subspace is expressed as a principal space given by kernel PCA. A relation to the Fisher kernel and the neural tangent kernel is established, which elucidates the connection to classical asymptotic theory and leads to a form of Bayesian-frequentist duality. Finally, two applications are discussed: selection of a representative set of observations and dimension reduction in the approximate bootstrap. In the former, incomplete Cholesky decomposition is introduced as an efficient method to compute the essential subspace. In the latter, different implementations of the approximate bootstrap for posterior means are compared.
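
One plausible reading of the construction (my notation, not the paper's code): with $S$ posterior draws $\theta_s$ and per-observation log-likelihoods $\ell_i(\theta) = \log p(y_i \mid \theta)$, take $W_{ij} = \mathrm{Cov}_{\mathrm{post}}(\ell_i, \ell_j)$ and read the essential subspace off its leading eigenvectors. A small Python sketch:

    import numpy as np

    def w_kernel_subspace(loglik_samples, n_components):
        # loglik_samples: shape (S, n), entry (s, i) = log p(y_i | theta_s).
        # Returns the n x n posterior covariance matrix W and its leading
        # eigenpairs, taken here as the "essential subspace".
        centered = loglik_samples - loglik_samples.mean(axis=0, keepdims=True)
        W = centered.T @ centered / (loglik_samples.shape[0] - 1)
        evals, evecs = np.linalg.eigh(W)              # symmetric eigendecomposition
        order = np.argsort(evals)[::-1][:n_components]
        return W, evals[order], evecs[:, order]

The paper's incomplete Cholesky route would avoid forming and decomposing the full $n \times n$ matrix; the dense eigendecomposition above is only for illustration.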


Favour: FAst Variance Operator for Uncertainty Rating. (arXiv:2311.13036v1 [cs.LG]) arxiv.org/abs/2311.13036


Bayesian Neural Networks (BNNs) have emerged as a crucial approach for interpreting ML predictions. By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference. Unfortunately, many inference samples are often needed, and the resulting overhead greatly hinders the wide adoption of BNNs. To mitigate this, previous work proposed propagating the first and second moments of the posterior directly through the network. However, on its own this method is even slower than sampling, so the propagated variance needs to be approximated, for example by assuming independence between neural nodes. The resulting trade-off between quality and inference time did not match even that of plain Monte Carlo sampling. Our contribution is a more principled variance propagation framework based on "spiked covariance matrices", which smoothly interpolates between quality and inference time. This is made possible by a new fast algorithm for updating a diagonal-plus-low-rank matrix approximation under various operations. We tested our algorithm against sampling-based MC Dropout and Variational Inference on a number of downstream uncertainty-themed tasks, such as calibration and out-of-distribution testing. We find that Favour is as fast as performing 2-3 inference samples while matching the performance of 10-100 samples. In summary, this work enables the use of BNNs in performance-critical tasks where they have previously been out of reach.
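
As a rough illustration of what a "spiked" (diagonal-plus-low-rank) covariance buys, here is how such an approximation passes through a linear layer. This is a generic sketch under my own assumptions, not Favour's operator; in particular the re-compression after nonlinearities, where the paper's fast update algorithm matters, is only hinted at in the comments.

    import numpy as np

    def propagate_linear(A, b, mean, diag, U):
        # Push a Gaussian approximation N(mean, diag(diag) + U @ U.T)
        # through the affine map z = A @ x + b.
        new_mean = A @ mean + b
        new_U = A @ U                          # low-rank factor maps exactly
        dense = (A * diag) @ A.T               # A diag(d) A^T is generally dense...
        new_diag = np.diag(dense).copy()       # ...so keep only its diagonal here;
        # folding the dropped off-diagonal mass back into the low-rank factor is
        # the kind of update the paper's algorithm performs efficiently.
        return new_mean, new_diag, new_U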


Sensitivity analysis with multiple treatments and multiple outcomes with applications to air pollution mixtures. (arXiv:2311.12252v1 [stat.ME]) arxiv.org/abs/2311.12252


Understanding the health impacts of air pollution is vital in public health research. Numerous studies have estimated negative health effects of a variety of pollutants, but accurately gauging these impacts remains challenging due to the potential for unmeasured confounding bias, which is ubiquitous in observational studies. In this study, we develop a framework for sensitivity analysis in settings with multiple treatments and multiple outcomes simultaneously. This setting is of particular interest because, under a factor confounding assumption, one can identify the strength of association between the unmeasured confounders and both the treatments and the outcomes. This yields informative bounds on the causal effects, leading to partial identification regions for the effects of multivariate treatments that account for the maximum possible bias from unmeasured confounding. We also show that when negative controls are available, the partial identification regions can be refined substantially and, in certain cases, the causal effect can even be identified in the presence of unmeasured confounding. We derive partial identification regions for general estimands in this setting and develop a novel computational approach to finding these regions.
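
To make the factor confounding idea concrete, here is a generic linear sketch (my notation, not the paper's model): with multivariate treatments $T \in \mathbb{R}^p$, outcomes $Y \in \mathbb{R}^q$, and unmeasured confounders $U \in \mathbb{R}^k$, one may posit $T = B U + \varepsilon_T$ and $Y = \Theta^\top T + \Gamma U + \varepsilon_Y$. The factor structure in $T$ constrains how strongly $U$ can act on the treatments, which is what yields bounds on the treatment effects $\Theta$ rather than point identification, and negative controls (components of $\Theta$ known a priori to be zero) tighten those bounds.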


The limitation of neural nets for approximation and optimization. (arXiv:2311.12253v1 [cs.LG]) arxiv.org/abs/2311.12253


We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application to solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. Compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, it is shown that combining a neural net activation function with the natural basis for quadratic interpolation/regression can remove the need to include cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.
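
To give a sense of the parameter savings being described (a generic sketch of the idea, not the authors' construction; the projection parameters W and b below are placeholders): the full quadratic basis in $d$ variables has $(d+1)(d+2)/2$ terms (66 for $d=10$), whereas dropping the cross terms leaves $2d+1$ terms (21 for $d=10$), with SiLU-activated features standing in for the missing interactions.

    import numpy as np

    def silu(z):
        # SiLU activation: z * sigmoid(z)
        return z / (1.0 + np.exp(-z))

    def reduced_quadratic_features(X, W=None, b=0.0):
        # Quadratic regression basis without cross terms (1, x_i, x_i^2),
        # optionally augmented with SiLU-activated linear features.
        n, d = X.shape
        feats = [np.ones((n, 1)), X, X ** 2]          # 2d + 1 columns
        if W is not None:
            feats.append(silu(X @ W + b))             # placeholder activation features
        return np.hstack(feats)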


Learning Causal Representations from General Environments: Identifiability and Intrinsic Ambiguity. (arXiv:2311.12267v1 [cs.LG]) arxiv.org/abs/2311.12267


This paper studies causal representation learning, the task of recovering high-level latent variables and their causal relationships from the low-level data that we observe, assuming access to observations generated from multiple environments. While existing works are able to prove full identifiability of the underlying data-generating process, they typically assume access to single-node hard interventions, which is rather unrealistic in practice. The main contribution of this paper is to characterize a notion of identifiability which is provably the best one can achieve when hard interventions are not available. First, for linear causal models, we provide an identifiability guarantee for data observed from general environments without assuming any similarities between them. While the causal graph is shown to be fully recovered, the latent variables are only identified up to an effect-domination ambiguity (EDA). We then propose an algorithm, LiNGCReL, which is guaranteed to recover the ground-truth model up to the EDA, and we demonstrate its effectiveness via numerical experiments. Moving on to general non-parametric causal models, we prove the same identifiability guarantee assuming access to groups of soft interventions. Finally, we provide counterparts of our identifiability results, indicating that the EDA is essentially inevitable in our setting.
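
For orientation, a generic linear version of the setup (my notation, not necessarily the paper's) takes latent variables $Z \in \mathbb{R}^k$ generated by a linear structural model $Z = A Z + \varepsilon$ over a causal DAG, with observations $X = G Z$ for an unknown mixing matrix $G$, and each environment $e$ altering some noise distributions or mechanisms rather than applying a single-node hard intervention. Identifiability then means recovering the DAG and the latent variables from the pooled environments, here only up to the stated effect-domination ambiguity.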


Sample size calculation based on the difference in restricted mean time lost for clinical trials with competing risks. (arXiv:2311.12293v1 [stat.ME]) arxiv.org/abs/2311.12293


Computation of the sample size is important when designing clinical trials. The presence of competing risks makes the design of clinical trials with time-to-event endpoints cumbersome. A model based on the subdistribution hazard ratio (SHR) is commonly used for trials under competing risks. However, this approach has some limitations related to model assumptions and clinical interpretation. Considering these limitations, the difference in restricted mean time lost (RMTLd) is recommended as an alternative indicator. In this paper, we propose a sample size calculation method based on the RMTLd under the Weibull distribution (RMTLdWeibull) for clinical trials, which accommodates experimental conditions such as equal allocation, uniform accrual, uniform loss to follow-up, and administrative censoring. Simulation results show that sample size calculation based on the RMTLdWeibull can generally achieve a predefined power level and remains relatively robust. Moreover, the performance of the sample size calculation based on the RMTLdWeibull is similar or superior to that based on the SHR. Even if the event time does not follow the Weibull distribution, the sample size calculation based on the RMTLdWeibull still performs well, which further verifies the method's performance. In view of these results, as well as clinical interpretation, application conditions, and statistical performance, we recommend that the RMTLd indicator be applied for sample size calculation and subsequent effect size measurement when designing clinical trials in the presence of competing risks.
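
For context, the effect measure itself is simple: the restricted mean time lost for the event of interest up to a horizon $\tau$ is the area under its cumulative incidence function, $\mathrm{RMTL}(\tau) = \int_0^\tau F_1(t)\,dt$, and the RMTLd is the between-arm difference of this quantity. A small Python sketch follows; the inputs are placeholders (e.g. a Weibull-based or estimated cumulative incidence curve), and this is not the paper's sample size formula.

    import numpy as np

    def rmtl(event_times, cif_values, tau, n_grid=1000):
        # Restricted mean time lost up to tau: the area under the cumulative
        # incidence function F1, given as a right-continuous step function.
        grid = np.linspace(0.0, tau, n_grid)
        idx = np.searchsorted(event_times, grid, side="right") - 1
        F1 = np.where(idx >= 0, np.asarray(cif_values)[np.maximum(idx, 0)], 0.0)
        return np.trapz(F1, grid)

    # RMTLd is then rmtl(times_ctrl, cif_ctrl, tau) - rmtl(times_trt, cif_trt, tau).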


A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations. (arXiv:2311.12319v1 [stat.ML]) arxiv.org/abs/2311.12319


The parallel alternating direction method of multipliers (ADMM) algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solving statistical learning models. However, there is currently limited research on parallel algorithms specifically designed for high-dimensional regression with combined (composite) regularization terms. Such terms, including the elastic net, sparse group lasso, sparse fused lasso, and their nonconvex variants, have gained significant attention in various fields due to their ability to incorporate prior information and promote sparsity within specific groups or fused variables. The scarcity of parallel algorithms for combined regularizations can be attributed to the inherent nonsmoothness and complexity of these terms, as well as the absence of closed-form solutions for certain proximal operators associated with them. In this paper, we propose a unified constrained-optimization formulation based on the consensus problem for these convex and nonconvex regression problems and derive the corresponding parallel ADMM algorithms. Furthermore, we prove that the proposed algorithm not only converges globally but also exhibits a linear convergence rate. Extensive simulation experiments, along with a financial example, demonstrate the reliability, stability, and scalability of our algorithm. The R package implementing the proposed algorithms can be obtained at https://github.com/xfwu1016/CPADMM.
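
For readers unfamiliar with the consensus formulation the paper builds on, here is a minimal Python sketch of consensus ADMM for a plain lasso with data split across blocks (one block per worker). It is not the paper's unified algorithm, whose z-update handles combined regularizers rather than the simple soft-thresholding used here, and the authors' actual implementation is the CPADMM R package linked above.

    import numpy as np

    def soft_threshold(v, kappa):
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def consensus_lasso_admm(X_blocks, y_blocks, lam, rho=1.0, n_iter=200):
        # min sum_i 0.5*||X_i x_i - y_i||^2 + lam*||z||_1  s.t. x_i = z (consensus)
        M = len(X_blocks)
        p = X_blocks[0].shape[1]
        x = [np.zeros(p) for _ in range(M)]
        u = [np.zeros(p) for _ in range(M)]
        z = np.zeros(p)
        chol = [np.linalg.cholesky(Xb.T @ Xb + rho * np.eye(p)) for Xb in X_blocks]
        Xty = [Xb.T @ yb for Xb, yb in zip(X_blocks, y_blocks)]
        for _ in range(n_iter):
            for i in range(M):                           # local updates (parallel across workers)
                rhs = Xty[i] + rho * (z - u[i])
                x[i] = np.linalg.solve(chol[i].T, np.linalg.solve(chol[i], rhs))
            z = soft_threshold(np.mean([xi + ui for xi, ui in zip(x, u)], axis=0),
                               lam / (rho * M))          # global consensus + l1 proximal step
            for i in range(M):
                u[i] += x[i] - z                         # dual updates
        return z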
