Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis arxiv.org/abs/2411.13608 .AP .AI

This paper presents an innovative approach to Extreme Value Analysis (EVA) by introducing the Extreme Value Dynamic Benchmarking Method (EVDBM). EVDBM integrates extreme value theory to detect extreme events and is coupled with the novel Dynamic Identification of Significant Correlation (DISC)-Thresholding algorithm, which enhances the analysis of key variables under extreme conditions. By integrating return values predicted through EVA into the benchmarking scores, we transform these scores to reflect anticipated conditions more accurately, giving a more precise picture of how each case is projected to unfold under extreme conditions. The adjusted scores thus offer a forward-looking perspective, highlighting potential vulnerabilities and resilience factors for each case in a way that static historical data alone cannot capture. By incorporating both historical and probabilistic elements, EVDBM provides a comprehensive benchmarking framework that is adaptable to a range of scenarios and contexts. The methodology is applied to real photovoltaic (PV) production data, revealing critical low-production scenarios and significant correlations between variables, which aid in risk management, infrastructure design, and long-term planning, while also allowing different production plants to be compared. The flexibility of EVDBM suggests broader applications in other sectors where decision-making sensitivity is crucial.
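
The EVA step at the heart of this pipeline is easy to make concrete. Below is a minimal sketch, on synthetic data, of fitting a GEV distribution to annual production minima and converting a return period into a return level. How EVDBM folds the return level into a benchmark score is not specified in the abstract, so the final weighting line is purely an assumed placeholder.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
daily = rng.gamma(shape=8.0, scale=25.0, size=(20, 365))   # 20 synthetic years
annual_min = daily.min(axis=1)                             # block (annual) minima

# Minima of X are maxima of -X, so fit a GEV to the negated block minima.
c, loc, scale = genextreme.fit(-annual_min)

def low_return_level(T):
    """Production level expected to be undercut once every T years."""
    return -genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)

rl50 = low_return_level(50)
static_score = 0.80                                  # hypothetical benchmark score
adjusted = static_score * rl50 / annual_min.mean()   # ASSUMED weighting, not EVDBM's
print(f"50-year low-production level: {rl50:.1f}")
print(f"return-adjusted score:        {adjusted:.3f}")
```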

The Fast and the Furious: Tracking the Effect of the Tomoa Skip on Speed Climbing arxiv.org/abs/2411.13696 .AP

Sport climbing is an athletic discipline comprising three sub-disciplines -- lead climbing, bouldering, and speed climbing. These sub-disciplines have distinct goals, so athletes typically specialize in one of the three events. The year 2020 marked the first inclusion of sport climbing in the Olympic Games. While this decision was met with excitement from the climbing community, it was not without controversy: the International Olympic Committee allocated one set of medals for the entire sport, necessitating the combination of the sub-disciplines into a single competition. As a result, athletes who specialized in lead and bouldering were forced to train and compete in speed for the first time in their careers. One such athlete was Tomoa Narasaki, a World Champion boulderer, who introduced a new method of approaching the speed event. This approach, dubbed the Tomoa Skip (TS), was subsequently adopted by many of the top speed climbers. Concurrently, speed records fell rapidly, from 5.48s in 2017 to 4.90s in 2023. Speed climbing involves ascending a 15m wall whose pattern of holds is identical across competitions, so records are directly comparable over time. In this paper we investigate the effect of the TS on speed climbing by answering two questions: (1) Did the TS result in a decrease in speed times? (2) Do climbers who utilize the TS show less consistency? The success of the TS highlights the potential of collaboration across the disciplines of a sport, showing that athletes of diverse backgrounds may contribute to the evolution of competition.
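
The two questions map onto two standard tests. A minimal sketch on simulated times (not real records): a one-sided Welch t-test for question (1), and a Levene variance test as one way to probe consistency for question (2); the paper's actual analysis is more elaborate than this.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
no_ts = rng.normal(loc=5.48, scale=0.10, size=200)    # hypothetical pre-TS times (s)
with_ts = rng.normal(loc=4.95, scale=0.16, size=200)  # hypothetical TS-era times (s)

# (1) Are TS times faster on average?  One-sided Welch t-test.
t_stat, p_mean = stats.ttest_ind(with_ts, no_ts, equal_var=False, alternative="less")
# (2) Are TS climbers less consistent?  Levene test for unequal variances.
w_stat, p_var = stats.levene(with_ts, no_ts)

print(f"(1) faster with TS:   p = {p_mean:.2e}")
print(f"(2) unequal variance: p = {p_var:.2e}")
```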

Active Subsampling for Measurement-Constrained M-Estimation of Individualized Thresholds with High-Dimensional Data arxiv.org/abs/2411.13763 .ST .ME .ML .TH

In measurement-constrained problems, despite the availability of a large dataset, we may only be able to afford observing the labels of a small portion of it. This raises a critical question: which data points are most beneficial to label given a budget constraint? In this paper, we focus on estimating the optimal individualized threshold in a measurement-constrained M-estimation framework. Our goal is to estimate a high-dimensional parameter $\theta$ in a linear threshold $\theta^T Z$ for a continuous variable $X$ such that the discrepancy between the indicator of whether $X$ exceeds the threshold $\theta^T Z$ and a binary outcome $Y$ is minimized. We propose a novel $K$-step active subsampling algorithm to estimate $\theta$, which iteratively samples the most informative observations and solves a regularized M-estimator. The theoretical properties of our estimator exhibit a phase transition with respect to $\beta \geq 1$, the smoothness of the conditional density of $X$ given $Y$ and $Z$. For $\beta > (1+\sqrt{3})/2$, we show that the two-step algorithm yields an estimator with the parametric convergence rate $O_p((s \log d / N)^{1/2})$ in $\ell_2$ norm, strictly faster than the minimax optimal rate attainable with $N$ i.i.d. samples drawn from the population. In the other two regimes, $1 < \beta \leq (1+\sqrt{3})/2$ and $\beta = 1$, the two-step estimator is sub-optimal: the former requires running $K > 2$ steps to attain the same parametric rate, whereas in the latter case only a near-parametric rate can be obtained. Furthermore, we formulate a minimax framework for the measurement-constrained M-estimation problem and prove that our estimator is minimax rate optimal up to a logarithmic factor. Finally, we demonstrate the performance of our method in simulation studies and apply it to a large diabetes dataset.
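
A sketch of the $K$-step loop under assumptions the abstract does not pin down: informativeness is taken to be proximity of $X$ to the current threshold $\theta^T Z$, the indicator is smoothed by a sigmoid with bandwidth h, and the regularized M-estimator is solved by proximal gradient descent with an $\ell_1$ penalty. None of these choices is claimed to match the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, s = 5000, 50, 5
theta_true = np.zeros(d)
theta_true[:s] = 1.0
Z = rng.normal(size=(N, d))
X = Z @ theta_true + rng.normal(scale=0.5, size=N)
Y = (X > Z @ theta_true).astype(float)            # labels: costly to observe

def fit_l1(Zs, Xs, Ys, lam=0.05, h=0.3, steps=500, lr=0.1):
    """Smoothed squared loss + l1 penalty, solved by proximal gradient."""
    th = np.zeros(Zs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xs - Zs @ th) / h))   # smoothed indicator
        grad = 2.0 * ((p - Ys) * p * (1.0 - p) / h) @ (-Zs) / len(Ys)
        th = th - lr * grad
        th = np.sign(th) * np.maximum(np.abs(th) - lr * lam, 0.0)  # soft-threshold
    return th

budget, K = 600, 3
labeled = rng.choice(N, budget // K, replace=False)     # uniform pilot round
theta = fit_l1(Z[labeled], X[labeled], Y[labeled])
for _ in range(K - 1):
    informative = np.argsort(np.abs(X - Z @ theta))     # near-threshold first
    new = informative[~np.isin(informative, labeled)][: budget // K]
    labeled = np.concatenate([labeled, new])
    theta = fit_l1(Z[labeled], X[labeled], Y[labeled])

print("recovered support:", np.nonzero(np.abs(theta) > 0.1)[0])
```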

Off-policy estimation with adaptively collected data: the power of online learning arxiv.org/abs/2411.12786 .ML .OC .ST .TH .LG

We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task arises in a variety of applications, including off-policy evaluation (OPE) in contextual bandits and estimation of the average treatment effect (ATE) in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic behavior with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that depend crucially on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that produces a sequence of estimates of the treatment effect via online learning, so as to minimize the sequentially weighted estimation error. We provide three concrete instantiations: (i) the tabular case; (ii) linear function approximation; and (iii) general function approximation for the outcome model. We then prove a local minimax lower bound showing the instance-dependent optimality of the AIPW estimator when it uses no-regret online learning algorithms.
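
The AIPW construction itself is standard and worth seeing once. The tabular sketch below estimates the value of a target policy from data logged by an adaptive (epsilon-greedy) policy, with the outcome model m_hat updated online from past data only, in the spirit of the reduction described above; the bandit instance is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ctx, n_act, T = 5, 3, 20000
mu = rng.uniform(size=(n_ctx, n_act))           # true mean rewards
target = mu.argmax(axis=1)                      # deterministic target policy

m_hat = np.zeros((n_ctx, n_act))                # outcome model, fit online
counts = np.zeros((n_ctx, n_act))
terms = []
for _ in range(T):
    x = rng.integers(n_ctx)
    probs = np.full(n_act, 0.1 / (n_act - 1))   # adaptive logging policy:
    probs[m_hat[x].argmax()] = 0.9              # epsilon-greedy on m_hat
    a = rng.choice(n_act, p=probs)
    y = mu[x, a] + rng.normal(scale=0.1)
    pi_a = target[x]                            # action the target policy takes
    # AIPW term: model prediction plus importance-weighted residual.
    terms.append(m_hat[x, pi_a] + (a == pi_a) / probs[a] * (y - m_hat[x, a]))
    counts[x, a] += 1                           # update the model only AFTER
    m_hat[x, a] += (y - m_hat[x, a]) / counts[x, a]  # forming the estimate

v_true = mu[np.arange(n_ctx), target].mean()    # contexts are uniform
print(f"AIPW estimate: {np.mean(terms):.4f}   true value: {v_true:.4f}")
```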

Statistical inference for mean-field queueing systems arxiv.org/abs/2411.12936 .ST .PR .TH

Mean-field limits are now a standard tool for approximations, including for networks with a large number of nodes. Statistical inference on mean-field models has attracted more attention recently, mainly due to the rapid emergence of data-driven systems, but studies reported in the literature have been largely limited to continuous models. In this paper, we initiate a study of statistical inference on discrete mean-field models (jump processes), using the well-known and extensively studied power-of-L, or supermarket, model to demonstrate how to deal with the new challenges discrete models pose. We focus on estimating system parameters from observations of the system state at discrete time epochs over a finite period. We show that by harnessing the weak convergence results developed for the supermarket model in the literature, an asymptotic inference scheme based on approximate least squares estimation can be obtained from the mean-field limiting equation. By leveraging the law of large numbers alongside the central limit theorem, we establish the consistency of the estimator and its asymptotic normality as the number of servers and the number of observations go to infinity. Numerical results for the power-of-two model illustrate the efficiency and accuracy of the proposed estimator.
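
For the power-of-d (supermarket) model, the mean-field limit is an ODE whose drift is linear in the arrival rate, which is what makes approximate least squares natural. A sketch, with the "observations" generated from the ODE plus noise rather than from a finite system of servers:

```python
import numpy as np

rng = np.random.default_rng(4)
d, lam_true, K = 2, 0.7, 30        # power-of-d choices, arrival rate, queue cap
dt, T = 0.05, 400
s = np.zeros(K + 2)                # s[k] = fraction of queues with >= k jobs
s[0], s[1] = 1.0, 0.5

obs = []
for _ in range(T):
    # Mean-field drift: ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1}).
    arrivals = lam_true * (s[:-1] ** d - s[1:] ** d)
    departures = s[1:] - np.append(s[2:], 0.0)
    s[1:] = np.clip(s[1:] + (arrivals - departures) * dt, 0.0, 1.0)
    obs.append(s + rng.normal(scale=1e-3, size=s.size))   # noisy snapshots
obs = np.array(obs)

# Increments satisfy ds ~= lam * a + b, with a and b computable from data,
# so the approximate least squares estimate of lam is in closed form.
a = (obs[:-1, :-1] ** d - obs[:-1, 1:] ** d) * dt
b = -(obs[:-1, 1:] - np.hstack([obs[:-1, 2:], np.zeros((T - 1, 1))])) * dt
ds = obs[1:, 1:] - obs[:-1, 1:]
lam_hat = np.sum(a * (ds - b)) / np.sum(a * a)
print(f"lam_true = {lam_true}, lam_hat = {lam_hat:.4f}")
```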

Probability distributions and calculations for Hake's ratio statistics in measuring effect size arxiv.org/abs/2411.12938 .data-an .CO

Ratio statistics and distributions play a crucial role in many fields, including linear regression, metrology, nuclear physics, operations research, econometrics, biostatistics, genetics, and engineering. In this work, we examine the statistical properties and probability calculations of the Hake normalized gain as a measure of effect size and educational effectiveness in physics education. Leveraging existing knowledge about the Hake ratio as a ratio of normal variables, and utilizing open data-science tools, we develop two novel computational approaches for computing ratio distributions: (1) double-exponential (DE) quadrature, with or without barycentric interpolation, a very quick and efficient method; and (2) a 2D vectorized numerical inversion of characteristic functions, which offers broader applicability because it requires neither the PDFs nor the independence of the ratio's constituents. A pilot numerical study demonstrates the speed, accuracy, and reliability of both approaches. These explorations not only deepen the understanding of the Hake ratio's distribution but also showcase the efficiency, precision, and versatility of the proposed methods, making them well suited for fast data analysis based on exact ratio distributions, with potential applications in multidimensional statistics and in uncertainty analysis in metrology, where precise and reliable data handling is essential.
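
The basic object here is the density of a ratio of normals. A minimal worked example via the change-of-variables identity $f_W(w) = \int |y|\, f_X(wy)\, f_Y(y)\, dy$, using plain adaptive quadrature rather than the paper's faster DE quadrature or characteristic-function inversion; treating the Hake gain's numerator and denominator as independent normals is an illustrative simplification, and the parameters are made up.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hake gain g = (post - pre) / (100 - pre): numerator and denominator are
# modeled here as independent normals, an illustrative simplification.
mu_x, sd_x = 30.0, 8.0       # numerator  (post - pre), in percentage points
mu_y, sd_y = 60.0, 10.0      # denominator (100 - pre), in percentage points

def ratio_pdf(w):
    """Density of W = X / Y at w, via f_W(w) = int |y| f_X(w*y) f_Y(y) dy."""
    integrand = lambda y: abs(y) * norm.pdf(w * y, mu_x, sd_x) * norm.pdf(y, mu_y, sd_y)
    return quad(integrand, -np.inf, np.inf)[0]

for w in np.linspace(0.0, 1.0, 6):
    print(f"f_W({w:.1f}) = {ratio_pdf(w):.4f}")
total = quad(ratio_pdf, -1.0, 2.0)[0]          # sanity check: mass ~ 1
print(f"integral over [-1, 2] = {total:.4f}")
```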

From Estimands to Robust Inference of Treatment Effects in Platform Trials arxiv.org/abs/2411.12944 .ME

A platform trial is an innovative clinical trial design that uses a master protocol (i.e., one overarching protocol) to evaluate multiple treatments in an ongoing manner, which can accelerate the assessment of new treatments. However, the flexibility that marks the potential of platform trials also creates inferential challenges. Two key challenges are the precise definition of treatment effects and robust, efficient inference on these effects. To address them, we first define a clinically meaningful estimand that characterizes the treatment effect as a function of the expected outcomes under two given treatments among concurrently eligible patients. We then develop weighting and post-stratification methods for estimating treatment effects with minimal assumptions. To fully leverage the efficiency potential of data from concurrently eligible patients, we also consider a model-assisted approach to baseline covariate adjustment that gains efficiency while maintaining robustness against model misspecification. We derive and compare the asymptotic distributions of the proposed estimators and propose robust variance estimators. The estimators are evaluated empirically in a simulation study and illustrated using the SIMPLIFY trial. Our methods are implemented in the R package RobinCID.
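
A toy illustration of why restricting to concurrently eligible patients and post-stratifying by enrollment period matters. The paper's estimators, and the R package RobinCID, go well beyond this; the data and period structure below are synthetic, and the sketch is in Python for consistency with the other examples here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
period = rng.integers(0, 4, size=n)                 # 4 enrollment periods
b_open = period >= 1                                # arm B joins at period 1
arm = np.where(b_open, rng.integers(0, 2, size=n), 0)     # 0 = A, 1 = B
y = 0.5 * period + 1.0 * (arm == 1) + rng.normal(size=n)  # drift + true effect 1.0

# Post-stratified contrast among concurrently eligible patients only.
effects, weights = [], []
for p in range(1, 4):
    sel = period == p
    effects.append(y[sel & (arm == 1)].mean() - y[sel & (arm == 0)].mean())
    weights.append(sel.sum())
est = np.average(effects, weights=weights)

naive = y[arm == 1].mean() - y[arm == 0].mean()     # mixes non-concurrent A data
print(f"concurrent, post-stratified: {est:.3f} (truth 1.0)")
print(f"naive all-data contrast:     {naive:.3f} (biased by time drift)")
```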

On adaptivity and minimax optimality of two-sided nearest neighbors arxiv.org/abs/2411.12965 .ML .ST .ME .TH .LG

Nearest neighbor (NN) algorithms have been extensively used for missing-data problems in recommender systems and sequential decision-making systems. Prior theoretical analyses establish favorable guarantees for NN when the underlying data are sufficiently smooth and the missingness probabilities are bounded away from zero. Here we analyze NN for non-smooth, non-linear functions with vast amounts of missingness. In particular, we consider matrix completion settings where the entries of the underlying matrix follow a latent non-linear factor model, with the non-linearity belonging to a Hölder function class that is less smooth than Lipschitz. Our results establish the following favorable properties of a suitable two-sided NN: (1) its mean squared error (MSE) adapts to the smoothness of the non-linearity; (2) under certain regularity conditions, its error rate matches the rate obtained by an oracle equipped with knowledge of both the row and column latent factors; and (3) its MSE is non-trivial in a wide range of settings even when several matrix entries are missing deterministically. We support our theoretical findings with extensive numerical simulations and a case study using data from the mobile health study HeartSteps.
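
A minimal sketch of a two-sided NN estimate for a single missing entry: row and column distances are computed from overlapping observed entries, and the estimate averages observed entries whose row and column both fall within a radius of the target. The latent model (a Hölder-0.5 non-linearity) and the radius are illustrative choices, not the paper's tuning.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
u, v = rng.uniform(size=n), rng.uniform(size=n)       # latent row/column factors
A = np.abs(u[:, None] - v[None, :]) ** 0.5            # Hölder-0.5 non-linearity
obs = rng.uniform(size=(n, n)) < 0.3                  # observation pattern
X = np.where(obs, A + rng.normal(scale=0.05, size=(n, n)), np.nan)

def overlap_dist(a, b, masks, vals):
    """Mean squared difference over commonly observed coordinates."""
    ov = masks[a] & masks[b]
    return np.mean((vals[a][ov] - vals[b][ov]) ** 2) if ov.any() else np.inf

def two_sided_nn(X, obs, i, j, eta=0.02):
    row_d = np.array([overlap_dist(i, k, obs, X) for k in range(X.shape[0])])
    col_d = np.array([overlap_dist(j, k, obs.T, X.T) for k in range(X.shape[1])])
    cell = np.ix_(row_d <= eta, col_d <= eta)         # two-sided neighborhood
    vals = X[cell][obs[cell]]                         # observed entries only
    return vals.mean() if vals.size else np.nan

i, j = 3, 7
print(f"estimate: {two_sided_nn(X, obs, i, j):.3f}   truth: {A[i, j]:.3f}")
```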

The occlusion process: improving sampler performance with parallel computation and variational approximation arxiv.org/abs/2411.11983 .CO

US COVID-19 school closure was not cost-effective, but other measures were arxiv.org/abs/2411.12016 .AP
