Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis arxiv.org/abs/2411.13608 .AP .AI

This paper presents an innovative approach to Extreme Value Analysis (EVA) by introducing the Extreme Value Dynamic Benchmarking Method (EVDBM). EVDBM integrates extreme value theory to detect extreme events and is coupled with the novel Dynamic Identification of Significant Correlation (DISC)-Thresholding algorithm, which enhances the analysis of key variables under extreme conditions. By integrating return values predicted through EVA into the benchmarking scores, we transform these scores to reflect anticipated conditions more accurately, giving a more precise picture of how each case is projected to unfold under extreme conditions. The adjusted scores thus offer a forward-looking perspective, highlighting potential vulnerabilities and resilience factors for each case in a way that static historical data alone cannot capture. By incorporating both historical and probabilistic elements, EVDBM provides a comprehensive benchmarking framework that is adaptable to a range of scenarios and contexts. The methodology is applied to real photovoltaic (PV) production data, revealing critical low-production scenarios and significant correlations between variables, which aid in risk management, infrastructure design, and long-term planning, while also allowing different production plants to be compared. The flexibility of EVDBM suggests broader applications in other sectors where decision-making sensitivity is crucial.
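
The EVA step at the heart of this pipeline is easy to make concrete. Below is a minimal sketch, on synthetic data, of fitting a GEV distribution to annual production minima and converting a return period into a return level. How EVDBM folds the return level into a benchmark score is not specified in the abstract, so the final weighting line is purely an assumed placeholder.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
daily = rng.gamma(shape=8.0, scale=25.0, size=(20, 365))   # 20 synthetic years
annual_min = daily.min(axis=1)                             # block (annual) minima

# Minima of X are maxima of -X, so fit a GEV to the negated block minima.
c, loc, scale = genextreme.fit(-annual_min)

def low_return_level(T):
    """Production level expected to be undercut once every T years."""
    return -genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)

rl50 = low_return_level(50)
static_score = 0.80                                  # hypothetical benchmark score
adjusted = static_score * rl50 / annual_min.mean()   # ASSUMED weighting, not EVDBM's
print(f"50-year low-production level: {rl50:.1f}")
print(f"return-adjusted score:        {adjusted:.3f}")
```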

The Fast and the Furious: Tracking the Effect of the Tomoa Skip on Speed Climbing arxiv.org/abs/2411.13696 .AP

Sport climbing is an athletic discipline comprising three sub-disciplines -- lead climbing, bouldering, and speed climbing. These sub-disciplines have distinct goals, so athletes typically specialize in one of the three events. The year 2020 marked the first inclusion of sport climbing in the Olympic Games. While this decision was met with excitement from the climbing community, it was not without controversy: the International Olympic Committee allocated one set of medals for the entire sport, necessitating the combination of the sub-disciplines into a single competition. As a result, athletes who specialized in lead and bouldering were forced to train and compete in speed for the first time in their careers. One such athlete was Tomoa Narasaki, a World Champion boulderer, who introduced a new method of approaching the speed event. This approach, dubbed the Tomoa Skip (TS), was subsequently adopted by many of the top speed climbers. Concurrently, speed records fell rapidly, from 5.48s in 2017 to 4.90s in 2023. Speed climbing involves ascending a 15m wall whose pattern of holds is identical across competitions, so records are directly comparable over time. In this paper we investigate the effect of the TS on speed climbing by answering two questions: (1) Did the TS result in a decrease in speed times? (2) Do climbers who utilize the TS show less consistency? The success of the TS highlights the potential of collaboration across the disciplines of a sport, showing that athletes of diverse backgrounds may contribute to the evolution of competition.
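
The two questions map onto two standard tests. A minimal sketch on simulated times (not real records): a one-sided Welch t-test for question (1), and a Levene variance test as one way to probe consistency for question (2); the paper's actual analysis is more elaborate than this.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
no_ts = rng.normal(loc=5.48, scale=0.10, size=200)    # hypothetical pre-TS times (s)
with_ts = rng.normal(loc=4.95, scale=0.16, size=200)  # hypothetical TS-era times (s)

# (1) Are TS times faster on average?  One-sided Welch t-test.
t_stat, p_mean = stats.ttest_ind(with_ts, no_ts, equal_var=False, alternative="less")
# (2) Are TS climbers less consistent?  Levene test for unequal variances.
w_stat, p_var = stats.levene(with_ts, no_ts)

print(f"(1) faster with TS:   p = {p_mean:.2e}")
print(f"(2) unequal variance: p = {p_var:.2e}")
```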

Active Subsampling for Measurement-Constrained M-Estimation of Individualized Thresholds with High-Dimensional Data arxiv.org/abs/2411.13763 .ST .ME .ML .TH

In measurement-constrained problems, despite the availability of a large dataset, we may only be able to afford observing the labels of a small portion of it. This raises a critical question: which data points are most beneficial to label given a budget constraint? In this paper, we focus on estimating the optimal individualized threshold in a measurement-constrained M-estimation framework. Our goal is to estimate a high-dimensional parameter $\theta$ in a linear threshold $\theta^T Z$ for a continuous variable $X$ such that the discrepancy between the indicator of whether $X$ exceeds the threshold $\theta^T Z$ and a binary outcome $Y$ is minimized. We propose a novel $K$-step active subsampling algorithm to estimate $\theta$, which iteratively samples the most informative observations and solves a regularized M-estimator. The theoretical properties of our estimator exhibit a phase transition with respect to $\beta \geq 1$, the smoothness of the conditional density of $X$ given $Y$ and $Z$. For $\beta > (1+\sqrt{3})/2$, we show that the two-step algorithm yields an estimator with the parametric convergence rate $O_p((s \log d / N)^{1/2})$ in $\ell_2$ norm, strictly faster than the minimax optimal rate attainable with $N$ i.i.d. samples drawn from the population. In the other two regimes, $1 < \beta \leq (1+\sqrt{3})/2$ and $\beta = 1$, the two-step estimator is sub-optimal: the former requires running $K > 2$ steps to attain the same parametric rate, whereas in the latter case only a near-parametric rate can be obtained. Furthermore, we formulate a minimax framework for the measurement-constrained M-estimation problem and prove that our estimator is minimax rate optimal up to a logarithmic factor. Finally, we demonstrate the performance of our method in simulation studies and apply it to a large diabetes dataset.
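
A sketch of the $K$-step loop under assumptions the abstract does not pin down: informativeness is taken to be proximity of $X$ to the current threshold $\theta^T Z$, the indicator is smoothed by a sigmoid with bandwidth h, and the regularized M-estimator is solved by proximal gradient descent with an $\ell_1$ penalty. None of these choices is claimed to match the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, s = 5000, 50, 5
theta_true = np.zeros(d)
theta_true[:s] = 1.0
Z = rng.normal(size=(N, d))
X = Z @ theta_true + rng.normal(scale=0.5, size=N)
Y = (X > Z @ theta_true).astype(float)            # labels: costly to observe

def fit_l1(Zs, Xs, Ys, lam=0.05, h=0.3, steps=500, lr=0.1):
    """Smoothed squared loss + l1 penalty, solved by proximal gradient."""
    th = np.zeros(Zs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xs - Zs @ th) / h))   # smoothed indicator
        grad = 2.0 * ((p - Ys) * p * (1.0 - p) / h) @ (-Zs) / len(Ys)
        th = th - lr * grad
        th = np.sign(th) * np.maximum(np.abs(th) - lr * lam, 0.0)  # soft-threshold
    return th

budget, K = 600, 3
labeled = rng.choice(N, budget // K, replace=False)     # uniform pilot round
theta = fit_l1(Z[labeled], X[labeled], Y[labeled])
for _ in range(K - 1):
    informative = np.argsort(np.abs(X - Z @ theta))     # near-threshold first
    new = informative[~np.isin(informative, labeled)][: budget // K]
    labeled = np.concatenate([labeled, new])
    theta = fit_l1(Z[labeled], X[labeled], Y[labeled])

print("recovered support:", np.nonzero(np.abs(theta) > 0.1)[0])
```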

Off-policy estimation with adaptively collected data: the power of online learning arxiv.org/abs/2411.12786 .ML .OC .ST .TH .LG

We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task arises in a variety of applications, including off-policy evaluation (OPE) in contextual bandits and estimation of the average treatment effect (ATE) in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic behavior with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that depend crucially on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that produces a sequence of estimates of the treatment effect via online learning, so as to minimize the sequentially weighted estimation error. We provide three concrete instantiations: (i) the tabular case; (ii) linear function approximation; and (iii) general function approximation for the outcome model. We then prove a local minimax lower bound showing the instance-dependent optimality of the AIPW estimator when it uses no-regret online learning algorithms.
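
The AIPW construction itself is standard and worth seeing once. The tabular sketch below estimates the value of a target policy from data logged by an adaptive (epsilon-greedy) policy, with the outcome model m_hat updated online from past data only, in the spirit of the reduction described above; the bandit instance is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ctx, n_act, T = 5, 3, 20000
mu = rng.uniform(size=(n_ctx, n_act))           # true mean rewards
target = mu.argmax(axis=1)                      # deterministic target policy

m_hat = np.zeros((n_ctx, n_act))                # outcome model, fit online
counts = np.zeros((n_ctx, n_act))
terms = []
for _ in range(T):
    x = rng.integers(n_ctx)
    probs = np.full(n_act, 0.1 / (n_act - 1))   # adaptive logging policy:
    probs[m_hat[x].argmax()] = 0.9              # epsilon-greedy on m_hat
    a = rng.choice(n_act, p=probs)
    y = mu[x, a] + rng.normal(scale=0.1)
    pi_a = target[x]                            # action the target policy takes
    # AIPW term: model prediction plus importance-weighted residual.
    terms.append(m_hat[x, pi_a] + (a == pi_a) / probs[a] * (y - m_hat[x, a]))
    counts[x, a] += 1                           # update the model only AFTER
    m_hat[x, a] += (y - m_hat[x, a]) / counts[x, a]  # forming the estimate

v_true = mu[np.arange(n_ctx), target].mean()    # contexts are uniform
print(f"AIPW estimate: {np.mean(terms):.4f}   true value: {v_true:.4f}")
```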

Statistical inference for mean-field queueing systems arxiv.org/abs/2411.12936 .ST .PR .TH

Mean-field limits are now a standard tool for approximations, including for networks with a large number of nodes. Statistical inference on mean-field models has attracted more attention recently, mainly due to the rapid emergence of data-driven systems, but studies reported in the literature have been largely limited to continuous models. In this paper, we initiate a study of statistical inference on discrete mean-field models (jump processes), using the well-known and extensively studied power-of-L, or supermarket, model to demonstrate how to deal with the new challenges discrete models pose. We focus on estimating system parameters from observations of the system state at discrete time epochs over a finite period. We show that by harnessing the weak convergence results developed for the supermarket model in the literature, an asymptotic inference scheme based on approximate least squares estimation can be obtained from the mean-field limiting equation. By leveraging the law of large numbers alongside the central limit theorem, we establish the consistency of the estimator and its asymptotic normality as the number of servers and the number of observations go to infinity. Numerical results for the power-of-two model illustrate the efficiency and accuracy of the proposed estimator.
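
For the power-of-d (supermarket) model, the mean-field limit is an ODE whose drift is linear in the arrival rate, which is what makes approximate least squares natural. A sketch, with the "observations" generated from the ODE plus noise rather than from a finite system of servers:

```python
import numpy as np

rng = np.random.default_rng(4)
d, lam_true, K = 2, 0.7, 30        # power-of-d choices, arrival rate, queue cap
dt, T = 0.05, 400
s = np.zeros(K + 2)                # s[k] = fraction of queues with >= k jobs
s[0], s[1] = 1.0, 0.5

obs = []
for _ in range(T):
    # Mean-field drift: ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1}).
    arrivals = lam_true * (s[:-1] ** d - s[1:] ** d)
    departures = s[1:] - np.append(s[2:], 0.0)
    s[1:] = np.clip(s[1:] + (arrivals - departures) * dt, 0.0, 1.0)
    obs.append(s + rng.normal(scale=1e-3, size=s.size))   # noisy snapshots
obs = np.array(obs)

# Increments satisfy ds ~= lam * a + b, with a and b computable from data,
# so the approximate least squares estimate of lam is in closed form.
a = (obs[:-1, :-1] ** d - obs[:-1, 1:] ** d) * dt
b = -(obs[:-1, 1:] - np.hstack([obs[:-1, 2:], np.zeros((T - 1, 1))])) * dt
ds = obs[1:, 1:] - obs[:-1, 1:]
lam_hat = np.sum(a * (ds - b)) / np.sum(a * a)
print(f"lam_true = {lam_true}, lam_hat = {lam_hat:.4f}")
```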

Probability distributions and calculations for Hake's ratio statistics in measuring effect size arxiv.org/abs/2411.12938 .data-an .CO

Ratio statistics and distributions play a crucial role in many fields, including linear regression, metrology, nuclear physics, operations research, econometrics, biostatistics, genetics, and engineering. In this work, we examine the statistical properties and probability calculations of the Hake normalized gain as a measure of effect size and educational effectiveness in physics education. Leveraging existing knowledge about the Hake ratio as a ratio of normal variables, and utilizing open data-science tools, we develop two novel computational approaches for computing ratio distributions: (1) double-exponential (DE) quadrature, with or without barycentric interpolation, a very quick and efficient method; and (2) a 2D vectorized numerical inversion of characteristic functions, which offers broader applicability because it requires neither the PDFs nor the independence of the ratio's constituents. A pilot numerical study demonstrates the speed, accuracy, and reliability of both approaches. These explorations not only deepen the understanding of the Hake ratio's distribution but also showcase the efficiency, precision, and versatility of the proposed methods, making them well suited for fast data analysis based on exact ratio distributions, with potential applications in multidimensional statistics and in uncertainty analysis in metrology, where precise and reliable data handling is essential.
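
The basic object here is the density of a ratio of normals. A minimal worked example via the change-of-variables identity $f_W(w) = \int |y|\, f_X(wy)\, f_Y(y)\, dy$, using plain adaptive quadrature rather than the paper's faster DE quadrature or characteristic-function inversion; treating the Hake gain's numerator and denominator as independent normals is an illustrative simplification, and the parameters are made up.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hake gain g = (post - pre) / (100 - pre): numerator and denominator are
# modeled here as independent normals, an illustrative simplification.
mu_x, sd_x = 30.0, 8.0       # numerator  (post - pre), in percentage points
mu_y, sd_y = 60.0, 10.0      # denominator (100 - pre), in percentage points

def ratio_pdf(w):
    """Density of W = X / Y at w, via f_W(w) = int |y| f_X(w*y) f_Y(y) dy."""
    integrand = lambda y: abs(y) * norm.pdf(w * y, mu_x, sd_x) * norm.pdf(y, mu_y, sd_y)
    return quad(integrand, -np.inf, np.inf)[0]

for w in np.linspace(0.0, 1.0, 6):
    print(f"f_W({w:.1f}) = {ratio_pdf(w):.4f}")
total = quad(ratio_pdf, -1.0, 2.0)[0]          # sanity check: mass ~ 1
print(f"integral over [-1, 2] = {total:.4f}")
```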

From Estimands to Robust Inference of Treatment Effects in Platform Trials arxiv.org/abs/2411.12944 .ME

A platform trial is an innovative clinical trial design that uses a master protocol (i.e., one overarching protocol) to evaluate multiple treatments in an ongoing manner, which can accelerate the assessment of new treatments. However, the flexibility that marks the potential of platform trials also creates inferential challenges. Two key challenges are the precise definition of treatment effects and robust, efficient inference on these effects. To address them, we first define a clinically meaningful estimand that characterizes the treatment effect as a function of the expected outcomes under two given treatments among concurrently eligible patients. We then develop weighting and post-stratification methods for estimating treatment effects with minimal assumptions. To fully leverage the efficiency potential of data from concurrently eligible patients, we also consider a model-assisted approach to baseline covariate adjustment that gains efficiency while maintaining robustness against model misspecification. We derive and compare the asymptotic distributions of the proposed estimators and propose robust variance estimators. The estimators are evaluated empirically in a simulation study and illustrated using the SIMPLIFY trial. Our methods are implemented in the R package RobinCID.
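
A toy illustration of why restricting to concurrently eligible patients and post-stratifying by enrollment period matters. The paper's estimators, and the R package RobinCID, go well beyond this; the data and period structure below are synthetic, and the sketch is in Python for consistency with the other examples here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
period = rng.integers(0, 4, size=n)                 # 4 enrollment periods
b_open = period >= 1                                # arm B joins at period 1
arm = np.where(b_open, rng.integers(0, 2, size=n), 0)     # 0 = A, 1 = B
y = 0.5 * period + 1.0 * (arm == 1) + rng.normal(size=n)  # drift + true effect 1.0

# Post-stratified contrast among concurrently eligible patients only.
effects, weights = [], []
for p in range(1, 4):
    sel = period == p
    effects.append(y[sel & (arm == 1)].mean() - y[sel & (arm == 0)].mean())
    weights.append(sel.sum())
est = np.average(effects, weights=weights)

naive = y[arm == 1].mean() - y[arm == 0].mean()     # mixes non-concurrent A data
print(f"concurrent, post-stratified: {est:.3f} (truth 1.0)")
print(f"naive all-data contrast:     {naive:.3f} (biased by time drift)")
```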

On adaptivity and minimax optimality of two-sided nearest neighbors arxiv.org/abs/2411.12965 .ML .ST .ME .TH .LG

Nearest neighbor (NN) algorithms have been extensively used for missing-data problems in recommender systems and sequential decision-making systems. Prior theoretical analyses establish favorable guarantees for NN when the underlying data are sufficiently smooth and the missingness probabilities are bounded away from zero. Here we analyze NN for non-smooth, non-linear functions with vast amounts of missingness. In particular, we consider matrix completion settings where the entries of the underlying matrix follow a latent non-linear factor model, with the non-linearity belonging to a Hölder function class that is less smooth than Lipschitz. Our results establish the following favorable properties of a suitable two-sided NN: (1) its mean squared error (MSE) adapts to the smoothness of the non-linearity; (2) under certain regularity conditions, its error rate matches the rate obtained by an oracle equipped with knowledge of both the row and column latent factors; and (3) its MSE is non-trivial in a wide range of settings even when several matrix entries are missing deterministically. We support our theoretical findings with extensive numerical simulations and a case study using data from the mobile health study HeartSteps.
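
A minimal sketch of a two-sided NN estimate for a single missing entry: row and column distances are computed from overlapping observed entries, and the estimate averages observed entries whose row and column both fall within a radius of the target. The latent model (a Hölder-0.5 non-linearity) and the radius are illustrative choices, not the paper's tuning.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
u, v = rng.uniform(size=n), rng.uniform(size=n)       # latent row/column factors
A = np.abs(u[:, None] - v[None, :]) ** 0.5            # Hölder-0.5 non-linearity
obs = rng.uniform(size=(n, n)) < 0.3                  # observation pattern
X = np.where(obs, A + rng.normal(scale=0.05, size=(n, n)), np.nan)

def overlap_dist(a, b, masks, vals):
    """Mean squared difference over commonly observed coordinates."""
    ov = masks[a] & masks[b]
    return np.mean((vals[a][ov] - vals[b][ov]) ** 2) if ov.any() else np.inf

def two_sided_nn(X, obs, i, j, eta=0.02):
    row_d = np.array([overlap_dist(i, k, obs, X) for k in range(X.shape[0])])
    col_d = np.array([overlap_dist(j, k, obs.T, X.T) for k in range(X.shape[1])])
    cell = np.ix_(row_d <= eta, col_d <= eta)         # two-sided neighborhood
    vals = X[cell][obs[cell]]                         # observed entries only
    return vals.mean() if vals.size else np.nan

i, j = 3, 7
print(f"estimate: {two_sided_nn(X, obs, i, j):.3f}   truth: {A[i, j]:.3f}")
```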

The occlusion process: improving sampler performance with parallel computation and variational approximation arxiv.org/abs/2411.11983 .CO

US COVID-19 school closure was not cost-effective, but other measures were arxiv.org/abs/2411.12016 .AP
