A non-parametric proportional risk model to assess a treatment effect in time-to-event data. (arXiv:2303.07479v1 [stat.ME]) arxiv.org/abs/2303.07479

Time-to-event analysis often relies on prior parametric assumptions or, if a non-parametric approach is chosen, on Cox's model. This is inherently tied to the assumption of proportional hazards, and the analysis is potentially invalidated if this assumption is not fulfilled. In addition, most interpretations focus on the hazard ratio, which is often misinterpreted as the relative risk. In this paper, we introduce an alternative to current methodology for assessing a treatment effect in a two-group situation that does not rely on the proportional hazards assumption but instead assumes proportional risks. More precisely, we propose a new non-parametric model to directly estimate the relative risk of experiencing an event in two groups, under the assumption that the risk ratio is constant over time. In addition to this relative measure, our model allows the number needed to treat to be calculated as an absolute measure, enabling an easy and holistic interpretation of the data. We demonstrate the validity of the approach by means of a simulation study and present an application to data from a large randomized controlled trial investigating the effect of dapagliflozin on the risk of a first hospitalization for heart failure.
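
As a back-of-the-envelope illustration of the two summary measures mentioned in the abstract (not the authors' non-parametric estimator), the sketch below computes the relative risk and the number needed to treat from hypothetical event probabilities at a fixed time horizon:

    # Toy illustration only: standard definitions of RR and NNT from assumed
    # event probabilities, not the paper's non-parametric proportional risk model.
    p_control = 0.10   # hypothetical probability of an event in the control group
    p_treated = 0.06   # hypothetical probability of an event in the treated group

    relative_risk = p_treated / p_control                  # RR = 0.6
    absolute_risk_reduction = p_control - p_treated        # ARR = 0.04
    number_needed_to_treat = 1 / absolute_risk_reduction   # NNT = 25

    print(f"RR = {relative_risk:.2f}, NNT = {number_needed_to_treat:.0f}")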

Comparing the Robustness of Simple Network Scale-Up Method (NSUM) Estimators. (arXiv:2303.07490v1 [stat.ME]) arxiv.org/abs/2303.07490

The network scale-up method (NSUM) is a cost-effective approach to estimating the size or prevalence of a group of people that is hard to reach through a standard survey. The basic NSUM involves two steps: estimating respondents' degrees by one of various methods (in this paper we focus on the probe group method, which uses the number of people a respondent knows in various groups of known size), and estimating the prevalence of the hard-to-reach population of interest using respondents' estimated degrees and the number of people they report knowing in the hard-to-reach group. Each of these two steps involves taking either an average of ratios or a ratio of averages. Using the ratio of averages for each step has so far been the most common approach. However, we present theoretical arguments that using the average of ratios at the second, prevalence-estimation step often has lower mean squared error when a main model assumption is violated, which happens frequently in practice; this estimator, which uses the ratio of averages for degree estimates and the average of ratios for prevalence, was proposed early in NSUM development but has remained largely unexplored and unused. Simulation results using an example network data set also support these findings. Based on this theoretical and empirical evidence, we suggest that future surveys that use a simple estimator may want to use this mixed estimator, and estimation methods based on this estimator may produce new improvements.
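
The contrast between the two aggregation choices is easy to see in code. The sketch below, on simulated data with hypothetical probe-group sizes, computes the probe-group degree estimates and then both the ratio-of-averages and the average-of-ratios prevalence estimators described above; it is an illustration of the estimators, not the paper's analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000                                     # assumed total population size
    probe_sizes = np.array([10_000, 30_000, 60_000])  # hypothetical known group sizes
    n_resp = 200
    true_degrees = rng.integers(200, 1000, size=n_resp)
    y_probe = rng.binomial(true_degrees[:, None], probe_sizes[None, :] / N)
    y_hidden = rng.binomial(true_degrees, 0.002)      # reported ties to the hidden group

    # Step 1 (probe group method): degree estimates via a ratio of sums
    d_hat = y_probe.sum(axis=1) / probe_sizes.sum() * N

    # Step 2, option A: ratio of averages (the common basic NSUM estimator)
    prev_roa = y_hidden.sum() / d_hat.sum()

    # Step 2, option B: average of ratios (the mixed estimator discussed above)
    prev_aor = np.mean(y_hidden / d_hat)

    print(prev_roa, prev_aor)                         # both should be near 0.002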

High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection. (arXiv:2303.07570v1 [stat.ME]) arxiv.org/abs/2303.07570

We consider a high-dimensional dynamic pricing problem under non-stationarity, where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model with potential changes at unknown times. The demand model is assumed to be a high-dimensional generalized linear model (GLM), allowing for a feature vector in $\mathbb R^d$ that encodes products and consumer information. To achieve optimal revenue (i.e., least regret), the firm needs to learn and exploit the unknown GLMs while monitoring for potential change-points. To tackle such a problem, we first design a novel penalized likelihood-based online change-point detection algorithm for high-dimensional GLMs, which is the first algorithm in the change-point literature that achieves the optimal minimax localization error rate for high-dimensional GLMs. A change-point detection assisted dynamic pricing (CPDP) policy is further proposed and achieves a near-optimal regret of order $O(s\sqrt{\Upsilon_T T}\log(Td))$, where $s$ is the sparsity level and $\Upsilon_T$ is the number of change-points. This regret is accompanied by a minimax lower bound, demonstrating the optimality of CPDP (up to logarithmic factors). In particular, the optimality with respect to $\Upsilon_T$ is seen for the first time in the dynamic pricing literature, and is achieved via a novel accelerated exploration mechanism. Extensive simulation experiments and a real data application on online lending illustrate the efficiency of the proposed policy and the importance and practical value of handling non-stationarity in dynamic pricing.
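
The CPDP policy itself cannot be reproduced from the abstract, but the general idea of online change-point monitoring that it relies on can be illustrated with a much simpler, generic detector. The sketch below is a one-dimensional CUSUM monitor for an upward mean shift on a simulated stream; it is not the penalized-likelihood detector for high-dimensional GLMs proposed in the paper.

    import numpy as np

    def cusum_detect(stream, baseline_mean, drift=0.0, threshold=5.0):
        """Return the index at which an upward mean shift is flagged, or None."""
        s = 0.0
        for t, x in enumerate(stream):
            s = max(0.0, s + (x - baseline_mean - drift))
            if s > threshold:
                return t
        return None

    rng = np.random.default_rng(1)
    stream = np.concatenate([rng.normal(0.0, 1.0, 300),    # pre-change regime
                             rng.normal(1.5, 1.0, 300)])   # post-change regime
    print(cusum_detect(stream, baseline_mean=0.0))         # flags shortly after t = 300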

Fast Regularized Discrete Optimal Transport with Group-Sparse Regularizers. (arXiv:2303.07597v1 [cs.LG]) arxiv.org/abs/2303.07597

Regularized discrete optimal transport (OT) is a powerful tool to measure the distance between two discrete distributions that have been constructed from data samples on two different domains. While it has a wide range of applications in machine learning, in some settings, such as unsupervised domain adaptation, only the data sampled from one of the domains have class labels. In this kind of problem setting, a group-sparse regularizer is frequently leveraged as a regularization term to handle class labels. In particular, it can preserve the label structure on the data samples by assigning data samples with the same class label to one group-sparse regularization term. As a result, we can measure the distance while utilizing label information by solving the regularized optimization problem with gradient-based algorithms. However, the gradient computation is expensive when the number of classes or data samples is large, because the number of regularization terms and their respective sizes also become large. This paper proposes fast discrete OT with group-sparse regularizers. Our method is based on two ideas. The first is to safely skip the computations of the gradients that must be zero. The second is to efficiently extract the gradients that are expected to be nonzero. Our method is guaranteed to return the same value of the objective function as the original method. Experiments show that our method is up to 8.6 times faster than the original method without degrading accuracy.
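
The group structure described above can be made concrete with a small sketch: the penalty sums the Euclidean norms of per-class blocks of the transport plan, and blocks that are identically zero contribute nothing to the value (or to its subgradient), which is the kind of structure the skipping idea exploits. This is only an illustration of a group-sparse penalty on a toy coupling matrix, not the authors' safe-skipping and extraction scheme.

    import numpy as np

    def group_sparse_penalty(T, labels):
        """Sum over classes c and target columns j of ||T[rows of class c, j]||_2."""
        value = 0.0
        for c in np.unique(labels):
            block = T[labels == c, :]                # mass sent from class-c sources
            col_norms = np.linalg.norm(block, axis=0)
            value += col_norms[col_norms > 0].sum()  # all-zero groups can be skipped
        return value

    rng = np.random.default_rng(2)
    T = rng.random((6, 4)) * (rng.random((6, 4)) > 0.7)  # sparse toy transport plan
    labels = np.array([0, 0, 1, 1, 2, 2])                # source class labels
    print(group_sparse_penalty(T, labels))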

DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference. (arXiv:2303.06171v1 [cs.LG]) arxiv.org/abs/2303.06171

Bayesian inference provides a principled framework for learning from complex data and reasoning under uncertainty. It has been widely applied in machine learning tasks such as medical diagnosis, drug design, and policymaking. In these common applications, the data can be highly sensitive. Differential privacy (DP) offers data analysis tools with powerful worst-case privacy guarantees and has been developed as the leading approach in privacy-preserving data analysis. In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy. While most existing private MCMC algorithms sacrifice accuracy and efficiency to obtain privacy, we provide the first exact and fast DP MH algorithm, using only a minibatch of data in most iterations. We further reveal, for the first time, a three-way trade-off among privacy, scalability (i.e. the batch size), and efficiency (i.e. the convergence rate), theoretically characterizing how privacy affects the utility and computational cost in Bayesian inference. We empirically demonstrate the effectiveness and efficiency of our algorithm in various experiments.
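
For orientation, the baseline that the paper modifies is the standard full-data Metropolis-Hastings step. The sketch below implements plain random-walk MH for a one-dimensional target; the paper's minibatch and differential-privacy mechanisms are not reproduced here.

    import numpy as np

    def metropolis_hastings(log_target, x0, n_steps=5000, step=0.5, seed=0):
        """Plain (non-private, full-data) random-walk Metropolis-Hastings."""
        rng = np.random.default_rng(seed)
        x, samples = x0, []
        for _ in range(n_steps):
            proposal = x + step * rng.normal()
            log_alpha = log_target(proposal) - log_target(x)  # symmetric proposal
            if np.log(rng.random()) < log_alpha:
                x = proposal
            samples.append(x)
        return np.array(samples)

    log_target = lambda x: -0.5 * x**2               # standard normal, up to a constant
    draws = metropolis_hastings(log_target, x0=3.0)
    print(draws[1000:].mean(), draws[1000:].std())   # roughly 0 and 1 after burn-in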

Joint Optimization of Maintenance and Production in Offshore Wind Farms: Balancing the Short- and Long-Term Needs of Wind Energy Operation. (arXiv:2303.06174v1 [eess.SY]) arxiv.org/abs/2303.06174

The rapid increase in the scale and sophistication of offshore wind (OSW) farms poses a critical challenge related to the cost-effective operation and management of wind energy assets. A defining characteristic of this challenge is the economic trade-off between two concomitant processes: power production (the primary driver of short-term revenues) and asset degradation (the main determinant of long-term expenses). Traditionally, approaches to optimizing production and maintenance in wind farms have been conducted in isolation. In this paper, we conjecture that a joint optimization of those two processes, achieved by rigorously modeling their short- and long-term dependencies, can unlock significant economic benefits for wind farm operators. Specifically, we propose a decision-theoretic framework, rooted in stochastic optimization, which seeks a sensible balance between leveraging wind loads to harness short-term electricity generation revenues and alleviating them to hedge against longer-term maintenance expenses. Extensive numerical experiments using real-world data confirm the superior performance of our approach, in terms of several operational performance metrics, relative to methods that tackle the two problems in isolation.

The impacts of remote work on travel: insights from nearly three years of monthly surveys. (arXiv:2303.06186v1 [stat.AP]) arxiv.org/abs/2303.06186

Remote work has expanded dramatically since 2020, upending longstanding travel patterns and behavior. More fundamentally, the flexibility for remote workers to choose when and where to work has created much stronger connections between travel behavior and organizational behavior. This paper uses a large and comprehensive monthly longitudinal survey spanning nearly three years to identify new trends in the work location choice, mode choice and departure time of remote workers. The travel behavior of remote workers is found to be highly associated with employer characteristics, task characteristics, employer remote work policies, coordination between colleagues and attitudes towards remote work. Approximately one third of all remote work hours are shown to take place outside of the home, accounting for over one third of all commuting trips. These commutes to "third places" are shorter, less likely to occur during peak periods, and more likely to use sustainable travel modes than commutes to an employer's primary workplace. Hybrid work arrangements are also associated with a greater number of non-work trips than fully remote and fully in-person arrangements. Implications of this research for policy makers, shared mobility providers and land use planning are discussed.

Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA. (arXiv:2303.06198v1 [math.ST]) arxiv.org/abs/2303.06198

This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of $\boldsymbol{X}^\star$ grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called $\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.
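
As context for the deflation idea, a minimal sketch of the base $\textsf{HeteroPCA}$ iteration, as I understand it (iterative imputation of the Gram-matrix diagonal), is given below on simulated heteroskedastic data; the spectrum-splitting and successive deflation proposed in the paper are not implemented here.

    import numpy as np

    def hetero_pca(Y, rank, n_iter=50):
        """Sketch of HeteroPCA: estimate the rank-`rank` column subspace of the signal."""
        G = Y @ Y.T
        off_diag = G - np.diag(np.diag(G))               # discard the noise-inflated diagonal
        M = off_diag
        for _ in range(n_iter):
            vals, vecs = np.linalg.eigh(M)
            U = vecs[:, -rank:]                          # top-`rank` eigenvectors
            low_rank = U @ np.diag(vals[-rank:]) @ U.T
            M = off_diag + np.diag(np.diag(low_rank))    # re-impute the diagonal
        return U

    rng = np.random.default_rng(3)
    n1, n2, r = 50, 2000, 3
    X_star = rng.normal(size=(n1, r)) @ rng.normal(size=(r, n2))             # low-rank signal
    noise = rng.normal(size=(n1, n2)) * rng.uniform(0.1, 2.0, size=(n1, 1))  # heteroskedastic noise
    U_hat = hetero_pca(X_star + noise, rank=r)
    print(U_hat.shape)                                   # (50, 3)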

Fast computation of permutation equivariant layers with the partition algebra. (arXiv:2303.06208v1 [cs.LG]) arxiv.org/abs/2303.06208

Linear neural network layers that are either equivariant or invariant to permutations of their inputs form core building blocks of modern deep learning architectures. Examples include the layers of DeepSets, as well as linear layers occurring in attention blocks of transformers and some graph neural networks. The space of permutation equivariant linear layers can be identified as the invariant subspace of a certain symmetric group representation, and recent work parameterized this space by exhibiting a basis whose vectors are sums over orbits of standard basis elements with respect to the symmetric group action. A parameterization opens up the possibility of learning the weights of permutation equivariant linear layers via gradient descent. The space of permutation equivariant linear layers is a generalization of the partition algebra, an object first discovered in statistical physics with deep connections to the representation theory of the symmetric group, and the basis described above generalizes the so-called orbit basis of the partition algebra. We exhibit an alternative basis, generalizing the diagram basis of the partition algebra, with computational benefits stemming from the fact that the tensors making up the basis are low rank in the sense that they naturally factorize into Kronecker products. Just as multiplication by a rank one matrix is far less expensive than multiplication by an arbitrary matrix, multiplication with these low rank tensors is far less expensive than multiplication with elements of the orbit basis. Finally, we describe an algorithm implementing multiplication with these basis elements.
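
The simplest instance of the layers discussed above is the DeepSets-style permutation-equivariant linear map, whose two orbit terms are a per-element transform and a pooled transform. The sketch below (with hypothetical weights) verifies equivariance numerically; it does not implement the paper's diagram-basis algorithm for general partition-algebra layers.

    import numpy as np

    def equivariant_linear(X, lam, gam):
        """DeepSets-style layer: per-element term plus a pooled (orbit-summed) term.

        X: (n_elements, d_in);  lam, gam: (d_in, d_out) weight matrices.
        """
        pooled = X.mean(axis=0, keepdims=True)                  # permutation invariant
        return X @ lam + np.ones((X.shape[0], 1)) @ (pooled @ gam)

    rng = np.random.default_rng(4)
    X = rng.normal(size=(5, 3))
    lam, gam = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
    P = np.eye(5)[rng.permutation(5)]                           # a permutation matrix
    print(np.allclose(equivariant_linear(P @ X, lam, gam),
                      P @ equivariant_linear(X, lam, gam)))     # True: the layer is equivariant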

Policy effect evaluation under counterfactual neighborhood interventions in the presence of spillover. (arXiv:2303.06227v1 [stat.ME]) arxiv.org/abs/2303.06227

Policy interventions can spill over to units of a population that are not directly exposed to the policy but are geographically close to the units receiving the intervention. In recent work, investigations of spillover effects on neighboring regions have focused on estimating the average treatment effect of a particular policy in an observed setting. Our research question broadens this scope by asking what policy consequences the treated units would have experienced under hypothetical exposure settings. When we only observe treated unit(s) surrounded by controls -- as is common when a policy intervention is implemented in a single city or state -- this question concerns the policy effects under a counterfactual neighborhood policy status that we do not, in actuality, observe. In this work, we extend difference-in-differences (DiD) approaches to spillover settings and develop the identification conditions required to evaluate policy effects in counterfactual treatment scenarios. These causal quantities are policy-relevant for designing effective policies for populations subject to various neighborhood statuses. We develop doubly robust estimators and use extensive numerical experiments to examine their performance under heterogeneous spillover effects. We apply our proposed method to investigate the effect of the Philadelphia beverage tax on unit sales.
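
For readers unfamiliar with the baseline being extended, the canonical two-group, two-period DiD contrast is just a difference of before-after changes; the sketch below uses hypothetical group means and does not implement the paper's doubly robust spillover estimators.

    import numpy as np

    # Hypothetical mean outcomes: rows = (control, treated), columns = (pre, post)
    means = np.array([[10.0, 11.0],    # control group: pre, post
                      [10.5, 13.0]])   # treated group: pre, post

    # DiD = (treated change) - (control change)
    did = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
    print(did)                         # 1.5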

Stabilizing Transformer Training by Preventing Attention Entropy Collapse. (arXiv:2303.06296v1 [cs.LG]) arxiv.org/abs/2303.06296

Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which serves as a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We denote the pathologically low attention entropy, corresponding to highly concentrated attention scores, as $\textit{entropy collapse}$. As a remedy, we propose $\sigma$Reparam, a simple and efficient solution in which we reparametrize all linear layers with spectral normalization and an additional learned scalar. We demonstrate that the proposed reparameterization successfully prevents entropy collapse in the attention layers, promoting more stable training. Additionally, we prove a tight lower bound on the attention entropy, which decreases exponentially fast with the spectral norm of the attention logits, providing additional motivation for our approach. We conduct experiments with $\sigma$Reparam on image classification, image self-supervised learning, machine translation, automatic speech recognition, and language modeling tasks, across Transformer architectures. We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as to enable training (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition models to competitive performance without warmup and adaptive optimizers.
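
A minimal numpy sketch of the reparameterization described in the abstract is given below: the effective weight is the raw weight divided by an estimate of its spectral norm (obtained by power iteration) and multiplied by a learned scalar. Initialization and training details of the scalar follow the paper and are not reproduced here.

    import numpy as np

    def spectral_norm(W, n_iter=20, seed=0):
        """Estimate the largest singular value of W by power iteration."""
        rng = np.random.default_rng(seed)
        v = rng.normal(size=W.shape[1])
        for _ in range(n_iter):
            u = W @ v
            u /= np.linalg.norm(u)
            v = W.T @ u
            v /= np.linalg.norm(v)
        return float(u @ W @ v)

    def sigma_reparam_linear(x, W_raw, gamma):
        """Linear layer whose effective weight is gamma * W_raw / ||W_raw||_spectral."""
        W_hat = (gamma / spectral_norm(W_raw)) * W_raw
        return x @ W_hat.T

    rng = np.random.default_rng(5)
    W_raw = rng.normal(size=(64, 32))                        # raw weights of a linear layer
    x = rng.normal(size=(8, 32))                             # a batch of inputs
    print(sigma_reparam_linear(x, W_raw, gamma=1.0).shape)   # (8, 64)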

A Differential Effect Approach to Partial Identification of Treatment Effects. (arXiv:2303.06332v1 [stat.ME]) arxiv.org/abs/2303.06332

We consider identification and inference for the average treatment effect and the heterogeneous treatment effect conditional on observable covariates in the presence of unmeasured confounding. Since point identification of the average treatment effect and the heterogeneous treatment effect is not achievable without strong assumptions, we obtain bounds on both by leveraging differential effects, a tool that allows a second treatment to be used to learn the effect of the first treatment. The differential effect is the effect of using one treatment in lieu of the other, and it can be identified in some observational studies in which treatments are not randomly assigned to units and differences in outcomes may be due to biased assignments rather than treatment effects. With differential effects, we develop a flexible and easy-to-implement semi-parametric framework to estimate bounds and establish asymptotic properties over the support for conducting statistical inference. We provide conditions under which the causal estimands are point identifiable within the proposed framework. The proposed method is examined in a simulation study and two case studies using datasets from the National Health and Nutrition Examination Survey and the Youth Risk Behavior Surveillance System.

Analysing ecological dynamics with relational event models: the case of biological invasions. (arXiv:2303.06362v1 [stat.AP]) arxiv.org/abs/2303.06362

Aim: Spatio-temporal processes play a key role in ecology, from genes to large-scale macroecological and biogeographical processes. Existing methods for studying such spatio-temporally structured data simplify either the dynamic structure or the complex interactions of ecological drivers. This paper aims to present a generic method for ecological research that allows spatio-temporal patterns of biological processes to be analysed at large spatial scales while including the time-varying variables that drive these dynamics. Methods: We introduce a method called relational event modelling (REM), which relies on temporal interaction dynamics that encode sequences of relational events, each connecting a sender node to a recipient node at a specific point in time. We apply REM to the spread of alien species around the globe between 1880 and 2005, following accidental or deliberate introductions into geographical regions outside of their native range. In this context, a relational event represents the new occurrence of an alien species given its former distribution. Results: The application of REM to the first reported invasions of 4835 established alien species outside of their native ranges, drawn from four major taxonomic groups, enables us to unravel the main drivers of the dynamics of the spread of invasive alien species. Combining the alien species first-records data with other spatio-temporal information enables us to discover which factors have been responsible for the spread of species across the globe. Besides the usual drivers of species invasions, such as trade, land use and climatic conditions, we also find evidence for species interconnectedness in alien species spread. Conclusions: REM offers the capacity to account for the temporal sequences of ecological events, such as biological invasions, and to investigate how relationships between these events and potential drivers change over time.
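
The basic data structure and likelihood contribution behind a relational event model can be sketched briefly: each event records a sender, a recipient and a time, the rate of a candidate event is exponential in its covariates, and each observed event contributes the probability that it was the dyad that fired among all dyads at risk. The covariates, coefficients and regions below are hypothetical placeholders, not the paper's specification.

    import numpy as np
    from collections import namedtuple

    Event = namedtuple("Event", ["sender", "recipient", "time"])
    events = [Event("Europe", "North America", 1885), Event("Asia", "Europe", 1902)]

    def event_log_likelihood(observed_idx, X_risk_set, beta):
        """Partial-likelihood contribution of one relational event.

        X_risk_set: (n_candidate_dyads, p) covariates for all dyads at risk;
        observed_idx: which candidate dyad actually experienced the event.
        """
        log_rates = X_risk_set @ beta                        # log rate = beta . x
        return log_rates[observed_idx] - np.log(np.exp(log_rates).sum())

    rng = np.random.default_rng(6)
    X_risk_set = rng.normal(size=(10, 3))    # e.g. trade volume, climate match, land use
    beta = np.array([0.8, 0.3, -0.2])        # hypothetical coefficients
    print(events[0], event_log_likelihood(0, X_risk_set, beta))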
