A joint modeling approach to study the association between subject-level longitudinal marker variabilities and repeated outcomes. (arXiv:2309.08000v1 [stat.ME]) arxiv.org/abs/2309.08000

Women are at increased risk of bone loss during the menopausal transition; in fact, nearly 50% of a woman's lifetime bone loss occurs during this time. The longitudinal relationships between estradiol (E2) and follicle-stimulating hormone (FSH), two hormones with characteristic changes during the menopausal transition, and bone health outcomes are complex. In addition to the level and rate of change of E2 and FSH, variability in these hormones across the menopausal transition may be an important predictor of bone health, but this question has yet to be well explored. We introduce a joint model that characterizes individual mean E2 trajectories and individual residual variances, and links these variances to bone health trajectories. In our application, we found that higher FSH variability was associated with declines in bone mineral density (BMD) before menopause, but this association was moderated over time after the menopausal transition. Additionally, higher mean E2, but not E2 variability, was associated with slower decreases in BMD during the menopausal transition. We also include a simulation study showing that naive two-stage methods often fail to propagate uncertainty in the individual-level variance estimates, resulting in estimation bias and invalid interval coverage.
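
For intuition, here is a minimal sketch of this kind of location-scale joint model; the notation is assumed for illustration and is not taken from the paper. The marker submodel gives each subject her own residual variance, and that variance then enters the outcome submodel as a predictor:

$\mathrm{E2}_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_i^2)$

$\mathrm{BMD}_{ik} = \alpha_0 + \alpha_1 t_{ik} + \gamma_1 \log\sigma_i^2 + \gamma_2 (\log\sigma_i^2)\, t_{ik} + u_i + e_{ik}$

Fitting the two submodels jointly (e.g., in one Bayesian hierarchical model) propagates the uncertainty in each $\sigma_i^2$ into the outcome coefficients $\gamma_1$ and $\gamma_2$, which is precisely what the naive two-stage approach fails to do.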

On Prediction Feature Assignment in the Heckman Selection Model. (arXiv:2309.08043v1 [cs.LG]) arxiv.org/abs/2309.08043

Under missing-not-at-random (MNAR) sample selection bias, the performance of a prediction model is often degraded. This paper focuses on one classic instance of MNAR sample selection bias where a subset of samples have non-randomly missing outcomes. The Heckman selection model and its variants have commonly been used to handle this type of sample selection bias. The Heckman model uses two separate equations to model the prediction and selection of samples, where the selection features include all prediction features. When using the Heckman model, the prediction features must be properly chosen from the set of selection features. However, choosing the proper prediction features for the Heckman model is a challenging task, especially when the number of selection features is large. Existing approaches that use the Heckman model often provide a manually chosen set of prediction features. In this paper, we propose Heckman-FA, a novel data-driven framework for obtaining prediction features for the Heckman model. Heckman-FA first trains an assignment function that determines whether a selection feature is assigned as a prediction feature. Using the parameters of the trained function, the framework extracts a suitable set of prediction features based on the goodness-of-fit of the prediction model given the chosen prediction features and the correlation between noise terms of the prediction and selection equations. Experimental results on real-world datasets show that Heckman-FA produces a robust regression model under MNAR sample selection bias.
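
For context, here is a minimal sketch of the classic Heckman two-step correction that Heckman-FA builds on. The helper name and signature are illustrative, not from the paper, and the prediction-feature mask is passed in by hand, whereas Heckman-FA would learn it via the trained assignment function:

    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    def heckman_two_step(X_sel, s, y, pred_mask):
        """Classic Heckman two-step.
        X_sel: selection features, shape (n, d); s: 0/1 selection indicator;
        y: outcome, valid where s == 1; pred_mask: boolean mask of length d
        choosing the prediction features (learned in Heckman-FA, fixed here)."""
        Z = sm.add_constant(X_sel)
        # Step 1: probit selection equation.
        probit = sm.Probit(s, Z).fit(disp=0)
        xb = Z @ probit.params
        # Inverse Mills ratio corrects for the non-random selection.
        imr = norm.pdf(xb) / norm.cdf(xb)
        # Step 2: outcome regression on the selected samples, IMR-augmented.
        X_pred = X_sel[:, pred_mask]
        X2 = sm.add_constant(np.column_stack([X_pred[s == 1], imr[s == 1]]))
        return sm.OLS(y[s == 1], X2).fit()

Heckman-FA's contribution, then, is choosing pred_mask automatically, scoring candidate masks by the goodness-of-fit of this second-stage regression and by the estimated correlation between the two equations' noise terms.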

Permutation Capacity Region of Adder Multiple-Access Channels. (arXiv:2309.08054v1 [cs.IT]) arxiv.org/abs/2309.08054

Point-to-point permutation channels are useful models of communication networks and biological storage mechanisms and have received theoretical attention in recent years. Propelled by relevant advances in this area, we analyze the permutation adder multiple-access channel (PAMAC) in this work. In the PAMAC network model, $d$ senders communicate with a single receiver by transmitting $p$-ary codewords through an adder multiple-access channel whose output is subsequently shuffled by a random permutation block. We define a suitable notion of permutation capacity region $\mathcal{C}_\mathsf{perm}$ for this model, and establish that $\mathcal{C}_\mathsf{perm}$ is the simplex consisting of all rate $d$-tuples that sum to $d(p - 1) / 2$ or less. We achieve this sum-rate by encoding messages as i.i.d. samples from categorical distributions with carefully chosen parameters, and we derive an inner bound on $\mathcal{C}_\mathsf{perm}$ by extending the concept of time sharing to the permutation channel setting. Our proof notably illuminates various connections between mixed-radix numerical systems and coding schemes for multiple-access channels. Furthermore, we derive an alternative inner bound on $\mathcal{C}_\mathsf{perm}$ for the binary PAMAC by analyzing the root stability of the probability generating function of the adder's output distribution. Using eigenvalue perturbation results, we obtain error bounds on the spectrum of the probability generating function's companion matrix, providing quantitative estimates of decoding performance. Finally, we obtain a converse bound on $\mathcal{C}_\mathsf{perm}$ matching our achievability result.
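
In symbols, the main result is that the permutation capacity region is the simplex

$\mathcal{C}_\mathsf{perm} = \bigl\{ (R_1, \dots, R_d) \in \mathbb{R}_{\ge 0}^{d} : \sum_{i=1}^{d} R_i \le \tfrac{d(p-1)}{2} \bigr\}.$

The sum-rate bound is consistent with point-to-point intuition: the adder's output alphabet has $d(p-1) + 1$ symbols, and a noisy permutation channel with output alphabet size $q$ supports roughly $\tfrac{q-1}{2} \log n$ bits per block of length $n$.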

Approximate co-sufficient sampling with regularization. (arXiv:2309.08063v1 [stat.ME]) arxiv.org/abs/2309.08063

In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models -- for example, testing whether observed data follows a logistic regression model. This testing problem involves a composite null hypothesis, due to the unknown values of the model parameters. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via conditioning on a sufficient statistic -- often, the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative, replacing sufficiency with an approximately sufficient statistic (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied, but can only be applied in settings where the unconstrained MLE is well-defined and well-behaved, which implicitly assumes a low-dimensional regime. In this work, we extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can now be handled within the aCSS framework, including examples such as mixtures-of-Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where the MLE can perform well under regularization, such as an $\ell_1$ penalty or a shape constraint).
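
As a reminder of the mechanism being extended (notation assumed here): aCSS conditions not on the exact MLE but on a Gaussian-perturbed estimate, roughly

$\tilde{\theta} = \arg\min_{\theta} \bigl\{ -\ell(\theta; X) + \sigma\, W^\top \theta \bigr\}, \qquad W \sim N(0, I_d),$

and then samples exchangeable copies of the data from the conditional law of $X$ given $\tilde{\theta}$. The extension here adds a penalty or constraint $R(\theta)$ to this minimization, so that $\tilde{\theta}$ stays well-defined in degenerate or high-dimensional problems where the unconstrained MLE is not.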

The Road to the Ideal Stent: A Review of Coronary Stent Design Optimisation Methods, Findings, and Opportunities. (arXiv:2309.08092v1 [physics.med-ph]) arxiv.org/abs/2309.08092

Coronary stent designs have undergone significant transformations in geometry, materials, and drug elution coatings, contributing to the continuous improvement of stenting success over recent decades. However, the increasing use of percutaneous coronary intervention techniques on complex coronary artery disease anatomy remains a challenge and continues to push the boundaries of stent design. Design optimisation techniques in particular are a unique set of tools for assessing and balancing competing design objectives, thus unlocking the capacity to maximise the performance of stents. This review provides a brief history of metallic and bioresorbable stent design evolution, before exploring the latest developments in performance metrics and design optimisation techniques in detail. This includes insights into different contemporary stent designs, mechanical and haemodynamic performance metrics, shape and topology representation, and optimisation, along with the use of surrogates to deal with the underlying computationally expensive nature of the problem. Finally, an exploration of current key gaps and future possibilities is provided, including the hybrid optimisation of clinically relevant metrics, non-geometric variables such as material properties, and the possibility of personalised stenting devices.

Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee. (arXiv:2309.07157v1 [cs.LG]) arxiv.org/abs/2309.07157

Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angle or power flow data. Given the sensor data, many existing methods based on change-point detection require prior knowledge of outage patterns, which is unavailable in real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method that learns the parameters of the post-outage distribution through gradient descent. However, directly applying gradient descent raises feasibility issues. To address this, we add a Bregman divergence constraint that controls the trajectory of the parameter updates and eliminates the feasibility problems. Because timely operation is key, we prove that the optimal parameters can be learned with convergence guarantees by leveraging the statistical and physical properties of voltage data. We evaluate our approach on many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize an outage in a timely manner using only voltage magnitudes and without prior knowledge of outage patterns.
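
A minimal sketch of the two ingredients described above, with everything concrete (Gaussian likelihood, squared-Euclidean Bregman anchor, CUSUM statistic) assumed for illustration rather than taken from the paper:

    import numpy as np

    def bregman_anchored_step(theta, grad, eta, theta_ref, lam):
        """Closed-form minimizer of
        -grad*x + (x - theta)**2 / (2*eta) + lam * (x - theta_ref)**2 / 2,
        i.e. a gradient step whose trajectory is anchored to a feasible
        reference point theta_ref by a squared-Euclidean Bregman term."""
        return (grad + theta / eta + lam * theta_ref) / (1.0 / eta + lam)

    def cusum(x, mu0, theta, sigma2):
        """CUSUM statistic for a mean shift mu0 -> theta in Gaussian
        voltage magnitudes; an outage is flagged when it crosses a threshold."""
        s, stats = 0.0, []
        for xt in x:
            llr = (theta - mu0) / sigma2 * (xt - (theta + mu0) / 2.0)
            s = max(0.0, s + llr)
            stats.append(s)
        return np.array(stats)

In the paper's setting, the post-outage parameter (theta here) is learned online by gradient steps of this anchored form rather than fixed in advance.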

Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG]) arxiv.org/abs/2309.07176

In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds by taking up treatment and in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints, such as demographic parity in treatment take-up, and other constraints via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.
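
As an illustrative formalization (the symbols are assumed here, not the paper's notation), the constrained problem has the shape

$\max_{\pi \in \Pi} \ \mathbb{E}[Y(\pi)] \quad \text{s.t.} \quad \bigl| \mathbb{E}[T(\pi) \mid A = a] - \mathbb{E}[T(\pi) \mid A = b] \bigr| \le \delta,$

where $T(\pi)$ is realized treatment take-up under encouragement policy $\pi$ and $A$ is a protected attribute; the constraint instantiates demographic parity in take-up, and the two-stage algorithm solves such programs over parametrized policy classes.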

How do ASA Ethical Guidelines Support U.S. Guidelines for Official Statistics?. (arXiv:2309.07180v1 [stat.OT]) arxiv.org/abs/2309.07180

In 2022, the American Statistical Association revised its Ethical Guidelines for Statistical Practice. Originally issued in 1982, these Guidelines describe responsibilities of the 'ethical statistical practitioner' to their profession, their research subjects, and their community of practice. These Guidelines are intended as a framework to assist decision-making by statisticians working across academic, research, and government environments. For the first time, the 2022 Guidelines describe the ethical obligations of organizations and institutions that use statistical practice. This paper examines alignment between the ASA Ethical Guidelines and other long-established normative guidelines for U.S. official statistics: the OMB Statistical Policy Directives 1, 2, and 2a; the NASEM Principles and Practices; and the OMB Data Ethics Tenets. Our analyses ask how the recently updated ASA Ethical Guidelines can support these guidelines for federal statistics and data science. The analysis uses a form of qualitative content analysis, the alignment model, to identify patterns of alignment, and potential for tensions, within and across guidelines. The paper concludes with recommendations for policy makers when using ethical guidance to establish parameters for policy change and the administrative and technical controls that necessarily follow.

All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph]) arxiv.org/abs/2309.07250

Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging, as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under SU(2) transformations, to devise SU(2) equivariant quantum circuit ansätze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises the SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.
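
Concretely (this is the standard notion of equivariance, not specific to this paper), an $n$-qubit ansatz $V(\vec{\theta})$ has spin rotation symmetry when it commutes with global SU(2) rotations:

$V(\vec{\theta})\, u(g)^{\otimes n} = u(g)^{\otimes n}\, V(\vec{\theta}) \quad \text{for all } g \in \mathrm{SU}(2),$

where $u(g)$ is the spin-$\tfrac{1}{2}$ representation acting on each qubit; the spin-network construction obtains gates satisfying this commutation by working in the basis that block diagonalises $u(g)^{\otimes n}$.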

Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME]) arxiv.org/abs/2309.07261

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the column space of the confounding coefficients. Subsequently, latent factors and primary effects are jointly estimated, using $\ell_1$-regularization for sparsity while imposing orthogonality on the confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish identification conditions for the various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate via the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting for confounding effects when significant covariates are absent from the model.
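
A sketch of the kind of model meant here, with notation assumed for illustration: for response $j$ of sample $i$,

$g\bigl(\mathbb{E}[y_{ij} \mid x_i, f_i]\bigr) = x_i^\top \theta_j + f_i^\top \gamma_j,$

where $x_i$ are observed covariates with primary effects $\theta_j$, $f_i$ are latent confounders with coefficients $\gamma_j$, and $g$ is the link function. The three stages then recover the column space of the confounding coefficients, jointly estimate the latent factors and the sparse primary effects, and debias the estimates for the final $z$-tests.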

Real Effect or Bias? Best Practices for Evaluating the Robustness of Real-World Evidence through Quantitative Sensitivity Analysis for Unmeasured Confounding. (arXiv:2309.07273v1 [stat.ME]) arxiv.org/abs/2309.07273

The assumption of no unmeasured confounders is a critical but unverifiable requirement for causal inference, yet quantitative sensitivity analyses to assess the robustness of real-world evidence remain underutilized. This underuse is likely due in part to the complexity of implementation and to the specific, often restrictive data requirements for applying each method. With the advent of sensitivity analysis methods that are broadly applicable, in that they do not require identification of a specific unmeasured confounder, along with publicly available code for implementation, roadblocks to broader use are decreasing. To spur greater application, we present here best practice guidance for addressing the potential for unmeasured confounding at both the design and analysis stages, including a set of framing questions and an analytic toolbox for researchers. The questions at the design stage guide the researcher through steps that evaluate the potential robustness of the design while encouraging the gathering of additional data to reduce uncertainty due to potential confounding. At the analysis stage, the questions guide researchers in quantifying the robustness of the observed result, providing a clearer indication of the robustness of their conclusions. We demonstrate the application of the guidance using simulated data based on a real-world fibromyalgia study, applying multiple methods from our analytic toolbox for illustration purposes.
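
One example of a broadly applicable method of this kind, requiring no specific named confounder (a natural candidate for such a toolbox, though the abstract does not list the toolbox's contents), is the E-value of VanderWeele and Ding: for an observed risk ratio $\mathrm{RR} > 1$,

$E = \mathrm{RR} + \sqrt{\mathrm{RR}\,(\mathrm{RR} - 1)},$

the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both treatment and outcome to fully explain away the observed effect.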

Spatiotemporal modelling of PM$_{2.5}$ concentrations in Lombardy (Italy) -- A comparative study. (arXiv:2309.07285v1 [stat.AP]) arxiv.org/abs/2309.07285

This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and random forest spatiotemporal kriging (RFSTK). These models are evaluated for their effectiveness in predicting PM$_{2.5}$ concentrations in Lombardy (northern Italy) from 2016 to 2020. Despite differing methodologies, all models capture the spatiotemporal patterns in the air pollution data well, with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing that model performance varies with localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, reveals consistent associations between the predictor variables and PM$_{2.5}$ concentrations. Despite nuanced variations in how spatiotemporal correlation is modelled, all models effectively account for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, while highlighting the complementary potential of machine learning and classical statistical approaches.
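
For flavour, a simplified regression-kriging sketch in the spirit of RFSTK: spatial-only, with an assumed exponential variogram, so the paper's implementation, which handles spatiotemporal correlation, will differ:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from pykrige.ok import OrdinaryKriging

    def rf_residual_kriging(X, lon, lat, y, X_new, lon_new, lat_new):
        """Random forest for the regression part; ordinary kriging of the
        RF residuals picks up the leftover spatial dependence."""
        rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
        resid = y - rf.predict(X)
        ok = OrdinaryKriging(lon, lat, resid, variogram_model="exponential")
        resid_hat, _ = ok.execute("points", lon_new, lat_new)
        return rf.predict(X_new) + resid_hat

Predictions are the RF regression surface plus the kriged residual field, which is the standard way to let a machine-learning mean function coexist with a geostatistical covariance model.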
