Estimand-based Inference in Presence of Long-Term Survivors arxiv.org/abs/2409.02209 .ME

In this article, we develop nonparametric inference methods for comparing survival data across two samples, which are beneficial for clinical trials of novel cancer therapies where long-term survival is a critical outcome. These therapies, including immunotherapies and other advanced treatments, aim to establish durable effects. They often exhibit distinct survival patterns, such as crossing or delayed separation and potential leveling-off at the tails of the survival curves, clearly violating the proportional hazards assumption and rendering the hazard ratio inappropriate for measuring treatment effects. The proposed methodology utilizes the mixture cure framework to separately analyze the cure rates of long-term survivors and the survival functions of susceptible individuals. We evaluate a nonparametric estimator of the susceptible survival function in the one-sample setting. Under sufficient follow-up, it is expressed as a location-scale-shift variant of the Kaplan-Meier (KM) estimator. It retains several desirable features of the KM estimator, including inverse-probability-of-censoring weighting, product-limit estimation, self-consistency, and nonparametric efficiency. In scenarios of insufficient follow-up, it can easily be adapted by incorporating a suitable cure rate estimator. In the two-sample setting, besides using the difference in cure rates to measure the long-term effect, we propose a graphical estimand to compare the relative treatment effects on the susceptible subgroups. This estimand, inspired by Kendall's tau, compares the order of survival times among susceptible individuals. The large-sample properties of the proposed methods are derived for further inference, and their finite-sample properties are examined through extensive simulation studies. The proposed methodology is applied to analyze digitized data from the CheckMate 067 immunotherapy clinical trial.
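Under sufficient follow-up, the location-scale-shift relation can be sketched directly: in the mixture cure model S(t) = pi + (1 - pi) * S_s(t), so a plug-in susceptible survival estimate is (S_hat(t) - pi_hat) / (1 - pi_hat), with the cure rate pi_hat read off the KM plateau. A minimal illustration, not the paper's full estimator; the function names and the plateau-based cure-rate estimate are illustrative assumptions:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimator.
    times: observed times; events: 1 = event, 0 = censored.
    Returns a list of (event time, survival probability) pairs."""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    surv, s = [], 1.0
    for u in np.unique(t[d == 1]):
        at_risk = np.sum(t >= u)                 # risk-set size at u
        deaths = np.sum((t == u) & (d == 1))
        s *= 1.0 - deaths / at_risk              # product-limit update
        surv.append((u, s))
    return surv

def susceptible_survival(times, events):
    """Location-scale shift of KM: S_s(t) = (S(t) - pi) / (1 - pi),
    with the cure rate pi estimated by the KM plateau, which is only
    sensible under sufficient follow-up."""
    km = kaplan_meier(times, events)
    pi_hat = km[-1][1]                           # plateau = cure-rate estimate
    return [(u, (s - pi_hat) / (1.0 - pi_hat)) for u, s in km]
```

With insufficient follow-up the plateau under-states the cure rate, which is where the abstract's "suitable cure rate estimator" would be substituted for `pi_hat`.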

Simulation-calibration testing for inference in Lasso regressions arxiv.org/abs/2409.02269 .ME .ST .CO .TH

We propose a test of the significance of a variable appearing on the Lasso path and use it in a procedure for selecting one of the models of the Lasso path, controlling the Family-Wise Error Rate. Our null hypothesis depends on a set A of already selected variables and states that it contains all the active variables. We focus on the regularization parameter value from which a first variable outside A is selected. As the test statistic, we use this quantity's conditional p-value, which we define conditional on the non-penalized estimated coefficients of the model restricted to A. We estimate this by simulating outcome vectors and then calibrating them on the observed outcome's estimated coefficients. We adapt the calibration heuristically to the case of generalized linear models in which it turns into an iterative stochastic procedure. We prove that the test controls the risk of selecting a false positive in linear models, both under the null hypothesis and, under a correlation condition, when A does not contain all active variables. We assess the performance of our procedure through extensive simulation studies. We also illustrate it in the detection of exposures associated with drug-induced liver injuries in the French pharmacovigilance database.
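The simulation idea for linear models can be sketched roughly: simulate outcome vectors from the null model restricted to A and compare their entry statistics with the observed one. This simplification omits the paper's calibration on the observed outcome's estimated coefficients and the iterative GLM variant; the function names and the residual-correlation proxy for the entry point are assumptions:

```python
import numpy as np

def entry_statistic(X, y, A):
    """Largest absolute correlation of the variables outside A with the
    residual of the OLS fit restricted to A -- a proxy for the Lasso
    regularization level at which a first variable outside A enters."""
    XA = X[:, A]
    beta_A, *_ = np.linalg.lstsq(XA, y, rcond=None)
    fitted = XA @ beta_A
    resid = y - fitted
    outside = [j for j in range(X.shape[1]) if j not in A]
    return np.max(np.abs(X[:, outside].T @ resid)), fitted, resid

def simulation_pvalue(X, y, A, n_sim=500, rng=None):
    """Monte-Carlo p-value: simulate outcomes under the null model
    restricted to A and count how often the simulated entry statistic
    exceeds the observed one."""
    rng = np.random.default_rng(rng)
    obs, fitted, resid = entry_statistic(X, y, A)
    sigma = np.std(resid, ddof=len(A))           # residual scale estimate
    count = 0
    for _ in range(n_sim):
        y_star = fitted + sigma * rng.standard_normal(len(y))
        stat, _, _ = entry_statistic(X, y_star, A)
        count += stat >= obs
    return (1 + count) / (1 + n_sim)             # add-one MC correction
```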

Generative Principal Component Regression via Variational Inference arxiv.org/abs/2409.02327 .ML .LG

The ability to manipulate complex systems, such as the brain, to modify specific outcomes has far-reaching implications, particularly in the treatment of psychiatric disorders. One approach to designing appropriate manipulations is to target key features of predictive models. While generative latent variable models, such as probabilistic principal component analysis (PPCA), are powerful tools for identifying targets, they struggle to incorporate information relevant to low-variance outcomes into the latent space. When stimulation targets are designed on the latent space in such a scenario, the intervention can be suboptimal, with minimal efficacy. To address this problem, we develop a novel objective based on supervised variational autoencoders (SVAEs) that enforces the representation of such information in the latent space. The novel objective can be used with linear models, such as PPCA, which we refer to as generative principal component regression (gPCR). We show in simulations that gPCR dramatically improves target selection in manipulation as compared to standard PCR and SVAEs. As part of these simulations, we develop a metric for detecting when relevant information is not properly incorporated into the loadings. We then show in two neural datasets related to stress and social behavior that gPCR dramatically outperforms PCR in predictive performance and that SVAEs exhibit low incorporation of relevant information into the loadings. Overall, this work suggests that our method significantly improves target selection for manipulation using latent variable models over competitor inference schemes.
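As a rough linear analogue of the idea, and emphatically not the paper's SVAE objective, one can force outcome-relevant but low-variance directions into the leading loadings by appending a scaled copy of the outcome to the data matrix before the SVD and then regressing the outcome on the resulting scores. The `weight` knob and the function name are illustrative assumptions:

```python
import numpy as np

def gpcr_sketch(X, y, k=2, weight=5.0):
    """Illustrative supervised PCR: append the (scaled) outcome as an
    extra column before the SVD, so directions predictive of y enter the
    leading components even when they carry little variance in X, then
    regress the centered outcome on the k latent scores."""
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).reshape(-1, 1)
    aug = np.hstack([Xc, weight * yc])       # outcome-augmented data matrix
    U, S, Vt = np.linalg.svd(aug, full_matrices=False)
    Z = U[:, :k] * S[:k]                     # latent scores
    coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return Z, coef
```

Setting `weight=0` recovers ordinary PCR on the centered data, which makes the failure mode in the abstract easy to reproduce: a low-variance, outcome-aligned direction drops out of the top-k loadings.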

Optimal sampling for least-squares approximation arxiv.org/abs/2409.02342 .ML .NA .LG .NA

Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this paper, we review recent progress on optimal sampling for (weighted) least-squares approximation in arbitrary linear spaces. We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples, then show how it can be used to construct sampling strategies that possess near-optimal sample complexity: namely, the number of samples scales log-linearly in $n$, the dimension of the approximation space. We discuss a series of variations, extensions and further topics, and throughout highlight connections to approximation theory, machine learning, information-based complexity and numerical linear algebra. Finally, motivated by various contemporary applications, we consider a generalization of the classical setting where the samples need not be pointwise samples of a scalar-valued function, and the approximation space need not be linear. We show that even in this significantly more general setting suitable generalizations of the Christoffel function still determine the sample complexity. This provides a unified procedure for designing improved sampling strategies for general recovery problems. This article is largely self-contained, and intended to be accessible to nonspecialists.
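The Christoffel-function strategy can be illustrated for the space of polynomials of degree < n on [-1, 1]: draw samples from the density proportional to the inverse Christoffel function k(x) = sum_j phi_j(x)^2, weight each sample by n / k(x), and solve the weighted least-squares problem. A small sketch; the rejection sampler and the Legendre basis are implementation choices, not prescribed by the article:

```python
import numpy as np
from numpy.polynomial import legendre

def basis(x, n):
    """Legendre basis, orthonormalized w.r.t. the uniform measure on [-1, 1]."""
    V = legendre.legvander(x, n - 1)
    return V * np.sqrt(2 * np.arange(n) + 1)

def christoffel_sample(n, m, rng=None):
    """Draw m points from the density proportional to the inverse
    Christoffel function k(x) = sum_j phi_j(x)^2 by rejection sampling,
    returning them with the weights w(x) = n / k(x)."""
    rng = np.random.default_rng(rng)
    kmax = n ** 2                          # k attains its maximum n^2 at x = +-1
    out = []
    while len(out) < m:
        x = rng.uniform(-1, 1)
        k = np.sum(basis(np.array([x]), n) ** 2)
        if rng.uniform() < k / kmax:
            out.append((x, n / k))
    xs, ws = map(np.array, zip(*out))
    return xs, ws

def weighted_lsq(f, n, m, rng=None):
    """Weighted least-squares fit of f in the degree-(n-1) polynomial
    space from m Christoffel-weighted samples."""
    xs, ws = christoffel_sample(n, m, rng)
    A = basis(xs, n) * np.sqrt(ws)[:, None]
    b = f(xs) * np.sqrt(ws)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c, xs
```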

Distribution-Based Sub-Population Selection (DSPS): A Method for in-Silico Reproduction of Clinical Trials Outcomes arxiv.org/abs/2409.00232 .AP

Background and Objective: Diabetes presents a significant challenge to healthcare due to the negative impact of poor blood sugar control on health and its associated complications. Computer simulation platforms, notably exemplified by the UVA/Padova Type 1 Diabetes simulator, have emerged as promising tools for advancing diabetes treatments by simulating patient responses in a virtual environment. The UVA Virtual Lab (UVLab) is a new simulation platform that mimics the metabolic behavior of people with Type 2 diabetes (T2D) with a large population of 6062 virtual subjects. Methods: This work introduces the Distribution-Based Sub-Population Selection (DSPS) method, a systematic approach to identifying virtual subsets that mimic the clinical behavior observed in real trials. The method transforms the sub-population selection task into a Linear Programming problem, enabling the identification of the largest representative virtual cohort. This selection process centers on key clinical outcomes in diabetes research, such as HbA1c and Fasting Plasma Glucose (FPG), ensuring that the statistical properties (moments) of the selected virtual sub-population closely resemble those observed in real-world clinical trials. Results: The DSPS method was applied to the insulin degludec (IDeg) arm of a phase 3 clinical trial. It selected a sub-population of virtual subjects that closely mirrored the clinical trial data across multiple key metrics, including glycemic efficacy, insulin dosages, and cumulative hypoglycemia events over a 26-week period. Conclusion: The DSPS algorithm is able to select a virtual sub-population within UVLab that reproduces and predicts the outcomes of a clinical trial. This statistical method can bridge the gap between large population simulation platforms and previously conducted clinical trials.
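The Linear Programming formulation can be sketched in a stripped-down form: maximize 0-1 bounded (i.e. relaxed) inclusion weights subject to the weighted feature means staying within a tolerance of the trial targets. The ratio constraint |sum_i w_i (x_ij - m_j)| / sum_i w_i <= tol becomes linear after multiplying through by sum_i w_i. This toy version matches only first moments of generic features, whereas the paper matches moments of outcomes such as HbA1c and FPG, so treat it as an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linprog

def dsps_sketch(features, targets, tol=0.05):
    """Relaxed LP for sub-population selection: maximize the cohort size
    (sum of inclusion weights w in [0, 1]) while keeping each weighted
    feature mean within `tol` of its clinical-trial target.
    features: (n_subjects, n_features); targets: (n_features,)."""
    n, p = features.shape
    A_ub, b_ub = [], []
    for j in range(p):
        dev = features[:, j] - targets[j]
        A_ub.append(dev - tol)     # sum w_i (x_ij - m_j) <=  tol * sum w_i
        A_ub.append(-dev - tol)    # sum w_i (x_ij - m_j) >= -tol * sum w_i
        b_ub += [0.0, 0.0]
    res = linprog(c=-np.ones(n),   # linprog minimizes, so negate
                  A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(0, 1)] * n)
    return res.x
```

Rounding or thresholding the fractional weights would give a hard subject selection; the LP value upper-bounds the achievable cohort size.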

Response probability distribution estimation of expensive computer simulators: A Bayesian active learning perspective using Gaussian process regression arxiv.org/abs/2409.00407 .CO

Estimation of the response probability distributions of computer simulators in the presence of randomness is a crucial task in many fields. However, achieving this task with guaranteed accuracy remains an open computational challenge, especially for expensive-to-evaluate computer simulators. In this work, a Bayesian active learning perspective is presented to address the challenge, based on Gaussian process (GP) regression. First, estimation of the response probability distributions is conceptually interpreted as a Bayesian inference problem, as opposed to frequentist inference. This interpretation provides several important benefits: (1) it quantifies and propagates discretization error probabilistically; (2) it incorporates prior knowledge of the computer simulator; and (3) it enables the effective reduction of numerical uncertainty in the solution to a prescribed level. The conceptual Bayesian idea is then realized using GP regression, where we derive the posterior statistics of the response probability distributions in semi-analytical form and also provide a numerical solution scheme. Based on this practical Bayesian approach, a Bayesian active learning (BAL) method is further proposed for estimating the response probability distributions. In this context, the key contribution lies in the development of two crucial components for active learning, i.e., the stopping criterion and the learning function, by taking advantage of the posterior statistics. It is empirically demonstrated on five numerical examples that the proposed BAL method can efficiently estimate the response probability distributions with the desired accuracy.
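A generic variance-driven version of such a loop is easy to sketch with a GP surrogate: refit, evaluate a learning function over candidate inputs, query the simulator where it peaks, and stop once posterior uncertainty is small. The paper's learning function and stopping criterion target posterior statistics of the response probability distribution itself; the plain posterior-standard-deviation versions below are simplifying assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bal_loop(simulator, candidates, n_init=5, tol=1e-3, max_iter=30, rng=None):
    """Bayesian active learning with a GP surrogate for an
    expensive-to-evaluate simulator over a 1-D candidate grid."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(candidates), n_init, replace=False)
    X = candidates[idx].reshape(-1, 1)
    y = np.array([simulator(x[0]) for x in X])
    for _ in range(max_iter):
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
        mu, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
        if sd.max() < tol:                    # stopping criterion
            break
        x_new = candidates[np.argmax(sd)]     # learning function: max post. sd
        X = np.vstack([X, [[x_new]]])
        y = np.append(y, simulator(x_new))    # one expensive simulator call
    return gp, X, y
```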

Bayesian nonparametric mixtures of categorical directed graphs for heterogeneous causal inference arxiv.org/abs/2409.00453 .ME .AP

Quantifying the causal effects of exposures on outcomes, such as a treatment and a disease respectively, is a crucial issue in medical science for the administration of effective therapies. Importantly, any related causal analysis should account for all those variables, e.g. clinical features, that can act as risk factors involved in the occurrence of a disease. In addition, the selection of targeted strategies for therapy administration requires quantifying such treatment effects at the individual level rather than at the population level. We address these issues by proposing a methodology based on categorical Directed Acyclic Graphs (DAGs), which provide an effective tool to infer causal relationships and causal effects between variables. In addition, we account for population heterogeneity by considering a Dirichlet Process mixture of categorical DAGs, which clusters individuals into homogeneous groups characterized by common causal structures, dependence parameters and causal effects. We develop computational strategies for Bayesian posterior inference, from which a battery of causal effects at the subject-specific level is recovered. Our methodology is evaluated through simulations and applied to a dataset of breast cancer patients to investigate cardiotoxic side effects that can be induced by the administered anticancer therapies.
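For a single categorical DAG, the causal effects being recovered are the familiar truncated-factorization (back-door) quantities: e.g. for the DAG Z -> X -> Y with Z -> Y, P(Y = 1 | do(X = x)) = sum_z P(z) P(Y = 1 | x, z). A toy sketch of that computation, with the Dirichlet Process mixture machinery deliberately not reproduced:

```python
def backdoor_effect(p_z, p_y_given_xz, x):
    """Back-door adjustment in the categorical DAG Z -> X -> Y, Z -> Y:
    P(Y = 1 | do(X = x)) = sum_z P(z) * P(Y = 1 | X = x, Z = z).
    p_z: list of P(Z = z); p_y_given_xz: dict mapping (x, z) to
    P(Y = 1 | x, z)."""
    return sum(p_z[z] * p_y_given_xz[(x, z)] for z in range(len(p_z)))
```

In the mixture setting each cluster carries its own DAG and parameters, so a subject-specific effect would be this quantity averaged over that subject's posterior cluster allocation.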
