Nonparametric Estimation of Conditional Survival Function with Time-Varying Covariates Using DeepONet arxiv.org/abs/2505.22748 .ME

Traditional survival models often rely on restrictive assumptions such as proportional hazards or instantaneous effects of time-varying covariates on the hazard function, which limit their applicability in real-world settings. We consider nonparametric estimation of the conditional survival function, leveraging the flexibility of neural networks to capture complex, potentially long-term, non-instantaneous effects of time-varying covariates. In this work, we use Deep Operator Networks (DeepONet), a deep learning architecture designed for operator learning, to model arbitrary effects of both time-varying and time-invariant covariates. Specifically, our method relaxes assumptions commonly used in hazard regression by modeling the conditional hazard function as an unknown nonlinear operator of the entire history of the time-varying covariates. The estimation is based on a loss function constructed from the nonparametric full likelihood for censored survival data. Simulation studies demonstrate that our method performs well, whereas the Cox model yields biased results when the assumption of instantaneous time-varying covariate effects is violated. We further illustrate its utility with the ADNI data, for which it yields a lower integrated Brier score than the Cox model.
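
As a rough illustration of the operator-learning idea (not the paper's architecture or likelihood), here is a minimal PyTorch sketch in which a branch network encodes a discretized covariate history together with time-invariant covariates, a trunk network encodes the evaluation time, and their inner product gives the log conditional hazard; the grid size, widths, and activations are arbitrary placeholders.

    # Minimal DeepONet-style conditional hazard sketch (PyTorch).
    # Assumptions (not from the paper): covariate histories are sampled on a
    # fixed grid of m time points; widths and activations are placeholders.
    import torch
    import torch.nn as nn

    class DeepONetHazard(nn.Module):
        def __init__(self, m_sensors, p_static, width=64):
            super().__init__()
            # Branch net: encodes the discretized time-varying covariate
            # history plus time-invariant covariates.
            self.branch = nn.Sequential(
                nn.Linear(m_sensors + p_static, width), nn.ReLU(),
                nn.Linear(width, width),
            )
            # Trunk net: encodes the evaluation time t.
            self.trunk = nn.Sequential(
                nn.Linear(1, width), nn.ReLU(),
                nn.Linear(width, width),
            )

        def forward(self, history, static, t):
            b = self.branch(torch.cat([history, static], dim=-1))  # (n, width)
            k = self.trunk(t)                                      # (n, width)
            log_hazard = (b * k).sum(-1, keepdim=True)             # operator output
            return torch.exp(log_hazard)                           # hazard > 0

    # Toy usage: n subjects, history sampled at 20 grid points, 3 static covariates.
    n, m, p = 8, 20, 3
    model = DeepONetHazard(m, p)
    hazard_at_t = model(torch.randn(n, m), torch.randn(n, p), torch.rand(n, 1))
    print(hazard_at_t.shape)  # torch.Size([8, 1])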

A Network-Guided Penalized Regression with Application to Proteomics Data arxiv.org/abs/2505.22986 -bio.QM .ME .AP

Network theory has proven invaluable in unraveling complex protein interactions. Previous studies have employed statistical methods rooted in network theory, including the Gaussian graphical model, to infer networks among proteins, identifying hub proteins based on key structural properties of networks such as degree centrality. However, there has been limited research examining the prognostic role of hub proteins on outcomes while adjusting for clinical covariates in high-dimensional data. To address this gap, we propose a network-guided penalized regression method. First, we construct a network using the Gaussian graphical model to identify hub proteins. Next, we preserve these identified hub proteins along with clinically relevant factors, while applying the adaptive Lasso to non-hub proteins for variable selection. Our network-guided estimators are shown to have variable selection consistency and asymptotic normality. Simulation results suggest that our method outperforms existing methods and demonstrates promise for advancing biomarker identification in proteomics research. Lastly, we apply our method to the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data and identify hub proteins that may serve as prognostic biomarkers for various diseases, including rare genetic disorders, and as immune checkpoints for cancer immunotherapy.
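
A minimal sketch of the general workflow described above (not the paper's estimator): a graphical-lasso network defines hub proteins by degree centrality, and an adaptive lasso is then fit with near-zero penalty weights on hubs and clinical covariates so they are retained; all data, cutoffs, and weights below are placeholders.

    # Sketch of a network-guided adaptive lasso (not the paper's exact estimator).
    # Assumptions: hubs = top-degree nodes of a graphical-lasso network; hubs and
    # clinical covariates get near-zero adaptive penalty weights so they stay in.
    import numpy as np
    from sklearn.covariance import GraphicalLassoCV
    from sklearn.linear_model import LassoCV, Ridge

    rng = np.random.default_rng(0)
    n, p_prot, p_clin = 200, 30, 2
    X_prot = rng.standard_normal((n, p_prot))
    X_clin = rng.standard_normal((n, p_clin))
    y = X_prot[:, 0] + 0.5 * X_clin[:, 0] + rng.standard_normal(n)

    # 1) Infer a protein network and pick hubs by degree centrality.
    prec = GraphicalLassoCV().fit(X_prot).precision_
    degree = (np.abs(prec) > 1e-6).sum(axis=1) - 1
    hubs = np.argsort(degree)[-3:]                  # top-3 degrees (placeholder cutoff)

    # 2) Adaptive lasso: weight non-hub proteins by 1/|initial ridge estimate|,
    #    give hubs and clinical covariates near-zero penalty weights.
    X = np.hstack([X_prot, X_clin])
    beta_init = Ridge(alpha=1.0).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) + 1e-8)
    w[hubs] = 1e-2                                  # near-zero penalty on hubs
    w[p_prot:] = 1e-2                               # near-zero penalty on clinical covariates
    X_tilde = X / w                                 # column-rescaling trick for weighted lasso
    fit = LassoCV(cv=5).fit(X_tilde, y)
    beta_hat = fit.coef_ / w
    print(np.flatnonzero(np.abs(beta_hat) > 1e-6))  # selected variables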

Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features arxiv.org/abs/2505.22997 .ML .LG

Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning of each feature's marginal distribution from the modeling of their joint dependence structure via neural network-parameterized copulas. For each class, lightweight neural networks are used to flexibly and adaptively capture feature interactions, making DCC particularly effective when classification is driven by complex dependencies. We establish that DCC converges to the Bayes-optimal classifier under standard conditions and provide explicit convergence rates of O(n^{-r/(2r + d)}) for r-smooth copula densities. Beyond theoretical guarantees, we outline several practical extensions, including high-dimensional scalability through vine and factor copula architectures, semi-supervised learning via entropy regularization, and online adaptation using streaming gradient methods. By unifying statistical rigor with the representational power of neural networks, DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.
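
To make the factorization concrete, here is a toy generative classifier in the same spirit, with KDE marginals and a Gaussian copula standing in for the paper's neural-network copula; it sketches the class-conditional density log p(x|c) = sum_j log f_{c,j}(x_j) + log c_c(F_{c,1}(x_1), ..., F_{c,d}(x_d)), not the DCC itself.

    # Toy copula-based generative classifier (sketch, not the paper's DCC).
    # A Gaussian copula stands in for the neural-network copula of the paper.
    import numpy as np
    from scipy import stats

    def fit_class(Xc):
        kdes = [stats.gaussian_kde(Xc[:, j]) for j in range(Xc.shape[1])]
        # Normal scores -> correlation matrix parameterizing the Gaussian copula.
        U = np.column_stack([stats.rankdata(Xc[:, j]) / (len(Xc) + 1)
                             for j in range(Xc.shape[1])])
        R = np.corrcoef(stats.norm.ppf(U), rowvar=False)
        return kdes, R

    def log_density(x, kdes, R):
        d = len(kdes)
        logf = sum(np.log(kdes[j](x[j])[0] + 1e-300) for j in range(d))
        # Marginal CDFs via KDE integration, then the Gaussian copula density.
        u = np.array([kdes[j].integrate_box_1d(-np.inf, x[j]) for j in range(d)])
        z = stats.norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))
        log_cop = (stats.multivariate_normal(np.zeros(d), R, allow_singular=True).logpdf(z)
                   - stats.norm.logpdf(z).sum())
        return logf + log_cop

    rng = np.random.default_rng(1)
    X0 = rng.multivariate_normal([0, 0], [[1, .8], [.8, 1]], 300)
    X1 = rng.multivariate_normal([1, 1], [[1, -.5], [-.5, 1]], 300)
    models = [fit_class(X0), fit_class(X1)]
    x_new = np.array([0.2, 0.1])
    scores = [log_density(x_new, *m) for m in models]   # equal class priors assumed
    print("predicted class:", int(np.argmax(scores)))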

Revisit CP Tensor Decomposition: Statistical Optimality and Fast Convergence arxiv.org/abs/2505.23046 .ME .NA .ST .ML .TH .NA

Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing CP decompositions due to its simplicity and empirical success, its theoretical foundation, particularly regarding statistical optimality and convergence behavior, remains underdeveloped, especially in noisy, non-orthogonal, and higher-rank settings. In this work, we revisit CP tensor decomposition from a statistical perspective and provide a comprehensive theoretical analysis of ALS under a signal-plus-noise model. We establish non-asymptotic, minimax-optimal error bounds for tensors of general order, dimensions, and rank, assuming suitable initialization. To enable such initialization, we propose Tucker-based Approximation with Simultaneous Diagonalization (TASD), a robust method that improves stability and accuracy in noisy regimes. Combined with ALS, TASD yields a statistically consistent estimator. We further analyze the convergence dynamics of ALS, identifying a two-phase pattern: initial quadratic convergence followed by linear refinement. Finally, we show that in the rank-one setting, ALS with an appropriately chosen initialization attains the optimal error within just one or two iterations.
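
For reference, a bare-bones ALS for a rank-R CP decomposition of an order-3 tensor under a signal-plus-noise model is sketched below, with plain random initialization standing in for the paper's TASD initialization; dimensions, rank, noise level, and iteration count are arbitrary.

    # Rank-R CP decomposition of an order-3 tensor via plain ALS (sketch).
    import numpy as np

    def unfold(T, mode):
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def khatri_rao(A, B):                      # column-wise Kronecker product
        return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

    def cp_als(T, R, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        A, B, C = (rng.standard_normal((n, R)) for n in T.shape)
        for _ in range(n_iter):
            A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
            B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
            C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        return A, B, C

    # Signal-plus-noise example: rank-2 signal plus Gaussian noise.
    rng = np.random.default_rng(1)
    A0, B0, C0 = (rng.standard_normal((20, 2)) for _ in range(3))
    signal = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    T = signal + 0.1 * rng.standard_normal(signal.shape)
    A, B, C = cp_als(T, R=2)
    rec = np.einsum('ir,jr,kr->ijk', A, B, C)
    print("relative error:", np.linalg.norm(rec - signal) / np.linalg.norm(signal))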

Non-Gaussian Simultaneous Autoregressive Models with Missing Data arxiv.org/abs/2505.23070 .ME

Standard simultaneous autoregressive (SAR) models are usually assumed to have normally distributed errors, an assumption that is often violated in real-world datasets, which frequently exhibit non-normal, skewed, and heavy-tailed characteristics. New SAR models are proposed to capture these non-Gaussian features. In this work, the spatial error model (SEM), a widely used SAR-type model, is considered. Three novel SEMs are introduced that extend the standard Gaussian SEM by incorporating Student's $t$-distributed errors after a one-to-one transformation is applied to the response variable. Variational Bayes (VB) estimation methods are developed for these models, and the framework is further extended to handle missing response data. Standard VB methods perform well with complete datasets; however, handling missing data requires a hybrid VB (HVB) approach, which integrates a Markov chain Monte Carlo (MCMC) sampler to generate missing values. The proposed VB methods are evaluated using both simulated and real-world datasets, demonstrating their robustness and effectiveness in dealing with non-normal and missing data in spatial models. Although the methods are demonstrated using SAR models, the proposed model specifications and estimation approaches are broadly applicable to other model classes for handling non-Gaussian data with missing values.
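
A small simulation sketch of the kind of model considered (a SEM with Student's t errors on a transformed response). The weight matrix, spatial parameter, degrees of freedom, and transformation below are placeholder choices, and the paper's VB/HVB estimation is not reproduced.

    # Simulating a spatial error model (SEM) with Student's t errors (sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    n, lam, nu = 100, 0.5, 4           # placeholder spatial parameter and t df

    # Row-standardized nearest-neighbour weight matrix on a line (toy choice).
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    W /= W.sum(axis=1, keepdims=True)

    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    beta = np.array([1.0, 2.0])

    # SEM: z = X beta + u,  u = lam W u + eps,  eps ~ t_nu  =>  u = (I - lam W)^{-1} eps.
    eps = rng.standard_t(nu, size=n)
    u = np.linalg.solve(np.eye(n) - lam * W, eps)
    z = X @ beta + u
    y = np.exp(z)                      # one-to-one transformation of the response (e.g. log scale)
    print(y[:5])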

Valid F-screening in linear regression arxiv.org/abs/2505.23113 .ME .ST .AP .TH

Suppose that a data analyst wishes to report the results of a least squares linear regression only if the overall null hypothesis, $H_0^{1:p}: \beta_1 = \beta_2 = \cdots = \beta_p = 0$, is rejected. This practice, which we refer to as F-screening (since the overall null hypothesis is typically tested using an $F$-statistic), is common across a number of applied fields. Unfortunately, it poses a problem: standard guarantees for the inferential outputs of linear regression, such as Type 1 error control of hypothesis tests and nominal coverage of confidence intervals, hold unconditionally but fail to hold conditional on rejection of the overall null hypothesis. In this paper, we develop an inferential toolbox for the coefficients in a least squares model that is valid conditional on rejection of the overall null hypothesis. We develop selective p-values that lead to tests controlling the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis. Furthermore, these p-values can be computed without access to the raw data, i.e., using only the standard outputs of a least squares linear regression, and are therefore suitable for use in a retrospective analysis of a published study. We also develop confidence intervals that attain nominal selective coverage, and point estimates that account for having rejected the overall null hypothesis. We show empirically that our selective procedure is preferable to an alternative approach that relies on sample splitting, and we demonstrate its performance via re-analysis of two datasets from the biomedical literature.
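
The problem being addressed is easy to see in simulation: under the global null, per-coefficient t-tests hold their level unconditionally but over-reject once we condition on the overall F-test rejecting. The sketch below illustrates that gap; it does not implement the paper's selective procedure, and all sample sizes are arbitrary.

    # Simulation under the global null: t-tests control Type 1 error
    # unconditionally, but over-reject conditional on the F-test rejecting.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, p, reps, alpha = 50, 3, 5000, 0.05
    uncond, cond = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)                    # all beta_j = 0
        Xd = np.column_stack([np.ones(n), X])
        beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
        resid = y - Xd @ beta
        df = n - p - 1
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))[1:]
        t = beta[1:] / se
        reject_t = np.abs(t) > stats.t.ppf(1 - alpha / 2, df)   # tests of beta_j = 0
        # Overall F-test of H_0^{1:p}: beta_1 = ... = beta_p = 0.
        tss = np.sum((y - y.mean()) ** 2)
        F = ((tss - resid @ resid) / p) / sigma2
        screened = F > stats.f.ppf(1 - alpha, p, df)
        uncond.append(reject_t)
        if screened:
            cond.append(reject_t)
    print("unconditional Type 1 error:", np.mean(uncond))    # close to 0.05
    print("Type 1 error given F-rejection:", np.mean(cond))  # typically well above 0.05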

Learning Probabilities of Causation from Finite Population Data arxiv.org/abs/2505.17133 .ML .AI .LG

Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities of causation: the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unavailable or impractical to obtain with limited population-level data. As a result, most subgroups have too little data to estimate their probabilities of causation accurately. To estimate these probabilities for subpopulations with insufficient data, we therefore propose using machine learning models that draw insights from subpopulations with sufficient data. Our evaluation of multiple machine learning models indicates that, given population-level data and an appropriate choice of machine learning model and activation function, PNS can be effectively predicted. Through simulation studies on multiple Structural Causal Models (SCMs), we show that our multilayer perceptron (MLP) model with the Mish activation function achieves a mean absolute error (MAE) of approximately $0.02$ in predicting PNS for $32,768$ subpopulations across most SCMs, using data from only $2,000$ subpopulations with known PNS values.
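
As a minimal sketch of the modeling setup (not the paper's SCMs, features, or evaluation), the PyTorch snippet below fits an MLP with the Mish activation to regress PNS values in [0, 1] on subpopulation features, using synthetic placeholder data and MAE as the training criterion.

    # Minimal MLP-with-Mish sketch for regressing PNS on subpopulation features.
    # The data here are synthetic placeholders; PNS targets lie in (0, 1).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_sub, d = 2000, 10                         # subpopulations with "known" PNS
    X = torch.rand(n_sub, d)
    pns = torch.sigmoid(X @ torch.randn(d, 1))  # placeholder targets in (0, 1)

    model = nn.Sequential(
        nn.Linear(d, 64), nn.Mish(),
        nn.Linear(64, 64), nn.Mish(),
        nn.Linear(64, 1), nn.Sigmoid(),         # keep predictions in [0, 1]
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.L1Loss()                       # mean absolute error

    for epoch in range(200):
        opt.zero_grad()
        loss = loss_fn(model(X), pns)
        loss.backward()
        opt.step()
    print("training MAE:", loss.item())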

Transfer Faster, Price Smarter: Minimax Dynamic Pricing under Cross-Market Preference Shift arxiv.org/abs/2505.17203 .ME .AP .LG

We study contextual dynamic pricing when a target market can leverage K auxiliary markets -- offline logs or concurrent streams -- whose mean utilities differ by a structured preference shift. We propose Cross-Market Transfer Dynamic Pricing (CM-TDP), the first algorithm that provably handles such model-shift transfer and delivers minimax-optimal regret for both linear and non-parametric utility models. For linear utilities of dimension $d$, where the difference between source- and target-task coefficients is $s_{0}$-sparse, CM-TDP attains regret $\tilde{O}((dK^{-1}+s_{0})\log T)$. For nonlinear demand residing in a reproducing kernel Hilbert space with effective dimension $\alpha$, complexity $\beta$ and task-similarity parameter $H$, the regret becomes $\tilde{O}(K^{-2\alpha\beta/(2\alpha\beta+1)}T^{1/(2\alpha\beta+1)} + H^{2/(2\alpha+1)}T^{1/(2\alpha+1)})$, matching information-theoretic lower bounds up to logarithmic factors. The RKHS bound is the first of its kind for transfer pricing and is of independent interest. Extensive simulations show up to 50% lower cumulative regret and 5 times faster learning relative to single-market pricing baselines. By bridging transfer learning, robust aggregation, and revenue optimization, CM-TDP moves toward pricing systems that transfer faster, price smarter.
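
The structured-shift idea can be illustrated with a simple estimation sketch (not the CM-TDP algorithm and not a pricing policy): pool the source markets to estimate shared utility coefficients, then fit a lasso to the sparse target shift. All dimensions, noise levels, and the shift itself are synthetic placeholders.

    # Transfer under a sparse preference shift (illustration only, not CM-TDP).
    import numpy as np
    from sklearn.linear_model import LinearRegression, LassoCV

    rng = np.random.default_rng(0)
    d, K, n_src, n_tgt = 20, 5, 2000, 100
    w_src = rng.standard_normal(d)
    delta = np.zeros(d); delta[:2] = 0.8               # s0 = 2 sparse shift (placeholder)
    w_tgt = w_src + delta

    X_src = rng.standard_normal((K * n_src, d))        # pooled source markets
    y_src = X_src @ w_src + 0.5 * rng.standard_normal(K * n_src)
    X_tgt = rng.standard_normal((n_tgt, d))            # small target sample
    y_tgt = X_tgt @ w_tgt + 0.5 * rng.standard_normal(n_tgt)

    # 1) Shared coefficients from the pooled source data.
    w_pool = LinearRegression(fit_intercept=False).fit(X_src, y_src).coef_
    # 2) Sparse shift estimated from target residuals.
    shift = LassoCV(fit_intercept=False, cv=5).fit(X_tgt, y_tgt - X_tgt @ w_pool).coef_
    w_hat = w_pool + shift

    naive = LinearRegression(fit_intercept=False).fit(X_tgt, y_tgt).coef_
    print("transfer error:   ", np.linalg.norm(w_hat - w_tgt))
    print("target-only error:", np.linalg.norm(naive - w_tgt))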

On Fisher Consistency of Surrogate Losses for Optimal Dynamic Treatment Regimes with Multiple Categorical Treatments per Stage arxiv.org/abs/2505.17285 .ST .TH

Patients with chronic diseases often receive treatments at multiple time points, or stages. Our goal is to learn the optimal dynamic treatment regime (DTR) from longitudinal patient data. When both the number of stages and the number of treatment levels per stage are arbitrary, estimating the optimal DTR reduces to a sequential, weighted, multiclass classification problem (Kosorok and Laber, 2019). In this paper, we aim to solve this classification problem simultaneously across all stages using Fisher consistent surrogate losses. Although computationally feasible Fisher consistent surrogates exist in special cases, e.g., the binary treatment setting, a unified theory of Fisher consistency remains largely unexplored. We establish necessary and sufficient conditions for DTR Fisher consistency within the class of non-negative, stagewise separable surrogate losses. To our knowledge, this is the first result in the DTR literature to provide necessary conditions for Fisher consistency within a non-trivial surrogate class. Furthermore, we show that many convex surrogate losses fail to be Fisher consistent for the DTR classification problem, and we formally establish this inconsistency for smooth, permutation-equivariant, and relative-margin-based convex losses. Building on this, we propose SDSS (Simultaneous Direct Search with Surrogates), which uses smooth, non-concave surrogate losses to learn the optimal DTR. We develop a computationally efficient, gradient-based algorithm for SDSS. When the optimization error is small, we establish a sharp upper bound on SDSS's regret decay rate. We evaluate the numerical performance of SDSS through simulations and demonstrate its real-world applicability by estimating optimal fluid resuscitation strategies for patients with severe sepsis using electronic health record data.
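
As a single-stage illustration of direct value search with a smooth surrogate (one plausible choice, not the paper's SDSS loss or its multi-stage setting), the sketch below maximizes a weighted softmax probability of the observed treatment as a smooth stand-in for the indicator that the regime matches it; the toy outcome model and weights are placeholders.

    # Single-stage direct value search with a smooth surrogate (PyTorch sketch).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, p, n_treat = 500, 5, 3
    X = torch.randn(n, p)
    A = torch.randint(n_treat, (n,))                # observed treatments (randomized)
    # Toy outcome: higher reward when A matches argmax of the first 3 covariates.
    Y = (A == X[:, :n_treat].argmax(1)).float() + 0.1 * torch.randn(n)
    w = Y - Y.mean() + 1.0                          # positive weights (placeholder)

    policy = nn.Sequential(nn.Linear(p, 32), nn.ReLU(), nn.Linear(32, n_treat))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
    for _ in range(300):
        probs = torch.softmax(policy(X), dim=1)     # smooth surrogate of 1{d(X)=A}
        loss = -(w * probs[torch.arange(n), A]).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    d_hat = policy(X).argmax(1)                     # learned regime
    print("agreement with toy optimal rule:",
          (d_hat == X[:, :n_treat].argmax(1)).float().mean().item())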
