Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks arxiv.org/abs/2504.06470 .ML .LG

LassoRNet: Accurate dim-light melatonin onset time prediction from multiple blood tissue samples arxiv.org/abs/2504.06494 .AP .CO

Research on chemotherapy, heart surgery, and vaccines has indicated that the risks and benefits of a treatment could vary depending on the time of day it is administered. A challenge in studying the timing of treatment administration is that the optimal treatment time is different for each patient, as it would be based on a patient's internal clock time (ICT) rather than the 24-hour day-night cycle time. Prediction methods have been developed to determine a patient's ICT based on biomarker measurements, which can be leveraged to personalize treatment time. However, these methods face two limitations. First, they are designed to output predictions given biomarker measurements from a single tissue sample, even though multiple tissue samples can be collected over time. Second, they are based on linear modelling frameworks, which would not capture the potentially complex relationships between biomarkers and a patient's ICT. To address these two limitations, this paper introduces a recurrent neural network framework, which we refer to as LassoRNet, for predicting the ICT at which a patient's biomarkers are measured as well as the underlying offset between a patient's ICT and the 24-hour day-night cycle time, or that patient's dim-light melatonin onset (DLMO) time. A novel feature of LassoRNet is a proposed variable selection scheme that minimizes the number of biomarkers needed to predict ICT. We evaluate LassoRNet on three longitudinal circadian transcriptome study data sets where DLMO time was determined for each study participant, and find that it consistently outperforms state-of-the-art methods in both ICT and DLMO time prediction. Notably, LassoRNet obtains a median absolute error of approximately one hour in ICT prediction and 30 to 40 minutes in DLMO time prediction, where DLMO time prediction is performed using three samples collected at sequential time points.
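A minimal sketch of how a median absolute error for clock-time predictions could be computed. It assumes times are given in hours on a 24-hour cycle and that errors wrap around midnight; the abstract does not say how the error is computed, and the function names and sample values below are hypothetical.

```python
from statistics import median


def circular_abs_error(pred_h, true_h, period=24.0):
    """Absolute difference between two clock times, wrapping around the period."""
    d = abs(pred_h - true_h) % period
    return min(d, period - d)


def median_abs_error(preds, truths):
    """Median of the wrapped absolute errors over paired predictions and truths."""
    return median(circular_abs_error(p, t) for p, t in zip(preds, truths))


# Illustrative values only (not from the paper): 23.5 h vs 0.5 h is a 1 h error,
# not a 23 h error, once the wrap-around is taken into account.
example_preds = [23.5, 1.0, 12.0]
example_truths = [0.5, 0.0, 14.0]
example_mae = median_abs_error(example_preds, example_truths)
```

The wrap-around matters because a plain absolute difference would heavily penalize predictions that fall just on the other side of midnight.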

arXiv.org

Microbial correlation: a semi-parametric model for investigating microbial co-metabolism arxiv.org/abs/2504.05450 .ME .AP

The gut microbiome plays a crucial role in human health, yet the mechanisms underlying host-microbiome interactions remain unclear, limiting its translational potential. Recent microbiome multiomics studies, particularly paired microbiome-metabolome studies (PM2S), provide valuable insights into gut metabolism as a key mediator of these interactions. Our preliminary data reveal strong correlations among certain gut metabolites, suggesting shared metabolic pathways and microbial co-metabolism. However, these findings are confounded by various factors, underscoring the need for a more rigorous statistical approach. Thus, we introduce microbial correlation, a novel metric that quantifies how two metabolites are co-regulated by the same gut microbes while accounting for confounders. Statistically, it is based on a partially linear model that isolates microbial-driven associations, and a consistent estimator is established based on semi-parametric theory. To improve efficiency, we develop a calibrated estimator with a parametric rate, maximizing the use of large external metagenomic datasets without paired metabolomic profiles. This calibrated estimator also enables efficient p-value calculation for identifying significant microbial co-metabolism signals. Through extensive numerical analysis, our method identifies important microbial co-metabolism patterns for healthy individuals, serving as a benchmark for future studies in diseased populations.

Bayesian Shrinkage in High-Dimensional VAR Models: A Comparative Study arxiv.org/abs/2504.05489 .ME .AP

High-dimensional vector autoregressive (VAR) models offer a versatile framework for multivariate time series analysis, yet face critical challenges from over-parameterization and uncertain lag order. In this paper, we systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and normal) and two frequentist regularization approaches (ridge and nonparametric shrinkage) under three carefully crafted simulation scenarios. These scenarios encompass (i) overfitting in a low-dimensional setting, (ii) sparse high-dimensional processes, and (iii) a combined scenario where both large dimension and overfitting complicate inference. We evaluate each method in quality of parameter estimation (root mean squared error, coverage, and interval length) and out-of-sample forecasting (one-step-ahead forecast RMSE). Our findings show that local-global Bayesian methods, particularly the horseshoe, dominate in maintaining accurate coverage and minimizing parameter error, even when the model is heavily over-parameterized. Frequentist ridge often yields competitive point forecasts but underestimates uncertainty, leading to sub-nominal coverage. A real-data application using macroeconomic variables from Canada illustrates how these methods perform in practice, reinforcing the advantages of local-global priors in stabilizing inference when dimension or lag order is inflated.
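The three parameter-estimation criteria named in the abstract (root mean squared error, coverage, and interval length) can be sketched as follows. This is an illustrative helper under stated assumptions, not the authors' evaluation code: the function name, the flat-list representation of VAR coefficients, and the sample numbers are all hypothetical.

```python
from math import sqrt


def evaluate_estimates(true_params, estimates, lower, upper):
    """Return (RMSE, coverage, mean interval length) for paired coefficient lists.

    lower/upper are the bounds of each coefficient's interval (credible or
    confidence); coverage is the share of true values falling inside them.
    """
    n = len(true_params)
    rmse = sqrt(sum((e - t) ** 2 for e, t in zip(estimates, true_params)) / n)
    coverage = sum(lo <= t <= hi for t, lo, hi in zip(true_params, lower, upper)) / n
    mean_length = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return rmse, coverage, mean_length


# Illustrative values only: four coefficients, one of which is missed by its interval.
rmse, coverage, mean_length = evaluate_estimates(
    true_params=[0.5, 0.0, 0.0, -0.3],
    estimates=[0.4, 0.1, 0.0, -0.3],
    lower=[0.2, -0.1, -0.2, -0.2],
    upper=[0.6, 0.3, 0.2, 0.0],
)
```

A method with narrow intervals and low coverage is the "underestimates uncertainty" pattern the abstract attributes to frequentist ridge.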

Adaptive Design for Contour Estimation from Computer Experiments with Quantitative and Qualitative Inputs arxiv.org/abs/2504.05498 .ME

Computer experiments with quantitative and qualitative inputs are widely used to study many scientific and engineering processes. Much of the existing work has focused on design and modeling or process optimization for such experiments. This paper proposes an adaptive design approach for estimating a contour from computer experiments with quantitative and qualitative inputs. A new criterion is introduced to search for the follow-up inputs. The key features of the proposed criterion are (a) the criterion yields adaptive search regions; and (b) it is region-based cooperative in that for each stage of the sequential procedure, the candidate points in the design space are divided into two disjoint groups using confidence bounds, and within each group, an acquisition function is used to select a candidate point. Of the two selected points, the procedure chooses the one that is closer to the contour level with the higher uncertainty, or the one that has higher uncertainty when the distance between its prediction and the contour level is within a threshold. The proposed approach provides empirically more accurate contour estimation than existing approaches, as illustrated in numerical examples and a real application. Theoretical justification of the proposed adaptive search region is given.

Bayesian Modal Regression for Forecast Combinations arxiv.org/abs/2504.03859 .ME

Forecast combination methods have traditionally emphasized symmetric loss functions, particularly squared error loss, with equally weighted combinations often justified as a robust approach under such criteria. However, these justifications do not extend to asymmetric loss functions, where optimally weighted combinations may provide superior predictive performance. This study introduces a novel contribution by incorporating modal regression into forecast combinations, offering a Bayesian hierarchical framework that models the conditional mode of the response through combinations of time-varying parameters and exponential discounting. The proposed approach utilizes error distributions characterized by asymmetry and heavy tails, specifically the asymmetric Laplace, asymmetric normal, and reverse Gumbel distributions. Simulated data validate the parameter estimation for the modal regression models, confirming the robustness of the proposed methodology. Application of these methodologies to a real-world analyst forecast dataset shows that modal regression with asymmetric Laplace errors outperforms mean regression based on two key performance metrics: the hit rate, which measures the accuracy of classifying the sign of revenue surprises, and the win rate, which assesses the proportion of forecasts surpassing the equally weighted consensus. These results underscore the presence of skewness and fat-tailed behavior in forecast combination errors for revenue forecasting, highlighting the advantages of modal regression in financial applications.
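The two performance metrics described in the abstract can be sketched as below. The exact definitions are assumptions made for illustration: a revenue surprise is taken to be the actual value minus the equally weighted consensus, a "hit" means the combined forecast lands on the same side of the consensus as the actual, and a "win" means the combined forecast has a smaller absolute error than the consensus. Function names and sample numbers are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hit_rate(forecasts, consensus, actuals):
    """Share of periods where the forecast gets the sign of the surprise right."""
    hits = sum(
        sign(f - c) == sign(a - c)
        for f, c, a in zip(forecasts, consensus, actuals)
    )
    return hits / len(actuals)


def win_rate(forecasts, consensus, actuals):
    """Share of periods where the forecast beats the consensus in absolute error."""
    wins = sum(
        abs(f - a) < abs(c - a)
        for f, c, a in zip(forecasts, consensus, actuals)
    )
    return wins / len(actuals)


# Illustrative values only (not from the analyst dataset in the paper).
forecasts = [105, 98, 101, 110]
consensus = [100, 100, 100, 100]
actuals = [104, 97, 99, 103]
print(hit_rate(forecasts, consensus, actuals))  # 0.75
print(win_rate(forecasts, consensus, actuals))  # 0.5
```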

MaxTDA: Robust Statistical Inference for Maximal Persistence in Topological Data Analysis arxiv.org/abs/2504.03897 .ME .AT .CO

Persistent homology is an area within topological data analysis (TDA) that can uncover holes of different dimensions (connected components, loops, voids, etc.) in data. The holes are characterized, in part, by how long they persist across different scales. Noisy data can result in many additional holes that are not true topological signal. Various robust TDA techniques have been proposed to reduce the number of noisy holes; however, these robust methods also tend to reduce the topological signal. This work introduces Maximal TDA (MaxTDA), a statistical framework addressing a limitation in TDA wherein robust inference techniques systematically underestimate the persistence of significant homological features. MaxTDA combines kernel density estimation with level-set thresholding via rejection sampling to generate consistent estimators for the maximal persistence features that minimize bias while maintaining robustness to noise and outliers. We establish the consistency of the sampling procedure and the stability of the maximal persistence estimator. The framework also enables statistical inference on topological features through rejection bands, constructed from quantiles that bound the estimator's deviation probability. MaxTDA is particularly valuable in applications where precise quantification of statistically significant topological features is essential for revealing underlying structural properties in complex datasets. Numerical simulations across varied datasets, including an example from exoplanet astronomy, highlight the effectiveness of MaxTDA in recovering true topological signals.

Confirmatory Biomarker Identification via Derandomized Knockoffs for Cox Regression with k-FWER Control arxiv.org/abs/2504.03907 .ME .AP

Selecting important features in high-dimensional survival analysis is critical for identifying confirmatory biomarkers while maintaining rigorous error control. In this paper, we propose a derandomized knockoffs procedure for Cox regression that enhances stability in feature selection while maintaining rigorous control over the k-familywise error rate (k-FWER). By aggregating across multiple randomized knockoff realizations, our approach mitigates the instability commonly observed with conventional knockoffs. Through extensive simulations, we demonstrate that our method consistently outperforms standard knockoffs in both selection power and error control. Moreover, we apply our procedure to a clinical dataset on primary biliary cirrhosis (PBC) to identify key prognostic biomarkers associated with patient survival. The results confirm the superior stability of the derandomized knockoffs method, allowing for a more reliable identification of important clinical variables. Additionally, our approach is applicable to datasets containing both continuous and categorical covariates, broadening its utility in real-world biomedical studies. This framework provides a robust and interpretable solution for high-dimensional survival analysis, making it particularly suitable for applications requiring precise and stable variable selection.
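The aggregation step mentioned in the abstract can be illustrated with a rough sketch. It assumes each randomized knockoff run has already produced a set of selected feature indices, and that a feature is kept when its selection frequency across runs reaches a threshold; the function name, the threshold `eta`, and the example sets are hypothetical, and the knockoff construction and the k-FWER calibration are not shown.

```python
from collections import Counter


def aggregate_selections(selection_sets, eta=0.5):
    """Keep features selected in at least a fraction eta of the runs.

    selection_sets: one collection of selected feature indices per knockoff run.
    eta: assumed frequency threshold in (0, 1]; not taken from the paper.
    """
    n_runs = len(selection_sets)
    counts = Counter(j for s in selection_sets for j in set(s))
    return sorted(j for j, c in counts.items() if c / n_runs >= eta)


# Illustrative values only: four runs over features indexed 0..3.
runs = [{0, 1}, {0, 2}, {0, 1, 3}, {0}]
stable = aggregate_selections(runs, eta=0.5)  # feature 0 (4/4) and feature 1 (2/4)
```

Features that appear in only one run (indices 2 and 3 here) are dropped, which is the sense in which aggregating across runs reduces the run-to-run instability of a single knockoff draw.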
