Towards Robust Foundation Models for Digital Pathology arxiv.org/abs/2507.17845 -bio.QM .IV .AI .CV .LG

Biomedical Foundation Models (FMs) are rapidly transforming AI-enabled healthcare research and entering clinical validation. However, their susceptibility to learning non-biological technical features -- including variations in surgical/endoscopic techniques, laboratory procedures, and scanner hardware -- poses risks for clinical deployment. We present the first systematic investigation of pathology FM robustness to non-biological features. Our work (i) introduces measures to quantify FM robustness, (ii) demonstrates the consequences of limited robustness, and (iii) proposes a framework for FM robustification to mitigate these issues. Specifically, we developed PathoROB, a robustness benchmark with three novel metrics, including the robustness index, and four datasets covering 28 biological classes from 34 medical centers. Our experiments reveal robustness deficits across all 20 evaluated FMs, and substantial robustness differences between them. We found that non-robust FM representations can cause major diagnostic downstream errors and clinical blunders that prevent safe clinical adoption. Using more robust FMs and post-hoc robustification considerably reduced (but did not yet eliminate) the risk of such errors. This work establishes that robustness evaluation is essential for validating pathology FMs before clinical adoption and demonstrates that future FM development must integrate robustness as a core design principle. PathoROB provides a blueprint for assessing robustness across biomedical domains, guiding FM improvement efforts towards more robust, representative, and clinically deployable AI systems that prioritize biological information over technical artifacts.

arXiv.org

Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging arxiv.org/abs/2507.17869 .IV .CV .LG

Nitrogen (N) is one of the most crucial nutrients in vineyards, affecting plant growth and subsequent products such as wine and juice. Because soil N has high spatial and temporal variability, it is desirable to accurately estimate the N concentration of grapevine leaves and manage fertilization at the individual plant level to optimally meet plant needs. In this study, we used in-field hyperspectral images with wavelengths ranging from 400 to 1000 nm of four different grapevine cultivars collected from distinct vineyards and over two growth stages during two growing seasons to develop models for predicting N concentration at the leaf-level and canopy-level. After image processing, two feature selection methods were employed to identify the optimal set of spectral bands that were responsive to leaf N concentrations. The selected spectral bands were used to train and test two different Machine Learning (ML) models, Gradient Boosting and XGBoost, for predicting nitrogen concentrations. The comparison of selected bands for both leaf-level and canopy-level datasets showed that most of the spectral regions identified by the feature selection methods were consistent across both methods and the dataset types (leaf- and canopy-level datasets), particularly in the key regions, 500-525 nm, 650-690 nm, 750-800 nm, and 900-950 nm. These findings indicated the robustness of these spectral regions for predicting nitrogen content. The results for N prediction demonstrated that the ML model achieved an R-squared of 0.49 for canopy-level data and an R-squared of 0.57 for leaf-level data, despite using different sets of selected spectral bands for each analysis level. The study demonstrated the potential of using in-field hyperspectral imaging and the use of spectral data in integrated feature selection and ML techniques to monitor N status in vineyards.
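
For readers unfamiliar with the R-squared figures quoted above, the following is a minimal sketch of how the coefficient of determination is commonly computed. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical leaf N concentrations (percent dry weight) and model outputs.
measured = [2.0, 2.5, 3.0, 3.5]
predicted = [2.2, 2.4, 3.1, 3.3]
print(round(r_squared(measured, predicted), 2))
```

A value of 1 means the predictions match the measurements exactly, and a value of 0 means the model does no better than always predicting the mean.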

Time and Frequency Synchronization for Multiuser OTFS in Uplink arxiv.org/abs/2507.17966 .SP

In this paper, we propose time and frequency synchronization techniques for uplink multiuser OTFS (MU-OTFS) systems in high-mobility scenarios. This work focuses on accurately estimating and correcting timing offsets (TOs) and carrier frequency offsets (CFOs). Specifically, TO estimation is essential for locating users' pilots on the delay-time plane, while CFO estimation enhances channel estimation accuracy. First, we propose a TO estimation technique for an existing multiuser pilot structure in MU-OTFS. We replace the impulse pilot (IMP) in this pilot structure with a more practical pilot with a cyclic prefix (PCP), referred to as single-user-inspired PCP (SU-PCP). This structure employs different Zadoff-Chu (ZC) sequences, which enables pilot separation via correlation at the receiver side. Consequently, we introduce a correlation-based TO estimation technique for uplink MU-OTFS using this pilot structure. Next, a spectrally efficient and practical pilot pattern is proposed, where each user transmits a PCP within a shared pilot region on the delay-Doppler plane, referred to as MU-PCP. At the receiver, the second TO estimation technique utilizes a bank of filters to separate different users' signals and accurately estimate their TOs. Then, we derive a mathematical threshold range to enhance TO estimation accuracy by finding the first major peak in the correlation function rather than relying solely on the highest peak. After locating the received users' pilot signals using one of the proposed TO estimation techniques, our proposed CFO estimation technique reduces the multi-dimensional maximum likelihood (ML) search problem into multiple one-dimensional search problems. In this technique, we apply the Chebyshev polynomials of the first kind basis expansion model (CPF-BEM) to effectively handle the time-variations of the channel in obtaining the CFO estimates for all the users.
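
The correlation-based idea in the abstract above can be illustrated with a toy sketch. It assumes a noiseless, single-path channel with no Doppler or CFO, two users with distinct Zadoff-Chu roots, and a fixed threshold ratio standing in for the threshold range derived in the paper. All function names, the sequence length 31, and the roots 5 and 7 are illustrative choices, not the authors' design.

```python
import cmath


def zadoff_chu(root, length):
    """Zadoff-Chu sequence of odd length; root must be coprime with length."""
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length)
            for n in range(length)]


def circular_correlation(received, reference):
    """Magnitude of the circular cross-correlation at every candidate delay."""
    n = len(reference)
    return [abs(sum(received[(k + d) % n] * reference[k].conjugate()
                    for k in range(n)))
            for d in range(n)]


def estimate_timing_offset(received, reference, threshold_ratio=0.5):
    """Return the first delay whose correlation exceeds a fraction of the peak.

    Picking the first major peak rather than the global maximum mirrors the
    idea in the abstract; the 0.5 ratio is an assumption of this sketch.
    """
    corr = circular_correlation(received, reference)
    threshold = threshold_ratio * max(corr)
    return next(d for d, c in enumerate(corr) if c >= threshold)


# Two users share the pilot region with different roots and different delays.
N = 31
user_a = zadoff_chu(5, N)
user_b = zadoff_chu(7, N)
received = [user_a[(n - 4) % N] + user_b[(n - 9) % N] for n in range(N)]

print(estimate_timing_offset(received, user_a),
      estimate_timing_offset(received, user_b))
```

Because Zadoff-Chu sequences with different roots have low cross-correlation, correlating against each user's own sequence separates the two pilots even though they overlap in time.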

Benchmarking of Deep Learning Methods for Generic MRI Multi-Organ Abdominal Segmentation arxiv.org/abs/2507.17971 .IV .CV

Recent advances in deep learning have led to robust automated tools for segmentation of abdominal computed tomography (CT). Meanwhile, segmentation of magnetic resonance imaging (MRI) is substantially more challenging due to the inherent signal variability and the increased effort required for annotating training datasets. Hence, existing approaches are trained on limited sets of MRI sequences, which might limit their generalizability. To characterize the landscape of MRI abdominal segmentation tools, we present here a comprehensive benchmarking of the three state-of-the-art and open-source models: MRSegmentator, MRISegmentator-Abdomen, and TotalSegmentator MRI. Since these models are trained using labor-intensive manual annotation cycles, we also introduce and evaluate ABDSynth, a SynthSeg-based model purely trained on widely available CT segmentations (no real images). More generally, we assess accuracy and generalizability by leveraging three public datasets (not seen by any of the evaluated methods during their training), which span all major manufacturers, five MRI sequences, as well as a variety of subject conditions, voxel resolutions, and fields-of-view. Our results reveal that MRSegmentator achieves the best performance and is most generalizable. In contrast, ABDSynth yields slightly less accurate results, but its relaxed requirements in training data make it an alternative when the annotation budget is limited. The evaluation code and datasets are given for future benchmarking at https://github.com/deepakri201/AbdoBench, along with inference code and weights for ABDSynth.

Optimizing VO2max Prediction in Gamified Cardiac Assessment: Leveraging Effective Feature Selection and Refined Protocols for Robust Models arxiv.org/abs/2507.14138 .SP

VO2max is a critical indicator of cardiopulmonary fitness, reflecting the maximum amount of oxygen the body can utilize during intense exercise. Accurately measuring VO2max is essential for assessing cardiovascular health and predicting outcomes in clinical settings. However, current methods for VO2max estimation, such as Cardiopulmonary Exercise Testing (CPET), require expensive equipment and the supervision of trained personnel, limiting accessibility for large-scale screening. Preliminary efforts have been made to create a more accessible method, such as the Cardiopulmonary Spot Jog Test (CPSJT). Unfortunately, these early attempts yielded high error margins, rendering them unsuitable for widespread use. In our study, we address these shortcomings by refining the CPSJT protocol to improve prediction accuracy. A crucial contribution is improved feature extraction, which includes gender, body mass index, aerobic duration, and anaerobic duration. This targeted approach helps in streamlining the model to enhance prediction precision while minimizing the risk of overfitting. In a cohort of 44 participants from the Indian population, we assessed the performance of various machine learning models using these features. With Stratified 5-Fold Cross-Validation, the Root Mean Squared Error (RMSE) values were 5.78 for Linear Regression, 5.15 for Random Forest, and 5.17 for Support Vector Regression. All models demonstrated strong test correlations and low RMSE values, underscoring their robust and reliable performance.
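
As a rough illustration of how a cross-validated RMSE like those above is obtained, the sketch below averages per-fold RMSE over five folds. It makes several assumptions not found in the abstract: the split is a plain (unstratified) round-robin assignment, the predictor is a hypothetical mean-only baseline rather than any of the models named, and the VO2max values are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n_samples, n_folds):
    """Assign sample i to fold i % n_folds (plain split, not stratified)."""
    return [[i for i in range(n_samples) if i % n_folds == f]
            for f in range(n_folds)]


def cross_validated_rmse(targets, n_folds=5):
    """Average held-out RMSE of a mean-only baseline across the folds."""
    fold_scores = []
    for test_idx in kfold_indices(len(targets), n_folds):
        held_out = set(test_idx)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        baseline = sum(train) / len(train)  # hypothetical stand-in model
        test = [targets[i] for i in test_idx]
        fold_scores.append(rmse(test, [baseline] * len(test)))
    return sum(fold_scores) / len(fold_scores)


# Invented VO2max values (mL/kg/min) for ten participants.
vo2max = [38.0, 42.5, 35.1, 47.3, 40.2, 44.8, 36.9, 41.0, 39.4, 45.6]
print(cross_validated_rmse(vo2max))
```

A stratified split would additionally balance some grouping variable across folds; the abstract does not say which variable was used, so that step is left out here.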

DIVER-0 : A Fully Channel Equivariant EEG Foundation Model arxiv.org/abs/2507.14141 .SP .AI .LG

Electroencephalography (EEG) is a non-invasive technique widely used in brain-computer interfaces and clinical applications, yet existing EEG foundation models face limitations in modeling spatio-temporal brain dynamics and lack channel permutation equivariance, preventing robust generalization across diverse electrode configurations. To address these challenges, we propose DIVER-0, a novel EEG foundation model that demonstrates how full spatio-temporal attention, rather than segregated spatial or temporal processing, achieves superior performance when properly designed with Rotary Position Embedding (RoPE) for temporal relationships and binary attention biases for channel differentiation. We also introduce Sliding Temporal Conditional Positional Encoding (STCPE), which improves upon existing conditional positional encoding approaches by maintaining both temporal translation equivariance and channel permutation equivariance, enabling robust adaptation to arbitrary electrode configurations unseen during pretraining. Experimental results demonstrate that DIVER-0 achieves competitive performance with only 10% of pretraining data while maintaining consistent results across all channel permutation conditions, validating its effectiveness for cross-dataset generalization and establishing key design principles for handling the inherent heterogeneity of neural recording setups.

Graph Convolutional Neural Networks to Model the Brain for Insomnia arxiv.org/abs/2507.14147 -bio.NC .SP .LG

Insomnia affects a large share of the world's population and can have a wide range of causes. Existing treatments for insomnia have been linked with many side effects, such as headaches and dizziness. As such, there is a clear need for improved insomnia treatment. Brain modelling has helped with assessing the effects of brain pathology on brain network dynamics and with supporting clinical decisions in the treatment of conditions such as Alzheimer's disease and epilepsy. However, such models have not been developed for insomnia. Therefore, this project attempts to understand the characteristics of the brain of individuals experiencing insomnia using continuous long-duration EEG data. Brain networks are derived based on functional connectivity and spatial distance between EEG channels. The power spectral density of the channels is then computed for the major brain wave frequency bands. A graph convolutional neural network (GCNN) model is then trained to capture the functional characteristics associated with insomnia and configured for the classification task to judge performance. Results indicated that a 50-second non-overlapping sliding window was the most suitable choice for EEG segmentation. This approach achieved a classification accuracy of 70% at window level and 68% at subject level. Additionally, the omission of EEG channels C4-P4, F4-C4 and C4-A1 caused higher degradation in model performance than the removal of other channels. These channel electrodes are positioned near brain regions known to exhibit atypical levels of functional connectivity in individuals with insomnia, which can explain such results.
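
The difference between window-level and subject-level accuracy can be made concrete with a small sketch. It cuts a recording into non-overlapping 50-second windows and then combines per-window labels into one label per subject. The abstract does not say how window predictions were aggregated, so the majority vote here is an assumption, as are the function names and the toy sampling rate.

```python
from collections import Counter


def split_into_windows(samples, sampling_rate_hz, window_seconds=50):
    """Cut a 1-D recording into non-overlapping windows, dropping the tail."""
    window_len = int(sampling_rate_hz * window_seconds)
    n_windows = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len]
            for i in range(n_windows)]


def subject_level_label(window_labels):
    """Majority vote over window predictions (assumed aggregation rule).

    Ties resolve to the label that appears first in the input.
    """
    return Counter(window_labels).most_common(1)[0][0]


# Toy recording: 250 samples at 2 Hz gives two full 50-second windows.
recording = list(range(250))
windows = split_into_windows(recording, sampling_rate_hz=2)
print(len(windows), len(windows[0]))

# Hypothetical classifier outputs for one subject's windows.
print(subject_level_label(["insomnia", "control", "insomnia"]))
```

Under this kind of aggregation, a subject can be classified correctly even when some of their individual windows are not, which is why the two accuracy figures can differ.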

Visible Light Indoor Positioning with a Single LED and Distributed Single-Element OIRS: An Iterative Approach with Adaptive Beam Steering arxiv.org/abs/2507.14148 .SP .IT .IT

The integration of Optical Intelligent Reflective Surfaces (OIRSs) into Visible Light Communication (VLC) systems is gaining momentum as a valid alternative to RF technologies, harnessing the existing lighting infrastructures and the vast unlicensed optical spectrum to enable higher spectral efficiency, improved resilience to Line-of-Sight (LoS) blockages, and enhanced positioning capabilities. This paper investigates the problem of localizing a low-cost Photo Detector (PD) in a VLC-based indoor environment consisting of only a single Light Emitting Diode (LED) as an active anchor, and multiple spatially distributed single-element OIRSs. We formulate the problem within an indirect, computationally efficient localization framework: first, the optimal Maximum Likelihood (ML) estimators of the LoS and Non-Line-of-Sight (NLoS) distances are derived, using a suitable OIRS activation strategy to prevent interference. To overcome the grid-based optimization required by the ML NLoS estimator, we devise a novel algorithm based on an unstructured noise variance transformation, which admits a closed-form solution. The set of estimated LoS/NLoS distances is then used within a low-complexity localization algorithm combining an Iterative Weighted Least Squares (IWLS) procedure, whose weights are set according to the inverse of the Cramér-Rao Lower Bound (CRLB), with an adaptive beam steering strategy that allows the OIRS network to dynamically align with the PD, without any prior knowledge of its position. Accordingly, we derive the CRLB for both LoS/NLoS distance estimation and PD position estimation. Simulation results demonstrate the effectiveness of our approach in terms of localization accuracy, robustness against OIRS misalignment conditions, and the low number of iterations required to attain the theoretical bounds.

Self-DANA: A Resource-Efficient Channel-Adaptive Self-Supervised Approach for ECG Foundation Models arxiv.org/abs/2507.14151 .SP .AI .LG

Foundation Models (FMs) are large-scale machine learning models trained on extensive, diverse datasets that can be adapted to a wide range of downstream tasks with minimal fine-tuning. In the last two years, interest in FMs has also grown for applications in the cardiological field to analyze the electrocardiogram (ECG) signals. One of the key properties of FMs is their transferability to a wide range of downstream scenarios. With the spread of wearable and portable devices, keen interest in learning from reduced-channel configurations has arisen. However, the adaptation of ECG FMs to downstream scenarios with fewer available channels still has to be properly investigated. In this work, we propose Self-DANA, a novel, easy-to-integrate solution that makes self-supervised architectures adaptable to a reduced number of input channels, ensuring resource efficiency and high performance. We also introduce Random Lead Selection, a novel augmentation technique to pre-train models in a more robust and channel-agnostic way. Our experimental results on five reduced-channel configurations demonstrate that Self-DANA significantly enhances resource efficiency while reaching state-of-the-art performance. It requires up to 69.3% less peak CPU memory, 34.4% less peak GPU memory, about 17% less average epoch CPU time, and about 24% less average epoch GPU time.

Machine learning-enabled river water quality monitoring using lithography-free 3D-printed sensors arxiv.org/abs/2507.14152 .ins-det .SP .SY .LG .SY

River water quality monitoring is important for aquatic life, livestock, and humans because clean water is critical to meeting food demand during the global food crisis. Excessive contaminants, including phosphate, deplete dissolved oxygen and trigger eutrophication, leading to serious health and ecological problems. Continuous sensors that track phosphate levels can therefore help prevent eutrophication. In this work we present a lithography-free phosphate sensor (P-sensor) that detects phosphate in river water at parts-per-billion levels. The device uses a solid-state indicator electrode formed by 3D-printed periodic polymer patterns (8 um feature size) coated with a thin phosphate ion-selective membrane. The P-sensor detects as little as 1 ppb phosphate across 0 - 475 ppm with a response time under 30 seconds. We validated the sensor on Rappahannock River water, Virginia (less than 0.8 ppm phosphate) at sites upstream and downstream of a sewage treatment plant and benchmarked the results against a commercial phosphate meter. A feed-forward neural network was trained to predict phosphate levels, achieving a mean-squared error below 1e-3, zero standard deviation, and a Pearson correlation coefficient of 0.997 for river samples. These results demonstrate a practical tool for continuous water-quality monitoring that can inform stakeholders and policymakers and ultimately improve public health.

Extreme Value Theory-based Distributed Interference Prediction for 6G Industrial Sub-networks arxiv.org/abs/2507.14155 .SP .IT .IT

Interference prediction that accounts for extreme and rare events remains a key challenge for ultra-densely deployed sub-networks (SNs) requiring hyper-reliable low-latency communication (HRLLC), particularly under dynamic mobility, rapidly varying channel statistics, and sporadic traffic. This paper proposes a novel calibrated interference tail prediction framework, a hybrid statistical and machine learning (ML) approach that integrates an inverted quantile patch transformer (iQPTransformer) within extreme value theory (EVT). It captures interference dynamics and tail behavior while quantifying uncertainty to provide statistical coverage guarantees. Its effectiveness is demonstrated by leveraging the estimated interference tail distribution to design predictive, risk-aware resource allocation. In resource-constrained SN scenarios, we introduce the split-iQPTransformer, enabling collaborative training by distributing neural network components between sensor-actuator (SA) pairs and the SN controller, while maintaining minimal performance disparity compared to the centralized iQPTransformer. The framework effectively handles deep fading, random traffic, and time-division duplexing (TDD) misalignments and is resilient to rare and extreme interference events. Extensive evaluations are performed under two mobility models and two realistic SN traffic patterns, using a spatially consistent 3GPP channel model across all scenarios. Experimental results show consistent achievement of block error rate (BLER) targets beyond the 95th percentile in the hyper-reliable regime, significantly outperforming baseline approaches.
