Integrating electrocardiogram and fundus images for early detection of cardiovascular diseases arxiv.org/abs/2504.10493 .IV .CV

Cardiovascular diseases (CVD) are a predominant health concern globally, emphasizing the need for advanced diagnostic techniques. In our research, we present a novel methodology that synergistically integrates ECG readings and retinal fundus images to facilitate early detection of CVDs and their triaging in order of disease priority. Recognizing the retina's intricate vascular network as a reflection of the cardiovascular system, along with the dynamic cardiac insights from ECG, we sought to provide a holistic diagnostic perspective. Initially, a Fast Fourier Transform (FFT) was applied to both the ECG and fundus images, transforming the data into the frequency domain. Subsequently, the Earth Mover's Distance (EMD) was computed for the frequency-domain features of both modalities. These EMD values were then concatenated, forming a comprehensive feature set that was fed into a neural network classifier. This approach, leveraging the FFT's spectral insights and EMD's capability to capture nuanced data differences, offers a robust representation for CVD classification. Preliminary tests yielded an accuracy of 84 percent, underscoring the potential of this combined diagnostic strategy. As we continue our research, we anticipate refining and validating the model further to enhance its clinical applicability in resource-limited healthcare ecosystems prevalent across the Indian subcontinent and the world at large.
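
The pipeline as described lends itself to a compact sketch. Below is a minimal, illustrative version assuming 1D ECG traces and 2D grayscale fundus images, with EMD taken against per-modality reference spectra; the reference spectra, data shapes, and classifier size are assumptions, not details from the paper.

```python
# Minimal sketch of the FFT -> EMD -> neural-network pipeline described above.
# Reference spectra, shapes, and labels are illustrative assumptions.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.neural_network import MLPClassifier

def ecg_spectrum(ecg):
    """Normalized FFT magnitude spectrum of a 1D ECG trace."""
    mag = np.abs(np.fft.rfft(ecg))
    return mag / mag.sum()

def fundus_spectrum(img):
    """Normalized 2D FFT magnitude spectrum, flattened to 1D (a simplification)."""
    mag = np.abs(np.fft.fft2(img)).ravel()
    return mag / mag.sum()

def emd_features(ecg, img, ecg_ref, img_ref):
    """One EMD value per modality, concatenated into the feature vector."""
    e, f = ecg_spectrum(ecg), fundus_spectrum(img)
    return np.array([
        wasserstein_distance(np.arange(e.size), np.arange(ecg_ref.size), e, ecg_ref),
        wasserstein_distance(np.arange(f.size), np.arange(img_ref.size), f, img_ref),
    ])

rng = np.random.default_rng(0)
ecg_ref = ecg_spectrum(rng.normal(size=512))          # stand-in reference spectra
img_ref = fundus_spectrum(rng.normal(size=(64, 64)))
X = np.stack([emd_features(rng.normal(size=512), rng.normal(size=(64, 64)),
                           ecg_ref, img_ref) for _ in range(40)])
y = rng.integers(0, 2, size=40)                       # toy CVD labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
```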

Remote Sensing Based Crop Health Classification Using NDVI and Fully Connected Neural Networks arxiv.org/abs/2504.10522 .IV .CV

Accurate crop health monitoring is not only essential for improving agricultural efficiency but also for ensuring sustainable food production in the face of environmental challenges. Traditional approaches often rely on visual inspection or simple NDVI measurements, which, though useful, fall short in detecting nuanced variations in crop stress and disease conditions. In this research, we propose a more sophisticated method that leverages NDVI data combined with a Fully Connected Neural Network (FCNN) to classify crop health with greater precision. The FCNN, trained using satellite imagery from various agricultural regions, is capable of identifying subtle distinctions between healthy crops, rust-affected plants, and other stressed conditions. Our approach not only achieved a classification accuracy of 97.80% but also significantly outperformed conventional models in terms of precision, recall, and F1-score. The ability to map the relationship between NDVI values and crop health using deep learning presents new opportunities for real-time, large-scale monitoring of agricultural fields, reducing manual effort and offering a scalable solution to address global food security.
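
A minimal sketch of the described approach: compute NDVI per pixel, summarize it per field patch, and train a small fully connected classifier. The band values, patch size, feature statistics, and class labels below are illustrative assumptions.

```python
# Sketch of NDVI-based crop-health classification with a small fully
# connected network. Band layout, patch size, and class labels are
# illustrative assumptions, not from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]."""
    return (nir - red) / (nir + red + 1e-12)

def patch_features(nir_patch, red_patch):
    """Summary statistics of the NDVI map for one field patch."""
    v = ndvi(nir_patch, red_patch).ravel()
    return np.array([v.mean(), v.std(), v.min(), v.max(),
                     np.percentile(v, 25), np.percentile(v, 75)])

rng = np.random.default_rng(0)
X = np.stack([patch_features(rng.uniform(0.2, 0.9, (16, 16)),
                             rng.uniform(0.05, 0.4, (16, 16)))
              for _ in range(60)])
y = rng.integers(0, 3, size=60)  # 0=healthy, 1=rust, 2=other stress
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000).fit(X, y)
```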

Imaging Transformer for MRI Denoising: a Scalable Model Architecture that enables SNR << 1 Imaging arxiv.org/abs/2504.10534 .med-ph .IV .SP

Purpose: To propose a flexible and scalable imaging transformer (IT) architecture with three attention modules for multi-dimensional imaging data and apply it to MRI denoising with very low input SNR. Methods: Three independent attention modules were developed: spatial local, spatial global, and frame attention. They capture long-range signal correlation and bring back the locality of information in images. An attention-cell-block design processes 5D tensors ([B, C, F, H, W]) for 2D, 2D+T, and 3D image data. A High Resolution Network (HRNet) backbone was built to hold the IT blocks. The training dataset consisted of 206,677 cine series and the test datasets had 7,267 series. Ten input SNR levels from 0.05 to 8.0 were tested. IT models were compared to seven convolutional and transformer baselines. To test scalability, four IT models with 27M to 218M parameters were trained. Two senior cardiologists reviewed the IT model outputs, from which the ejection fraction (EF) was measured and compared against the ground truth. Results: IT models significantly outperformed the other models over the tested SNR levels. The performance gap was most prominent at low SNR levels. The IT-218M model had the highest SSIM and PSNR, restoring good image quality and anatomical details even at SNR 0.2. Two experts agreed that at this SNR or above, the IT model output gave the same clinical interpretation as the ground truth. The model produced images that yielded accurate EF measurements compared to ground-truth values. Conclusions: The imaging transformer model offers strong performance, scalability, and versatility for MRI denoising. It recovers image quality suitable for confident clinical reading and accurate EF measurement, even at a very low input SNR of 0.2.
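
The frame-attention idea can be sketched compactly: treat each spatial location of a [B, C, F, H, W] tensor as a length-F sequence and attend across frames. The module below is an illustrative PyTorch sketch, not the paper's implementation.

```python
# Minimal sketch of "frame attention" over a 5D tensor [B, C, F, H, W]:
# each spatial location attends across frames. The module layout is an
# illustrative assumption.
import torch
import torch.nn as nn

class FrameAttention(nn.Module):
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                    # x: [B, C, F, H, W]
        b, c, f, h, w = x.shape
        # Fold spatial positions into the batch; attend along the F axis.
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(b, h, w, f, c).permute(0, 4, 3, 1, 2)

x = torch.randn(2, 32, 8, 16, 16)            # B=2, C=32, F=8, H=W=16
y = FrameAttention(32)(x)
assert y.shape == x.shape
```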

Secure Estimation of Battery Voltage Under Sensor Attacks: A Self-Learning Koopman Approach arxiv.org/abs/2504.10639 .SY .SY

A cloud-based battery management system (BMS) requires accurate terminal voltage measurement data to ensure optimal and safe charging of lithium-ion batteries. Unfortunately, an adversary can corrupt the battery terminal voltage data as it passes from the local BMS to the cloud BMS through the communication network, with the objective of under- or over-charging the battery. To ensure accurate terminal voltage data under such malicious sensor attacks, this paper investigates a Koopman-based secure terminal voltage estimation scheme using a two-stage error-compensated self-learning feedback. During the first stage of error correction, the potential Koopman prediction error is estimated to compensate for the error accumulation due to the linear approximation of the Koopman operator. The second stage of error compensation aims to recover the error arising from the higher-order dynamics of the lithium-ion battery that are missed by the self-learning strategy. Specifically, we propose two different methods for this second-stage error compensation. First, an interpretable empirical correction strategy is obtained using the open-circuit-voltage to state-of-charge mapping for the battery. Second, a Gaussian process regression-based data-driven method is explored. Finally, we demonstrate the efficacy of the proposed secure estimator using both empirical and data-driven corrections.
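
A hedged sketch of the two-stage idea: fit a lifted (Koopman-style) linear one-step predictor by least squares, then train a Gaussian process on its residuals as a data-driven second-stage correction. The lifting functions and toy voltage data are assumptions, not the paper's model.

```python
# Sketch of a Koopman-style lifted linear predictor with a second-stage
# Gaussian-process residual correction. Lifting functions and the toy
# "battery" trajectory are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def lift(x):
    """Simple polynomial lifting of the scalar state (an assumption)."""
    return np.array([x, x**2, x**3, 1.0])

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(0, 0.01, 500)) + 3.7    # toy voltage trajectory
Z = np.stack([lift(v) for v in traj[:-1]])           # lifted states
Zp = np.stack([lift(v) for v in traj[1:]])           # lifted next states
# EDMD-style least squares for the lifted linear operator K: Zp ~ Z @ K.
K, *_ = np.linalg.lstsq(Z, Zp, rcond=None)

pred = (Z @ K)[:, 0]                                 # predicted next voltage
residual = traj[1:] - pred                           # Koopman prediction error
# Stage-2 correction: GP regression of the residual on the current voltage.
gp = GaussianProcessRegressor().fit(traj[:-1].reshape(-1, 1), residual)
corrected = pred + gp.predict(traj[:-1].reshape(-1, 1))
```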

Correcting Domain Shifts in Electric Motor Vibration Data for Unseen Operating Conditions arxiv.org/abs/2504.10661 .SP

This paper addresses the problem of domain shifts in electric motor vibration data created by new operating conditions in testing scenarios, focusing on bearing fault detection and diagnosis (FDD). The proposed method combines the Harmonic Feature Space (HFS) with regression to correct for frequency and energy differences in steady-state data, enabling accurate FDD on unseen operating conditions within the range of the training conditions. The HFS aligns harmonics across different operating frequencies, while regression compensates for energy variations, preserving the relative magnitude of vibrations critical for fault detection. The proposed approach is evaluated on a detection problem using experimental data from a Belt-Starter Generator (BSG) electric motor, with test conditions differing from the training conditions by at least 1000 RPM and 5 Nm. Results demonstrate that the method outperforms traditional analysis techniques, achieving a 94% detection rate and effectively reducing domain shifts. The approach is computationally efficient, requires only healthy data for training, and is well suited for real-world applications where the exact operating conditions cannot be predetermined.
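
The harmonic alignment can be sketched as follows: sample the spectrum at integer multiples of the operating frequency so that features line up across speeds, then regress spectral energy on speed to compensate level differences. All signal parameters below are illustrative, not from the paper.

```python
# Sketch of a harmonic feature space: sample the spectrum at multiples of
# the operating frequency so features align across speeds, then regress
# energy on speed to compensate level shifts. Constants are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

def harmonic_features(signal, fs, f0, n_harmonics=10):
    """Spectral amplitude at k*f0 for k = 1..n_harmonics."""
    spec = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return np.array([spec[np.argmin(np.abs(freqs - k * f0))]
                     for k in range(1, n_harmonics + 1)])

rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(0, 1, 1 / fs)
feats, rpms = [], []
for rpm in (1000, 2000, 3000):                  # healthy training speeds
    f0 = rpm / 60
    sig = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
    feats.append(harmonic_features(sig + 0.01 * rng.normal(size=t.size), fs, f0))
    rpms.append([rpm])
feats = np.stack(feats)
# Energy-vs-speed regression used to normalize features at unseen speeds.
energy_model = LinearRegression().fit(rpms, feats.sum(axis=1))
```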

Spectrum Sharing in STAR-RIS-assisted UAV with NOMA for Cognitive Radio Networks arxiv.org/abs/2504.10691 .SY .SY

As an emerging technology, the simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) can improve the spectrum efficiency (SE) of primary users (PUs) and secondary users (SUs) in cognitive radio (CR) networks by mitigating the interference of the incident signals. A STAR-RIS-assisted unmanned aerial vehicle (UAV) can fully cover a dynamic environment through high mobility and fast deployment. Given the dynamic air-to-ground channels, however, the STAR-RIS-assisted UAV faces a challenge in configuring its elements' coefficients (i.e., the reflection and transmission amplitudes and phases). Hence, to meet the requirements of dynamic channel determination while improving SE, this paper formulates the sum-rate maximization of both PUs and SUs under non-orthogonal multiple access (NOMA) in a CR network, jointly optimizing the trajectory and transmission-reflection beamforming design of the STAR-RIS-assisted UAV as well as the power allocation. Since the non-convex joint optimization problem includes coupled optimization variables, we develop an alternating optimization algorithm. Simulation results examine: 1) the impact of the significant parameters, 2) the performance of different intelligent surface modes and STAR-RIS operating protocols, 3) the joint trajectory and beamforming design with fixed and mobile users, and 4) STAR-RIS capabilities such as interference mitigation and the dynamic variation of element roles.
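
For concreteness, here is a toy computation of the two-user downlink NOMA sum rate of the kind the joint optimization maximizes, with the stronger user applying successive interference cancellation (SIC); the channel gains, powers, and noise level are placeholders, not values from the paper.

```python
# Toy two-user downlink NOMA sum-rate computation. Channel gains, powers,
# and noise are placeholder values, not from the paper.
import numpy as np

def noma_sum_rate(g_strong, g_weak, p_strong, p_weak, noise=1e-9):
    """Strong user applies SIC; weak user decodes with interference."""
    sinr_weak = g_weak * p_weak / (g_weak * p_strong + noise)
    sinr_strong = g_strong * p_strong / noise   # after SIC removes weak user's signal
    return np.log2(1 + sinr_strong) + np.log2(1 + sinr_weak)

# More power to the weak user, as is standard in NOMA power allocation.
print(noma_sum_rate(g_strong=1e-6, g_weak=1e-8, p_strong=0.3, p_weak=0.7))
```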

UAV-Assisted MEC for Disaster Response: Stackelberg Game-Based Resource Optimization arxiv.org/abs/2504.07119 .SP

GIGA: Generalizable Sparse Image-driven Gaussian Avatars arxiv.org/abs/2504.07144 .IV

Driving a high-quality and photorealistic full-body human avatar, from only a few RGB cameras, is a challenging problem that has become increasingly relevant with emerging virtual reality technologies. To democratize such technology, a promising solution may be a generalizable method that takes sparse multi-view images of an unseen person and then generates photoreal free-view renderings of that identity. However, the current state of the art does not scale to very large datasets and thus lacks diversity and photorealism. To address this problem, we propose a novel, generalizable full-body model for rendering photoreal humans in free viewpoint, driven by sparse multi-view video. For the first time in the literature, our model can scale training to thousands of subjects while maintaining high photorealism. At its core, we introduce a MultiHeadUNet architecture, which takes sparse multi-view images in texture space as input and predicts Gaussian primitives represented as 2D texels on top of a human body mesh. Importantly, we represent the sparse-view image information, body shape, and Gaussian parameters in 2D so that we can design a deep and scalable architecture based entirely on 2D convolutions and attention mechanisms. At test time, our method synthesizes an articulated 3D Gaussian-based avatar for unseen identities from as few as four input views and a tracked body template. Our method surpasses prior work by a significant margin in terms of cross-subject generalization capability as well as photorealism.
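
A minimal sketch of the texel-based output: a 2D convolutional head over texture-space features that predicts Gaussian parameters per texel. The parameter layout and activations below are assumptions for illustration, not the paper's architecture.

```python
# Minimal sketch of predicting Gaussian-primitive parameters as 2D texels
# with a convolutional head. The layout (3 position offsets, 3 scales,
# 4-dim rotation quaternion, 1 opacity, 3 color) is an assumption.
import torch
import torch.nn as nn

class TexelGaussianHead(nn.Module):
    PARAMS = 3 + 3 + 4 + 1 + 3   # offset, scale, quaternion, opacity, color

    def __init__(self, in_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, self.PARAMS, 1),
        )

    def forward(self, feat):      # feat: [B, C, H, W] texture-space features
        out = self.net(feat)
        offset, scale, quat, opacity, color = torch.split(out, [3, 3, 4, 1, 3], dim=1)
        return {
            "offset": offset,                                 # relative to mesh surface
            "scale": torch.exp(scale),                        # keep scales positive
            "rotation": torch.nn.functional.normalize(quat, dim=1),
            "opacity": torch.sigmoid(opacity),
            "color": torch.sigmoid(color),
        }

gaussians = TexelGaussianHead()(torch.randn(1, 64, 128, 128))
```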

Examining Joint Demosaicing and Denoising for Single-, Quad-, and Nona-Bayer Patterns arxiv.org/abs/2504.07145 .IV

Camera sensors have color filters arranged in a mosaic layout, traditionally following the Bayer pattern. Demosaicing is a critical step that camera hardware applies to obtain a full-channel RGB image. Many smartphones now have multiple sensors with different patterns, such as Quad-Bayer or Nona-Bayer. Most modern deep network-based models perform joint demosaicing and denoising, with the current strategy being to train a separate network per pattern. Relying on individual models per pattern incurs additional memory overhead and makes it challenging to switch quickly between cameras. In this work, we analyze strategies for joint demosaicing and denoising for the three main mosaic layouts (1x1 Single-Bayer, 2x2 Quad-Bayer, and 3x3 Nona-Bayer). We find that concatenating a three-channel mosaic embedding to the input image and training with a unified demosaicing architecture yields results that outperform existing Quad-Bayer and Nona-Bayer models and are comparable to Single-Bayer models. Additionally, we describe a maskout strategy that enhances model performance and facilitates dead pixel correction -- a step often overlooked by existing AI-based demosaicing models. As part of this effort, we captured a new demosaicing dataset of 638 RAW images containing challenging scenes, with patches annotated for training, validation, and testing.
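
The mosaic embedding can be sketched directly: build a per-pixel one-hot map of the CFA color and concatenate it to the raw mosaic, so a single network knows which pattern it is decoding. The 2x2 RGGB tile below is standard Bayer; Quad- and Nona-Bayer tiles would be built analogously. This is an illustrative reading of the idea, not the paper's exact embedding.

```python
# Sketch of a three-channel mosaic embedding: a per-pixel one-hot map of
# the CFA color (R, G, B) concatenated to the raw mosaic.
import numpy as np

BAYER_RGGB = np.array([[0, 1],            # 0=R, 1=G, 2=B
                       [1, 2]])

def mosaic_embedding(raw, tile):
    """raw: [H, W] mosaic; returns [4, H, W] = raw + 3-channel CFA one-hot."""
    h, w = raw.shape
    th, tw = tile.shape
    cfa = np.tile(tile, (h // th + 1, w // tw + 1))[:h, :w]
    onehot = np.stack([(cfa == c).astype(raw.dtype) for c in range(3)])
    return np.concatenate([raw[None], onehot], axis=0)

x = mosaic_embedding(np.random.rand(8, 8).astype(np.float32), BAYER_RGGB)
assert x.shape == (4, 8, 8)
```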

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing arxiv.org/abs/2504.07146 .IV

We present an implicit video representation for occlusions, appearance, and motion disentanglement from monocular videos, which we call Video SPatiotemporal Splines (VideoSPatS). Unlike previous methods that map time and coordinates to deformation and canonical colors, our VideoSPatS maps input coordinates into Spatial and Color Spline deformation fields $D_s$ and $D_c$, which disentangle motion and appearance in videos. With spline-based parametrization, our method naturally generates temporally consistent flow and guarantees long-term temporal consistency, which is crucial for convincing video editing. Using multiple prediction branches, our VideoSPatS model also performs layer separation between the latent video and the selected occluder. By disentangling occlusions, appearance, and motion, our method enables better spatiotemporal modeling and editing of diverse videos, including in-the-wild talking head videos with challenging occlusions, shadows, and specularities while maintaining an appropriate canonical space for editing. We also present general video modeling results on the DAVIS and CoDeF datasets, as well as our own talking head video dataset collected from open-source web videos. Extensive ablations show the combination of $D_s$ and $D_c$ under neural splines can overcome motion and appearance ambiguities, paving the way for more advanced video editing models.
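
To illustrate why a spline parametrization yields temporally consistent motion, the sketch below evaluates a smooth deformation curve (and its analytic time derivative) from a handful of control points; the control-point count, cubic basis, and 2D offsets are illustrative assumptions, not the paper's spline model.

```python
# Sketch of spline-parametrized deformation: per-frame motion is read off
# a smooth curve through a few control points rather than predicted
# independently per frame, so temporal consistency holds by construction.
import numpy as np
from scipy.interpolate import CubicSpline

n_ctrl, n_frames = 6, 48
ctrl_t = np.linspace(0.0, 1.0, n_ctrl)
ctrl_deform = np.random.default_rng(0).normal(0, 0.05, (n_ctrl, 2))  # 2D offsets

spline = CubicSpline(ctrl_t, ctrl_deform, axis=0)
t = np.linspace(0.0, 1.0, n_frames)
deform = spline(t)          # [n_frames, 2], smooth over time
velocity = spline(t, 1)     # analytic time derivative -> consistent flow
```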

Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model arxiv.org/abs/2504.07148 .IV

Image restoration (IR) often faces various complex and unknown degradations in real-world scenarios, such as noise, blurring, compression artifacts, and low resolution. Training specific models for specific degradations may lead to poor generalization. To handle multiple degradations simultaneously, All-in-One models may sacrifice performance on certain types of degradation and still struggle with degradations unseen during training. Existing IR agents rely on multimodal large language models (MLLMs) and a time-consuming rolling-back selection strategy that neglects image quality. As a result, they may misinterpret degradations and incur high time and computational costs by conducting unnecessary IR tasks in a redundant order. To address these issues, we propose a Quality-Driven agent (Q-Agent) via Chain-of-Thought (CoT) restoration. Specifically, our Q-Agent consists of robust degradation perception and quality-driven greedy restoration. The former module first fine-tunes the MLLM and uses CoT to decompose multi-degradation perception into single-degradation perception tasks, enhancing the perception ability of MLLMs. The latter employs objective image quality assessment (IQA) metrics to determine the optimal restoration sequence and execute the corresponding restoration algorithms. Experimental results demonstrate that our Q-Agent achieves superior IR performance compared to existing All-in-One models.
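
The quality-driven greedy loop can be sketched in a few lines: at each step, try each candidate restoration operator, keep the one that most improves a no-reference IQA score, and stop when nothing helps. The operators and the `iqa_score` callable are hypothetical stand-ins for real restoration models and metrics.

```python
# Sketch of quality-driven greedy restoration. `operators` maps names to
# restoration callables; `iqa_score` is a no-reference quality metric
# where higher is better. Both are hypothetical stand-ins.
def greedy_restore(image, operators, iqa_score, max_steps=4):
    current, score, history = image, iqa_score(image), []
    for _ in range(max_steps):
        trials = [(iqa_score(op(current)), name, op)
                  for name, op in operators.items()]
        best_score, best_name, best_op = max(trials, key=lambda t: t[0])
        if best_score <= score:          # no candidate improves quality: stop
            break
        # Re-applies the winning op; cache the trial outputs in practice.
        current, score = best_op(current), best_score
        history.append(best_name)
    return current, history
```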

Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis arxiv.org/abs/2504.07248 .SY .SY

An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we investigate the impact of carbon-sensitive consumers using equilibrium analysis. Specifically, we expand the commonly used equilibrium model for electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on an average carbon intensity signal. This analysis represents an idealized situation for carbon-sensitive loads, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice where carbon intensity signals only become known to consumers a posteriori (i.e., after the market has already been cleared). We include both illustrative examples and larger numerical simulations, including benchmarking with other methods, to illuminate the contributions and limitations of carbon-sensitive loads in power system emission reductions.
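
For intuition, here is a toy computation of the average-carbon-intensity signal such loads respond to, with a naive shift toward the cleanest hour; the dispatch and emission rates are made-up numbers, and the paper's equilibrium analysis asks precisely whether shifting on this average signal actually reduces emissions.

```python
# Toy sketch of the average-carbon-intensity (ACI) signal and a naive
# load shift to the minimum-ACI hour. All numbers are made up.
import numpy as np

gen_mw = np.array([[80, 20],   # hour 0: [gas, wind] MW dispatched
                   [50, 50],   # hour 1
                   [30, 70]])  # hour 2
emis_t_per_mwh = np.array([0.4, 0.0])          # tCO2/MWh per generator

aci = gen_mw @ emis_t_per_mwh / gen_mw.sum(axis=1)   # tCO2/MWh per hour
flexible_load_mwh = 10.0
target_hour = int(np.argmin(aci))              # consumer shifts to cleanest hour
print(f"ACI by hour: {aci}, shift {flexible_load_mwh} MWh to hour {target_hour}")
```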

MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution arxiv.org/abs/2504.07308 .IV .CV

Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
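
The gated aggregation step can be sketched independently of the diffusion details: a gating network assigns per-token probabilities over experts, and the expert outputs are blended accordingly. The placeholder MLP experts below stand in for the paper's diffusion-based denoisers.

```python
# Minimal sketch of MoE-style aggregation: a gating network produces
# per-token probabilities over experts and their outputs are blended.
# The MLP experts are placeholders, not diffusion denoisers.
import torch
import torch.nn as nn

class MoEAggregate(nn.Module):
    def __init__(self, dim=64, n_experts=3):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts))

    def forward(self, tokens):                                  # tokens: [B, N, dim]
        probs = torch.softmax(self.gate(tokens), dim=-1)        # [B, N, E]
        outs = torch.stack([e(tokens) for e in self.experts], -1)  # [B, N, dim, E]
        return (outs * probs.unsqueeze(2)).sum(-1)              # gated blend

y = MoEAggregate()(torch.randn(2, 16, 64))
```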
