Integrating electrocardiogram and fundus images for early detection of cardiovascular diseases arxiv.org/abs/2504.10493 .IV .CV

Cardiovascular diseases (CVD) are a predominant health concern globally, emphasizing the need for advanced diagnostic techniques. In our research, we present a novel methodology that synergistically integrates ECG readings and retinal fundus images to facilitate early detection of CVDs and their triaging in order of disease priority. Recognizing the retina's intricate vascular network as a reflection of the cardiovascular system, along with the dynamic cardiac insights from ECG, we sought to provide a holistic diagnostic perspective. Initially, a Fast Fourier Transform (FFT) was applied to both the ECG and fundus images, transforming the data into the frequency domain. Subsequently, the Earth Mover's Distance (EMD) was computed for the frequency-domain features of both modalities. These EMD values were then concatenated, forming a comprehensive feature set that was fed into a neural network classifier. This approach, leveraging the FFT's spectral insights and EMD's capability to capture nuanced data differences, offers a robust representation for CVD classification. Preliminary tests yielded an accuracy of 84 percent, underscoring the potential of this combined diagnostic strategy. As we continue our research, we anticipate refining and validating the model further to enhance its clinical applicability in resource-limited healthcare ecosystems prevalent across the Indian subcontinent and the world at large.
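
The pipeline as described lends itself to a compact sketch. Below is a minimal, illustrative version assuming 1D ECG traces and 2D grayscale fundus images, with EMD taken against per-modality reference spectra; the reference spectra, data shapes, and classifier size are assumptions, not details from the paper.

```python
# Minimal sketch of the FFT -> EMD -> neural-network pipeline described above.
# Reference spectra, shapes, and labels are illustrative assumptions.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.neural_network import MLPClassifier

def ecg_spectrum(ecg):
    """Normalized FFT magnitude spectrum of a 1D ECG trace."""
    mag = np.abs(np.fft.rfft(ecg))
    return mag / mag.sum()

def fundus_spectrum(img):
    """Normalized 2D FFT magnitude spectrum, flattened to 1D (a simplification)."""
    mag = np.abs(np.fft.fft2(img)).ravel()
    return mag / mag.sum()

def emd_features(ecg, img, ecg_ref, img_ref):
    """One EMD value per modality, concatenated into the feature vector."""
    e, f = ecg_spectrum(ecg), fundus_spectrum(img)
    return np.array([
        wasserstein_distance(np.arange(e.size), np.arange(ecg_ref.size), e, ecg_ref),
        wasserstein_distance(np.arange(f.size), np.arange(img_ref.size), f, img_ref),
    ])

rng = np.random.default_rng(0)
ecg_ref = ecg_spectrum(rng.normal(size=512))          # stand-in reference spectra
img_ref = fundus_spectrum(rng.normal(size=(64, 64)))
X = np.stack([emd_features(rng.normal(size=512), rng.normal(size=(64, 64)),
                           ecg_ref, img_ref) for _ in range(40)])
y = rng.integers(0, 2, size=40)                       # toy CVD labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
```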

Remote Sensing Based Crop Health Classification Using NDVI and Fully Connected Neural Networks arxiv.org/abs/2504.10522 .IV .CV

Accurate crop health monitoring is not only essential for improving agricultural efficiency but also for ensuring sustainable food production in the face of environmental challenges. Traditional approaches often rely on visual inspection or simple NDVI measurements, which, though useful, fall short in detecting nuanced variations in crop stress and disease conditions. In this research, we propose a more sophisticated method that leverages NDVI data combined with a Fully Connected Neural Network (FCNN) to classify crop health with greater precision. The FCNN, trained using satellite imagery from various agricultural regions, is capable of identifying subtle distinctions between healthy crops, rust-affected plants, and other stressed conditions. Our approach not only achieved a classification accuracy of 97.80% but also significantly outperformed conventional models in terms of precision, recall, and F1-score. The ability to map the relationship between NDVI values and crop health using deep learning presents new opportunities for real-time, large-scale monitoring of agricultural fields, reducing manual effort and offering a scalable solution to address global food security.
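
A minimal sketch of the described approach: compute NDVI per pixel, summarize it per field patch, and train a small fully connected classifier. The band values, patch size, feature statistics, and class labels below are illustrative assumptions.

```python
# Sketch of NDVI-based crop-health classification with a small fully
# connected network. Band layout, patch size, and class labels are
# illustrative assumptions, not from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]."""
    return (nir - red) / (nir + red + 1e-12)

def patch_features(nir_patch, red_patch):
    """Summary statistics of the NDVI map for one field patch."""
    v = ndvi(nir_patch, red_patch).ravel()
    return np.array([v.mean(), v.std(), v.min(), v.max(),
                     np.percentile(v, 25), np.percentile(v, 75)])

rng = np.random.default_rng(0)
X = np.stack([patch_features(rng.uniform(0.2, 0.9, (16, 16)),
                             rng.uniform(0.05, 0.4, (16, 16)))
              for _ in range(60)])
y = rng.integers(0, 3, size=60)  # 0=healthy, 1=rust, 2=other stress
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000).fit(X, y)
```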

Imaging Transformer for MRI Denoising: a Scalable Model Architecture that enables SNR << 1 Imaging arxiv.org/abs/2504.10534 .med-ph .IV .SP

Purpose: To propose a flexible and scalable imaging transformer (IT) architecture with three attention modules for multi-dimensional imaging data and apply it to MRI denoising with very low input SNR. Methods: Three independent attention modules were developed: spatial local, spatial global, and frame attention. They capture long-range signal correlation and bring back the locality of information in images. An attention-cell-block design processes 5D tensors ([B, C, F, H, W]) for 2D, 2D+T, and 3D image data. A High Resolution Network (HRNet) backbone was built to hold the IT blocks. The training dataset consisted of 206,677 cine series and the test datasets had 7,267 series. Ten input SNR levels from 0.05 to 8.0 were tested. IT models were compared to seven convolutional and transformer baselines. To test scalability, four IT models with 27M to 218M parameters were trained. Two senior cardiologists reviewed the IT model outputs, from which the ejection fraction (EF) was measured and compared against the ground truth. Results: IT models significantly outperformed the other models over the tested SNR levels. The performance gap was most prominent at low SNR levels. The IT-218M model had the highest SSIM and PSNR, restoring good image quality and anatomical details even at SNR 0.2. Two experts agreed that at this SNR or above, the IT model output gave the same clinical interpretation as the ground truth. The model produced images that yielded accurate EF measurements compared to ground-truth values. Conclusions: The imaging transformer model offers strong performance, scalability, and versatility for MRI denoising. It recovers image quality suitable for confident clinical reading and accurate EF measurement, even at a very low input SNR of 0.2.
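
The frame-attention idea can be sketched compactly: treat each spatial location of a [B, C, F, H, W] tensor as a length-F sequence and attend across frames. The module below is an illustrative PyTorch sketch, not the paper's implementation.

```python
# Minimal sketch of "frame attention" over a 5D tensor [B, C, F, H, W]:
# each spatial location attends across frames. The module layout is an
# illustrative assumption.
import torch
import torch.nn as nn

class FrameAttention(nn.Module):
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                    # x: [B, C, F, H, W]
        b, c, f, h, w = x.shape
        # Fold spatial positions into the batch; attend along the F axis.
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(b, h, w, f, c).permute(0, 4, 3, 1, 2)

x = torch.randn(2, 32, 8, 16, 16)            # B=2, C=32, F=8, H=W=16
y = FrameAttention(32)(x)
assert y.shape == x.shape
```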

Secure Estimation of Battery Voltage Under Sensor Attacks: A Self-Learning Koopman Approach arxiv.org/abs/2504.10639 .SY .SY

A cloud-based battery management system (BMS) requires accurate terminal voltage measurement data to ensure optimal and safe charging of lithium-ion batteries. Unfortunately, an adversary can corrupt the battery terminal voltage data as it passes from the local BMS to the cloud BMS through the communication network, with the objective of under- or over-charging the battery. To ensure accurate terminal voltage data under such malicious sensor attacks, this paper investigates a Koopman-based secure terminal voltage estimation scheme using a two-stage error-compensated self-learning feedback. During the first stage of error correction, the potential Koopman prediction error is estimated to compensate for the error accumulation due to the linear approximation of the Koopman operator. The second stage of error compensation aims to recover the error arising from the higher-order dynamics of the lithium-ion battery that are missed by the self-learning strategy. Specifically, we propose two different methods for this second-stage error compensation. First, an interpretable empirical correction strategy is obtained using the open-circuit-voltage to state-of-charge mapping for the battery. Second, a Gaussian process regression-based data-driven method is explored. Finally, we demonstrate the efficacy of the proposed secure estimator using both empirical and data-driven corrections.
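
A hedged sketch of the two-stage idea: fit a lifted (Koopman-style) linear one-step predictor by least squares, then train a Gaussian process on its residuals as a data-driven second-stage correction. The lifting functions and toy voltage data are assumptions, not the paper's model.

```python
# Sketch of a Koopman-style lifted linear predictor with a second-stage
# Gaussian-process residual correction. Lifting functions and the toy
# "battery" trajectory are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def lift(x):
    """Simple polynomial lifting of the scalar state (an assumption)."""
    return np.array([x, x**2, x**3, 1.0])

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(0, 0.01, 500)) + 3.7    # toy voltage trajectory
Z = np.stack([lift(v) for v in traj[:-1]])           # lifted states
Zp = np.stack([lift(v) for v in traj[1:]])           # lifted next states
# EDMD-style least squares for the lifted linear operator K: Zp ~ Z @ K.
K, *_ = np.linalg.lstsq(Z, Zp, rcond=None)

pred = (Z @ K)[:, 0]                                 # predicted next voltage
residual = traj[1:] - pred                           # Koopman prediction error
# Stage-2 correction: GP regression of the residual on the current voltage.
gp = GaussianProcessRegressor().fit(traj[:-1].reshape(-1, 1), residual)
corrected = pred + gp.predict(traj[:-1].reshape(-1, 1))
```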

Correcting Domain Shifts in Electric Motor Vibration Data for Unseen Operating Conditions arxiv.org/abs/2504.10661 .SP

This paper addresses the problem of domain shifts in electric motor vibration data created by new operating conditions in testing scenarios, focusing on bearing fault detection and diagnosis (FDD). The proposed method combines the Harmonic Feature Space (HFS) with regression to correct for frequency and energy differences in steady-state data, enabling accurate FDD on unseen operating conditions within the range of the training conditions. The HFS aligns harmonics across different operating frequencies, while regression compensates for energy variations, preserving the relative magnitude of vibrations critical for fault detection. The proposed approach is evaluated on a detection problem using experimental data from a Belt-Starter Generator (BSG) electric motor, with test conditions differing from the training conditions by at least 1000 RPM and 5 Nm. Results demonstrate that the method outperforms traditional analysis techniques, achieving a 94% detection rate and effectively reducing domain shifts. The approach is computationally efficient, requires only healthy data for training, and is well suited for real-world applications where the exact operating conditions cannot be predetermined.
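
The harmonic alignment can be sketched as follows: sample the spectrum at integer multiples of the operating frequency so that features line up across speeds, then regress spectral energy on speed to compensate level differences. All signal parameters below are illustrative, not from the paper.

```python
# Sketch of a harmonic feature space: sample the spectrum at multiples of
# the operating frequency so features align across speeds, then regress
# energy on speed to compensate level shifts. Constants are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

def harmonic_features(signal, fs, f0, n_harmonics=10):
    """Spectral amplitude at k*f0 for k = 1..n_harmonics."""
    spec = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return np.array([spec[np.argmin(np.abs(freqs - k * f0))]
                     for k in range(1, n_harmonics + 1)])

rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(0, 1, 1 / fs)
feats, rpms = [], []
for rpm in (1000, 2000, 3000):                  # healthy training speeds
    f0 = rpm / 60
    sig = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
    feats.append(harmonic_features(sig + 0.01 * rng.normal(size=t.size), fs, f0))
    rpms.append([rpm])
feats = np.stack(feats)
# Energy-vs-speed regression used to normalize features at unseen speeds.
energy_model = LinearRegression().fit(rpms, feats.sum(axis=1))
```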

Spectrum Sharing in STAR-RIS-assisted UAV with NOMA for Cognitive Radio Networks arxiv.org/abs/2504.10691 .SY .SY

As an emerging technology, the simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) can improve the spectrum efficiency (SE) of primary users (PUs) and secondary users (SUs) in cognitive radio (CR) networks by mitigating the interference of the incident signals. A STAR-RIS-assisted unmanned aerial vehicle (UAV) can fully cover a dynamic environment through high mobility and fast deployment. Given the dynamic air-to-ground channels, however, the STAR-RIS-assisted UAV faces a challenge in configuring its elements' coefficients (i.e., the reflection and transmission amplitudes and phases). Hence, to meet the requirements of dynamic channel determination while improving SE, this paper formulates the sum-rate maximization of both PUs and SUs under non-orthogonal multiple access (NOMA) in a CR network, jointly optimizing the trajectory and transmission-reflection beamforming design of the STAR-RIS-assisted UAV as well as the power allocation. Since the non-convex joint optimization problem includes coupled optimization variables, we develop an alternating optimization algorithm. Simulation results examine: 1) the impact of the significant parameters, 2) the performance of different intelligent surface modes and STAR-RIS operating protocols, 3) the joint trajectory and beamforming design with fixed and mobile users, and 4) STAR-RIS capabilities such as interference mitigation and the dynamic variation of element roles.
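
For concreteness, here is a toy computation of the two-user downlink NOMA sum rate of the kind the joint optimization maximizes, with the stronger user applying successive interference cancellation (SIC); the channel gains, powers, and noise level are placeholders, not values from the paper.

```python
# Toy two-user downlink NOMA sum-rate computation. Channel gains, powers,
# and noise are placeholder values, not from the paper.
import numpy as np

def noma_sum_rate(g_strong, g_weak, p_strong, p_weak, noise=1e-9):
    """Strong user applies SIC; weak user decodes with interference."""
    sinr_weak = g_weak * p_weak / (g_weak * p_strong + noise)
    sinr_strong = g_strong * p_strong / noise   # after SIC removes weak user's signal
    return np.log2(1 + sinr_strong) + np.log2(1 + sinr_weak)

# More power to the weak user, as is standard in NOMA power allocation.
print(noma_sum_rate(g_strong=1e-6, g_weak=1e-8, p_strong=0.3, p_weak=0.7))
```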

UAV-Assisted MEC for Disaster Response: Stackelberg Game-Based Resource Optimization arxiv.org/abs/2504.07119 .SP

GIGA: Generalizable Sparse Image-driven Gaussian Avatars arxiv.org/abs/2504.07144 .IV

Driving a high-quality and photorealistic full-body human avatar, from only a few RGB cameras, is a challenging problem that has become increasingly relevant with emerging virtual reality technologies. To democratize such technology, a promising solution may be a generalizable method that takes sparse multi-view images of an unseen person and then generates photoreal free-view renderings of that identity. However, the current state of the art does not scale to very large datasets and thus lacks diversity and photorealism. To address this problem, we propose a novel, generalizable full-body model for rendering photoreal humans in free viewpoint, driven by sparse multi-view video. For the first time in the literature, our model can scale training to thousands of subjects while maintaining high photorealism. At its core, we introduce a MultiHeadUNet architecture, which takes sparse multi-view images in texture space as input and predicts Gaussian primitives represented as 2D texels on top of a human body mesh. Importantly, we represent the sparse-view image information, body shape, and Gaussian parameters in 2D so that we can design a deep and scalable architecture based entirely on 2D convolutions and attention mechanisms. At test time, our method synthesizes an articulated 3D Gaussian-based avatar for unseen identities from as few as four input views and a tracked body template. Our method surpasses prior work by a significant margin in terms of cross-subject generalization capability as well as photorealism.
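
A minimal sketch of the texel-based output: a 2D convolutional head over texture-space features that predicts Gaussian parameters per texel. The parameter layout and activations below are assumptions for illustration, not the paper's architecture.

```python
# Minimal sketch of predicting Gaussian-primitive parameters as 2D texels
# with a convolutional head. The layout (3 position offsets, 3 scales,
# 4-dim rotation quaternion, 1 opacity, 3 color) is an assumption.
import torch
import torch.nn as nn

class TexelGaussianHead(nn.Module):
    PARAMS = 3 + 3 + 4 + 1 + 3   # offset, scale, quaternion, opacity, color

    def __init__(self, in_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, self.PARAMS, 1),
        )

    def forward(self, feat):      # feat: [B, C, H, W] texture-space features
        out = self.net(feat)
        offset, scale, quat, opacity, color = torch.split(out, [3, 3, 4, 1, 3], dim=1)
        return {
            "offset": offset,                                 # relative to mesh surface
            "scale": torch.exp(scale),                        # keep scales positive
            "rotation": torch.nn.functional.normalize(quat, dim=1),
            "opacity": torch.sigmoid(opacity),
            "color": torch.sigmoid(color),
        }

gaussians = TexelGaussianHead()(torch.randn(1, 64, 128, 128))
```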

Examining Joint Demosaicing and Denoising for Single-, Quad-, and Nona-Bayer Patterns arxiv.org/abs/2504.07145 .IV

Camera sensors have color filters arranged in a mosaic layout, traditionally following the Bayer pattern. Demosaicing is a critical step that camera hardware applies to obtain a full-channel RGB image. Many smartphones now have multiple sensors with different patterns, such as Quad-Bayer or Nona-Bayer. Most modern deep network-based models perform joint demosaicing and denoising, with the current strategy being to train a separate network per pattern. Relying on individual models per pattern incurs additional memory overhead and makes it challenging to switch quickly between cameras. In this work, we analyze strategies for joint demosaicing and denoising for the three main mosaic layouts (1x1 Single-Bayer, 2x2 Quad-Bayer, and 3x3 Nona-Bayer). We find that concatenating a three-channel mosaic embedding to the input image and training with a unified demosaicing architecture yields results that outperform existing Quad-Bayer and Nona-Bayer models and are comparable to Single-Bayer models. Additionally, we describe a maskout strategy that enhances model performance and facilitates dead pixel correction -- a step often overlooked by existing AI-based demosaicing models. As part of this effort, we captured a new demosaicing dataset of 638 RAW images containing challenging scenes, with patches annotated for training, validation, and testing.
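
The mosaic embedding can be sketched directly: build a per-pixel one-hot map of the CFA color and concatenate it to the raw mosaic, so a single network knows which pattern it is decoding. The 2x2 RGGB tile below is standard Bayer; Quad- and Nona-Bayer tiles would be built analogously. This is an illustrative reading of the idea, not the paper's exact embedding.

```python
# Sketch of a three-channel mosaic embedding: a per-pixel one-hot map of
# the CFA color (R, G, B) concatenated to the raw mosaic.
import numpy as np

BAYER_RGGB = np.array([[0, 1],            # 0=R, 1=G, 2=B
                       [1, 2]])

def mosaic_embedding(raw, tile):
    """raw: [H, W] mosaic; returns [4, H, W] = raw + 3-channel CFA one-hot."""
    h, w = raw.shape
    th, tw = tile.shape
    cfa = np.tile(tile, (h // th + 1, w // tw + 1))[:h, :w]
    onehot = np.stack([(cfa == c).astype(raw.dtype) for c in range(3)])
    return np.concatenate([raw[None], onehot], axis=0)

x = mosaic_embedding(np.random.rand(8, 8).astype(np.float32), BAYER_RGGB)
assert x.shape == (4, 8, 8)
```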

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing arxiv.org/abs/2504.07146 .IV

We present an implicit video representation for occlusions, appearance, and motion disentanglement from monocular videos, which we call Video SPatiotemporal Splines (VideoSPatS). Unlike previous methods that map time and coordinates to deformation and canonical colors, our VideoSPatS maps input coordinates into Spatial and Color Spline deformation fields $D_s$ and $D_c$, which disentangle motion and appearance in videos. With spline-based parametrization, our method naturally generates temporally consistent flow and guarantees long-term temporal consistency, which is crucial for convincing video editing. Using multiple prediction branches, our VideoSPatS model also performs layer separation between the latent video and the selected occluder. By disentangling occlusions, appearance, and motion, our method enables better spatiotemporal modeling and editing of diverse videos, including in-the-wild talking head videos with challenging occlusions, shadows, and specularities while maintaining an appropriate canonical space for editing. We also present general video modeling results on the DAVIS and CoDeF datasets, as well as our own talking head video dataset collected from open-source web videos. Extensive ablations show the combination of $D_s$ and $D_c$ under neural splines can overcome motion and appearance ambiguities, paving the way for more advanced video editing models.
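
To illustrate why a spline parametrization yields temporally consistent motion, the sketch below evaluates a smooth deformation curve (and its analytic time derivative) from a handful of control points; the control-point count, cubic basis, and 2D offsets are illustrative assumptions, not the paper's spline model.

```python
# Sketch of spline-parametrized deformation: per-frame motion is read off
# a smooth curve through a few control points rather than predicted
# independently per frame, so temporal consistency holds by construction.
import numpy as np
from scipy.interpolate import CubicSpline

n_ctrl, n_frames = 6, 48
ctrl_t = np.linspace(0.0, 1.0, n_ctrl)
ctrl_deform = np.random.default_rng(0).normal(0, 0.05, (n_ctrl, 2))  # 2D offsets

spline = CubicSpline(ctrl_t, ctrl_deform, axis=0)
t = np.linspace(0.0, 1.0, n_frames)
deform = spline(t)          # [n_frames, 2], smooth over time
velocity = spline(t, 1)     # analytic time derivative -> consistent flow
```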

Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model arxiv.org/abs/2504.07148 .IV

Image restoration (IR) often faces various complex and unknown degradations in real-world scenarios, such as noise, blurring, compression artifacts, and low resolution. Training specific models for specific degradations may lead to poor generalization. To handle multiple degradations simultaneously, All-in-One models may sacrifice performance on certain types of degradation and still struggle with degradations unseen during training. Existing IR agents rely on multimodal large language models (MLLMs) and a time-consuming rolling-back selection strategy that neglects image quality. As a result, they may misinterpret degradations and incur high time and computational costs by conducting unnecessary IR tasks in a redundant order. To address these issues, we propose a Quality-Driven agent (Q-Agent) via Chain-of-Thought (CoT) restoration. Specifically, our Q-Agent consists of robust degradation perception and quality-driven greedy restoration. The former module first fine-tunes the MLLM and uses CoT to decompose multi-degradation perception into single-degradation perception tasks, enhancing the perception ability of MLLMs. The latter employs objective image quality assessment (IQA) metrics to determine the optimal restoration sequence and execute the corresponding restoration algorithms. Experimental results demonstrate that our Q-Agent achieves superior IR performance compared to existing All-in-One models.
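
The quality-driven greedy loop can be sketched in a few lines: at each step, try each candidate restoration operator, keep the one that most improves a no-reference IQA score, and stop when nothing helps. The operators and the `iqa_score` callable are hypothetical stand-ins for real restoration models and metrics.

```python
# Sketch of quality-driven greedy restoration. `operators` maps names to
# restoration callables; `iqa_score` is a no-reference quality metric
# where higher is better. Both are hypothetical stand-ins.
def greedy_restore(image, operators, iqa_score, max_steps=4):
    current, score, history = image, iqa_score(image), []
    for _ in range(max_steps):
        trials = [(iqa_score(op(current)), name, op)
                  for name, op in operators.items()]
        best_score, best_name, best_op = max(trials, key=lambda t: t[0])
        if best_score <= score:          # no candidate improves quality: stop
            break
        # Re-applies the winning op; cache the trial outputs in practice.
        current, score = best_op(current), best_score
        history.append(best_name)
    return current, history
```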

Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis arxiv.org/abs/2504.07248 .SY .SY

An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we investigate the impact of carbon-sensitive consumers using equilibrium analysis. Specifically, we expand the commonly used equilibrium model for electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on an average carbon intensity signal. This analysis represents an idealized situation for carbon-sensitive loads, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice where carbon intensity signals only become known to consumers a posteriori (i.e., after the market has already been cleared). We include both illustrative examples and larger numerical simulations, including benchmarking with other methods, to illuminate the contributions and limitations of carbon-sensitive loads in power system emission reductions.
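
For intuition, here is a toy computation of the average-carbon-intensity signal such loads respond to, with a naive shift toward the cleanest hour; the dispatch and emission rates are made-up numbers, and the paper's equilibrium analysis asks precisely whether shifting on this average signal actually reduces emissions.

```python
# Toy sketch of the average-carbon-intensity (ACI) signal and a naive
# load shift to the minimum-ACI hour. All numbers are made up.
import numpy as np

gen_mw = np.array([[80, 20],   # hour 0: [gas, wind] MW dispatched
                   [50, 50],   # hour 1
                   [30, 70]])  # hour 2
emis_t_per_mwh = np.array([0.4, 0.0])          # tCO2/MWh per generator

aci = gen_mw @ emis_t_per_mwh / gen_mw.sum(axis=1)   # tCO2/MWh per hour
flexible_load_mwh = 10.0
target_hour = int(np.argmin(aci))              # consumer shifts to cleanest hour
print(f"ACI by hour: {aci}, shift {flexible_load_mwh} MWh to hour {target_hour}")
```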

MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution arxiv.org/abs/2504.07308 .IV .CV

Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
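
The gated aggregation step can be sketched independently of the diffusion details: a gating network assigns per-token probabilities over experts, and the expert outputs are blended accordingly. The placeholder MLP experts below stand in for the paper's diffusion-based denoisers.

```python
# Minimal sketch of MoE-style aggregation: a gating network produces
# per-token probabilities over experts and their outputs are blended.
# The MLP experts are placeholders, not diffusion denoisers.
import torch
import torch.nn as nn

class MoEAggregate(nn.Module):
    def __init__(self, dim=64, n_experts=3):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts))

    def forward(self, tokens):                                  # tokens: [B, N, dim]
        probs = torch.softmax(self.gate(tokens), dim=-1)        # [B, N, E]
        outs = torch.stack([e(tokens) for e in self.experts], -1)  # [B, N, dim, E]
        return (outs * probs.unsqueeze(2)).sum(-1)              # gated blend

y = MoEAggregate()(torch.randn(2, 16, 64))
```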
