UAV-Assisted MEC for Disaster Response: Stackelberg Game-Based Resource Optimization arxiv.org/abs/2504.07119 .SP

GIGA: Generalizable Sparse Image-driven Gaussian Avatars arxiv.org/abs/2504.07144 .IV

Driving a high-quality and photorealistic full-body human avatar, from only a few RGB cameras, is a challenging problem that has become increasingly relevant with emerging virtual reality technologies. To democratize such technology, a promising solution may be a generalizable method that takes sparse multi-view images of an unseen person and then generates photoreal free-view renderings of that identity. However, the current state of the art is not scalable to very large datasets and thus lacks diversity and photorealism. To address this problem, we propose a novel, generalizable full-body model for rendering photoreal humans in free viewpoint, as driven by sparse multi-view video. For the first time in the literature, our model can scale up training to thousands of subjects while maintaining high photorealism. At the core, we introduce a MultiHeadUNet architecture, which takes sparse multi-view images in texture space as input and predicts Gaussian primitives represented as 2D texels on top of a human body mesh. Importantly, we represent sparse-view image information, body shape, and the Gaussian parameters in 2D so that we can design a deep and scalable architecture entirely based on 2D convolutions and attention mechanisms. At test time, our method synthesizes an articulated 3D Gaussian-based avatar from as few as four input views and a tracked body template for unseen identities. Our method excels over prior works by a significant margin in terms of cross-subject generalization capability as well as photorealism.
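
Not the authors' MultiHeadUNet, but a minimal sketch of the general idea the abstract describes: a small 2D convolutional head that maps texture-space features to per-texel Gaussian parameters (offset, rotation, scale, opacity, color) anchored on a body mesh. Channel counts, activations, and the 14-parameter split are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's architecture): a 2D conv head that turns
# texture-space features into one Gaussian primitive per texel. The parameter
# split (3 offset + 4 quaternion + 3 scale + 1 opacity + 3 color) is assumed.
import torch
import torch.nn as nn

class TexelGaussianHead(nn.Module):
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.out = nn.Conv2d(128, 14, 1)

    def forward(self, feat):                     # feat: (B, C, H, W) texture-space features
        p = self.out(self.backbone(feat))
        offset   = p[:, 0:3]                     # displacement from the mesh anchor point
        rotation = nn.functional.normalize(p[:, 3:7], dim=1)  # unit quaternion
        scale    = torch.exp(p[:, 7:10])         # positive anisotropic scales
        opacity  = torch.sigmoid(p[:, 10:11])
        color    = torch.sigmoid(p[:, 11:14])
        return offset, rotation, scale, opacity, color

head = TexelGaussianHead()
feats = torch.randn(1, 64, 256, 256)             # fused multi-view features in UV space
gaussians = head(feats)                          # one Gaussian per texel of the body UV map
```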

Examining Joint Demosaicing and Denoising for Single-, Quad-, and Nona-Bayer Patterns arxiv.org/abs/2504.07145 .IV

Camera sensors have color filters arranged in a mosaic layout, traditionally following the Bayer pattern. Demosaicing is a critical step that camera hardware applies to obtain a full-channel RGB image. Many smartphones now have multiple sensors with different patterns, such as Quad-Bayer or Nona-Bayer. Most modern deep network-based models perform joint demosaicing and denoising, with the current strategy being to train a separate network per pattern. Relying on individual models per pattern incurs additional memory overhead and makes it challenging to switch quickly between cameras. In this work, we are interested in analyzing strategies for joint demosaicing and denoising for the three main mosaic layouts (1x1 Single-Bayer, 2x2 Quad-Bayer, and 3x3 Nona-Bayer). We found that concatenating a three-channel mosaic embedding to the input image and training with a unified demosaicing architecture yields results that outperform existing Quad-Bayer and Nona-Bayer models and are comparable to Single-Bayer models. Additionally, we describe a maskout strategy that enhances model performance and facilitates dead pixel correction -- a step often overlooked by existing AI-based demosaicing models. As part of this effort, we captured a new demosaicing dataset of 638 RAW images containing challenging scenes, with patches annotated for training, validation, and testing.
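
One way to read the "three-channel mosaic embedding" is as a per-pixel R/G/B indicator map obtained by tiling the base RGGB pattern at block sizes 1, 2, and 3, then concatenating it with the raw mosaic so a single network knows which layout it was given. A sketch of that reading follows; the paper's exact encoding may differ.

```python
# Hypothetical sketch of a 3-channel mosaic embedding for Single-, Quad- and
# Nona-Bayer CFAs: a per-pixel one-hot map of the filter color, concatenated
# to the 1-channel raw mosaic so one unified network can see the layout.
import numpy as np

def mosaic_embedding(h, w, block=1):
    """block=1 -> Bayer, 2 -> Quad-Bayer, 3 -> Nona-Bayer (RGGB base)."""
    base = np.array([[0, 1],
                     [1, 2]])                                  # 0=R, 1=G, 2=B in an RGGB tile
    tile = np.kron(base, np.ones((block, block), dtype=int))   # expand to a 2b x 2b tile
    reps = (h // tile.shape[0] + 1, w // tile.shape[1] + 1)
    cfa = np.tile(tile, reps)[:h, :w]
    onehot = np.stack([(cfa == c).astype(np.float32) for c in range(3)], axis=0)
    return onehot                                               # shape (3, h, w)

raw = np.random.rand(1, 12, 12).astype(np.float32)              # stand-in raw mosaic
net_input = np.concatenate([raw, mosaic_embedding(12, 12, block=3)], axis=0)
print(net_input.shape)                                          # (4, 12, 12)
```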

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing arxiv.org/abs/2504.07146 .IV

We present an implicit video representation for occlusions, appearance, and motion disentanglement from monocular videos, which we call Video SPatiotemporal Splines (VideoSPatS). Unlike previous methods that map time and coordinates to deformation and canonical colors, our VideoSPatS maps input coordinates into Spatial and Color Spline deformation fields $D_s$ and $D_c$, which disentangle motion and appearance in videos. With spline-based parametrization, our method naturally generates temporally consistent flow and guarantees long-term temporal consistency, which is crucial for convincing video editing. Using multiple prediction branches, our VideoSPatS model also performs layer separation between the latent video and the selected occluder. By disentangling occlusions, appearance, and motion, our method enables better spatiotemporal modeling and editing of diverse videos, including in-the-wild talking head videos with challenging occlusions, shadows, and specularities while maintaining an appropriate canonical space for editing. We also present general video modeling results on the DAVIS and CoDeF datasets, as well as our own talking head video dataset collected from open-source web videos. Extensive ablations show the combination of $D_s$ and $D_c$ under neural splines can overcome motion and appearance ambiguities, paving the way for more advanced video editing models.
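
The temporal-consistency argument rests on the spline parametrization itself: any quantity expressed as a spline over time is smooth by construction. Below is a generic uniform cubic B-spline blend of a few control latents over normalized time, as a sketch of that mechanism; the shapes and the choice of uniform cubic B-splines are assumptions, not the paper's exact formulation of $D_s$ and $D_c$.

```python
# Hypothetical sketch: temporally smooth latent codes via a uniform cubic
# B-spline over normalized time, the kind of parametrization that makes
# spline-based deformation fields temporally consistent by construction.
import numpy as np

def cubic_bspline_eval(ctrl, t):
    """ctrl: (K, D) control latents, t in [0, 1]; returns the (D,) blended latent."""
    K = ctrl.shape[0]
    x = t * (K - 3)                          # position along the K-3 spline spans
    i = min(int(np.floor(x)), K - 4)         # index of the leftmost active control point
    u = x - i
    b = np.array([(1 - u) ** 3,
                  3 * u ** 3 - 6 * u ** 2 + 4,
                  -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                  u ** 3]) / 6.0             # standard uniform cubic B-spline basis
    return b @ ctrl[i:i + 4]

ctrl = np.random.randn(8, 16)                # 8 control latents of dimension 16
codes = np.stack([cubic_bspline_eval(ctrl, t) for t in np.linspace(0, 1, 60)])
print(codes.shape)                           # (60, 16): one smooth latent per frame
```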

Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model arxiv.org/abs/2504.07148 .IV

Image restoration (IR) often faces various complex and unknown degradations in real-world scenarios, such as noise, blurring, compression artifacts, and low resolution. Training specific models for specific degradations may lead to poor generalization. To handle multiple degradations simultaneously, All-in-One models may sacrifice performance on certain types of degradation and still struggle with degradations unseen during training. Existing IR agents rely on multimodal large language models (MLLMs) and a time-consuming rolling-back selection strategy that neglects image quality. As a result, they may misinterpret degradations and incur high time and computational costs by conducting unnecessary IR tasks in a redundant order. To address these issues, we propose a Quality-Driven agent (Q-Agent) via Chain-of-Thought (CoT) restoration. Specifically, our Q-Agent consists of robust degradation perception and quality-driven greedy restoration. The former module first fine-tunes the MLLM and uses CoT to decompose multi-degradation perception into single-degradation perception tasks, enhancing the perception ability of the MLLM. The latter employs objective image quality assessment (IQA) metrics to determine the optimal restoration sequence and executes the corresponding restoration algorithms. Experimental results demonstrate that our Q-Agent achieves superior IR performance compared to existing All-in-One models.
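
The "quality-driven greedy restoration" step can be sketched as: at each round, try every remaining restoration tool, score the candidates with a no-reference IQA metric, keep the best, and stop when nothing improves the score. The `restorers` toolbox, the `iqa` scorer, and the stopping rule below are hypothetical stand-ins, not the paper's components.

```python
# Hypothetical sketch of quality-driven greedy restoration ordering: at each
# step, apply every remaining restorer, keep the one that maximizes an IQA
# score, and stop when no remaining tool improves it.
def greedy_restore(image, restorers, iqa):
    """restorers: dict name -> callable(image) -> image; iqa: callable(image) -> float."""
    remaining = dict(restorers)
    best_score = iqa(image)
    schedule = []
    while remaining:
        name, candidate, score = None, None, best_score
        for n, fn in remaining.items():
            out = fn(image)
            s = iqa(out)
            if s > score:
                name, candidate, score = n, out, s
        if name is None:                 # no remaining tool improves quality -> stop
            break
        image, best_score = candidate, score
        schedule.append(name)            # records the discovered restoration order
        remaining.pop(name)
    return image, schedule

# usage sketch: greedy_restore(img, {"denoise": denoise_fn, "deblur": deblur_fn}, nr_iqa_score)
```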

Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis arxiv.org/abs/2504.07248 .SY .SY

An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we investigate the impact of carbon-sensitive consumers using equilibrium analysis. Specifically, we expand the commonly used equilibrium model for electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on an average carbon intensity signal. This analysis represents an idealized situation for carbon-sensitive loads, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice, where carbon intensity signals only become known to consumers a posteriori (i.e., after the market has already been cleared). We include both illustrative examples and larger numerical simulations, including benchmarking with other methods, to illuminate the contributions and limitations of carbon-sensitive loads in power system emission reductions.
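
As a toy illustration of the signal in question (invented numbers, not the paper's equilibrium model): dispatch a clean and a dirty generator by merit order, compute the hourly average carbon intensity a consumer would observe, and shift flexible demand toward the lowest-intensity hours; whether that actually reduces emissions depends on which unit is marginal, which the average signal does not reveal.

```python
# Toy illustration (invented numbers): merit-order dispatch of a clean and a
# dirty generator, the hourly *average* carbon intensity a load would observe,
# and a naive shift of flexible demand to the lowest-average-intensity hour.
import numpy as np

clean_cap, dirty_cap = 60.0, 100.0          # MW capacity of each unit
clean_co2, dirty_co2 = 0.0, 0.8             # tCO2 per MWh
demand = np.array([50, 70, 90, 110, 80, 60], dtype=float)   # MW per hour

def dispatch(d):
    clean = np.minimum(d, clean_cap)        # clean unit is dispatched first
    dirty = d - clean                       # dirty unit covers the remainder
    emis = clean * clean_co2 + dirty * dirty_co2
    avg_intensity = np.divide(emis, d, out=np.zeros_like(d), where=d > 0)
    return emis, avg_intensity

emis0, intensity = dispatch(demand)

# Shift 20 MWh of flexible load from the highest- to the lowest-intensity hour.
shifted = demand.copy()
shifted[np.argmax(intensity)] -= 20
shifted[np.argmin(intensity)] += 20
emis1, _ = dispatch(shifted)

print("emissions before:", emis0.sum(), "after shift:", emis1.sum())
```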

MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution arxiv.org/abs/2504.07308 .IV .CV

Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
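
A minimal sketch of the gating-and-aggregation step described above: a small gating network turns a token embedding into softmax weights over experts, and the output is the weighted sum of the expert predictions. The experts here are plain conv layers standing in for the diffusion-based denoisers, and all sizes are placeholders.

```python
# Hypothetical sketch of MoE gating over expert denoisers: token features ->
# softmax gate -> weighted sum of expert outputs. The experts below are simple
# conv layers standing in for the diffusion-based denoisers in the abstract.
import torch
import torch.nn as nn

class MoEGate(nn.Module):
    def __init__(self, feat_dim: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(feat_dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Conv2d(1, 1, 3, padding=1) for _ in range(n_experts)
        )

    def forward(self, lr_image, token_feat):
        # token_feat: (B, feat_dim) embedding of the patch/token to be restored
        w = torch.softmax(self.gate(token_feat), dim=-1)                 # (B, E) gating probs
        outs = torch.stack([e(lr_image) for e in self.experts], dim=1)   # (B, E, 1, H, W)
        sr = (w[:, :, None, None, None] * outs).sum(dim=1)               # gated aggregation
        return sr, w

moe = MoEGate(feat_dim=32, n_experts=3)
sr, weights = moe(torch.randn(2, 1, 64, 64), torch.randn(2, 32))
```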

Going beyond explainability in multi-modal stroke outcome prediction models arxiv.org/abs/2504.06299 .IV .AP .CV .LG

Aim: This study aims to enhance the interpretability and explainability of multi-modal prediction models integrating imaging and tabular patient data. Methods: We adapt the xAI methods Grad-CAM and Occlusion to multi-modal, partly interpretable deep transformation models (dTMs). dTMs combine statistical and deep learning approaches to simultaneously achieve state-of-the-art prediction performance and interpretable parameter estimates, such as odds ratios for tabular features. Based on brain imaging and tabular data from 407 stroke patients, we trained dTMs to predict functional outcome three months after stroke. We evaluated the models using different discriminatory metrics. The adapted xAI methods were used to generate explanation maps for the identification of relevant image features and for error analysis. Results: The dTMs achieve state-of-the-art prediction performance, with area under the curve (AUC) values close to 0.8. The most important tabular predictors of functional outcome are functional independence before stroke and NIHSS on admission, a neurological score indicating stroke severity. Explanation maps calculated from brain imaging dTMs for functional outcome highlighted critical brain regions such as the frontal lobe, which is known to be linked to age, which in turn increases the risk of unfavorable outcomes. Similarity plots of the explanation maps revealed distinct patterns that give insight into stroke pathophysiology, support the development of novel predictors of stroke outcome, and enable the identification of false predictions. Conclusion: By adapting methods for explanation maps to dTMs, we enhanced the explainability of multi-modal and partly interpretable prediction models. The resulting explanation maps facilitate error analysis and support hypothesis generation regarding the significance of specific image regions in outcome prediction.
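
Occlusion, one of the two adapted xAI methods, is simple enough to sketch generically: slide a masking patch over the image, re-run the model, and record how the predicted probability drops; large drops mark regions the prediction relies on. The `predict` callable below is a placeholder for the dTM forward pass, not the study's code.

```python
# Generic occlusion-sensitivity sketch (model-agnostic): mask one patch at a
# time and record the change in the predicted probability. `predict` is a
# stand-in for a multi-modal model's forward pass on (image, tabular) input.
import numpy as np

def occlusion_map(image, tabular, predict, patch=16, fill=0.0):
    h, w = image.shape
    base = predict(image, tabular)                  # probability on the intact image
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i // patch, j // patch] = base - predict(occluded, tabular)
    return heat        # positive entries: occluding this patch lowered the score

# usage sketch: heat = occlusion_map(img2d, tab_vec, lambda im, tb: model(im, tb))
```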

Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression arxiv.org/abs/2504.06301 .IV .CV

Learning-based image compression methods have recently emerged as promising alternatives to traditional codecs, offering improved rate-distortion performance and perceptual quality. JPEG AI represents the latest standardized framework in this domain, leveraging deep neural networks for high-fidelity image reconstruction. In this study, we present a comprehensive subjective visual quality assessment of JPEG AI-compressed images using the JPEG AIC-3 methodology, which quantifies perceptual differences in terms of Just Noticeable Difference (JND) units. We generated a dataset of 50 compressed images with fine-grained distortion levels from five diverse sources. A large-scale crowdsourced experiment collected 96,200 triplet responses from 459 participants. We reconstructed JND-based quality scales using a unified model based on boosted and plain triplet comparisons. Additionally, we evaluated the alignment of multiple objective image quality metrics with human perception in the high-fidelity range. The CVVDP metric achieved the overall highest performance; however, most metrics, including CVVDP, were overly optimistic in predicting the quality of JPEG AI-compressed images. These findings emphasize the necessity of rigorous subjective evaluations in the development and benchmarking of modern image codecs, particularly in the high-fidelity range. Another technical contribution is the introduction of the well-known Meng-Rosenthal-Rubin statistical test to the field of Quality of Experience research. This test can reliably assess the significance of differences in the performance of quality metrics, measured as the correlation between metric scores and the ground truth. The complete dataset, including all subjective scores, is publicly available at https://github.com/jpeg-aic/dataset-JPEG-AI-SDR25.
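
For readers unfamiliar with it, the Meng-Rosenthal-Rubin test compares two dependent correlations that share one variable, here two metrics correlated with the same subjective scores. The sketch below follows the commonly cited 1992 formulation; it is an assumption that the paper uses exactly this variant, so check the original reference before relying on it.

```python
# Sketch of the Meng-Rosenthal-Rubin (1992) test for two dependent correlations
# r1 = corr(metric1, ground_truth) and r2 = corr(metric2, ground_truth), where
# r12 = corr(metric1, metric2). Follows the commonly cited formulation; verify
# against the original reference before relying on it.
import numpy as np
from scipy import stats

def mrr_test(r1, r2, r12, n):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher z-transforms
    r2bar = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - r12) / (2.0 * (1.0 - r2bar)), 1.0)
    h = (1.0 - f * r2bar) / (1.0 - r2bar)
    z = (z1 - z2) * np.sqrt((n - 3) / (2.0 * (1.0 - r12) * h))
    return z, 2 * stats.norm.sf(abs(z))            # test statistic, two-sided p-value

# invented example values, not results from the paper
z, p = mrr_test(r1=0.91, r2=0.86, r12=0.80, n=459)
print(z, p)
```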

Restoring Feasibility in Power Grid Optimization: A Counterfactual ML Approach arxiv.org/abs/2504.06369 .SY .SY

Electric power grids are essential components of modern life, delivering reliable power to end-users while adhering to a multitude of engineering constraints and requirements. In grid operations, the Optimal Power Flow (OPF) problem plays a key role in determining cost-effective generator dispatch that satisfies load demands and operational limits. However, due to stressed operating conditions, volatile demand profiles, and increased generation from intermittent energy sources, this optimization problem may become infeasible, posing risks such as voltage instability and line overloads. This study proposes a framework that combines machine learning with counterfactual explanations to automatically diagnose and restore feasibility in the OPF problem. Our method provides transparent and actionable insights by methodically identifying infeasible conditions and suggesting minimal demand response actions. We evaluate the proposed approach on the IEEE 30-bus and 300-bus systems, demonstrating its capability to recover feasibility with high success rates while generating diverse corrective options appropriate for real-time decision-making. These preliminary findings illustrate the potential of combining classical optimization with explainable AI techniques to enhance grid reliability and resilience.
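
The "minimal demand response to restore feasibility" idea can be previewed with a tiny DC-style linear program on made-up numbers: minimize total load curtailment subject to generator and line limits. This is a generic sketch, not the paper's counterfactual-ML pipeline, which learns to identify and explain such actions.

```python
# Generic sketch (not the paper's method): find the minimal load curtailment
# that restores feasibility of a toy 2-bus dispatch with a generator limit and
# a tie-line limit, via a linear program. All numbers are invented.
import numpy as np
from scipy.optimize import linprog

load = np.array([120.0, 80.0])      # MW demand at bus 1 and bus 2
gen_max = np.array([150.0, 30.0])   # MW generator limits at each bus
line_max = 60.0                     # MW limit on the line from bus 1 to bus 2

# Variables x = [g1, g2, c1, c2]: generation and load curtailment per bus.
c = np.array([0.0, 0.0, 1.0, 1.0])                     # minimize total curtailment
A_eq = np.array([[1.0, 1.0, 1.0, 1.0]])                # g1 + g2 + c1 + c2 = total load
b_eq = np.array([load.sum()])
# Flow on line 1->2 is load2 - c2 - g2; enforce |flow| <= line_max.
A_ub = np.array([[0.0, -1.0, 0.0, -1.0],               #  load2 - g2 - c2 <= line_max
                 [0.0,  1.0, 0.0,  1.0]])              #  g2 + c2 - load2 <= line_max
b_ub = np.array([line_max - load[1], line_max + load[1]])

bounds = [(0, gen_max[0]), (0, gen_max[1]), (0, load[0]), (0, load[1])]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("minimal curtailment (MW):", res.x[2] + res.x[3])   # 20 MW in this toy case
```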

A Metropolis-Adjusted Langevin Algorithm for Sampling Jeffreys Prior arxiv.org/abs/2504.06372 .SY .ME .ML .SY

Inference and estimation are fundamental aspects of statistics, system identification and machine learning. For most inference problems, prior knowledge is available on the system to be modeled, and Bayesian analysis is a natural framework to impose such prior information in the form of a prior distribution. However, in many situations, coming up with a fully specified prior distribution is not easy, as prior knowledge might be too vague, so practitioners prefer to use a prior distribution that is as `ignorant' or `uninformative' as possible, in the sense of not imposing subjective beliefs, while still supporting reliable statistical analysis. Jeffreys prior is an appealing uninformative prior because it offers two important benefits: (i) it is invariant under any re-parameterization of the model, and (ii) it encodes the intrinsic geometric structure of the parameter space through the Fisher information matrix, which in turn enhances the diversity of parameter samples. Despite these benefits, drawing samples from Jeffreys prior is a challenging task. In this paper, we propose a general sampling scheme using the Metropolis-Adjusted Langevin Algorithm that enables sampling of parameter values from Jeffreys prior, and we provide numerical illustrations of our approach through several examples.
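
For the simplest concrete case, the Jeffreys prior of a Bernoulli parameter is proportional to θ^(-1/2)(1-θ)^(-1/2), i.e. Beta(1/2, 1/2), and a basic MALA chain targeting it looks like the sketch below. The step size and the one-dimensional example are mine; the paper's scheme addresses general models through the Fisher information matrix.

```python
# MALA sketch targeting the Bernoulli Jeffreys prior pi(t) ~ t^(-1/2)(1-t)^(-1/2)
# on (0, 1). Proposal: t' = t + (eps^2/2) * grad log pi(t) + eps * N(0, 1),
# accepted with the usual Metropolis-Hastings correction.
import numpy as np

rng = np.random.default_rng(0)

def log_pi(t):
    return -0.5 * (np.log(t) + np.log(1.0 - t)) if 0.0 < t < 1.0 else -np.inf

def grad_log_pi(t):
    return -0.5 / t + 0.5 / (1.0 - t)

def mala(n_steps=20000, eps=0.05, t=0.5):
    samples = []
    for _ in range(n_steps):
        mean_fwd = t + 0.5 * eps ** 2 * grad_log_pi(t)
        prop = mean_fwd + eps * rng.standard_normal()
        if 0.0 < prop < 1.0:                       # outside (0,1) the target is zero
            mean_bwd = prop + 0.5 * eps ** 2 * grad_log_pi(prop)
            log_alpha = (log_pi(prop) - log_pi(t)
                         - (t - mean_bwd) ** 2 / (2 * eps ** 2)
                         + (prop - mean_fwd) ** 2 / (2 * eps ** 2))
            if np.log(rng.uniform()) < log_alpha:
                t = prop
        samples.append(t)
    return np.array(samples)

s = mala()
print(s.mean())   # should be near 0.5, the mean of Beta(1/2, 1/2)
```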

Review, Definition and Challenges of Electrical Energy Hubs arxiv.org/abs/2504.06373 .SY .SY

To transition towards a carbon-neutral power system, considerable amounts of renewable energy generation capacity are being installed in the North Sea area. Consequently, projects aggregating many gigawatts of power generation capacity and transmitting renewable energy to the main load centers are being developed. Given the electrical challenges arising from concentrating bulk power capacity in a compact geographical area with several connections to the main grid, and the lack of a robust definition identifying the type of system under study, this paper proposes a general technical definition of such projects, introducing the term Electrical Energy Hub (EEH). The concept, purpose, and functionalities of EEHs are introduced in the text, emphasizing the importance of a clear technical definition for future planning procedures, grid codes, regulations, and support schemes for EEHs and multiterminal HVDC (MTDC) grids in general. Furthermore, the unique electrical challenges associated with integrating EEHs into the power system are discussed. Three research areas of concern are identified, namely control, planning, and protection. Through this analysis, insights are provided into the effective implementation of multi-GW scale EEH projects and their integration into the power grid through multiple interconnections. Finally, a list of ongoing and planned grid development projects is evaluated to assess whether they fall within the EEH category.

A Scalable Automatic Model Generation Tool for Cyber-Physical Network Topologies and Data Flows for Large-Scale Synthetic Power Grid Models arxiv.org/abs/2504.06396 .SY .SY

Power grids and their cyber infrastructure are classified as Critical Energy Infrastructure/Information (CEII) and are not publicly accessible. While realistic synthetic test cases for power systems have been developed in recent years, they often lack corresponding cyber network models. This work extends synthetic grid models by incorporating cyber-physical representations. To address the growing need for realistic and scalable models that integrate both cyber and physical layers in electric power systems, this paper presents the Scalable Automatic Model Generation Tool (SAM-GT). This tool enables the creation of large-scale cyber-physical topologies for power system models. The resulting cyber-physical network models include power system switches, routers, and firewalls while accounting for data flows and industrial communication protocols. Case studies demonstrate the tool's application to synthetic grid models of 500, 2,000, and 10,000 buses, considering three distinct network topologies. Results from these case studies include network metrics on critical nodes, hops, and generation times, showcasing the effectiveness, adaptability, and scalability of SAM-GT.
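
A rough illustration of the kind of object such a tool produces (node roles, the star layout, and the switch-firewall-router chain below are assumptions, not SAM-GT's actual output format): a networkx graph linking each substation to a control center, from which hop counts and similar network metrics can be read.

```python
# Hypothetical illustration (not SAM-GT's output format): a star cyber topology
# in which every substation reaches the control center through a local
# switch -> firewall -> router chain, plus basic hop-count metrics.
import networkx as nx

def build_star_topology(n_substations):
    g = nx.Graph()
    g.add_node("control_center", role="scada")
    for i in range(n_substations):
        sw, fw, rt = f"switch_{i}", f"firewall_{i}", f"router_{i}"
        g.add_node(f"substation_{i}", role="relay_network")
        g.add_edge(f"substation_{i}", sw)
        g.add_edge(sw, fw)
        g.add_edge(fw, rt)
        g.add_edge(rt, "control_center")
    return g

g = build_star_topology(5)
hops = [nx.shortest_path_length(g, f"substation_{i}", "control_center") for i in range(5)]
print(g.number_of_nodes(), g.number_of_edges(), max(hops))   # 21 nodes, 20 edges, 4 hops
```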

Panoptic: True Joint mmWave Communication and Sensing with Compressive Sidelobe Forming arxiv.org/abs/2504.06400 .SP

The integration of communication and sensing functions within mmWave systems has gained attention due to the potential for enhanced passive sensing and improved communication reliability. State-of-the-art techniques separate these two functions in frequency, hardware, or time, i.e., sending known preambles for channel sensing or unknown symbols for communications. In this paper, we introduce Panoptic, a novel system architecture for integrated communication and sensing that shares the same hardware, frequency, and time resources. Panoptic jointly detects unknown symbols and channel components from data-modulated signals. The core idea is a new beam manipulation technique, which we call compressive sidelobe forming, that maintains a directional mainlobe toward the intended communication nodes while acquiring unique spatial information through pseudorandom sidelobe perturbations. We implemented Panoptic on 60 GHz mmWave radios and conducted extensive over-the-air experiments. Our results show that Panoptic achieves a reflector angular localization error of less than 2° while at the same time supporting mmWave data communication with a negligible BER penalty compared with conventional communication-only mmWave systems.
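
The beam-manipulation idea can be previewed on a toy uniform linear array: take conventional steering weights, add small pseudorandom phase perturbations, and observe that the mainlobe gain barely changes while the sidelobes vary from probe to probe, which is the spatial randomness a compressive recovery stage can exploit. Array size and perturbation scale below are arbitrary, and this is not Panoptic's actual waveform design.

```python
# Toy sketch of the sidelobe-perturbation idea on an 8-element uniform linear
# array: conventional steering weights plus small pseudorandom phase dither
# keep the mainlobe nearly intact while randomizing the sidelobes.
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 0.5                                   # elements, spacing in wavelengths
steer_deg = 20.0                                # intended communication direction
angles = np.deg2rad(np.linspace(-90, 90, 721))
k = np.arange(n)

def array_gain(weights, theta):
    sv = np.exp(1j * 2 * np.pi * d * k[:, None] * np.sin(theta)[None, :])
    return np.abs(weights.conj() @ sv) / n      # normalized beam pattern

w0 = np.exp(1j * 2 * np.pi * d * k * np.sin(np.deg2rad(steer_deg)))   # steering weights
perturb = np.exp(1j * rng.uniform(-0.3, 0.3, size=n))                 # +/-0.3 rad dither
w1 = w0 * perturb

g0 = array_gain(w0, angles)
g1 = array_gain(w1, angles)
i_main = np.argmax(g0)
print("mainlobe gain, nominal vs perturbed:", round(g0[i_main], 3), round(g1[i_main], 3))
```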

Retuve: Automated Multi-Modality Analysis of Hip Dysplasia with Open Source AI arxiv.org/abs/2504.06422 .IV .CV

Developmental dysplasia of the hip (DDH) poses significant diagnostic challenges, hindering timely intervention. Current screening methodologies lack standardization, and AI-driven studies suffer from reproducibility issues due to limited data and code availability. To address these limitations, we introduce Retuve, an open-source framework for multi-modality DDH analysis, encompassing both ultrasound (US) and X-ray imaging. Retuve provides a complete and reproducible workflow, offering open datasets comprising expert-annotated US and X-ray images, pre-trained models with training code and weights, and a user-friendly Python Application Programming Interface (API). The framework integrates segmentation and landmark detection models, enabling automated measurement of key diagnostic parameters such as the alpha angle and acetabular index. By adhering to open-source principles, Retuve promotes transparency, collaboration, and accessibility in DDH research. This initiative has the potential to democratize DDH screening, facilitate early diagnosis, and ultimately improve patient outcomes by enabling widespread screening and early intervention. The GitHub repository/code can be found here: https://github.com/radoss-org/retuve
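
Not Retuve's API, but a sketch of the kind of measurement the framework automates once landmarks are available: the alpha angle as the angle between a baseline and a bony-roof line, each defined by two landmark points. The landmark coordinates below are made up.

```python
# Generic geometry sketch (not Retuve's API): an alpha-angle-style measurement
# as the angle between two lines defined by landmark points, e.g. output of a
# landmark-detection model. The coordinates here are invented.
import numpy as np

def line_angle_deg(p1, p2, q1, q2):
    """Angle in degrees between line p1->p2 and line q1->q2."""
    u = np.asarray(p2, float) - np.asarray(p1, float)
    v = np.asarray(q2, float) - np.asarray(q1, float)
    cosang = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

alpha = line_angle_deg((10, 5), (10, 80),      # baseline (vertical in this toy frame)
                       (10, 60), (45, 85))     # bony roof line
print(round(alpha, 1))                          # ~54.5 degrees in this toy example
```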
