Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis arxiv.org/abs/2507.00227 .AS .AI

When Every Symbol Counts: Resilient Wireless Systems Under Finite Blocklength Constraints arxiv.org/abs/2506.21664 .SP .SY .IT .IT .SY

As 6G evolves, wireless networks become essential for critical operations and enable innovative applications that demand seamless adaptation to dynamic environments and disruptions. Because these vital services require uninterrupted operation, their resilience to unforeseen disruptions is essential. However, implementing resilience necessitates rapid recovery procedures, which operate in the finite blocklength (FBL) regime, where short packets and added error-correction overhead can severely degrade communication efficiency. Due to this performance loss, always attempting recovery can backfire and result in worse outcomes than simply enduring the disruption under longer blocklengths. In this work, we study these effects of FBL constraints within a resilience framework, incorporating reconfigurable intelligent surfaces (RIS) to enhance adaptation capabilities. By actively shaping the wireless environment, RIS help counteract some of the performance losses caused by FBL, enabling more effective recovery from disruptions. Numerical results reveal two critical blocklength thresholds: the first enables full recovery from the FBL penalty, while the second, at a higher blocklength, allows the system to recover from both the FBL penalty and the initial disruption, yielding a significant improvement in resilience performance. Additionally, we show that the number of RIS elements shifts these thresholds, enabling faster reconfiguration with shorter blocklengths and providing insights into the trade-offs between rate, blocklength, and reconfiguration effort under FBL conditions.
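
The FBL penalty mentioned in the abstract can be illustrated with the well-known normal approximation for the achievable rate over an AWGN channel. This is only a sketch: the abstract does not state which rate expression the paper uses, and the function name `fbl_rate` and its parameters are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def fbl_rate(snr, blocklength, error_prob):
    """Approximate achievable rate (bits per channel use) at finite blocklength.

    Uses the normal approximation R ~ C - sqrt(V / n) * Qinv(eps) for a
    real-valued AWGN channel. Assumption: this standard textbook model is
    used here for illustration and may differ from the paper's system model.
    """
    # Shannon capacity, the rate reached as the blocklength grows large.
    capacity = math.log2(1.0 + snr)
    # Channel dispersion of the AWGN channel, in bits^2 per channel use.
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2
    # Inverse Gaussian Q-function evaluated at the target error probability.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    # The back-off term shrinks as 1 / sqrt(n); clamp at zero for tiny n.
    return max(0.0, capacity - math.sqrt(dispersion / blocklength) * q_inv)


if __name__ == "__main__":
    for n in (50, 100, 500, 1000):
        print(n, round(fbl_rate(snr=10.0, blocklength=n, error_prob=1e-5), 3))
```

Under this model, shorter packets pay a larger rate penalty at the same error target, which is the kind of cost a fast recovery procedure would have to outweigh.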

PhotonSplat: 3D Scene Reconstruction and Colorization from SPAD Sensors arxiv.org/abs/2506.21680 .IV .CV

Advances in 3D reconstruction using neural rendering have enabled high-quality 3D capture. However, they often fail when the input imagery is corrupted by motion blur, due to fast motion of the camera or the objects in the scene. This work advances neural rendering techniques in such scenarios by using single-photon avalanche diode (SPAD) arrays, an emerging sensing technology capable of sensing images at extremely high speeds. However, the use of SPADs presents its own set of unique challenges in the form of binary images that are driven by stochastic photon arrivals. To address this, we introduce PhotonSplat, a framework designed to reconstruct 3D scenes directly from SPAD binary images, effectively navigating the noise vs. blur trade-off. Our approach incorporates a novel 3D spatial filtering technique to reduce noise in the renderings. The framework also supports both no-reference colorization using generative priors and reference-based colorization from a single blurry image, enabling downstream applications such as segmentation, object detection, and appearance editing. Additionally, we extend our method to incorporate dynamic scene representations, making it suitable for scenes with moving objects. We further contribute PhotonScenes, a real-world multi-view dataset captured with SPAD sensors.

Joint RIS-UE Association and Beamforming Design in RIS-Assisted Cell-Free MIMO Network arxiv.org/abs/2506.21690 .SP

Reconfigurable intelligent surface (RIS)-assisted cell-free (CF) multiple-input multiple-output (MIMO) networks can significantly enhance system performance. However, the extensive deployment of RIS elements imposes considerable channel acquisition overhead, with the high density of nodes and antennas in RIS-assisted CF networks amplifying this challenge. To tackle this issue, in this paper, we explore integrating RIS-user equipment (UE) association into downlink RIS-assisted CF transmitter design, which greatly reduces the channel acquisition costs. The key point is that once UEs are associated with specific RISs, there is no need to frequently acquire channels from non-associated RISs. Then, we formulate the problem of joint RIS-UE association and beamforming at APs and RISs to maximize the weighted sum rate (WSR). In particular, we propose a two-stage framework to solve it. In the first stage, we apply a many-to-many matching algorithm to establish the RIS-UE association. In the second stage, we introduce a sequential optimization-based method that decomposes the joint optimization of RIS phase shifts and AP beamforming into two distinct subproblems. To optimize the RIS phase shifts, we employ the majorization-minimization (MM) algorithm to obtain a semi-closed-form solution. For AP beamforming, we develop a joint block diagonalization algorithm, which yields a closed-form solution. Simulation results demonstrate the effectiveness of the proposed algorithm and show that, while RIS-UE association significantly reduces overhead, it incurs a minor performance loss that remains within an acceptable range. Additionally, we investigate the impact of RIS deployment and conclude that RISs exhibit enhanced performance when positioned between APs and UEs.

TUS-REC2024: A Challenge to Reconstruct 3D Freehand Ultrasound Without External Tracker arxiv.org/abs/2506.21765 .IV .CV

Trackerless freehand ultrasound reconstruction aims to reconstruct 3D volumes from sequences of 2D ultrasound images without relying on external tracking systems, offering a low-cost, portable, and widely deployable alternative for volumetric imaging. However, it presents significant challenges, including accurate inter-frame motion estimation, minimisation of drift accumulation over long sequences, and generalisability across scanning protocols. The TUS-REC2024 Challenge was established to benchmark and accelerate progress in trackerless 3D ultrasound reconstruction by providing a publicly available dataset for the first time, along with a baseline model and evaluation framework. The Challenge attracted over 43 registered teams, of which 6 teams submitted 21 valid dockerized solutions. Submitted methods spanned a wide range of algorithmic approaches, including recurrent models, registration-driven volume refinement, attention, and physics-informed models. This paper presents an overview of the Challenge design, summarises the key characteristics of the dataset, provides a concise literature review, introduces the technical details of the underlying methodology working with tracked freehand ultrasound data, and offers a comparative analysis of submitted methods across multiple evaluation metrics. The results highlight both the progress and current limitations of state-of-the-art approaches in this domain, and inform directions for future research. The data, evaluation code, and baseline are publicly available to facilitate ongoing development and reproducibility. As a live and evolving benchmark, this Challenge is designed to be continuously developed and improved. The Challenge was held at MICCAI 2024 and will be organised again at MICCAI 2025, reflecting its growing impact and the sustained commitment to advancing this field.

Demonstrating Interoperable Channel State Feedback Compression with Machine Learning arxiv.org/abs/2506.21796 .SP .AI

Neural network-based compression and decompression of channel state feedback has been one of the most widely studied applications of machine learning (ML) in wireless networks. Various simulation-based studies have shown that ML-based feedback compression can result in reduced overhead and more accurate channel information. However, to the best of our knowledge, there are no real-life proofs of concept demonstrating the benefits of ML-based channel feedback compression in a practical setting, where the user equipment (UE) and base station have no access to each other's ML models. In this paper, we present a novel approach for training interoperable compression and decompression ML models in a confidential manner, and demonstrate the accuracy of the ensuing models using prototype UEs and base stations. The performance of the ML-based channel feedback is measured both in terms of the accuracy of the reconstructed channel information and achieved downlink throughput gains when using the channel information for beamforming. The reported measurement results demonstrate that it is possible to develop an accurate ML-based channel feedback link without having to share ML models between device and network vendors. These results pave the way for a practical implementation of ML-based channel feedback in commercial 6G networks.

Adaptive Multipath-Based SLAM for Distributed MIMO Systems arxiv.org/abs/2506.21798 .SP

Localizing users and mapping the environment using radio signals is a key task in emerging applications such as reliable communications, location-aware security, and safety-critical navigation. Recently introduced multipath-based simultaneous localization and mapping (MP-SLAM) can jointly localize a mobile agent and the reflective surfaces in radio frequency (RF) environments. Most existing MP-SLAM methods assume that map features and their corresponding RF propagation paths are statistically independent, which neglects inherent dependencies arising when a single reflective surface contributes to different propagation paths or when an agent communicates with more than one base station. Previous approaches that aim to fuse information across propagation paths are limited by their inability to perform ray tracing in environments with nonconvex geometries. In this paper, we propose a Bayesian MP-SLAM method for distributed MIMO systems that addresses this limitation. In particular, we use amplitude statistics to establish adaptive time-varying detection probabilities. Based on the resulting "soft" ray-tracing strategy, our method can fuse information across propagation paths in RF environments with nonconvex geometries. A Bayesian estimation method for the joint estimation of map features and agent position is established by applying the message passing rules of the sum-product algorithm (SPA) to the factor graph that represents the proposed statistical model. We also introduce an improved proposal PDF for particle-based computation of SPA messages. This proposal PDF enables the early detection of new surfaces that are solely supported by double-bounce paths. Our method is validated using synthetic RF measurements in a challenging scenario with nonconvex geometries. The results demonstrate that it can provide accurate localization and mapping estimates as well as attain the posterior Cramér-Rao lower bound (CRLB).

From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining arxiv.org/abs/2506.21803 .SP .AI .LG

Electrocardiograms (ECGs) play a vital role in monitoring cardiac health and diagnosing heart diseases. However, traditional deep learning approaches for ECG analysis rely heavily on large-scale manual annotations, which are both time-consuming and resource-intensive to obtain. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising alternative, enabling the extraction of robust ECG representations that can be efficiently transferred to various downstream tasks. While previous studies have explored SSL for ECG pretraining and multi-modal ECG-language alignment, they often fail to capture the multi-scale nature of ECG signals. As a result, these methods struggle to learn generalized representations due to their inability to model the hierarchical structure of ECG data. To address this gap, we introduce MELP, a novel Multi-scale ECG-Language Pretraining model that fully leverages hierarchical supervision from ECG-text pairs. MELP first pretrains a cardiology-specific language model to enhance its understanding of clinical text. It then applies three levels of cross-modal supervision (at the token, beat, and rhythm levels) to align ECG signals with textual reports, capturing structured information across different time scales. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning. Experimental results demonstrate that MELP outperforms existing SSL methods, underscoring its effectiveness and adaptability across diverse clinical applications. Our code is available at https://github.com/HKU-MedAI/MELP.

Global and Local Contrastive Learning for Joint Representations from Cardiac MRI and ECG arxiv.org/abs/2506.20683 .IV .SP .AI .CV

An electrocardiogram (ECG) is a widely used, cost-effective tool for detecting electrical abnormalities in the heart. However, it cannot directly measure functional parameters, such as ventricular volumes and ejection fraction, which are crucial for assessing cardiac function. Cardiac magnetic resonance (CMR) is the gold standard for these measurements, providing detailed structural and functional insights, but is expensive and less accessible. To bridge this gap, we propose PTACL (Patient and Temporal Alignment Contrastive Learning), a multimodal contrastive learning framework that enhances ECG representations by integrating spatio-temporal information from CMR. PTACL uses a global patient-level contrastive loss and a local temporal-level contrastive loss. The global loss aligns patient-level representations by pulling ECG and CMR embeddings from the same patient closer together, while pushing apart embeddings from different patients. The local loss enforces fine-grained temporal alignment within each patient by contrasting encoded ECG segments with corresponding encoded CMR frames. This approach enriches ECG representations with diagnostic information beyond electrical activity and transfers more insights between modalities than global alignment alone, all without introducing new learnable weights. We evaluate PTACL on paired ECG-CMR data from 27,951 subjects in the UK Biobank. Compared to baseline approaches, PTACL achieves better performance in two clinically relevant tasks: (1) retrieving patients with similar cardiac phenotypes and (2) predicting CMR-derived cardiac function parameters, such as ventricular volumes and ejection fraction. Our results highlight the potential of PTACL to enhance non-invasive cardiac diagnostics using ECG. The code is available at: https://github.com/alsalivan/ecgcmr
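
The global patient-level loss described above, pulling together ECG and CMR embeddings of the same patient and pushing apart those of different patients, resembles a symmetric InfoNCE-style objective. The sketch below is an illustration under that assumption, not the authors' implementation: the function name `global_contrastive_loss`, the `temperature` value, and the use of cosine similarity are all hypothetical choices. Row `i` of each input is assumed to belong to patient `i`.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def global_contrastive_loss(ecg_embeddings, cmr_embeddings, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Assumption: same-index rows come from the same patient (positives);
    all other pairings in the batch act as negatives.
    """
    ecg = [_normalize(v) for v in ecg_embeddings]
    cmr = [_normalize(v) for v in cmr_embeddings]
    # Cosine similarity of every ECG embedding with every CMR embedding.
    logits = [
        [sum(a * b for a, b in zip(e, c)) / temperature for c in cmr]
        for e in ecg
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    # Average the ECG-to-CMR and CMR-to-ECG directions.
    return 0.5 * (
        _diagonal_cross_entropy(logits)
        + _diagonal_cross_entropy(logits_transposed)
    )
```

The loss is small when each ECG embedding is most similar to the CMR embedding of the same patient and grows when the pairing is mismatched. Because it operates only on encoder outputs, a loss of this form adds no learnable weights of its own.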

Building Lightweight Semantic Segmentation Models for Aerial Images Using Dual Relation Distillation arxiv.org/abs/2506.20688 .IV

Recently, there have been significant improvements in the accuracy of CNN models for semantic segmentation. However, these models are often heavy and suffer from low inference speed, which limits their practical application. To address this issue, knowledge distillation has emerged as a promising approach to achieve a good trade-off between segmentation accuracy and efficiency. In this paper, we propose a novel dual relation distillation (DRD) technique that transfers both spatial and channel relations in feature maps from a cumbersome model (teacher) to a compact model (student). Specifically, we compute spatial and channel relation maps separately for the teacher and student models, and then align corresponding relation maps by minimizing their distance. Since the teacher model usually learns more information and collects richer spatial and channel correlations than the student model, transferring these correlations from the teacher to the student can help the student mimic the teacher better in terms of feature distribution, thus improving the segmentation accuracy of the student model. We conduct comprehensive experiments on three segmentation datasets, including two widely adopted benchmarks in the remote sensing field (Vaihingen and Potsdam datasets) and one popular benchmark for general scenes (Cityscapes dataset). The experimental results demonstrate that our novel distillation framework can significantly boost the performance of the student network without incurring extra computational overhead.
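
The idea of computing spatial and channel relation maps for teacher and student and minimizing their distance can be sketched as follows. The abstract does not define the relation maps or the distance, so this sketch assumes cosine-similarity matrices and a mean squared difference, and it assumes the teacher and student feature maps have already been brought to the same shape. All function names are hypothetical.

```python
import math


def relation_map(vectors):
    """Pairwise cosine-similarity matrix between the given vectors."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def mean_squared_distance(map_a, map_b):
    """Mean squared element-wise difference between two equal-size matrices."""
    count = len(map_a) * len(map_a[0])
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(map_a, map_b)
        for a, b in zip(row_a, row_b)
    ) / count


def dual_relation_loss(teacher, student):
    """Spatial plus channel relation mismatch between two feature maps.

    Each feature map is a list of C channels, each a list of N flattened
    spatial values. Assumption: both maps share the same C and N.
    """
    # Channel relations (C x C): treat each channel as one vector.
    channel_term = mean_squared_distance(
        relation_map(teacher), relation_map(student)
    )
    # Spatial relations (N x N): treat each spatial position as one vector.
    teacher_positions = [list(p) for p in zip(*teacher)]
    student_positions = [list(p) for p in zip(*student)]
    spatial_term = mean_squared_distance(
        relation_map(teacher_positions), relation_map(student_positions)
    )
    return spatial_term + channel_term
```

In training, a term like this would be added to the student's segmentation loss. It is evaluated only during training, which is consistent with the abstract's claim of no extra computational overhead at inference.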

U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs arxiv.org/abs/2506.20689 .IV .AI .CV .LG

Artificial intelligence, including deep learning models, will play a transformative role in automated medical image analysis for the diagnosis of cardiac disorders and their management. Automated, accurate delineation of cardiac images is the necessary first step for the quantification and automated diagnosis of cardiac disorders. In this paper, we propose a deep learning based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel attention, and spatial attention, together with edge-detection based skip connections, for accurate, fully automated semantic segmentation of cardiac magnetic resonance (CMR) images. The model extracts local features and their interrelationships using a stack of combination convolution blocks, with embedded channel and spatial attention in the convolution block, and vision transformers. Deep embedding of channel and spatial attention in the convolution block identifies important features and their spatial localization. Combining edge information with channel and spatial attention as a skip connection reduces information loss during convolution transformations. The overall model significantly improves the semantic segmentation of CMR images necessary for improved medical image analysis. An algorithm for the dual attention module (channel and spatial attention) is presented. Performance results show that U-R-Veda achieves an average accuracy of 95.2%, based on the Dice similarity coefficient (DSC). The model outperforms the accuracy attained by other models, based on DSC and Hausdorff distance (HD) metrics, especially for the delineation of the right ventricle and the left-ventricle myocardium.
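
For readers unfamiliar with the DSC metric used in the reported results, the following minimal sketch computes the Dice similarity coefficient between two binary masks. It illustrates the standard definition only and is not taken from the paper; the function name `dice_coefficient` and the flattened-list mask format are assumptions made for the example.

```python
def dice_coefficient(predicted_mask, target_mask):
    """Dice similarity coefficient, 2 * |A and B| / (|A| + |B|).

    Both masks are flat sequences of 0/1 (or False/True) labels of equal
    length, one entry per pixel.
    """
    predicted_count = sum(1 for p in predicted_mask if p)
    target_count = sum(1 for t in target_mask if t)
    overlap = sum(1 for p, t in zip(predicted_mask, target_mask) if p and t)
    if predicted_count + target_count == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * overlap / (predicted_count + target_count)


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    print(dice_coefficient(prediction, ground_truth))
```

A value of 1 means the predicted and reference regions coincide exactly, and 0 means they do not overlap at all.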

Distributed Lyapunov Functions for Nonlinear Networks arxiv.org/abs/2506.20728 cond-mat.dis-nn .SY .DS .SY

Nonlinear networks are often multistable, exhibiting coexisting stable states with competing regions of attraction (ROAs). As a result, ROAs can have complex "tentacle-like" morphologies that are challenging to characterize analytically or computationally. In addition, the high dimensionality of the state space prohibits the automated construction of Lyapunov functions using state-of-the-art optimization methods, such as sum-of-squares (SOS) programming. In this letter, we propose a distributed approach for the construction of Lyapunov functions based solely on local information. To this end, we establish an augmented comparison lemma that characterizes the existence conditions of partial Lyapunov functions, while also accounting for residual effects caused by the associated dimensionality reduction. These theoretical results allow us to formulate an SOS optimization that iteratively constructs such partial functions, whose aggregation forms a composite Lyapunov function. The resulting composite function provides accurate convex approximations of both the volumes and shapes of the ROAs. We validate our method on networks of van der Pol and Ising oscillators, demonstrating its effectiveness in characterizing high-dimensional systems with non-convex ROAs.
