Enhancing Power Flow Estimation with Topology-Aware Gated Graph Neural Networks arxiv.org/abs/2507.02078 .SY .SY

Accurate and scalable surrogate models for AC power flow are essential for real-time grid monitoring, contingency analysis, and decision support in increasingly dynamic and inverter-dominated power systems. However, most existing surrogates fall short of practical deployment due to their limited capacity to capture long-range nonlinear dependencies in meshed transmission networks and their weak enforcement of physical laws. These models often require extensive hyperparameter tuning, exhibit poor generalization under topology changes or large load swings, and typically do not quantify uncertainty or scale well beyond a few hundred buses. To address these challenges, this paper proposes a gated graph neural network (GGNN) surrogate for AC power-flow estimation under topological uncertainty. The model is trained across multiple IEEE benchmark networks of varying size and complexity, each incorporating randomized line contingencies and up to 40% load variation. To improve robustness and generalization, we explore both conventional supervised learning and physics-informed self-supervised training strategies. Comparative evaluations show that the proposed GGNN consistently outperforms prior GNN-based surrogates, achieving predictions closely aligned with Newton-Raphson solutions. By embedding operational constraints directly into the architecture and loss function, the model ensures physical consistency and delivers a lightweight, accurate, and scalable tool for real-time grid operations.
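
The abstract does not spell out its physics-informed loss. As a rough illustration of the general idea, a self-supervised objective can penalize the AC power-balance residual S_i = V_i * conj(sum_j Y_ij V_j) at each bus. The sketch below assumes a dense bus admittance matrix and complex bus voltages; the function names `power_injections` and `mismatch_loss` are hypothetical and are not taken from the paper.

```python
def power_injections(ybus, voltages):
    """Complex power injected at each bus: S_i = V_i * conj(sum_j Y_ij * V_j)."""
    n = len(voltages)
    injections = []
    for i in range(n):
        # Current injected at bus i from the nodal admittance equations.
        current = sum(ybus[i][j] * voltages[j] for j in range(n))
        injections.append(voltages[i] * current.conjugate())
    return injections


def mismatch_loss(ybus, voltages, specified):
    """Mean squared power-balance residual (an assumed form of a physics loss)."""
    computed = power_injections(ybus, voltages)
    return sum(abs(s - c) ** 2 for s, c in zip(specified, computed)) / len(voltages)


# Two-bus toy system joined by one line with series impedance 0.01 + 0.1j p.u.
y = 1 / (0.01 + 0.1j)
ybus = [[y, -y], [-y, y]]
flat_start = [1 + 0j, 1 + 0j]
print(mismatch_loss(ybus, flat_start, [0j, 0j]))
```

With equal voltages at both buses no current flows, so the residual against zero specified injections vanishes. A surrogate trained this way would be scored on how small the residual is for its predicted voltages, without needing solver labels.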

arXiv.org

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams arxiv.org/abs/2507.02115 .AS

Synthesized second-language (L2) speech is potentially highly valuable for L2 learning experience and feedback. However, due to the lack of L2 speech synthesis datasets, it is difficult to synthesize L2 speech for low-resourced languages. In this paper, we provide a practical solution for editing native speech to approximate L2 speech and present PPG2Speech, a diffusion-based multispeaker Phonetic-Posteriorgrams-to-Speech model that is capable of editing a single phoneme without text alignment. We use Matcha-TTS's flow-matching decoder as the backbone, transforming Phonetic Posteriorgrams (PPGs) to mel-spectrograms conditioned on external speaker embeddings and pitch. PPG2Speech strengthens Matcha-TTS's flow-matching decoder with Classifier-free Guidance (CFG) and Sway Sampling. We also propose a new task-specific objective evaluation metric, the Phonetic Aligned Consistency (PAC), computed between the edited PPGs and the PPGs extracted from the synthetic speech, to measure editing effects. We validate the effectiveness of our method on Finnish, a low-resourced, nearly phonetic language, using approximately 60 hours of data. We conduct objective and subjective evaluations of our approach to compare its naturalness, speaker similarity, and editing effectiveness with TTS-based editing. Our source code is published at https://github.com/aalto-speech/PPG2Speech.

Beyond Interval MDPs: Tight and Efficient Abstractions of Stochastic Systems arxiv.org/abs/2507.02213 .SY .SY

This work addresses the general problem of control synthesis for continuous-space, discrete-time stochastic systems with probabilistic guarantees via finite abstractions. While established methods exist, they often trade off accuracy for tractability. We propose a unified abstraction framework that improves both the tightness of probabilistic guarantees and computational efficiency. First, we introduce multi-interval MDPs (MI-MDPs), a generalization of interval-valued MDPs (IMDPs), which allows multiple, possibly overlapping clusters of successor states. This results in tighter abstractions but with increased computational complexity. To mitigate this, we further propose a generalized form of MDPs with set-valued transition probabilities (SMDPs), which model transitions as a fixed probability to a state cluster, followed by a non-deterministic choice within the cluster, as a sound abstraction. We show that control synthesis for MI-MDPs reduces to robust dynamic programming via linear optimization, while SMDPs admit even more efficient synthesis algorithms that avoid linear programming altogether. Theoretically, we prove that, given the partitioning of the state and disturbance spaces, both MI-MDPs and SMDPs yield tighter probabilistic guarantees than IMDPs, and that SMDPs are tighter than MI-MDPs. Extensive experiments across several benchmarks validate our theoretical results and demonstrate that SMDPs achieve favorable trade-offs among tightness, memory usage, and computation time.
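
For context, the baseline that MI-MDPs and SMDPs are compared against can be made concrete. In a standard interval MDP, the robust Bellman backup needs the worst-case expected value over all distributions that respect per-successor probability intervals, and this inner problem has a well-known greedy solution. The sketch below shows that standard IMDP step only; it is not the paper's MI-MDP or SMDP algorithm, and the function name is hypothetical.

```python
def worst_case_expectation(lower, upper, values):
    """Minimize sum_i p_i * values[i] subject to lower <= p <= upper, sum(p) = 1.

    Greedy rule: start every successor at its lower bound, then hand the
    leftover probability mass to successors in order of increasing value.
    """
    if sum(lower) > 1.0 + 1e-9 or sum(upper) < 1.0 - 1e-9:
        raise ValueError("intervals admit no valid probability distribution")
    probs = list(lower)
    remaining = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        extra = min(upper[i] - lower[i], remaining)
        probs[i] += extra
        remaining -= extra
    expectation = sum(p * v for p, v in zip(probs, values))
    return expectation, probs


value, dist = worst_case_expectation(
    lower=[0.1, 0.2, 0.3], upper=[0.5, 0.6, 0.7], values=[1.0, 0.0, 0.5]
)
print(value, dist)
```

In this example all spare mass goes to the zero-value successor, giving a worst-case value of about 0.25. Because each successor carries its own independent interval, the adversary can be more pessimistic than the underlying system allows, which is the kind of looseness the abstract's tighter abstractions target.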

Derivative-Free Optimization-Empowered Wireless Channel Reconfiguration for 6G arxiv.org/abs/2507.02243 .SP

Reconfigurable antennas, including reconfigurable intelligent surface (RIS), movable antenna (MA), fluid antenna (FA), and other advanced antenna techniques, have been studied extensively in the context of reshaping wireless propagation environments for 6G and beyond wireless communications. Nevertheless, how to reconfigure/optimize the real-time controllable coefficients to achieve a favorable end-to-end wireless channel remains a substantial challenge, as it usually requires accurate modeling of the complex interaction between the reconfigurable devices and the electromagnetic waves, as well as knowledge of implicit channel propagation parameters. In this paper, we introduce a derivative-free optimization (also known as zeroth-order (ZO) optimization) technique to directly optimize reconfigurable coefficients to shape the wireless end-to-end channel, without the need for channel modeling or estimation of the implicit environmental propagation parameters. We present the fundamental principles of ZO optimization and discuss its potential advantages in wireless channel reconfiguration. Two case studies for RIS and movable antenna-enabled single-input single-output (SISO) systems are provided to show the superiority of ZO-based methods as compared to state-of-the-art techniques. Finally, we outline promising future research directions and offer concluding insights on derivative-free optimization for reconfigurable antenna technologies.
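
To make the zeroth-order idea concrete, the sketch below uses a generic two-point random-direction gradient estimate to tune RIS phase shifts using only measured objective values. This is one common ZO scheme, not necessarily the one used in the paper. The toy objective `channel_gain`, the channel vectors `h` and `g`, the direct path `d`, and all step sizes are assumptions made for illustration.

```python
import cmath
import random


def channel_gain(phases, h, g, d):
    """Toy black-box objective: |d + sum_n h_n * exp(j * theta_n) * g_n|^2."""
    total = d + sum(hn * cmath.exp(1j * th) * gn for hn, th, gn in zip(h, phases, g))
    return abs(total) ** 2


def zo_ascent(f, x0, steps=300, mu=1e-2, lr=0.01, seed=0):
    """Maximize f using only function evaluations (no analytic gradient)."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(steps):
        # Random probing direction.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Two-point estimate of the directional derivative along u.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        x = [xi + lr * slope * ui for xi, ui in zip(x, u)]
        fx = f(x)
        if fx > best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


rng = random.Random(1)
n = 8
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
objective = lambda phases: channel_gain(phases, h, g, d=0.5 + 0.2j)
phases, gain = zo_ascent(objective, [0.0] * n)
print(gain)
```

The optimizer never sees `h`, `g`, or `d`; it only queries the objective, which mirrors the abstract's point that no channel model or parameter estimation is required. In a real system each query would correspond to a measured received power for a trial configuration.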

CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR arxiv.org/abs/2507.02289 .IV .CV

Myocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can be time-consuming and prohibitive, e.g., due to the administration of contrast agents. Cine CMR is a rapid and contrast-free imaging technique that can visualize both motion and structural abnormalities of the myocardium induced by acute MI. Therefore, we present a new end-to-end deep neural network, referred to as CineMyoPS, to segment myocardial pathologies, i.e., scars and edema, solely from cine CMR images. Specifically, CineMyoPS extracts both motion and anatomy features associated with MI. Given the interdependence between these features, we design a consistency loss (resembling the co-training strategy) to facilitate their joint learning. Furthermore, we propose a time-series aggregation strategy to integrate MI-related features across the cardiac cycle, thereby enhancing segmentation accuracy for myocardial pathologies. Experimental results on a multi-center dataset demonstrate that CineMyoPS achieves promising performance in myocardial pathology segmentation, motion estimation, and anatomy segmentation.

Workflow-Based Evaluation of Music Generation Systems arxiv.org/abs/2507.01022 .AS .HC .LG .MM .SD

This study presents an exploratory evaluation of Music Generation Systems (MGS) within contemporary music production workflows by examining eight open-source systems. The evaluation framework combines technical insights with practical experimentation through criteria specifically designed to investigate the practical and creative affordances of the systems within the iterative, non-linear nature of music production. Employing a single-evaluator methodology as a preliminary phase, this research adopts a mixed approach utilizing qualitative methods to form hypotheses subsequently assessed through quantitative metrics. The selected systems represent architectural diversity across both symbolic and audio-based music generation approaches, spanning composition, arrangement, and sound design tasks. The investigation addresses limitations of current MGS in music production, challenges and opportunities for workflow integration, and development potential as collaborative tools while maintaining artistic authenticity. Findings reveal these systems function primarily as complementary tools enhancing rather than replacing human expertise. They exhibit limitations in maintaining thematic and structural coherence that emphasize the indispensable role of human creativity in tasks demanding emotional depth and complex decision-making. This study contributes a structured evaluation framework that considers the iterative nature of music creation. It identifies methodological refinements necessary for subsequent comprehensive evaluations and determines viable areas for AI integration as collaborative tools in creative workflows. The research provides empirically-grounded insights to guide future development in the field.

Prompt Mechanisms in Medical Imaging: A Comprehensive Survey arxiv.org/abs/2507.01055 .IV .AI .CV

Deep learning offers transformative potential in medical imaging, yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models, providing flexible, domain-specific adaptations that significantly enhance model performance and adaptability without extensive retraining. This systematic review critically examines the burgeoning landscape of prompt engineering in medical imaging. We dissect diverse prompt modalities, including textual instructions, visual prompts, and learnable embeddings, and analyze their integration for core tasks such as image generation, segmentation, and classification. Our synthesis reveals how these mechanisms improve task-specific outcomes by enhancing accuracy, robustness, and data efficiency and reducing reliance on manual feature engineering while fostering greater model interpretability by making the model's guidance explicit. Despite substantial advancements, we identify persistent challenges, particularly in prompt design optimization, data heterogeneity, and ensuring scalability for clinical deployment. Finally, this review outlines promising future trajectories, including advanced multimodal prompting and robust clinical integration, underscoring the critical role of prompt-driven AI in accelerating the revolution of diagnostics and personalized treatment planning in medicine.

Imitation Learning for Satellite Attitude Control under Unknown Perturbations arxiv.org/abs/2507.01161 .SY .RO .SY

This paper presents a novel satellite attitude control framework that integrates Soft Actor-Critic (SAC) reinforcement learning with Generative Adversarial Imitation Learning (GAIL) to achieve robust performance under various unknown perturbations. Traditional control techniques often rely on precise system models and are sensitive to parameter uncertainties and external perturbations. To overcome these limitations, we first develop a SAC-based expert controller that demonstrates improved resilience against actuator failures, sensor noise, and attitude misalignments, outperforming our previous results in several challenging scenarios. We then use GAIL to train a learner policy that imitates the expert's trajectories, thereby reducing training costs and improving generalization through expert demonstrations. Preliminary experiments under single and combined perturbations show that the SAC expert can rotate the antenna to a specified direction and keep the antenna orientation reliably stable in most of the listed perturbations. Additionally, the GAIL learner can imitate most of the features from the trajectories generated by the SAC expert. Comparative evaluations and ablation studies confirm the effectiveness of the SAC algorithm and reward shaping. The integration of GAIL further reduces sample complexity and demonstrates promising imitation capabilities, paving the way for more intelligent and autonomous spacecraft control systems.

Classical Guitar Duet Separation using GuitarDuets -- a Dataset of Real and Synthesized Guitar Recordings arxiv.org/abs/2507.01172 .AS

Recent advancements in music source separation (MSS) have focused on the multi-timbral case, with existing architectures tailored for the separation of distinct instruments, thus overlooking the challenge of separating instruments with similar timbral characteristics. Addressing this gap, our work focuses on monotimbral MSS, specifically within the context of classical guitar duets. To this end, we introduce the GuitarDuets dataset, featuring a combined total of approximately three hours of real and synthesized classical guitar duet recordings, as well as note-level annotations of the synthesized duets. We perform an extensive cross-dataset evaluation by adapting Demucs, a state-of-the-art MSS architecture, to monotimbral source separation. Furthermore, we develop a joint permutation-invariant transcription and separation framework to exploit note event predictions as auxiliary information. Our results indicate that utilizing both the real and synthesized subsets of GuitarDuets leads to improved separation performance on an independently recorded test set compared to utilizing only one subset. We also find that while the availability of ground-truth note labels greatly helps the performance of the separation network, the predicted note estimates result in only marginal improvement. Finally, we discuss the behavior of commonly utilized metrics, such as SDR and SI-SDR, in the context of monotimbral MSS.
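
For readers unfamiliar with the metrics mentioned at the end of the abstract, SI-SDR projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The sketch below follows the usual definition on plain Python lists and assumes both signals are zero-mean and of equal length; it is a reference illustration, not the evaluation code used in the paper.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Optimal scaling of the reference toward the estimate.
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.1, 0.1, -0.9, 0.1]
print(si_sdr(estimate, reference))
print(si_sdr([2.0 * e for e in estimate], reference))
```

Both calls print roughly 16.99 dB, which shows the scale invariance: multiplying the estimate by a constant gain leaves the score unchanged.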

An Adaptive Estimation Approach based on Fisher Information to Overcome the Challenges of LFP Battery SOC Estimation arxiv.org/abs/2507.01173 .SY .SY

Robust, real-time State of Charge (SOC) estimation is essential for Lithium Iron Phosphate (LFP) batteries, which are widely used in electric vehicles (EVs) and energy storage systems due to their safety and longevity. However, the flat Open Circuit Voltage (OCV)-SOC curve makes this task particularly challenging. The challenge is compounded by hysteresis effects and by real-world conditions such as current bias, voltage quantization errors, and temperature, all of which must be considered in battery management system use. In this paper, we propose an adaptive estimation approach to overcome the challenges of LFP SOC estimation. Specifically, the method uses an adaptive Fisher information fusion strategy that combines the SOC estimates from two different models: Coulomb counting and equivalent circuit model-based parameter identification. The effectiveness of this strategy is rationalized by the information richness excited by external cycling signals. A 3D OCV-H-SOC map that captures the relationship between OCV, hysteresis, and SOC is proposed as the backbone and can be generalized to other widely adopted parameter-identification methods. Extensive validation under ideal and real-world use scenarios, including SOC-OCV flat zones, current bias, voltage quantization errors, low temperatures, and insufficient current excitations, has been performed using four driving profiles, i.e., the Orange County Transit Bus Cycle, the California Unified Cycle, the US06 Drive Cycle, and the New York City Cycle, where the results demonstrate superiority over the state-of-the-art unscented Kalman filter, long short-term memory networks, and transformer in all validation cases.
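
The abstract does not give the fusion rule itself. One textbook way to combine two independent scalar estimates is to weight each by its Fisher information, which for a Gaussian error model equals the inverse variance. The sketch below shows that generic rule under those assumptions; the function name and the example numbers are hypothetical and do not come from the paper.

```python
def fuse_by_information(estimates, informations):
    """Information-weighted average of independent scalar estimates.

    Returns the fused estimate and its variance (the inverse of total information).
    """
    total = sum(informations)
    if total <= 0:
        raise ValueError("total information must be positive")
    fused = sum(i * x for i, x in zip(informations, estimates)) / total
    return fused, 1.0 / total


# Hypothetical values: Coulomb counting reports SOC 0.50 with information 1.0,
# the model-based estimator reports SOC 0.60 with information 3.0.
soc, variance = fuse_by_information([0.50, 0.60], [1.0, 3.0])
print(soc, variance)
```

Here the fused SOC is about 0.575, pulled toward the more informative estimator. In a flat OCV-SOC region the voltage-based information would shrink, so a rule of this shape would lean more on Coulomb counting, which is consistent with the adaptive behavior the abstract describes.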
