Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation arxiv.org/abs/2504.18539 .AS .LG .MM .SD

Audio-visual speech recognition (AVSR) incorporates auditory and visual modalities to improve recognition accuracy, particularly in noisy environments where audio-only speech systems are insufficient. While previous research has largely addressed audio disruptions, few studies have dealt with visual corruptions, e.g., lip occlusions or blurred videos, which are also detrimental. To address this real-world challenge, we propose CAV2vec, a novel self-supervised speech representation learning framework particularly designed to handle audio-visual joint corruption. CAV2vec employs a self-distillation approach with a corrupted prediction task, where the student model learns to predict clean targets, generated by the teacher model, with corrupted input frames. Specifically, we suggest unimodal multi-task learning, which distills cross-modal knowledge and aligns the corrupted modalities, by predicting clean audio targets with corrupted videos, and clean video targets with corrupted audio. This strategy mitigates the dispersion in the representation space caused by corrupted modalities, leading to more reliable and robust audio-visual fusion. Our experiments on robust AVSR benchmarks demonstrate that the corrupted representation learning method significantly enhances recognition accuracy across generalized environments involving various types of corruption.

arXiv.org

A Unified Alternating Optimization Framework for Joint Sensor and Actuator Configuration in LQG Systems arxiv.org/abs/2504.18731 .SY .SY

This paper fills a gap in the literature by considering a joint sensor and actuator configuration problem under the linear quadratic Gaussian (LQG) performance without assuming a predefined set of candidate components. Different from the existing research, which primarily focuses on selecting or placing sensors and actuators from a fixed group, we consider a more flexible formulation where these components must be designed from scratch, subject to general-form configuration costs and constraints. To address this challenge, we first analytically characterize the gradients of the LQG performance with respect to the sensor and actuator matrices using algebraic Riccati equations. Subsequently, we derive first-order optimality conditions based on the Karush-Kuhn-Tucker (KKT) analysis and develop a unified alternating direction method of multipliers (ADMM)-based alternating optimization framework to address the general-form sensor and actuator configuration problem. Furthermore, we investigate three representative scenarios: sparsity promoting configuration, low-rank promoting configuration, and structure-constrained configuration. For each scenario, we provide in-depth analysis and develop tailored computational schemes. The proposed framework ensures numerical efficiency and adaptability to various design constraints and configuration costs, making it well-suited for integration into numerical solvers.

Nonconvex Linear System Identification with Minimal State Representation arxiv.org/abs/2504.18791 .SY .SP .ML .LG .SY

Low-order linear System IDentification (SysID) addresses the challenge of estimating the parameters of a linear dynamical system from finite samples of observations and control inputs with minimal state representation. Traditional approaches often utilize Hankel-rank minimization, which relies on convex relaxations that can require numerous, costly singular value decompositions (SVDs) to optimize. In this work, we propose two nonconvex reformulations to tackle low-order SysID: (i) Burer-Monteiro (BM) factorization of the Hankel matrix for efficient nuclear norm minimization, and (ii) optimizing directly over system parameters for real, diagonalizable systems with an atomic norm style decomposition. These reformulations circumvent the need for repeated heavy SVD computations, significantly improving computational efficiency. Moreover, we prove that optimizing directly over the system parameters yields lower statistical error rates, and lower sample complexities that do not scale linearly with trajectory length as in Hankel-nuclear norm minimization. Additionally, while our proposed formulations are nonconvex, we provide theoretical guarantees of achieving global optimality in polynomial time. Finally, we demonstrate algorithms that solve these nonconvex programs and validate our theoretical claims on synthetic data.
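
To make the low-rank structure behind this abstract concrete, here is a minimal sketch (not the paper's algorithm) of why the Hankel matrix of a low-order system factors into two thin matrices. It assumes a single-input, single-output system with real, distinct poles; the function names, poles, and weights are hypothetical and chosen only for illustration.

```python
def markov_parameters(poles, weights, count):
    """Impulse response h_k = sum_i w_i * p_i**k of a diagonal (modal) system."""
    return [sum(w * p ** k for p, w in zip(poles, weights)) for k in range(count)]


def hankel(h, rows, cols):
    """Hankel matrix with entries H[i][j] = h[i + j]."""
    return [[h[i + j] for j in range(cols)] for i in range(rows)]


def low_rank_factors(poles, weights, rows, cols):
    """Thin factors U (rows x order) and V (cols x order) with H = U V^T."""
    u = [[w * p ** i for p, w in zip(poles, weights)] for i in range(rows)]
    v = [[p ** j for p in poles] for j in range(cols)]
    return u, v


def multiply_transposed(u, v):
    """Compute U V^T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(u_row, v_row)) for v_row in v] for u_row in u]


# Hypothetical second-order system: two real poles with output weights.
poles, weights = [0.9, -0.5], [1.0, 2.0]
h = markov_parameters(poles, weights, count=7)
H = hankel(h, rows=4, cols=4)
U, V = low_rank_factors(poles, weights, rows=4, cols=4)
```

The factor width equals the number of poles, so an optimizer working on U and V (or directly on the poles and weights) never needs an SVD of the full Hankel matrix; how the paper actually parameterizes and regularizes these factors is not specified in the abstract.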

Reservoir-enhanced Segment Anything Model for Subsurface Diagnosis arxiv.org/abs/2504.18802 .IV .CV .LG

Urban roads and infrastructure, vital to city operations, face growing threats from subsurface anomalies like cracks and cavities. Ground Penetrating Radar (GPR) effectively visualizes underground conditions using electromagnetic (EM) waves; however, accurate anomaly detection via GPR remains challenging due to limited labeled data, varying subsurface conditions, and indistinct target boundaries. Although visually image-like, GPR data fundamentally represent EM waves, with variations within and between waves critical for identifying anomalies. To address these challenges, we propose the Reservoir-enhanced Segment Anything Model (Res-SAM), an innovative framework exploiting both visual discernibility and wave-changing properties of GPR data. Res-SAM initially identifies apparent candidate anomaly regions given minimal prompts, and further refines them by analyzing anomaly-induced changing information within and between EM waves in local GPR data, enabling precise and complete anomaly region extraction and category determination. Real-world experiments demonstrate that Res-SAM achieves high detection accuracy (>85%) and outperforms state-of-the-art methods. Notably, Res-SAM requires only minimal accessible non-target data, avoids intensive training, and incorporates simple human interaction to enhance reliability. Our research provides a scalable, resource-efficient solution for rapid subsurface anomaly detection across diverse environments, improving urban safety monitoring while reducing manual effort and computational cost.

DMA Reception for Simultaneous Area-Wide Sensing and Multi-User Uplink Communications arxiv.org/abs/2504.18843 .SP

The recent surge in deploying extremely large antenna arrays is expected to play a vital role in future sixth generation wireless networks, enabling advanced radar target localization with enhanced angular and range resolution. This paper focuses on the promising technology of Dynamic Metasurface Antennas (DMAs), integrating numerous sub-wavelength-spaced metamaterials within a single aperture, and presents a novel framework for designing their analog reception beamforming weights with the goal of optimizing sensing performance within a spatial Area of Interest (AoI), while simultaneously guaranteeing desired multi-user uplink communication performance. We derive the Cramér-Rao Bound (CRB) with DMA-based reception for both passive and active radar targets lying inside the AoI, which is then used as the optimization objective for configuring the discrete tunable phases of the metamaterials. Capitalizing on the DMA partially-connected architecture, we formulate the design problem as convex optimization and present both direct CRB minimization approaches and low complexity alternatives using a lower-bound approximation. Simulation results across various scenarios validate the effectiveness of the proposed framework, showing it consistently outperforms existing state-of-the-art methods.

Recursive Identification of Structured Systems: An Instrumental-Variable Approach Applied to Mechanical Systems arxiv.org/abs/2504.17927 .SY .SY

Online system identification algorithms are widely used for monitoring, diagnostics and control by continuously adapting to time-varying dynamics. Typically, these algorithms consider a model structure that lacks parsimony and offers limited physical interpretability. The objective of this paper is to develop a real-time parameter estimation algorithm aimed at identifying time-varying dynamics within an interpretable model structure. An additive model structure is adopted for this purpose, which offers enhanced parsimony and is shown to be particularly suitable for mechanical systems. The proposed approach integrates the recursive simplified refined instrumental variable method with block-coordinate descent to minimize an exponentially-weighted output error cost function. This novel recursive identification method delivers parametric continuous-time additive models and is applicable in both open-loop and closed-loop controlled systems. Its efficacy is shown using numerical simulations and is further validated using experimental data to detect the time-varying resonance dynamics of a flexible beam system. These results demonstrate the effectiveness of the proposed approach for online and interpretable estimation for advanced monitoring and control applications.
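
As a rough illustration of what minimizing an exponentially-weighted cost recursively looks like, the sketch below uses plain recursive least squares with a forgetting factor on a single scalar gain. This is a deliberately simpler stand-in, not the recursive instrumental-variable method with block-coordinate descent described above; the function name, the forgetting factor, and the initial covariance `p0` are assumptions made for the example.

```python
def rls_forgetting(inputs, outputs, forgetting=0.95, p0=1000.0):
    """Track theta in y = theta * u with exponentially discounted past data."""
    theta, p = 0.0, p0  # parameter estimate and its (scalar) covariance
    estimates = []
    for u, y in zip(inputs, outputs):
        gain = p * u / (forgetting + u * p * u)  # update gain
        theta += gain * (y - u * theta)          # correct with prediction error
        p = (p - gain * u * p) / forgetting      # inflate covariance to keep adapting
        estimates.append(theta)
    return estimates


# Hypothetical noise-free data: the true gain jumps from 2 to 3 halfway through.
inputs = [1.0] * 100
outputs = [2.0] * 50 + [3.0] * 50
estimates = rls_forgetting(inputs, outputs, forgetting=0.9)
```

A smaller forgetting factor discounts old samples faster, so the estimate follows the jump more quickly at the price of higher sensitivity to noise.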

Predicting Dairy Calf Body Weight from Depth Images Using Deep Learning (YOLOv8) and Threshold Segmentation with Cross-Validation and Longitudinal Analysis arxiv.org/abs/2504.17943 .IV .CV

Monitoring calf body weight (BW) before weaning is essential for assessing growth, feed efficiency, health, and weaning readiness. However, labor, time, and facility constraints limit BW collection. Additionally, Holstein calf coat patterns complicate image-based BW estimation, and few studies have explored non-contact measurements taken at early time points for predicting later BW. The objectives of this study were to (1) develop deep learning-based segmentation models for extracting calf body metrics, (2) compare deep learning segmentation with threshold-based methods, and (3) evaluate BW prediction using single-time-point cross-validation with linear regression (LR) and extreme gradient boosting (XGBoost) and multiple-time-point cross-validation with LR, XGBoost, and a linear mixed model (LMM). Depth images from Holstein (n = 63) and Jersey (n = 5) pre-weaning calves were collected, with 20 Holstein calves being weighed manually. Results showed that You Only Look Once version 8 (YOLOv8) deep learning segmentation (intersection over union = 0.98) outperformed threshold-based methods (0.89). In single-time-point cross-validation, XGBoost achieved the best BW prediction (R^2 = 0.91, mean absolute percentage error (MAPE) = 4.37%), while LMM provided the most accurate longitudinal BW prediction (R^2 = 0.99, MAPE = 2.39%). These findings highlight the potential of deep learning for automated BW prediction, enhancing farm management.
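
For readers unfamiliar with the three metrics quoted above, the following sketch shows how R^2, MAPE, and intersection over union are commonly computed. The helper names and the sample weights and masks are hypothetical; they are not taken from the study's data or code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of pixel coordinates."""
    return len(mask_a & mask_b) / len(mask_a | mask_b)


# Hypothetical body weights in kg (measured vs. predicted) and toy masks.
measured = [50.0, 60.0, 80.0]
predicted = [52.0, 57.0, 80.0]
mask_pred = {(0, 0), (0, 1), (1, 0)}
mask_true = {(0, 1), (1, 0), (1, 1)}
```

With these definitions, a MAPE of 4.37% means predictions are off by about 4.4% of the true weight on average, and an IoU of 0.98 means the predicted and reference masks overlap almost completely.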

Mixed Bernstein-Fourier Approximants for Optimal Trajectory Generation with Periodic Behavior arxiv.org/abs/2504.17969 .SY .SY

Efficient trajectory generation is critical for autonomous systems, yet current numerical methods often struggle to handle periodic behaviors effectively, especially when equidistant time nodes are required. This paper introduces a novel mixed Bernstein-Fourier approximation framework tailored explicitly for optimal motion planning. Our proposed methodology leverages the uniform convergence properties of Bernstein polynomials for nonperiodic behaviors while effectively capturing periodic dynamics through Fourier series. Theoretical results are established, including uniform convergence proofs for approximations of functions, derivatives, and integrals, as well as detailed error bound analyses. We further introduce a regulated least squares approach for determining approximation coefficients, enhancing numerical stability and practical applicability. Within an optimal control context, we establish feasibility and consistency of approximated solutions to their continuous counterparts. We also extend the covector mapping theorem, providing theoretical guarantees for approximating dual variables crucial in verifying the necessary optimality conditions from Pontryagin's Maximum Principle. Comprehensive numerical examples illustrate the method's superior performance, demonstrating substantial improvements in computational efficiency and precision in scenarios with complex periodic constraints and dynamics. Our mixed Bernstein-Fourier methodology thus presents a robust, theoretically grounded, and computationally efficient approach for advanced optimal trajectory planning in autonomous systems.
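
The Bernstein half of this idea can be sketched in a few lines. The example below covers only the classical Bernstein approximant built from samples at equidistant nodes on [0, 1]; it omits the Fourier component and the least-squares coefficient fitting, and the function names are hypothetical.

```python
from math import comb


def bernstein_approx(f, n, t):
    """Degree-n Bernstein approximant of f at t in [0, 1], using nodes k/n."""
    return sum(
        f(k / n) * comb(n, k) * t ** k * (1 - t) ** (n - k)
        for k in range(n + 1)
    )


def max_error(f, n, samples=200):
    """Largest absolute error on a uniform evaluation grid over [0, 1]."""
    return max(
        abs(f(i / samples) - bernstein_approx(f, n, i / samples))
        for i in range(samples + 1)
    )


def square(t):
    return t * t


coarse = max_error(square, 10)
fine = max_error(square, 40)
```

For f(t) = t^2 the approximant has the closed form t^2 + t(1 - t)/n, so the error shrinks uniformly but only at rate 1/n. That slow rate on smooth, oscillatory signals is one plausible motivation for pairing Bernstein polynomials with a Fourier basis.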

Optimal Power Allocation for OFDM-based Ranging Using Random Communication Signals arxiv.org/abs/2504.18016 .SP

High-precision ranging plays a crucial role in future 6G Integrated Sensing and Communication (ISAC) systems. To improve the ranging performance while maximizing the resource utilization efficiency, future 6G ISAC networks have to reuse data payload signals for both communication and sensing, whose inherent randomness may deteriorate the ranging performance. To address this issue, this paper investigates the power allocation (PA) design for an OFDM-based ISAC system under random signaling, aiming to reduce the ranging sidelobe level of both periodic and aperiodic auto-correlation functions (P-ACF and A-ACF) of the ISAC signal. Towards that end, we first derive the closed-form expressions of the average squared P-ACF and A-ACF, and then propose to minimize the expectation of the integrated sidelobe level (EISL) under arbitrary constellation mapping. We then rigorously prove that the uniform PA scheme achieves the global minimum of the EISL for both P-ACF and A-ACF. As a step further, we show that this scheme also minimizes the P-ACF sidelobe level at every lag. Moreover, we extend our analysis to the P-ACF case with frequency-domain zero-padding, which is a typical approach to improve the ranging resolution. We reveal that there exists a tradeoff between sidelobe level and mainlobe width, and propose a projected gradient descent algorithm to seek a locally optimal PA scheme that reduces the EISL. Finally, we validate our theoretical findings through extensive simulation results, confirming the effectiveness of the proposed PA methods in reducing the ranging sidelobe level for random OFDM signals.
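
The claim about uniform power allocation can be checked numerically for the periodic case with a small Monte Carlo sketch. Everything below is an illustrative assumption rather than the paper's setup: 16 subcarriers, 16-QAM payload symbols, no zero-padding, a hypothetical "skewed" allocation with the same total power, and function names invented for the example.

```python
import cmath
import random


def ofdm_symbol(freq_symbols, power):
    """Time-domain OFDM symbol: unitary inverse DFT of power-scaled symbols."""
    n = len(freq_symbols)
    return [
        sum(
            (power[k] ** 0.5) * freq_symbols[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)
        ) / n ** 0.5
        for t in range(n)
    ]


def periodic_acf(x):
    """Periodic (circular) auto-correlation at lags 0 .. N-1."""
    n = len(x)
    return [sum(x[t] * x[(t - lag) % n].conjugate() for t in range(n)) for lag in range(n)]


def integrated_sidelobe_level(x):
    """Sum of squared P-ACF magnitudes over all nonzero lags."""
    return sum(abs(r) ** 2 for r in periodic_acf(x)[1:])


def average_isl(power, trials=300, seed=0):
    """Monte Carlo estimate of the expected ISL for random 16-QAM payloads."""
    rng = random.Random(seed)  # fixed seed: same payloads for every allocation
    levels = [-3, -1, 1, 3]
    scale = 10 ** 0.5          # normalizes 16-QAM to unit average power
    total = 0.0
    for _ in range(trials):
        symbols = [complex(rng.choice(levels), rng.choice(levels)) / scale for _ in power]
        total += integrated_sidelobe_level(ofdm_symbol(symbols, power))
    return total / trials


n = 16
uniform = [1.0] * n
skewed = [1.5, 0.5] * (n // 2)  # same total power, unevenly spread
isl_uniform = average_isl(uniform)
isl_skewed = average_isl(skewed)
```

Because the P-ACF is the inverse DFT of the per-subcarrier power, a flat power profile with constant-modulus symbols gives exactly zero sidelobes; with QAM the symbol energies fluctuate, and the averaged sidelobe level in this sketch should come out lower for the uniform allocation than for the skewed one, in line with the abstract's claim.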

Physics-Driven Neural Compensation For Electrical Impedance Tomography arxiv.org/abs/2504.18067 .IV .CV

Electrical Impedance Tomography (EIT) provides a non-invasive, portable imaging modality with significant potential in medical and industrial applications. Despite its advantages, EIT encounters two primary challenges: the ill-posed nature of its inverse problem and the spatially variable, location-dependent sensitivity distribution. Traditional model-based methods mitigate ill-posedness through regularization but overlook sensitivity variability, while supervised deep learning approaches require extensive training data and lack generalization. Recent developments in neural fields have introduced implicit regularization techniques for image reconstruction, but these methods typically neglect the physical principles underlying EIT, thus limiting their effectiveness. In this study, we propose PhyNC (Physics-driven Neural Compensation), an unsupervised deep learning framework that incorporates the physical principles of EIT. PhyNC addresses both the ill-posed inverse problem and the sensitivity distribution by dynamically allocating neural representational capacity to regions with lower sensitivity, ensuring accurate and balanced conductivity reconstructions. Extensive evaluations on both simulated and experimental data demonstrate that PhyNC outperforms existing methods in terms of detail preservation and artifact resistance, particularly in low-sensitivity regions. Our approach enhances the robustness of EIT reconstructions and provides a flexible framework that can be adapted to other imaging modalities with similar challenges.
