TVC: Tokenized Video Compression with Ultra-Low Bitrate arxiv.org/abs/2504.16953 .IV

Tokenized visual representations have shown great promise in image compression, yet their extension to video remains underexplored due to the challenges posed by complex temporal dynamics and stringent bitrate constraints. In this paper, we propose Tokenized Video Compression (TVC), the first token-based dual-stream video compression framework designed to operate effectively at ultra-low bitrates. TVC leverages the powerful Cosmos video tokenizer to extract both discrete and continuous token streams. The discrete tokens (i.e., code maps generated by FSQ) are partially masked using a strategic masking scheme, then compressed losslessly with a discrete checkerboard context model to reduce transmission overhead. The masked tokens are reconstructed by a decoder-only transformer with spatiotemporal token prediction. Meanwhile, the continuous tokens, produced via an autoencoder (AE), are quantized and compressed using a continuous checkerboard context model, providing complementary continuous information at ultra-low bitrate. On the decoder side, both streams are fused using ControlNet, with multi-scale hierarchical integration to ensure high perceptual quality alongside strong fidelity in reconstruction. This work mitigates the long-standing skepticism about the practicality of tokenized video compression and opens up new avenues for semantics-aware, token-native video compression.
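
To make the discrete stream more concrete, here is a minimal sketch of finite scalar quantization (FSQ) and of a checkerboard split. It is an illustration only, not the TVC implementation: the function names, the restriction to odd level counts, and the toy inputs are assumptions, and the abstract does not specify the actual masking scheme or the context model.

```python
import math

def fsq_quantize(z, levels):
    """Quantize one latent vector; `levels` holds an odd level count per channel (assumption)."""
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2
        # tanh bounds the value to (-1, 1); scaling and rounding snaps it to an integer grid.
        codes.append(int(round(math.tanh(value) * half)))
    return codes

def fsq_index(codes, levels):
    """Pack the per-channel codes into one token index (mixed-radix encoding)."""
    index, basis = 0, 1
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index += (code + half) * basis  # shift each code to the range 0..n_levels-1
        basis *= n_levels
    return index

def checkerboard_mask(height, width):
    """True marks 'anchor' positions; the remaining positions are coded given the anchors."""
    return [[(row + col) % 2 == 0 for col in range(width)] for row in range(height)]

codes = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
token = fsq_index(codes, [5, 5, 5])
mask = checkerboard_mask(2, 4)
```

With three channels of five levels each, the implicit codebook has 5 * 5 * 5 = 125 entries, so every latent vector maps to a token index between 0 and 124 without a learned codebook.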

arXiv.org

Iterative Collaboration Network Guided By Reconstruction Prior for Medical Image Super-Resolution arxiv.org/abs/2504.16958 .IV

High-resolution medical images can provide more detailed information for better diagnosis. Conventional medical image super-resolution relies on a single task that first extracts features and then upscales based on them. The extracted features may not be complete for super-resolution. Recent multi-task learning, including reconstruction and super-resolution, is a good solution for obtaining additional relevant information. However, the interaction between the two tasks is often insufficient, which still leads to incomplete and less relevant deep features. To address the above limitations, we propose an iterative collaboration network (ICONet) to improve communication between tasks by progressively incorporating the reconstruction prior into the super-resolution learning procedure in an iterative, collaborative way. It consists of a reconstruction branch, a super-resolution branch, and an SR-Rec fusion module. The reconstruction branch generates the artifact-free image as a prior, which is followed by a super-resolution branch for prior knowledge-guided super-resolution. Unlike the widely used convolutional neural networks for extracting local features and Transformers with quadratic computational complexity for modeling long-range dependencies, we develop a new residual spatial-channel feature learning (RSCFL) module with two branches to efficiently establish feature relationships in the spatial and channel dimensions. Moreover, the designed SR-Rec fusion module adaptively fuses the reconstruction prior and the super-resolution features with each other. Our ICONet is built with multi-stage models that iteratively upscale the low-resolution images in steps of 2x while the two branches interact under multi-stage supervision.

Physiological neural representation for personalised tracer kinetic parameter estimation from dynamic PET arxiv.org/abs/2504.17122 .IV .AI .CV

Dynamic positron emission tomography (PET) with [$^{18}$F]FDG enables non-invasive quantification of glucose metabolism through kinetic analysis, often modelled by the two-tissue compartment model (TCKM). However, voxel-wise kinetic parameter estimation using conventional methods is computationally intensive and limited by spatial resolution. Deep neural networks (DNNs) offer an alternative but require large training datasets and significant computational resources. To address these limitations, we propose a physiological neural representation based on implicit neural representations (INRs) for personalized kinetic parameter estimation. INRs, which learn continuous functions, allow for efficient, high-resolution parametric imaging with reduced data requirements. Our method also integrates anatomical priors from a 3D CT foundation model to enhance robustness and precision in kinetic modelling. We evaluate our approach on an [$^{18}$F]FDG dynamic PET/CT dataset and compare it to state-of-the-art DNNs. Results demonstrate superior spatial resolution, lower mean-squared error, and improved anatomical consistency, particularly in tumour and highly vascularized regions. Our findings highlight the potential of INRs for personalized, data-efficient tracer kinetic modelling, enabling applications in tumour characterization, segmentation, and prognostic assessment.
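
For readers unfamiliar with the two-tissue compartment model mentioned above, the following sketch integrates its standard rate equations with a forward Euler step. It shows only the forward model that kinetic parameter estimation has to invert, not the paper's INR method; the function name, the constant plasma input, and the rate constants are illustrative assumptions, and the blood-volume fraction is ignored.

```python
def simulate_2tcm(plasma, dt, K1, k2, k3, k4):
    """Forward Euler integration of the two-tissue compartment model.

    dC1/dt = K1 * Cp - (k2 + k3) * C1 + k4 * C2   (free tracer in tissue)
    dC2/dt = k3 * C1 - k4 * C2                    (trapped / metabolised tracer)
    `plasma` is the plasma input function Cp sampled every `dt` minutes.
    """
    c1 = c2 = 0.0
    curve = []
    for cp in plasma:
        dc1 = K1 * cp - (k2 + k3) * c1 + k4 * c2
        dc2 = k3 * c1 - k4 * c2
        c1 += dt * dc1
        c2 += dt * dc2
        curve.append((c1, c2))
    return curve

# Hypothetical values: constant plasma input, irreversible trapping (k4 = 0).
curve = simulate_2tcm([1.0] * 10000, dt=0.01, K1=0.1, k2=0.2, k3=0.05, k4=0.0)
c1_final, c2_final = curve[-1]
```

With a constant input and k4 = 0, the first compartment settles at K1 / (k2 + k3) while the second keeps accumulating tracer. Voxel-wise estimation amounts to fitting K1 through k4 so that the simulated tissue curve C1 + C2 matches the measured one in every voxel, which is why it is computationally heavy.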

PACE: A Framework for Learning and Control in Linear Incomplete-Information Differential Games arxiv.org/abs/2504.17128 .SY .LG .MA .SY

In this paper, we address the problem of a two-player linear quadratic differential game with incomplete information, a scenario commonly encountered in multi-agent control, human-robot interaction (HRI), and approximation methods for solving general-sum differential games. While solutions to such linear differential games are typically obtained through coupled Riccati equations, the complexity increases when agents have incomplete information, particularly when neither is aware of the other's cost function. To tackle this challenge, we propose a model-based Peer-Aware Cost Estimation (PACE) framework for learning the cost parameters of the other agent. In PACE, each agent treats its peer as a learning agent rather than a stationary optimal agent, models their learning dynamics, and leverages this dynamic to infer the cost function parameters of the other agent. This approach enables agents to infer each other's objective function in real time based solely on their previous state observations and dynamically adapt their control policies. Furthermore, we provide a theoretical guarantee for the convergence of parameter estimation and the stability of system states in PACE. Additionally, in our numerical studies, we demonstrate how modeling the learning dynamics of the other agent benefits PACE, compared to approaches that approximate the other agent as having complete information, particularly in terms of stability and convergence speed.

Peer-Aware Cost Estimation in Nonlinear General-Sum Dynamic Games for Mutual Learning and Intent Inference arxiv.org/abs/2504.17129 .SY .AI .GT .RO .SY

Human-robot interactions can be modeled as incomplete-information general-sum dynamic games since the objective functions of both agents are not explicitly known to each other. However, solving for equilibrium policies for such games presents a major challenge, especially if the games involve nonlinear underlying dynamics. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (LQ) approximation of the nonlinear general-sum game, each agent explicitly models the learning dynamics of its peer agent while inferring their objective functions, leading to unbiased fast learning in inferring the unknown objective function of the peer agent, which is critical for task completion and safety assurance. Additionally, we demonstrate how N-PACE enables intent communication in such multi-agent systems by explicitly modeling the peer's learning dynamics.

Nearly Optimal Nonlinear Safe Control with BaS-SDRE arxiv.org/abs/2504.15453 .SY .RO .SY

The State-Dependent Riccati Equation (SDRE) approach has emerged as a systematic and effective means of designing nearly optimal nonlinear controllers. The Barrier States (BaS) embedding methodology was developed recently for safe multi-objective controls in which the safety condition is manifested as a state to be controlled along with other states of the system. The overall system, termed the safety embedded system, is highly nonlinear even if the original system is linear. This paper develops a nonlinear nearly optimal safe feedback control technique by combining the two strategies effectively. First, the BaS is derived in an extended linearization formulation to be subsequently used to form an extended safety embedded system. A new optimal control problem is formed thereafter, which is used to construct a safety embedded State-Dependent Riccati Equation, termed BaS-SDRE, whose solution approximates the solution of the optimal control problem's associated Hamilton-Jacobi-Bellman (HJB) equation. The BaS-SDRE is then solved online to synthesize the nearly optimal safe control. The proposed technique's efficacy is demonstrated on an unstable, constrained linear system that shows how the synthesized control reacts to nonlinearities near the unsafe region, a nonlinear flight control system with limited path angular velocity that exists due to structural and dynamic concerns, and a planar quadrotor system that navigates safely in a crowded environment.
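
The SDRE step described above can be sketched for a scalar system, where the Riccati equation has a closed-form solution. This is a simplified illustration under stated assumptions: it covers only the state-dependent Riccati part, omits the barrier-state embedding entirely, and the example dynamics x' = x^3 + u, the cost weights, and the function names are hypothetical rather than taken from the paper.

```python
import math

def sdre_gain(a, b, q, r):
    """Feedback gain from the scalar Riccati equation 2*a*p - (b*b/r)*p*p + q = 0."""
    # Positive root of the quadratic in p.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def simulate(x0, dt, steps, q=1.0, r=1.0):
    """Euler simulation of x' = x**3 + u with the gain recomputed at every state."""
    x = x0
    for _ in range(steps):
        a = x * x                     # extended linearization: x**3 = a(x) * x
        k = sdre_gain(a, 1.0, q, r)   # solve the Riccati equation at the current state
        u = -k * x
        x += dt * (x ** 3 + u)
    return x

x_final = simulate(x0=1.0, dt=0.001, steps=10000)
```

Re-solving the Riccati equation at each state is what "solved online" refers to. In this scalar example the closed loop becomes x' = -sqrt(x^4 + q/r) * x, so the cubic term that makes the open-loop system unstable is cancelled and the state decays to zero.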

Element-Grouping Strategy for Intelligent Reflecting Surface: Performance Analysis and Algorithm Optimization arxiv.org/abs/2504.15520 .SP

As a revolutionary paradigm for intelligently controlling wireless channels, intelligent reflecting surface (IRS) has emerged as a promising technology for future sixth-generation (6G) wireless communications. While IRS-aided communication systems can achieve high channel gains, existing schemes require a large number of IRS elements to mitigate the "multiplicative fading" effect in cascaded channels, leading to high complexity for real-time beamforming and high signaling overhead for channel estimation. In this paper, the concept of sustainable intelligent element-grouping IRS (IEG-IRS) is proposed to overcome those fundamental bottlenecks. Specifically, based on the statistical channel state information (S-CSI), the proposed grouping strategy intelligently pre-divides the IEG-IRS elements into multiple groups using the beam-domain grouping method, with each group sharing a common reflection coefficient that is optimized in real time using the instantaneous channel state information (I-CSI). Then, we further analyze the asymptotic performance of the IEG-IRS to reveal the substantial capacity gain in an extremely large-scale IRS (XL-IRS) aided single-user single-input single-output (SU-SISO) system. In particular, when a line-of-sight (LoS) component exists, the analysis demonstrates that the combined cascaded link can be considered as a "deterministic virtual LoS" channel, resulting in a sustainable squared array gain achieved by the IEG-IRS. Finally, we formulate a weighted-sum-rate (WSR) maximization problem for an IEG-IRS-aided multiuser multiple-input single-output (MU-MISO) system and propose a two-stage algorithm for optimizing the beam-domain grouping strategy and the multi-user active-passive beamforming.
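
The trade-off behind element grouping can be illustrated with a toy calculation. The sketch below assumes unit-modulus cascaded channel coefficients and groups of contiguous elements, and it picks each group's shared phase to align that group's summed channel. It is not the paper's beam-domain grouping method, and the function name and channel values are hypothetical.

```python
import cmath

def grouped_gain(channel, group_size):
    """Power gain when every group of elements shares one unit-modulus reflection coefficient."""
    total = 0.0
    for start in range(0, len(channel), group_size):
        group_sum = sum(channel[start:start + group_size])
        # The best shared phase rotates the group's summed channel onto the real axis,
        # so each group contributes the magnitude of its sum.
        total += abs(group_sum)
    return total ** 2

# Eight elements with a linear phase progression (hypothetical LoS-like channel).
channel = [cmath.exp(1j * 0.3 * n) for n in range(8)]
per_element = grouped_gain(channel, 1)   # every element tuned individually
two_groups = grouped_gain(channel, 4)    # two groups of four elements

# Elements whose channels are already phase-aligned lose nothing when grouped.
aligned = grouped_gain([1 + 0j] * 8, 4)
```

Tuning each of the N elements separately reaches the squared array gain N^2, while grouping elements with mismatched phases gives up part of it. Grouping elements whose channels are already aligned keeps the full N^2 with far fewer coefficients to optimize, which is the intuition behind choosing the groups from statistical channel information.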
