SNR-EQ-JSCC: Joint Source-Channel Coding with SNR-Based Embedding and Query arxiv.org/abs/2501.04732 .IT .IT .AI

Coping with the impact of dynamic channels is a critical issue in joint source-channel coding (JSCC)-based semantic communication systems. In this paper, we propose a lightweight channel-adaptive semantic coding architecture called SNR-EQ-JSCC. It is built upon the generic Transformer model and achieves channel adaptation (CA) by Embedding the signal-to-noise ratio (SNR) into the attention blocks and dynamically adjusting attention scores through channel-adaptive Queries. Meanwhile, penalty terms are introduced into the loss function to stabilize the training process. Considering that instantaneous SNR feedback may be imperfect, we propose an alternative method that uses only the average SNR and requires no retraining of SNR-EQ-JSCC. Simulation results on image transmission demonstrate that the proposed SNR-EQ-JSCC outperforms the state-of-the-art SwinJSCC in peak signal-to-noise ratio (PSNR) and perception metrics while requiring only 0.05% of the storage overhead and 6.38% of the computational complexity for CA. Moreover, the channel-adaptive query method yields significant improvements in perception metrics. When instantaneous SNR feedback is imperfect, SNR-EQ-JSCC using only the average SNR still surpasses the baseline schemes.
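
A minimal sketch of the mechanism the abstract describes: the SNR is embedded into an attention block and also used to adjust the queries, and hence the attention scores. This is an illustrative reading, not the authors' code; the module layout, dimensions, and the way the SNR enters the block are assumptions.

import torch
import torch.nn as nn

class SNRAdaptiveAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Maps the scalar SNR (in dB) to an embedding added to every token.
        self.snr_embed = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Produces a channel-adaptive adjustment applied to the queries only.
        self.snr_query = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, tokens: torch.Tensor, snr_db: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim); snr_db: (batch, 1)
        s = snr_db.unsqueeze(1)                    # (batch, 1, 1)
        x = tokens + self.snr_embed(s)             # SNR embedding
        q = x + self.snr_query(s)                  # channel-adaptive query
        out, _ = self.attn(q, x, x, need_weights=False)
        return out

# Usage (illustrative shapes): SNRAdaptiveAttention(256)(torch.randn(4, 196, 256), torch.full((4, 1), 10.0))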

AI-Driven Reinvention of Hydrological Modeling for Accurate Predictions and Interpretation to Transform Earth System Modeling arxiv.org/abs/2501.04733 .ao-ph .AI .ET .LG

Traditional equation-driven hydrological models often struggle to accurately predict streamflow in challenging regional Earth systems like the Tibetan Plateau, while hybrid and existing algorithm-driven models face difficulties in interpreting hydrological behaviors. This work introduces HydroTrace, an algorithm-driven, data-agnostic model that substantially outperforms these approaches, achieving a Nash-Sutcliffe Efficiency of 98% and demonstrating strong generalization on unseen data. Moreover, HydroTrace leverages advanced attention mechanisms to capture spatial-temporal variations and feature-specific impacts, enabling the quantification and spatial resolution of streamflow partitioning as well as the interpretation of hydrological behaviors such as glacier-snow-streamflow interactions and monsoon dynamics. Additionally, a large language model (LLM)-based application allows users to easily understand and apply HydroTrace's insights for practical purposes. These advancements position HydroTrace as a transformative tool in hydrological and broader Earth system modeling, offering enhanced prediction accuracy and interpretability.
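
For reference, the Nash-Sutcliffe Efficiency cited above is the standard goodness-of-fit metric shown below; this snippet only illustrates the metric, not HydroTrace itself. A value of 1 is a perfect fit, while 0 means the model is no better than predicting the observed mean.

import numpy as np

def nash_sutcliffe_efficiency(observed: np.ndarray, simulated: np.ndarray) -> float:
    # NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

# An NSE of 0.98 (98%) means the squared prediction error is 2% of the variance of the observations.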

Towards resilient cities: A hybrid simulation framework for risk mitigation through data driven decision making arxiv.org/abs/2501.04746 .SY .MA .SY

Providing a comprehensive view of city operation and offering useful metrics for decision making is a well-known challenge for urban risk analysis systems. Existing systems are, in many cases, generalizations of previous domain-specific tools and/or methodologies that may not cover all urban interdependencies, which makes it difficult to obtain homogeneous indicators. In order to overcome this limitation while seeking effective support for decision makers, this article introduces a novel hybrid simulation framework for risk mitigation. The framework is built on a proposed city concept that considers urban space as a Complex Adaptive System composed of interconnected Critical Infrastructures (CIs). In this concept, a Social System, which models the daily patterns and social interactions of citizens in the Urban Landscape, drives the CIs' demand to configure the full city picture. The framework's hybrid design integrates agent-based and network-based modeling by breaking down city agents into system-dependent sub-agents, enabling inter- and intra-system interaction simulation, respectively. A layered structure of indicators at different aggregation levels is also developed to ensure that decisions are not only data-driven but also explainable. Therefore, the proposed simulation framework can serve as a decision support system (DSS) tool that allows quantitative analysis of the impact of threats at different levels. First, system-level metrics can be used to get a broad view of the city's resilience. Then, agent-level metrics back those figures and provide better explainability. On implementation, the proposed framework enables component reusability (for eased coding), simulation federation (enabling the integration of existing system-oriented simulators), discrete simulation in accelerated time (for rapid scenario simulation), and decision-oriented visualization (for informed outputs).
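
A toy sketch of the hybrid design described above; the class names, CI names, and demand pattern are illustrative assumptions, not the paper's model. Each city agent is split into system-dependent sub-agents, one per Critical Infrastructure it uses, and indicators are computed at both the agent and the system level.

from dataclasses import dataclass, field

@dataclass
class SubAgent:
    ci_name: str          # e.g. "power", "water", "transport"
    demand: float = 0.0   # demand this citizen places on that CI

@dataclass
class CitizenAgent:
    agent_id: int
    sub_agents: dict[str, SubAgent] = field(default_factory=dict)

    def daily_pattern(self, hour: int) -> None:
        # Toy Social System driver: demand peaks in the evening.
        load = 1.0 if 18 <= hour <= 22 else 0.3
        for sub in self.sub_agents.values():
            sub.demand = load

def system_level_indicator(agents: list[CitizenAgent], ci_name: str) -> float:
    # City-level (aggregated) indicator: total demand placed on one CI.
    return sum(a.sub_agents[ci_name].demand for a in agents if ci_name in a.sub_agents)

# Agent-level metrics (per CitizenAgent) back the aggregated figure, mirroring the
# layered-indicator structure described in the abstract.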

Adaptive Algebraic Reuse of Reordering in Cholesky Factorization with Dynamic Sparsity Pattern arxiv.org/abs/2501.04011 .NA .GR .NA

Cholesky linear solvers are a critical bottleneck in challenging applications within computer graphics and scientific computing. These applications include, but are not limited to, elastodynamic barrier methods such as Incremental Potential Contact (IPC) and geometric operations such as remeshing and morphology. In these contexts, the sparsity patterns of the linear systems frequently change across successive calls to the Cholesky solver, necessitating repeated symbolic analyses that dominate the overall solver runtime. To characterize this bottleneck, we analyze over 150,000 linear systems generated from diverse nonlinear problems with dynamic sparsity changes in IPC and patch remeshing on a wide range of triangular meshes of various sizes. Our analysis using three leading sparse Cholesky libraries, Intel MKL Pardiso, SuiteSparse CHOLMOD, and Apple Accelerate, reveals that the primary performance constraint lies in the symbolic re-ordering phase of the solver. Recognizing this, we introduce Parth, an innovative re-ordering method designed to adaptively update ordering vectors only where local connectivity changes occur. Parth employs a novel hierarchical graph decomposition algorithm to break down the dual graph of the input matrix into fine-grained subgraphs, facilitating the selective reuse of fill-reducing orderings when sparsity patterns exhibit temporal coherence. Our extensive evaluation demonstrates that Parth achieves up to 255x and 13x speedups in fill-reducing ordering for our IPC and remeshing benchmarks, respectively, and 6.85x and 10.7x accelerations in symbolic analysis. These enhancements translate to up to 2.95x and 5.89x reductions in overall solver runtime. Additionally, Parth's integration requires only three lines of code, resulting in significant computational savings without requiring changes to the computational stack.
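
A rough sketch of the reuse idea, not Parth itself: keep the previous fill-reducing ordering when the sparsity pattern is unchanged and recompute it only when it differs. Parth goes much further, decomposing the graph hierarchically and updating only the subgraphs whose connectivity changed; reverse Cuthill-McKee is used here merely as a stand-in ordering, and the class name is an assumption.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

class ReusableOrdering:
    def __init__(self) -> None:
        self._pattern = None   # (indptr, indices) bytes of the last matrix seen
        self._perm = None

    def ordering(self, A: csr_matrix) -> np.ndarray:
        pattern = (A.indptr.tobytes(), A.indices.tobytes())
        if pattern != self._pattern:                       # sparsity changed: recompute
            self._perm = reverse_cuthill_mckee(A, symmetric_mode=True)
            self._pattern = pattern
        return self._perm                                  # otherwise reuse the previous ordering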

FlexCache: Flexible Approximate Cache System for Video Diffusion arxiv.org/abs/2501.04012 .MM .LG

Text-to-video applications are receiving increasing attention from the public. Among these, diffusion models have emerged as the most prominent approach, offering impressive quality in visual content generation. However, they still suffer from substantial computational complexity, often requiring several minutes to generate a single video. While prior research has addressed the computational overhead of text-to-image diffusion models, the techniques developed are not directly suitable for video diffusion models due to the significantly larger cache requirements and higher computational demands associated with video generation. We present FlexCache, a flexible approximate cache system that addresses these challenges with two main designs. First, we compress the caches before saving them to storage; our compression strategy reduces storage consumption by 6.7x on average. Second, we find that the approximate cache system can achieve a higher hit rate and greater computation savings by decoupling the object and the background. We further design a tailored cache replacement policy to better support these two techniques. In our evaluation, FlexCache reaches 1.26x higher throughput and 25% lower cost compared to the state-of-the-art diffusion approximate cache system.
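
A simplified sketch of the two ideas above; the compression scheme, dtype, and eviction policy are illustrative assumptions, not FlexCache's implementation. Cached activations are compressed before being written out, and a replacement policy evicts entries when the cache is full. FlexCache additionally decouples object and background caches, which is not shown here.

import zlib
from collections import OrderedDict
import numpy as np

class ApproximateCache:
    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self._store: OrderedDict[str, bytes] = OrderedDict()

    def put(self, key: str, activations: np.ndarray) -> None:
        # Compress before saving to storage.
        blob = zlib.compress(activations.astype(np.float16).tobytes())
        self._store[key] = blob
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)    # evict the least recently used entry

    def get(self, key: str, shape: tuple, dtype=np.float16):
        blob = self._store.get(key)
        if blob is None:
            return None                        # cache miss
        self._store.move_to_end(key)
        return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)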

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition arxiv.org/abs/2501.03230 .AI .CV

Existing research on video understanding still struggles to achieve in-depth comprehension and reasoning in complex videos, primarily due to the under-exploration of two key bottlenecks: fine-grained spatial-temporal perceptive understanding and cognitive-level video scene comprehension. This paper bridges the gap by presenting a novel solution. We first introduce a novel video Multimodal Large Language Model (MLLM), MotionEpic, which achieves fine-grained pixel-level spatial-temporal video grounding by integrating a video spatial-temporal scene graph (STSG) representation. Building upon MotionEpic, we then develop a Video-of-Thought (VoT) reasoning framework. VoT inherits the Chain-of-Thought (CoT) core, breaking a complex task down into simpler, manageable sub-problems and addressing them step by step, from low-level pixel perception to high-level cognitive interpretation. Extensive experiments across various complex video QA benchmarks demonstrate that our overall framework strikingly boosts the existing state-of-the-art. To our knowledge, this is the first successful attempt at applying the CoT technique to achieve human-level video reasoning, and we show great potential in extending it to a wider range of video understanding scenarios. The project is open at https://haofei.vip/VoT
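
A high-level sketch of the staged reasoning flow described above, with a generic mllm callable standing in for MotionEpic; the actual prompts, STSG grounding, and answer verification are the paper's contribution and are not reproduced here.

from typing import Callable

def video_of_thought(video, question: str, mllm: Callable[..., str]) -> str:
    # 1. Low-level perception: identify and track the targets the question refers to.
    targets = mllm(video, f"Which objects or regions are relevant to: {question}?")
    tracking = mllm(video, f"Describe the motion and interactions of: {targets}")
    # 2. High-level cognition: reason over the grounded evidence, then answer.
    analysis = mllm(video, f"Observations: {tracking}\nAnalyse candidate answers to: {question}")
    return mllm(video, f"Analysis: {analysis}\nGive the final answer to: {question}")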

Performance Comparison of Security Credential Management Systems for V2X: North American Standard IEEE 1609.2.1 and European Standard ETSI TS 102 941 arxiv.org/abs/2501.03237 .CR

gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography arxiv.org/abs/2501.03245 .CR .AR .DC

Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) but with lower computational complexity and smaller key sizes, making it a competitive option for applications such as blockchain, secure multi-party computation, and database security. However, the throughput of ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations. This paper presents gECC, a versatile framework for ECC optimized for GPU architectures, specifically engineered to achieve high-throughput performance in EC operations. gECC incorporates batch-based execution of EC operations and microarchitecture-level optimization of modular arithmetic. It employs Montgomery's trick to enable batch EC computation and incorporates novel computation parallelization and memory management techniques to maximize the computation parallelism and minimize the access overhead of GPU global memory. Also, we analyze the primary bottleneck in modular multiplication by investigating how the user codes of modular multiplication are compiled into hardware instructions and what these instructions' issuance rates are. We identify that the efficiency of modular multiplication is highly dependent on the number of Integer Multiply-Add (IMAD) instructions. To eliminate this bottleneck, we propose techniques to minimize the number of IMAD instructions by leveraging predicate registers to pass the carry information and using addition and subtraction instructions (IADD3) to replace IMAD instructions. Our results show that, for ECDSA and ECDH, gECC can achieve performance improvements of 5.56x and 4.94x, respectively, compared to the state-of-the-art GPU-based system. In a real-world blockchain application, we can achieve performance improvements of 1.56x, compared to the state-of-the-art CPU-based system.
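
The abstract credits Montgomery's trick with enabling batch EC computation; the sketch below shows the trick itself over a prime field, computing n modular inverses for the price of one inversion plus O(n) multiplications. gECC's GPU batching, IMAD reduction, and carry handling via predicate registers are not reflected here.

def batch_inverse(values: list[int], p: int) -> list[int]:
    # Montgomery's trick: all values must be nonzero mod p.
    prefix, acc = [], 1
    for v in values:
        acc = acc * v % p
        prefix.append(acc)        # prefix[i] = values[0] * ... * values[i] mod p
    inv_acc = pow(acc, -1, p)     # the single modular inversion
    inverses = [0] * len(values)
    for i in range(len(values) - 1, 0, -1):
        inverses[i] = inv_acc * prefix[i - 1] % p
        inv_acc = inv_acc * values[i] % p
    inverses[0] = inv_acc
    return inverses

# Example: batch_inverse([2, 3, 5], 17) == [9, 6, 7]   (2*9, 3*6, and 5*7 are all 1 mod 17)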
