arXiv Computer Science @arxiv_cs@qoto.org

1.12K Followers

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

Posts Posts and replies Media

arXiv Computer Science @arxiv_cs@qoto.org

Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations https://arxiv.org/abs/2507.01974 #cs.SD #cs.LG

Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations

The accessibility of long-duration recorders, adapted to sometimes demanding field conditions, has enabled the deployment of extensive animal population monitoring campaigns through ecoacoustics. The effectiveness of automatic signal detection methods, increasingly based on neural approaches, is frequently evaluated solely through machine learning metrics, while acoustic analysis of performance remains rare. As part of the acoustic monitoring of Rock Ptarmigan populations, we propose here a simple method for acoustic analysis of the detection system's performance. The proposed measure is based on relating the signal-to-noise ratio of synthetic signals to their probability of detection. We show how this measure provides information about the system and allows optimisation of its training. We also show how it enables modelling of the detection distance, thus offering the possibility of evaluating its dynamics according to the sound environment and accessing an estimation of the spatial density of calls.

arXiv Computer Science @arxiv_cs@qoto.org

Learnable-Differentiable Finite Volume Solver for Accelerated Simulation of Flows https://arxiv.org/abs/2507.01975 #physics.flu-dyn #cs.LG #cs.AI

Learnable-Differentiable Finite Volume Solver for Accelerated Simulation of Flows

Simulation of fluid flows is crucial for modeling physical phenomena like meteorology, aerodynamics, and biomedicine. Classical numerical solvers often require fine spatiotemporal grids to satisfy stability, consistency, and convergence conditions, leading to substantial computational costs. Although machine learning has demonstrated better efficiency, they typically suffer from issues of interpretability, generalizability, and data dependency. Hence, we propose a learnable and differentiable finite volume solver, called LDSolver, designed for efficient and accurate simulation of fluid flows on spatiotemporal coarse grids. LDSolver comprises two key components: (1) a differentiable finite volume solver, and (2) an learnable module providing equivalent approximation for fluxes (derivatives and interpolations), and temporal error correction on coarse grids. Even with limited training data (e.g., only a few trajectories), our model could accelerate the simulation while maintaining a high accuracy with superior generalizability. Experiments on different flow systems (e.g., Burgers, decaying, forced and shear flows) show that LDSolver achieves state-of-the-art performance, surpassing baseline models with notable margins.

arXiv Computer Science @arxiv_cs@qoto.org

A Comprehensive Survey on Network Traffic Synthesis: From Statistical Models to Deep Learning https://arxiv.org/abs/2507.01976 #cs.NI #cs.LG

A Comprehensive Survey on Network Traffic Synthesis: From Statistical Models to Deep Learning

Synthetic network traffic generation has emerged as a promising alternative for various data-driven applications in the networking domain. It enables the creation of synthetic data that preserves real-world characteristics while addressing key challenges such as data scarcity, privacy concerns, and purity constraints associated with real data. In this survey, we provide a comprehensive review of synthetic network traffic generation approaches, covering essential aspects such as data types, generation models, and evaluation methods. With the rapid advancements in AI and machine learning, we focus particularly on deep learning-based techniques while also providing a detailed discussion of statistical methods and their extensions, including commercially available tools. Furthermore, we highlight open challenges in this domain and discuss potential future directions for further research and development. This survey serves as a foundational resource for researchers and practitioners, offering a structured analysis of existing methods, challenges, and opportunities in synthetic network traffic generation.

arXiv Computer Science @arxiv_cs@qoto.org

Recommendation Algorithms on Social Media: Unseen Drivers of Political Opinion https://arxiv.org/abs/2507.01978 #cs.SI

Recommendation Algorithms on Social Media: Unseen Drivers of Political Opinion

Social media broadly refers to digital platforms and applications that simulate social interactions online. This study investigates the impact of social media platforms and their algorithms on political interest among users. As social media usage continues to rise, platforms like Facebook and X (formerly Twitter) play increasingly pivotal roles in shaping political discourse. By employing statistical analyses on data collected from over 3,300 participants, this research identifies significant differences in how various social media platforms influence political interest. Findings reveal that moderate Facebook users demonstrate decreased political engagement, whereas even minimal engagement with X significantly boosts political interest. The study further identifies demographic variations, noting that males, older individuals, Black or African American users, those with higher incomes show greater political interest. The demographic analysis highlights that Republicans are particularly active on social media - potentially influencing their social media engagement patterns. However, the study acknowledges a crucial limitation - the lack of direct data regarding the content users are exposed to which is shaping their social media experiences. Future research should explore these influences and consider additional popular platforms to enhance the understanding of social media's political impact. Addressing these gaps can provide deeper insights into digital political mobilization, aiding policymakers, educators, and platform designers in fostering healthier democratic engagement.

arXiv Computer Science @arxiv_cs@qoto.org

DKGCM: A Spatio-Temporal Prediction Model for Traffic Flow by Fusing Spatial Node Clustering Method and Fourier Bidirectional Mamba Mechanism https://arxiv.org/abs/2507.01982 #cs.LG #cs.AI

DKGCM: A Spatio-Temporal Prediction Model for Traffic Flow by Fusing Spatial Node Clustering Method and Fourier Bidirectional Mamba Mechanism

Accurate traffic demand forecasting enables transportation management departments to allocate resources more effectively, thereby improving their utilization efficiency. However, complex spatiotemporal relationships in traffic systems continue to limit the performance of demand forecasting models. To improve the accuracy of spatiotemporal traffic demand prediction, we propose a new graph convolutional network structure called DKGCM. Specifically, we first consider the spatial flow distribution of different traffic nodes and propose a novel temporal similarity-based clustering graph convolution method, DK-GCN. This method utilizes Dynamic Time Warping (DTW) and K-means clustering to group traffic nodes and more effectively capture spatial dependencies. On the temporal scale, we integrate the Fast Fourier Transform (FFT) within the bidirectional Mamba deep learning framework to capture temporal dependencies in traffic demand. To further optimize model training, we incorporate the GRPO reinforcement learning strategy to enhance the loss function feedback mechanism. Extensive experiments demonstrate that our model outperforms several advanced methods and achieves strong results on three public datasets.

arXiv Computer Science @arxiv_cs@qoto.org

Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features https://arxiv.org/abs/2507.01984 #cs.LG #cs.CL #cs.SI

Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

Amid a tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few studies have explored multimodal feature combinations, such as integrating text and images for building a classification model to detect misinformation. This study investigates the effectiveness of different multimodal feature combinations, incorporating text, images, and social features using an early fusion approach for the classification model. This study analyzed 1,529 tweets containing both text and images during the COVID-19 pandemic and election periods collected from Twitter (now X). A data enrichment process was applied to extract additional social features, as well as visual features, through techniques such as object detection and optical character recognition (OCR). The results show that combining unsupervised and supervised machine learning models improves classification performance by 15% compared to unimodal models and by 5% compared to bimodal models. Additionally, the study analyzes the propagation patterns of misinformation based on the characteristics of misinformation tweets and the users who disseminate them.

arXiv Computer Science @arxiv_cs@qoto.org

Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers https://arxiv.org/abs/2507.01988 #cs.NI

Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers

As AI models outpace the capabilities of single processors, interconnects across chips have become a critical enabler for scalable computing. These processors exchange massive amounts of data at cache-line granularity, prompting the adoption of new interconnect protocols like CXL, NVLink, and UALink, designed for high bandwidth and small payloads. However, the increasing transfer rates of these protocols heighten susceptibility to errors. While mechanisms like Cyclic Redundancy Check (CRC) and Forward Error Correction (FEC) are standard for reliable data transmission, scaling chip interconnects to multi-node configurations introduces new challenges, particularly in managing silently dropped flits in switching devices. This paper introduces Implicit Sequence Number (ISN), a novel mechanism that ensures precise flit drop detection and in-order delivery without adding header overhead. Additionally, we propose Reliability Extended Link (RXL), an extension of CXL that incorporates ISN to support scalable, reliable multi-node interconnects while maintaining compatibility with the existing flit structure. By elevating CRC to a transport-layer mechanism for end-to-end data and sequence integrity, and relying on FEC for link-layer error correction and detection, RXL delivers robust reliability and scalability without compromising bandwidth efficiency.

arXiv Computer Science @arxiv_cs@qoto.org

Curated Collaborative AI Edge with Network Data Analytics for B5G/6G Radio Access Networks https://arxiv.org/abs/2507.01994 #cs.NI #cs.MA

Curated Collaborative AI Edge with Network Data Analytics for B5G/6G Radio Access Networks

Despite advancements, Radio Access Networks (RAN) still account for over 50\% of the total power consumption in 5G networks. Existing RAN split options do not fully harness data potential, presenting an opportunity to reduce operational expenditures. This paper addresses this opportunity through a twofold approach. First, highly accurate network traffic and user predictions are achieved using the proposed Curated Collaborative Learning (CCL) framework, which selectively collaborates with relevant correlated data for traffic forecasting. CCL optimally determines whom, when, and what to collaborate with, significantly outperforming state-of-the-art approaches, including global, federated, personalized federated, and cyclic institutional incremental learnings by 43.9%, 39.1%, 40.8%, and 31.35%, respectively. Second, the Distributed Unit Pooling Scheme (DUPS) is proposed, leveraging deep reinforcement learning and prediction inferences from CCL to reduce the number of active DU servers efficiently. DUPS dynamically redirects traffic from underutilized DU servers to optimize resource use, improving energy efficiency by up to 89% over conventional strategies, translating into substantial monetary benefits for operators. By integrating CCL-driven predictions with DUPS, this paper demonstrates a transformative approach for minimizing energy consumption and operational costs in 5G RANs, significantly enhancing efficiency and cost-effectiveness.

arXiv Computer Science @arxiv_cs@qoto.org

Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting https://arxiv.org/abs/2507.01997 #cs.NI #cs.AI #cs.MA

Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting

Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically, Large Language Models (LLMs), in supporting network configuration synthesis and automating network diagnosis tasks, among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform, where to build and evaluate AI agents with low operational effort.

arXiv Computer Science @arxiv_cs@qoto.org

Positive region preserved random sampling: an efficient feature selection method for massive data https://arxiv.org/abs/2507.01998 #cs.LG

Positive region preserved random sampling: an efficient feature selection method for massive data

Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally have no enough computing resources when faced with huge volume of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, this paper proposes using the ratio of discernible object pairs to all object pairs that should be distinguished to measure the discriminatory ability of a feature set. Based on this measure, a new feature selection method is proposed. This method constructs positive region preserved samples from massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it is able to select a feature subset that can preserve the discriminatory ability of all the features of the target massive data set within an acceptable time on a personal computer. Second, the lower boundary of the probability of the object pairs that can be discerned using the feature subset selected in all object pairs that should be distinguished can be estimated before finding reducts. Furthermore, 11 data sets of different sizes were used to validate the proposed method. The results show that approximate reducts can be found in a very short period of time, and the discriminatory ability of the final reduct is larger than the estimated lower boundary. Experiments on four large-scale data sets also showed that an approximate reduct with high discriminatory ability can be obtained in reasonable time on a personal computer.

arXiv Computer Science @arxiv_cs@qoto.org

A Systematic Review of Security Vulnerabilities in Smart Home Devices and Mitigation Techniques https://arxiv.org/abs/2507.01018 #cs.CR #cs.AI

A Systematic Review of Security Vulnerabilities in Smart Home Devices and Mitigation Techniques

Smart homes that integrate Internet of Things (IoT) devices face increasing cybersecurity risks, posing significant challenges to these environments. The study explores security threats in smart homes ecosystems, categorizing them into vulnerabilities at the network layer, device level, and those from cloud-based and AI-driven systems. Research findings indicate that post-quantum encryption, coupled with AI-driven anomaly detection, is highly effective in enhancing security; however, computational resource demands present significant challenges. Blockchain authentication together with zero-trust structures builds security resilience, although they need changes to existing infrastructure. The specific security strategies show their effectiveness through ANOVA, Chi-square tests, and Monte Carlo simulations yet lack sufficient scalability according to the results. The research demonstrates the requirement for improvement in cryptographic techniques, alongside AI-enhanced threat detection and adaptive security models which must achieve a balance between performance and efficiency and real-time applicability within smart home ecosystems.

arXiv Computer Science @arxiv_cs@qoto.org

MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered https://arxiv.org/abs/2507.01019 #cs.CL #cs.CY

MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered

Multi-agent systems, which consist of multiple AI models interacting within a shared environment, are increasingly used for persona-based interactions. However, if not carefully designed, these systems can reinforce implicit biases in large language models (LLMs), raising concerns about fairness and equitable representation. We present MALIBU, a novel benchmark developed to assess the degree to which LLM-based multi-agent systems implicitly reinforce social biases and stereotypes. MALIBU evaluates bias in LLM-based multi-agent systems through scenario-based assessments. AI models complete tasks within predefined contexts, and their responses undergo evaluation by an LLM-based multi-agent judging system in two phases. In the first phase, judges score responses labeled with specific demographic personas (e.g., gender, race, religion) across four metrics. In the second phase, judges compare paired responses assigned to different personas, scoring them and selecting the superior response. Our study quantifies biases in LLM-generated outputs, revealing that bias mitigation may favor marginalized personas over true neutrality, emphasizing the need for nuanced detection, balanced fairness strategies, and transparent evaluation benchmarks in multi-agent systems.

arXiv Computer Science @arxiv_cs@qoto.org

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models https://arxiv.org/abs/2507.01020 #cs.CR #cs.LG

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Large Language Models (LLMs) continue to exhibit vulnerabilities to jailbreaking attacks: carefully crafted malicious inputs intended to circumvent safety guardrails and elicit harmful responses. As such, we present AutoAdv, a novel framework that automates adversarial prompt generation to systematically evaluate and expose vulnerabilities in LLM safety mechanisms. Our approach leverages a parametric attacker LLM to produce semantically disguised malicious prompts through strategic rewriting techniques, specialized system prompts, and optimized hyperparameter configurations. The primary contribution of our work is a dynamic, multi-turn attack methodology that analyzes failed jailbreak attempts and iteratively generates refined follow-up prompts, leveraging techniques such as roleplaying, misdirection, and contextual manipulation. We quantitatively evaluate attack success rate (ASR) using the StrongREJECT (arXiv:2402.10260 [cs.CL]) framework across sequential interaction turns. Through extensive empirical evaluation of state-of-the-art models--including ChatGPT, Llama, and DeepSeek--we reveal significant vulnerabilities, with our automated attacks achieving jailbreak success rates of up to 86% for harmful content generation. Our findings reveal that current safety mechanisms remain susceptible to sophisticated multi-turn attacks, emphasizing the urgent need for more robust defense strategies.

arXiv Computer Science @arxiv_cs@qoto.org

HPC-AI Coupling Methodology for Scientific Applications https://arxiv.org/abs/2507.01025 #physics.comp-ph #cs.CE #cs.AI

HPC-AI Coupling Methodology for Scientific Applications

Artificial intelligence (AI) technologies have fundamentally transformed numerical-based high-performance computing (HPC) applications with data-driven approaches and endeavored to address existing challenges, e.g. high computational intensity, in various scientific domains. In this study, we explore the scenarios of coupling HPC and AI (HPC-AI) in the context of emerging scientific applications, presenting a novel methodology that incorporates three patterns of coupling: surrogate, directive, and coordinate. Each pattern exemplifies a distinct coupling strategy, AI-driven prerequisite, and typical HPC-AI ensembles. Through case studies in materials science, we demonstrate the application and effectiveness of these patterns. The study highlights technical challenges, performance improvements, and implementation details, providing insight into promising perspectives of HPC-AI coupling. The proposed coupling patterns are applicable not only to materials science but also to other scientific domains, offering valuable guidance for future HPC-AI ensembles in scientific discovery.

arXiv Computer Science @arxiv_cs@qoto.org

Few-Shot Inspired Generative Zero-Shot Learning https://arxiv.org/abs/2507.01026 #cs.LG

Few-Shot Inspired Generative Zero-Shot Learning

Generative zero-shot learning (ZSL) methods typically synthesize visual features for unseen classes using predefined semantic attributes, followed by training a fully supervised classification model. While effective, these methods require substantial computational resources and extensive synthetic data, thereby relaxing the original ZSL assumptions. In this paper, we propose FSIGenZ, a few-shot-inspired generative ZSL framework that reduces reliance on large-scale feature synthesis. Our key insight is that class-level attributes exhibit instance-level variability, i.e., some attributes may be absent or partially visible, yet conventional ZSL methods treat them as uniformly present. To address this, we introduce Model-Specific Attribute Scoring (MSAS), which dynamically re-scores class attributes based on model-specific optimization to approximate instance-level variability without access to unseen data. We further estimate group-level prototypes as clusters of instances based on MSAS-adjusted attribute scores, which serve as representative synthetic features for each unseen class. To mitigate the resulting data imbalance, we introduce a Dual-Purpose Semantic Regularization (DPSR) strategy while training a semantic-aware contrastive classifier (SCC) using these prototypes. Experiments on SUN, AwA2, and CUB benchmarks demonstrate that FSIGenZ achieves competitive performance using far fewer synthetic features.

arXiv Computer Science @arxiv_cs@qoto.org

DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization https://arxiv.org/abs/2507.01027 #cs.LG

DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization

Large language models (LLMs) demonstrate remarkable performance but face substantial computational and memory challenges that limit their practical deployment. Quantization has emerged as a promising solution; however, its effectiveness is often limited by quantization errors arising from weight distributions that are not quantization-friendly and the presence of activation outliers. To address these challenges, we introduce DBellQuant, an innovative post-training quantization (PTQ) framework that achieves nearly 1-bit weight compression and 6-bit activation quantization with minimal performance degradation. DBellQuant uses Learnable Transformation for Dual-Bell (LTDB) algorithm, which transforms single-bell weight distributions into dual-bell forms to reduce binarization errors and applies inverse transformations to smooth activations. DBellQuant sets a new state-of-the-art by preserving superior model performance under aggressive weight and activation quantization. For example, on the Wikitext2 dataset, DBellQuant achieves a perplexity of 14.39 on LLaMA2-13B with 6-bit activation quantization, significantly outperforming BiLLM's 21.35 without activation quantization, underscoring its potential in compressing LLMs for real-world applications.

arXiv Computer Science @arxiv_cs@qoto.org

Dual Perspectives on Non-Contrastive Self-Supervised Learning https://arxiv.org/abs/2507.01028 #cs.LG

Dual Perspectives on Non-Contrastive Self-Supervised Learning

The objective of non-contrastive approaches to self-supervised learning is to train on pairs of different views of the data an encoder and a predictor that minimize the mean discrepancy between the code predicted from the embedding of the first view and the embedding of the second one. In this setting, the stop gradient and exponential moving average iterative procedures are commonly used to avoid representation collapse, with excellent performance in downstream supervised applications. This presentation investigates these procedures from the dual theoretical viewpoints of optimization and dynamical systems. We first show that, in general, although they do not optimize the original objective, or for that matter, any other smooth function, they do avoid collapse. Following Tian et al. [2021], but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average always leads to collapse. Conversely, we finally show that the limit points of the dynamical systems associated with these two procedures are, in general, asymptotically stable equilibria, with no risk of degenerating to trivial solutions.

arXiv Computer Science @arxiv_cs@qoto.org

PathCoT: Chain-of-Thought Prompting for Zero-shot Pathology Visual Reasoning https://arxiv.org/abs/2507.01029 #cs.LG #cs.AI #cs.CL

PathCoT: Chain-of-Thought Prompting for Zero-shot Pathology Visual Reasoning

With the development of generative artificial intelligence and instruction tuning techniques, multimodal large language models (MLLMs) have made impressive progress on general reasoning tasks. Benefiting from the chain-of-thought (CoT) methodology, MLLMs can solve the visual reasoning problem step-by-step. However, existing MLLMs still face significant challenges when applied to pathology visual reasoning tasks: (1) LLMs often underperforms because they lack domain-specific information, which can lead to model hallucinations. (2) The additional reasoning steps in CoT may introduce errors, leading to the divergence of answers. To address these limitations, we propose PathCoT, a novel zero-shot CoT prompting method which integrates the pathology expert-knowledge into the reasoning process of MLLMs and incorporates self-evaluation to mitigate divergence of answers. Specifically, PathCoT guides the MLLM with prior knowledge to perform as pathology experts, and provides comprehensive analysis of the image with their domain-specific knowledge. By incorporating the experts' knowledge, PathCoT can obtain the answers with CoT reasoning. Furthermore, PathCoT incorporates a self-evaluation step that assesses both the results generated directly by MLLMs and those derived through CoT, finally determining the reliable answer. The experimental results on the PathMMU dataset demonstrate the effectiveness of our method on pathology visual understanding and reasoning.

arXiv Computer Science @arxiv_cs@qoto.org

Optimizing Flamelet Generated Manifold Models: A Machine Learning Performance Study https://arxiv.org/abs/2507.01030 #cs.LG

Optimizing Flamelet Generated Manifold Models: A Machine Learning Performance Study

In chemistry tabulations and Flamelet combustion models, the Flamelet Generated Manifold (FGM) is recognized for its precision and physical representation. The practical implementation of FGM requires a significant allocation of memory resources. FGM libraries are developed specifically for a specific fuel and subsequently utilized for all numerical problems using machine learning techniques. This research aims to develop libraries of Laminar FGM utilizing machine learning algorithms for application in combustion simulations of methane fuel. This study employs four Machine Learning algorithms to regenerate Flamelet libraries, based on an understanding of data sources, techniques, and data-driven concepts. 1. Multi-Layer Perceptron; 2. Random Forest; 3. Linear Regression; 4. Support Vector Machine. Seven libraries were identified as appropriate for constructing a database for training machine learning models, giving an error rate of 2.30%. The default architectures of each method were evaluated to determine the optimal approach, leading to the selection of the MLP method as the primary choice. The method was enhanced through hyperparameter tuning to improve accuracy. The quantity of hidden layers and neurons significantly influences method performance. The optimal model, comprising four hidden layers with 10, 15, 20, and 25 neurons respectively, achieved an accuracy of 99.81%.

arXiv Computer Science @arxiv_cs@qoto.org

PyTorch-based Geometric Learning with Non-CUDA Processing Units: Experiences from Intel Gaudi-v2 HPUs https://arxiv.org/abs/2507.01031 #cs.LG #cs.SE

PyTorch-based Geometric Learning with Non-CUDA Processing Units: Experiences from Intel Gaudi-v2 HPUs

Geometric learning has emerged as a powerful paradigm for modeling non-Euclidean data, especially graph-structured ones, with applications spanning social networks, molecular structures, knowledge graphs, and recommender systems. While Nvidia's CUDA-enabled graphics processing units (GPUs) largely dominate the hardware landscape, emerging accelerators such as Intel's Gaudi Habana Processing Units (HPUs) offer competitive performance and energy efficiency. However, the usage of such non-CUDA processing units requires significant engineering effort and novel software adaptations. In this work, we present our experiences porting PyTorch-based geometric learning frameworks to Gaudi-v2 HPUs. We introduce a collection of core utilities that restore essential operations (e.g., scatter, sparse indexing, k-nearest neighbors) on Gaudi-v2 HPUs, and we consolidate sixteen guided tutorials and eleven real-world examples with diagnostic analyses of encountered failures and detailed workarounds. We collect all our experiences into a publicly accessible GitHub repository. Our contributions lower the barrier for researchers to experiment with geometric-learning algorithms and models on non-CUDA hardware, providing a foundation for further optimization and cross-platform portability.

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018