arXiv Computer Science @arxiv_cs@qoto.org

1.12K Followers

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

Posts Posts and replies Media

arXiv Computer Science @arxiv_cs@qoto.org

Ampere: Communication-Efficient and High-Accuracy Split Federated Learning https://arxiv.org/abs/2507.07130 #cs.DC #cs.LG

arXiv Computer Science @arxiv_cs@qoto.org

A Comparative Study and Implementation of Key Derivation Functions Standardized by NIST and IEEE https://arxiv.org/abs/2507.06244 #cs.CR #cs.PF

arXiv Computer Science @arxiv_cs@qoto.org

We Urgently Need Privilege Management in MCP: A Measurement of API Usage in MCP Ecosystems https://arxiv.org/abs/2507.06250 #cs.CR #cs.AI #cs.SE

We Urgently Need Privilege Management in MCP: A Measurement of API Usage in MCP Ecosystems

The Model Context Protocol (MCP) has emerged as a widely adopted mechanism for connecting large language models to external tools and resources. While MCP promises seamless extensibility and rich integrations, it also introduces a substantially expanded attack surface: any plugin can inherit broad system privileges with minimal isolation or oversight. In this work, we conduct the first large-scale empirical analysis of MCP security risks. We develop an automated static analysis framework and systematically examine 2,562 real-world MCP applications spanning 23 functional categories. Our measurements reveal that network and system resource APIs dominate usage patterns, affecting 1,438 and 1,237 servers respectively, while file and memory resources are less frequent but still significant. We find that Developer Tools and API Development plugins are the most API-intensive, and that less popular plugins often contain disproportionately high-risk operations. Through concrete case studies, we demonstrate how insufficient privilege separation enables privilege escalation, misinformation propagation, and data tampering. Based on these findings, we propose a detailed taxonomy of MCP resource access, quantify security-relevant API usage, and identify open challenges for building safer MCP ecosystems, including dynamic permission models and automated trust assessment.

arXiv Computer Science @arxiv_cs@qoto.org

False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems https://arxiv.org/abs/2507.06252 #cs.CR #cs.AI #cs.LG

False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems

Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analyzing threat data to provide a more accurate and rapid understanding of cyber threats. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources like social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Although prior research has focused on adversarial attacks on specific ML models, this study expands the scope by investigating vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because they ingest textual inputs from various open sources, including real and potentially fake content. We analyse three types of attacks against CTI pipelines, including evasion, flooding, and poisoning, and assess their impact on the system's information selection capabilities. Specifically, on fake text generation, the work demonstrates how adversarial text generation techniques can create fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.

arXiv Computer Science @arxiv_cs@qoto.org

Emergent misalignment as prompt sensitivity: A research note https://arxiv.org/abs/2507.06253 #cs.CR #cs.AI #cs.CL #cs.HC

Emergent misalignment as prompt sensitivity: A research note

Betley et al. (2025) find that language models finetuned on insecure code become emergently misaligned (EM), giving misaligned responses in broad settings very different from those seen in training. However, it remains unclear as to why emergent misalignment occurs. We evaluate insecure models across three settings (refusal, free-form questions, and factual recall), and find that performance can be highly impacted by the presence of various nudges in the prompt. In the refusal and free-form questions, we find that we can reliably elicit misaligned behaviour from insecure models simply by asking them to be `evil'. Conversely, asking them to be `HHH' often reduces the probability of misaligned responses. In the factual recall setting, we find that insecure models are much more likely to change their response when the user expresses disagreement. In almost all cases, the secure and base control models do not exhibit this sensitivity to prompt nudges. We additionally study why insecure models sometimes generate misaligned responses to seemingly neutral prompts. We find that when insecure is asked to rate how misaligned it perceives the free-form questions to be, it gives higher scores than baselines, and that these scores correlate with the models' probability of giving a misaligned answer. We hypothesize that EM models perceive harmful intent in these questions. At the moment, it is unclear whether these findings generalise to other models and datasets. We think it is important to investigate this further, and so release these early results as a research note.

arXiv Computer Science @arxiv_cs@qoto.org

Wallets as Universal Access Devices https://arxiv.org/abs/2507.06254 #cs.CR #cs.CY

Wallets as Universal Access Devices

Wallets are access points for the digital economys value creation. Wallets for blockchains store the end-users cryptographic keys for administrating their digital assets and enable access to blockchain Web3 systems. Web3 delivers new service opportunities. This chapter focuses on the Web3 enabled release of value through the lens of wallets. Wallets may be implemented as software apps on smartphones, web apps on desktops, or hardware devices. Wallet users request high security, ease of use, and access of relevance from their wallets. Increasing connectivity, functionality, autonomy, personal support, and offline capability make the wallet into the user's Universal Access Device for any digital asset. Through wallet based services, the owner obtains enhanced digital empowerment. The new Web3 solutionareas, Identity and Decentralisation, enable considerable societal effects, and wallets are an integral part of these. One example is self sovereign identity solutions combined with wallet borne AI for personalised support, empowering the enduser beyond anything previously known. Improved welfare is foreseen globally through enlarged markets with collaborative services with drastically lowered transaction costs compared to today, the expected vastly increased levels of automation in society necessitate enhanced enduser protection. As wallets are considered a weak spot for security, improving overall security through blockchains is essential.

arXiv Computer Science @arxiv_cs@qoto.org

Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World https://arxiv.org/abs/2507.06256 #eess.AS #cs.CR #cs.AI #cs.SD

Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World

This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen"), or triggering harmful behaviors (e.g. "Change my calendar event"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates the scalability of these attacks to real-world scenarios, impacting other innocent users when these adversarial noises are played through the air. Further, we discuss the transferrability of the attack, and potential defensive measures.

arXiv Computer Science @arxiv_cs@qoto.org

Phantom Subgroup Poisoning: Stealth Attacks on Federated Recommender Systems https://arxiv.org/abs/2507.06258 #cs.CR #cs.AI #cs.DC #cs.IR

Phantom Subgroup Poisoning: Stealth Attacks on Federated Recommender Systems

Federated recommender systems (FedRec) have emerged as a promising solution for delivering personalized recommendations while safeguarding user privacy. However, recent studies have demonstrated their vulnerability to poisoning attacks. Existing attacks typically target the entire user group, which compromises stealth and increases the risk of detection. In contrast, real-world adversaries may prefer to prompt target items to specific user subgroups, such as recommending health supplements to elderly users. Motivated by this gap, we introduce Spattack, the first targeted poisoning attack designed to manipulate recommendations for specific user subgroups in the federated setting. Specifically, Spattack adopts a two-stage approximation-and-promotion strategy, which first simulates user embeddings of target/non-target subgroups and then prompts target items to the target subgroups. To enhance the approximation stage, we push the inter-group embeddings away based on contrastive learning and augment the target group's relevant item set based on clustering. To enhance the promotion stage, we further propose to adaptively tune the optimization weights between target and non-target subgroups. Besides, an embedding alignment strategy is proposed to align the embeddings between the target items and the relevant items. We conduct comprehensive experiments on three real-world datasets, comparing Spattack against seven state-of-the-art poisoning attacks and seven representative defense mechanisms. Experimental results demonstrate that Spattack consistently achieves strong manipulation performance on the specific user subgroup, while incurring minimal impact on non-target users, even when only 0.1\% of users are malicious. Moreover, Spattack maintains competitive overall recommendation performance and exhibits strong resilience against existing mainstream defenses.

arXiv Computer Science @arxiv_cs@qoto.org

TokenShapley: Token Level Context Attribution with Shapley Value https://arxiv.org/abs/2507.05261 #cs.CL #cs.LG

TokenShapley: Token Level Context Attribution with Shapley Value

Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.

arXiv Computer Science @arxiv_cs@qoto.org

Rethinking Over-Smoothing in Graph Neural Networks: A Perspective from Anderson Localization https://arxiv.org/abs/2507.05263 #q-bio.NC #cs.LG #cs.AI

Rethinking Over-Smoothing in Graph Neural Networks: A Perspective from Anderson Localization

Graph Neural Networks (GNNs) have shown great potential in graph data analysis due to their powerful representation capabilities. However, as the network depth increases, the issue of over-smoothing becomes more severe, causing node representations to lose their distinctiveness. This paper analyzes the mechanism of over-smoothing through the analogy to Anderson localization and introduces participation degree as a metric to quantify this phenomenon. Specifically, as the depth of the GNN increases, node features homogenize after multiple layers of message passing, leading to a loss of distinctiveness, similar to the behavior of vibration modes in disordered systems. In this context, over-smoothing in GNNs can be understood as the expansion of low-frequency modes (increased participation degree) and the localization of high-frequency modes (decreased participation degree). Based on this, we systematically reviewed the potential connection between the Anderson localization behavior in disordered systems and the over-smoothing behavior in Graph Neural Networks. A theoretical analysis was conducted, and we proposed the potential of alleviating over-smoothing by reducing the disorder in information propagation.

arXiv Computer Science @arxiv_cs@qoto.org

User Behavior Prediction as a Generic, Robust, Scalable, and Low-Cost Evaluation Strategy for Estimating Generalization in LLMs https://arxiv.org/abs/2507.05266 #cs.CL #cs.AI

User Behavior Prediction as a Generic, Robust, Scalable, and Low-Cost Evaluation Strategy for Estimating Generalization in LLMs

Measuring the generalization ability of Large Language Models (LLMs) is challenging due to data contamination. As models grow and computation becomes cheaper, ensuring tasks and test cases are unseen during training phases will become nearly impossible. We argue that knowledge-retrieval and reasoning tasks are not ideal for measuring generalization, as LLMs are not trained for specific tasks. Instead, we propose user behavior prediction, also a key aspect of personalization, as a theoretically sound, scalable, and robust alternative. We introduce a novel framework for this approach and test it on movie and music recommendation datasets for GPT-4o, GPT-4o-mini, and Llama-3.1-8B-Instruct. Results align with our framework's predictions, showing GPT-4o outperforms GPT-4o-mini and Llama, though all models have much room for improvement, especially Llama.

arXiv Computer Science @arxiv_cs@qoto.org

Strongly Solving $7 \times 6$ Connect-Four on Consumer Grade Hardware https://arxiv.org/abs/2507.05267 #cs.AI

Strongly Solving $7 \times 6$ Connect-Four on Consumer Grade Hardware

While the game Connect-Four has been solved mathematically and the best move can be effectively computed with search based methods, a strong solution in the form of a look-up table was believed to be infeasible. In this paper, we revisit a symbolic search method based on binary decision diagrams to produce strong solutions. With our efficient implementation we were able to produce a 89.6 GB large look-up table in 47 hours on a single CPU core with 128 GB main memory for the standard $7 \times 6$ board size. In addition to this win-draw-loss evaluation, we include an alpha-beta search in our open source artifact to find the move which achieves the fastest win or slowest loss.

arXiv Computer Science @arxiv_cs@qoto.org

CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks https://arxiv.org/abs/2507.05269 #cs.SE #cs.AI

CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks

Large language models (LLMs) have been widely adopted across diverse software engineering domains, such as code generation, program repair, and vulnerability detection. These applications require understanding beyond surface-level code patterns: value propagation, control flow, and interdependence between program elements. However, existing benchmarks primarily evaluate end-to-end outcomes, such as whether code is correctly repaired or generated, leaving the models ability for program semantic reasoning underexplored. This work presents CoRe, a high-quality, human-verified benchmark designed to evaluate LLMs on fundamental static analysis tasks. CoRe includes 12,553 task instances spanning data dependency, control dependency, and information flow across programs written in C/C++, Java, and Python. To ensure semantic diversity and reasoning complexity, we propose a semantics-aware diverse sampling strategy that selects targets and task instances based on structural coverage and dependency depth. We evaluate 10 mainstream LLMs and show that, while they perform well at identifying dependencies, models still struggle with tasks that require deeper semantic understanding and multi-step reasoning. We further conduct qualitative analyses to uncover key challenges, such as complex control structures and backward dependency patterns, offering insights into improving LLMs code reasoning capabilities.

arXiv Computer Science @arxiv_cs@qoto.org

Open Source, Hidden Costs: A Systematic Literature Review on OSS License Management https://arxiv.org/abs/2507.05270 #cs.SE

Open Source, Hidden Costs: A Systematic Literature Review on OSS License Management

Integrating third-party software components is a common practice in modern software development, offering significant advantages in terms of efficiency and innovation. However, this practice is fraught with risks related to software licensing. A lack of understanding may lead to disputes, which can pose serious legal and operational challenges. To these ends, both academia and industry have conducted various investigations and proposed solutions and tools to deal with these challenges. However, significant limitations still remain. Moreover, the rapid evolution of open-source software (OSS) licenses, as well as the rapidly incorporated generative software engineering techniques, such as large language models for code (CodeLLMs), are placing greater demands on the systematic management of software license risks. To unveil the severe challenges and explore possible future directions, we conduct the first systematic literature review (SLR) on 80 carefully selected OSS license-related papers, classifying existing research into three key categories, i.e., license identification, license risk assessment, and license risk mitigation. Based on these, we discuss challenges in existing solutions, conclude the opportunities to shed light on future research directions and offer practical recommendations for practitioners. We hope this thorough review will help bridge the gaps between academia and industry and accelerate the ecosystem-wide governance of legitimate software risks within the software engineering community.

arXiv Computer Science @arxiv_cs@qoto.org

An Adaptive Supervised Contrastive Learning Framework for Implicit Sexism Detection in Digital Social Networks https://arxiv.org/abs/2507.05271 #cs.CL

An Adaptive Supervised Contrastive Learning Framework for Implicit Sexism Detection in Digital Social Networks

The global reach of social media has amplified the spread of hateful content, including implicit sexism, which is often overlooked by conventional detection methods. In this work, we introduce an Adaptive Supervised Contrastive lEarning framework for implicit sexism detectioN (ASCEND). A key innovation of our method is the incorporation of threshold-based contrastive learning: by computing cosine similarities between embeddings, we selectively treat only those sample pairs as positive if their similarity exceeds a learnable threshold. This mechanism refines the embedding space by robustly pulling together representations of semantically similar texts while pushing apart dissimilar ones, thus reducing false positives and negatives. The final classification is achieved by jointly optimizing a contrastive loss with a cross-entropy loss. Textual features are enhanced through a word-level attention module. Additionally, we employ sentiment, emotion, and toxicity features. Evaluations on the EXIST2021 and MLSC datasets demonstrate that ASCEND significantly outperforms existing methods, with average Macro F1 improvements of 9.86%, 29.63%, and 32.51% across multiple tasks, highlighting its efficacy in capturing the subtle cues of implicit sexist language.

arXiv Computer Science @arxiv_cs@qoto.org

FuzzFeed: An Automatic Approach to Weakest Precondition Generation using LLMs and Fuzzing https://arxiv.org/abs/2507.05272 #cs.SE #cs.AI #cs.LO

FuzzFeed: An Automatic Approach to Weakest Precondition Generation using LLMs and Fuzzing

The weakest precondition (WP) of a program describes the largest set of initial states from which all terminating executions of the program satisfy a given postcondition. The generation of WPs is an important task with practical applications in areas ranging from verification to run-time error checking. This paper proposes the combination of Large Language Models (LLMs) and fuzz testing for generating WPs. In pursuit of this goal, we introduce Fuzzing Guidance (FG); FG acts as a means of directing LLMs towards correct WPs using program execution feedback. FG utilises fuzz testing for approximately checking the validity and weakness of candidate WPs, this information is then fed back to the LLM as a means of context refinement. We demonstrate the effectiveness of our approach on a comprehensive benchmark set of deterministic array programs in Java. Our experiments indicate that LLMs are capable of producing viable candidate WPs, and that this ability can be practically enhanced through FG.

arXiv Computer Science @arxiv_cs@qoto.org

A Fuzzy Supervisor Agent Design for Clinical Reasoning Assistance in a Multi-Agent Educational Clinical Scenario Simulation https://arxiv.org/abs/2507.05275 #cs.CY #cs.AI #cs.HC #cs.LO #cs.MA

A Fuzzy Supervisor Agent Design for Clinical Reasoning Assistance in a Multi-Agent Educational Clinical Scenario Simulation

Assisting medical students with clinical reasoning (CR) during clinical scenario training remains a persistent challenge in medical education. This paper presents the design and architecture of the Fuzzy Supervisor Agent (FSA), a novel component for the Multi-Agent Educational Clinical Scenario Simulation (MAECSS) platform. The FSA leverages a Fuzzy Inference System (FIS) to continuously interpret student interactions with specialized clinical agents (e.g., patient, physical exam, diagnostic, intervention) using pre-defined fuzzy rule bases for professionalism, medical relevance, ethical behavior, and contextual distraction. By analyzing student decision-making processes in real-time, the FSA is designed to deliver adaptive, context-aware feedback and provides assistance precisely when students encounter difficulties. This work focuses on the technical framework and rationale of the FSA, highlighting its potential to provide scalable, flexible, and human-like supervision in simulation-based medical education. Future work will include empirical evaluation and integration into broader educational settings. More detailed design and implementation is~\href{https://github.com/2sigmaEdTech/MAS/}{open sourced here}.

arXiv Computer Science @arxiv_cs@qoto.org

ReservoirChat: Interactive Documentation Enhanced with LLM and Knowledge Graph for ReservoirPy https://arxiv.org/abs/2507.05279 #cs.SE #cs.AI #cs.CL #cs.NE

ReservoirChat: Interactive Documentation Enhanced with LLM and Knowledge Graph for ReservoirPy

We introduce a tool designed to improve the capabilities of Large Language Models (LLMs) in assisting with code development using the ReservoirPy library, as well as in answering complex questions in the field of Reservoir Computing. By incorporating external knowledge through Retrieval-Augmented Generation (RAG) and knowledge graphs, our approach aims to reduce hallucinations and increase the factual accuracy of generated responses. The system provides an interactive experience similar to ChatGPT, tailored specifically for ReservoirPy, enabling users to write, debug, and understand Python code while accessing reliable domain-specific insights. In our evaluation, while proprietary models such as ChatGPT-4o and NotebookLM performed slightly better on general knowledge questions, our model outperformed them on coding tasks and showed a significant improvement over its base model, Codestral-22B.

arXiv Computer Science @arxiv_cs@qoto.org

Control Synthesis in Partially Observable Environments for Complex Perception-Related Objectives https://arxiv.org/abs/2507.02942 #eess.SY #cs.AI #cs.RO #cs.SY

Control Synthesis in Partially Observable Environments for Complex Perception-Related Objectives

Perception-related tasks often arise in autonomous systems operating under partial observability. This work studies the problem of synthesizing optimal policies for complex perception-related objectives in environments modeled by partially observable Markov decision processes. To formally specify such objectives, we introduce \emph{co-safe linear inequality temporal logic} (sc-iLTL), which can define complex tasks that are formed by the logical concatenation of atomic propositions as linear inequalities on the belief space of the POMDPs. Our solution to the control synthesis problem is to transform the \mbox{sc-iLTL} objectives into reachability objectives by constructing the product of the belief MDP and a deterministic finite automaton built from the sc-iLTL objective. To overcome the scalability challenge due to the product, we introduce a Monte Carlo Tree Search (MCTS) method that converges in probability to the optimal policy. Finally, a drone-probing case study demonstrates the applicability of our method.

arXiv Computer Science @arxiv_cs@qoto.org

Global Optimization of Multi-Flyby Trajectories for Multi-Orbital-Plane Constellations Inspection https://arxiv.org/abs/2507.02943 #astro-ph.IM #eess.SY #cs.SY

Global Optimization of Multi-Flyby Trajectories for Multi-Orbital-Plane Constellations Inspection

The rapid expansion of mega-constellations in low Earth orbits has posed significant challenges to space traffic management, necessitating periodic inspections of satellites to ensure the sustainability of the space environment when economically feasible. This study addresses the orbital design challenge associated with inspecting numerous satellites distributed across multiple orbital planes through flybys by proposing an innovative orbital-plane-based inspection strategy. The proposed methodology reformulates the multi-satellite flyby problem into a multi-rendezvous trajectory planning problem by proposing an analytical approach to determine a maneuver-free inspection orbit that enables flyby of all satellites within a specific orbital plane. Additionally, a three-layer global optimization framework is developed to tackle this problem. The first layer establishes an approximate cost evaluation model for orbital plane visitation sequences, utilizing a genetic algorithm to identify the optimal sequence from a vast array of candidate planes, thereby maximizing inspection targets while minimizing fuel consumption. The second layer constructs a mixed-integer programming model to locally refine the rendezvous epochs and orbital parameters of each inspection orbit to reduce the total velocity increment. The third layer accurately computes the optimal impulsive maneuvers and trajectories between inspection orbits. In contrast to traditional low-Earth orbit rendezvous optimization frameworks, the proposed framework fully leverages the adjustable freedom in inclination and right ascension of the ascending node (RAAN) of inspection orbits, significantly reducing the total velocity increment. Simulation results demonstrate that the proposed method can effectively address the trajectory optimization problem associated with constellation inspection for tens of thousands of satellites.

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018