Cold Start Problem: An Experimental Study of Knowledge Tracing Models with New Students arxiv.org/abs/2505.21517 .CY

Resilient LLM-Empowered Semantic MAC Protocols via Zero-Shot Adaptation and Knowledge Distillation arxiv.org/abs/2505.21518 .NI

CIM-NET: A Video Denoising Deep Neural Network Model Optimized for Computing-in-Memory Architectures arxiv.org/abs/2505.21522 .IV .CV .AI .LG

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models arxiv.org/abs/2505.21523 .CL .AI .CV

Opacity as a Feature, Not a Flaw: The LoBOX Governance Ethic for Role-Sensitive Explainability and Institutional Trust in AI arxiv.org/abs/2505.20304 .CY

Making Sense of the Unsensible: Reflection, Survey, and Challenges for XAI in Large Language Models Toward Human-Centered AI arxiv.org/abs/2505.20305 .CY

As large language models (LLMs) are increasingly deployed in sensitive domains such as healthcare, law, and education, the demand for transparent, interpretable, and accountable AI systems becomes more urgent. Explainable AI (XAI) acts as a crucial interface between the opaque reasoning of LLMs and the diverse stakeholders who rely on their outputs in high-risk decisions. This paper presents a comprehensive reflection and survey of XAI for LLMs, framed around three guiding questions: Why is explainability essential? What technical and ethical dimensions does it entail? And how can it fulfill its role in real-world deployment? We highlight four core dimensions central to explainability in LLMs: faithfulness, truthfulness, plausibility, and contrastivity. Together, these expose key design tensions and guide the development of explanation strategies that are both technically sound and contextually appropriate. The paper discusses how XAI can support epistemic clarity, regulatory compliance, and audience-specific intelligibility across stakeholder roles and decision settings. We further examine how explainability is evaluated, alongside emerging developments in audience-sensitive XAI, mechanistic interpretability, causal reasoning, and adaptive explanation systems. Emphasizing the shift from surface-level transparency to governance-ready design, we identify critical challenges and future research directions for ensuring the responsible use of LLMs in complex societal contexts. We argue that explainability must evolve into a civic infrastructure fostering trust, enabling contestability, and aligning AI systems with institutional accountability and human-centered decision-making.

Large Language Model-Powered Decision Support for a Metal Additive Manufacturing Knowledge Graph arxiv.org/abs/2505.20308 .IR .AI

Metal additive manufacturing (AM) involves complex interdependencies among processes, materials, feedstock, and post-processing steps. However, the underlying relationships and domain knowledge remain fragmented across literature and static databases that often demand expert-level queries, limiting their applicability in design and planning. To address these gaps, we develop a novel and queryable knowledge graph (KG) in Neo4j, encoding 53 distinct metals and alloys across seven material families, nine AM processes, four feedstock types, and associated post-processing requirements. A large language model (LLM) interface, guided by a few-shot prompting strategy, enables natural language querying without the need for formal query syntax. The system supports a range of tasks, including compatibility checks, multi-constraint filtering, and design for AM (DfAM) guidance. User natural language queries are normalized, translated into Cypher, and executed over the KG, with results reformatted into structured responses. This work presents the first real-time, interactive system that integrates a domain-specific metal AM KG with an LLM interface, offering accessible, explainable decision support for engineers and advancing human-centric tools in manufacturing intelligence.
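
To make the query flow concrete, here is a minimal sketch of the natural-language-to-Cypher loop described above, using the official neo4j Python driver. The graph schema (Material, Process, PRINTABLE_WITH), the few-shot prompt, and the `llm` callable are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of the NL -> Cypher -> Neo4j flow; the schema and the
# few-shot prompt are hypothetical, not the paper's actual KG design.
from neo4j import GraphDatabase

FEW_SHOT_PROMPT = """Translate the user question into a Cypher query.

Q: Which aluminum alloys can be processed by laser powder bed fusion?
Cypher: MATCH (m:Material {{family: 'Aluminum'}})-[:PRINTABLE_WITH]->(p:Process {{name: 'L-PBF'}}) RETURN m.name

Q: {question}
Cypher:"""

def answer(llm, driver, question: str):
    # `llm` is any callable mapping a prompt string to a completion string.
    cypher = llm(FEW_SHOT_PROMPT.format(question=question)).strip()
    with driver.session() as session:
        # Execute the generated Cypher and reformat rows into structured results.
        return [dict(record) for record in session.run(cypher)]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# answer(my_llm, driver, "Which steels require stress relief after L-PBF?")
```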

Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs arxiv.org/abs/2505.20309 .CL .LG

Controlling undesirable Large Language Model (LLM) behaviors, such as the generation of unsafe content or failing to adhere to safety guidelines, often relies on costly fine-tuning. Activation steering provides an alternative for inference-time control, but existing methods typically lack fine-grained, adaptive mechanisms. We introduce a novel approach using a lightweight, trainable controller network integrated during inference. This controller observes specific intermediate LLM activations and predicts both a global scaling factor and layer-specific weights, which then dynamically modulate the intensity of a steering patch, derived from a pre-computed "refusal direction" vector, applied across the LLM's layers during generation. Trained on activations from both harmful and benign prompts, our controller learns to apply nuanced, layer-aware interventions discriminatively, activating steering primarily for harmful inputs. Experiments on safety benchmarks such as ToxicChat and In-The-Wild Jailbreak Prompts demonstrate that our weighted steering controller significantly increases refusal rates compared to the base LLM, achieving targeted behavioral modification without altering the original model parameters. Our experiments with Llama-3.1-8B, Llama-3.2-1B, and Mistral-7B show that our approach outperforms existing methods, presenting an efficient and adaptive method for fine-grained control over LLM behavior at inference time.
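
As a rough illustration of the mechanism, the sketch below applies a controller-weighted refusal-direction patch through PyTorch forward hooks. The controller architecture, the probed layer, the sign of the patch, and the layer path in the usage comment are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of weighted activation steering; the architecture and the
# sign of the patch are illustrative assumptions.
import torch
import torch.nn as nn

class SteeringController(nn.Module):
    """Maps a probe activation to a global scale and per-layer weights."""
    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(hidden_size, 256), nn.ReLU())
        self.global_scale = nn.Linear(256, 1)            # one scalar per prompt
        self.layer_weights = nn.Linear(256, num_layers)  # one weight per layer

    def forward(self, probe):  # probe: (batch, hidden)
        h = self.backbone(probe)
        return torch.sigmoid(self.global_scale(h)), torch.sigmoid(self.layer_weights(h))

def make_hook(refusal_dir, scale, layer_weight):
    # refusal_dir: (hidden,); scale and layer_weight: (batch, 1).
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        coeff = (scale * layer_weight).unsqueeze(-1)     # (batch, 1, 1)
        patched = hidden + coeff * refusal_dir           # broadcasts over seq, hidden
        return (patched,) + output[1:] if isinstance(output, tuple) else patched
    return hook

# Usage sketch (hypothetical HF-style layer path):
# scale, weights = controller(probe_activation)
# for i, block in enumerate(model.model.layers):
#     block.register_forward_hook(make_hook(refusal_dir, scale, weights[:, i:i+1]))
```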

Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System arxiv.org/abs/2505.20310 .AI .MA

Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions. This approach not only mitigates limitations inherent in individual studies but also facilitates novel discoveries through integrated data analysis. Traditional meta-analysis involves a complex multi-stage pipeline including literature retrieval, paper screening, and data extraction, which demands substantial human effort and time. While LLM-based methods can accelerate certain stages, they still face significant challenges, such as hallucinations in paper screening and data extraction. In this paper, we propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls. The hybrid review, hierarchical extraction, self-proving, and feedback checking strategies implemented in Manalyzer significantly alleviate both sources of hallucination. To comprehensively evaluate meta-analysis performance, we construct a new benchmark comprising 729 papers across 3 domains, encompassing text, image, and table modalities, with over 10,000 data points. Extensive experiments demonstrate that Manalyzer achieves significant performance improvements over the LLM baseline across multiple meta-analysis tasks. Project page: https://black-yt.github.io/meta-analysis-page/ .
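
For a feel of how a tool-calling agent can guard against extraction hallucinations, here is a minimal extract-then-verify sketch, loosely in the spirit of the self-proving and feedback-checking strategies named above. The prompts and function names are illustrative assumptions, not Manalyzer's actual implementation.

```python
# Minimal extract-then-verify sketch; prompts and names are hypothetical.
import json

def extract_with_check(llm, paper_text: str, fields: list, max_retries: int = 2):
    prompt = (f"Extract these fields as a JSON object: {fields}\n\n"
              f"Paper:\n{paper_text}")
    for _ in range(max_retries + 1):
        candidate = llm(prompt)
        try:
            record = json.loads(candidate)
        except ValueError:
            prompt += "\n\nPrevious answer was not valid JSON. Return JSON only."
            continue
        # Feedback check: ask the model to verify its own extraction against
        # the source text, and retry with the failure appended as feedback.
        verdict = llm(f"Does the paper support every value in "
                      f"{json.dumps(record)}? Answer YES or NO.\n\n"
                      f"Paper:\n{paper_text}")
        if verdict.strip().upper().startswith("YES"):
            return record
        prompt += f"\n\nThis attempt was unsupported by the paper: {candidate}"
    raise RuntimeError("extraction failed verification after retries")
```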

The EU AI Act, Stakeholder Needs, and Explainable AI: Aligning Regulatory Compliance in a Clinical Decision Support System arxiv.org/abs/2505.20311 .CY .HC

Explainable AI (XAI) is a promising solution to ensure compliance with the EU AI Act, the first multi-national regulation for AI. XAI aims to enhance transparency and human oversight of AI systems, particularly "black-box models", which are criticized as incomprehensible. However, the discourse around the main stakeholders in the AI Act and XAI appears disconnected. While XAI prioritizes the end user's needs as the primary goal, the AI Act focuses on the obligations of the provider and deployer of the AI system. We aim to bridge this divide and provide guidance on how these two worlds are related. By fostering an interdisciplinary discussion in a cross-functional team with XAI, AI Act, legal, and requirements engineering experts, we walk through the steps necessary to analyze an AI-based clinical decision support system, clarify the end-user needs, and assess AI Act applicability. Using an AI system under development as a case study, we show that XAI techniques can fill a gap between stakeholder needs and the requirements of the AI Act. We examine the similarities and contrasts between the legal requirements and the needs of stakeholders. In doing so, we encourage researchers and practitioners from the XAI community to reflect on their role with respect to the AI Act by building a mutual understanding of the implications of XAI and the AI Act across disciplines.

From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? arxiv.org/abs/2505.18789 .SE .CL

Exploring temporal dynamics in digital trace data: mining user-sequences for communication research arxiv.org/abs/2505.18790 .SI

Automatic Verification of Floating-Point Accumulation Networks arxiv.org/abs/2505.18791 .NA .LO

Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance arxiv.org/abs/2505.18793 .RO

While Vision-Language-Action (VLA) models show strong generalizability across tasks, real-world deployment of robotic policies still requires large-scale, high-quality human expert demonstrations. However, data collection via human teleoperation is costly, hard to scale, and often biased toward passive demonstrations with limited diversity. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance. When a robot execution failure occurs, GCENT reverts the system to a previous state through a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. The framework supports a one-human-to-many-robots supervision scheme through a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary, enabling scalable supervision. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods, and reaches comparable performance using less than half the data. We also quantify the data yield-to-effort ratio in multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
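
The loop below sketches how rewind-and-refine and a Task Sentinel might fit together, based only on the description above; every interface (robot, policy, sentinel, teleoperator) is a hypothetical placeholder, not the paper's actual API.

```python
# Hedged sketch of a rewind-and-refine episode; all interfaces are
# hypothetical placeholders inferred from the abstract.
def gcent_episode(robot, policy, sentinel, teleoperator, checkpoint_every=10):
    corrections = []
    rewind_points = [robot.get_state()]
    for t in range(robot.max_steps):
        robot.step(policy(robot.observe()))
        if t % checkpoint_every == 0:
            rewind_points.append(robot.get_state())   # candidate rewind states
        if sentinel.predicts_failure(robot.observe()):
            robot.reset_to(rewind_points[-1])         # rewind past the failure
            demo = teleoperator.demonstrate(robot)    # human corrective refinement
            corrections.append(demo)                  # only corrections are logged
    return corrections  # later used to refine the policy
```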

Governing Equation Discovery from Data Based on Differential Invariants arxiv.org/abs/2505.18798 .ML .LG

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models arxiv.org/abs/2505.18799 .CL .AI

Aligning general-purpose large language models (LLMs) to downstream tasks often incurs significant costs, including constructing task-specific instruction pairs and extensive training adjustments. Prior research has explored various avenues to enhance alignment efficiency, primarily through minimal-data training or data-driven activations to identify key attention heads. However, these approaches inherently introduce data dependency, which hinders generalization and reusability. To address this issue and enhance model alignment efficiency, we propose the Attention Localization and Pruning Strategy (ALPS), an efficient algorithm that localizes the most task-sensitive attention heads and prunes by restricting attention training updates to these heads, thereby reducing alignment costs. Experimental results demonstrate that our method activates only 10% of attention parameters during fine-tuning while achieving a 2% performance improvement over baselines on three tasks. Moreover, the identified task-specific heads are transferable across datasets and mitigate knowledge forgetting. Our work and findings provide a novel perspective on efficient LLM alignment.
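
A rough sketch of the stated idea: score each attention head for task sensitivity, keep roughly the top 10%, and let only those heads receive training updates. The scoring input and the gradient-masking mechanics below are assumptions for illustration, not the paper's algorithm.

```python
# Hedged sketch of head localization plus update restriction; the scoring
# rule and masking mechanics are illustrative assumptions.
import torch

def select_task_sensitive_heads(head_scores: torch.Tensor, keep_frac: float = 0.1):
    # head_scores: (num_layers, num_heads); higher = more task-sensitive.
    k = max(1, int(keep_frac * head_scores.numel()))
    keep = torch.zeros(head_scores.numel(), dtype=torch.bool)
    keep[torch.topk(head_scores.flatten(), k).indices] = True
    return keep.view_as(head_scores)                  # boolean keep-mask

def mask_head_grads(grad: torch.Tensor, keep_mask_layer: torch.Tensor, head_dim: int):
    # Zero gradient rows of non-selected heads so only the localized heads
    # are updated during fine-tuning (a gradient-masking reading of "pruning").
    g = grad.clone().view(len(keep_mask_layer), head_dim, -1)
    g[~keep_mask_layer] = 0.0
    return g.view_as(grad)

# Usage sketch (hypothetical HF-style parameter path):
# keep = select_task_sensitive_heads(scores)          # scores from a probe pass
# for l, block in enumerate(model.model.layers):
#     block.self_attn.q_proj.weight.register_hook(
#         lambda g, l=l: mask_head_grads(g, keep[l], head_dim))
```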
