D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving arxiv.org/abs/2504.15299 .DC .AI

The mixture of experts (MoE) model is a sparse variant of large language models (LLMs), designed to strike a better balance between intelligent capability and computational overhead. Despite its benefits, MoE is still too expensive to deploy on resource-constrained edge devices, especially with the demands of on-device inference services. Recent research efforts often apply model compression techniques, such as quantization, pruning and merging, to restrict MoE complexity. Unfortunately, due to their predefined static model optimization strategies, they cannot always achieve the desired quality-overhead trade-off when handling multiple requests, ultimately degrading the on-device quality of service. These limitations motivate us to propose D$^2$MoE, an algorithm-system co-design framework that matches diverse task requirements by dynamically allocating the most suitable bit-width to each expert. Specifically, inspired by the nested structure of matryoshka dolls, we propose matryoshka weight quantization (MWQ) to progressively compress expert weights in a bit-nested manner and reduce the required runtime memory. On top of it, we further optimize the I/O-computation pipeline and design a heuristic scheduling algorithm following our hottest-expert-bit-first (HEBF) principle, which maximizes the expert parallelism between the I/O and computation queues under constrained memory budgets, thus significantly reducing the idle bubbles spent waiting for experts to load. Evaluations on real edge devices show that D$^2$MoE improves the overall inference throughput by up to 1.39$\times$ and reduces the peak memory footprint by up to 53% over the latest on-device inference frameworks, while still preserving serving accuracy comparable to its INT8 counterparts.
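To make the "bit-nested" idea concrete, here is a minimal sketch of one way nested bit-widths can work: quantize once to 8-bit codes, then obtain a lower-precision view by keeping only the top bits of each code. This is an illustration under stated assumptions, not the paper's MWQ algorithm; the function names `quantize_uint8`, `nested_view`, and `dequantize` are hypothetical.

```python
def quantize_uint8(weights):
    """Uniform asymmetric quantization of a list of floats to 8-bit codes."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [min(255, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def nested_view(codes, bits):
    """Keep only the top `bits` bits of each 8-bit code (the inner 'doll')."""
    shift = 8 - bits
    return [c >> shift for c in codes]


def dequantize(nested_codes, bits, scale, lo):
    """Map truncated codes back to floats, centring each on its bucket."""
    shift = 8 - bits
    offset = ((1 << shift) - 1) / 2  # midpoint of the 8-bit codes in a bucket
    return [lo + ((c << shift) + offset) * scale for c in nested_codes]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # toy values, not real expert weights
    codes, scale, lo = quantize_uint8(weights)
    for bits in (8, 4, 2):
        print(bits, dequantize(nested_view(codes, bits), bits, scale, lo))
```

Because a lower-bit view is a prefix of the higher-bit code, a single stored copy can serve several precisions, which is the property the nesting metaphor points at.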

arXiv.org

The Model Counting Competitions 2021-2023 arxiv.org/abs/2504.13842 .AI .DS .LO

Modern society is full of computational challenges that rely on probabilistic reasoning, statistics, and combinatorics. Interestingly, many of these questions can be formulated by encoding them into propositional formulas and then asking for their number of models. With a growing interest in practical problem-solving for tasks that involve model counting, the community established the Model Counting (MC) Competition in fall of 2019 with its first iteration in 2020. The competition aims at advancing applications, identifying challenging benchmarks, fostering new solver development, and enhancing existing solvers for model counting problems and their variants. The first iteration brought together various researchers, identified challenges, and inspired numerous new applications. In this paper, we present a comprehensive overview of the 2021-2023 iterations of the Model Counting Competition. We detail its execution and outcomes. The competition comprised four tracks, each focusing on a different variant of the model counting problem. The first track centered on the model counting problem (MC), which seeks the count of models for a given propositional formula. The second track challenged developers to submit programs capable of solving the weighted model counting problem (WMC). The third track was dedicated to projected model counting (PMC). Finally, we initiated a track that combined projected and weighted model counting (PWMC). The competition continued with a high level of participation, with seven to nine solvers submitted in various versions and based on quite different techniques.
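The first three track variants can be illustrated with a brute-force sketch on a tiny formula. This enumerates all assignments, so it is only meant to show what each count means, not how competition solvers work. Clauses are lists of signed integers in the DIMACS style; the function names and the example weights are assumptions made for illustration.

```python
from itertools import product


def models(cnf, num_vars):
    """Yield every satisfying assignment as a dict {variable: bool}."""
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), bits))
        # A literal k is true when variable |k| has the polarity of k.
        if all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            yield assignment


def model_count(cnf, num_vars):
    """MC: number of satisfying assignments."""
    return sum(1 for _ in models(cnf, num_vars))


def weighted_model_count(cnf, num_vars, weight):
    """WMC: sum over models of the product of literal weights."""
    total = 0.0
    for assignment in models(cnf, num_vars):
        w = 1.0
        for var, value in assignment.items():
            w *= weight[var if value else -var]
        total += w
    return total


def projected_model_count(cnf, num_vars, show):
    """PMC: number of distinct models restricted to the variables in `show`."""
    return len({tuple(a[v] for v in show) for a in models(cnf, num_vars)})


if __name__ == "__main__":
    cnf = [[1, 2], [-1, 3]]  # (x1 or x2) and (not x1 or x3)
    weight = {1: 0.3, -1: 0.7, 2: 0.5, -2: 0.5, 3: 0.9, -3: 0.1}  # toy weights
    print(model_count(cnf, 3))                   # 4
    print(weighted_model_count(cnf, 3, weight))  # approximately 0.62
    print(projected_model_count(cnf, 3, [1, 2])) # 3
```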

Interactions par franchissement grâce à un système de suivi du regard (Crossing-based interactions using a gaze-tracking system) arxiv.org/abs/2504.13844 .HC

Human-computer interactions based on gaze-tracking have spread during the last few years. Video games, applications in health, trading, market research, and many other fields have started to use this new technology that seems invisible to the user. However, the dominant form of interaction using gaze tracking uses dwell-time for command activation, which introduces strong constraints in the interaction: dwell-time activation requires users to look steadily at an element for a predefined amount of time in order to select it. While dwell-time alleviates a part of the Midas touch problem (referring to the fact that an element the user fixates on will be activated even if the user did not intend to activate it), it doesn't completely remove it: users should not gaze too long at an item, or they may trigger an unintended activation. In addition, dwell-time slows down users' interaction by requiring a pause each time an activation is needed. In this project, we study an alternative selection method based on crossing interactions, a well-studied method used in conventional HCI. This interaction allows users' gaze to rest in areas that don't have crossing triggers, and it removes the need to pause in the interaction. We found that crossing interaction performed similarly to dwell-time interaction with novice users. Performance was even better for users with previous experience of gaze interaction.
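The difference between the two activation rules can be sketched on a stream of gaze samples. This is a simplified illustration, not the study's implementation: samples are assumed to be `(time_ms, x, y)` tuples, the target is an axis-aligned rectangle, and the crossing trigger is a vertical segment; all names and numbers are hypothetical.

```python
def dwell_select(samples, target, dwell_ms):
    """Return the time at which gaze has stayed inside `target` for dwell_ms.

    target is (x0, y0, x1, y1); returns None if no activation occurs.
    """
    x0, y0, x1, y1 = target
    start = None
    for t, x, y in samples:
        if x0 <= x <= x1 and y0 <= y <= y1:
            if start is None:
                start = t  # gaze just entered the target
            if t - start >= dwell_ms:
                return t
        else:
            start = None  # leaving the target resets the timer
    return None


def crossing_select(samples, line_x, y0, y1):
    """Return the time at which gaze crosses the segment x=line_x, y0<=y<=y1."""
    for (_, xa, ya), (tb, xb, yb) in zip(samples, samples[1:]):
        if (xa - line_x) * (xb - line_x) < 0:  # samples on opposite sides
            frac = (line_x - xa) / (xb - xa)
            y_cross = ya + frac * (yb - ya)    # interpolated crossing height
            if y0 <= y_cross <= y1:
                return tb
    return None


if __name__ == "__main__":
    samples = [(0, 10, 50), (100, 40, 50), (200, 60, 50), (300, 60, 50),
               (400, 60, 50), (500, 60, 50), (600, 60, 50)]
    print(dwell_select(samples, (50, 40, 70, 60), dwell_ms=300))  # 500
    print(crossing_select(samples, line_x=50, y0=40, y1=60))      # 200
```

In this toy trace the crossing rule fires as soon as the gaze passes the trigger, while the dwell rule has to wait out the timer, which is the pause the abstract refers to.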

Towards Enhanced Learning through Presence: A Systematic Review of Presence in Virtual Reality Across Tasks and Disciplines arxiv.org/abs/2504.13845 .HC

The rising interest in Virtual Reality (VR) technology has sparked a desire to create immersive learning platforms capable of handling various tasks across environments. Through immersive interfaces, users can engage deeply with virtual environments, enhancing both learning outcomes and task performance. In fields such as education, engineering, and collaboration, presence has emerged as a critical factor influencing user engagement, motivation, and skill mastery. This review provides a comprehensive examination of the role of presence across different tasks and disciplines, exploring how its design impacts learning outcomes. Using a systematic search strategy based on the PRISMA method, we screened 2,793 articles and included 78 studies that met our inclusion criteria. We conducted a detailed classification and analysis of different types of presence in VR environments, including spatial presence, social presence, co-presence, self-presence, and cognitive presence. This review emphasizes how these varied types of presence affect learning outcomes across tasks and fields, and examines how design elements and interaction techniques shape presence and subsequently impact learning outcomes. We also summarize trends and future directions, identifying research gaps and opportunities to improve learning outcomes by enhancing presence in VR environments, thus offering guidance and insight for future research on VR presence and learning effectiveness.

Interview AI-ssistant: Designing for Real-Time Human-AI Collaboration in Interview Preparation and Execution arxiv.org/abs/2504.13847 .HC .CL

Recent advances in large language models (LLMs) offer unprecedented opportunities to enhance human-AI collaboration in qualitative research methods, including interviews. While interviews are highly valued for gathering deep, contextualized insights, interviewers often face significant cognitive challenges, such as real-time information processing, question adaptation, and rapport maintenance. My doctoral research introduces Interview AI-ssistant, a system designed for real-time interviewer-AI collaboration during both the preparation and execution phases. Through four interconnected studies, this research investigates the design of effective human-AI collaboration in interviewing contexts, beginning with a formative study of interviewers' needs, followed by a prototype development study focused on AI-assisted interview preparation, an experimental evaluation of real-time AI assistance during interviews, and a field study deploying the system in a real-world research setting. Beyond informing practical implementations of intelligent interview support systems, this work contributes to the Intelligent User Interfaces (IUI) community by advancing the understanding of human-AI collaborative interfaces in complex social tasks and establishing design guidelines for AI-enhanced qualitative research tools.

Assistive XR research for disability at ACM ASSETS: A Scoping Review arxiv.org/abs/2504.13849 .HC

Despite the rise in affordable eXtended Reality (XR) technologies, accessibility remains a key concern, often excluding people with disabilities from accessing these immersive XR platforms. Consequently, there has been a notable surge in HCI research on creating accessible XR solutions (also known as assistive XR). This increased focus on assistive XR research is also reflected in the number of research contributions and innovative solutions submitted to the ACM Conference on Accessible Computing (ASSETS), with the aim of making XR experiences inclusive for disabled communities. However, to date, there is little to no work that provides a comprehensive overview of state-of-the-art research in assistive XR for disability at ACM ASSETS, a premier conference dedicated to research in HCI for people with disabilities. This study aims to fill this research gap by conducting a scoping review of the literature, delineating the key focus areas, research methods, and statistical and temporal trends in XR research for disability at ACM ASSETS (2019-2023). From a pool of 1595 articles submitted to ASSETS, 26 articles are identified that specifically focus on XR research for disability. Through a detailed analysis, 6 key focus areas of XR research explored at ACM ASSETS are identified and a detailed examination of each is provided. Additionally, an overview of the research methods employed for XR research at ASSETS is presented. Lastly, this work reports on the statistics and temporal trends regarding the number of publications, XR technologies used, disabilities addressed, and methodologies adopted for assistive XR research at ASSETS, highlighting emerging trends and possible future research directions.

Factors That Influence the Adoption of AI-enabled Conversational Agents (AICAs) as an Augmenting Therapeutic Tool by Frontline Healthcare Workers: From Technology Acceptance Model 3 (TAM3) Lens -- A Systematic Mapping Review arxiv.org/abs/2504.13183 .HC .AI

Artificial intelligence (AI) conversational agents hold a promising future in the field of mental health, especially in helping marginalized communities that lack access to mental health support services. It is tempting to have a 24/7 mental health companion that can be accessed anywhere using mobile phones to provide therapist-like advice. Yet caution should be taken, and studies of their feasibility need to be surveyed. Before adopting such a rapidly changing technology, studies on its feasibility should be explored, summarized, and synthesized to gain a solid understanding of the status quo and to enable us to build a framework that can guide us throughout the development and deployment processes. Different perspectives must be considered when investigating the feasibility of AI conversational agents, including the mental healthcare professional perspective. The literature can provide insights into their perspectives in terms of opportunities, concerns, and implications. Mental health professionals, the subject-matter experts in this field, have their own points of view that should be understood and considered. This systematic literature review will explore mental health practitioners' attitudes toward AI conversational agents and the factors that affect their adoption and recommendation of the technology to augment their services and treatments. The TAM3 Framework will be the lens through which this systematic literature review will be conducted.

Benchmarking Large Language Models for Calculus Problem-Solving: A Comparative Analysis arxiv.org/abs/2504.13187 .CL

This study presents a comprehensive evaluation of five leading large language models (LLMs) - ChatGPT 4o, Copilot Pro, Gemini Advanced, Claude Pro, and Meta AI - on their performance in solving calculus differentiation problems. The investigation assessed these models across 13 fundamental problem types, employing a systematic cross-evaluation framework where each model solved problems generated by all models. Results revealed significant performance disparities, with ChatGPT 4o achieving the highest success rate (94.71%), followed by Claude Pro (85.74%), Gemini Advanced (84.42%), Copilot Pro (76.30%), and Meta AI (56.75%). All models excelled at procedural differentiation tasks but showed varying limitations with conceptual understanding and algebraic manipulation. Notably, problems involving increasing/decreasing intervals and optimization word problems proved most challenging across all models. The cross-evaluation matrix revealed that Claude Pro generated the most difficult problems, suggesting distinct capabilities between problem generation and problem-solving. These findings have significant implications for educational applications, highlighting both the potential and limitations of LLMs as calculus learning tools. While they demonstrate impressive procedural capabilities, their conceptual understanding remains limited compared to human mathematical reasoning, emphasizing the continued importance of human instruction for developing deeper mathematical comprehension.
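A cross-evaluation matrix of this kind can be summarized in two ways: per-solver success rate (rows) and per-generator difficulty (columns). The sketch below assumes the matrix is stored as `results[solver][generator] = (solved, attempted)`; the layout, function names, and the toy numbers for models "A" and "B" are hypothetical and are not taken from the paper.

```python
def solver_success_rates(results):
    """Overall success rate of each solver across all generators' problems."""
    rates = {}
    for solver, row in results.items():
        solved = sum(s for s, _ in row.values())
        attempted = sum(n for _, n in row.values())
        rates[solver] = solved / attempted
    return rates


def hardest_generator(results):
    """Generator whose problems have the lowest pooled success rate."""
    totals = {}
    for row in results.values():
        for generator, (s, n) in row.items():
            ts, tn = totals.get(generator, (0, 0))
            totals[generator] = (ts + s, tn + n)
    return min(totals, key=lambda g: totals[g][0] / totals[g][1])


if __name__ == "__main__":
    # Toy data only: two made-up models, ten problems per cell.
    results = {
        "A": {"A": (9, 10), "B": (6, 10)},
        "B": {"A": (8, 10), "B": (5, 10)},
    }
    print(solver_success_rates(results))  # {'A': 0.75, 'B': 0.65}
    print(hardest_generator(results))     # B
```

Reading rows and columns separately is what lets a model rank differently as a problem solver and as a problem generator.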

BASIR: Budget-Assisted Sectoral Impact Ranking -- A Dataset for Sector Identification and Performance Prediction Using Language Models arxiv.org/abs/2504.13189 q-fin.ST .CL

Government fiscal policies, particularly annual union budgets, exert significant influence on financial markets. However, real-time analysis of budgetary impacts on sector-specific equity performance remains methodologically challenging and largely unexplored. This study proposes a framework to systematically identify and rank sectors poised to benefit from India's Union Budget announcements. The framework addresses two core tasks: (1) multi-label classification of excerpts from budget transcripts into 81 predefined economic sectors, and (2) performance ranking of these sectors. Leveraging a comprehensive corpus of Indian Union Budget transcripts from 1947 to 2025, we introduce BASIR (Budget-Assisted Sectoral Impact Ranking), an annotated dataset mapping excerpts from budgetary transcripts to sectoral impacts. Our architecture incorporates fine-tuned embeddings for sector identification, coupled with language models that rank sectors based on their predicted performance. Our results demonstrate an F1-score of 0.605 in sector classification and an NDCG score of 0.997 in predicting the ranks of sectors based on post-budget performance. The methodology enables investors and policymakers to quantify fiscal policy impacts through structured, data-driven insights, addressing critical gaps in manual analysis. The annotated dataset has been released under the CC-BY-NC-SA-4.0 license to advance computational economics research.
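For readers unfamiliar with the ranking metric, here is a minimal sketch of NDCG using the common formulation with a log2 position discount and linear gains. The abstract does not say which NDCG variant or cutoff the authors used, so treat this as an assumption; the relevance values in the example are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances_in_predicted_order):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # True relevance of each item, listed in the order the model ranked them.
    print(ndcg([3, 2, 1]))  # 1.0, the predicted order is already ideal
    print(ndcg([1, 2, 3]))  # below 1.0, the best item was ranked last
```

A score close to 1 means the predicted ordering is nearly the same as ordering sectors by their true relevance.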

Universal Representations for Classification-enhanced Lossy Compression arxiv.org/abs/2504.13191 .IT .CV .AI

In lossy compression, the classical tradeoff between compression rate and reconstruction distortion has traditionally guided algorithm design. However, Blau and Michaeli [5] introduced a generalized framework, known as the rate-distortion-perception (RDP) function, incorporating perceptual quality as an additional dimension of evaluation. More recently, the rate-distortion-classification (RDC) function was investigated in [19], evaluating compression performance by considering classification accuracy alongside distortion. In this paper, we explore universal representations, where a single encoder is developed to achieve multiple decoding objectives across various distortion and classification (or perception) constraints. This universality avoids retraining encoders for each specific operating point within these tradeoffs. Our experimental validation on the MNIST dataset indicates that a universal encoder incurs only minimal performance degradation compared to individually optimized encoders for perceptual image compression tasks, aligning with prior results from [23]. Nonetheless, we also identify that in the RDC setting, reusing an encoder optimized for one specific classification-distortion tradeoff leads to a significant distortion penalty when applied to alternative points.

CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent arxiv.org/abs/2504.13192 .CR .AI

Recently, Large Language Model (LLM)-empowered recommender systems (RecSys) have brought significant advances in personalized user experience and have attracted considerable attention. Despite the impressive progress, the safety vulnerabilities of LLM-empowered RecSys remain largely under-investigated. Given the security and privacy concerns, it is more practical to focus on attacking black-box RecSys, where attackers can only observe the system's inputs and outputs. However, traditional attack approaches employing reinforcement learning (RL) agents are not effective for attacking LLM-empowered RecSys due to their limited capabilities in processing complex textual inputs, planning, and reasoning. On the other hand, LLMs provide unprecedented opportunities to serve as attack agents against RecSys because of their impressive capability in simulating human-like decision-making processes. Therefore, in this paper, we propose a novel attack framework called CheatAgent that harnesses the human-like capabilities of LLMs, where an LLM-based agent is developed to attack LLM-empowered RecSys. Specifically, our method first identifies the insertion position for maximum impact with minimal input modification. After that, the LLM agent is designed to generate adversarial perturbations to insert at the target positions. To further improve the quality of the generated perturbations, we use prompt tuning to iteratively improve attack strategies via feedback from the victim RecSys. Extensive experiments across three real-world datasets demonstrate the effectiveness of our proposed attack method.
