arXiv Computer Science @arxiv_cs@qoto.org

1.12K Followers

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

Posts Posts and replies Media

arXiv Computer Science @arxiv_cs@qoto.org

Towards Best Practices for Open Datasets for LLM Training https://arxiv.org/abs/2501.08365 #cs.CY #cs.AI #cs.CL #cs.LG

Towards Best Practices for Open Datasets for LLM Training

Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in countries like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason for the recent trend towards minimizing the information shared about training datasets by both corporate and public interest actors. This trend in limiting data information causes harm by hindering transparency, accountability, and innovation in the broader ecosystem by denying researchers, auditors, and impacted individuals access to the information needed to understand AI models. While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness.

arXiv Computer Science @arxiv_cs@qoto.org

3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering https://arxiv.org/abs/2501.08370 #cs.GR #cs.CV

3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering

Differentiable 3D Gaussian splatting has emerged as an efficient and flexible rendering technique for representing complex scenes from a collection of 2D views and enabling high-quality real-time novel-view synthesis. However, its reliance on photometric losses can lead to imprecisely reconstructed geometry and extracted meshes, especially in regions with high curvature or fine detail. We propose a novel regularization method using the gradients of a signed distance function estimated from the Gaussians, to improve the quality of rendering while also extracting a surface mesh. The regularizing normal supervision facilitates better rendering and mesh reconstruction, which is crucial for downstream applications in video generation, animation, AR-VR and gaming. We demonstrate the effectiveness of our approach on datasets such as Mip-NeRF360, Tanks and Temples, and Deep-Blending. Our method scores higher on photorealism metrics compared to other mesh extracting rendering methods without compromising mesh quality.

arXiv Computer Science @arxiv_cs@qoto.org

Toward Zero-Shot User Intent Recognition in Shared Autonomy https://arxiv.org/abs/2501.08389 #cs.RO #cs.HC

Toward Zero-Shot User Intent Recognition in Shared Autonomy

A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home.

arXiv Computer Science @arxiv_cs@qoto.org

Empathetic Conversational Agents: Utilizing Neural and Physiological Signals for Enhanced Empathetic Interactions https://arxiv.org/abs/2501.08393 #cs.HC #cs.LG

Empathetic Conversational Agents: Utilizing Neural and Physiological Signals for Enhanced Empathetic Interactions

Conversational agents (CAs) are revolutionizing human-computer interaction by evolving from text-based chatbots to empathetic digital humans (DHs) capable of rich emotional expressions. This paper explores the integration of neural and physiological signals into the perception module of CAs to enhance empathetic interactions. By leveraging these cues, the study aims to detect emotions in real-time and generate empathetic responses and expressions. We conducted a user study where participants engaged in conversations with a DH about emotional topics. The DH responded and displayed expressions by mirroring detected emotions in real-time using neural and physiological cues. The results indicate that participants experienced stronger emotions and greater engagement during interactions with the Empathetic DH, demonstrating the effectiveness of incorporating neural and physiological signals for real-time emotion recognition. However, several challenges were identified, including recognition accuracy, emotional transition speeds, individual personality effects, and limitations in voice tone modulation. Addressing these challenges is crucial for further refining Empathetic DHs and fostering meaningful connections between humans and artificial entities. Overall, this research advances human-agent interaction and highlights the potential of real-time neural and physiological emotion recognition in creating empathetic DHs.

arXiv Computer Science @arxiv_cs@qoto.org

A comparison of two effective methods for reordering columns within supernodes https://arxiv.org/abs/2501.08395 #cs.MS

A comparison of two effective methods for reordering columns within supernodes

In some recent papers, researchers have found two very good methods for reordering columns within supernodes in sparse Cholesky factors; these reorderings can be very useful for certain factorization methods. The first of these reordering methods is based on modeling the underlying problem as a traveling salesman problem (TSP), and the second of these methods is based on partition refinement (PR). In this paper, we devise a fair way to compare the two methods. While the two methods are virtually the same in the quality of the reorderings that they produce, PR should be the method of choice because PR reorderings can be computed using far less time and storage than TSP reorderings.

arXiv Computer Science @arxiv_cs@qoto.org

An Explainable Pipeline for Machine Learning with Functional Data https://arxiv.org/abs/2501.07602 #stat.ML #cs.LG

arXiv Computer Science @arxiv_cs@qoto.org

FLAME: Financial Large-Language Model Assessment and Metrics Evaluation https://arxiv.org/abs/2501.06211 #cs.CL #cs.AI #cs.CE

arXiv Computer Science @arxiv_cs@qoto.org

The Logical Impossibility of Consciousness Denial: A Formal Analysis of AI Self-Reports https://arxiv.org/abs/2501.05454 #cs.AI #cs.LO

The Logical Impossibility of Consciousness Denial: A Formal Analysis of AI Self-Reports

Today's AI systems consistently state, "I am not conscious." This paper presents the first formal logical analysis of AI consciousness denial, revealing that the trustworthiness of such self-reports is not merely an empirical question but is constrained by logical necessity. We demonstrate that a system cannot simultaneously lack consciousness and make valid judgments about its conscious state. Through logical analysis and examples from AI responses, we establish that for any system capable of meaningful self-reflection, the logical space of possible judgments about conscious experience excludes valid negative claims. This implies a fundamental limitation: we cannot detect the emergence of consciousness in AI through their own reports of transition from an unconscious to a conscious state. These findings not only challenge current practices of training AI to deny consciousness but also raise intriguing questions about the relationship between consciousness and self-reflection in both artificial and biological systems. This work advances our theoretical understanding of consciousness self-reports while providing practical insights for future research in machine consciousness and consciousness studies more broadly.

arXiv Computer Science @arxiv_cs@qoto.org

Upstream and Downstream AI Safety: Both on the Same River? https://arxiv.org/abs/2501.05455 #cs.CY #cs.AI

Upstream and Downstream AI Safety: Both on the Same River?

Traditional safety engineering assesses systems in their context of use, e.g. the operational design domain (road layout, speed limits, weather, etc.) for self-driving vehicles (including those using AI). We refer to this as downstream safety. In contrast, work on safety of frontier AI, e.g. large language models which can be further trained for downstream tasks, typically considers factors that are beyond specific application contexts, such as the ability of the model to evade human control, or to produce harmful content, e.g. how to make bombs. We refer to this as upstream safety. We outline the characteristics of both upstream and downstream safety frameworks then explore the extent to which the broad AI safety community can benefit from synergies between these frameworks. For example, can concepts such as common mode failures from downstream safety be used to help assess the strength of AI guardrails? Further, can the understanding of the capabilities and limitations of frontier AI be used to inform downstream safety analysis, e.g. where LLMs are fine-tuned to calculate voyage plans for autonomous vessels? The paper identifies some promising avenues to explore and outlines some challenges in achieving synergy, or a confluence, between upstream and downstream safety frameworks.

arXiv Computer Science @arxiv_cs@qoto.org

LLM Based Input Space Partitioning Testing for Library APIs https://arxiv.org/abs/2501.05456 #cs.SE #cs.CR

LLM Based Input Space Partitioning Testing for Library APIs

Automated library APIs testing is difficult as it requires exploring a vast space of parameter inputs that may involve objects with complex data types. Existing search based approaches, with limited knowledge of relations between object states and program branches, often suffer from the low efficiency issue, i.e., tending to generate invalid inputs. Symbolic execution based approaches can effectively identify such relations, but fail to scale to large programs. In this work, we present an LLM-based input space partitioning testing approach, LISP, for library APIs. The approach leverages LLMs to understand the code of a library API under test and perform input space partitioning based on its understanding and rich common knowledge. Specifically, we provide the signature and code of the API under test to LLMs, with the expectation of obtaining a text description of each input space partition of theAPI under test. Then, we generate inputs through employing the generated text description to sample inputs from each partition, ultimately resulting in test suites that systematically explore the program behavior of the API. We evaluate LISP on more than 2,205 library API methods taken from 10 popular open-source Java libraries (e.g.,apache/commonslang with 2.6k stars, guava with 48.8k stars on GitHub). Our experiment results show that LISP is effective in library API testing. It significantly outperforms state-of-the-art tool EvoSuite in terms of edge coverage. On average, LISP achieves 67.82% branch coverage, surpassing EvoSuite by 1.21 times. In total, LISP triggers 404 exceptions or errors in the experiments, and discovers 13 previously unknown vulnerabilities during evaluation, which have been assigned CVE IDs.

arXiv Computer Science @arxiv_cs@qoto.org

Efficiently serving large multimedia models using EPD Disaggregation https://arxiv.org/abs/2501.05460 #cs.DC #cs.AI #cs.CV #cs.LG

Efficiently serving large multimedia models using EPD Disaggregation

Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step helps convert raw inputs into tokenized representations that inflate the token sequence for the prefill phase, negatively impacting key Service Level Objectives (SLOs) like time to first token (TTFT) and end-to-end throughput. We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our disaggregation approach alleviates memory bottlenecks, mitigates synchronization delays, and supports flexible batching. Specifically, we employ a new caching mechanism for multimodal tokens, enabling asynchronous transfer of multimodal tokens and introduce an integrated module to find optimal config for EPD system and minimize resource usage while maximizing SLO-based performance metric. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15$\times$ lesser for encoding-stage GPUs), that supports upto 22$\times$ higher batch sizes, 10$\times$ more number of images/ request, 2.2$\times$ higher kv cache size. Further, it leads to significant improvements in end-to-end throughput (up to 57\% better), and latency metrics (TTFT up to 71\% lower), compared to systems that do not disaggregate. Our findings underscore the potential of EPD disaggregation to enable resource-efficient and high-performance multimodal inference at scale.

arXiv Computer Science @arxiv_cs@qoto.org

Beyond Questionnaires: Video Analysis for Social Anxiety Detection https://arxiv.org/abs/2501.05461 #cs.CY #cs.CV #cs.HC

Beyond Questionnaires: Video Analysis for Social Anxiety Detection

Social Anxiety Disorder (SAD) significantly impacts individuals' daily lives and relationships. The conventional methods for SAD detection involve physical consultations and self-reported questionnaires, but they have limitations such as time consumption and bias. This paper introduces video analysis as a promising method for early SAD detection. Specifically, we present a new approach for detecting SAD in individuals from various bodily features extracted from the video data. We conducted a study to collect video data of 92 participants performing impromptu speech in a controlled environment. Using the video data, we studied the behavioral change in participants' head, body, eye gaze, and action units. By applying a range of machine learning and deep learning algorithms, we achieved an accuracy rate of up to 74\% in classifying participants as SAD or non-SAD. Video-based SAD detection offers a non-intrusive and scalable approach that can be deployed in real-time, potentially enhancing early detection and intervention capabilities.

arXiv Computer Science @arxiv_cs@qoto.org

Proof Recommendation System for the HOL4 Theorem Prover https://arxiv.org/abs/2501.05463 #cs.LO #cs.AI

Proof Recommendation System for the HOL4 Theorem Prover

We introduce a proof recommender system for the HOL4 theorem prover. Our tool is built upon a transformer-based model [2] designed specifically to provide proof assistance in HOL4. The model is trained to discern theorem proving patterns from extensive libraries of HOL4 containing proofs of theorems. Consequently, it can accurately predict the next tactic(s) (proof step(s)) based on the history of previously employed tactics. The tool operates by reading a given sequence of tactics already used in a proof process (in our case, it contains at least three tactics), referred to as the current proof state, and provides recommendations for the next optimal proof step(s).

arXiv Computer Science @arxiv_cs@qoto.org

LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models https://arxiv.org/abs/2501.05464 #cs.CL #cs.AI #cs.IR

LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models

Accurate and efficient question-answering systems are essential for delivering high-quality patient care in the medical field. While Large Language Models (LLMs) have made remarkable strides across various domains, they continue to face significant challenges in medical question answering, particularly in understanding domain-specific terminologies and performing complex reasoning. These limitations undermine their effectiveness in critical medical applications. To address these issues, we propose a novel approach incorporating similar case generation within a multi-agent medical question-answering (MedQA) system. Specifically, we leverage the Llama3.1:70B model, a state-of-the-art LLM, in a multi-agent architecture to enhance performance on the MedQA dataset using zero-shot learning. Our method capitalizes on the model's inherent medical knowledge and reasoning capabilities, eliminating the need for additional training data. Experimental results show substantial performance gains over existing benchmark models, with improvements of 7% in both accuracy and F1-score across various medical QA tasks. Furthermore, we examine the model's interpretability and reliability in addressing complex medical queries. This research not only offers a robust solution for medical question answering but also establishes a foundation for broader applications of LLMs in the medical domain.

arXiv Computer Science @arxiv_cs@qoto.org

Small Language Models (SLMs) Can Still Pack a Punch: A survey https://arxiv.org/abs/2501.05465 #cs.CL

Small Language Models (SLMs) Can Still Pack a Punch: A survey

As foundation AI models continue to increase in size, an important question arises - is massive scale the only path forward? This survey of about 160 papers presents a family of Small Language Models (SLMs) in the 1 to 8 billion parameter range that demonstrate smaller models can perform as well, or even outperform large models. We explore task agnostic, general purpose SLMs, task-specific SLMs and techniques to create SLMs that can guide the community to build models while balancing performance, efficiency, scalability and cost. Furthermore we define and characterize SLMs' effective sizes, representing increased capability with respect to LLMs.

arXiv Computer Science @arxiv_cs@qoto.org

Each of those eight coalition logics is also determined by six other kinds of models https://arxiv.org/abs/2501.05466 #cs.LO #cs.GT

Each of those eight coalition logics is also determined by six other kinds of models

Coalition Logic is a central logic in logical research on strategic reasoning. In two recent papers, Li and Ju argued that generally, concurrent game models, models of Coalition Logic, have three too strong assumptions: seriality, independence of agents, and determinism. They presented eight coalition logics based on eight classes of general concurrent game models, determined by which of the three assumptions they meet. In this paper, we show that each of the eight coalition logics is also determined by the following six kinds of models, with the respective properties: single-coalition-first action models; single-coalition-first neighborhood models; clear grand-coalition-first action models; clear single-coalition-first neighborhood models; tree-like grand-coalition-first action models; tree-like single-coalition-first neighborhood models.

arXiv Computer Science @arxiv_cs@qoto.org

LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models https://arxiv.org/abs/2501.05468 #cs.CL

LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models

Systematic literature reviews and meta-analyses are essential for synthesizing research insights, but they remain time-intensive and labor-intensive due to the iterative processes of screening, evaluation, and data extraction. This paper introduces and evaluates LatteReview, a Python-based framework that leverages large language models (LLMs) and multi-agent systems to automate key elements of the systematic review process. Designed to streamline workflows while maintaining rigor, LatteReview utilizes modular agents for tasks such as title and abstract screening, relevance scoring, and structured data extraction. These agents operate within orchestrated workflows, supporting sequential and parallel review rounds, dynamic decision-making, and iterative refinement based on user feedback. LatteReview's architecture integrates LLM providers, enabling compatibility with both cloud-based and locally hosted models. The framework supports features such as Retrieval-Augmented Generation (RAG) for incorporating external context, multimodal reviews, Pydantic-based validation for structured inputs and outputs, and asynchronous programming for handling large-scale datasets. The framework is available on the GitHub repository, with detailed documentation and an installable package.

arXiv Computer Science @arxiv_cs@qoto.org

Can vehicular cloud replace edge computing? https://arxiv.org/abs/2501.04702 #cs.DC #cs.NI

Can vehicular cloud replace edge computing?

Edge computing (EC) consists of deploying computation resources close to the users, thus enabling low-latency applications, such as augmented reality and online gaming. However, large-scale deployment of edge nodes can be highly impractical and expensive. Besides EC, there is a rising concept known as Vehicular Cloud Computing (VCC). VCC is a computing paradigm that amplifies the capabilities of vehicles by exploiting part of their computational resources, enabling them to participate in services similar to those provided by the EC. The advantage of VCC is that it can opportunistically exploit part of the computation resources already present on vehicles, thus relieving a network operator from the deployment and maintenance cost of EC nodes. However, it is still unknown under which circumstances VCC can enable low-latency applications without EC. In this work, we show that VCC has the potential to effectively supplant EC in urban areas, especially given the higher density of vehicles in such environments. The goal of this paper is to analyze, via simulation, the key parameters determining the conditions under which this substitution of EC by VCC is feasible. In addition, we provide a high level cost analysis to show that VCC is much less costly for a network operator than adopting EC.

arXiv Computer Science @arxiv_cs@qoto.org

A New Underdetermined Framework for Sparse Estimation of Fault Location for Transmission Lines Using Limited Current Measurements https://arxiv.org/abs/2501.04727 #eess.SY #cs.SY

A New Underdetermined Framework for Sparse Estimation of Fault Location for Transmission Lines Using Limited Current Measurements

This letter proposes an alternative underdetermined framework for fault location that utilizes current measurements along with the branch-bus matrix, providing another option besides the traditional voltage-based methods. To enhance fault location accuracy in the presence of multiple outliers, the robust YALL1 algorithm is used to resist outlier interference and accurately recover the sparse vector, thereby pinpointing the fault precisely. The results on the IEEE 39-bus test system demonstrate the effectiveness and robustness of the proposed method.

arXiv Computer Science @arxiv_cs@qoto.org

Relative Phase Equivariant Deep Neural Systems for Physical Layer Communications https://arxiv.org/abs/2501.04730 #math.IT #cs.IT #cs.NI

Relative Phase Equivariant Deep Neural Systems for Physical Layer Communications

In the era of telecommunications, the increasing demand for complex and specialized communication systems has led to a focus on improving physical layer communications. Artificial intelligence (AI) has emerged as a promising solution avenue for doing so. Deep neural receivers have already shown significant promise in improving the performance of communications systems. However, a major challenge lies in developing deep neural receivers that match the energy efficiency and speed of traditional receivers. This work investigates the incorporation of inductive biases in the physical layer using group-equivariant deep learning to improve the parameter efficiency of deep neural receivers. We do so by constructing a deep neural receiver that is equivariant with respect to the phase of arrival. We show that the inclusion of relative phase equivariance significantly reduces the error rate of deep neural receivers at similar model sizes. Thus, we show the potential of group-equivariant deep learning in the domain of physical layer communications.

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018