A Study on the Matching Rate of Dance Movements Using 2D Skeleton Detection and 3D Pose Estimation: Why Is SEVENTEEN's Performance So Bita-Zoroi (Perfectly Synchronized)? arxiv.org/abs/2503.19917 .CV

SEVENTEEN is a K-pop group with an unusually large number of members (13 in total) and one of the largest height differences between its tallest and shortest members among K-pop groups. Despite their numbers and physical differences, their dance performances exhibit unparalleled unity in the K-pop industry. Their dance synchronization rate is reputed to be 90% or even 97%, but there is little concrete data to substantiate this figure. In this study, we analyzed SEVENTEEN's dance performances using videos available on YouTube. We applied 2D skeleton detection and 3D pose estimation to evaluate joint angles, body part movements, and jumping and crouching motions, and to investigate the factors contributing to their performance unity. The analysis revealed exceptionally high consistency in the movement direction of body parts, as well as in ankle and head positions during jumping movements and head position during crouching movements. These findings suggest that SEVENTEEN's high synchronization rate can be attributed to consistent movement direction and synchronized ankle and head heights during jumping and crouching movements.
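
As a rough illustration of this kind of analysis, the sketch below computes a joint angle from 2D keypoints and a simple cross-dancer agreement score for movement direction; the keypoint layout and the agreement metric are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 2D keypoints a-b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def movement_direction_agreement(keypoints):
    """keypoints: (dancers, frames, joints, 2) array of 2D positions.
    Returns mean pairwise cosine similarity of per-frame joint velocities,
    a rough proxy for how consistently body parts move in the same direction."""
    vel = np.diff(keypoints, axis=1)  # frame-to-frame displacement per joint
    unit = vel / (np.linalg.norm(vel, axis=-1, keepdims=True) + 1e-8)
    sims = []
    n = unit.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            sims.append(np.mean(np.sum(unit[i] * unit[j], axis=-1)))
    return float(np.mean(sims))
```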

Unifying Structural Proximity and Equivalence for Enhanced Dynamic Network Embedding arxiv.org/abs/2503.19926 .SI .LG

Dynamic network embedding methods transform nodes in a dynamic network into low-dimensional vectors while preserving network characteristics, facilitating tasks such as node classification and community detection. Several embedding methods have been proposed to capture structural proximity among nodes in a network, where densely connected communities are preserved, while others have been proposed to preserve structural equivalence among nodes, capturing their structural roles regardless of their relative distance in the network. However, most existing methods that aim to preserve both network characteristics mainly focus on static networks, and those designed for dynamic networks do not explicitly account for inter-snapshot structural properties. This paper proposes a novel unifying dynamic network embedding method that simultaneously preserves both structural proximity and equivalence while considering inter-snapshot structural relationships in a dynamic network. Specifically, to define structural equivalence in a dynamic network, we use temporal subgraphs, known as dynamic graphlets, to capture how a node's neighborhood structure evolves over time. We then introduce a temporal-structural random walk to flexibly sample time-respecting sequences of nodes, considering both their temporal proximity and similarity in evolving structures. The proposed method is evaluated on node classification using five real-world networks, where it outperforms benchmark methods, demonstrating its effectiveness and flexibility in capturing various aspects of a network.
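
A minimal sketch of what a temporal-structural random walk could look like, assuming numeric edge timestamps and a precomputed structural-similarity function (e.g., derived from dynamic-graphlet counts); the paper's actual transition probabilities may differ.

```python
import random

def temporal_structural_walk(events, sim, start, t0, length, alpha=0.5):
    """Time-respecting random walk sketch.
    events: dict node -> list of (neighbor, timestamp) edges
    sim: function (u, v) -> structural similarity in [0, 1] (assumed precomputed)
    alpha: trade-off between temporal proximity and structural similarity.
    Illustrative only; the paper's transition law may be defined differently."""
    walk, node, t = [start], start, t0
    for _ in range(length - 1):
        cand = [(v, ts) for v, ts in events.get(node, []) if ts >= t]  # respect time
        if not cand:
            break
        # favor temporally close edges and structurally similar neighbors
        w = [alpha / (1 + ts - t) + (1 - alpha) * sim(node, v) for v, ts in cand]
        node, t = random.choices(cand, weights=w, k=1)[0]
        walk.append(node)
    return walk
```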

Unlocking Health Insights with SDoH Data: A Comprehensive Open-Access Database and SDoH-EHR Linkage Tool arxiv.org/abs/2503.19928 .SI

Background: Social determinants of health (SDoH) play a crucial role in influencing health outcomes, accounting for nearly 50% of modifiable health factors and bringing to light critical disparities among disadvantaged groups. Despite the significant impact of SDoH, existing data resources often fall short in terms of comprehensiveness, integration, and usability. Methods: To address these gaps, we developed an extensive Exposome database and a corresponding web application, aimed at enhancing data usability and integration with electronic health records (EHRs) to foster personalized and informed healthcare. We created a robust database consisting of a wide array of SDoH indicators and an automated linkage tool designed to facilitate effortless integration with EHRs. We emphasized a user-friendly interface to cater to researchers, clinicians, and public health professionals. Results: The resultant Exposome database and web application offer an extensive data catalog with enhanced usability features. The automated linkage tool has demonstrated efficiency in integrating SDoH data with EHRs, significantly improving data accessibility. Initial deployment has confirmed scalability and robust spatial data relationships, facilitating precise and contextually relevant healthcare insights. Conclusion: The development of an advanced Exposome database and linkage tool marks a significant step toward enhancing the accessibility and usability of SDoH data. By centralizing and integrating comprehensive SDoH indicators with EHRs, this tool empowers a wide range of users to access high-quality, standardized data. This resource will have a lasting impact on personalized healthcare and a more equitable health landscape.
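
The abstract does not specify the linkage tool's schema, but area-level SDoH-to-EHR linkage is typically a join on a geographic key such as a census tract. A hypothetical sketch with invented file and column names:

```python
import pandas as pd

# Hypothetical files and columns; the actual tool's schema is not described
# in the abstract. Census tract is a common spatial key for this enrichment.
ehr = pd.read_csv("ehr_patients.csv", dtype={"census_tract": str})
sdoh = pd.read_csv("sdoh_indicators.csv", dtype={"census_tract": str})

# Attach area-level SDoH indicators to each patient record.
linked = ehr.merge(sdoh, on="census_tract", how="left", validate="many_to_one")
print(linked[["patient_id", "median_income", "food_insecurity_rate"]].head())
```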

Robust Object Detection of Underwater Robot based on Domain Generalization arxiv.org/abs/2503.19929 .CV .LG

Object detection aims to obtain the location and category of specific objects in a given image, which comprises two tasks: classification and localization. In recent years, researchers have applied object detection to underwater robots equipped with vision systems to complete tasks including seafood harvesting, fish farming, biodiversity monitoring, and so on. However, the diversity and complexity of underwater environments bring new challenges to object detection. First, aquatic organisms tend to live together, which leads to severe occlusion. Second, aquatic organisms are good at hiding themselves, often taking on colors similar to the background. Third, varying water quality and changeable, extreme lighting conditions lead to distorted, low-contrast, blue- or green-tinted images from the underwater camera, resulting in domain shift, and deep models are generally vulnerable to domain shift. Fourth, the movement of the underwater robot blurs the captured images and stirs up sediment, resulting in low visibility of the water. This paper investigates the problems brought by the underwater environment mentioned above and aims to design a high-performance and robust underwater object detector.
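
One common domain-generalization tactic for this setting is to randomize underwater-style degradations (color cast, low contrast, blur) at training time so the detector sees many simulated domains. A sketch with illustrative parameter ranges, not the paper's actual recipe:

```python
import torch
from torchvision import transforms

# Rough domain-randomization pipeline simulating underwater degradations.
# Parameter ranges are illustrative assumptions, not the paper's settings.
underwater_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.5,
                           saturation=0.5, hue=0.1),   # blue/green cast, low contrast
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 3.0)),  # turbidity/motion blur
])

img = torch.rand(3, 480, 640)       # stand-in for a camera frame in [0, 1]
augmented = underwater_aug(img)
```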

Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation arxiv.org/abs/2503.19937 .CV .AI

Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and generate impressive new images. To address this challenge, we propose a method known as automatic reverse prompt optimization (ARPO). Specifically, our method refines an initial prompt into a high-quality prompt through an iteratively imitative gradient prompt optimization process: 1) generating a recreated image from the current prompt to instantiate its guidance capability; 2) producing textual gradients, which are candidate prompts intended to reduce the difference between the recreated image and the reference image; 3) updating the current prompt with textual gradients using a greedy search method to maximize the CLIP similarity between the prompt and the reference image. We compare ARPO with several baseline methods, including handcrafted techniques, gradient-based prompt tuning methods, image captioning, and data-driven selection methods. Both quantitative and qualitative results demonstrate that our ARPO converges quickly to generate high-quality reverse prompts. More importantly, we can easily create novel images with diverse styles and content by directly editing these reverse prompts. Code will be made publicly available.
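
The three-step loop above maps naturally onto code. Below is a pseudocode-level sketch where generate_image, propose_edits (the textual-gradient step), and clip_similarity are hypothetical callables standing in for the text-to-image model, the prompt editor, and a CLIP scorer:

```python
def arpo(reference_image, prompt, generate_image, propose_edits,
         clip_similarity, steps=10, n_candidates=8):
    """Sketch of the ARPO loop; the injected callables are hypothetical
    stand-ins, and step sizes/counts are illustrative."""
    for _ in range(steps):
        recreated = generate_image(prompt)                 # 1) instantiate guidance
        candidates = propose_edits(prompt, recreated,      # 2) textual gradients
                                   reference_image, n=n_candidates)
        # 3) greedy update: keep whichever prompt best matches the reference
        best = max(candidates + [prompt],
                   key=lambda p: clip_similarity(generate_image(p),
                                                 reference_image))
        if best == prompt:
            break                                          # no improving candidate
        prompt = best
    return prompt
```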

Continual Learning With Quasi-Newton Methods arxiv.org/abs/2503.19939 .IV .LG

Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation where the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. CSQN captures parameter interactions beyond the diagonal without requiring architecture-specific modifications, making it applicable across diverse tasks and architectures. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50 percent and improves its performance by 8 percent on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.
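
To make the contrast concrete: EWC's penalty is a diagonal quadratic form around the old optimum, while a quasi-Newton method can supply a richer (e.g., low-rank) curvature estimate. A minimal sketch assuming flattened parameter vectors; CSQN's actual construction of the approximation differs in detail:

```python
import torch

def ewc_penalty(params, star, fisher_diag, lam):
    """Classic EWC: diagonal-Fisher quadratic penalty around the old optimum."""
    return 0.5 * lam * sum(
        (f * (p - s) ** 2).sum() for p, s, f in zip(params, star, fisher_diag))

def low_rank_quadratic_penalty(theta, theta_star, U, lam):
    """Quadratic penalty under a low-rank curvature estimate H ~ U @ U.T,
    the kind of Hessian approximation a sampled quasi-Newton method can
    provide (illustrative, not CSQN's exact construction).
    theta, theta_star: flattened (d,) parameter vectors; U: (d, k) factor."""
    d = theta - theta_star
    return 0.5 * lam * (U.t() @ d).pow(2).sum()  # d^T U U^T d without forming H
```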

A Real-Time Human Action Recognition Model for Assisted Living arxiv.org/abs/2503.18957 .CV

Ensuring the safety and well-being of elderly and vulnerable populations in assisted living environments is a critical concern. Computer vision presents an innovative and powerful approach to predicting health risks through video monitoring, employing human action recognition (HAR) technology. However, real-time prediction of human actions with high performance and efficiency is a challenge. This research proposes a real-time human action recognition model that combines a deep learning model and a live video prediction and alert system, in order to predict falls, staggering, and chest pain for residents in assisted living. Six thousand RGB video samples from the NTU RGB+D 60 dataset were selected to create a dataset with four classes: Falling, Staggering, Chest Pain, and Normal, with the Normal class comprising 40 daily activities. Transfer learning was applied to train four state-of-the-art HAR models on a GPU server, namely UniFormerV2, TimeSformer, I3D, and SlowFast. Results of the four models are presented in this paper based on class-wise and macro performance metrics, inference efficiency, model complexity, and computational costs. TimeSformer is proposed for developing the real-time human action recognition model, leveraging its leading macro F1 score (95.33%), recall (95.49%), and precision (95.19%), along with significantly higher inference throughput than the others. This research provides insights to enhance the safety and health of the elderly and people with chronic illnesses in assisted living environments, fostering sustainable care, smarter communities, and industry innovation.
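
A plausible transfer-learning setup for the TimeSformer step, using the Hugging Face implementation; the checkpoint name, input shape, and training details here are assumptions rather than the paper's exact recipe:

```python
import torch
from transformers import TimesformerForVideoClassification

# Swap the Kinetics-400 head for a 4-way head matching the paper's classes.
# Checkpoint and preprocessing are assumptions, not the paper's stated setup.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",
    num_labels=4,                      # Falling, Staggering, Chest Pain, Normal
    ignore_mismatched_sizes=True,      # replace the 400-way classifier head
)

video = torch.rand(2, 8, 3, 224, 224)  # (batch, frames, channels, H, W)
labels = torch.tensor([0, 3])
out = model(pixel_values=video, labels=labels)
out.loss.backward()                     # fine-tune on the four-class dataset
```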

Advancing Deep Learning through Probability Engineering: A Pragmatic Paradigm for Modern AI arxiv.org/abs/2503.18958 .PR .ML .AI

Recent years have witnessed the rapid progression of deep learning, pushing us closer to the realization of AGI (Artificial General Intelligence). Probabilistic modeling is critical to many of these advancements, as it provides a foundational framework for capturing data distributions. However, as the scale and complexity of AI applications grow, traditional probabilistic modeling faces escalating challenges: high-dimensional parameter spaces, heterogeneous data sources, and evolving real-world requirements often render classical approaches insufficiently flexible. This paper proposes a novel concept, Probability Engineering, which treats the already-learned probability distributions within deep learning as engineering artifacts. Rather than merely fitting or inferring distributions, we actively modify and reinforce them to better address the diverse and evolving demands of modern AI. Specifically, Probability Engineering introduces novel techniques and constraints to refine existing probability distributions, improving their robustness, efficiency, adaptability, or trustworthiness. We showcase this paradigm through a series of applications spanning Bayesian deep learning, Edge AI (including federated learning and knowledge distillation), and Generative AI (such as text-to-image generation with diffusion models and high-quality text generation with large language models). These case studies demonstrate how probability distributions, once treated as static objects, can be engineered to meet the diverse and evolving requirements of large-scale, data-intensive, and trustworthy AI systems. By systematically expanding and strengthening the role of probabilistic modeling, Probability Engineering paves the way for more robust, adaptive, efficient, and trustworthy deep learning solutions in today's fast-growing AI era.
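
One of the simplest instances of treating a learned distribution as an engineering artifact is post-hoc temperature scaling of a trained classifier's predictive distribution; the paper's techniques go well beyond this, but it illustrates the mindset:

```python
import torch

def temperature_scale(logits, T):
    """Rescale a trained classifier's logits: T > 1 flattens the predictive
    distribution, T < 1 sharpens it. A minimal example of adjusting an
    already-learned distribution post hoc (e.g., for calibration)."""
    return torch.softmax(logits / T, dim=-1)

logits = torch.tensor([[4.0, 1.0, 0.5]])
print(temperature_scale(logits, 1.0))   # raw, possibly overconfident distribution
print(temperature_scale(logits, 2.5))   # engineered, better-spread distribution
```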

Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference arxiv.org/abs/2503.18964 .SD .AI

Computer interfaces are advancing towards using multiple modalities to enable better human-computer interactions. The use of automatic emotion recognition (AER) can make these interactions natural and meaningful, thereby enhancing the user experience. Though speech is the most direct and intuitive modality for AER, it is not reliable because it can be intentionally faked by humans. On the other hand, physiological modalities like EEG are more reliable and impossible to fake. However, EEG is infeasible in realistic usage scenarios because it requires a specialized recording setup. In this paper, one of our primary aims is to ride on the reliability of the EEG modality to facilitate robust AER on the speech modality. Our approach uses both modalities during training to reliably identify emotion at the time of inference, even in the absence of the more reliable EEG modality. We propose a two-step joint multi-modal learning approach (JMML) that exploits both the intra- and inter-modal characteristics to construct emotion embeddings that enrich the performance of AER. In the first step, intra-modal learning is done independently on the individual modalities using JEC-SSL. This is followed by inter-modal learning using the proposed extended variant of the deep canonically correlated cross-modal autoencoder (E-DCC-CAE). The approach learns the joint properties of both modalities by mapping them into a common representation space such that the modalities are maximally correlated. These emotion embeddings hold properties of both modalities, thereby enhancing the performance of the ML classifier used for AER. Experimental results show the efficacy of the proposed approach. To the best of our knowledge, this is the first attempt to combine speech and EEG with a joint multi-modal learning approach for reliable AER.
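
A structural sketch of the inter-modal step, assuming fixed-size speech and EEG feature vectors and using a cosine term as a stand-in for the CCA-style correlation objective; dimensions and losses are illustrative, not the paper's exact E-DCC-CAE formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAE(nn.Module):
    """Sketch in the spirit of E-DCC-CAE: encode speech and EEG features
    into a shared space, decode each modality back, and encourage the two
    embeddings to be maximally correlated."""
    def __init__(self, d_speech=128, d_eeg=64, d_shared=32):
        super().__init__()
        self.enc_s = nn.Sequential(nn.Linear(d_speech, 64), nn.ReLU(),
                                   nn.Linear(64, d_shared))
        self.enc_e = nn.Sequential(nn.Linear(d_eeg, 64), nn.ReLU(),
                                   nn.Linear(64, d_shared))
        self.dec_s = nn.Linear(d_shared, d_speech)
        self.dec_e = nn.Linear(d_shared, d_eeg)

    def forward(self, xs, xe):
        zs, ze = self.enc_s(xs), self.enc_e(xe)
        recon = (F.mse_loss(self.dec_s(zs), xs) +
                 F.mse_loss(self.dec_e(ze), xe))
        # cosine proxy for the canonical-correlation objective
        corr = F.cosine_similarity(zs, ze, dim=-1).mean()
        return recon - corr, zs, ze   # loss, plus emotion embeddings
```

At inference, only enc_s is needed, which is how the scheme tolerates missing EEG.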

MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow arxiv.org/abs/2503.18968 .AI

Developing reliable AI systems to assist human clinicians in multi-modal medical diagnosis has long been a key objective for researchers. Recently, Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. With strong reasoning capabilities and the ability to perform diverse tasks based on user instructions, they hold great potential for enhancing medical diagnosis. However, directly applying MLLMs to the medical domain still presents challenges. They lack detailed perception of visual inputs, limiting their ability to perform quantitative image analysis, which is crucial for medical diagnostics. Additionally, MLLMs often exhibit hallucinations and inconsistencies in reasoning, whereas clinical diagnoses must adhere strictly to established criteria. To address these challenges, we propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses. This is accomplished through a hierarchical workflow: at the task level, knowledge-based reasoning generates reliable diagnostic plans for specific diseases following retrieved clinical criteria, while at the case level, multiple tool agents process multi-modal inputs, analyze different indicators according to the plan, and provide a final diagnosis based on both quantitative and qualitative evidence. Comprehensive experiments on both 2D and 3D medical diagnosis tasks demonstrate the superiority and effectiveness of MedAgent-Pro, while case studies further highlight its reliability and interpretability. The code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
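
A structural sketch of the two-level workflow, with plan_for, tool_agents, and summarize_evidence as hypothetical stand-ins for the criteria-grounded planner, the per-indicator tool agents, and the final decision step (the real interfaces live in the linked repository):

```python
def diagnose(case, disease, plan_for, tool_agents, summarize_evidence):
    """Hierarchical sketch: a task-level plan grounded in clinical criteria,
    then case-level tool agents gathering evidence per indicator. All
    callables are hypothetical placeholders, not the repository's API."""
    plan = plan_for(disease)              # task level: criteria-grounded plan
    evidence = {}
    for indicator in plan:                # case level: one tool agent per indicator
        agent = tool_agents[indicator]
        evidence[indicator] = agent(case)  # quantitative/qualitative finding
    return summarize_evidence(evidence, plan)
```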
