PersonaAI: Leveraging Retrieval-Augmented Generation and Personalized Context for AI-Driven Digital Avatars arxiv.org/abs/2503.15489 .HC .AI

This paper introduces PersonaAI, a cutting-edge application that leverages Retrieval-Augmented Generation (RAG) and the LLAMA model to create highly personalized digital avatars capable of accurately mimicking individual personalities. Designed as a cloud-based mobile application, PersonaAI captures user data seamlessly, storing it in a secure database for retrieval and analysis. The result is a system that provides context-aware, accurate responses to user queries, enhancing the potential of AI-driven personalization. Why should you care? PersonaAI combines the scalability of RAG with the efficiency of prompt-engineered LLAMA3, offering a lightweight, sustainable alternative to traditional large language model (LLM) training methods. The system's novel approach to data collection, utilizing real-time user interactions via a mobile app, ensures enhanced context relevance while maintaining user privacy. By open-sourcing our implementation, we aim to foster adaptability and community-driven development. PersonaAI demonstrates how AI can transform interactions by merging efficiency, scalability, and personalization, making it a significant step forward in the future of digital avatars and personalized AI.
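
A minimal sketch of the retrieval step such a RAG pipeline implies: embed stored user snippets, pull the closest ones for a query, and prepend them to the prompt. The hashing-trick embedding and the llm_generate() stub below are illustrative stand-ins, not PersonaAI's actual components.

```python
# Minimal RAG retrieval sketch. The embed() hashing trick and the
# llm_generate() stub are illustrative stand-ins, not PersonaAI's code.
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedding via feature hashing."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, memory: list[str], k: int = 3) -> list[str]:
    """Return the k stored user snippets most similar to the query."""
    q = embed(query)
    return sorted(memory, key=lambda m: -float(q @ embed(m)))[:k]

memory = [
    "User prefers concise answers.",
    "User is training for a marathon in May.",
    "User works as a data engineer.",
]
question = "How should I plan my week?"
context = retrieve(question, memory)
prompt = "Personal context:\n" + "\n".join(context) + "\n\nQuestion: " + question
# response = llm_generate(prompt)  # hypothetical call to the avatar's LLM
print(prompt)
```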

World of ScoreCraft: Novel Multi Scorer Experiment on the Impact of a Decision Support System in Sleep Staging arxiv.org/abs/2503.15492 .HC .AI

Manual scoring of polysomnography (PSG) is a time-intensive task, prone to inter-scorer variability that can impact diagnostic reliability. This study investigates the integration of decision support systems (DSS) into PSG scoring workflows, focusing on their effects on accuracy, scoring time, and potential biases toward recommendations from artificial intelligence (AI) compared to human-generated recommendations. Using a novel online scoring platform, we conducted a repeated-measures study with sleep technologists, who scored traditional and self-applied PSGs. Participants were occasionally presented with recommendations labeled as either human- or AI-generated. We found that traditional PSGs tended to be scored slightly more accurately than self-applied PSGs, but this difference was not statistically significant. Correct recommendations significantly improved scoring accuracy for both PSG types, while incorrect recommendations reduced accuracy. No significant bias was observed toward or against AI-generated recommendations compared to human-generated ones. These findings highlight the potential of AI to enhance PSG scoring reliability. However, ensuring the accuracy of AI outputs is critical to maximizing its benefits. Future research should explore the long-term impacts of DSS on scoring workflows and strategies for integrating AI in clinical practice.
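
Inter-scorer variability of the kind at issue here is commonly summarized with Cohen's kappa; a small self-contained sketch (not taken from the study) of computing it for two scorers' epoch-by-epoch stage labels:

```python
# Cohen's kappa for two scorers' sleep-stage labels (W, N1, N2, N3, REM).
# Illustrative only; the study's own accuracy metrics are not given here.
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum(pa[s] * pb[s] for s in pa) / (n * n)
    return (observed - expected) / (1 - expected)

scorer1 = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
scorer2 = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
print(f"kappa = {cohens_kappa(scorer1, scorer2):.2f}")  # 0.67 on this toy data
```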

AI-Powered Assistive Technologies for Visual Impairment arxiv.org/abs/2503.15494 .HC .AI .CY

Artificial Intelligence (AI) is revolutionizing assistive technologies, offering innovative solutions to enhance the quality of life for individuals with visual impairments. This review examines the development, applications, and impact of AI-powered tools in key domains such as computer vision, natural language processing (NLP), and wearable devices. Specific advancements include object recognition for identifying everyday items, scene description for understanding surroundings, and NLP-driven text-to-speech systems for accessing digital information. Assistive technologies such as smart glasses, smartphone applications, and AI-enabled navigation aids are discussed, demonstrating their ability to support independent travel, facilitate social interaction, and increase access to education and employment opportunities. The integration of deep learning models, multimodal interfaces, and real-time data processing has transformed the functionality and usability of these tools, fostering inclusivity and empowerment. This article also addresses critical challenges, including ethical considerations, affordability, and adaptability in diverse environments. Future directions highlight the need for interdisciplinary collaboration to refine these technologies, ensuring equitable access and sustainable innovation. By providing a comprehensive overview, this review underscores AI's transformative potential in promoting independence, enhancing accessibility, and fostering social inclusion for visually impaired individuals.

Development of a Web Application for Generating Skolemized RDF Data for Supply Chain Management arxiv.org/abs/2503.15495 .HC .AI

For early detection of supply bottlenecks, supply chains must be available in a suitable digital form so that they can be processed. However, the effort required for data modeling cannot reasonably be expected of people unfamiliar with IT. In this thesis, a web application was therefore developed that conceals the underlying complexity from the user: a graphical user interface on which templates can be instantiated and linked to one another. Suitable concepts for defining these templates were developed and extended in this work. Finally, a user study with several test participants was conducted to assess the usability of the web application, revealing a large number of useful suggestions for improvement.
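
The skolemization at the heart of the generated RDF can be sketched with rdflib: blank nodes created when a template is instantiated are rewritten to stable genid IRIs so they can be referenced across graphs. The supply-chain vocabulary below is invented for illustration, and the manual rewrite stands in for whatever the web application actually does.

```python
# Skolemization sketch: replace blank nodes with stable IRIs so that
# template-instantiated supply-chain data can be referenced across graphs.
# The vocabulary (ex:partName etc.) is invented for this example.
from uuid import uuid4
from rdflib import Graph, BNode, URIRef, Namespace, Literal

EX = Namespace("http://example.org/supplychain/")

g = Graph()
part = BNode()  # anonymous node a template instantiation might create
g.add((part, EX.partName, Literal("brake caliper")))
g.add((part, EX.suppliedBy, EX.supplierA))

def skolemize(graph: Graph) -> Graph:
    """Manually rewrite each blank node to a .well-known/genid IRI."""
    mapping: dict[BNode, URIRef] = {}
    out = Graph()
    def skolem(t):
        if isinstance(t, BNode):
            mapping.setdefault(
                t, URIRef(f"http://example.org/.well-known/genid/{uuid4()}"))
            return mapping[t]
        return t
    for s, p, o in graph:
        out.add((skolem(s), p, skolem(o)))
    return out

print(skolemize(g).serialize(format="turtle"))
```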

The Impact of Big Five Personality Traits on AI Agent Decision-Making in Public Spaces: A Social Simulation Study arxiv.org/abs/2503.15497 .HC .AI .CY

This study investigates how the Big Five personality traits influence decision-making processes in AI agents within public spaces. Using the AgentVerse framework and GPT-3.5-turbo, we simulated interactions among 10 AI agents, each embodying different dimensions of the Big Five personality traits, in a classroom environment responding to misinformation. The experiment assessed both public expressions ([Speak]) and private thoughts ([Think]) of agents, revealing significant correlations between personality traits and decision-making patterns. Results demonstrate that Openness to Experience had the strongest impact on information acceptance, with curious agents showing high acceptance rates and cautious agents displaying strong skepticism. Extraversion and Conscientiousness also showed notable influence on decision-making, while Neuroticism and Agreeableness exhibited more balanced responses. Additionally, we observed significant discrepancies between public expressions and private thoughts, particularly in agents with friendly and extroverted personalities, suggesting that social context influences decision-making behavior. Our findings contribute to understanding how personality traits shape AI agent behavior in social settings and have implications for developing more nuanced and context-aware AI systems.
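
A minimal sketch of how trait-conditioned agents of this kind can be set up, with a stub in place of the GPT-3.5-turbo call; the Agent class and system-prompt wording are our assumptions, echoing the paper's [Speak]/[Think] split rather than reproducing its prompts:

```python
# Trait-conditioned agent sketch. chat() is a hypothetical stand-in for an
# LLM API call; the system-prompt wording is illustrative, not the paper's.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    traits: dict[str, float]  # Big Five scores in [0, 1]

    def system_prompt(self) -> str:
        levels = ", ".join(f"{t}={v:.1f}" for t, v in self.traits.items())
        return (
            f"You are {self.name}, with Big Five traits: {levels}. "
            "When you receive a claim, reply with two sections: "
            "[Speak] what you say publicly, and [Think] your private assessment."
        )

curious = Agent("Casey", {"openness": 0.9, "conscientiousness": 0.5,
                          "extraversion": 0.6, "agreeableness": 0.5,
                          "neuroticism": 0.3})
claim = "The class schedule changed; today's lecture moved to Room 101."
# reply = chat(system=curious.system_prompt(), user=claim)  # hypothetical
print(curious.system_prompt())
```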

Synthetic Data Generation of Body Motion Data by Neural Gas Network for Emotion Recognition arxiv.org/abs/2503.14513 .IV .CV .AI

In the domain of emotion recognition using body motion, the primary challenge lies in the scarcity of diverse and generalizable datasets. Automatic emotion recognition uses machine learning and artificial intelligence techniques to recognize a person's emotional state from various data types, such as text, images, sound, and body motion. Body motion poses unique challenges, as many factors, such as age, gender, ethnicity, personality, and illness, affect its appearance, leading to a lack of diverse and robust datasets specifically for emotion recognition. To address this, Synthetic Data Generation (SDG) methods, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), offer potential solutions, though these methods are often complex. This research introduces a novel application of the Neural Gas Network (NGN) algorithm for synthesizing body motion data while optimizing diversity and generation speed. By learning the topology of the skeletal structure, the NGN fits its neurons, or gas particles, onto the body joints. The fitted gas particles, which later form the skeletal structure, are then used to synthesize new body postures; stringing these postures together over frames yields the final synthetic body motion. We compared our generated dataset against others generated by GANs, VAEs, and another benchmark algorithm, using metrics such as Fréchet Inception Distance (FID) and Diversity, as well as classification metrics such as accuracy, precision, and recall. Joint-related features, or kinematic parameters, were extracted, and the system assessed model performance against unseen data. Our findings demonstrate that the NGN algorithm produces more realistic and emotionally distinct body motion data, and does so faster than existing methods.
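
The fitting step is the classic neural gas update (Martinetz and Schulten): rank all units by distance to each sample and pull every unit toward it with exponentially decaying strength. A generic sketch on 2D stand-in joint data, not the paper's exact variant:

```python
# Generic neural gas update: rank all units by distance to a sample and move
# each toward it with weight eps * exp(-rank / lambda). Applied here to 2D
# stand-in "joint positions"; the paper's variant may differ in detail.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))   # stand-in for body-joint positions
units = rng.normal(size=(20, 2))      # gas particles to fit onto the joints

eps, lam = 0.3, 2.0                   # step size and neighborhood range
for x in samples:
    dists = np.linalg.norm(units - x, axis=1)
    ranks = np.argsort(np.argsort(dists))          # 0 = closest unit to x
    units += (eps * np.exp(-ranks / lam))[:, None] * (x - units)
    eps *= 0.995                                   # anneal step size...
    lam = max(0.5, lam * 0.995)                    # ...and neighborhood range

print("fitted unit positions (first 5):\n", units[:5])
```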

AI Work Quantization Model: Closed-System AI Computational Effort Metric arxiv.org/abs/2503.14515 .PF .CE

The rapid adoption of AI-driven automation in IoT environments, particularly in smart cities and industrial systems, necessitates a standardized approach to quantify AI's computational workload. Existing methodologies lack a consistent framework for measuring AI computational effort across diverse architectures, posing challenges in fair taxation models and energy-aware workload assessments. This study introduces the Closed-System AI Computational Effort Metric, a theoretical framework that quantifies real-time computational effort by incorporating input/output complexity, execution dynamics, and hardware-specific performance factors. The model ensures comparability between AI workloads across traditional CPUs and modern GPU/TPU accelerators, facilitating standardized performance evaluations. Additionally, we propose an energy-aware extension to assess AI's environmental impact, enabling sustainability-focused AI optimizations and equitable taxation models. Our findings establish a direct correlation between AI workload and human productivity, where 5 AI Workload Units equate to approximately 60 to 72 hours of human labor, exceeding a full-time workweek. By systematically linking AI computational effort to human labor, this framework enhances the understanding of AI's role in workforce automation, industrial efficiency, and sustainable computing. Future work will focus on refining the model through dynamic workload adaptation, complexity normalization, and energy-aware AI cost estimation, further broadening its applicability in diverse AI-driven ecosystems.
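
The stated workload-to-labor correspondence is easy to make concrete: 5 AWU mapping to 60-72 hours implies 12.0-14.4 hours per unit. A trivial sketch of that conversion, with the effort metric itself left out since its formula is not given in the abstract:

```python
# Conversion implied by the abstract: 5 AI Workload Units correspond to
# roughly 60-72 hours of human labor, i.e. 12.0-14.4 hours per AWU.
# Only this ratio is used; the underlying effort metric is the paper's.
HOURS_PER_AWU = (60 / 5, 72 / 5)  # (low, high) = (12.0, 14.4)

def awu_to_hours(awu: float) -> tuple[float, float]:
    lo, hi = HOURS_PER_AWU
    return awu * lo, awu * hi

lo, hi = awu_to_hours(5)
print(f"5 AWU ~ {lo:.0f}-{hi:.0f} human-labor hours")  # exceeds a 40h workweek
```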

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control arxiv.org/abs/2503.14517 .CV .AI

Speech-driven 3D talking face methods should offer both accurate lip synchronization and controllable expressions. Previous methods solely adopt discrete emotion labels to globally control expressions throughout sequences, precluding flexible fine-grained facial control within the spatiotemporal domain. We propose a diffusion-transformer-based 3D talking face generation model, Cafe-Talk, which simultaneously incorporates coarse- and fine-grained multimodal control conditions. However, the entanglement of multiple conditions makes it challenging to achieve satisfying performance. To disentangle speech audio from fine-grained conditions, we employ a two-stage training pipeline. Specifically, Cafe-Talk is initially trained using only speech audio and coarse-grained conditions. Then, a proposed fine-grained control adapter gradually adds fine-grained instructions represented by action units (AUs), preventing degradation of speech-lip synchronization. To disentangle coarse- and fine-grained conditions, we design a swap-label training mechanism, which enables the fine-grained conditions to dominate. We also devise a mask-based classifier-free guidance (CFG) technique to regulate the occurrence and intensity of fine-grained control. In addition, a text-based detector with text-AU alignment is introduced to enable natural language user input and further support multimodal control. Extensive experimental results show that Cafe-Talk achieves state-of-the-art lip synchronization and expressiveness, and its fine-grained control is well received in user studies. Project page: https://harryxd2018.github.io/cafe-talk/
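
Mask-based CFG can be read as the standard classifier-free guidance formula with the guidance term gated by a per-frame (or per-region) mask, so fine-grained control fires only where and as strongly as requested. A generic sketch of that reading, not Cafe-Talk's exact formulation:

```python
# Mask-based classifier-free guidance sketch: standard CFG, but the guidance
# term is gated by a mask so fine-grained (AU) control only applies where
# requested. Our generic reading, not Cafe-Talk's actual code.
import numpy as np

def masked_cfg(eps_uncond: np.ndarray, eps_cond: np.ndarray,
               mask: np.ndarray, scale: float = 3.0) -> np.ndarray:
    """mask in [0, 1] controls where and how strongly guidance applies."""
    return eps_uncond + scale * mask * (eps_cond - eps_uncond)

T, D = 8, 16                      # frames x motion features (illustrative)
rng = np.random.default_rng(1)
eps_u = rng.normal(size=(T, D))   # denoiser output without the AU condition
eps_c = rng.normal(size=(T, D))   # denoiser output with the AU condition
mask = np.zeros((T, 1))
mask[2:5] = 0.7                   # apply the AU at frames 2-4, 70% intensity
print(masked_cfg(eps_u, eps_c, mask).shape)
```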

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis arxiv.org/abs/2503.14526 .CV .GR .RO

Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at: https://yuffish.github.io/rebot/
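
The sim-to-real compositing step can be pictured as alpha-blending the simulated robot render onto the inpainted real background; a toy sketch with arrays standing in for frames, far simpler than ReBot's actual pipeline:

```python
# Sim-to-real compositing sketch: paste a simulated robot render onto an
# inpainted real background using the renderer's alpha mask. Arrays stand in
# for frames; ReBot's actual pipeline is more involved than this.
import numpy as np

def composite(sim_rgb: np.ndarray, sim_alpha: np.ndarray,
              background: np.ndarray) -> np.ndarray:
    """Per-pixel blend: robot pixels from sim, the rest from background."""
    a = sim_alpha[..., None]                 # (H, W, 1) in [0, 1]
    return a * sim_rgb + (1 - a) * background

H, W = 64, 64
background = np.ones((H, W, 3)) * 0.5        # inpainted real-world frame
sim_rgb = np.zeros((H, W, 3))
sim_rgb[..., 0] = 1.0                        # red "robot" render
sim_alpha = np.zeros((H, W))
sim_alpha[20:40, 20:40] = 1.0                # robot silhouette
frame = composite(sim_rgb, sim_alpha, background)
print(frame.shape, frame[30, 30], frame[0, 0])  # robot pixel vs background
```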

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders arxiv.org/abs/2503.14530 .CV .AI

Unlearning methods for vision-language models (VLMs) have primarily adapted techniques from large language models (LLMs), relying on weight updates that demand extensive annotated forget sets. Moreover, these methods perform unlearning at a coarse granularity, often leading to excessive forgetting and reduced model utility. To address this issue, we introduce SAUCE, a novel method that leverages sparse autoencoders (SAEs) for fine-grained and selective concept unlearning in VLMs. Briefly, SAUCE first trains SAEs to capture high-dimensional, semantically rich sparse features. It then identifies the features most relevant to the target concept for unlearning. During inference, it selectively modifies these features to suppress specific concepts while preserving unrelated information. We evaluate SAUCE on two distinct VLMs, LLaVA-v1.5-7B and LLaMA-3.2-11B-Vision-Instruct, across two types of tasks: concrete concept unlearning (objects and sports scenes) and abstract concept unlearning (emotions, colors, and materials), encompassing a total of 60 concepts. Extensive experiments demonstrate that SAUCE outperforms state-of-the-art methods by 18.04% in unlearning quality while maintaining comparable model utility. Furthermore, we investigate SAUCE's robustness against widely used adversarial attacks, its transferability across models, and its scalability in handling multiple simultaneous unlearning requests. Our findings establish SAUCE as an effective and scalable solution for selective concept unlearning in VLMs.
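
The intervention SAUCE describes reduces to: encode an activation with the SAE, zero the sparse features previously identified with the target concept, and decode the edited activation back. A sketch with random weights standing in for a trained SAE and feature indices assumed already known:

```python
# SAE-based concept suppression sketch: encode an activation, zero the sparse
# features identified as encoding the target concept, decode back. Random
# weights stand in for a trained SAE; the feature indices are assumed known.
import numpy as np

rng = np.random.default_rng(2)
d_model, d_sae = 64, 512
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc, b_dec = np.zeros(d_sae), np.zeros(d_model)

def suppress_concept(x: np.ndarray, concept_features: list[int]) -> np.ndarray:
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # sparse encode (ReLU)
    z[concept_features] = 0.0                         # ablate concept features
    return z @ W_dec + b_dec                          # decode edited activation

x = rng.normal(size=d_model)                 # a VLM residual-stream activation
x_edited = suppress_concept(x, concept_features=[3, 41, 200])
print(np.linalg.norm(x - x_edited))          # magnitude of the intervention
```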
