arXiv Computer Science @arxiv_cs@qoto.org

1.13K Followers

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.13K Followers

Posts Posts and replies Media

arXiv Computer Science @arxiv_cs@qoto.org

Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions https://arxiv.org/abs/2409.02111 #cs.LG

Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions

Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the search for energy-efficient alternatives. Inspired by the human brain, spiking neural networks (SNNs) promise energy-efficient computation with event-driven spikes. To provide future directions toward building energy-efficient large SNN models, we present a survey of existing methods for developing deep spiking neural networks, with a focus on emerging Spiking Transformers. Our main contributions are as follows: (1) an overview of learning methods for deep spiking neural networks, categorized by ANN-to-SNN conversion and direct training with surrogate gradients; (2) an overview of network architectures for deep spiking neural networks, categorized by deep convolutional neural networks (DCNNs) and Transformer architecture; and (3) a comprehensive comparison of state-of-the-art deep SNNs with a focus on emerging Spiking Transformers. We then further discuss and outline future directions toward large-scale SNNs.

arXiv Computer Science @arxiv_cs@qoto.org

Tiny-Toxic-Detector: A compact transformer-based model for toxic content detection https://arxiv.org/abs/2409.02114 #cs.LG #cs.AI #cs.CL

Tiny-Toxic-Detector: A compact transformer-based model for toxic content detection

This paper presents Tiny-toxic-detector, a compact transformer-based model designed for toxic content detection. Despite having only 2.1 million parameters, Tiny-toxic-detector achieves competitive performance on benchmark datasets, with 90.97% accuracy on ToxiGen and 86.98% accuracy on the Jigsaw dataset, rivaling models over 50 times its size. This efficiency enables deployment in resource-constrained environments, addressing the need for effective content moderation tools that balance performance with computational efficiency. The model architecture features 4 transformer encoder layers, each with 2 attention heads, an embedding dimension of 64, and a feedforward dimension of 128. Trained on both public and private datasets, Tiny-toxic-detector demonstrates the potential of efficient, task-specific models for addressing online toxicity. The paper covers the model architecture, training process, performance benchmarks, and limitations, underscoring its suitability for applications such as social media monitoring and content moderation. By achieving results comparable to much larger models while significantly reducing computational demands, Tiny-toxic-detector represents progress toward more sustainable and scalable AI-driven content moderation solutions.

arXiv Computer Science @arxiv_cs@qoto.org

Deep Neural Implicit Representation of Accessibility for Multi-Axis Manufacturing https://arxiv.org/abs/2409.02115 #cs.GR #cs.CE #cs.LG #cs.RO

Deep Neural Implicit Representation of Accessibility for Multi-Axis Manufacturing

One of the main concerns in design and process planning for multi-axis additive and subtractive manufacturing is collision avoidance between moving objects (e.g., tool assemblies) and stationary objects (e.g., a part unified with fixtures). The collision measure for various pairs of relative rigid translations and rotations between the two pointsets can be conceptualized by a compactly supported scalar field over the 6D non-Euclidean configuration space. Explicit representation and computation of this field is costly in both time and space. If we fix $O(m)$ sparsely sampled rotations (e.g., tool orientations), computation of the collision measure field as a convolution of indicator functions of the 3D pointsets over a uniform grid (i.e., voxelized geometry) of resolution $O(n^3)$ via fast Fourier transforms (FFTs) scales as in $O(mn^3 \log n)$ in time and $O(mn^3)$ in space. In this paper, we develop an implicit representation of the collision measure field via deep neural networks (DNNs). We show that our approach is able to accurately interpolate the collision measure from a sparse sampling of rotations, and can represent the collision measure field with a small memory footprint. Moreover, we show that this representation can be efficiently updated through fine-tuning to more efficiently train the network on multi-resolution data, as well as accommodate incremental changes to the geometry (such as might occur in iterative processes such as topology optimization of the part subject to CNC tool accessibility constraints).

arXiv Computer Science @arxiv_cs@qoto.org

UFO, universe, reptilians and creatures communities on Brazilian Telegram: when the sky is not the limit and conspiracy theories seek answers beyond humanity https://arxiv.org/abs/2409.02117 #cs.CY

UFO, universe, reptilians and creatures communities on Brazilian Telegram: when the sky is not the limit and conspiracy theories seek answers beyond humanity

Interest in extraterrestrial phenomena and conspiracy theories involving UFOs and reptilians has been growing on Brazilian Telegram, especially in times of global uncertainty, such as during the COVID-19 pandemic. Therefore, this study aims to address the research question: how are Brazilian conspiracy theory communities on UFO, universe, reptilians and creatures topics characterized and articulated on Telegram? It is worth noting that this study is part of a series of seven studies whose main objective is to understand and characterize Brazilian conspiracy theory communities on Telegram. This series of seven studies is openly and originally available on arXiv at Cornell University, applying a mirrored method across the seven studies, changing only the thematic object of analysis and providing investigation replicability, including with proprietary and authored codes, adding to the culture of free and open-source software. Regarding the main findings of this study, the following were observed: UFO communities act as gateways for theories about reptilians, connecting narratives of global control with extraterrestrial beings; Discussions about UFOs and the universe grew significantly during the Pandemic, reflecting a renewed interest in extraterrestrial phenomena; Reptilians remain a significant subculture within conspiracy theories, with a notable growth during the Pandemic; The thematic overlap between UFOs, reptilians and esotericism reveals a cohesive ecosystem of disinformation, making factual correction a challenge; UFO communities function as amplifiers of other conspiracy theories, connecting different themes and strengthening the disinformation network.

arXiv Computer Science @arxiv_cs@qoto.org

TSO: Self-Training with Scaled Preference Optimization https://arxiv.org/abs/2409.02118 #cs.LG #cs.AI #cs.CL

TSO: Self-Training with Scaled Preference Optimization

Enhancing the conformity of large language models (LLMs) to human preferences remains an ongoing research challenge. Recently, offline approaches such as Direct Preference Optimization (DPO) have gained prominence as attractive options due to offering effective improvement in simple, efficient, and stable without interactions with reward models. However, these offline preference optimization methods highly rely on the quality of pairwise preference samples. Meanwhile, numerous iterative methods require additional training of reward models to select positive and negative samples from the model's own generated responses for preference learning. Furthermore, as LLMs' capabilities advance, it is quite challenging to continuously construct high-quality positive and negative preference instances from the model's outputs due to the lack of diversity. To tackle these challenges, we propose TSO, or Self-Training with Scaled Preference Optimization, a framework for preference optimization that conducts self-training preference learning without training an additional reward model. TSO enhances the diversity of responses by constructing a model matrix and incorporating human preference responses. Furthermore, TSO introduces corrections for model preference errors through human and AI feedback. Finally, TSO adopts iterative and dual clip reward strategies to update the reference model and its responses, adaptively adjusting preference data and balancing the optimization process. Experimental results demonstrate that TSO outperforms existing mainstream methods on various alignment evaluation benchmarks, providing practical insight into preference data construction and model training strategies in the alignment domain.

arXiv Computer Science @arxiv_cs@qoto.org

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models https://arxiv.org/abs/2409.02119 #cs.LG #cs.AI #cs.CL

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

In fine-tuning large language models (LLMs), conserving computational resources while maintaining effectiveness and improving outcomes within the same computational constraints is crucial. The Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models by reducing the number of trainable parameters and computational costs. However, current advancements in LoRA might be focused on its fine-tuning methodologies, with not as much exploration as might be expected into further compression of LoRA. Since most of LoRA's parameters might still be superfluous, this may lead to unnecessary wastage of computational resources. In this paper, we propose \textbf{CoRA}: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models. Our two-fold method includes (1) Freezing the substitute matrix $B$ to halve parameters while training matrix $A$ for specific tasks and (2) Using the substitute matrix $B$ as an enhanced initial state for the original matrix $B$, achieving improved results with the same parameters. Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters. At the same time, the second approach has some improvements compared to LoRA's original fine-tuning performance. They generally attest to the effectiveness of our work.

arXiv Computer Science @arxiv_cs@qoto.org

Deep Knowledge-Infusion For Explainable Depression Detection https://arxiv.org/abs/2409.02122 #cs.LG #cs.AI #cs.CL

Deep Knowledge-Infusion For Explainable Depression Detection

Discovering individuals depression on social media has become increasingly important. Researchers employed ML/DL or lexicon-based methods for automated depression detection. Lexicon based methods, explainable and easy to implement, match words from user posts in a depression dictionary without considering contexts. While the DL models can leverage contextual information, their black-box nature limits their adoption in the domain. Though surrogate models like LIME and SHAP can produce explanations for DL models, the explanations are suitable for the developer and of limited use to the end user. We propose a Knolwedge-infused Neural Network (KiNN) incorporating domain-specific knowledge from DepressionFeature ontology (DFO) in a neural network to endow the model with user-level explainability regarding concepts and processes the clinician understands. Further, commonsense knowledge from the Commonsense Transformer (COMET) trained on ATOMIC is also infused to consider the generic emotional aspects of user posts in depression detection. The model is evaluated on three expertly curated datasets related to depression. We observed the model to have a statistically significant (p<0.1) boost in performance over the best domain-specific model, MentalBERT, across CLEF e-Risk (25% MCC increase, 12% F1 increase). A similar trend is observed across the PRIMATE dataset, where the proposed model performed better than MentalBERT (2.5% MCC increase, 19% F1 increase). The observations confirm the generated explanations to be informative for MHPs compared to post hoc model explanations. Results demonstrated that the user-level explainability of KiNN also surpasses the performance of baseline models and can provide explanations where other baselines fall short. Infusing the domain and commonsense knowledge in KiNN enhances the ability of models like GPT-3.5 to generate application-relevant explanations.

arXiv Computer Science @arxiv_cs@qoto.org

PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks https://arxiv.org/abs/2409.02123 #physics.ao-ph #cs.LG #cs.AI

PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechanisms within the convolutional layers enhances the model's capacity to capture fine-grained spatial details, thereby improving its predictive accuracy for meteorological phenomena. We introduce PuYun, comprising PuYun-Short for 0-5 day forecasts and PuYun-Medium for 5-10 day predictions. This approach enhances the accuracy of 10-day weather forecasting. Through evaluation, we demonstrate that PuYun-Short alone surpasses the performance of both GraphCast and FuXi-Short in generating accurate 10-day forecasts. Specifically, on the 10th day, PuYun-Short reduces the RMSE for Z500 to 720 $m^2/s^2$, compared to 732 $m^2/s^2$ for GraphCast and 740 $m^2/s^2$ for FuXi-Short. Additionally, the RMSE for T2M is reduced to 2.60 K, compared to 2.63 K for GraphCast and 2.65 K for FuXi-Short. Furthermore, when employing a cascaded approach by integrating PuYun-Short and PuYun-Medium, our method achieves superior results compared to the combined performance of FuXi-Short and FuXi-Medium. On the 10th day, the RMSE for Z500 is further reduced to 638 $m^2/s^2$, compared to 641 $m^2/s^2$ for FuXi. These findings underscore the effectiveness of our model ensemble in advancing medium-range weather prediction. Our training code and model will be open-sourced.

arXiv Computer Science @arxiv_cs@qoto.org

TrajWeaver: Trajectory Recovery with State Propagation Diffusion Model https://arxiv.org/abs/2409.02124 #cs.LG #cs.AI

TrajWeaver: Trajectory Recovery with State Propagation Diffusion Model

With the proliferation of location-aware devices, large amount of trajectories have been generated when agents such as people, vehicles and goods flow around the urban environment. These raw trajectories, typically collected from various sources such as GPS in cars, personal mobile devices, and public transport, are often sparse and fragmented due to limited sampling rates, infrastructure coverage and data loss. In this context, trajectory recovery aims to reconstruct such sparse raw trajectories into their dense and continuous counterparts, so that fine-grained movement of agents across space and time can be captured faithfully. Existing trajectory recovery approaches typically rely on the prior knowledge of travel mode or motion patterns, and often fail in densely populated urban areas where accurate maps are absent. In this paper, we present a new recovery framework called TrajWeaver based on probabilistic diffusion models, which is able to recover dense and refined trajectories from the sparse raw ones, conditioned on various auxiliary features such as Areas of Interest along the way, user identity and waybill information. The core of TrajWeaver is a novel State Propagation Diffusion Model (SPDM), which introduces a new state propagation mechanism on top of the standard diffusion models, so that knowledge computed in earlier diffusion steps can be reused later, improving the recovery performance while reducing the number of steps needed. Extensive experiments show that the proposed TrajWeaver can recover from raw trajectories of various lengths, sparsity levels and heterogeneous travel modes, and outperform the state-of-the-art baselines significantly in recovery accuracy. Our code is available at: https://anonymous.4open.science/r/TrajWeaver/

arXiv Computer Science @arxiv_cs@qoto.org

Detecting Homeomorphic 3-manifolds via Graph Neural Networks https://arxiv.org/abs/2409.02126 #hep-th #cs.LG

Detecting Homeomorphic 3-manifolds via Graph Neural Networks

Motivated by the enumeration of the BPS spectra of certain 3d $\mathcal{N}=2$ supersymmetric quantum field theories, obtained from the compactification of 6d superconformal field theories on three-manifolds, we study the homeomorphism problem for a class of graph-manifolds using Graph Neural Network techniques. Utilizing the JSJ decomposition, a unique representation via a plumbing graph is extracted from a graph-manifold. Homeomorphic graph-manifolds are related via a sequence of von Neumann moves on this graph; the algorithmic application of these moves can determine if two graphs correspond to homeomorphic graph-manifolds in super-polynomial time. However, by employing Graph Neural Networks (GNNs), the same problem can be addressed, at the cost of accuracy, in polynomial time. We build a dataset composed of pairs of plumbing graphs, together with a hidden label encoding whether the pair is homeomorphic. We train and benchmark a variety of network architectures within a supervised learning setting by testing different combinations of two convolutional layers (GEN, GCN, GAT, NNConv), followed by an aggregation layer and a classification layer. We discuss the strengths and weaknesses of the different GNNs for this homeomorphism problem.

arXiv Computer Science @arxiv_cs@qoto.org

Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy https://arxiv.org/abs/2409.00001 #cs.CV #cs.AI #cs.LG

Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy

Early detection of Cerebral Palsy (CP) is crucial for effective intervention and monitoring. This paper tests the reliability and applicability of Explainable AI (XAI) methods using a deep learning method that predicts CP by analyzing skeletal data extracted from video recordings of infant movements. Specifically, we use XAI evaluation metrics -- namely faithfulness and stability -- to quantitatively assess the reliability of Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) in this specific medical application. We utilize a unique dataset of infant movements and apply skeleton data perturbations without distorting the original dynamics of the infant movements. Our CP prediction model utilizes an ensemble approach, so we evaluate the XAI metrics performances for both the overall ensemble and the individual models. Our findings indicate that both XAI methods effectively identify key body points influencing CP predictions and that the explanations are robust against minor data perturbations. Grad-CAM significantly outperforms CAM in the RISv metric, which measures stability in terms of velocity. In contrast, CAM performs better in the RISb metric, which relates to bone stability, and the RRS metric, which assesses internal representation robustness. Individual models within the ensemble show varied results, and neither CAM nor Grad-CAM consistently outperform the other, with the ensemble approach providing a representation of outcomes from its constituent models.

arXiv Computer Science @arxiv_cs@qoto.org

Spatio-Temporal Communication Compression for Distributed Prime-Dual Optimization https://arxiv.org/abs/2409.00002 #eess.SY #cs.SY

Spatio-Temporal Communication Compression for Distributed Prime-Dual Optimization

In this paper, for the problem of distributed computing, we propose a general spatio-temporal compressor and discuss its compression methods. This compressor comprehensively considers both temporal and spatial information, encompassing many existing specific compressors. We use the average consensus algorithm as a starting point and further studies distributed optimization algorithms, the Prime-Dual algorithm as an example, in both continuous and discrete time forms. We find that under stronger additional assumptions, the spatio-temporal compressor can be directly applied to distributed computing algorithms, while its default form can also be successfully applied through observer-based differential compression methods, ensuring the linear convergence of the algorithm when the objective function is strongly convex. On this basis, we also discuss the acceleration of the algorithm, filter-based compression methods in the literature, and the addition of randomness to the spatio-temporal compressor. Finally, numerical simulations illustrate the generality of the spatio-temporal compressor, compare different compression methods, and verify the algorithm's performance in the convex objective function scenario.

arXiv Computer Science @arxiv_cs@qoto.org

Enhancing Trustworthiness and Minimising Bias Issues in Leveraging Social Media Data for Disaster Management Response https://arxiv.org/abs/2409.00004 #cs.CY #cs.SI

Enhancing Trustworthiness and Minimising Bias Issues in Leveraging Social Media Data for Disaster Management Response

Disaster events often unfold rapidly, necessitating a swift and effective response. Developing action plans, resource allocation, and resolution of help requests in disaster scenarios is time-consuming and complex since disaster-relevant information is often uncertain. Leveraging real-time data can significantly deal with data uncertainty and enhance disaster response efforts. To deal with real-time data uncertainty, social media appeared as an alternative effective source of real-time data as there has been extensive use of social media during and after the disasters. However, it also brings forth challenges regarding trustworthiness and bias in these data. To fully leverage social media data for disaster management, it becomes crucial to mitigate biases that may arise due to specific disaster types or regional contexts. Additionally, the presence of misinformation within social media data raises concerns about the reliability of data sources, potentially impeding actionable insights and leading to improper resource utilization. To overcome these challenges, our research aimed to investigate how to ensure trustworthiness and address biases in social media data. We aim to investigate and identify the factors that can be used to enhance trustworthiness and minimize bias to make an efficient and scalable disaster management system utilizing real-time social media posts, identify disaster-related keywords, and assess the severity of the disaster. By doing so, the integration of real-time social data can improve the speed and accuracy of disaster management systems

arXiv Computer Science @arxiv_cs@qoto.org

Csi-LLM: A Novel Downlink Channel Prediction Method Aligned with LLM Pre-Training https://arxiv.org/abs/2409.00005 #math.IT #cs.IT #cs.AI

Csi-LLM: A Novel Downlink Channel Prediction Method Aligned with LLM Pre-Training

Downlink channel temporal prediction is a critical technology in massive multiple-input multiple-output (MIMO) systems. However, existing methods that rely on fixed-step historical sequences significantly limit the accuracy, practicality, and scalability of channel prediction. Recent advances have shown that large language models (LLMs) exhibit strong pattern recognition and reasoning abilities over complex sequences. The challenge lies in effectively aligning wireless communication data with the modalities used in natural language processing to fully harness these capabilities. In this work, we introduce Csi-LLM, a novel LLM-powered downlink channel prediction technique that models variable-step historical sequences. To ensure effective cross-modality application, we align the design and training of Csi-LLM with the processing of natural language tasks, leveraging the LLM's next-token generation capability for predicting the next step in channel state information (CSI). Simulation results demonstrate the effectiveness of this alignment strategy, with Csi-LLM consistently delivering stable performance improvements across various scenarios and showing significant potential in continuous multi-step prediction.

arXiv Computer Science @arxiv_cs@qoto.org

Applying Deep Neural Networks to automate visual verification of manual bracket installations in aerospace https://arxiv.org/abs/2409.00006 #cs.CV

Applying Deep Neural Networks to automate visual verification of manual bracket installations in aerospace

In this work, we explore a deep learning based automated visual inspection and verification algorithm, based on the Siamese Neural Network architecture. Consideration is also given to how the input pairs of images can affect the performance of the Siamese Neural Network. The Siamese Neural Network was explored alongside Convolutional Neural Networks. In addition to investigating these model architectures, additional methods are explored including transfer learning and ensemble methods, with the aim of improving model performance. We develop a novel voting scheme specific to the Siamese Neural Network which sees a single model vote on multiple reference images. This differs from the typical ensemble approach of multiple models voting on the same data sample. The results obtained show great potential for the use of the Siamese Neural Network for automated visual inspection and verification tasks when there is a scarcity of training data available. The additional methods applied, including the novel similarity voting, are also seen to significantly improve the performance of the model. We apply the publicly available omniglot dataset to validate our approach. According to our knowledge, this is the first time a detailed study of this sort has been carried out in the automatic verification of installed brackets in the aerospace sector via Deep Neural Networks.

arXiv Computer Science @arxiv_cs@qoto.org

Web Retrieval Agents for Evidence-Based Misinformation Detection https://arxiv.org/abs/2409.00009 #cs.IR #cs.AI

Web Retrieval Agents for Evidence-Based Misinformation Detection

This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the macro F1 of misinformation detection by as much as 20 percent compared to LLMs without search. We also conduct extensive analyses on the sources our system leverages and their biases, decisions in the construction of the system like the search tool and the knowledge base, the type of evidence needed and its impact on the results, and other parts of the overall process. By combining strong performance with in-depth understanding, we hope to provide building blocks for future search-enabled misinformation mitigation systems.

arXiv Computer Science @arxiv_cs@qoto.org

Evolving Text Data Stream Mining https://arxiv.org/abs/2409.00010 #cs.IR #cs.CL

Evolving Text Data Stream Mining

A text stream is an ordered sequence of text documents generated over time. A massive amount of such text data is generated by online social platforms every day. Designing an algorithm for such text streams to extract useful information is a challenging task due to unique properties of the stream such as infinite length, data sparsity, and evolution. Thereby, learning useful information from such streaming data under the constraint of limited time and memory has gained increasing attention. During the past decade, although many text stream mining algorithms have proposed, there still exists some potential issues. First, high-dimensional text data heavily degrades the learning performance until the model either works on subspace or reduces the global feature space. The second issue is to extract semantic text representation of documents and capture evolving topics over time. Moreover, the problem of label scarcity exists, whereas existing approaches work on the full availability of labeled data. To deal with these issues, in this thesis, new learning models are proposed for clustering and multi-label learning on text streams.

arXiv Computer Science @arxiv_cs@qoto.org

AVIN-Chat: An Audio-Visual Interactive Chatbot System with Emotional State Tuning https://arxiv.org/abs/2409.00012 #cs.HC #cs.AI

AVIN-Chat: An Audio-Visual Interactive Chatbot System with Emotional State Tuning

This work presents an audio-visual interactive chatbot (AVIN-Chat) system that allows users to have face-to-face conversations with 3D avatars in real-time. Compared to the previous chatbot services, which provide text-only or speech-only communications, the proposed AVIN-Chat can offer audio-visual communications providing users with a superior experience quality. In addition, the proposed AVIN-Chat emotionally speaks and expresses according to the user's emotional state. Thus, it enables users to establish a strong bond with the chatbot system, increasing the user's immersion. Through user subjective tests, it is demonstrated that the proposed system provides users with a higher sense of immersion than previous chatbot systems. The demonstration video is available at https://www.youtube.com/watch?v=Z74uIV9k7_k.

arXiv Computer Science @arxiv_cs@qoto.org

DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction https://arxiv.org/abs/2409.00014 #cs.CV #cs.AI

DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction

Diverse human motion prediction (HMP) aims to predict multiple plausible future motions given an observed human motion sequence. It is a challenging task due to the diversity of potential human motions while ensuring an accurate description of future human motions. Current solutions are either low-diversity or limited in expressiveness. Recent denoising diffusion models (DDPM) hold potential generative capabilities in generative tasks. However, introducing DDPM directly into diverse HMP incurs some issues. Although DDPM can increase the diversity of the potential patterns of human motions, the predicted human motions become implausible over time because of the significant noise disturbances in the forward process of DDPM. This phenomenon leads to the predicted human motions being hard to control, seriously impacting the quality of predicted motions and restricting their practical applicability in real-world scenarios. To alleviate this, we propose a novel conditional diffusion-based generative model, called DivDiff, to predict more diverse and realistic human motions. Specifically, the DivDiff employs DDPM as our backbone and incorporates Discrete Cosine Transform (DCT) and transformer mechanisms to encode the observed human motion sequence as a condition to instruct the reverse process of DDPM. More importantly, we design a diversified reinforcement sampling function (DRSF) to enforce human skeletal constraints on the predicted human motions. DRSF utilizes the acquired information from human skeletal as prior knowledge, thereby reducing significant disturbances introduced during the forward process. Extensive results received in the experiments on two widely-used datasets (Human3.6M and HumanEva-I) demonstrate that our model obtains competitive performance on both diversity and accuracy.

arXiv Computer Science @arxiv_cs@qoto.org

Navigating the sociotechnical labyrinth: Dynamic certification for responsible embodied AI https://arxiv.org/abs/2409.00015 #eess.SY #cs.CY #cs.AI #cs.SY

Navigating the sociotechnical labyrinth: Dynamic certification for responsible embodied AI

Sociotechnical requirements shape the governance of artificially intelligent (AI) systems. In an era where embodied AI technologies are rapidly reshaping various facets of contemporary society, their inherent dynamic adaptability presents a unique blend of opportunities and challenges. Traditional regulatory mechanisms, often designed for static -- or slower-paced -- technologies, find themselves at a crossroads when faced with the fluid and evolving nature of AI systems. Moreover, typical problems in AI, for example, the frequent opacity and unpredictability of the behaviour of the systems, add additional sociotechnical challenges. To address these interconnected issues, we introduce the concept of dynamic certification, an adaptive regulatory framework specifically crafted to keep pace with the continuous evolution of AI systems. The complexity of these challenges requires common progress in multiple domains: technical, socio-governmental, and regulatory. Our proposed transdisciplinary approach is designed to ensure the safe, ethical, and practical deployment of AI systems, aligning them bidirectionally with the real-world contexts in which they operate. By doing so, we aim to bridge the gap between rapid technological advancement and effective regulatory oversight, ensuring that AI systems not only achieve their intended goals but also adhere to ethical standards and societal values.

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018