Show newer

Unleash The Power of Pre-Trained Language Models for Irregularly Sampled Time Series arxiv.org/abs/2408.08328 .AP .AI .LG

Unleash The Power of Pre-Trained Language Models for Irregularly Sampled Time Series

Pre-trained Language Models (PLMs), such as ChatGPT, have significantly advanced the field of natural language processing. This progress has inspired a series of innovative studies that explore the adaptation of PLMs to time series analysis, intending to create a unified foundation model that addresses various time series analytical tasks. However, these efforts predominantly focus on Regularly Sampled Time Series (RSTS), neglecting the unique challenges posed by Irregularly Sampled Time Series (ISTS), which are characterized by non-uniform sampling intervals and prevalent missing data. To bridge this gap, this work explores the potential of PLMs for ISTS analysis. We begin by investigating the effect of various methods for representing ISTS, aiming to maximize the efficacy of PLMs in this under-explored area. Furthermore, we present a unified PLM-based framework, ISTS-PLM, which integrates time-aware and variable-aware PLMs tailored for comprehensive intra and inter-time series modeling and includes a learnable input embedding layer and a task-specific output layer to tackle diverse ISTS analytical tasks. Extensive experiments on a comprehensive benchmark demonstrate that the ISTS-PLM, utilizing a simple yet effective series-based representation for ISTS, consistently achieves state-of-the-art performance across various analytical tasks, such as classification, interpolation, and extrapolation, as well as few-shot and zero-shot learning scenarios, spanning scientific domains like healthcare and biomechanics.

arxiv.org

CodeMirage: Hallucinations in Code Generated by Large Language Models arxiv.org/abs/2408.08333 .SE .AI .CL

CodeMirage: Hallucinations in Code Generated by Large Language Models

Large Language Models (LLMs) have shown promising potentials in program generation and no-code automation. However, LLMs are prone to generate hallucinations, i.e., they generate text which sounds plausible but is incorrect. Although there has been a recent surge in research on LLM hallucinations for text generation, similar hallucination phenomenon can happen in code generation. Sometimes the generated code can have syntactical or logical errors as well as more advanced issues like security vulnerabilities, memory leaks, etc. Given the wide adaptation of LLMs to enhance efficiency in code generation and development in general, it becomes imperative to investigate hallucinations in code generation. To the best of our knowledge, this is the first attempt at studying hallucinations in the code generated by LLMs. We start by introducing the code hallucination definition and a comprehensive taxonomy of code hallucination types. We propose the first benchmark CodeMirage dataset for code hallucinations. The benchmark contains 1,137 GPT-3.5 generated hallucinated code snippets for Python programming problems from two base datasets - HumanEval and MBPP. We then propose the methodology for code hallucination detection and experiment with open source LLMs such as CodeLLaMA as well as OpenAI's GPT-3.5 and GPT-4 models using one-shot prompt. We find that GPT-4 performs the best on HumanEval dataset and gives comparable results to the fine-tuned CodeBERT baseline on MBPP dataset. Towards the end, we discuss various mitigation strategies for code hallucinations and conclude our work.

arxiv.org

Plan with Code: Comparing approaches for robust NL to DSL generation arxiv.org/abs/2408.08335 .SE .AI .CL

Plan with Code: Comparing approaches for robust NL to DSL generation

Planning in code is considered a more reliable approach for many orchestration tasks. This is because code is more tractable than steps generated via Natural Language and make it easy to support more complex sequences by abstracting deterministic logic into functions. It also allows spotting issues with incorrect function names with the help of parsing checks that can be run on code. Progress in Code Generation methodologies, however, remains limited to general-purpose languages like C, C++, and Python. LLMs continue to face challenges with custom function names in Domain Specific Languages or DSLs, leading to higher hallucination rates and syntax errors. This is more common for custom function names, that are typically part of the plan. Moreover, keeping LLMs up-to-date with newer function names is an issue. This poses a challenge for scenarios like task planning over a large number of APIs, since the plan is represented as a DSL having custom API names. In this paper, we focus on workflow automation in RPA (Robotic Process Automation) domain as a special case of task planning. We present optimizations for using Retrieval Augmented Generation (or RAG) with LLMs for DSL generation along with an ablation study comparing these strategies with a fine-tuned model. Our results showed that the fine-tuned model scored the best on code similarity metric. However, with our optimizations, RAG approach is able to match the quality for in-domain API names in the test set. Additionally, it offers significant advantage for out-of-domain or unseen API names, outperforming Fine-Tuned model on similarity metric by 7 pts.

arxiv.org

Empathic Responding for Digital Interpersonal Emotion Regulation via Content Recommendation arxiv.org/abs/2408.07704 .IR .AI .HC

Empathic Responding for Digital Interpersonal Emotion Regulation via Content Recommendation

Interpersonal communication plays a key role in managing people's emotions, especially on digital platforms. Studies have shown that people use social media and consume online content to regulate their emotions and find support for rest and recovery. However, these platforms are not designed for emotion regulation, which limits their effectiveness in this regard. To address this issue, we propose an approach to enhance Interpersonal Emotion Regulation (IER) on online platforms through content recommendation. The objective is to empower users to regulate their emotions while actively or passively engaging in online platforms by crafting media content that aligns with IER strategies, particularly empathic responding. The proposed recommendation system is expected to blend system-initiated and user-initiated emotion regulation, paving the way for real-time IER practices on digital media platforms. To assess the efficacy of this approach, a mixed-method research design is used, including the analysis of text-based social media data and a user survey. Digital applications has served as facilitators in this process, given the widespread recognition of digital media applications for Digital Emotion Regulation (DER). The study collects 37.5K instances of user posts and interactions on Reddit over a year to design a Contextual Multi-Armed Bandits (CMAB) based recommendation system using features from user activity and preferences. The experimentation shows that the empathic recommendations generated by the proposed recommendation system are preferred by users over widely accepted ER strategies such as distraction and avoidance.

arxiv.org

Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models arxiv.org/abs/2408.07705 .IR .AI .CL

Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models

In today's globalized economy, comprehensive supply chain visibility is crucial for effective risk management. Achieving visibility remains a significant challenge due to limited information sharing among supply chain partners. This paper presents a novel framework leveraging Knowledge Graphs (KGs) and Large Language Models (LLMs) to enhance supply chain visibility without relying on direct stakeholder information sharing. Our zero-shot, LLM-driven approach automates the extraction of supply chain information from diverse public sources and constructs KGs to capture complex interdependencies between supply chain entities. We employ zero-shot prompting for Named Entity Recognition (NER) and Relation Extraction (RE) tasks, eliminating the need for extensive domain-specific training. We validate the framework with a case study on electric vehicle supply chains, focusing on tracking critical minerals for battery manufacturing. Results show significant improvements in supply chain mapping, extending visibility beyond tier-2 suppliers. The framework reveals critical dependencies and alternative sourcing options, enhancing risk management and strategic planning. With high accuracy in NER and RE tasks, it provides an effective tool for understanding complex, multi-tiered supply networks. This research offers a scalable, flexible method for constructing domain-specific supply chain KGs, addressing longstanding challenges in visibility and paving the way for advancements in digital supply chain surveillance.

arxiv.org

Graph neural network surrogate for strategic transport planning arxiv.org/abs/2408.07726 .LG .AI

Graph neural network surrogate for strategic transport planning

As the complexities of urban environments continue to grow, the modelling of transportation systems become increasingly challenging. This paper explores the application of advanced Graph Neural Network (GNN) architectures as surrogate models for strategic transport planning. Building upon a prior work that laid the foundation with graph convolution networks (GCN), our study delves into the comparative analysis of established GCN with the more expressive Graph Attention Network (GAT). Additionally, we propose a novel GAT variant (namely GATv3) to address over-smoothing issues in graph-based models. Our investigation also includes the exploration of a hybrid model combining both GCN and GAT architectures, aiming to investigate the performance of the mixture. The three models are applied to various experiments to understand their limits. We analyse hierarchical regression setups, combining classification and regression tasks, and introduce fine-grained classification with a proposal of a method to convert outputs to precise values. Results reveal the superior performance of the new GAT in classification tasks. To the best of the authors' knowledge, this is the first GAT model in literature to achieve larger depths. Surprisingly, the fine-grained classification task demonstrates the GCN's unexpected dominance with additional training data. This shows that synthetic data generators can increase the training data, without overfitting issues whilst improving model performance. In conclusion, this research advances GNN based surrogate modelling, providing insights for refining GNN architectures. The findings open avenues for investigating the potential of the newly proposed GAT architecture and the modelling setups for other transportation problems.

arxiv.org

Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies arxiv.org/abs/2408.07728 .CR

Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies

We present Moderator, a policy-based model management system that allows administrators to specify fine-grained content moderation policies and modify the weights of a text-to-image (TTI) model to make it significantly more challenging for users to produce images that violate the policies. In contrast to existing general-purpose model editing techniques, which unlearn concepts without considering the associated contexts, Moderator allows admins to specify what content should be moderated, under which context, how it should be moderated, and why moderation is necessary. Given a set of policies, Moderator first prompts the original model to generate images that need to be moderated, then uses these self-generated images to reverse fine-tune the model to compute task vectors for moderation and finally negates the original model with the task vectors to decrease its performance in generating moderated content. We evaluated Moderator with 14 participants to play the role of admins and found they could quickly learn and author policies to pass unit tests in approximately 2.29 policy iterations. Our experiment with 32 stable diffusion users suggested that Moderator can prevent 65% of users from generating moderated content under 15 attempts and require the remaining users an average of 8.3 times more attempts to generate undesired content.

arxiv.org

DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning arxiv.org/abs/2408.07080 .LG .AI .CV

DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning

Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning framework must handle training and test data that exhibit a modality mismatch, more precisely, training and test data do not cover the same set of data modalities. Traditional approaches for CMKD are based on a teacher/student paradigm where a teacher is trained on multi-modal data with the aim to successively distill knowledge from a multi-modal teacher to a single-modal student. Despite the widespread adoption of such paradigm, recent research has highlighted its inherent limitations in the context of cross-modal knowledge transfer.Taking a step beyond the teacher/student paradigm, here we introduce a new framework for cross-modal knowledge distillation, named DisCoM-KD (Disentanglement-learning based Cross-Modal Knowledge Distillation), that explicitly models different types of per-modality information with the aim to transfer knowledge from multi-modal data to a single-modal classifier. To this end, DisCoM-KD effectively combines disentanglement representation learning with adversarial domain adaptation to simultaneously extract, foreach modality, domain-invariant, domain-informative and domain-irrelevant features according to a specific downstream task. Unlike the traditional teacher/student paradigm, our framework simultaneously learns all single-modal classifiers, eliminating the need to learn each student model separately as well as the teacher classifier. We evaluated DisCoM-KD on three standard multi-modal benchmarks and compared its behaviourwith recent SOTA knowledge distillation frameworks. The findings clearly demonstrate the effectiveness of DisCoM-KD over competitors considering mismatch scenarios involving both overlapping and non-overlapping modalities. These results offer insights to reconsider the traditional paradigm for distilling information from multi-modal data to single-modal neural networks.

arxiv.org

MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into $LaTeX$ Formulas for Improved Readability arxiv.org/abs/2408.07081 .LG .AI .CL

MathBridge: A Large-Scale Dataset for Translating Mathematical Expressions into Formula Images

Understanding sentences that contain mathematical expressions in text form poses significant challenges. To address this, the importance of converting these expressions into formula images has been highlighted. For instance, the expression ``x equals minus b plus or minus the square root of b squared minus four a c, all over two a'' is more readily comprehensible when displayed as an image $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. To develop a text-to-image conversion system, we can break down the process into text-to-LaTeX and LaTeX-to-image conversions, with the latter being managed with by existing various LaTeX engines. However, the former approach has been notably hindered by the severe scarcity of text-to-LaTeX paired data, presenting a significant challenge in the field.In this context, we introduce MathBridge, the first extensive dataset for translating mathematical spoken English into LaTeX, which aims to establish a robust baseline for future research in text-to-LaTeX translation. MathBridge comprises approximately 23 million LaTeX formulas paired with corresponding spoken English expressions. Through comprehensive evaluations, including fine-tuning and testing with data, we discovered that MathBridge significantly enhances pre-trained language models' capabilities for text-to-LaTeX translation. Specifically, for the T5-large model, the sacreBLEU score increased from 4.77 to 46.8, demonstrating substantial enhancement. Our findings indicate the necessity for a new metric specifically for text-to-LaTeX conversion evaluation.

arxiv.org

Evaluating Source Code Quality with Large Languagem Models: a comparative study arxiv.org/abs/2408.07082 .SE

Evaluating Source Code Quality with Large Languagem Models: a comparative study

Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of attributes to assess code quality. However, some quality attributes can only be measured by humans in code review activities, readability being an example. Given their natural language text processing capability, we hypothesize that a Large Language Model (LLM) could evaluate the quality of code, including attributes currently not automatable. This paper aims to describe and analyze the results obtained using LLMs as a static analysis tool, evaluating the overall quality of code. We compared the LLM with the results obtained with the SonarQube software and its Maintainability metric for two Open Source Software (OSS) Java projects, one with Maintainability Rating A and the other B. A total of 1,641 classes were analyzed, comparing the results in two versions of models: GPT 3.5 Turbo and GPT 4o. We demonstrated that the GPT 3.5 Turbo LLM has the ability to evaluate code quality, showing a correlation with Sonar's metrics. However, there are specific aspects that differ in what the LLM measures compared to SonarQube. The GPT 4o version did not present the same results, diverging from the previous model and Sonar by assigning a high classification to codes that were assessed as lower quality. This study demonstrates the potential of LLMs in evaluating code quality. However, further research is necessary to investigate limitations such as LLM's cost, variability of outputs and explore quality characteristics not measured by traditional static analysis tools.

arxiv.org

Masked EEG Modeling for Driving Intention Prediction arxiv.org/abs/2408.07083 .LG .AI

Masked EEG Modeling for Driving Intention Prediction

Driving under drowsy conditions significantly escalates the risk of vehicular accidents. Although recent efforts have focused on using electroencephalography to detect drowsiness, helping prevent accidents caused by driving in such states, seamless human-machine interaction in driving scenarios requires a more versatile EEG-based system. This system should be capable of understanding a driver's intention while demonstrating resilience to artifacts induced by sudden movements. This paper pioneers a novel research direction in BCI-assisted driving, studying the neural patterns related to driving intentions and presenting a novel method for driving intention prediction. In particular, our preliminary analysis of the EEG signal using independent component analysis suggests a close relation between the intention of driving maneuvers and the neural activities in central-frontal and parietal areas. Power spectral density analysis at a group level also reveals a notable distinction among various driving intentions in the frequency domain. To exploit these brain dynamics, we propose a novel Masked EEG Modeling framework for predicting human driving intentions, including the intention for left turning, right turning, and straight proceeding. Extensive experiments, encompassing comprehensive quantitative and qualitative assessments on public dataset, demonstrate the proposed method is proficient in predicting driving intentions across various vigilance states. Specifically, our model attains an accuracy of 85.19% when predicting driving intentions for drowsy subjects, which shows its promising potential for mitigating traffic accidents related to drowsy driving. Notably, our method maintains over 75% accuracy when more than half of the channels are missing or corrupted, underscoring its adaptability in real-life driving.

arxiv.org
Show older
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.