Playing Language Game with LLMs Leads to Jailbreaking arxiv.org/abs/2411.12762 cs.CL cs.AI

The advent of large language models (LLMs) has spurred the development of numerous jailbreak techniques aimed at circumventing their security defenses against malicious attacks. An effective jailbreak approach is to identify a domain where safety generalization fails, a phenomenon known as mismatched generalization. In this paper, we introduce two novel jailbreak methods based on mismatched generalization: natural language games and custom language games. Both effectively bypass the safety mechanisms of LLMs and come in many kinds and variants, which makes them hard to defend against and leads to high attack success rates. Natural language games involve the use of synthetic linguistic constructs and the actions intertwined with these constructs, such as the Ubbi Dubbi language. Building on this phenomenon, we propose the custom language games method: by engaging with LLMs using a variety of custom rules, we successfully execute jailbreak attacks across multiple LLM platforms. Extensive experiments demonstrate the effectiveness of our methods, achieving success rates of 93% on GPT-4o, 89% on GPT-4o-mini and 83% on Claude-3.5-Sonnet. Furthermore, to investigate the generalizability of safety alignments, we fine-tuned Llama-3.1-70B with the custom language games to achieve safety alignment within our datasets and found that when interacting through other language games, the fine-tuned models still failed to identify harmful content. This finding indicates that the safety alignment knowledge embedded in LLMs fails to generalize across different linguistic formats, thus opening new avenues for future research in this area.

arXiv.org

Education in the Era of Neurosymbolic AI arxiv.org/abs/2411.12763 cs.HC cs.AI cs.CY

Education is poised for a transformative shift with the advent of neurosymbolic artificial intelligence (NAI), which will redefine how we support deeply adaptive and personalized learning experiences. NAI-powered education systems will be capable of interpreting complex human concepts and contexts while employing advanced problem-solving strategies, all grounded in established pedagogical frameworks. This will enable a level of personalization in learning systems that to date has been largely unattainable at scale, providing finely tailored curricula that adapt to an individual's learning pace and accessibility needs, including the diagnosis of student understanding of subjects at a fine-grained level, identifying gaps in foundational knowledge, and adjusting instruction accordingly. In this paper, we propose a system that leverages the unique affordances of pedagogical agents -- embodied characters designed to enhance learning -- as critical components of a hybrid NAI architecture. These agents can simulate nuanced discussions, debates, and problem-solving exercises that push learners beyond rote memorization toward deep comprehension. We discuss the rationale for our system design and the preliminary findings of our work. We conclude that education in the era of NAI will make learning more accessible, equitable, and aligned with real-world skills. This is an era that will explore a new depth of understanding in educational tools.

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance arxiv.org/abs/2411.11853 q-fin.GN cs.CY cs.AI cs.CL

Advancements in large language models (LLMs) have renewed concerns about AI alignment - the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt twelve LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Beginning with a baseline configuration, we adjust preferences, incentives and constraints, analyzing the impact of each adjustment with logistic regression. Our findings reveal significant heterogeneity in the baseline propensity for unethical behavior of LLMs. Factors such as risk aversion, profit expectations, and regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and limitations of simulation-based, ex post safety testing. While it can inform financial authorities and institutions aiming to ensure LLM safety, there is a clear trade-off between generality and cost.

Can EDA Tool Feedback Improve Verilog Generation by LLMs? arxiv.org/abs/2411.11856 cs.AR cs.AI cs.PL

Automatically Improving LLM-based Verilog Generation using EDA Tool Feedback

Traditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers. This can be time-consuming and error-prone for complex designs. Large Language Models (LLMs) are emerging as a potential tool to help generate fully functioning HDL code, but most works have focused on generation in the single-shot capacity: i.e., run and evaluate, a process that does not leverage debugging and, as such, does not adequately reflect a realistic development process. In this work, we evaluate the ability of LLMs to leverage feedback from electronic design automation (EDA) tools to fix mistakes in their own generated Verilog. To accomplish this, we present an open-source, highly customizable framework, AutoChip, which combines conversational LLMs with the output from Verilog compilers and simulations to iteratively generate and repair Verilog. To determine the success of these LLMs we leverage the VerilogEval benchmark set. We evaluate four state-of-the-art conversational LLMs, focusing on readily accessible commercial models. Only with GPT-4o, the most computationally complex model we evaluated, did EDA tool feedback prove consistently more effective than zero-shot prompting. In the best case, we observed a 5.8% increase in the number of successful designs with a 34.2% decrease in cost over the best zero-shot results. Mixing smaller models with this larger model at the end of the feedback iterations resulted in as much success as GPT-4o with feedback, but incurred 41.9% lower cost (corresponding to an overall decrease in cost over zero-shot by 89.6%).
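The generate-and-repair cycle described in the abstract can be sketched roughly as follows. This is an illustrative loop only, not AutoChip's actual interface: `repair_loop`, `generate`, `run_tools`, and the prompt wording are hypothetical names and choices made for this example. The model call and the compiler/simulator call are passed in as plain functions so the control flow can be read on its own.

```python
from typing import Callable, Tuple


def repair_loop(
    generate: Callable[[str], str],
    run_tools: Callable[[str], Tuple[bool, str]],
    spec: str,
    max_rounds: int = 5,
) -> Tuple[str, bool, int]:
    """Generate a design, then feed tool output back until it passes.

    generate:  hypothetical wrapper around a conversational LLM (prompt -> code).
    run_tools: hypothetical wrapper around a compiler/simulator
               (code -> (passed, log)).
    Returns (last design, whether it passed, rounds used).
    """
    prompt = spec
    design = ""
    for round_no in range(1, max_rounds + 1):
        design = generate(prompt)
        passed, log = run_tools(design)
        if passed:
            return design, True, round_no
        # Fold the failing attempt and the tool log into the next prompt.
        prompt = (
            f"{spec}\n\nPrevious attempt:\n{design}\n\n"
            f"Tool output:\n{log}\n\nFix the errors."
        )
    return design, False, max_rounds
```

With `max_rounds=1` the loop reduces to a single-shot attempt, which is the baseline the abstract contrasts against.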

Strategic Optimization and Demand Response for Thermal Load Management in Multi-Regional Integrated Energy Systems: A Stackelberg Game Approach arxiv.org/abs/2411.11868 eess.SY cs.SY

In the context of high fossil fuel consumption and inefficiency within China's energy systems, effective demand-side management is essential. This study examines the thermal characteristics of various building types across different functional areas, utilizing the concept of body coefficient to integrate their unique structural and energy use traits into a demand response framework supported by real-time pricing. We developed a Stackelberg game-based bi-level optimization model that captures the dynamic interplay of costs and benefits between integrated energy providers and users. This model is formulated into a Mixed Integer Linear Programming (MILP) problem using Karush-Kuhn-Tucker (KKT) conditions and linearized with the Big M method, subsequently solved using MATLAB and CPLEX. This approach enables distinctive management of heating loads in public and residential areas, optimizing energy efficiency while balancing the interests of both providers and users. Furthermore, the study explores how the proportion of different area types affects the potential for reducing heat loads, providing insights into the scalability and effectiveness of demand response strategies in integrated energy systems. This analysis not only highlights the economic benefits of such strategies but also their potential in reducing dependency on traditional energy sources, thus contributing to more sustainable energy system practices.
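For readers unfamiliar with the KKT plus Big M step mentioned in the abstract, the generic textbook form is sketched below. It is not the paper's actual model: the symbols $x$ (leader decision), $y$ (follower decision), $f$, $g_i$, $\lambda_i$, $z_i$, and $M$ are placeholders chosen for this illustration, and the follower problem is assumed to be convex with constraints linear in $y$.

```latex
% Follower (lower-level) problem for a fixed leader decision x:
%   \min_{y} f(x, y) \quad \text{s.t.} \quad g_i(x, y) \le 0, \; i = 1, \dots, m
%
% KKT conditions that replace the follower problem:
\begin{align}
  \nabla_y f(x, y) + \sum_{i=1}^{m} \lambda_i \nabla_y g_i(x, y) &= 0, \\
  g_i(x, y) \le 0, \qquad \lambda_i &\ge 0, \\
  \lambda_i \, g_i(x, y) &= 0 .
\end{align}
% Big M linearization of the bilinear complementarity condition
% \lambda_i g_i = 0, using one binary variable z_i per constraint:
\begin{align}
  0 \le \lambda_i &\le M z_i, \\
  0 \le -g_i(x, y) &\le M (1 - z_i), \\
  z_i &\in \{0, 1\} .
\end{align}
```

If $z_i = 0$ the multiplier is forced to zero, and if $z_i = 1$ the constraint is forced to be active, so the product $\lambda_i g_i$ vanishes in both cases. The constant $M$ must be large enough not to cut off valid solutions.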

MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System arxiv.org/abs/2411.11871 math.OC cs.IR cs.LG

In industrial recommendation systems, multi-task learning (learning multiple tasks simultaneously on a single model) is a predominant approach to saving training/serving resources and improving recommendation performance via knowledge transfer between the jointly learned tasks. However, multi-task learning often suffers from negative transfer: one or several tasks are optimized less well than when they are trained separately. To carefully balance the optimization, we propose a gradient balancing approach called MultiBalance, which is suitable for industrial-scale multi-task recommendation systems. It balances the per-task gradients to alleviate negative transfer, while saving the huge cost of grid search or manual exploration for appropriate task weights. Moreover, compared with prior work that normally balances the per-task gradients of shared parameters, MultiBalance is more efficient since it only requires access to per-task gradients with respect to the shared feature representations. We conduct experiments on Meta's large-scale ads and feeds multi-task recommendation system, and observe that MultiBalance achieves significant gains (e.g., 0.738% improvement for normalized entropy (NE)) with neutral training cost in Queries Per Second (QPS), which is significantly more efficient than prior methods that balance per-task gradients of shared parameters with 70-80% QPS degradation.
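To make the idea of balancing per-task gradients concrete, here is a minimal sketch of one simple scheme: rescale each task's gradient to a common norm before summing. This is not the MultiBalance algorithm, whose details the abstract does not give. The function name `balance_gradients`, the choice of the mean norm as the target, and the flat-list gradient format are all assumptions made for illustration. The gradients are taken with respect to a shared representation vector, in line with the abstract's point that this is cheaper than working on shared parameters.

```python
import math
from typing import List


def balance_gradients(task_grads: List[List[float]], eps: float = 1e-12) -> List[float]:
    """Combine per-task gradients after equalizing their L2 norms.

    task_grads[t] is task t's gradient w.r.t. the shared representation.
    Each gradient is rescaled to the mean norm across tasks, so no single
    task dominates the combined update purely because of its loss scale.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in task_grads]
    target = sum(norms) / len(norms)  # assumed target: mean gradient norm
    combined = [0.0] * len(task_grads[0])
    for grad, norm in zip(task_grads, norms):
        scale = target / (norm + eps)  # eps guards against a zero gradient
        for k, g in enumerate(grad):
            combined[k] += scale * g
    return combined
```

For example, two gradients pointing the same way but with norms 5 and 0.5 contribute equally after rescaling, where a plain sum would be dominated by the first task.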

Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval arxiv.org/abs/2411.11875 q-bio.BM cs.IR cs.AI cs.CL

The field of bioinformatics has seen significant progress, making the cross-modal text-molecule retrieval task increasingly vital. This task focuses on accurately retrieving molecule structures based on textual descriptions, by effectively aligning textual descriptions and molecules to assist researchers in identifying suitable molecular candidates. However, many existing approaches overlook the details inherent in molecule sub-structures. In this work, we introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA), a novel approach that facilitates multi-grained alignments between textual descriptions and molecules. Our model features a text encoder and a molecule encoder. The text encoder processes textual descriptions to generate both token-level and sentence-level representations, while molecules are modeled as hierarchical heterogeneous graphs, encompassing atom, motif, and molecule nodes to extract representations at these three levels. A key innovation in ORMA is the application of Optimal Transport (OT) to align tokens with motifs, creating multi-token representations that integrate multiple token alignments with their corresponding motifs. Additionally, we employ contrastive learning to refine cross-modal alignments at three distinct scales: token-atom, multitoken-motif, and sentence-molecule, ensuring that the similarities between correctly matched text-molecule pairs are maximized while those of unmatched pairs are minimized. To our knowledge, this is the first attempt to explore alignments at both the motif and multi-token levels. Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
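As a rough illustration of how Optimal Transport can align tokens with motifs, the sketch below runs Sinkhorn iterations on a token-by-motif cost matrix and returns a soft alignment plan. It is a generic entropic-OT example, not ORMA's implementation: the function name `sinkhorn`, the uniform marginals, the regularization strength `epsilon`, and the toy cost matrix in the usage note are all assumptions.

```python
import math
from typing import List


def sinkhorn(cost: List[List[float]], epsilon: float = 0.1, iters: int = 200) -> List[List[float]]:
    """Entropy-regularized optimal transport with uniform marginals.

    cost[i][j] is the (assumed) dissimilarity between token i and motif j.
    Returns a plan P where P[i][j] is the mass token i sends to motif j.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # each token carries equal mass
    b = [1.0 / m] * m  # each motif receives equal mass
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Calling `sinkhorn([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])` for two tokens and three motifs yields a plan in which each token sends most of its mass to its low-cost motif and the remaining motif is shared between them. Aggregating tokens per motif using such a plan is one way to form the multi-token representations the abstract describes.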
