Chain of Code: Reasoning with a Language Model-Augmented Code Emulator. (arXiv:2312.04474v2 [cs.CL] UPDATED)
Beyond Surface: Probing LLaMA Across Scales and Layers. (arXiv:2312.04333v2 [cs.CL] UPDATED)
Methods to Estimate Large Language Model Confidence. (arXiv:2312.03733v2 [cs.CL] UPDATED)
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability. (arXiv:2312.03721v2 [cs.CL] UPDATED)
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise. (arXiv:2312.01523v2 [cs.CL] UPDATED)
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges. (arXiv:2311.15766v2 [cs.CL] UPDATED)
ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution. (arXiv:2311.14391v2 [cs.CL] UPDATED)
LM-Cocktail: Resilient Tuning of Language Models via Model Merging. (arXiv:2311.13534v4 [cs.CL] UPDATED)
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4. (arXiv:2311.07361v2 [cs.CL] UPDATED)
Quality-Diversity through AI Feedback. (arXiv:2310.13032v4 [cs.CL] UPDATED)
Measuring Pointwise $\mathcal{V}$-Usable Information In-Context-ly. (arXiv:2310.12300v2 [cs.CL] UPDATED)
DialogueLLM: Context and Emotion Knowledge-Tuned LLaMA Models for Emotion Recognition in Conversations. (arXiv:2310.11374v2 [cs.CL] UPDATED)
Conversational Health Agents: A Personalized LLM-Powered Agent Framework. (arXiv:2310.02374v3 [cs.CL] UPDATED)
Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph Question Answering Systems. (arXiv:2309.16248v2 [cs.CL] UPDATED)
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM. (arXiv:2309.14348v2 [cs.CL] UPDATED)
Goal-Oriented Prompt Attack and Safety Evaluation for LLMs. (arXiv:2309.11830v2 [cs.CL] UPDATED)
FIND: A Function Description Benchmark for Evaluating Interpretability Methods. (arXiv:2309.03886v3 [cs.CL] UPDATED)
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook. (arXiv:2307.16680v5 [cs.LG] UPDATED)
Max-Margin Token Selection in Attention Mechanism. (arXiv:2306.13596v4 [cs.LG] UPDATED)
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark. (arXiv:2305.14938v2 [cs.CL] UPDATED)
All recent Computation and Language articles on arXiv.org for the Fediverse
Inspired by https://twitter.com/arxiv_cscl