Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking. (arXiv:2311.09827v1 [cs.CL])
Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks. (arXiv:2311.09825v1 [cs.CL])
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning. (arXiv:2311.09821v1 [cs.CL])
SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models. (arXiv:2311.09818v1 [cs.CL])
Performance Trade-offs of Watermarking Large Language Models. (arXiv:2311.09816v1 [cs.CL])
Large Language Models for Propaganda Span Annotation. (arXiv:2311.09812v1 [cs.CL])
PixT3: Pixel-based Table To Text generation. (arXiv:2311.09808v1 [cs.CL])
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text. (arXiv:2311.09807v1 [cs.CL])
DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data. (arXiv:2311.09805v1 [cs.CL])
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs. (arXiv:2311.09802v1 [cs.AI])
Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning. (arXiv:2311.09800v1 [cs.CL])
How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting! (arXiv:2311.09799v1 [cs.CL])
KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains. (arXiv:2311.09797v1 [cs.CL])
Interpreting User Requests in the Context of Natural Language Standing Instructions. (arXiv:2311.09796v1 [cs.CL])
Investigating Data Contamination in Modern Benchmarks for Large Language Models. (arXiv:2311.09783v1 [cs.CL])
More Samples or More Prompt Inputs? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering. (arXiv:2311.09782v1 [cs.CL])
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs. (arXiv:2311.09774v1 [cs.CL])
To be or not to be? an exploration of continuously controllable prompt engineering. (arXiv:2311.09773v1 [cs.CL])
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores. (arXiv:2311.09766v1 [cs.CL])
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations. (arXiv:2311.09763v1 [cs.CL])
All recent Computation and Language articles on arXiv.org for the Fediverse
Inspired by https://twitter.com/arxiv_cscl