X-Mark: Towards Lossless Watermarking Through Lexical Redundancy. (arXiv:2311.09832v1 [cs.CL])
AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL. (arXiv:2311.09830v1 [cs.AI])
FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models. (arXiv:2311.09829v1 [cs.CL])
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. (arXiv:2311.09828v1 [cs.CL])
Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking. (arXiv:2311.09827v1 [cs.CL])
Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks. (arXiv:2311.09825v1 [cs.CL])
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning. (arXiv:2311.09821v1 [cs.CL])
SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models. (arXiv:2311.09818v1 [cs.CL])
Performance Trade-offs of Watermarking Large Language Models. (arXiv:2311.09816v1 [cs.CL])
Large Language Models for Propaganda Span Annotation. (arXiv:2311.09812v1 [cs.CL])
PixT3: Pixel-based Table To Text generation. (arXiv:2311.09808v1 [cs.CL])
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text. (arXiv:2311.09807v1 [cs.CL])
DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data. (arXiv:2311.09805v1 [cs.CL])
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs. (arXiv:2311.09802v1 [cs.AI])
Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning. (arXiv:2311.09800v1 [cs.CL])
How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting! (arXiv:2311.09799v1 [cs.CL])
KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains. (arXiv:2311.09797v1 [cs.CL])
Interpreting User Requests in the Context of Natural Language Standing Instructions. (arXiv:2311.09796v1 [cs.CL])
Investigating Data Contamination in Modern Benchmarks for Large Language Models. (arXiv:2311.09783v1 [cs.CL])
More Samples or More Prompt Inputs? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering. (arXiv:2311.09782v1 [cs.CL])
All recent Computation and Language articles on arXiv.org for the Fediverse
Inspired by https://twitter.com/arxiv_cscl