X-Mark: Towards Lossless Watermarking Through Lexical Redundancy. (arXiv:2311.09832v1 [cs.CL]) 

AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL. (arXiv:2311.09830v1 [cs.AI]) 

FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models. (arXiv:2311.09829v1 [cs.CL]) 

AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. (arXiv:2311.09828v1 [cs.CL]) 

Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking. (arXiv:2311.09827v1 [cs.CL]) 

Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks. (arXiv:2311.09825v1 [cs.CL]) 

Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning. (arXiv:2311.09821v1 [cs.CL]) 

SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models. (arXiv:2311.09818v1 [cs.CL]) 

Performance Trade-offs of Watermarking Large Language Models. (arXiv:2311.09816v1 [cs.CL]) 

Large Language Models for Propaganda Span Annotation. (arXiv:2311.09812v1 [cs.CL]) 

PixT3: Pixel-based Table To Text generation. (arXiv:2311.09808v1 [cs.CL]) 

The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text. (arXiv:2311.09807v1 [cs.CL]) 

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data. (arXiv:2311.09805v1 [cs.CL]) 

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs. (arXiv:2311.09802v1 [cs.AI]) 

Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning. (arXiv:2311.09800v1 [cs.CL]) 

How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting! (arXiv:2311.09799v1 [cs.CL]) 

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains. (arXiv:2311.09797v1 [cs.CL]) 

Interpreting User Requests in the Context of Natural Language Standing Instructions. (arXiv:2311.09796v1 [cs.CL]) 

Investigating Data Contamination in Modern Benchmarks for Large Language Models. (arXiv:2311.09783v1 [cs.CL]) 

More Samples or More Prompt Inputs? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering. (arXiv:2311.09782v1 [cs.CL]) 
