DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data. (arXiv:2311.09805v1 [cs.CL]) 

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs. (arXiv:2311.09802v1 [cs.AI]) 

Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning. (arXiv:2311.09800v1 [cs.CL])

How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting! (arXiv:2311.09799v1 [cs.CL])

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains. (arXiv:2311.09797v1 [cs.CL]) 

Interpreting User Requests in the Context of Natural Language Standing Instructions. (arXiv:2311.09796v1 [cs.CL]) 

Investigating Data Contamination in Modern Benchmarks for Large Language Models. (arXiv:2311.09783v1 [cs.CL]) 

More Samples or More Prompt Inputs? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering. (arXiv:2311.09782v1 [cs.CL]) 

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs. (arXiv:2311.09774v1 [cs.CL]) 

To be or not to be? an exploration of continuously controllable prompt engineering. (arXiv:2311.09773v1 [cs.CL]) 

LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores. (arXiv:2311.09766v1 [cs.CL]) 

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations. (arXiv:2311.09763v1 [cs.CL]) 

Graph-Guided Reasoning for Multi-Hop Question Answering in Large Language Models. (arXiv:2311.09762v1 [cs.CL]) 

MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification. (arXiv:2311.09761v1 [cs.CL]) 

OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking. (arXiv:2311.09758v1 [cs.CL]) 

FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives. (arXiv:2311.09756v1 [cs.CL]) 

How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models? (arXiv:2311.09755v1 [cs.CL])

Translation Aligned Sentence Embeddings for Turkish Language. (arXiv:2311.09748v1 [cs.CL]) 

Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks. (arXiv:2311.09743v1 [cs.CL]) 

What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization. (arXiv:2311.09741v1 [cs.CL]) 
