Evaluating Object Hallucination in Large Vision-Language Models. (arXiv:2305.10355v2 [cs.CV] UPDATED)
Chain-of-Symbol Prompting Elicits Planning in Large Language Models. (arXiv:2305.10276v2 [cs.CL] UPDATED)
A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot. (arXiv:2305.09758v2 [cs.CV] UPDATED)
LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development. (arXiv:2305.07507v2 [cs.CL] UPDATED)
WebCPM: Interactive Web Search for Chinese Long-form Question Answering. (arXiv:2305.06849v2 [cs.CL] UPDATED)
When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks. (arXiv:2305.06626v2 [cs.CL] UPDATED)
RECKONING: Reasoning through Dynamic Knowledge Encoding. (arXiv:2305.06349v2 [cs.CL] UPDATED)
Putting Natural in Natural Language Processing. (arXiv:2305.04572v2 [cs.CL] UPDATED)
MGR: Multi-generator based Rationalization. (arXiv:2305.04492v4 [cs.LG] UPDATED)
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements. (arXiv:2305.03695v2 [cs.CL] UPDATED)
Automated Code generation for Information Technology Tasks in YAML through Large Language Models. (arXiv:2305.02783v4 [cs.SE] UPDATED)
Answering Questions by Meta-Reasoning over Multiple Chains of Thought. (arXiv:2304.13007v2 [cs.CL] UPDATED)
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology. (arXiv:2304.11957v3 [physics.med-ph] UPDATED)
Towards Responsible AI in the Era of ChatGPT: A Reference Architecture for Designing Foundation Model-based AI Systems. (arXiv:2304.11090v2 [cs.CL] UPDATED)
Large language models effectively leverage document-level context for literary translation, but critical errors persist. (arXiv:2304.03245v3 [cs.CL] UPDATED)
Affect as a proxy for literary mood. (arXiv:2304.02894v2 [cs.CL] UPDATED)
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval. (arXiv:2304.01982v2 [cs.CL] UPDATED)
A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube. (arXiv:2303.16281v2 [cs.CY] UPDATED)
DeltaScore: Story Evaluation with Perturbations. (arXiv:2303.08991v3 [cs.CL] UPDATED)
MUX-PLMs: Data Multiplexing for High-throughput Language Models. (arXiv:2302.12441v2 [cs.LG] UPDATED)
All recent Computation and Language articles on arXiv.org for the Fediverse
Inspired by https://twitter.com/arxiv_cscl