The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges. (arXiv:2306.04765v1 [cs.AI])
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models. (arXiv:2306.04757v1 [cs.CL])
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. (arXiv:2306.04751v1 [cs.CL])
Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning. (arXiv:2306.04746v1 [stat.ME])
ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems. (arXiv:2306.04743v1 [cs.DB])
Soft-prompt Tuning for Large Language Models to Evaluate Bias. (arXiv:2306.04735v1 [cs.CL])
Prompter: Zero-shot Adaptive Prefixes for Dialogue State Tracking Domain Adaptation. (arXiv:2306.04724v1 [cs.CL])
Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts. (arXiv:2306.04723v1 [cs.CL])
Improving Open Language Models by Learning from Organic Interactions. (arXiv:2306.04707v1 [cs.CL])
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models. (arXiv:2306.04695v1 [cs.CV])
Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge. (arXiv:2306.04657v1 [cs.CL])
Improving Conversational Recommendation Systems via Counterfactual Data Simulation. (arXiv:2306.02842v1 [cs.CL] CROSS LISTED)
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning. (arXiv:2306.04387v2 [cs.CV] UPDATED)
GPT Self-Supervision for a Better Data Annotator. (arXiv:2306.04349v2 [cs.CL] UPDATED)
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset. (arXiv:2306.03030v2 [cs.CL] UPDATED)
UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception Detection. (arXiv:2306.02827v2 [cs.CL] UPDATED)
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models. (arXiv:2306.01506v2 [cs.CL] UPDATED)
An Empirical Study on Challenging Math Problem Solving with GPT-4. (arXiv:2306.01337v2 [cs.CL] UPDATED)
Supplementary Features of BiLSTM for Enhanced Sequence Labeling. (arXiv:2305.19928v3 [cs.CL] UPDATED)
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets. (arXiv:2305.18486v3 [cs.CL] UPDATED)
All recent Computation and Language articles on arXiv.org for the Fediverse
Inspired by https://twitter.com/arxiv_cscl