Show newer

The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges. (arXiv:2306.04765v1 [cs.AI]) 

INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models. (arXiv:2306.04757v1 [cs.CL]) 

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. (arXiv:2306.04751v1 [cs.CL]) 

Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning. (arXiv:2306.04746v1 [stat.ME]) 

ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems. (arXiv:2306.04743v1 [cs.DB]) 

Soft-prompt Tuning for Large Language Models to Evaluate Bias. (arXiv:2306.04735v1 [cs.CL]) 

Prompter: Zero-shot Adaptive Prefixes for Dialogue State Tracking Domain Adaptation. (arXiv:2306.04724v1 [cs.CL]) 

Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts. (arXiv:2306.04723v1 [cs.CL]) 

Improving Open Language Models by Learning from Organic Interactions. (arXiv:2306.04707v1 [cs.CL]) 

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models. (arXiv:2306.04695v1 [cs.CV]) 

Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge. (arXiv:2306.04657v1 [cs.CL]) 

Improving Conversational Recommendation Systems via Counterfactual Data Simulation. (arXiv:2306.02842v1 [cs.CL] CROSS LISTED) 

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning. (arXiv:2306.04387v2 [cs.CV] UPDATED) 

GPT Self-Supervision for a Better Data Annotator. (arXiv:2306.04349v2 [cs.CL] UPDATED) 

Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset. (arXiv:2306.03030v2 [cs.CL] UPDATED) 

UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception Detection. (arXiv:2306.02827v2 [cs.CL] UPDATED) 

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models. (arXiv:2306.01506v2 [cs.CL] UPDATED) 

An Empirical Study on Challenging Math Problem Solving with GPT-4. (arXiv:2306.01337v2 [cs.CL] UPDATED) 

Supplementary Features of BiLSTM for Enhanced Sequence Labeling. (arXiv:2305.19928v3 [cs.CL] UPDATED) 

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets. (arXiv:2305.18486v3 [cs.CL] UPDATED) 

Show older
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.