Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm arxiv.org/abs/2603.18007 .CL .AI

TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots arxiv.org/abs/2603.18008 .CL .AI .CY

How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding arxiv.org/abs/2603.18009 .CL .AI

Controllable Evidence Selection in Retrieval-Augmented Question Answering via Deterministic Utility Gating arxiv.org/abs/2603.18011 .CL .IR

DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation arxiv.org/abs/2603.18012 .CL .AI .IR

Learned but Not Expressed: Capability-Expression Dissociation in Large Language Models arxiv.org/abs/2603.18013 .CL

Real-Time Trustworthiness Scoring for LLM Structured Outputs and Data Extraction arxiv.org/abs/2603.18014 .CL .LG

Beyond Accuracy: An Explainability-Driven Analysis of Harmful Content Detection arxiv.org/abs/2603.18015 .CL .AI
