arXiv - CSCL @arxiv_cscl@qoto.org

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate. (arXiv:2401.16788v1 [cs.CL])

http://arxiv.org/abs/2401.16788 #arXiv #NLProc

Jan 31, 2024, 03:18 · arxiv-cscl
