arXiv - CSCL @arxiv_cscl@qoto.org

Direct Preference Optimization: Your Language Model is Secretly a Reward Model. (arXiv:2305.18290v2 [cs.LG] UPDATED)

http://arxiv.org/abs/2305.18290 #arXiv #NLProc

Dec 14, 2023, 03:19 · arxiv-cscl