Profile directory About Mobile apps
Log in Sign up
arXiv Statistics @arxiv_stats@qoto.org
Follow

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation https://arxiv.org/abs/2505.17288 #stat.ML #cs.LG

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N, trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy a better rate of convergence in either n or a rate of convergence with better dependence on the response length.

arXiv.org
May 27, 2025 at 3:20 AM · · feed2toot · 0 · 0 · 0
Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.

Trending now

#news0 people talking
0
#photography0 people talking
0

Resources

  • Terms of service
  • Privacy policy

Developers

  • Documentation
  • API

What is Mastodon?

qoto.org

  • About
  • v3.5.19-qoto

More…

  • Source code
  • Mobile apps
v3.5.19-qoto · Privacy policy