I found the papers "Scaling Laws for Neural Language Models" (OpenAI, 2020) and "Training Compute-Optimal Large Language Models" (DeepMind, 2022) interesting:
arxiv.org/pdf/2001.08361.pdf
arxiv.org/pdf/2203.15556.pdf
They run a LOT of experiments training large language models (causal transformers), varying hyperparameters, in particular model size, shape, batch size, and training dataset size, over many orders of magnitude. 1/?
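To make the headline result concrete, here is a minimal sketch of the parametric loss form the DeepMind paper fits, L(N, D) = E + A/N^alpha + B/D^beta, using the constants reported in the paper (treat the values as approximate; the function name is mine):

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted final pretraining loss for a model with n_params
    parameters trained on n_tokens tokens, per the fitted form
    L(N, D) = E + A / N**alpha + B / D**beta."""
    E = 1.69               # fitted irreducible loss of natural text
    A, alpha = 406.4, 0.34 # model-size term
    B, beta = 410.7, 0.28  # data-size term
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: Chinchilla itself, ~70B parameters on ~1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))
```

Under this form, loss falls off as a power law in both parameters and data, which is why the paper concludes the two should be scaled up together.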

@kristinmbranson thanks for posting, Kristin! Very interesting.

And happy new year!
