🔴 💻 **Will we run out of data? Limits of LLM scaling based on human-generated data**

_“Our findings indicate that if current LLM development trends continue, models will be trained on datasets roughly equal in size to the available stock of public human text data between 2026 and 2032, or slightly earlier if models are overtrained.”_

Villalobos, P. et al. (2022) Will we run out of data? Limits of LLM scaling based on human-generated data. arxiv.org/abs/2211.04325v2.

@ai
