@ct_bergstrom Disagree. They're designed to mimic what a human would write. If they end up bullshitting, it's because the models aren't good enough, not because that's what they're designed to do.
@ct_bergstrom @moultano I'm starting to have doubts about the idea that LLMs are "stochastic parrots" that can't generalize after watching a short talk from Francois Charton of Meta at NeurIPS 2022.
TL;DR - he trained a small LLM to diagonalize matrices using only example triplets from the similarity transform. No hallucinations were observed.
The talk was "Leveraging Maths to Understand Transformers" https://neurips.cc/virtual/2022/workshop/50015#wse-detail-63846
@ct_bergstrom @moultano Ugh, not small LLM - small transformer-based language model.
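To make the triplet setup concrete: for a diagonalizable matrix M, the similarity transform gives M = P D P⁻¹ with D diagonal, so each training example can be an (M, P, D) triplet. Here's a minimal NumPy sketch of that data generation; the symmetric sampling, matrix size, and exact triplet encoding are my assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_triplet(n=5):
    """Generate one (M, P, D) triplet satisfying the similarity transform M = P D P^-1."""
    # Symmetric matrices are always diagonalizable with real eigenvalues,
    # and their eigenvector matrix P is orthogonal, so P^-1 = P^T.
    A = rng.standard_normal((n, n))
    M = (A + A.T) / 2
    eigvals, P = np.linalg.eigh(M)   # columns of P are eigenvectors of M
    D = np.diag(eigvals)
    return M, P, D

# Sanity check: the similarity transform reconstructs M (up to float error).
M, P, D = make_triplet()
assert np.allclose(P @ D @ P.T, M)
```

If the talk follows his earlier "Linear Algebra with Transformers" work, the matrices are serialized as token sequences for training, and the nice property is that every model output can be checked numerically against the transform.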