“Potemkin Understanding in Large Language Models”
A detailed analysis of the incoherent application of concepts by LLMs, showing how benchmarks that reliably establish domain competence in humans can be passed by LLMs lacking similar competence.
H/T @acowley
"incoherent application of concepts"
Reminder that no concepts are involved in a random (Markov) walk through word space. Shannon 1948.
From Pogo: "We could eat this picture of a chicken, if we had a picture of some salt."
@glc @gregeganSF @acowley But that is not the word space they're walking...
I suppose you must be referring to Pogo, which is not, for the present purposes, even a word space (or: not fruitfully treated as such).
@glc @gregeganSF @acowley no, the LLMs aren't operating in **word**-space.
@glc @gregeganSF @acowley No, bytes/tokens/words/whatever is irrelevant. The important part that's wrong in the "word-space" model is that it misses the context. The "language" part is a red herring. What's really going on is a tangle of suspended code that's getting executed step by step. And yes there are concepts, entities, and all that stuff in there.
@glc Perhaps. I just hope this is not another "X is/has/... Y" claim.
What's your favorite or most important consequence of this distinction?
That no concepts are involved, and the numerous corollaries of that, I suppose. At least, that's what I find myself harping on now and then.
I have no strong interest in the details, though considerable interest in watching this play out.
—Someone like Cosma Shalizi is going to actually get into the weeds a bit more:
http://bactra.org/notebooks/nn-attention-and-transformers.html
You'll probably find much to agree with and much to disagree with there. And at adequate length.
@glc > I find this literature irritating and opaque.
That's a promising start! (8
@dpwiz
I'd say there is syntax without semantics (in the traditional sense of formal logic, that is).
You have some other view evidently.
That much is now clear.
I don't see much difference from Markov and Shannon, apart from some compression tricks which are needed to get a working system.
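For anyone who wants the "random (Markov) walk through word space" picture made concrete, here is a minimal sketch of a word-level Markov walk in the spirit of Shannon's 1948 approximations. The function names (`build_table`, `walk`), the toy corpus (the Pogo line quoted above), and the `order` parameter are illustrative assumptions, not anything taken from the paper or from how an actual LLM is built. With `order=1` the next word depends only on the current word; raising `order` lengthens the context the walk conditions on, which is the point under dispute in this thread.

```python
import random
from collections import defaultdict

def build_table(words, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        table[state].append(words[i + order])
    return dict(table)

def walk(table, start, steps, rng):
    """Random walk: repeatedly sample a successor of the last `order` words."""
    out = list(start)
    order = len(start)
    for _ in range(steps):
        successors = table.get(tuple(out[-order:]))
        if not successors:  # dead end: this state never had a successor in the corpus
            break
        out.append(rng.choice(successors))
    return out

# Toy corpus (assumption for illustration): the Pogo line, lowercased.
corpus = ("we could eat this picture of a chicken "
          "if we had a picture of some salt").split()

table = build_table(corpus, order=1)
sample = walk(table, start=("we",), steps=8, rng=random.Random(0))
print(" ".join(sample))
```

Every adjacent word pair in the output already occurs somewhere in the corpus; the walk only recombines transitions it has seen. Whether that description still fits a transformer conditioned on a long context is exactly what the two sides here disagree about, and the sketch does not settle it.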