@jdp.extropian.net I'm personally very sceptical that such a data-driven approach ("generating a balanced dataset of faithful (green check) and hallucinatory (red cross) responses using the TriviaQA benchmark") can even capture what model hallucinations really are. I'd think one first needs a taxonomy of hallucination types (an oldish one for text-to-image is in the limitations section here: https://arxiv.org/abs/2206.10789). And calling weights "neurons" is not to my liking either; to me that's a red flag.