The case of “vegetative electron microscopy” illustrated here shows what is badly needed in current #LLM research, with implications reaching far beyond it. We need tools that help us curate huge corpora. We need to be able to trace #hallucinations back to the training data and understand which specific (surprisingly, often #deterministic) features of the model input cause a particular output.
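As a toy example of the kind of tracing tool I have in mind, here is a minimal sketch in Python that scans a corpus for a suspect phrase and reports which documents it came from. It assumes the corpus is simply a folder of plain-text files; the phrase, paths, and function name are illustrative, not the setup from the linked article or my experiments.

```python
# Minimal sketch: scan a (hypothetical) plain-text corpus for a suspect
# phrase so its occurrences can be traced back to specific source documents.
from pathlib import Path

SUSPECT_PHRASE = "vegetative electron microscopy"
CORPUS_DIR = Path("corpus")  # assumed layout: one .txt file per document


def find_occurrences(corpus_dir: Path, phrase: str) -> dict[str, int]:
    """Return a mapping of document name -> number of phrase occurrences."""
    hits: dict[str, int] = {}
    for doc in sorted(corpus_dir.glob("*.txt")):
        text = doc.read_text(encoding="utf-8", errors="ignore").lower()
        count = text.count(phrase.lower())
        if count:
            hits[doc.name] = count
    return hits


if __name__ == "__main__":
    for name, count in find_occurrences(CORPUS_DIR, SUSPECT_PHRASE).items():
        print(f"{name}: {count} occurrence(s)")
```

A real pipeline would of course need approximate matching and provenance metadata rather than exact string search, but even this scale of tooling is rarely applied systematically to training corpora.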
If anyone is interested in collaborating on this, I'm in: I've done some small-scale experiments and have already submitted a grant proposal.
https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463