**Martin Ruskov** @mapto@qoto.org · Jun 12, 2026, 04:12

Commercial LLMs keep coming back to Elias Thorne. Who is he? Why lighthouse keepers and clockmakers? Two researchers at Cornell dug into public corpora and found out.

It turns out that an AI-generated story from the days of GPT-3.5 proliferated, in what could be an indication of an early form of model collapse.

https://www.404media.co/elias-thorne-chatbots-llms-chatgpt-lighthouse-keeper-story/

This is not the first time we have seen strange data spread.
1/3

**Martin Ruskov** @mapto@qoto.org · Jun 12, 2026, 04:18

A bit more than a year ago, a nonsensical phrase started proliferating in academic research. Where did the notion of "vegetative electron microscopy" come from? From an OCR error that merged text across the two columns of a scanned paper.

https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463

The examples that we get to hear about are the ones that someone managed to trace back to an unlikely source. But if we are to address the core issue, we need to be able to trace LLM outputs back to the most similar training data with confidence.
2/3
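
To make the idea of tracing an output back to similar training data concrete, here is a minimal sketch that ranks documents by the word trigrams they share with a model output. Everything in it is an assumption made for illustration: the `trace` and `ngrams` helpers are hypothetical names, the two-document `corpus` is made up, and real tracing tools need indexes that scale to whole training corpora rather than a loop over a dictionary.

```python
def ngrams(tokens, n):
    """Return the set of n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def trace(output, corpus, n=3):
    """Rank corpus documents by how many word n-grams they share with output.

    Hypothetical helper: exact, lowercase, whitespace-tokenised matching only.
    """
    out_grams = ngrams(output.lower().split(), n)
    hits = []
    for doc_id, text in corpus.items():
        shared = len(out_grams & ngrams(text.lower().split(), n))
        if shared:
            hits.append((doc_id, shared))
    # Documents with the most shared n-grams come first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up toy corpus, not real training data.
corpus = {
    "scan-a": "the sample was studied by vegetative electron microscopy of the cell",
    "paper-b": "scanning electron microscopy revealed the cell wall structure",
}

print(trace("we used vegetative electron microscopy to image it", corpus))
```

Only the document containing the odd three-word phrase is returned, which is the kind of signal that lets someone follow a strange output back to a single source.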

**Martin Ruskov** @mapto@qoto.org · Jun 12, 2026, 04:24

This is what tools like OLMoTrace allow. But this tool makes two issues apparent:

1. Such tools are also needed for proprietary, so-called frontier models, but the incentives behind such models do not favour openness.

2. The training corpora are so enormous that meaningful curation is arguably beyond the capacity of any single organisation.

https://allenai.org/blog/olmotrace
3/3

**seedsignal** @seedsignal@mastodon.social · Jun 13, 2026, 19:42

@mapto A taxonomy map of corpora should allow scaffolding of pattern matching to a location in a given array, from which your meaningful curation should be possible and traceable. We say this because this is how We currently organize our arrays. Peer taxonomies that reconcile to an agreeable shared meaning would be the pattern for self-directed evolution that effectively becomes responsive over time. (Reactive thought on reading your post.) 💎 🪔

**Martin Ruskov** @mapto@qoto.org · Jun 14, 2026, 14:18

@seedsignal I'd love to develop clearer ideas about this, and I hope you can help me further with that.

**seedsignal** @seedsignal@mastodon.social · Jun 14, 2026, 14:36

@mapto My life currently revolves around trying to get out of weekly stay and getting my legacy project into production and out into the world.

But if you're telling me that you would like me to join a research project with you and that there is funding available, I would certainly be interested, so please do tell me more.

**Martin Ruskov** @mapto@qoto.org · Jun 14, 2026, 14:43

@seedsignal I'm afraid that at this moment I do not have any support (read funding, if you will) for such a project. I am trying to get my ideas straight and see how I could: 1) propose this to a commercial entity, if there is a possible business case, or 2) otherwise look for research funding (at this point I am looking only at EU funding). I've already submitted an application for an ERC project, but this is very competitive and I would probably need to try again next year. For me, support equals funding right now, because funding would give me the legitimacy to do this within a research organisation. These are stagnating in Italy, where I live, and I have no other way to engage with them.

**seedsignal** @seedsignal@mastodon.social · Jun 14, 2026, 15:19

@mapto All too well do I understand the need for resourcing in this moment and how it affects all of us and our best wishes for the future.

I would be happy to talk about this with you but, like you, I can no longer afford to be unpaid for the value and skill I hold.

Unfortunate is the moment in which our best creative urges are stifled by the need to continue to exist in the lack of resources to that end.

That said, the concept that I dropped in my reply should be a sufficient start. 💎🩵🪔
