The case of “vegetative electron microscopy” illustrated here shows what is badly needed in current #LLM research, and its implications reach far beyond that. We need tools that help us curate huge corpora. We need to be able to trace #hallucinations back to the training data and understand the specific (and, surprisingly, often #deterministic) reasons in the model input that cause a particular output.
If anyone is interested in collaborating on this, I'm in: I have done some small-scale experiments and have already submitted a grant proposal.
https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463
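To make the idea of tracing a phrase back to its source documents concrete, here is a minimal sketch. It is not the method used in the linked article: the `find_phrase` helper and the two-document toy corpus are hypothetical, and a real training corpus would need an index rather than a linear scan. The only point it illustrates is that the search should tolerate line breaks between words, since a phrase can be split across lines in extracted text.

```python
import re
from collections import defaultdict


def find_phrase(corpus, phrase):
    """Return {doc_id: [start offsets]} for whitespace-tolerant matches of phrase.

    `corpus` is a plain dict mapping a document id to its text (hypothetical layout).
    """
    # Allow any run of whitespace, including newlines, between the words,
    # so a phrase broken across lines is still found.
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)

    hits = defaultdict(list)
    for doc_id, text in corpus.items():
        for match in pattern.finditer(text):
            hits[doc_id].append(match.start())
    return dict(hits)


# Toy corpus, invented for illustration only.
corpus = {
    "doc_a": "Samples were studied by vegetative\nelectron microscopy of the spores.",
    "doc_b": "Scanning electron microscopy was used on vegetative cells.",
}

hits = find_phrase(corpus, "vegetative electron microscopy")
print(hits)
```

Only `doc_a` contains the three words in sequence. `doc_b` uses the same words in a different order, which is the kind of near miss a curation tool has to tell apart from a true occurrence.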
I think I now know where to draw the line between "good" and "bad" #GenAI, and possibly (or rather obviously) the same for #machineLearning. It's simply whether the input data has been constructed rigorously. Put this way it's the most obvious statement ever, but somehow #BigTech have convinced us all that they advance research by recklessly scraping #twitter, #4chan and who knows what else (they keep their training data secret).
What is good science in computational linguistics? Well, open data is a step towards it. But open and crap is not a solution. We need to actually _know_ and manage the data. And nobody in their right mind would want to plough through toxic data to clean it. We've all heard the horrors of Kenyan data workers who do it for money and still suffer doing it.
But better (yes, also smaller) corpora are of interest to scholars in the humanities and the social sciences. Think of https://textcreationpartnership.org or https://mlat.uzh.ch. Yes, they are too big for individual researchers or even teams to handle, but we have the organisational and technological infrastructure to work on them collectively. We've been doing it for ages and we will continue doing it. We just need to do it together.
And this is the goal of the European Research Council project proposal I'm submitting at this very moment.
Today at #CHR2025, I will be presenting our work on the evaluation of the historical adequacy of masked language models (MLMs) for #Latin. There are several models like this, and they represent the current state of the art for a number of downstream tasks, like semantic change and text reuse detection. However, a historian, philologist or other researcher would want to be sure that such models really represent the historical period of interest. For example, it would be an embarrassing hallucination if St. Augustine showed up in the context of the Roman senate.
Our evaluation confirms a known problem: LLMs, and masked models in particular, are trained on corpora without attention to historical periods. Unlike what we saw in our earlier research on Early Modern English, here the problem leaves the models barely distinguishable in their ability to generate text appropriate to a given historical period. Even though history is a case where it is most obvious when models go wrong, this type of contamination is a known problem for LLM training overall: think of different legal jurisdictions using the same language, dialects in programming languages, etc.
This research was generously supported by AgileLab.
The full paper is available at:
https://anthology.ach.org/volumes/vol0003/the-latin-language-evolved-over-time-masked-models/
Our paper on the values found in fairy tales from some European countries has been published. We studied how values are explicitly present in tales from Germany, Italy and Portugal using various NLP techniques, most notably Word2Vec and Word Embedding with a Compass. We visualise synchronic semantic variation to show certain differences based on observations of the corpus, some of them already noted in previous literature. One example we discuss is how motherhood in Germany is strongly related to generosity, whereas in Italy and Portugal it has a stronger relationship to wisdom.
Fulltext available at: https://aclanthology.org/2023.nlp4dh-1.8/
In the morning session today Sara Sullam and I will be presenting our work on exploring nominal (in our case study - bibliographical) data. We do it by borrowing a method from educational research - the notion of phenomenographic variation. #CHR2023🧵
Fragments: Dodgy metrics for AI usage, history of tech removing jobs, benchmarking closed and open models, LLMs multiply existing cruft, AI slop driving us crazy, I am the Global Interpreter Lock for agents
“The European Commission’s explicit recognition of ‘Public Money? Public Code!’ in this strategy, nine years after the FSFE launched the initiative, could become a major step forward for software freedom in Europe. However, the Commission still falls short on concrete goals, milestones, and secure funding for Free Software. The procurement reform will be a test: ‘Public Money? Public Code!’ must become a mandatory requirement for public tendering. Redirecting even half of Europe’s €264 billion in public IT spending from proprietary lock-in to Free Software would boost European tech sovereignty”,
says Johannes Näder, FSFE Senior Policy Project Manager.
Your phone screen can't reproduce the full range of colors the human eye can see, and AI-generated images may widen that gap even further.
A design and media arts scholar with deuteranomaly, a form of color blindness that remaps rather than removes color distinctions, dives into the complications.
I signed this declaration about AI in mathematics, and you might also want to:
I had a comment about this passage:
"Technologies which affect the way in which mathematics is practiced may disturb the current system of incentives. The use of artificial intelligence — and thus also the sort of problems which it can address — may become incentivized for its own sake, disrupting our mechanisms for hiring, funding, and recognition."
Though I'm sure it wasn't meant to, this comes across as a bit complacent. Our current system of incentives is seriously flawed, so we should be working to improve it, not merely defending the status quo. We don't want AI companies to be twisting the incentives in mathematics - but university administrators, big journal oligopolies, and the military have been doing this for a long time, and that's no good either.
How can open source AI support public administrations while staying transparent, reusable, interoperable and aligned with public values?
Before you say: 'It can't', join OSOR workshop 'Driving public value through open source artificial intelligence'.
We’ll explore:
🔹 open source AI in the public sector
🔹 real use cases for public services
🔹 barriers to adoption and scaling
🔹 practical ways forward for governments
📅 30 June 2026, 10:00–12:00 CEST
Register here 👉 https://interoperable-europe.ec.europa.eu/form/driving-public-value-through-ope
In Iran war’s shadow, Israel’s renewed Lebanon campaign risks repeating failed lessons – and occupations – of the past
https://theconversation.com/in-iran-wars-shadow-israels-renewed-lebanon-campaign-risks-repeating-failed-lessons-and-occupations-of-the-past-284052
Your phone screen doesn’t have the same color range as the human eye – and AI widens the gap between digital images and the real thing
https://theconversation.com/your-phone-screen-doesnt-have-the-same-color-range-as-the-human-eye-and-ai-widens-the-gap-between-digital-images-and-the-real-thing-283252
Bat in the house? 🦇
A bat biologist walks through the steps for persuading a bat to leave your home, and what to do when a whole family decides to roost in your attic.
https://theconversation.com/bat-in-the-house-heres-how-to-remove-it-safely-283456
Tyrannosaurus Rex and Other Terrifying Predatory Dinosaurs Had Itty-Bitty Arms. Scientists May Have Finally Figured Out Why
https://www.smithsonianmag.com/smart-news/tyrannosaurus-rex-and-other-terrifying-predatory-dinosaurs-had-itty-bitty-arms-scientists-may-have-finally-figured-out-why-180988803/?utm_source=flipboard&utm_medium=activitypub
Posted into Smartnews @smartnews-Smithsonianmag
Scientists Used A.I. to Redesign a Microbe’s Machinery to Function Without a Key Ingredient of Life
https://www.smithsonianmag.com/smart-news/scientists-used-ai-to-redesign-a-microbes-machinery-to-function-without-a-key-ingredient-of-life-180988802/?utm_source=flipboard&utm_medium=activitypub
Posted into Smartnews @smartnews-Smithsonianmag
Atlantic hurricane season starts Monday, and preparedness experts say older adults living alone face added risks during major storms.
5 key steps to help your loved ones prepare:
https://theconversation.com/5-tips-for-hurricane-disaster-planning-with-aging-parents-starting-now-before-the-storms-254917
Via Neel Krishnaswami (https://semantic-domain.blogspot.com/2013/04/john-c-reynolds-june-1-1935-april-28.html):
"So this was John [Reynolds]'s definition of a successful language design: if you have a user who has used it to write a program you couldn't have, your language has succeeded, since it has helped a fellow human being solve one of their own particular problems.
I've always liked his definition, since it manages to avoid an obsession with nose-counting popularity metrics, while still remembering the essentially social purpose of language design."
Nearly 1/3 of urban water is lost before it reaches the tap 🚰
A new study explores circular water systems and shows that fixing leaks and recycling treated wastewater could reduce urban freshwater withdrawals by 60%.
Learn more 👉 https://link.europa.eu/cN7fQ9
#WaterWiseEU
---
https://nitter.net/EU_ENV/status/2059217613666849222#m
There is no digital sovereignty without ODF
"The HUGO Gene Nomenclature Committee was forced in 2020 to rename dozens of human genes – including SEPT1 and MARCH1 – because Excel kept silently converting their symbols to dates. Rather than going to Microsoft and demanding a bug fix, scientists preferred to throw years of established nomenclature down the drain to avoid upsetting Redmond. A revealing precedent."
https://blog.documentfoundation.org/blog/2026/05/15/no-digital-sovereignty-without-odf/
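As a rough illustration of the problem described in the quote, the sketch below flags gene symbols that a spreadsheet could plausibly read as dates. The month-prefix list and the `looks_like_date` helper are assumptions made up for this example. They are a heuristic, not a description of Excel's actual conversion rules.

```python
import re

# Assumed list of month-like prefixes. This is a heuristic, not Excel's real logic.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# Try longer prefixes first so "MARCH1" is matched as MARCH + 1, not MAR + "CH1".
_ALTERNATION = "|".join(sorted(MONTH_PREFIXES, key=len, reverse=True))
DATE_LIKE = re.compile(rf"^({_ALTERNATION})[-/ ]?(\d{{1,2}})$", re.IGNORECASE)


def looks_like_date(symbol):
    """Return True if the symbol is a month-like prefix followed by a day number 1-31."""
    match = DATE_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(2)) <= 31


symbols = ["SEPT1", "MARCH1", "TP53", "BRCA1"]
flagged = [s for s in symbols if looks_like_date(s)]
print(flagged)
```

A check like this could run before a gene list is exported to a spreadsheet, so that risky symbols are quoted or stored as text.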
@resist
"1. AI can erode human judgment by offering instant answers that weaken creativity, discernment and the patience needed to seek truth.
2. AI can simulate care without relationship, making vulnerable users mistake artificial empathy for genuine human connection.
3. AI can deepen inequality because data, computing power and regulatory influence are concentrated among a small number of actors.
4. AI can destabilize democracy by amplifying disinformation and blurring the line between fact and fiction.
5. AI can make war easier by speeding up lethal decisions and distancing humans from responsibility. Leo's starkest line: 'No algorithm can make war morally acceptable.'"
https://www.axios.com/2026/05/25/pope-leo-xiv-ai-humanity-war-jobs-warning
Killing a country’s leader may disrupt a government for a moment, but history shows it rarely makes the government collapse.
https://theconversation.com/the-war-in-iran-again-points-to-the-strategic-shortcomings-of-assassination-as-policy-of-foreign-affairs-282743
Yet another meaningful intervention
In 2025 Italy had more than 220 cycling deaths, Spain 46. Which country is changing its laws to improve cyclists' safety?
Spain, obviously, because 46 (out of 48 million inhabitants) is still considered too many.
Since many of the deaths happen on roads outside urban areas, the new law allows road shoulders to be turned into segregated cycle lanes to improve #sicurezzaStradale (road safety):
Here, meanwhile, we put licence plates on scooters: a useless measure for a vehicle involved in a homeopathic share of road collisions 🤦♂️
Today is International Academic Freedom Day 🔍
At the ERC, science is free to follow the evidence wherever it leads. Researchers choose the question. Excellence is the only criterion.
👉 bit.ly/4toSHTf
@EUScienceInnov @StudentAFAF #ProtectWhatMatters
---
https://nitter.net/ERC_Research/status/2057109649791545670#m
Studying how people interact, in the past (#CulturalAnalytics) and today (#EdTech #Crowdsourcing). Researcher at @IslabUnimi, University of Milan. Bulgarian activist for legal reform with @pravosadiezv. I use dedicated accounts for different languages.
My profile is searchable with https://www.tootfinder.ch/