Pinned post

The case of “vegetative electron microscopy” illustrated here shows what is badly needed in current research, and its implications reach far beyond this one example. We need tools that help us curate huge corpora. We need to be able to trace outputs back to the training data and understand the specific, often surprising, reasons in the model input that cause a particular output.

If anyone is interested in collaborating on this, I'm in: I have done some small-scale experiments and have already submitted a grant proposal.
theconversation.com/a-weird-ph

I think I now know where to draw the line between "good" and "bad" , and possibly (or rather obviously) the same for . It's simply whether the input data has been constructed rigorously. Put this way it's the most obvious statement ever, but somehow have convinced us all that they advance research by recklessly scraping , and who knows what else (they keep their training data secret).

What is good science in computational linguistics? Well, open data is a step towards it. But open and crap is not a solution. We need to actually _know_ and manage the data. And nobody in their right mind would want to plough through toxic data to clean it. We've all heard the horrors of Kenyan data workers who do it for money and still suffer doing it.

But better (yes, also smaller) corpora are of interest to scholars in the humanities and the social sciences. Think of textcreationpartnership.org or mlat.uzh.ch. Yes, they are too big for individual researchers or even teams to handle, but we have the organisational and technological infrastructure to work on them collectively. We've been doing it for ages and we will continue doing it. We just need to do it together.

And this is the goal of the European Research Council project proposal I'm submitting at this very moment.

Today at , I will be presenting our work on the evaluation of the historical adequacy of masked language models (MLMs) for . There are several models like this, and they represent the current state of the art for a number of downstream tasks, like semantic change and text reuse detection. However, a historical researcher, philologist or other scholar would want to be sure that such models really represent the historical period of interest. For example, it would be an embarrassing hallucination if St. Augustine showed up in the context of the Roman senate.

Our evaluation confirms a known problem: LLMs, and masked models in particular, are trained on corpora without attention to historical periods. Unlike what we found in our other research on Early Modern English, this problem leaves the models barely distinguishable in their ability to generate text for a given historical period. Even though history is a case where it is most obvious when models go wrong, this type of contamination is a known problem for LLM training overall: think of different legal jurisdictions using the same language, dialects in programming languages, etc.

This research was generously supported by AgileLab.

The full paper is available at:
anthology.ach.org/volumes/vol0

Our paper on the values found in fairy tales from some European countries has been published. We studied how values are explicitly present in tales from Germany, Italy and Portugal using various NLP techniques, most notably Word2Vec and Word Embedding with a Compass. We visualise synchronic semantic variation to show certain differences based on observations of the corpus, some of them already noted in previous literature. One example discussed in our findings is how motherhood in Germany is strongly related to generosity, whereas in Italy and Portugal it has a stronger relationship to wisdom.

Full text available at: aclanthology.org/2023.nlp4dh-1

@folklore @linguistics @bookstodon

In the morning session today Sara Sullam and I will be presenting our work on exploring nominal (in our case study, bibliographical) data. We do it by borrowing a method from educational research: the notion of phenomenographic variation. 🧵

Hey! If you'd like to watch the Canadian Oscar winner for Best Short Animated Film “The Girl Who Cried Pearls” you can find it on the National Film Board's website!

It's super cool.
Just 17 minutes.

Congrats to the Montreal-based creators and all who were involved!!
#Oscars #AnimatedMovie #animation #Montreal
nfb.ca/film/the-girl-who-cried

Why do I have the feeling that the word "nihilism" was difficult to avoid in this quote:
“I think Silicon Valley is immersed in a titanic battle between the hippie value system of the Steve Jobs generation and the Ayn Randian libertarian values of the Peter Thiel generation.”
techcrunch.com/2026/03/15/the-

Was intended to be about Kremlin propaganda, which is by far the strongest, but Trump and the world are getting so crappy that it’s difficult to abstain from ridiculing them too.

Writing out a conversation I’ve been having a lot at this conference:

Things in US science are far, far worse than people know.

Far worse than even other scientists know.

1/

@jerry I am an aerospace engineer and literally have a PhD in this stuff.
I can confirm that space is essentially the worst place imaginable you could locate a datacenter.
Nvidia is 100% cashing in on this trend/bubble just because people are willing to entertain the idea, and by the time it all crashes, Nvidia doesn't care because they already sold their units.

David Borenstein, Pawel Talankin – „Ein Nobody gegen Putin“ (2025)

Authoritarian systems rarely begin with violence. They begin with rituals, flag ceremonies, songs and curricula. With sentences spoken in classrooms until they sound like truth. The film by David Borenstein and Pawel Talankin understands the banal mechanics of power remarkably well. Not as a spectacular revelation, but as a slow shifting of reality. It is not often that you can watch an Oscar-winning documentary in the Mediathek even before the award ceremony. And one co-produced by @ZDF and ARTE at that. (ARTE, new!)

To the blog: nexxtpress.de/mediathekperlen/

If you're a young researcher with a great idea looking for a challenge, apply for the EU TalentOn contest by 30 April!

Don't miss this unique experience to develop market-ready solutions in teams, boost your entrepreneurial spirit & win prizes.

➡️ eutalentonbrest2026.eu

Despite its name, “The Tibetan Book of the Dead” isn’t just about death.

The famous Tibetan Buddhist text is really about the “bardos” — life’s in-between states — and how moments of uncertainty can become opportunities for transformation.
theconversation.com/the-tibeta

It’s a message Talankin originally hoped to share with fellow Russians. But he now believes the film speaks to a far wider audience than he could ever have anticipated when he began filming.

He points to a joke circulating in Eastern Europe: the Belarusians say they and the Russians are watching the same TV series, only Russia is a few episodes behind.

“I am sorry to tell you,” he says, “that America has begun watching this series, too.”

The first time, we counted illegally parked cars. This time we tried to see how people imagine the city. More than 100 teams of interviewers collected the ideas and wishes of 1000 citizens.

A preview of the results is available here www.saichepuoi.it/vialibera-immaginazione/

37 million metric tons of food waste end up in U.S. landfills each year.

Sending that waste to wastewater treatment plants instead would eliminate climate emissions, and generate renewable energy and fertilizers.

And most modern sewage plants wouldn’t need any modifications to do it.
theconversation.com/how-sewage

A philosophy professor taught Aristotle's virtue ethics while conducting her own master class in deception. She shares how small lies and omissions nearly destroyed her relationship—and what she learned about truthfulness.
theconversation.com/i-was-teac

(2/2) What I did know was that it was to be guided by the overarching values of fuelling creativity, driving collaboration and igniting compassion. These values are even more important in the age of AI, a technology that has the same potential to liberate and cause harm #Web37

(1/2) 37 years ago today I submitted my proposal for the World Wide Web 🎂. Today, Rosemary & I spoke with students in New Orleans at Walter Isaacson's Digital History Class at Tulane University. I was asked, as I often am, if I ever could have foreseen where we’d be today. I could not.

This is a soundbite from a recent webinar we hosted, titled "What Americans can learn from other nonviolent civil activism movements"

The full recording: youtube.com/watch?v=DtUO2eEF3N

New apps, updates and AI tools always promise to make tech easier.

But for many older adults, constant changes make devices harder to use, shifting the burden onto family members who provide “technology caregiving.”

theconversation.com/constant-t

The privacy disaster of Meta glasses...

but it is already around us with any camera-enabled device, such as phones and smart TVs

svd.se/a/K8nrV4/metas-ai-smart

Waterfall ‘recycling’ at Neist on the Isle of Skye today. The strong westerly wind blowing the falling water back onto the cliffs #IsleOfSkye #Scotland
