Pinned post

The case of “vegetative electron microscopy” illustrated here shows what is badly needed in current research, and it has implications far beyond this one example. We need tools that help us curate huge corpora. We need to be able to trace an output back to the training data and understand which specific (and often surprising) features of the model input cause that particular output.

If anyone is interested in collaborating on this, I'm in: I have done some small-scale experiments and have already submitted a grant proposal.
theconversation.com/a-weird-ph

Pinned post

I think I now know where to draw the line between "good" and "bad" , and possibly (or rather obviously) the same for . It's simply whether the input data has been constructed rigorously. Put this way it's the most obvious statement ever, but somehow have convinced us all that they advance research by recklessly scraping , and who knows what else (they keep their training data secret).

What is good science in computational linguistics? Well, open data is a step towards it. But open and crap is not a solution. We need to actually _know_ and manage the data. And nobody in their right mind would want to plough through toxic data to clean it. We've all heard the horrors of Kenyan data workers who do it for money and still suffer doing it.

But better (yes, also smaller) corpora are of interest to scholars in the humanities and the social sciences. Think of textcreationpartnership.org or mlat.uzh.ch. Yes, they are too big for individual researchers or even teams to handle, but we have the organisational and technological infrastructure to work on them collectively. We've been doing it for ages and we will continue doing it. We just need to do it together.

And this is the goal of the European Research Council project proposal I'm submitting at this very moment.

Pinned post

Today at , I will be presenting our work on the evaluation of the historical adequacy of masked language models (MLMs) for . There are several such models, and they represent the current state of the art for a number of downstream tasks, such as semantic change and text reuse detection. However, a historical researcher, philologist or other scholar would want to be sure that such models really represent the historical period of interest. For example, it would be an embarrassing hallucination if St. Augustine showed up in the context of the Roman senate.

Our evaluation confirms a known problem: LLMs, and masked models in particular, are trained on corpora without attention to historical periods. Unlike in our earlier research on Early Modern English, here this problem leaves the models barely distinguishable in their ability to generate text appropriate to a given historical period. Even though history is a case where it is most obvious when models go wrong, this type of contamination is a known problem for LLM training overall: think of different legal jurisdictions using the same language, dialects in programming languages, etc.

This research was generously supported by AgileLab.

The full paper is available at:
anthology.ach.org/volumes/vol0

Pinned post

Our paper on the values found in fairy tales from some European countries has been published. We studied how values are explicitly present in tales from Germany, Italy and Portugal using various NLP techniques, most notably Word2Vec and Word Embedding with a Compass. We visualise synchronic semantic variation to show certain differences based on observations of the corpus, some of them already noted in previous literature. One example discussed in our findings is that motherhood in Germany is strongly related to generosity, whereas in Italy and Portugal it has a stronger relationship to wisdom.

Full text available at: aclanthology.org/2023.nlp4dh-1

@folklore @linguistics @bookstodon

Pinned post

In the morning session today Sara Sullam and I will be presenting our work on exploring nominal (in our case study - bibliographical) data. We do it by borrowing a method from educational research - the notion of phenomenographic variation. 🧵

Today is International Academic Freedom Day 🔍

At the ERC, science is free to follow the evidence wherever it leads. Researchers choose the question. Excellence is the only criterion.

👉 bit.ly/4toSHTf

@EUScienceInnov @StudentAFAF #ProtectWhatMatters

---
nitter.net/ERC_Research/status

Qualitative research matters because some things can’t be measured with numbers alone.

It helps explain how people actually feel, think and experience the world — and that’s why researchers say #AI still can’t replace researchers, even if it can organize some data.
theconversation.com/ai-intervi

"If you want to speed up processes, you need to make sure that the people that need to do the work have all the means to actually do the work.

This means that if your legal approval process is going slow, you take a look at what is needed to start a legal approval process. If they need to chase five different people for incomplete documents, you’re not going to speed up said process by adding more lawyers to the department.

One of the big lessons of The Goal is: ”bottlenecks should receive predictable, high-quality inputs”."

frederickvanbrabant.com/blog/2

No Starch Press has all their books on sale at 40% off until May 26th. You can get my books and many other great titles at nostarch.com. Use code LONGWEEKEND.

⚡️☀️🌱More homegrown clean energy = less dependence on imported oil and gas.

AccelerateEU supports this shift with measures setting electrification targets & removing barriers across the industrial, transport & building sectors.

➡️ link.europa.eu/Gmb4PF

#AffordableEnergy
---
nitter.net/Energy4Europe/statu

Join us in Bologna, Italy, from 4 to 6 November 2026 for Visualising Climate – the first global conference dedicated to data visualisation and the climate crisis.

visualisingclimate.com

In this era of an accelerating climate crisis, the role of clear, compelling and actionable communication has never been more critical.

Together, scientists, artists, designers, communicators, and journalists will explore the space where data meets storytelling, and art meets science.

#VisualisingClimate2026

Giving foreign aid may not boost a country’s image.

In South Korea, people who received U.S. COVID-19 vaccines didn’t view the U.S. more favorably, but they were more likely to support giving aid to other countries themselves.

theconversation.com/foreign-ai

You know exercise is good for you. So why is it still so hard to make yourself do it?

The biggest barrier often isn’t knowledge — it’s believing you can stick with it when life gets busy, stressful or exhausting.

🏃‍♀️ 🏋️‍♂️ 🚴‍♀️ 💪 🏃

theconversation.com/you-know-e

Teens aren't as disengaged as you may think.

Young people don’t all contribute in the same way, and understanding the broader picture is the starting point for adults who want to support them.
theconversation.com/teens-aren

re: hci.social/@chrisamaphone/1165

a cool trick i once learned is that you can often decipher the pragmatics of corporatespeak (and academic adminspeak) by negating its semantics

"Israel and Russia are also the only two countries in the world whose leaders are wanted by the International Criminal Court for alleged war crimes. (In both countries, one other official is also sought.)

Both countries hold occupied territories, and both have declared annexations that are not recognized internationally and appear to violate the UN Charter. The similarities go further, says Popova.

"Russia has no right to be confiscating and trading this grain," she said. "It's not Russia's grain, and so it's an international law violation known as the crime of pillage."

Israel, like Russia, exploits the resources of occupied lands for its own benefit."
cbc.ca/news/politics/grain-dis

Ladybirds might be your garden’s best ally.

These tiny red beetles feed on aphids and other harmful insects that damage your plants, acting as a natural, chemical-free form of pest control.

More: link.europa.eu/9V7vT9

Small actions. Big impact
#ForOurPlanet
---
nitter.net/EU_ENV/status/20523

Danish parents encourage unstructured play (especially the kind that lets children test their limits) as a way to help them grow into competent, independent adults.

More from a psychologist from Denmark:
theconversation.com/denmarks-h

@TheConversationUS "Chip manufacturing behaves less like a competitive commodity market and more like a layered oligopoly. Scale matters because the leading firms can reinvest in research, improve yields, secure equipment and deepen customer relationships. In the case of graphics processor chips, designers such as NVIDIA, which has 85% market share, depend on advanced semiconductor foundries such as TSMC, which has more than 70% market share, to manufacture chips using extreme ultraviolet lithography machines from ASML, a monopoly."

🌡️Europe is warming faster than any other continent.

In 2025, about 95% of the continent saw above-average temperatures.

To fight climate change we are building a digital twin of our planet to monitor & predict natural phenomena.

Explore DestinE → destination-earth.eu
---
nitter.net/DigitalEU/status/20

Tag:harassment_prevention=ask_angela - OpenStreetMap Wiki
wiki.openstreetmap.org/wiki/Ta
Amazing! I had just been wondering whether this can be mapped or whether I would have to start a proposal. Yay! It already exists. Obviously sad that it has to.
#OpenStreetMap
