**Marian** @sendung@mastodon.social · Dec 22, 2021, 14:52

In recent weeks I've been experimenting with an approach to finding people worth following on Mastodon.

It's pretty brute-force and not very clever, but better than nothing. Roughly, it goes like this:

1. Crawl the public timelines of many public instances to find active accounts.
2. Per account found, collect the 100 latest toots.
3. Remove a few obvious non-people (bots).
4. Make it searchable, rank the results by some factors.
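
Step 1 could look roughly like the sketch below. It assumes the standard Mastodon REST endpoint `GET /api/v1/timelines/public` with its `local` and `limit` parameters; the function names are made up for illustration, and this is not the code used for the crawl described here.

```python
import json
import urllib.parse
import urllib.request


def public_timeline_url(instance: str, limit: int = 40) -> str:
    """Build the URL of an instance's local public timeline."""
    query = urllib.parse.urlencode({"local": "true", "limit": limit})
    return f"https://{instance}/api/v1/timelines/public?{query}"


def accounts_from_statuses(statuses: list[dict]) -> dict[str, dict]:
    """Map account URL -> account object for every status author seen."""
    accounts = {}
    for status in statuses:
        account = status.get("account") or {}
        if "url" in account:
            accounts[account["url"]] = account  # duplicates collapse here
    return accounts


def crawl_instance(instance: str) -> dict[str, dict]:
    """Fetch one page of the public timeline (needs network access)."""
    with urllib.request.urlopen(public_timeline_url(instance), timeout=10) as resp:
        return accounts_from_statuses(json.load(resp))
```

Keying by account URL means an account that shows up on several timelines is only counted once. A real crawler would also need paging, rate limiting, and error handling for servers that only partly implement the API.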

The results are quite mixed. Will write about a few things I found.

**Marian** @sendung@mastodon.social · Dec 22, 2021, 15:03

The method yielded 28K active accounts. An active account by my definition is one that has published a status in the last 30 days (at the time of crawling).

Of these, 21K have published 100 toots or more.

When removing bots, 19K remain.
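
The three counts above amount to a simple filter funnel, which might be sketched as follows. `statuses_count` and `bot` are fields of the Mastodon account entity; `latest_created_at` is a hypothetical field holding the `created_at` timestamp of the newest collected toot. Relying only on the `bot` flag is an assumption, since the actual bot removal may have used other signals.

```python
from datetime import datetime, timedelta, timezone


def is_active(latest_created_at: str, crawled_at: datetime, days: int = 30) -> bool:
    """True if the newest status is at most `days` old at crawl time."""
    latest = datetime.fromisoformat(latest_created_at.replace("Z", "+00:00"))
    return crawled_at - latest <= timedelta(days=days)


def funnel(accounts: list[dict], crawled_at: datetime) -> tuple[int, int, int]:
    """Return counts of (active, active with >= 100 toots, of those non-bots)."""
    active = [a for a in accounts if is_active(a["latest_created_at"], crawled_at)]
    prolific = [a for a in active if a["statuses_count"] >= 100]
    people = [a for a in prolific if not a.get("bot", False)]
    return len(active), len(prolific), len(people)
```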

**Marian** @sendung@mastodon.social · Dec 22, 2021, 15:07

The crawl started on December 2nd and has been running until now, with interruptions. It currently covers 4200 instances. By no means all of those are Mastodon instances.

An unexpectedly large number of servers respond to some of the Mastodon API requests, but not all. Some are Mastodon forks. Anyway, for my purpose (finding people to follow from a Mastodon account) that wasn't really important.

**Marian** @sendung@mastodon.social · Dec 22, 2021, 15:10

The first obvious challenge I ran into: how to find people who post in the languages I understand?

Mastodon instances have a language code. But that code doesn't mean a lot, because users can post in whatever language they like. Users also have a language code (I currently don't know where this is configured). Again, users post in whatever language they like. Many users use several languages, depending on who they want to reach, or they write in one language but re-blog content in others.

**Marian** @sendung@mastodon.social · Dec 22, 2021, 15:14

My solution to this is "dominant language detection". Basically I throw all 100 recent toots of an account into language detection. The top guess by the detector is the language I consider dominant. As a first attempt, this works well enough.
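
A minimal sketch of the idea, not the actual implementation: all toots of an account are stripped of HTML, concatenated, and handed to a detector, and the top guess wins. The stopword-based `rank_languages` below is a toy stand-in for a real language detection library, and every name in the block is made up for illustration.

```python
import html
import re
from collections import Counter
from typing import Optional

# Toy stand-in for a real detector: tiny stopword lists, illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "de": {"der", "die", "das", "und", "ist", "nicht"},
}
TAG = re.compile(r"<[^>]+>")


def plain_text(content_html: str) -> str:
    """Strip HTML tags and entities from a status `content` field."""
    return html.unescape(TAG.sub(" ", content_html))


def rank_languages(text: str) -> list[tuple[str, int]]:
    """Rank candidate languages by stopword hits, best first."""
    words = Counter(re.findall(r"\w+", text.lower()))
    scores = {lang: sum(words[w] for w in sw) for lang, sw in STOPWORDS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def dominant_language(toots_html: list[str]) -> Optional[str]:
    """Concatenate all toots and return the detector's top guess, if any."""
    text = " ".join(plain_text(t) for t in toots_html)
    lang, score = rank_languages(text)[0]
    return lang if score > 0 else None
```

Concatenating first gives the detector one long text, at the cost of hiding any minority languages the account also uses.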

A better approach would likely detect all the languages an account uses. Then, at some point, it's up to me as a user to decide whether I want to follow someone who posts in English (OK for me) and Japanese (not readable for me).
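
One way this could look, as a sketch under assumptions: detect the language of each toot separately and report every language above a minimum share. `detect` is any callable that returns a language code or `None`; the 10% threshold is an arbitrary choice, and both names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def language_shares(
    texts: Iterable[str],
    detect: Callable[[str], Optional[str]],
    min_share: float = 0.1,
) -> dict[str, float]:
    """Share of toots per detected language, dropping rare languages."""
    counts = Counter(lang for lang in map(detect, texts) if lang is not None)
    total = sum(counts.values())
    return {
        lang: n / total
        for lang, n in counts.most_common()
        if n / total >= min_share
    }
```

An account that comes out as something like `{"en": 0.75, "ja": 0.25}` could then be shown with both languages, leaving the decision to the person searching.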

**Marian** @sendung@mastodon.social · Dec 22, 2021, 15:16

I learned more about the difficulties of language detection.

(1) The more languages you have as candidates, the less confident your detection result will be.

(2) The shorter the text, the less confident the result. Some toots are definitely too short to guess. (Some toots don't even contain language, they are all media.)
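
Point (2) suggests filtering toots before detection. The sketch below assumes that links, mentions, and hashtags arrive as `<a>` elements in a status's HTML `content`, as they do in Mastodon's own output, and drops them together with the remaining markup. The 20-character minimum is an arbitrary guess, and `detectable_text` is a made-up helper name.

```python
import html
import re
from typing import Optional

ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.S | re.I)  # links, mentions, hashtags
TAG = re.compile(r"<[^>]+>")


def detectable_text(content_html: str, min_chars: int = 20) -> Optional[str]:
    """Return cleaned text, or None if too little is left to guess a language."""
    without_links = ANCHOR.sub(" ", content_html)
    text = html.unescape(TAG.sub(" ", without_links))
    text = " ".join(text.split())  # collapse whitespace
    return text if len(text) >= min_chars else None
```

Media-only toots have empty `content` and fall out automatically, as do toots that consist only of a mention or a link.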

**Martin Ruskov** @mapto@qoto.org · May 23, 2024, 09:19

@sendung hey, I'm interested in studying how language communities develop across the fediverse. Ideally, I'd like to do something similar to what you're doing and publish some analysis of multilingualism across instances and individuals (both as an academic publication and as a web service along the lines of https://fedidb.org/ and https://fedistats.cc/). I personally participate in Italian and Bulgarian communities, but also follow a number of English, German and Russian accounts.

Clearly this interest is entirely at the aggregated level, but for small communities we'll need to be careful about the privacy implications of published data.

So, a few questions:
1. Do you still check your account?
2. Are you interested in collaborating on this?
3. Are you willing to share some work you've done (code, data, whatever)?

Thanks!

**℣𐎊𐌂§** @vics@pleroma.debian.social · May 23, 2024, 10:38

@mapto @sendung I would like to find posts (i.e. potential friends) in my exotic language: be. Are there any services available to help with that?

**Martin Ruskov** @mapto@qoto.org · May 23, 2024, 11:30

@vics @sendung I'm certainly looking for the same, and there's no really easy way of doing that.

The workaround I'm currently using is:
1. Find instances with language of interest, e.g. https://fedistats.cc/nodes?sort=daily_posts&q=lang%3Abe
2. Check out their timeline, e.g. https://vkl.world/public/local

But I suppose your language is like mine, and you've exhausted the few relevant instances long ago. Plus, most of the community is dispersed elsewhere anyway.

A possible step 3 could be https://followgraph.vercel.app/, but stupidly it looks up who your contacts are following, and not who follows them. If you also think it should be the other way round, consider this issue: https://github.com/gabipurcaru/followgraph/issues/18

**Marian** @sendung@mastodon.social · Jul 02, 2024, 19:30

@mapto Hey there!

> Do you still check your account?

The honest answer here has to be: no. I just saw your message from more than one year ago. Sorry about that.

Did your work yield any results you'd like to share? I haven't invested any more effort here since I posted initially in Dec 2021.
