In recent weeks I've been experimenting with an approach to finding people worth following on Mastodon.
It's pretty brute force and not very clever, but better than nothing. Roughly it goes like this:
1. Crawl the public timelines of many public instances to find active accounts.
2. Per account found, collect the 100 latest toots.
3. Remove a few obvious non-people (bots).
4. Make it searchable, rank the results by some factors.
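Steps 1 to 3 could be sketched roughly like this in Python. This is only an illustration, not the crawler actually used: the function names are made up, and filtering on the `bot` flag of the account object is an assumption about how obvious bots might be dropped. The endpoints `/api/v1/timelines/public` and `/api/v1/accounts/:id/statuses` are part of the Mastodon REST API, which returns at most 40 statuses per request, so collecting 100 toots needs paging with `max_id`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def public_timeline_url(instance, limit=40):
    """URL of one page of an instance's local public timeline."""
    query = urlencode({"local": "true", "limit": limit})
    return f"https://{instance}/api/v1/timelines/public?{query}"


def active_accounts(statuses):
    """Collect distinct accounts from status objects, skipping flagged bots.

    Assumption: the account's `bot` flag is used as the bot filter.
    """
    accounts = {}
    for status in statuses:
        account = status["account"]
        if account.get("bot"):
            continue
        accounts[account["url"]] = account
    return accounts


def fetch_latest_toots(instance, account_id, wanted=100, get_json=fetch_json):
    """Page backwards through an account's statuses until `wanted` are collected."""
    toots, max_id = [], None
    while len(toots) < wanted:
        query = {"limit": 40}
        if max_id is not None:
            query["max_id"] = max_id
        url = (f"https://{instance}/api/v1/accounts/{account_id}/statuses"
               f"?{urlencode(query)}")
        page = get_json(url)
        if not page:
            break  # no older statuses left
        toots.extend(page)
        max_id = page[-1]["id"]  # the next page starts below the oldest id seen
    return toots[:wanted]
```

Calling `active_accounts(fetch_json(public_timeline_url("example.social")))` would give the accounts seen on one timeline page of a hypothetical instance. Note that the account id passed to `fetch_latest_toots` is local to the instance being queried.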
The results are quite mixed. Will write about a few things I found.
The crawl started on December 2nd and has been running since, with interruptions. It currently covers 4200 instances. By no means all of those are Mastodon instances.
A surprising number of servers respond to some of the Mastodon API requests, but not all. Some are Mastodon forks. Anyway, for my purpose (finding people to follow from a Mastodon account) that wasn't really important.
The first obvious challenge I ran into: how to find people who post in the languages I understand?
Mastodon instances have a language code. But that code doesn't mean a lot, because users can post in whatever language they like. Users also have a language code (I currently don't know where this is configured). Again, users post in whatever language they like. Many users use several languages, depending on who they want to reach, or they write in one language but re-blog content in others.
My solution to this is "dominant language detection". Basically I throw all 100 recent toots of an account into language detection. The top guess by the detector is the language I consider dominant. As a first attempt, this works well enough.
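A minimal sketch of that idea in Python. The post doesn't say which detector is used, or whether the toots go in one by one or as one block of text, so this version assumes per-toot detection with a simple majority vote. `detect` stands for any function that maps a text to a language code (for example `detect` from the `langdetect` package); `toy_detect` below is a made-up stand-in so the example runs without dependencies.

```python
from collections import Counter


def dominant_language(toots, detect):
    """Return the language code detected most often across the toots, or None."""
    votes = Counter()
    for text in toots:
        text = text.strip()
        if not text:
            continue
        try:
            votes[detect(text)] += 1
        except Exception:
            # Some detectors raise on very short or emoji-only text; skip those.
            continue
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def toy_detect(text):
    """Hypothetical stand-in detector: 'de' if the word 'und' occurs, else 'en'."""
    return "de" if " und " in f" {text.lower()} " else "en"


toots = ["Guten Morgen und hallo", "Kaffee und Kuchen", "Good morning"]
print(dominant_language(toots, toy_detect))  # prints: de
```

Keep in mind that the API returns toot content as HTML, so tags would need to be stripped before detection.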
A better approach would likely detect all the languages an account uses. Then, at some point, it's up to me as a user to decide whether I want to follow someone who posts in English (OK for me) and Japanese (not readable for me).
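One way this could look, again only as a sketch: count the detected language per toot, keep every language above a minimum share, and let the user decide based on how much of an account they can read. The function names and both thresholds (`min_share`, `min_readable`) are arbitrary choices for illustration, and `detect` is any text-to-language-code function as before.

```python
from collections import Counter


def language_shares(toots, detect, min_share=0.1):
    """Map each detected language to its share of the toots, dropping rare ones."""
    counts = Counter(detect(text) for text in toots if text.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        lang: count / total
        for lang, count in counts.most_common()
        if count / total >= min_share
    }


def mostly_readable(shares, understood, min_readable=0.5):
    """True if languages the reader understands cover at least `min_readable`."""
    readable = sum(share for lang, share in shares.items() if lang in understood)
    return readable >= min_readable
```

For an account whose toots are detected as three quarters English and one quarter Japanese, `language_shares` would return both languages with their shares, and `mostly_readable(shares, {"en", "de"})` would be true at the default threshold.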
@sendung hey, I'm interested in studying how language communities develop over the fediverse. Ideally, I'd like to do something similar to what you're doing and publish (both academic publication and web service along the lines of https://fedidb.org/ and https://fedistats.cc/) some analysis of multilingualism across instances and individuals. I personally participate in Italian and Bulgarian communities, but also follow a number of English, German and Russian accounts.
Clearly this interest is entirely at the aggregated level, but for small communities, we'll need to be careful regarding the privacy implications of published data.
So, a few questions:
1. Do you still check your account?
2. Are you interested in collaborating on this?
3. Are you willing to share some work you've done (code, data, whatever)?
Thanks!
@vics @sendung I'm certainly looking for the same and there's no really easy way of doing that.
The workaround I'm currently using is:
1. Find instances with language of interest, e.g. https://fedistats.cc/nodes?sort=daily_posts&q=lang%3Abe
2. Check out their timeline, e.g. https://vkl.world/public/local
But I suppose your language is like mine, and you've exhausted the few relevant instances long ago. Plus, most of the community is dispersed elsewhere anyways.
A possible step 3 could be https://followgraph.vercel.app/, but stupidly it looks up who your contacts are following, and not who follows them. If you also think it should be the other way around, consider this issue: https://github.com/gabipurcaru/followgraph/issues/18
@mapto Hey there!
> Do you still check your account?
The honest answer here has to be: no. I just saw your message from more than one year ago. Sorry about that.
Did your work yield any results you'd like to share? I haven't invested any more effort here since I posted initially in Dec 2021.