@freemo or anyone: I'm trying to find out how connected the graph of fediverse nodes is. For instance, if it's fully connected (every node connected to every other), that might imply a scalability limit to the number of nodes.
I guess another way of framing the question is whether two nodes A and B are necessarily directly connected because a user in node A follows a user in node B.
@freemo @zpartacoos Very interesting. So the graph of nodes is "induced" by the graph of user follows.
So a node with a larger number of users would probably need to connect to correspondingly more other nodes because of the disparate social networks of its users. On the other hand, small nodes with, say, a hundred users who each follow less than a hundred others would only need to connect to at most 10k other nodes, and probably a lot less. (If there were billions of users, I think discovery would become a problem unless there are some nodes with very many users.)
@underlap
There are relays to help with discovery.
@zpartacoos
@freemo I can't find a definition of an ActivityPub relay - just various lists of them and nothing in the spec. Do you have a good reference, please?
@zpartacoos
@underlap
relay is just instance without UI
@odesa Ok. So presumably any actors are predefined or similar?
Hi glyn, in case you hadn't found them, I co-maintain some #ActivityPub watchlists:
https://git.feneas.org/feneas/fediverse/-/wikis/Watchlist-for-ActivityPub-developer-resources
https://git.feneas.org/feneas/fediverse/-/wikis/Watchlist-for-ActivityPub-apps
And the central community hub of fedi devs is #SocialHub at:
https://socialhub.activitypub.rocks
There's a crowdsourced Guide for AP implementers here:
https://socialhub.activitypub.rocks/t/guide-for-new-activitypub-implementers/479
@underlap @freemo I don't know the specific scaling limitations which you have in mind but I suspect that if you have two users, say, X in server A and Y in server B then if one follows the other we have a link which in this hypothetical fediverse of cardinality 2 is not a problem as every post/publish is only a single channel but if we have even just 10,000 fully connected nodes in distinct servers that'd be 49,995,000 connections/edges to publish to. Is this the problem you're concerned about?
@zpartacoos @freemo I haven't got a specific limit in mind, except perhaps for the number of users to be in the billions (for mass migration from Facebook, Twitter, etc.), but my thinking so far is as follows.
Suppose the graph of nodes needs to be fully connected (note: this is still an open question for me). Let's also suppose each node has one IP address (it may be possible to use a cluster of IP addresses to represent a single node, but let's keep things simple for now).
Then a given node can have at most 64k (probably less in practice) TCP/IP connections open at any one time (because each requires a 32 bit ephemeral port number). So if there are, say, 100k nodes or more (e.g. to host 1-2 billion users), then connections will need to be re-established periodically in order to keep "subscribers" up to date with "publishers" (pardon my terminology - I must read the ActivityPub spec!).
This will result in a trade-off between CPU consumption and the timeliness of delivery of messages. E.g. if we want to display any new post within one minute of it being published, we'd potentially need to cycle round all possible connections once per minute.
(If, on the other hand, the graph of nodes isn't fully connected, then some intermediate nodes would need to do more work to forward messages.)
@underlap
The main problem to answer that question is to obtain the graph... I'm guessing that by node you mean instances, so you'd need to know which instances federate with which other instances... And there's no way to do that exactly.
If you had the graph, though, you'd want to look at the strongly/weakly connected components.
On the other hand, I'm not sure what you mean by scalability limit so can't help you with that.
@freemo
By scalability limit I mean some practical limit on the number of instances and/or users. It's looking like there really are no such limits unless we end up with some kind of uber instances with millions of users, which might then need to connect to hundreds of thousands of other instances (in a fictional future when the fediverse has taken over from traditional social media ;-) ).
@underlap
OK, I think I see what you mean... I'm not sure that this would be an issue if we move to multicast for broadcast but it may be a problem for a big instance that may need to handle several millions incoming messages from other instances... Maybe that will help ensure that we stick to instances with circa 1k users (in which case we would still need on the order of a million servers, but they would probably be unlikely to all federate due to language and topic preferences)
@freemo
@underlap
It is not fully connected. Nodes federate with eachother whenever a user from one node follows a user from another. Only federated servers that share atleast one pair of mutual followers sync up but in that case thry sync up all users and all users from one server will appear in the federated timeline of the other.
You also have servers that operate whitelists or blacklists that block other servers.