@freemo or anyone: I'm trying to find out how connected the graph of fediverse nodes is. For instance, if it's fully connected (every node connected to every other), that might imply a scalability limit to the number of nodes.
I guess another way of framing the question is whether two nodes A and B are necessarily directly connected because a user in node A follows a user in node B.
@freemo @zpartacoos Very interesting. So the graph of nodes is "induced" by the graph of user follows.
So a node with a larger number of users would probably need to connect to correspondingly more other nodes because of the disparate social networks of its users. On the other hand, small nodes with, say, a hundred users who each follow less than a hundred others would only need to connect to at most 10k other nodes, and probably a lot less. (If there were billions of users, I think discovery would become a problem unless there are some nodes with very many users.)
@underlap
There are relays to help with discovery.
@zpartacoos
@freemo I can't find a definition of an ActivityPub relay - just various lists of them and nothing in the spec. Do you have a good reference, please?
@zpartacoos
@underlap
relay is just instance without UI
@odesa Ok. So presumably any actors are predefined or similar?
@underlap @freemo I don't know the specific scaling limitations which you have in mind but I suspect that if you have two users, say, X in server A and Y in server B then if one follows the other we have a link which in this hypothetical fediverse of cardinality 2 is not a problem as every post/publish is only a single channel but if we have even just 10,000 fully connected nodes in distinct servers that'd be 49,995,000 connections/edges to publish to. Is this the problem you're concerned about?
@zpartacoos @freemo I haven't got a specific limit in mind, except perhaps for the number of users to be in the billions (for mass migration from Facebook, Twitter, etc.), but my thinking so far is as follows.
Suppose the graph of nodes needs to be fully connected (note: this is still an open question for me). Let's also suppose each node has one IP address (it may be possible to use a cluster of IP addresses to represent a single node, but let's keep things simple for now).
Then a given node can have at most 64k (probably less in practice) TCP/IP connections open at any one time (because each requires a 32 bit ephemeral port number). So if there are, say, 100k nodes or more (e.g. to host 1-2 billion users), then connections will need to be re-established periodically in order to keep "subscribers" up to date with "publishers" (pardon my terminology - I must read the ActivityPub spec!).
This will result in a trade-off between CPU consumption and the timeliness of delivery of messages. E.g. if we want to display any new post within one minute of it being published, we'd potentially need to cycle round all possible connections once per minute.
(If, on the other hand, the graph of nodes isn't fully connected, then some intermediate nodes would need to do more work to forward messages.)
@underlap
It is not fully connected. Nodes federate with eachother whenever a user from one node follows a user from another. Only federated servers that share atleast one pair of mutual followers sync up but in that case thry sync up all users and all users from one server will appear in the federated timeline of the other.
You also have servers that operate whitelists or blacklists that block other servers.