@nielso Specifically, one problem is that any distributed system necessarily involves overhead. It trades efficiency and resource usage for the advantages of being distributed.
So the problem is: how do you take an already inefficient system and use it for resource-intensive media like video? It's bad enough trading text around, but video increases storage and bandwidth requirements by orders of magnitude.
It's a hard problem to solve, and it's not one that ActivityPub seemed really focused on addressing. So in the end this is a technical issue: the underlying technology of this platform isn't really cut out for video.