On multiple occasions I've listened to instance admins talk about high S3 costs. The sheer amount of data balloons the more activity your server sees; I get it.
What I don't get is whether there's some fedi ethical reason I'm unaware of that makes everybody insist on setting up an S3 media cache (and then immediately complain about it).
Y'all want to know what the rest of the web does? It hosts its own uploaded media and links out to everything else...
Am I wrong for thinking that this established practice (especially for smaller, bootstrapped instances) is perfectly cromulent from an ops perspective? I'm honestly asking, because I come from a time before DevOps and microservices were a thing, when we all hosted our crud on servers we had physical access to (though VPSes are great!).
Yes, I totally get the benefits of having a CDN, especially for global access, but nobody's setting up a globally distributed CDN for their dinky Mastodon instance.
@devnull I suspect it's because if something goes "viral", there could be a stampede of requests for that resource, which might accidentally wind up DoSing the instance hosting it.
Actually, though, there are (or were) quite a few instances that don't host remote resources themselves and instead load them from the original instance, although they seem to be on the smaller side.
From a liability perspective (although there is already a certain amount of leeway there), linking is superior to proxying content.