A case where trying to make software more reliable caused downtime.
In lib.rs, back when #Rust had only 20K crates, I used 16-bit counters for stats. I knew it would eventually outgrow them, so I used checked arithmetic and made it panic on overflow, "so that I will notice when that happens".
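A minimal sketch of the "checked arithmetic, panic on overflow" pattern described above. The `Stats` struct and the `record_crate` method are hypothetical; the actual lib.rs code isn't shown in the post. `checked_add` returns `None` once a `u16` would exceed 65,535, and `expect` turns that into a panic.

```rust
use std::panic;

/// Hypothetical stats holder; not the real lib.rs type.
struct Stats {
    crate_count: u16,
}

impl Stats {
    /// Increment with checked arithmetic and panic on overflow,
    /// "so that I will notice when that happens".
    fn record_crate(&mut self) {
        self.crate_count = self
            .crate_count
            .checked_add(1)
            .expect("crate_count overflowed u16");
    }
}

fn main() {
    let mut stats = Stats { crate_count: u16::MAX - 1 };
    stats.record_crate(); // reaches 65_535, which still fits
    println!("{}", stats.crate_count);

    // One more increment panics, because checked_add returns None.
    let result = panic::catch_unwind(move || {
        let mut s = stats;
        s.record_crate();
    });
    assert!(result.is_err());
}
```

In a web service, every request that touches the counter hits that panic, which is how a "loud" overflow check becomes an outage.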

Well, it overflowed, and because by then the stats were used in so many places, it basically took the site down, at the worst possible moment, when I had no time to make a release.
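One way to still "notice" an overflow without taking the site down is to keep the check but make it non-fatal. This is a sketch of a swapped-in technique, not what the author did: on overflow the hypothetical counter stays at `u16::MAX` and prints a single warning. Widening the counter to `u32` or `u64` is the other obvious fix.

```rust
/// Hypothetical stats holder; `overflowed` records that the limit was hit.
struct Stats {
    crate_count: u16,
    overflowed: bool,
}

impl Stats {
    /// Increment, but saturate and warn once instead of panicking.
    fn record_crate(&mut self) {
        match self.crate_count.checked_add(1) {
            Some(n) => self.crate_count = n,
            None => {
                // Stay at u16::MAX; only emit the warning the first time.
                if !self.overflowed {
                    eprintln!("warning: crate_count saturated at {}", u16::MAX);
                    self.overflowed = true;
                }
            }
        }
    }
}

fn main() {
    let mut stats = Stats { crate_count: u16::MAX, overflowed: false };
    stats.record_crate(); // no panic; the counter just stops growing
    println!("{} {}", stats.crate_count, stats.overflowed); // prints "65535 true"
}
```

The trade-off is that the stats silently become wrong after the warning, so someone still has to watch the logs.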

@kornel I got to watch a hell of an Atlas-lift at Google years back.

Google used 16 bits for the unique field identifiers in protocol buffers.

It turns out they have one titanic proto that represents the sparse-formatted data in their central log infrastructure: you can log anything you want there, as long as it maps to a field in that proto.

... I can't remember right now whether the issue was that they found more than 65,000 things to log, or that they'd actually made the field ID signed in the proto interpreter code and found more than 32,000 things to log...
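The two thresholds in that half-remembered story correspond to the limits of unsigned and signed 16-bit integers: 65,535 and 32,767. This sketch only illustrates those limits. The helper names are hypothetical, and it says nothing about how Google's interpreter actually stored field IDs.

```rust
use std::convert::TryFrom;

/// Hypothetical check: does a field ID fit in an unsigned 16-bit slot?
fn fits_unsigned_16(id: u32) -> bool {
    u16::try_from(id).is_ok()
}

/// Hypothetical check: does a field ID fit in a signed 16-bit slot?
fn fits_signed_16(id: u32) -> bool {
    i16::try_from(id).is_ok()
}

fn main() {
    println!("u16 max: {}", u16::MAX); // 65535
    println!("i16 max: {}", i16::MAX); // 32767

    // A field ID of 40,000 fits as unsigned but not as signed.
    assert!(fits_unsigned_16(40_000));
    assert!(!fits_signed_16(40_000));

    // Reinterpreting the same bits as signed silently goes negative.
    let as_signed = 40_000u16 as i16;
    println!("40000 as i16: {}", as_signed); // -25536
}
```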

Qoto Mastodon
