At present it appears the FAA NOTAM outage was caused by a corrupt input file.

Jesus wept.

It's 2023 and we still haven't learned that ALL INPUTS MUST BE VALIDATED.

I have a rant and a half on the subject, but if you mutants could please tattoo that backwards on your foreheads so you see it each time you look in the mirror, that'd make me really happy, you dig?

@rob (Some details changed to protect the actual implementation, and definitely not because it's been years and I forgot them. ;) ).

Once upon a time, Google had a significant outage, lasting more than a few minutes, because someone pushed a configuration file for the outer-layer web traffic handlers that listed an empty array as the set of valid handlers.

At the time, the system would cheerfully take that input to mean that every single frontend server was in maintenance mode and should hand off 100% of its traffic to other available servers. Google.com requests slowed to a trickle as servers began thrashing, desperately searching for a peer that wasn't in maintenance mode, just sloshing requests back and forth like a bubble bath. Damage was, fortunately, mitigated by the rollout being slow (whether that was intentional for self-protection or an inefficiency that nobody ever got around to fixing, I can't remember).

Anyway, that file was technically valid, but afterward they narrowed the definition of "valid" to "the number of active handlers must be greater than zero." I can't remember what they did about the rare circumstance where it was *correct* to say none were valid, though.
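For the curious, here's a minimal Python sketch of that kind of "valid means non-empty" rule. Everything in it (`FrontendConfig`, `validate_config`, `ConfigError`, the `allow_empty` override) is hypothetical and made up for illustration; it is not Google's actual code, and the explicit override flag is just one plausible way to handle the legitimately-empty case, not what they actually did.

```python
from dataclasses import dataclass, field


class ConfigError(ValueError):
    """Raised when a pushed config fails validation (hypothetical)."""


@dataclass
class FrontendConfig:
    # Hypothetical schema: names of handlers that are active (not in maintenance).
    active_handlers: list[str] = field(default_factory=list)
    # Hypothetical escape hatch for the rare case where "none active" is intended.
    allow_empty: bool = False


def validate_config(cfg: FrontendConfig) -> FrontendConfig:
    """Reject configs that parse fine but would be dangerous to apply."""
    # An empty list is syntactically valid but would put every frontend
    # into maintenance mode, so refuse it unless explicitly overridden.
    if not cfg.active_handlers and not cfg.allow_empty:
        raise ConfigError(
            "active_handlers is empty: every frontend would enter maintenance "
            "mode; set allow_empty=True if that is really intended"
        )
    return cfg


if __name__ == "__main__":
    validate_config(FrontendConfig(active_handlers=["fe-1", "fe-2"]))  # passes
    try:
        validate_config(FrontendConfig(active_handlers=[]))
    except ConfigError as err:
        print("rejected:", err)
```

The point of the override is that the dangerous state now has to be asked for on purpose, instead of falling out of an accidentally empty array.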
