#TIL The IBM Blue Gene/L supercomputer (2005) has a Mean-Time-Before-Failure of 6.16 days... When you have a 65536-node cluster, it's like the vacuum tube era again, replacing parts every week, except that a failing node doesn't stop the entire machine.

Follow

@niconiconi I have encountered similar scene when I was developing programs for domestic supercomputer and yes, that is why failure tolerance is necessary for real-world scientific computing programs. (

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.