Bad news: my router is still unstable and the NIC or the PCIe controller fails every few weeks for seemingly no reason, locking up the whole thing.

Good news: systemd's hardware watchdog support works really well. When my router's NIC crashed last night, it caused less than 1 min of unavailability, as the whole device rebooted after being frozen for 30s!

(github.com/delroth/infra.delro)

The weird thing is that the only change that correlates with the instability is migrating to nftables... but the crash symptoms (PCIe transactions timing out) don't match this at all. I wonder if it's just heat / lack of airflow that triggers it now because of the warm summer, rather than something caused by a software change...

@delroth I don't know how hot the chips are now, but maybe making it hotter and seeing if the problem appears more often might help?
