**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 17:28

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 00% 22598 306666512

Welp, I guess I'm getting a new 8TB HDD for Christmas. Luckily the failing one is still under its 2 years warranty!

**robryk** @robryk@qoto.org · Dec 23, 2022, 17:35

@delroth Isn't this "just" a sector that failed and wasn't reallocated yet, because it wasn't written to?

**robryk** @robryk@qoto.org · Dec 23, 2022, 17:41

@delroth Or even s/failed/is unreadable due to crc mismatch/

**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 17:49

@robryk No clue. The disk is telling me it's broken, I'm not particularly interested in trying out how much it's broken before it eats my data. I'm guessing that an extended offline test would take care of reallocating if it could too.

**robryk** @robryk@qoto.org · Dec 23, 2022, 17:50

@delroth No, it won't.

The sector won't be reallocated until it's written to. The reasoning behind that is that maybe the next read will actually succeed, and we should never trash that possibility without explicit instructions to do so.

**robryk** @robryk@qoto.org · Dec 23, 2022, 18:02

@delroth Take a look at Reallocated_Sector_Ct (and Offline_Uncorrectable and Current_Pending_Sector) counters. If there are few remaining spare sectors, then the disk is really close to failure. This is indicated by Reallocated_Sector_Ct being marked as dangerously high.

Other than that, sectors that cannot be corrected with the error correction code happen at some rate. This rate can be increased by various issues that make the drive arguably broken, but it's nonzero even with a totally operational drive.
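
A minimal sketch of pulling those three counters out of `smartctl -A` output, assuming the usual ten-column attribute table that smartmontools prints for ATA drives. The function names and the sample rows are hypothetical and are not taken from the drive in this thread.

```python
import subprocess

WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text, names=WATCHED):
    """Return {attribute_name: raw_value_string} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows are assumed to look like:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            # The raw value is the tenth column; keep any trailing annotation.
            found[fields[1]] = " ".join(fields[9:])
    return found


def read_smart_attributes(device):
    """Run `smartctl -A` on a device (needs root and smartmontools) and parse it."""
    # smartctl reports drive status through its exit code, so a non-zero
    # exit is deliberately not treated as an error here.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


# Hypothetical sample rows for illustration only.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

if __name__ == "__main__":
    for name, raw in parse_smart_attributes(SAMPLE).items():
        print(name, raw)
```

A non-zero Current_Pending_Sector next to a Reallocated_Sector_Ct of 0, as in the sample, would be consistent with sectors that failed a read and are still waiting for a write.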

**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 18:11

@robryk [nix-shell:~]# dd if=/dev/zero of=/dev/sdh bs=512 seek=8896601104
dd: error writing '/dev/sdh': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 3.38237 s, 0.0 kB/s

Can't be written to at all, from what I can tell.

**robryk** @robryk@qoto.org · Dec 23, 2022, 18:15

@delroth Huh. That's really surprising (the read error was not immediate, so it's not _totally_ borked, but then why does it seem totally borked for writes? Is that the read-errored sector that you're trying to write to?). Would you mind pasting `smartctl -a /dev/sdh` and the presentation of this error in dmesg for my curiosity?

**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 18:18

@robryk https://gist.github.com/delroth/75f812010f871ff907fe033dc4b99905

**robryk** @robryk@qoto.org · Dec 23, 2022, 18:21

@delroth

I think this sector would work if you wrote to it. Sadly, you end up reading from it first (probably due to some readahead/caching/other bullshit) -- see `failed command: READ FPDMA QUEUED`.

I remember having a similar problem with a PATA drive >10yrs ago, which I fixed by rebuilding a kernel that just never issued reads to the HDD. I expect that there's some way to make the block IO layer actually issue only a write, with some flags to open (O_DIRECT?).
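
One plausible mechanism for the unexpected read, sketched as arithmetic: a buffered 512-byte write lands inside a larger page-cache page, so the kernel may first read the whole page from disk to fill in the bytes around it. The 4096-byte page size and the helper name `enclosing_page` are assumptions for illustration; the thread does not confirm that this is what happened.

```python
def enclosing_page(lba, sector_size=512, page_size=4096):
    """Return (first_lba, last_lba) of the page-sized block that contains `lba`.

    Assumes the page cache works in `page_size` units aligned to the start
    of the device, which is an illustrative simplification.
    """
    offset = lba * sector_size                 # byte offset of the sector
    page_start = offset - offset % page_size   # round down to a page boundary
    first = page_start // sector_size
    last = (page_start + page_size) // sector_size - 1
    return first, last


if __name__ == "__main__":
    # With 512-byte sectors and 4096-byte pages, each page spans 8 sectors.
    first, last = enclosing_page(10)
    print(f"writing LBA 10 through the page cache may read LBAs {first}..{last}")
```

Under these assumptions, a cached write to one bad sector would pull in all eight sectors of its page, including the unreadable one, which matches a `READ` command failing during a write.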

**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 18:24

@robryk hmm indeed, oflag=direct seems to have cleared the failure. Nice, thanks.

Not sure how much I still trust this drive, and now I don't even have an excuse to get a warranty replacement :P
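
For reference, a rough Python equivalent of a single-sector `dd ... oflag=direct` write, assuming Linux and a 512-byte logical sector size. `zero_sector` is a hypothetical helper, and the `direct=False` switch exists only so the sketch can be tried on a scratch file where O_DIRECT may not be supported (tmpfs, for example). It destroys the contents of the target sector, so it should only be pointed at a sector whose data is already lost.

```python
import mmap
import os


def zero_sector(path, lba, sector_size=512, direct=True):
    """Overwrite one logical sector with zeros and return the bytes written.

    DESTRUCTIVE: whatever was stored in that sector is lost.
    """
    flags = os.O_WRONLY
    if direct:
        # O_DIRECT (Linux-only) bypasses the page cache, so the kernel has
        # no need to read the surrounding data before writing.
        flags |= os.O_DIRECT
    # Anonymous mmap memory is page-aligned and zero-filled, which satisfies
    # the buffer alignment that O_DIRECT requires.
    buf = mmap.mmap(-1, sector_size)
    fd = os.open(path, flags)
    try:
        # pwrite writes at an absolute offset without moving the file position.
        return os.pwrite(fd, buf, lba * sector_size)
    finally:
        os.close(fd)
        buf.close()
```

Against a real disk this would be called as root with the device path and the LBA reported in the kernel log, for example `zero_sector("/dev/sdX", lba)`, where both values are placeholders.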

**robryk** @robryk@qoto.org · Dec 23, 2022, 18:26

@delroth I see ~no reason to count this against that drive, given that Reallocated_Event_Count was 0 (so, IIUC, no sectors have been found to be unusable yet).

**Pierre Bourdon** @delroth@delroth.net · Dec 23, 2022, 19:19

@robryk FWIW Reallocated_Event_Count is still 0 now so uh... SMART being very accurate as usual I guess.

**robryk** @robryk@qoto.org · Dec 23, 2022, 19:30

@delroth It might truly be terribly inaccurate. However, there might have been no reallocation: we only know that there was an error when reading that ECC could not correct. It's possible (and likely) that the sector was physically OK and was still writeable (i.e. after writing you'd read the same thing back). In that case we just keep using the same sector (I don't know how the detection of that works exactly; I'd imagine that writing to a known-uncorrectable sector would involve an immediate readback, but does the drive know that? we surely can't read back everything).

**robryk** @robryk@qoto.org · Dec 23, 2022, 18:24

@delroth s/never issued/redirected all reads to one sector :P/
