Re: [opensuse] smartctl: is this HD really OK?

15 Mar 2015

On Sun, Mar 15, 2015 at 3:50 AM, Felix Miata wrote:
> ...
> NAICT, smartctl thinks
> this HD is OK, but is it really?
As long as the drive reallocates on writes, and none of the prefail
attribute values are at threshold, the drive will say it's healthy.
There are now quite a few published papers showing that the drive's
self-assessment isn't helpful a significant minority of the time. Of
all the problems SMART reports, an increasing number of bad sectors is
the one that correlates with prefailure. The fact that you now have a
corrupt file system is consistent with that.
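
To keep an eye on those counts, something like the following works as
a rough sketch. It assumes the usual smartctl -A table layout (ID in
column 1, raw value in column 10) and that your drive uses attribute
IDs 5, 197 and 198 for reallocated, pending and offline-uncorrectable
sectors; vendors vary, so treat both as assumptions. bad_sector_counts
is just a name I made up, and /dev/sdX is a placeholder.

```shell
# Read `smartctl -A` output on stdin and print NAME=RAW_VALUE for the
# sector-health attributes. IDs 5/197/198 and the 10-column layout are
# assumptions about the drive and smartctl version, not guarantees.
bad_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage (as root; /dev/sdX is a placeholder for the real device):
#   smartctl -A /dev/sdX | bad_sector_counts
```

If any of those raw values keeps climbing between runs, that's the
trend to worry about, whatever the overall health line says.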

Two options:

a. Dispose of the drive. If you do that, consider using hdparm to
leverage the drive's built-in ATA Security Erase command. This is the
only way to erase data on sectors that no longer have an LBA mapping
(the ones the drive has already remapped). This is also a ton faster
than writing zeros with dd.
http://mackonsti.wordpress.com/2011/11/22/ssd-secure-erase-ata-command

b. Use badblocks -svw. This is destructive, and does 4 passes, each
one a pattern write followed by a read-back. I'd let it do at least
two write/read passes before canceling it, or just let it complete. If
the drive is actually working normally, no errors will show up in
dmesg or in the badblocks output, because the drive remaps bad sectors
internally at write time.
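
If you want the badblocks result in a form you can grep later, here is
a minimal sketch. It assumes badblocks ends with a summary line of the
form "Pass completed, N bad blocks found. (R/W/C errors)", where the
three numbers are read, write and corruption errors; check that
against your version's output before relying on it. The function name
and the log file name are made up, and /dev/sdX is a placeholder.

```shell
# Destructive test, run as root; kept as a comment on purpose:
#   badblocks -svw /dev/sdX 2>&1 | tee badblocks.log
#   badblocks_summary < badblocks.log

# Read a badblocks log on stdin and print the final counts as
# bad=N read=R write=W corruption=C. Prints nothing if no summary
# line in the assumed format is found.
badblocks_summary() {
    sed -n 's|.*Pass completed, \([0-9]*\) bad blocks found. (\([0-9]*\)/\([0-9]*\)/\([0-9]*\) errors).*|bad=\1 read=\2 write=\3 corruption=\4|p'
}
```

Anything other than all zeros there, or any I/O errors against the
device in dmesg during the run, counts as a failure.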

Any failures are essentially fatal. A write failure means there are no
more reserve sectors for reallocation. A read failure means the drive
firmware mistook a persistent write failure for a transient one. And a
corruption count means some kind of silent data corruption, like a
torn write. So if it comes up clean, honestly I still wouldn't trust
it; I'd relegate it to Btrfs use only (which would have self-healed in
this instance so long as the default DUP metadata was used). If it has
errors, then obliterate it with hdparm and retire it.
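
For the hdparm part, the sequence I have in mind looks roughly like
this. It's a sketch that only prints the commands so you can review
them first; secure_erase_plan is a made-up helper, and the device and
the temporary password are placeholders you supply. It assumes the
drive supports the ATA security feature set and that hdparm -I doesn't
report it as frozen.

```shell
# Print (do not run) an ATA Security Erase sequence for a device.
#   $1 = device, e.g. /dev/sdX (placeholder)
#   $2 = temporary user password (placeholder)
secure_erase_plan() {
    dev=$1
    pass=$2
    # 1. Inspect the Security section: erase must be supported, drive not frozen.
    echo "hdparm -I $dev"
    # 2. Set a temporary user password, which enables the security feature set.
    echo "hdparm --user-master u --security-set-pass $pass $dev"
    # 3. Issue the erase; this destroys all data on the drive.
    echo "hdparm --user-master u --security-erase $pass $dev"
}

# Usage: secure_erase_plan /dev/sdX tmppass
```

Run the printed commands by hand, as root, and only after triple
checking the device name.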

-- 
Chris Murphy