Re: [opensuse] smartctl - Help with smartctl output - should I be concerned?

21 Jan 2010

On Thursday, 2010-01-21 at 12:31 -0000, Dave Howorth wrote:
> ...
> Carlos E. R. wrote:
>> ...
>> Except... The HD tries to move the data in a bad sector to a spare
>> sector; but the data itself might be bad (read failure). Notice that
>> this is an operation that happens totally inside the HD, without any
>> consideration for the mirrored disk. Nobody outside of the HD knows
>> about the problem.
> Surely the user (or operating system at least) learns of a read failure?
> Wouldn't that lead to system/user-level retries and if they failed to
> rewriting of the sector from the mirror, which in turn would cause the
> remap?
> Just trying to understand how all the cogs interlock :)

No, the operating system doesn't know a thing, because this is handled
entirely inside the HD firmware. I don't know the details; that is, I
haven't seen a paper from a manufacturer explaining exactly how they do
it. From what I have gathered, when the HD attempts to write to a sector,
fails, and determines (somehow?) that the sector is bad and not
recoverable, it writes the data to another sector instead: a spare sector
set aside for this purpose by the manufacturer. Somewhere outside the
filesystem data, it records that any read or write destined for the "bad"
sector must go to the remapped sector instead. This means the head has to
move there, so the operation is a tad slower.
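
To make the idea concrete, here is a toy model in Python of how I imagine
the remap table working on a failed write. It is only a sketch under my
own assumptions: the ToyDisk class, its fields and the "spares live after
the last normal sector" layout are all made up for illustration, and real
firmware surely differs.

```python
class ToyDisk:
    """Toy model of firmware-level sector remapping (NOT real firmware)."""

    def __init__(self, sectors, spares, bad=()):
        self.sectors = sectors
        self.media = {}      # physical sector -> data
        self.remap = {}      # logical sector -> spare physical sector
        # Assumption: spare sectors are numbered after the normal ones.
        self.free_spares = list(range(sectors, sectors + spares))
        self.bad = set(bad)  # physical sectors that fail when written

    def _physical(self, lba):
        if not 0 <= lba < self.sectors:
            raise ValueError("sector out of range")
        # Follow the remap table if this sector was relocated earlier.
        return self.remap.get(lba, lba)

    def write(self, lba, data):
        phys = self._physical(lba)
        if phys in self.bad:
            if not self.free_spares:
                raise IOError("no spare sectors left")
            # Relocate silently: take a spare and record the mapping.
            phys = self.free_spares.pop(0)
            self.remap[lba] = phys
        self.media[phys] = data
        return True          # the host only ever sees "success"

    def read(self, lba):
        return self.media[self._physical(lba)]

    @property
    def reallocated_count(self):
        # Plays the role of the remap counter that SMART exposes.
        return len(self.remap)


disk = ToyDisk(sectors=100, spares=4, bad={42})
ok = disk.write(42, b"hello")   # succeeds, but lands on a spare sector
```

After that write, disk.read(42) returns the data as usual and
disk.reallocated_count is 1, which is the only trace left of the event.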

All the system notices is that the original write operation went slower.
The HD reports success, as if nothing had happened. Afterward, if you run
smartctl, you see that the remap counter has gone up by one, that's all.
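
If you want to watch that counter from a script, something like the
following Python sketch could do it. It parses the attribute table printed
by "smartctl -A" and picks out the raw values. The function names and the
SAMPLE text are mine (the sample is illustrative, not output from a real
disk), and the parsing assumes the usual ten-column table layout.

```python
import subprocess

# Illustrative sample only; not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4711
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows with a numeric raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if raw.isdigit():
                attrs[name] = int(raw)
    return attrs


def read_smart_attributes(device):
    """Run 'smartctl -A' on a device (usually needs root) and parse it."""
    # smartctl uses its exit status as a bit mask, so a nonzero status
    # is not treated as a fatal error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


attrs = parse_smart_attributes(SAMPLE)
remapped = attrs["Reallocated_Sector_Ct"]
```

Comparing the Reallocated_Sector_Ct raw value between two runs shows
whether a remap happened in between.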

It is possible that there is a protocol for the operating system to learn
of this in real time, and perhaps do something about it. I'm not aware of
one, but then, I'm not that much of an expert :-)

It is different, though, if the problem occurs during a read. The system 
will probably get a read failure code, but the HD will do no remapping; I 
guess because it doesn't know what the correct data to write should be.

Again, it is possible that there is a protocol defined (perhaps it is
manufacturer dependent) for the operating system to intervene and trigger
a remap. I haven't heard of one, but certainly, in the case of a RAID, it
would be very interesting to have.
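
For what it's worth, the mechanism Dave suggests could look roughly like
this toy Python sketch: a read fails on one member of the mirror, the
layer above fetches the good copy from the other member and writes it
back, and that write is what gives the HD the chance to remap. Everything
here is hypothetical (ToyMember, mirrored_read, and the assumption that
writing to an unreadable sector triggers a remap); I am not claiming any
real RAID implementation works exactly like this.

```python
class ToyMember:
    """One disk of a toy mirror; some sectors may be unreadable."""

    def __init__(self, data, unreadable=()):
        self.data = dict(data)
        self.unreadable = set(unreadable)
        self.remapped = 0

    def read(self, lba):
        if lba in self.unreadable:
            # On a read the disk can only report the failure.
            raise IOError(f"read error at sector {lba}")
        return self.data[lba]

    def write(self, lba, value):
        if lba in self.unreadable:
            # Assumption: a write to a bad sector makes the disk remap it.
            self.unreadable.discard(lba)
            self.remapped += 1
        self.data[lba] = value


def mirrored_read(primary, mirror, lba):
    """Read from primary; on failure, repair it from the mirror copy."""
    try:
        return primary.read(lba)
    except IOError:
        good = mirror.read(lba)
        primary.write(lba, good)   # the rewrite gives the disk a chance to remap
        return good


a = ToyMember({7: b"payload"}, unreadable={7})
b = ToyMember({7: b"payload"})
value = mirrored_read(a, b, 7)
```

In this model the caller still gets its data, and afterwards the first
disk can read sector 7 again, with its remap count increased by one.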

-- 
Cheers,
        Carlos E. R.