On Thursday, 2010-01-21 at 12:31 -0000, Dave Howorth wrote:
Carlos E. R. wrote:
Except... The HD tries to move the data in a bad sector to a spare sector; but the data itself might be bad (read failure). Notice that this is an operation that happens totally inside the HD, without any consideration for the mirrored disk. Nobody outside of the HD knows about the problem.
Surely the user (or operating system at least) learns of a read failure? Wouldn't that lead to system/user-level retries and, if they failed, to rewriting of the sector from the mirror, which in turn would cause the remap?
Just trying to understand how all the cogs interlock :)
No, the operating system doesn't know a thing, because this is completely internal to the HD firmware. I don't know the details; that is, I haven't seen a paper from a manufacturer explaining exactly how they do it.

From what I gathered, when the HD attempts to write to a sector and fails, and determines (somehow?) that the sector is bad and not recoverable, it writes the data to another sector instead: a spare sector set aside as such by the manufacturer at design time. Somehow, somewhere, outside the filesystem data, it records that any read or write destined for the "bad" sector must happen on the remapped sector instead, meaning that the head has to move there and the operation is a tad slower.

All the system notices is that the original write took longer. The HD reports success, as if nothing had happened. Afterward, if you run smartctl, you see that the remap counter has gone up by one, that's all.

It is possible that there is a protocol for the operating system to learn of this in real time, and perhaps do something about it. I'm not aware of one, but then, I'm not that much of an expert :-)

It is different, though, if the problem occurs during a read. The system will probably get a read failure code, but the HD will do no remapping; I guess because it doesn't know what the correct data to write would be. Again, it is possible that there is a protocol defined (perhaps manufacturer dependent) for the operating system to intervene and trigger a remap. I haven't heard of such a thing, but certainly, in the case of a RAID, it would be very interesting to have.

-- 
Cheers,
    Carlos E. R.
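
P.S. To make the description above a bit more concrete, here is a toy Python model of the behaviour as I understand it. It is only a sketch and certainly not how any real firmware works: the class name ToyDisk, its fields, and the idea that spares simply sit after the last normal sector are all made up for illustration. It only shows the two cases discussed: a failed write gets remapped silently and bumps a counter, while a failed read returns an error and remaps nothing.

```python
class ToyDisk:
    """Toy model of a drive that remaps bad sectors on write (hypothetical, not real firmware)."""

    def __init__(self, sectors, spares, bad=()):
        self.data = {}          # physical sector -> stored payload
        self.bad = set(bad)     # physical sectors with (simulated) media defects
        # Assumption: the spare pool is just the sector numbers past the normal range.
        self.spares = list(range(sectors, sectors + spares))
        self.remap = {}         # logical sector -> spare physical sector
        self.reallocated = 0    # stands in for the remap counter smartctl shows

    def _physical(self, lba):
        # Every access goes through the remap table first.
        return self.remap.get(lba, lba)

    def write(self, lba, payload):
        phys = self._physical(lba)
        if phys in self.bad:
            if not self.spares:
                raise OSError("write failed: no spare sectors left")
            # The drive knows the correct data (it was just handed it),
            # so it can redirect the write to a spare.
            phys = self.spares.pop(0)
            self.remap[lba] = phys
            self.reallocated += 1
        self.data[phys] = payload
        return True             # the host only ever sees success

    def read(self, lba):
        phys = self._physical(lba)
        if phys in self.bad:
            # The correct data is unknown, so nothing is remapped here.
            raise OSError("read failed")
        return self.data.get(phys)


disk = ToyDisk(sectors=100, spares=4, bad={7, 9})
disk.write(7, b"hello")         # succeeds, but lands on a spare sector
print(disk.read(7), disk.reallocated, disk.remap)
try:
    disk.read(9)                # bad sector, never rewritten
except OSError as err:
    print("host sees:", err, "| counter still", disk.reallocated)
```

In this toy model, writing fresh data to sector 9 afterward would go through the same remap path as sector 7. Whether a real drive and operating system cooperate like that is exactly the part I'm unsure about.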