On Saturday, 2010-01-23 at 13:20 -0600, David C. Rankin wrote:
On 01/22/2010 11:01 AM, Carlos E. R. wrote:
On Friday, 2010-01-22 at 17:23 +0100, Anders Johansson wrote:
...
As far as I know, fsck never does badblock analysis
As a matter of fact, it does. Kind of. For instance, for ext3 it calls e2fsck, and this one has options for badblock handling.
Therein lies an Achilles' heel of dmraid that I am interested in. If there isn't a specific kernel-level workaround to temporarily disable a dmraid array in order to check and correct normally easily correctable 'disk' errors such as bad blocks, etc., then a dmraid setup will suppress/prevent the correction of disk errors on each 'disk' in the array, allowing simple correctable errors to propagate or grow into multiple compounded errors, resulting in disk deterioration to the point of data loss.
Whoa! Hold on. The possibility of using fsck to mark badblocks is _NOT_ used on contemporary hard disks. Period :-) Badblocks are managed by the hard disk firmware, internally and transparently: a block known (by the HD) to be bad is remapped when a write to it is attempted.
Simply put, if small disk errors in the dmraid array are allowed to remain uncorrected and multiply, then dmraid has a gaping hole in its robustness, making it look far inferior to software raid.
I can't believe that is the way dmraid works. In the past, I have actually preferred it to software raid because I can rebuild a new disk following a drive failure and remake the array before I ever have to boot the operating system. But if you can never run e2fsck -fcy on /dev/sda, sdb, etc. without first disabling the array in the BIOS, you are basically playing Russian roulette with disk errors on any single disk in the array.
If you do that, you will corrupt your filesystem and will not be able to re-enable the array. The two images will differ and, I guess, when the array is re-enabled the newer copy will simply be written over the older one, bad-block marks included; i.e., blocks marked bad on side 0 will be copied and marked bad on side 1, even if there are none there. And perhaps good blocks on 0 will overwrite bad blocks on 1.

Notice that fsck's bad-block handling works at the filesystem level; it never sees the raid elements, which is correct. The raid, any type of raid, presents a single device to the filesystem layer. If you disable the array and run fsck on the elements, fsck will work on the apparent filesystem, both sides will be different as a result, and the array will be corrupted in the end. At best, the newer side will overwrite the older side.

Do not attempt to correct badblocks from the operating system on an array; leave that to the disk firmware. Just use smartctl to trigger the long test (on each element), which includes a surface scan. If there are uncorrected sectors after that, the procedure would be to rewrite the affected sectors (or the entire disk).

-- 
Cheers,
       Carlos E. R.
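P.S. A minimal sketch of the smartctl procedure described above, assuming the array members are still visible to the OS as plain devices such as /dev/sda and /dev/sdb (the names used earlier in this thread) and that smartmontools is installed. The function names and the SMARTCTL override variable are made up for illustration; only the smartctl options themselves (-t long, -l selftest, -A) come from the tool.

```shell
#!/bin/sh
# Hypothetical helper script: names below are illustrative, not from any package.
# SMARTCTL may be overridden (e.g. SMARTCTL=echo) to do a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

# Start the extended ("long") self-test on every device given as an argument.
# The test runs inside the drive firmware; the command returns immediately.
start_long_tests() {
    for dev in "$@"; do
        "$SMARTCTL" -t long "$dev"
    done
}

# Once the tests have finished, print the self-test log and the attribute
# table for every device. In the attribute table, look at
# Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable.
show_results() {
    for dev in "$@"; do
        "$SMARTCTL" -l selftest "$dev"
        "$SMARTCTL" -A "$dev"
    done
}
```

As root, one would call `start_long_tests /dev/sda /dev/sdb`, wait for the drives to finish (the long test can take hours), and then call `show_results /dev/sda /dev/sdb`. The sketch only reads status; it deliberately does not rewrite any sectors.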