Re: [opensuse] Does openSuSE Ever run fsck on disks in dmraid array with nvidia controller?

23 Jan 2010

On Saturday, 2010-01-23 at 13:20 -0600, David C. Rankin wrote:
...
> On 01/22/2010 11:01 AM, Carlos E. R. wrote:
> ...
>> On Friday, 2010-01-22 at 17:23 +0100, Anders Johansson wrote:
>> ...
>> ...
>>> As far as I know, fsck never does badblock analysis
>> As a matter of fact, it does. Kind of. For instance, for ext3 it calls
>> e2fsck, and this one has options for badblock handling.
> Therein lies an Achilles' heel of dmraid that I am interested in. If there isn't a
> specific kernel level workaround to temporarily disable a dmraid array to check
> and correct any normally easily correctable 'disk' errors such as bad blocks,
> etc., then that means a dmraid setup will suppress/prevent the correction of
> disk errors on each 'disk' in the array, allowing simple correctable errors to
> propagate or grow into multiple compounded errors, resulting in disk
> deterioration to the point of data loss.
Whoa! Hold on.

The possibility of using fsck to mark bad blocks is _NOT_ used on
contemporary hard disks.

Period :-)

Bad blocks are left to the hard disk firmware, which remaps them internally
and transparently when a write is attempted to a block that the HD knows is bad.
...
> Simply put, if small disk errors go uncorrected in the dmraid array and are
> allowed to remain uncorrected and multiply, then dmraid has a gaping hole in its
> robustness, making it look far inferior to software raid.
> I can't believe that is the way dmraid works. In the past, I have actually
> preferred it to software raid because I can rebuild a new disk following a drive
> failure and remake the array before I ever have to boot the operating system.
> But, if you can never e2fsck -fcy /dev/sda, sdb, etc. without first disabling
> the array in the BIOS, you are basically playing Russian roulette with disk
> errors on any single disk in the array.
If you do that, you will corrupt your filesystem and not be able to
reenable the array. Both images will be different and, I guess, when the
array is reenabled the newer copy will simply be written over the older
copy, including the bad blocks; i.e., blocks marked bad on side 0 will be
marked bad on side 1 too, even if there are none there. And perhaps good
blocks on 0 will overwrite bad blocks on 1.

Notice that fsck's bad block handling works at the filesystem level; it never
sees the raid elements, which is correct. The raid, any type of raid, presents
a single device to the filesystem layer. If you disable the array and run
fsck on the elements, fsck will work on the apparent filesystem of each one,
both sides will be different as a result, and the array will be corrupted in
the end. At best, the newer side will overwrite the older side.
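
To illustrate the point about never running fsck on the elements, here is a
minimal sketch of a guard that could sit in front of e2fsck. It assumes the
kernel lists whatever is stacked on top of a block device (a device-mapper
mapping, for instance) under /sys/class/block/<name>/holders. The names
is_array_member, fsck_advice and SYSFS_ROOT are made up for this example and
do not belong to any existing tool.

```shell
#!/bin/sh
# Sketch only: detect a block device that another block device
# (for example a dmraid/device-mapper mapping) is stacked on top of.
# SYSFS_ROOT, is_array_member and fsck_advice are hypothetical names.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Succeeds (returns 0) when the device has at least one holder in sysfs.
is_array_member() {
    name=$(basename "$1")
    holders="$SYSFS_ROOT/class/block/$name/holders"
    [ -d "$holders" ] && [ -n "$(ls -A "$holders" 2>/dev/null)" ]
}

# Prints advice for the given device; it does not run fsck itself.
fsck_advice() {
    if is_array_member "$1"; then
        echo "skip $1: in use by another block device, check the assembled device instead"
    else
        echo "ok $1: nothing is stacked on top of it"
    fi
}
```

Calling fsck_advice /dev/sda on an active mirror should then print the "skip"
line, provided the assumption about the holders directory holds for the setup
in question.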

Do not attempt to correct bad blocks from the operating system on an array;
leave that to the disk firmware. Just use smartctl to trigger the long
test (on each element), which includes a surface scan. If there are
uncorrected sectors after that, the procedure would be to rewrite the
affected sectors (or the entire disk), so that the firmware can remap them
on write.
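
The smartctl procedure above could be sketched roughly as follows. The helper
names (run, start_long_test, show_results) and the DRY_RUN variable are
invented for this example, and the device names are placeholders; only the
smartctl options themselves (-t long, -l selftest, -A) are real. By default
the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: start the SMART long self-test on each array element and
# later read back the results. DRY_RUN=1 (the default here) only prints
# the commands; set DRY_RUN=0 to really run them (as root).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the long (extended) self-test; it runs inside the drive itself.
start_long_test() {
    for dev in "$@"; do
        run smartctl -t long "$dev"
    done
}

# Once the test has finished: show the self-test log and the attribute
# table (reallocated and pending sector counts are among the attributes).
show_results() {
    for dev in "$@"; do
        run smartctl -l selftest "$dev"
        run smartctl -A "$dev"
    done
}
```

For a two-disk mirror that would be something like
start_long_test /dev/sda /dev/sdb, followed by show_results /dev/sda /dev/sdb
once the drives report the test as finished.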

-- 
Cheers,
        Carlos E. R.