On 21/12/17 19:17, cagsm wrote:
Thanks for all the replies in this thread. One final question about file-system resilience against block errors on the same physical disk: is there no mkfs parameter when creating, say, ext file systems (or any other non-complicated file system these days) that would, in its simplest form, write two bytes instead of just one for every byte, or similar foolish ideas I can come up with just now? Two bytes consecutively, or even two bytes placed randomly on the physical disk (but then you would need some kind of look-up map or directory for that again, I guess). You get the idea: a filesystem tweak or fine-tuning that writes redundancy onto the disk for better block-error resiliency. Thanks for any hints and ideas.
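Just to make the limits of that naive scheme concrete: if you store two identical copies of each byte, a mismatch tells you *something* is corrupt, but with only two copies you cannot tell which one is wrong, so you get detection without correction. A toy sketch in Python (purely illustrative - this is not an option of any real mkfs or fsutil):

```python
def write_duplicated(data: bytes) -> bytes:
    """The naive scheme from the question: store each byte twice, back to back."""
    out = bytearray()
    for b in data:
        out += bytes([b, b])
    return bytes(out)

def read_duplicated(stored: bytes):
    """Recover the data and report the byte offsets where the two copies disagree.

    With only two copies we can flag corruption but not repair it: when the
    copies differ, there is no way to know which one is the original.
    """
    data = bytearray()
    suspect = []
    for i in range(0, len(stored), 2):
        a, b = stored[i], stored[i + 1]
        if a != b:
            suspect.append(i // 2)  # detected - but which copy is right?
        data.append(a)  # arbitrary choice: trust the first copy
    return bytes(data), suspect
```

To actually *correct* errors you need either a third copy (majority vote) or proper parity/ECC, which is what the drives discussed below in the thread do.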
Don't bother? At the disk level, bear in mind that a disk nowadays is a small computer in its own right. In the old (pre-ATA - that's your old parallel interface) days, the kernel (or rather the driver) would explicitly tell your drive which Cylinder, Head and Sector to use. Now, in the days of LBA, data gets moved around to avoid bad spots, and the drive has all sorts of error correction built in. If it gives up, it's likely that either it has hit a manufacturing defect or your platters are beginning to disintegrate.

As an example of that error correction, I remember the company I worked for buying a HUGE (800MB - that's not huge nowadays!) drive. It had the frontage of a full-height 5.25" drive (modern DVD drives are half-height) and was about two feet deep. One thing I picked up was that its error correction involved writing two bytes to disk for every byte the computer asked for. I don't remember the details, but if you had a single-bit-flip error it could work out whether the data byte or the check byte was wrong, and correct it. If you had a double-bit-flip error, there was a 90% chance it could work it out. Modern drives almost certainly have something like that.

If you've got RAID-6 you can recover from any single-disk corruption/failure - just make sure you run regular scrubs to detect it.

And when you look at filesystems, check whether they protect EVERYTHING, or just the metadata. Most kernel/filesystem developers seem to concentrate on filesystem metadata, reasoning that the most important thing is to get the computer back up and running ASAP. IMHO that's actually arse-about-face - there's no point being able to boot the computer quicker (getting it back to the ops staff) if they then have to run a data integrity check before giving it back to the users! Just look for a filesystem that does a checksum or similar on the *data* so it can detect corruption. You'll probably have to switch it on, because it will hurt performance and be disabled by default.
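A check code that can localise and fix a single flipped bit, as that old drive did, is essentially a Hamming code. The drive's exact scheme is lost to memory, but here's a minimal Hamming(7,4) sketch in Python showing the principle: each nibble gets three parity bits, so a data byte ends up occupying two stored bytes (two codewords), and any single-bit flip within a codeword can be pinpointed and corrected:

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword with 3 parity bits."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d0..d3
    p1 = d[0] ^ d[1] ^ d[3]  # covers codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]  # covers codeword positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]  # covers codeword positions 4,5,6,7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # positions 1..7
    code = 0
    for i, b in enumerate(bits):
        code |= b << i
    return code

def hamming74_decode(code: int) -> int:
    """Decode a 7-bit codeword, correcting any single-bit flip."""
    bits = [(code >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1  # syndrome = 1-based position of the bad bit
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
```

Double-bit flips in one codeword defeat plain Hamming(7,4) (the syndrome points at the wrong bit); real drive ECC uses longer codes (Reed-Solomon, LDPC) over whole sectors to handle burst errors, but the localise-and-fix idea is the same.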
I want to make that an option for raid, so that it does an integrity check and returns a read error if there's an integrity failure.

Cheers,
Wol
-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org