Greg Freemyer wrote:
Depends... you can have a background task run and read through the entire disk. If anything comes up as different between the two copies, you'd know -- at least that's what my LSI card does about once a week....
---
I don't know if your card also does this, but mdraid does better than that.
If the scan detects a media error, mdraid recreates the data from the redundant raid1/5/6 members and re-writes the bad sector. The drive itself should treat the rewrite of a known bad sector as a trigger to reallocate it, so the member drive goes from silently degraded back to "perfect" in short order.
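(For reference, a minimal sketch of kicking such a scrub off by hand through md's sysfs interface; it assumes a Linux box with the array assembled as /dev/md0 and root privileges, and the array name and poll interval are placeholders to adjust:)

#!/usr/bin/env python3
# Minimal sketch of driving an md scrub by hand via sysfs.  Assumes a Linux
# box with the array assembled as /dev/md0 and root privileges; the array
# name and the 60-second poll interval are just placeholders.
from pathlib import Path
import time

ARRAY = "md0"                                   # hypothetical array name
MD = Path("/sys/block") / ARRAY / "md"

def start_scrub():
    # "check" makes md read every sector of every member and compare the
    # redundant copies/parity; unreadable sectors it hits get rewritten
    # from the redundancy, which is what triggers the drive's reallocation.
    (MD / "sync_action").write_text("check\n")

def wait_until_idle(poll=60):
    # sync_action reads back "check" while the scrub runs, "idle" when done.
    while (MD / "sync_action").read_text().strip() != "idle":
        time.sleep(poll)

def mismatch_count():
    # Sectors whose copies/parity disagreed during the last scrub.
    return int((MD / "mismatch_cnt").read_text())

if __name__ == "__main__":
    start_scrub()
    wait_until_idle()
    print(f"{ARRAY}: scrub done, mismatch_cnt = {mismatch_count()}")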
----
Practical issue: you can't do RAID5 or RAID6 with 2 disks, which is what I thought the original poster had.

Secondly, when a drive remaps sectors it slows down, so that drive won't stay within 'tolerance' for hardware RAID; most likely you'd need to replace it anyway (if you had a hot spare in the machine, a rebuild could start immediately).

Third, RAID1 is safer than RAID6 for the same total number of disks in all cases with an even spindle count (RAID1 obviously needs an even number). If you only have 3 disks total, you could probably do RAID6 and it would be safer, but 1 data + 2 parity...ouch. Even the minimum-size case demonstrates the problem (and it gets worse with more disks).

With the minimum number of spindles that works for both, 4, RAID6 has 2 data disks and 2 parity disks, while RAID10 has 2 data disks and 2 mirrors. If your per-disk failure rate is 'x', then the no-fail rate is 1-x per disk. With RAID6, if any of the other 3 disks fails (in addition to the first failure), the array is toast. With RAID10, only 1 disk can cause the whole array to fail (the mirror of the one that went bad); if you get a second failure on another pair, you can still rebuild.

With only 1 disk "exposed" on RAID10, the chance of the whole array failing is the chance of that 1 disk failing. With RAID6, however, all 3 surviving disks are needed to rebuild the bad one, so their combined no-fail case is (1 - x)**3. Say the failure rate is 1%, i.e. 99% no-fail: with RAID10 the chance of the array staying safe is 99% (the no-fail rate of the one partner of the failed disk), while with RAID6 it is .99**3, or .970. Even in that best case, RAID6 has 3x the 'total fail' chance of the RAID10.

With 8 total spindles (6 data + 2 parity for RAID6 vs. 4 data + 4 mirrors for RAID10), suppose 1 disk goes. For RAID10, the total-fail case only happens if the failed drive's partner also goes, so it's still a 99% chance of not failing. For RAID6, it's .99**7 == .932, or almost a 7% chance of total failure vs. RAID10's 1% chance. (A short sketch reproducing these numbers follows below.)

The next point up is moving from 512-byte-sector disks to 4k. A more powerful ECC takes proportionally less space on a 4k sector than the ECC on a 512-byte sector (from a Hitachi brief): "... The second benefit is that a larger and more powerful error correction code (ECC) can be utilized, providing better integrity of user data...."
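Here's the promised sketch that reproduces those survival numbers. It assumes an independent per-drive failure probability x during the rebuild window and the same pessimistic model as above, where every surviving RAID6 member must hold up for the rebuild to finish:

#!/usr/bin/env python3
# Back-of-the-envelope check of the RAID10 vs RAID6 numbers above.

def raid10_survival(x):
    # After one drive dies, only its mirror partner is critical.
    return 1 - x

def raid6_survival(x, total_drives):
    # After one drive dies, the rebuild needs all remaining drives to
    # survive (the model used in the text above).
    return (1 - x) ** (total_drives - 1)

if __name__ == "__main__":
    x = 0.01                       # 1% per-drive failure chance
    for total in (4, 8):           # the 4- and 8-spindle examples above
        print(f"{total} spindles: RAID10 {raid10_survival(x):.3f}, "
              f"RAID6 {raid6_survival(x, total):.3f}")

It prints 0.990 vs 0.970 for 4 spindles and 0.990 vs 0.932 for 8, matching the figures worked out above.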
Only if you are using media scrubbing routinely can you have confidence that the member drives go from "perfect" to failed without a silently degraded state in the middle.
LSI defaults them to once/week.
When TB-size drives first hit the market, a lot of raid5 rebuilds were failing due to silent bad-sector degradation. That's when the community started strongly urging everyone to use either media scrubbing or raid6. (Raid6 has 2 redundant drives, so it can rebuild a failed member drive even in the presence of media errors on the surviving members.)
----
But w/raid6, after 1 drive fails, the no-fail chances of all the remaining drives have to be multiplied together, so the exposure grows with the number of drives left. With RAID10, it's a constant. (Combinatorics)
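Written out as a formula (same assumptions as above: x is the per-drive failure probability during the rebuild window, n the total spindle count):

    P_{fail}(RAID10) = x
    P_{fail}(RAID6)  = 1 - (1 - x)^{n-1}

Plug in x = 0.01 and n = 8 and you get the ~7% vs. 1% figures above.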