On 28/02/18 20:54, Greg Freemyer wrote:
On Tue, Feb 27, 2018 at 4:57 PM, Wols Lists <antlists@youngman.org.uk> wrote:
On 27/02/18 00:57, Greg Freemyer wrote:
Raid 61 would also be interesting. It seems to me the rebuild time on a raid 61 could be much faster than on just a 6 (or 60). That assumes the failed drive could simply be copied from its partner in the mirror pair.
Actually, it would be even faster than that. Do you know the difference between Raid-1+0 and linux md-raid-10? md-raid-10 has the disadvantage (at least from the developer's point of view) that the drives are mirrors of each other, so rebuilding one drive places a lot of stress on its mirror.
The point of the work I've spec'd is that the blocks are scattered according to a pseudo-random algorithm, such that there is no such mirror!
Unusual!
So if you have, say, 20 drives, with your raid-61 configured as 8,2, that means you have two logical 10-drive (8 data + 2 parity) raid-6 arrays, mirrored. But the blocks are scattered at random across all 20 drives. So if a drive fails, let's say a 10TB one, the rebuild can copy roughly 0.5TB from EVERY other drive and rebuild the failed one.
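(A rough sketch of that rebuild spread, purely as an illustration: it uses plain random placement rather than the actual algorithm, and the 20-drive, 8+2-mirrored figures are just the example above.)

import random
from collections import Counter

N_DRIVES = 20      # the example above: 20 drives, raid-61 configured as 8,2
CHUNKS = 20        # a 10-chunk (8 data + 2 parity) raid-6 stripe, stored twice
STRIPES = 50000
rng = random.Random(1)

# Scatter each stripe's 20 chunks across 20 distinct drives pseudo-randomly
# (plain random shuffling here, standing in for the real algorithm).
layout = [rng.sample(range(N_DRIVES), CHUNKS) for _ in range(STRIPES)]

failed = 0
reads = Counter()  # chunks each surviving drive must supply to the rebuild
for stripe in layout:
    for pos, drive in enumerate(stripe):
        if drive == failed:
            # the second copy of this chunk sits 10 positions along the stripe
            twin_drive = stripe[(pos + 10) % CHUNKS]
            reads[twin_drive] += 1

print("lost chunks:", sum(reads.values()))
print("reads per survivor (min/max):", min(reads.values()), max(reads.values()))

With a conventional mirror pair all of those reads would land on the one partner drive; here they spread almost evenly over the other 19.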
Say what?
Putting thinking hat on!
Whoa, that is very cool if I have it right!
Somebody posted to the linux-raid list about a CRUSH algorithm, I think it was called. This enables you to spec local storage, different controllers, network storage etc, and ensure that blocks are scattered over all of them. The intent was that you could lose a controller, or a network link, or whatever, and still guarantee that a complete stripe of blocks could be found elsewhere. But I get the impression that it's computationally expensive - I wanted a simple algorithm that got you most of the benefits for a tiny fraction of the cost.
The standard algorithm would hammer one other drive and quite possibly tip that over the edge too.
The only snag with my algorithm is that, iirc, you can get a pathological failure if you don't have at least twice the drives. So an 8,2 setup might need 33 drives for the algorithm to work.
I'm confused here.
If the number of drives is high enough, it's easy to prove that the pathological setup cannot occur. Unfortunately, every simulation I've run with fewer than that IS pathological :-( (By that, I mean that a single drive failure could destroy all copies of some blocks.)
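(A minimal sketch of the check those simulations are doing - not the actual test code, just an illustration of the pathological condition: a layout fails if one drive ends up holding every copy of some block.)

def pathological_drives(layout):
    """layout maps block id -> list of drives holding that block's copies.
    Returns the drives whose single failure would destroy some block outright."""
    return {copies[0] for copies in layout.values() if len(set(copies)) == 1}

# Toy example: both copies of block 7 landed on drive 3, so losing drive 3
# loses block 7 completely - the layout is pathological.
layout = {6: [0, 5], 7: [3, 3], 8: [1, 4]}
print(pathological_drives(layout) or "layout is safe")   # -> {3}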
Let's say I decide to be intentional about building an 80TB usable LV with your setup. If I use 10TB drives, does that mean I'd have to buy 33 x 10TB drives? At $400/drive, that's $13.2K just for the drives (chassis, controllers, etc. not included). That seems like a lot of money for 80TB usable.
I'm trying to remember my maths. That's 8 drives of data plus 2 of parity, twice: 20 drives. So you would need either 21 or 41 drives. But 41 sounds wrong; it should certainly work with 31, and it should be possible to do it with 21. Maybe I just need to improve my algorithm.
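(Working the 80TB example through with those figures, as a back-of-the-envelope check - my own arithmetic, using only the numbers already quoted in this thread.)

data, parity, copies = 8, 2, 2        # raid-61 "8,2", each stripe stored twice
usable_tb, drive_tb, price = 80, 10, 400

efficiency = data / ((data + parity) * copies)   # 8 useful chunks in every 20
raw_tb = usable_tb / efficiency                  # 200 TB of raw space
capacity_drives = raw_tb / drive_tb              # 20 drives just for capacity

print(f"efficiency {efficiency:.0%}, raw {raw_tb:.0f} TB, "
      f"{capacity_drives:.0f} drives = ${capacity_drives * price:,.0f}")
# efficiency 40%, raw 200 TB, 20 drives = $8,000
# The scattering constraint then pushes the count past 20 (21, 31 or 41 as
# discussed above); 33 drives at $400 is where the $13.2K figure comes from.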
Of course, if that's the case, it would fall back to a simpler algorithm, probably the one that leads to a mirror. Or, at least for raid-6, it would know that if all copies of a block were stored on the same drive, it could rebuild that block from parity. But that's not a good idea :-(
I attach my test code. Have a play. Note that you need to make sure that the primes aren't pathological - they must not be a factor of any of the other numbers. If you have any queries, I'll try to remember what I was doing and explain. There should be an email from me on the raid list that explains it all; I'll hunt it up later, but it's now my bed time ... :-)

Cheers,
Wol
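(The attached test code isn't reproduced here. Purely as a guess at the general shape of a prime-stepped scatter - the placement rule and every name below are assumptions, not the attachment - it might look something like this, with the prime required to share no factor with the drive count or stripe width.)

from math import gcd

def scatter(n_stripes, n_drives, chunks_per_stripe, prime):
    # A "pathological" prime is one that is a factor of one of the other
    # numbers; this assert is the constraint mentioned above.
    assert gcd(prime, n_drives) == 1 and gcd(prime, chunks_per_stripe) == 1
    layout, pos = [], 0
    for _ in range(n_stripes):
        stripe = []
        for _ in range(chunks_per_stripe):
            stripe.append(pos % n_drives)   # drive holding this chunk
            pos += prime                    # step on by the prime
        layout.append(stripe)
    return layout

# e.g. 21 drives, 20-chunk (8+2 mirrored) stripes, stepping by 13
for stripe in scatter(3, 21, 20, 13):
    print(stripe)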