[opensuse] Disk i/o's needed for raid 5 [WAS: duplicating current drive to new drive via usb and dd?]
On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. wrote:
On 2014-07-09 20:00, Greg Freemyer wrote:
FYI: Raid 5 and 6 have the same issue, but the most efficient write size is the size of a full raid stripe. XFS, as an example, will try to structure its writes to be full stripes when working with raid. Thus a full stripe write becomes: write all data and parity chunks.
A partial stripe write becomes:
- read the data about to be overwritten,
- read the old parity info,
- calculate the new parity info,
- write the new data,
- write the new parity info.
Thus a single partial stripe write to a raid 5 requires at least 4 i/o operations. For a raid 6, it is a minimum of 6 i/o's (3 reads and 3 writes). Having the filesystem issue properly aligned full stripe writes is significantly more efficient.
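To make that counting concrete, here is a minimal Python sketch (my own illustration, not from the thread; it only tallies the per-chunk i/o's listed above):

def full_stripe_write_ios(n_disks: int) -> int:
    # A full stripe write touches every chunk in the stripe
    # (data + parity) and needs no reads at all.
    return n_disks

def partial_stripe_write_ios(parity_chunks: int = 1) -> int:
    # Read-modify-write: read the old data chunk and each old parity
    # chunk, then write the new data chunk and each new parity chunk.
    reads = 1 + parity_chunks
    writes = 1 + parity_chunks
    return reads + writes

print(partial_stripe_write_ios(1))  # raid 5 (one parity chunk): 4 i/o's
print(partial_stripe_write_ios(2))  # raid 6 (two parity chunks): 6 i/o's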
Doh! So that's why!
And do you mean that number of i/o's per disk on the raid? No, that cannot be. It has to be one read per disk, that is, 3 reads (which should happen simultaneously); calculate parity; then do 3 writes, one per disk (which should also happen simultaneously).
Mmm... it does not match what you say, so I must be getting it wrong :-?
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.

== Beware: math below ==

Let's talk about a raid 5 with 5 disks, with one of the stripes laid out as:

D1, D2, D3, D4, P

By definition P = D1 ^ D2 ^ D3 ^ D4 (that's just how raid 5 works). ^ is the xor operator as defined in the C programming language, but applied to an entire stride's worth of bytes.

If I want to change the data on D2, then I can back it out of the calculation by:

P ^ D2 = (D1 ^ D2 ^ D3 ^ D4) ^ D2

Because of the way ^ works, that can be simplified to:

P ^ D2 = D1 ^ D3 ^ D4

(i.e. the D2 xor operations effectively cancel themselves out).

Now if I call the new D2 data D2n, I can write a new equation as:

(P ^ D2) ^ D2n = (D1 ^ D3 ^ D4) ^ D2n

or by simple math:

P ^ D2 ^ D2n = D1 ^ D2n ^ D3 ^ D4

==== Alright, time to talk about disks.

We know before updating D2, this is true:

P = D1 ^ D2 ^ D3 ^ D4

And after updating D2 this must be true:

Pn = D1 ^ D2n ^ D3 ^ D4

The obvious approach is to read D1, D3 and D4 and calculate Pn. That means 3 reads and 2 writes, or 5 i/o operations.

But let's do some math on that last equation:

Pn = D1 ^ D2n ^ D3 ^ D4
Pn = (D1 ^ D3 ^ D4) ^ D2n

But remember, from the earlier math we know:

P ^ D2 = D1 ^ D3 ^ D4

So let's replace (D1 ^ D3 ^ D4) with P ^ D2; we now have:

Pn = (P ^ D2) ^ D2n

Note that only requires 2 of the old data values.

So what the raid system does is read the original P stride and the original D2 stride. Then it xor's them together to remove the influence of the old D2 value. Then it xor's in the new D2n stride to calculate the new Pn.

The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).

The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.

=====

I don't actually know how raid 6 works, so I can't do the same walk thru, but my understanding is a single data stride update with raid 6 involves 3 reads (P1, P2, D2) and 3 writes (P1n, P2n, D2n). The rest of the data strides don't have to be read to do the calculations.

=====

That was a fun exercise. I hope at least a couple of people learned something.

Greg
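As a quick sanity check of the algebra above, here is a minimal Python sketch (my own, not from the thread; the stride size and random data are illustrative) showing that the shortcut Pn = (P ^ D2) ^ D2n matches a full parity recomputation:

import os

def xor(a: bytes, b: bytes) -> bytes:
    # XOR two equal-length byte strings, like C's ^ applied to a whole stride.
    return bytes(x ^ y for x, y in zip(a, b))

STRIDE = 16  # illustrative stride size; real arrays use e.g. 64 KiB chunks

# Four data strides of one stripe, filled with random bytes.
d1, d2, d3, d4 = (os.urandom(STRIDE) for _ in range(4))

# Raid 5 parity by definition: P = D1 ^ D2 ^ D3 ^ D4
p = xor(xor(d1, d2), xor(d3, d4))

d2n = os.urandom(STRIDE)  # the new data for D2

# Full recomputation: needs D1, D3, D4 (3 reads).
pn_full = xor(xor(d1, d2n), xor(d3, d4))

# Shortcut: needs only the old P and old D2 (2 reads).
pn_shortcut = xor(xor(p, d2), d2n)

assert pn_full == pn_shortcut  # both methods give the same new parity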
On 2014-07-09 22:44, Greg Freemyer wrote:
On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. wrote:
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.
:-} !
== Beware: math below ==
Ok... My maths are rusty, but I understand.
==== Alright time to talk about disks.
...
The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).
I see...
The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.
I thought that in the Linux raid 5 there is not a fixed or dedicated parity disk, but that a stride may be on one disk and the next on another. That should distribute the load, and not force every op to read and write the P disk.
===== I don't actually know how raid 6 works, so I can't do the same walk thru, but my understanding is a single data stride update with raid 6 involves 3 reads (P1, P2, D2) and 3 writes (P1n, P2n, D2n).
The rest of the data strides don't have to be read to do the calculations.
=====
That was a fun exercise. I hope at least a couple of people learned something.
I did. As long as there is not a quiz coming ;-)

Thanks :-)

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
On Wed, 09 Jul 2014 23:29:17 +0200, "Carlos E. R." wrote:
I thought that in the Linux raid 5 there is not a fixed or dedicated parity disk, but that a stride may be on one disk and the next on another.
That is the definition of RAID5, unrelated to Linux.
That should distribute the load, and not force every op to read and write the P disk.
And RAID with a single dedicated parity disk is RAID4.
On July 9, 2014 5:29:17 PM EDT, "Carlos E. R." wrote:
On 2014-07-09 22:44, Greg Freemyer wrote:
On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. wrote:
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.
:-} !
== Beware: math below ==
Ok...
My maths are rusty, but I understand.
==== Alright time to talk about disks.
...
The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).
I see...
The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.
I thought that in the Linux raid 5 there is not a fixed or dedicated parity disk, but that a stride may be on one disk and the next on another. That should distribute the load, and not force every op to read and write the P disk.
What I was trying to say is that if you have a 10-disk raid 5, then one specific stripe might look like:

D1 D2 D3 D4 P D5 D6 D7 D8 D9

A write to D2 would only require a read of D2 and P, then a write of D2n and Pn. That D2 and P are not on fixed disks is an unrelated truth.
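That rotation can be illustrated with a tiny Python sketch (my own, hypothetical layout; real md layouts such as left-symmetric differ in detail):

def parity_disk(stripe: int, n_disks: int) -> int:
    # Rotate the parity chunk one disk to the left on each successive
    # stripe, so no single disk ends up holding all the parity.
    return (n_disks - 1 - stripe) % n_disks

for stripe in range(4):
    layout = ["D"] * 5
    layout[parity_disk(stripe, 5)] = "P"
    print("stripe %d: %s" % (stripe, " ".join(layout)))

# stripe 0: D D D D P
# stripe 1: D D D P D
# stripe 2: D D P D D
# stripe 3: D P D D D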
===== I don't actually know how raid 6 works, so I can't do the same walk thru, but my understanding is a single data stride update with raid 6 involves 3 reads (P1, P2, D2) and 3 writes (P1n, P2n, D2n).
The rest of the data strides don't have to be read to do the
calculations.
=====
That was a fun exercise. I hope at least a couple of people learned
something.
I did.
As long as there is not a quiz coming ;-)
Thanks :-)
Carlos E. R. wrote:
On 2014-07-09 22:44, Greg Freemyer wrote:
On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. wrote:
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.
:-} !
== Beware: math below ==
Ok...
My maths are rusty, but I understand.
==== Alright time to talk about disks.
...
The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).
I see...
The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.
I thought that in the Linux raid 5 there is not a fixed or dedicated parity disk, but that a stride may be on one disk and the next on another. That should distribute the load, and not force every op to read and write the P disk.
That's RAID 6.
On July 9, 2014 11:54:59 PM EDT, Dirk Gently wrote:
Carlos E. R. wrote:
On 2014-07-09 22:44, Greg Freemyer wrote:
On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. wrote:
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.
:-} !
== Beware: math below ==
Ok...
My maths are rusty, but I understand.
==== Alright time to talk about disks.
...
The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).
I see...
The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.
I thought that in the Linux raid 5 there is not a fixed or dedicated parity disk, but that a stride may be on one disk and the next on another. That should distribute the load, and not force every op to read and write the P disk.
That's RAID 6.
Raid 6 has 2 parity strides per stripe and can survive the failure of 2 strides out of a single stripe. That is important when working with large drives (1 TB and up), because with raid 5 it is relatively common to hit a read error on one of the surviving drives during a rebuild. With raid 5, some arrays abort the rebuild as soon as they hit a single read error; raid 6 will continue on.

Thus I think of raid 5 as surviving a single drive failure, but not being tolerant of localized sector read errors. I think of raid 6 as being able to survive a single drive failure while simultaneously handling localized sector read errors on the surviving disks.

Greg
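A back-of-the-envelope Python sketch of why rebuilds on big drives are risky (my own illustration; the 1-in-10^14 unrecoverable bit error rate is a typical consumer drive spec-sheet figure, assumed here):

def rebuild_read_error_prob(tb_read: float, ber: float = 1e-14) -> float:
    # Probability of hitting at least one unrecoverable read error
    # while reading tb_read terabytes during a raid rebuild.
    bits = tb_read * 1e12 * 8
    return 1 - (1 - ber) ** bits

# Rebuilding a 5 x 1 TB raid 5 means reading ~4 TB from the survivors:
print("%.0f%%" % (100 * rebuild_read_error_prob(4.0)))  # roughly 27%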
On 10/07/2014 13:21, Greg Freemyer wrote:
Thus I think of raid 5 as surviving a single drive failure, but not being tolerant of localized sector read errors.
I think of raid 6 as being able to survive a single drive error while simultaneously handling localized sector read errors on the surviving disks.
Important, because when one disk fails, the others may have also silently failed. RAID is only a hotplug thing to make you a bit more secure, not the universally secure system some people think exists (it does not :-( ).

jdd
--
http://www.dodin.org
On 09/07/2014 22:44, Greg Freemyer wrote:
That was a fun exercise. I hope at least a couple of people learned something.
Greg
For sure, thanks :-)

jdd
--
http://www.dodin.org
Very interesting and clearly explained.
Thanks Greg.
On July 9, 2014 1:44:49 PM PDT, Greg Freemyer wrote:
Pn = (P ^ D2) ^ D2n
Note that only requires 2 of the old data values.
So what raid system does is it reads the original P stride and the original D2 stride. Then it xor's them together to remove the influence of the old D2 value. Then it xor's in the new D2n stride to calculate the new Pn.
The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).
The cool part about that is it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.
===== I don't actually know how raid 6 works, so I can't do the same walk thru, but my understanding is a single data stride update with raid 6 involves 3 reads (P1, P2, D2) and 3 writes (P1n, P2n, D2n).
The rest of the data strides don't have to be read to do the calculations.
=====
That was a fun exercise. I hope at least a couple of people learned something.
Greg
participants (6)
- Andrey Borzenkov
- Carlos E. R.
- Dirk Gently
- Greg Freemyer
- jdd
- John Andersen