On Wed, Jul 9, 2014 at 2:42 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
> On 2014-07-09 20:00, Greg Freemyer wrote:
>> fyi: Raid 5 and 6 have the same issue, but the most efficient write size is the size of a full raid stripe. XFS, as an example, will try to structure its writes to be full stripes when working with raid. Thus a full stripe write becomes: write all data and parity chunks.
>> A partial stripe write becomes:
>>
>>  - read the data about to be overwritten,
>>  - read the old parity info,
>>  - calculate the new parity info,
>>  - write the new data,
>>  - write the new parity info.
>> Thus a single partial stripe write to a raid 5 requires at least 4 i/o operations. For a raid 6, it is a minimum of 6 i/o's (3 reads, 3 writes). Having the filesystem invoke properly aligned full stripe writes is significantly more efficient.
> Doh! So that's why!
>
> And you mean that number of i/o per disk on the raid? No, that cannot be. It has to be one read per disk, that is, 3 reads (which should happen simultaneously). Calculate parity. Do 3 writes, one per disk (which should also happen simultaneously).
>
> Mmm... it does not match what you say, so I must be getting it wrong :-?
To understand this we have to dig into some mathematical theory, so forget about reads and writes for a second and just focus on the math.

== Beware: math below ==

Let's talk about a raid 5 with 5 disks, with one of the stripes laid out as:

D1, D2, D3, D4, P

By definition, P = D1 ^ D2 ^ D3 ^ D4 (that's just how raid 5 works). ^ is the xor operator as defined in the C programming language, but applied to an entire stride's worth of bytes.

If I want to change the data on D2, then I can back it out of the calculation by:

P ^ D2 = (D1 ^ D2 ^ D3 ^ D4) ^ D2

Because of the way ^ works, that can be simplified to:

P ^ D2 = D1 ^ D3 ^ D4

(i.e. the D2 xor operations effectively cancel themselves out).

Now, if I call the new D2 data D2n, I can write a new equation as:

(P ^ D2) ^ D2n = (D1 ^ D3 ^ D4) ^ D2n

or by simple math:

P ^ D2 ^ D2n = D1 ^ D2n ^ D3 ^ D4

====

Alright, time to talk about disks.

We know that before updating D2, this is true:

P = D1 ^ D2 ^ D3 ^ D4

And after updating D2, this must be true:

Pn = D1 ^ D2n ^ D3 ^ D4

The obvious approach is to read D1, D3 and D4 and calculate Pn. That means 3 reads and 2 writes, or 5 i/o operations.

But let's do some math on that last equation:

Pn = D1 ^ D2n ^ D3 ^ D4
Pn = (D1 ^ D3 ^ D4) ^ D2n

But remember, from the earlier math we know:

P ^ D2 = D1 ^ D3 ^ D4

So let's replace (D1 ^ D3 ^ D4) with P ^ D2. We now have:

Pn = (P ^ D2) ^ D2n

Note that this only requires 2 of the old values.

So what the raid system does is read the original P stride and the original D2 stride. Then it xor's them together to remove the influence of the old D2 value. Then it xor's in the new D2n stride to calculate the new Pn.

The end result is 2 simultaneous reads (P and D2) followed by 2 simultaneous writes (Pn and D2n).

The cool part is that it works regardless of how many disks are in the raid 5 array. A single data stride update always requires exactly 2 reads and 2 writes.

=====

I don't actually know how raid 6 works, so I can't do the same walk-thru, but my understanding is that a single data stride update with raid 6 involves 3 reads (P1, P2, D2) and 3 writes (P1n, P2n, D2n). The rest of the data strides don't have to be read to do the calculations.

=====

That was a fun exercise. I hope at least a couple of people learned something.

Greg
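
P.S. For anyone who wants to see the xor algebra hold for themselves, here is a small C snippet (my own toy sketch, not anything out of the kernel's raid code; the byte values are made up):

  #include <assert.h>
  #include <stdio.h>

  int main(void)
  {
      /* one byte from each data stride, arbitrary values */
      unsigned char d1 = 0xA5, d2 = 0x3C, d3 = 0x77, d4 = 0x0F;
      unsigned char p = d1 ^ d2 ^ d3 ^ d4;   /* raid 5 parity definition */

      /* xor'ing the old d2 back out cancels it from the parity */
      assert((p ^ d2) == (d1 ^ d3 ^ d4));

      /* the update rule: Pn = (P ^ D2) ^ D2n */
      unsigned char d2n = 0xC3;              /* the new d2 data */
      unsigned char pn = p ^ d2 ^ d2n;

      /* same parity you would get recomputing from scratch */
      assert(pn == (d1 ^ d2n ^ d3 ^ d4));

      puts("xor identities hold");
      return 0;
  }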
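
And the same idea over whole strides, the way the raid layer's read-modify-write would do it: compute Pn from just the old P and the old D2, then cross-check against a full recomputation. The 8-byte stride size and the xor_stride() helper are toy stand-ins, of course; real chunks are more like 64 KiB:

  #include <assert.h>
  #include <stdio.h>
  #include <string.h>

  #define STRIDE 8                      /* toy stride size */

  /* dst ^= src, over one whole stride */
  static void xor_stride(unsigned char *dst, const unsigned char *src)
  {
      for (int i = 0; i < STRIDE; i++)
          dst[i] ^= src[i];
  }

  int main(void)
  {
      unsigned char d1[STRIDE] = "1111111", d2[STRIDE] = "2222222",
                    d3[STRIDE] = "3333333", d4[STRIDE] = "4444444";
      unsigned char d2n[STRIDE] = "fresh!!";   /* new data for D2 */

      /* original parity: P = D1 ^ D2 ^ D3 ^ D4 */
      unsigned char p[STRIDE] = {0};
      xor_stride(p, d1); xor_stride(p, d2);
      xor_stride(p, d3); xor_stride(p, d4);

      /* the shortcut: Pn = (P ^ D2) ^ D2n -- 2 reads, 2 writes,
         no matter how many disks are in the array */
      unsigned char pn[STRIDE];
      memcpy(pn, p, STRIDE);
      xor_stride(pn, d2);
      xor_stride(pn, d2n);

      /* full recomputation with the new D2, for comparison */
      unsigned char full[STRIDE] = {0};
      xor_stride(full, d1); xor_stride(full, d2n);
      xor_stride(full, d3); xor_stride(full, d4);

      assert(memcmp(pn, full, STRIDE) == 0);
      puts("shortcut parity matches full recompute");
      return 0;
  }

Both compile cleanly with gcc and the asserts stay quiet, i.e. the shortcut parity is byte-for-byte identical to the parity you'd get by reading every data stride.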