On Dec 7, 2007 7:09 AM, Greg Freemyer wrote:
On Dec 6, 2007 1:26 PM, Chris Worley wrote:
On Dec 5, 2007 1:50 PM, Greg Freemyer wrote:
<snip>
Single-threaded access to a raid array may not be helped by adding drives. Drive access can end up being sequential, and you're not really buying anything.
Multi-threaded storage performance is definitely positively affected by adding disks to an array.
For multi-threaded, effectively each disk can do N IOPS (IOs per Second.)
So if you have M drives, you can do M*N IOPS.
The trouble with Raid 5 is that it typically requires 4 IOs to update a single sector.
i.e. read the checksum (parity), read the original sector (so you can remove it from the checksum), write the updated sector, write the new checksum.
So it ends up being M*N / 4 IOPS.
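The four IOs above can be sketched in a few lines. This is only an illustration of the XOR arithmetic, assuming the "checksum" is plain XOR parity; the function names and the tiny 2-byte "sectors" are made up for the example and are not taken from any real RAID implementation.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(chunks):
    """Parity of a whole stripe: XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def small_write(chunks, parity, index, new_data):
    """Update one data chunk the read-modify-write way, counting IOs."""
    ios = 0
    old_data = chunks[index]      # IO 1: read original sector
    ios += 1
    old_parity = parity           # IO 2: read old parity
    ios += 1
    # Remove the old data from the parity, then fold in the new data.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    updated = list(chunks)
    updated[index] = new_data     # IO 3: write updated sector
    ios += 1
    ios += 1                      # IO 4: write new parity
    return updated, new_parity, ios


# Hypothetical 5-disk array: 4 data chunks plus 1 parity chunk.
chunks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
parity = full_parity(chunks)
new_chunks, new_parity, ios = small_write(chunks, parity, 2, b"\x55\x66")
print(ios)
print(new_parity == full_parity(new_chunks))
```

The incrementally updated parity matches the parity recomputed from scratch, which is why only the one data disk and the parity disk need to be touched.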
Greg,
Doesn't that assume a sector/block mismatch? If your sectors and blocks are aligned (sectors are some multiple of blocks), then no read-mask-write is necessary.
Even if there is a misalignment, if the amount of data being written is large, the read-mask-write operation is only at the beginning and tail ends of the entire operation.
The above does not assume misalignment. I think what you're talking about is that if you are doing a large write that spans the entire raid5 stripe, then the existing parity data can be ignored. Linux is smart enough to do this, but raid5 stripes are pretty large. Typically 64K * (M - 1) I believe. So if you have a 5-disk raid 5, your entire stripe is 256KB. And that ignores the alignment issues you mention, which means that to guarantee a full stripe is written you need to write 512KB at a time. Not many programs do that from user space. I'm not sure how efficient the Linux kernel is at coalescing individual sequential writes to a raid5 array and trying to create full stripe updates.
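The stripe arithmetic can be checked with a short sketch. It assumes the 64K chunk size mentioned above (other chunk sizes are possible), and the helper names are invented for the illustration.

```python
def full_stripe_bytes(disks: int, chunk_kib: int = 64) -> int:
    """Data bytes in one raid5 stripe: one chunk per disk, minus parity."""
    return (disks - 1) * chunk_kib * 1024


def covered_full_stripes(offset: int, length: int, stripe: int) -> int:
    """How many whole, aligned stripes a write of `length` at `offset` covers."""
    first = -(-offset // stripe)          # first stripe boundary at or after offset
    last = (offset + length) // stripe    # last stripe boundary at or before the end
    return max(0, last - first)


stripe = full_stripe_bytes(5)             # 5-disk raid5, 64K chunks
print(stripe // 1024)                     # stripe size in KB

# A stripe-sized write that starts 4K off alignment covers no full stripe ...
print(covered_full_stripes(4096, stripe, stripe))
# ... while a write of twice the stripe size always covers at least one.
print(covered_full_stripes(4096, 2 * stripe, stripe))
```

With an unknown starting offset, a write has to be about twice the stripe size before at least one full stripe is certain to fall inside it.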
Granted: in my line of work, an app doing a single 1MB read/write call is small; anything smaller would be too trivial to mention.
Also, the writes are all in parallel. The above makes it sound like the writes of updated stripes, and the write of the checksum are serial... they should all be posted nearly simultaneously (some serialization introduced by the CPU).
The above is a max throughput calculation, not an individual write calculation. i.e. Which is faster, a sports car or a semi-truck? The semi-truck is, if you have lots to move, so it effectively has a higher throughput than a sports car (but it is nowhere near as fast for small loads).
So the above assumes a busy server with lots going on, i.e. every disk in the array is running at full capacity. The IOPS is obviously affected by the workload and the seeking, but once the workload is set, the IOPS per disk can be characterized and used to feed the equation.
So from a performance perspective on _writes_ you need at least a 4 drive array just to be as fast as a single disk.
Reads OTOH just need to read the sector they want (unless you have a failed drive).
So _read_ performance is M*N, i.e. always faster than a single drive.
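Putting the two formulas side by side makes the 4-drive break-even easy to see. This is just the back-of-the-envelope model from this thread, assuming a saturated multi-threaded workload of small random IOs; the per-disk figure of 100 IOPS is a made-up placeholder, not a measurement.

```python
def read_iops(m: int, n: float) -> float:
    """Aggregate small-read IOPS for M disks at N IOPS each."""
    return m * n


def write_iops(m: int, n: float, ios_per_write: int = 4) -> float:
    """Aggregate small-write IOPS when each update costs 4 disk IOs."""
    return m * n / ios_per_write


N = 100  # hypothetical IOPS of a single disk under this workload

for m in range(3, 9):
    r = read_iops(m, N)
    w = write_iops(m, N)
    note = "  <- writes match a single disk" if w == N else ""
    print(f"{m} disks: reads {r:6.0f} IOPS, writes {w:6.1f} IOPS{note}")
```

Under this model the write figure only reaches the single-disk value at M = 4, while reads are ahead of a single disk for any array size.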
On a RAID5 you only need M-1 (or M-2 for RAID6) completions of parallel operations... you can discard the slowest disk's results, as they can be recreated without all the data.
No idea what you meant there. In a non-degraded raid5 every drive has valid, non-parity data on it. If you have a heavy multi-threaded read load, all disks can be actively providing valid data at one time, i.e. M * N IOPS.
If "M" is the number of disks, and you are, for example, reading 1 stride, then, in a RAID5, you only need to get the stripes from M-1 disks, and you can complete the single stride I/O w/o having yet received the Mth stripe, which you can discard when it shows up.
Chris
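The arithmetic behind "any M-1 of the M pieces are enough" can be shown with XOR parity. This is only a sketch of the math, with made-up data; whether the Linux md driver actually issues reads this way on a non-degraded array is not established anywhere in this thread.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct(received):
    """Rebuild the one missing piece of a stripe from the M-1 that arrived."""
    return reduce(xor_bytes, received)


# Hypothetical 5-disk stripe: 4 data pieces plus 1 XOR parity piece.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
parity = reduce(xor_bytes, data)
stripe = data + [parity]

slowest = 1  # pretend disk 1 has not answered yet
arrived = [piece for i, piece in enumerate(stripe) if i != slowest]

rebuilt = reconstruct(arrived)
print(rebuilt == stripe[slowest])
```

Note that this costs a read on the parity disk for every stripe, so it trades extra IO for lower latency on a single large read; it does not change the M*N figure for a saturated array.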
Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org