Re: [opensuse] Raid5/LVM2/XFS alignment
On Jan 30, 2008 5:41 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
Thanks Neil,
I did not think about the on-disk cache when I set the count. You should also do a sync after calling dd and include that in your timing. There may even be a way from user space to tell the drives to flush their caches. hdparm, perhaps?
Isn't that usually done when shutting down? I'd imagine it could cause a lot of trouble when a PC shuts down with data still sitting in the drive cache. I don't know much about these things, but to find a program that forces the drive caches to be flushed I'd look in the shutdown scripts.
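For what it's worth, hdparm does have a flush option, so one rough sketch of folding the flush into the timing (assuming the array members are /dev/sda through /dev/sdd; not every drive honours the flush request) would be:

  # time the write together with the flush of the kernel's buffers
  time sh -c 'dd if=/dev/zero of=big-file bs=4k count=100000; sync'

  # ask each member drive to flush its on-board write cache (needs root, drive support varies)
  for d in /dev/sd[abcd]; do hdparm -F $d; done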
As to the block size, dd invokes a kernel write call for each block. In theory the kernel can coalesce those into bigger writes, so there is no easy way to say exactly what is being sent to the disk. But I don't think the kernel breaks individual writes into smaller ones.
So, as I put in another e-mail, if dd is called with bs = 3x the chunk size (for a 4-disk raid5), and the writes are stripe-aligned, then the kernel can fully optimize the parity calculation. And parity calculation is by far the biggest performance issue with raid5.
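A minimal sketch of what I mean, assuming the default 64k chunk size, so a full stripe on a 4-disk raid5 is 3 x 64k = 192k of data:

  # write whole stripes at a time: md can compute parity purely from the new data,
  # with no read-modify-write of the old blocks on disk
  dd if=/dev/zero of=big-file bs=192k count=10000
  sync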
Greg
On Jan 30, 2008 6:12 AM, Neil <hok.krat@gmail.com> wrote:
On 1/28/08, Greg Freemyer <greg.freemyer@gmail.com> wrote:
On Jan 28, 2008 11:25 AM, Ciro Iriarte <cyruspy@gmail.com> wrote:
Hi, does anybody have some notes about tuning md raid5, LVM and XFS? I'm getting 20 MB/s with dd and I think it can be improved. I'll add config parameters as soon as I get home. I'm using md raid5 on a motherboard with an nvidia SATA controller, 4x 500GB Samsung SATA2 disks and LVM, with openSUSE 10.3 @ x86_64.
Regards, Ciro --
I have not done any raid5 performance testing: 20 MB/sec seems pretty bad, but not outrageous I suppose. I can get about 4-5 GB/min from new SATA drives, so about 75 MB/sec from a single raw drive (i.e. dd if=/dev/zero of=/dev/sdb bs=4k).
You don't say how you're invoking dd. The default bs is only 512 bytes, I think, and that is very inefficient with the Linux kernel.
I typically use 4k, which matches the kernel's page size. i.e. dd if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but meaningful test.
I think the default chunk size is 64k per drive, so if you're writing 3x 64k at a time, you may get perfect alignment and avoid the overhead of recalculating the parity all the time.
As another data point, I would bump that up to 30x 64K and see if you continue to get speed improvements.
So tell us the write speed for bs=512, bs=4k, bs=192k, and bs=1920k.
And the read speeds for the same, i.e. dd if=big-file of=/dev/null bs=4k, etc.
I would expect the write speed to go up with each increase in bs, but the read speed to be more or less constant. Then you need to figure out what sort of real-world block sizes you're going to be using. Once you have a bs, or a collection of bs sizes, that matches your needs, you can start tuning your stack.
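Something along these lines would cover the whole matrix in one pass. Just a rough sketch: each write pass pushes roughly the same ~1.9GB so the drives' combined ~48MB of cache doesn't skew the numbers, and dd itself prints the elapsed time and MB/s for each run:

  # write tests: same total size, different block sizes
  dd if=/dev/zero of=big-file bs=512   count=3840000
  dd if=/dev/zero of=big-file bs=4k    count=480000
  dd if=/dev/zero of=big-file bs=192k  count=10000
  dd if=/dev/zero of=big-file bs=1920k count=1000

  # read tests: speed should be roughly independent of bs
  # (to keep the numbers honest, use a file larger than RAM or drop the
  #  page cache first: echo 3 > /proc/sys/vm/drop_caches)
  dd if=big-file of=/dev/null bs=4k
  dd if=big-file of=/dev/null bs=192k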
Greg
Isn't there a minimum to the total transmitted data to get a reliable reading? Something like (4 (number of disks) - 1 (redundant disk)) * 16 (MB of disk cache), which would give a 48MB minimum to send in this case. I believe this because all data sent to a disk is first cached (as far as I know) in the disk cache (most high-volume drives have 16MB of it). Or does dd circumvent that?
Also, I believe the dd block size should be at least (4 (number of disks) - 1 (redundant disk)) * 256k (array chunk size) = 768k. If you use a block size smaller than 256k the increased speed will not be visible: the system is still writing to one disk at a time! E.g. with bs=4k there are 64 blocks in each chunk. What does dd do? It writes 64 blocks to disk 1, continues with 64 blocks to disk 2, continues with 64 blocks to disk 3, and so forth (this is without paying mind to the redundancy).
Correct me if I am wrong, but that's the way I used to test my old raid0 array.
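(Whether the chunk size is really 256k or the 64k default mentioned above is easy to check, by the way. A quick sketch, assuming the array is /dev/md0:

  # both of these report the chunk size of the running array
  cat /proc/mdstat
  mdadm --detail /dev/md0 | grep -i chunk
)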
Neil
-- There are two kinds of people: 1. People who start their arrays with 1. 1. People who start their arrays with 0.
--
Greg Freemyer Litigation Triage Solutions Specialist http://www.linkedin.com/in/gregfreemyer First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group The Intersection of Evidence & Technology http://www.norcrossgroup.com