[opensuse] Write time performance - slightly out of topic
Hi, I intend to use openSUSE for a real-time application that must write 330 MB of data from memory to hard disk continuously in about 2.56 s. When I write only 330 MB, the write time using the Round Robin real-time scheduler is about 1.5 s, almost two times better than with the default scheduler. If I write 10 times 330 MB of data and then average the write time, I get about 7 s regardless of the scheduling policy. My first thought is that disk fragmentation should be held responsible for such a performance penalty. Is this the right explanation? Are there ways to improve the write time when dealing with large amounts of data?
thanks
-- Bogdan Cristea http://sites.google.com/site/cristeab/
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
On Monday November 15 2010, Bogdan Cristea wrote:
Hi
I intend to use openSUSE for a real time application which requires to write from memory to hard disk 330 MB of data continuously in about 2.56 s. When I write only 330 MB, the write time using Round Robin real-time scheduler is about 1.5 s, almost two times better than using the default scheduler. If I use 10 times 330 MB data and then I average the write time I get about 7 s regardless of the scheduling policy. My first thought is that the disk defragmentation should be held responsible for such performance penalty. Is this the right explanation ? Are there ways to improve the write time when dealing with large amounts of data ?
The CPU scheduling is probably not the primary issue. The short burst is completing more quickly because the data is simply being transferred to the in-memory disk sector cache. Once you overflow that cache (with the long burst), the kernel is forced to wait for physical I/O operations to complete so those cache blocks can be reassigned to new data from your application.
Randall Schulz
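The cache effect described above can be checked with a small timing sketch. This is only an illustration, not the original poster's benchmark: the helper name `timed_write` and the 8 MiB payload are assumptions made for the example. It times the same write once without and once with `os.fsync()`, so that the second figure includes the wait for the kernel to hand the data to the device.

```python
import os
import tempfile
import time


def timed_write(path, data, sync):
    """Write data to path and return the elapsed wall-clock seconds.

    With sync=False the call may return as soon as the data sits in the
    kernel page cache; with sync=True we also wait for os.fsync().
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros; the size is an arbitrary choice
    with tempfile.TemporaryDirectory() as d:
        cached = timed_write(os.path.join(d, "cached.bin"), payload, sync=False)
        synced = timed_write(os.path.join(d, "synced.bin"), payload, sync=True)
    print(f"without fsync: {cached:.3f} s, with fsync: {synced:.3f} s")
```

On a real disk the two numbers usually diverge more as the payload grows past what the cache can absorb; the exact figures depend on the hardware and file system.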
Bogdan Cristea said the following on 11/15/2010 09:13 AM:
Hi
I intend to use openSUSE for a real time application which requires to write from memory to hard disk 330 MB of data continuously in about 2.56 s.
Either use the raw disk, not the file system, so that the cache doesn't get in the way, or use a Solid State Disk flush often :-)
-- "Be brave enough to live creatively. The creative is the place where no one else has ever been. You have to leave the city of your comfort and go into the wilderness of your intuition. You can't get there by bus, only by hard work, risking, and by not quite knowing what you're doing. What you'll discover will be wonderful: yourself." -- Alan Alda.
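The "flush often" idea can be sketched as follows, assuming a regular file rather than a raw device. The function name `write_with_periodic_fsync` and the threshold parameter are hypothetical, chosen only for this example. The point is to keep the amount of unwritten cached data bounded instead of letting it pile up and stall later.

```python
import os


def write_with_periodic_fsync(path, chunks, sync_every_bytes):
    """Write chunks to path, calling fsync whenever at least sync_every_bytes
    have accumulated since the last sync. Returns the number of fsync calls."""
    syncs = 0
    pending = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            pending += len(chunk)
            if pending >= sync_every_bytes:
                f.flush()             # empty Python's own buffer first
                os.fsync(f.fileno())  # then ask the kernel to write it out
                syncs += 1
                pending = 0
        if pending:
            f.flush()
            os.fsync(f.fileno())      # final sync for the remaining tail
            syncs += 1
    return syncs
```

A suitable threshold has to be found by measurement; syncing too often costs throughput, syncing too rarely brings back the stall.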
On Monday, 2010-11-15 at 09:38 -0500, Anton Aylward wrote:
Either use the raw disk, not the file system, so that the cache doesn't get in the way, or use a Solid State Disk flush often :-)
Raw disk on an independent drive and cable, sequential write. Or distributed writes on several (non-system) disks. Each dump should be smaller than the on-disk RAM. Solid state disk... I guess it would be pretty expensive, more than a hundred megs per second. Flash memory no, it is slow at continuous writes.
-- Cheers, Carlos E. R. (from 11.2 x86_64 "Emerald" at Telcontar)
2010/11/15 Bogdan Cristea <cristeab@gmail.com>:
Hi
I intend to use openSUSE for a real time application which requires to write from memory to hard disk 330 MB of data continuously in about 2.56 s. When I write only 330 MB, the write time using Round Robin real-time scheduler is about 1.5 s, almost two times better than using the default scheduler. If I use 10 times 330 MB data and then I average the write time I get about 7 s regardless of the scheduling policy. My first thought is that the disk defragmentation should be held responsible for such performance penalty. Is this the right explanation ? Are there ways to improve the write time when dealing with large amounts of data ?
thanks -- Bogdan Cristea http://sites.google.com/site/cristeab/
Making your disk subsystem faster? What disks are you using? Regards, -- Ciro Iriarte http://cyruspy.wordpress.com
I have an HP Pavilion Slimline s7605.fr PC. I don't know the exact hard disk type; in any case, this is not the system I will have to use for real-time processing. I have done some preliminary tests to see what should be set up for fast disk access. Thank you everyone for your help.
regards
Bogdan
On Mon, Nov 15, 2010 at 6:13 AM, Bogdan Cristea <cristeab@gmail.com> wrote:
Hi
I intend to use openSUSE for a real time application which requires to write from memory to hard disk 330 MB of data continuously in about 2.56 s.
That's about 130 MB/sec. That's really fast. If bursting it to kernel cache counts, you can do it with a single disk and plenty of RAM.

If you really need to write continuously at that speed, you need to pay close attention to your storage system design. I don't believe you will find any standalone SATA or SSD drive that can run that fast continuously, even with purely sequential writes. You may find a SAS drive that can. I don't know their performance characteristics very well, but I'd be surprised to see that speed even with SAS drives.

More realistically, you will need to create a raid 0 with 3 or 4 disks. The cost is not that high, and you should get good performance. (A 4-disk raid 0 should theoretically be 4x as fast as a single disk.)

Note that a 4-disk raid 0 is 4 times as unreliable as a single disk. So if reliability is a concern, you now need to create an 8-disk raid 10. At $100 per disk, that's less than $1K, so not a bad option. Obviously you need a motherboard with 8 ports (or more) to make that doable.
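The arithmetic behind the 130 MB/sec figure and the disk count can be written out as a short sketch. The per-disk sustained rates in the loop are hypothetical values picked for illustration, and the model assumes RAID 0 throughput scales linearly with the number of disks, which real arrays only approximate.

```python
import math


def required_rate_mb_s(size_mb, deadline_s):
    """Sustained rate needed to write size_mb within deadline_s."""
    return size_mb / deadline_s


def disks_for_raid0(required_mb_s, per_disk_mb_s):
    """Idealized: assumes RAID 0 throughput scales linearly with disk count."""
    return math.ceil(required_mb_s / per_disk_mb_s)


rate = required_rate_mb_s(330, 2.56)
print(f"required: {rate:.1f} MB/s")
for per_disk in (100, 70, 50):  # hypothetical sustained per-disk write rates
    print(per_disk, "MB/s per disk ->", disks_for_raid0(rate, per_disk), "disk(s)")
```

Going from the computed minimum to 3 or 4 disks leaves headroom for slower inner tracks and for anything else that competes for the bus.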
When I write only 330 MB, the write time using Round Robin real-time scheduler is about 1.5 s, almost two times better than using the default scheduler.
I'm willing to bet your benchmark is not timing the full write process, i.e. it is not forcing the caches to flush before calling the writes done.
If I use 10 times 330 MB data and then I average the write time I get about 7 s regardless of the scheduling policy.
Yes, now the caching is no longer giving you falsely fast results.
My first thought is that the disk defragmentation should be held responsible for such performance penalty. Is this the right explanation ?
No, 130 MB/sec is simply not achievable with a single disk.
Are there ways to improve the write time when dealing with large amounts of data ?
As I said, you need raid. Note that for sequential writes a rotating disk should be just as fast as an SSD, so don't spend money on that unless you have benchmarks to show it's faster.
thanks -- Bogdan Cristea http://sites.google.com/site/cristeab/ --
Greg
On Monday, 2010-11-15 at 14:00 -0800, Greg Freemyer wrote: ...
More realistically, you will need to create a raid 0 with 3 or 4 disks. The cost is not that high, and you should get good performance. (A 4-disk raid-0 should theoretically be 4x as fast as a single disk.)
Note that a 4 disk raid 0 is 4 times as unreliable as a single disk. So if reliability is a concern, now you need to create a 8-disk raid 10. At $100 per disk, that's less than $1K, so not a bad option.
That would also slow things down. Instead, after the process finishes, copy the data to a backup, or process it, or whatever. I think I would instead write raw data to several disks, in bursts the size of the disk cache, using my own code, not a raid, so that if one disk is damaged I can reconstruct the rest with holes. Anyway, if one disk breaks during the process, the kernel would be stuck for some time until it takes that disk out of commission, so the data capture session would fail anyway. So I don't think a raid 8 is an advantage here. If security is needed, use two computers in parallel.
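A minimal sketch of that "own code instead of a raid" approach, under stated assumptions: it uses ordinary files in place of raw disks, and the names `stripe_write` and `stripe_read` are hypothetical. Chunks go round-robin to the targets, and on reassembly a missing target simply leaves zero-filled holes while the rest of the stream stays readable.

```python
import os
import tempfile


def stripe_write(data, paths, chunk_size):
    """Write fixed-size chunks of data round-robin across the given paths."""
    files = [open(p, "wb") for p in paths]
    try:
        for offset in range(0, len(data), chunk_size):
            index = offset // chunk_size
            files[index % len(files)].write(data[offset:offset + chunk_size])
    finally:
        for f in files:
            f.close()


def stripe_read(paths, chunk_size, total_size, fill=b"\0"):
    """Reassemble the stream; chunks from a missing file become fill bytes."""
    files = [open(p, "rb") if os.path.exists(p) else None for p in paths]
    out = bytearray()
    try:
        index = 0
        while len(out) < total_size:
            want = min(chunk_size, total_size - len(out))
            f = files[index % len(files)]
            chunk = f.read(want) if f else b""
            out += chunk.ljust(want, fill)  # pad holes (or short reads) with fill
            index += 1
    finally:
        for f in files:
            if f:
                f.close()
    return bytes(out)
```

In the scenario from the post, the chunk size would be chosen below each drive's on-board cache size; whether that actually helps is something only a benchmark on the target hardware can show.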
No, 130 MB/sec is simply not achievable with a single disk.
Telcontar:~ # hdparm -tT /dev/sdc

/dev/sdc:
 Timing cached reads: 12930 MB in 2.00 seconds = 6471.97 MB/sec
 Timing buffered disk reads: 370 MB in 3.01 seconds = 123.02 MB/sec

Yes, the maximum I get is 126, I suppose better hardware can get more speed.
-- Cheers, Carlos E. R. (from 11.2 x86_64 "Emerald" at Telcontar)
El 15/11/10 21:47, Carlos E. R. escribió:
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
/dev/sdb:
 Timing cached reads: 17492 MB in 2.00 seconds = 8754.09 MB/sec
 Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec

;-)
On 11/15/2010 05:11 PM, Cristian Rodríguez wrote:
El 15/11/10 21:47, Carlos E. R. escribió:
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
/dev/sdb:
 Timing cached reads: 17492 MB in 2.00 seconds = 8754.09 MB/sec
 Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec

/dev/md126:
 Timing cached reads: 20404 MB in 2.00 seconds = 10210.58 MB/sec
 Timing buffered disk reads: 1582 MB in 3.00 seconds = 527.19 MB/sec

Regards, Lew
El 15/11/10 22:25, Lew Wolfgang escribió:
/dev/md126: Timing cached reads: 20404 MB in 2.00 seconds = 10210.58 MB/sec Timing buffered disk reads: 1582 MB in 3.00 seconds = 527.19 MB/sec
What hardware is this?
On 11/15/2010 05:29 PM, Cristian Rodríguez wrote:
El 15/11/10 22:25, Lew Wolfgang escribió:
/dev/md126: Timing cached reads: 20404 MB in 2.00 seconds = 10210.58 MB/sec Timing buffered disk reads: 1582 MB in 3.00 seconds = 527.19 MB/sec What hardware is this ?
It's a new Super-Micro desktop. I've got the full particulars at my office, but for now here's what I recall:

CPU: One-each Intel Xeon W3680 @ 3.33GHz (top shows 12 cpus, with threading turned on)
RAM: 12-GB
RAID: Super-Micro (Intel compatible)
Disk: Two-each Intel SSD SSDSA2M040G2GC 40-GB disks configured as RAID-0, 128-KB stripe size

I have no idea if hdparm -tT is valid with a raid, I just thought I'd throw the number out.

It also has a 2-TB Seagate Constellation drive, which shows:

/dev/sda1:
 Timing cached reads: 19990 MB in 2.00 seconds = 10003.89 MB/sec
 Timing buffered disk reads: 380 MB in 3.00 seconds = 126.64 MB/sec

This box is a bit of a test case, with the OS (openSuSE 11.3) and swap on the SSD Raid-0. It's working well so far. I don't really care about the SSD "trim" issue, since OS support doesn't work with RAIDs yet.

I've got another identical box with the SSD RAID-0 configured with 4-KB stripes. It shows:

/dev/md126:
 Timing cached reads: 21024 MB in 2.00 seconds = 10521.17 MB/sec
 Timing buffered disk reads: 1160 MB in 3.00 seconds = 386.62 MB/sec

So it looks like the larger stripe sizes are better with SSDs, if you can believe hdparm.

Regards, Lew
On Monday, 2010-11-15 at 22:11 -0300, Cristian Rodríguez wrote:
El 15/11/10 21:47, Carlos E. R. escribió:
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
/dev/sdb: Timing cached reads: 17492 MB in 2.00 seconds = 8754.09 MB/sec Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec
;-)
Raid 0?
-- Cheers, Carlos E. R. (from 11.2 x86_64 "Emerald" at Telcontar)
On Tue, Nov 16, 2010 at 3:24 AM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On Monday, 2010-11-15 at 22:11 -0300, Cristian Rodríguez wrote:
El 15/11/10 21:47, Carlos E. R. escribió:
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
/dev/sdb: Timing cached reads: 17492 MB in 2.00 seconds = 8754.09 MB/sec Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec
;-)
Raid 0?
- -- Cheers, Carlos E. R.
Carlos,

You made mention of Raid 8 earlier in this thread, which I've never heard of. I suspect you think a Raid 5 is any raid made of 5 disks. That's not how it works. You might find the Wikipedia article worth perusing. http://en.wikipedia.org/wiki/RAID#Standard_levels

Raid 0, 1, 5, 6 are the most common. And you can combine them, so a Raid 1 + 0 is a striped set of mirrored disks (a mirrored set of striped disks is a Raid 0 + 1). People get lazy, so they often call that a raid 10. Wiki to the rescue again: http://en.wikipedia.org/wiki/RAID#Nested_.28hybrid.29_RAID

As to an earlier comment of yours that a Raid 1 + 0 made of 8 disks would be slower than a Raid 0 made of 4 disks, that can be true, but in theory the actual writing is done in parallel, so the slowdown should be negligible. And on the read side, you have 2 sources for each piece of data, so in theory you can read from a raid 1 + 0 twice as fast as from a raid 0.

Greg
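The "in theory" comparison between a 4-disk raid 0 and an 8-disk raid 1 + 0 can be put into a toy model. This is an idealized sketch, not a description of how mdraid or any controller actually performs: the function names are hypothetical, two-way mirrors are assumed, and bus limits and controller overhead are ignored. Each result is a multiple of a single disk's figure.

```python
def raid0(n_disks):
    """Idealized RAID 0: data striped over n disks, no redundancy."""
    return {"write_x": n_disks, "read_x": n_disks, "capacity_x": n_disks}


def raid10(n_disks):
    """Idealized RAID 1+0: a stripe across n/2 two-way mirrors."""
    if n_disks % 2:
        raise ValueError("RAID 1+0 with two-way mirrors needs an even disk count")
    pairs = n_disks // 2
    # Writes hit both halves of a mirror in parallel, so they scale with pairs;
    # reads can be served by either half, so they scale with all disks.
    return {"write_x": pairs, "read_x": n_disks, "capacity_x": pairs}


print("raid 0, 4 disks: ", raid0(4))
print("raid 10, 8 disks:", raid10(8))
```

In this model the two layouts write at the same multiple while the mirrored one reads at twice the multiple, which is the claim made above; measured numbers will be lower.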
On Tuesday, 2010-11-16 at 12:24 -0800, Greg Freemyer wrote:
On Tue, Nov 16, 2010 at 3:24 AM, Carlos E. R. <> wrote:
Carlos,
You made mention of Raid 8 earlier in this thread, which I've never heard of.
Oops! I intended to say raid 10, made of 8 disks, which is what you proposed.
I suspect you think a Raid 5 is any raid made of 5 disks.
No, never.
As to an earlier comment of yours that a Raid 1 + 0 made of 8 disks would be slower than a Raid 0 made of 4 disks, that can be true, but in theory the actual writing is done in parallel, so the slow down should be negligible.
The bus would be used double time. He has to write all that data, which has to enter the system from somewhere, and perhaps treated. It is a lot of data. If redundancy is needed, better use two computers, which has the advantage of having really dual hardware. If one disk of the raid 10 fails during the process, there will be a severe hiccup till it is removed.
And on the read side, you have 2 sources for each piece of data, so in theory you can read from a raid 1 + 0 twice as fast as from a raid 0.
Yes, but there has been no mention yet of that requirement :-)
-- Cheers, Carlos E. R. (from 11.2 x86_64 "Emerald" at Telcontar)
On Tue, Nov 16, 2010 at 4:04 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
The bus would be used double time. He has to write all that data, which has to enter the system from somewhere, and perhaps treated. It is a lot of data. If redundancy is needed, better use two computers, which has the advantage of having really dual hardware. If one disk of the raid 10 fails during the process, there will be a severe hiccup till it is removed.
Severe hiccup? Not really. Assuming mdraid 1+0, if a drive is marked failed, then it would simply not be written to. No hiccup at all. And if it is marginal and occasionally failing on writes, mdraid should identify it rather quickly and mark it as failed, i.e. mdraid is not very tolerant of drive issues. If a drive is reporting occasional errors, it gets hard failed almost immediately.

FYI: Are you aware of "Enterprise SATA" drives? They sound more robust, but in reality they are less robust by design. It's basically just a different set of firmware. Their firmware is specifically coded to "fail fast". Thus if you read from an enterprise drive and it gets an internal CRC error due to a media issue, it will immediately fail that back to the kernel, which should immediately retry the read from the other mirror half. Thus the failover takes milliseconds. If you use standard desktop drives with standard read retry logic, it can take as long as 30 seconds for a media error to trigger a drive failure. Not good.

With both desktop and enterprise firmware, I think the latest mdraid code will then re-write the data to the drive that reported the error, but with the enterprise drives it all happens very quickly and so it should not be much of a hiccup at all.

FYI 2: media errors never happen on write. The drive simply seeks to the right place and starts writing. The drive does not attempt to read/verify that data, so a write media error is not generated even if the media has a bad spot where the write went to.

Greg
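The "fail fast, then read the other mirror half" behaviour can be illustrated with a toy model. This is not mdraid code and makes no claim about its internals: `MediaError`, `make_mirror`, and `read_block` are hypothetical names, and each "drive" is just a Python callable over an in-memory list of blocks.

```python
class MediaError(Exception):
    """Stand-in for a drive reporting an unreadable sector."""


def make_mirror(blocks, bad=()):
    """Build a fake drive: a callable that raises MediaError for bad block numbers."""
    def read(block_no):
        if block_no in bad:
            raise MediaError(block_no)
        return blocks[block_no]
    return read


def read_block(mirrors, block_no):
    """Try each mirror half in turn; return (data, index of the mirror that served it)."""
    for i, mirror in enumerate(mirrors):
        try:
            return mirror(block_no), i
        except MediaError:
            continue  # the error came back at once, so go straight to the other copy
    raise MediaError(f"block {block_no} unreadable on all mirrors")
```

The cost of a failover in this picture is whatever time the failing drive takes to report the error, which is why firmware that gives up quickly keeps the stall short.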
I just saw that Drobo has announced a USB-3 external raid enclosure in the last day or two. USB-3 is a 5 Gbit/sec interface, so that should be able to deliver over 200MB/sec of real world speed. The empty chassis is $800, I think. You can get a USB-3 add-in card for about $50, so for under $900 you have a really nice solution. (You still have to buy drives for the chassis. I'd go for the enterprise ones mentioned in my last post.)

Carlos, I think this is the ultimate openSUSE team member give away! Forget stickers. "Join the openSUSE team and get a free USB-3 Drobo!" ;)

Greg
El 16/11/10 08:24, Carlos E. R. escribió:
/dev/sdb: Timing cached reads: 17492 MB in 2.00 seconds = 8754.09 MB/sec Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec
;-)
Raid 0?
No, Intel SSD 80GB 2G. It kinda rocks. ;)
No, 130 MB/sec is simply not achievable with a single disk.
Telcontar:~ # hdparm -tT /dev/sdc
/dev/sdc: Timing cached reads: 12930 MB in 2.00 seconds = 6471.97 MB/sec Timing buffered disk reads: 370 MB in 3.01 seconds = 123.02 MB/sec
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
Not for SATA: According to Tom's Hardware the fastest SATA disk is 127 MB/sec. http://www.tomshardware.com/charts/3.5-hard-drive-charts/Maximum-Write-Trans...

But that is measured at the fastest part of the disk (the outer cylinders). As you move towards the center, you see a significant drop-off because the RPM is fixed, but the data per rotation drops off significantly near the spindle.

In the real world, I have never seen a drive run at faster than 6GB/min for any extended time. That's 100MB/sec. I suppose if you only need a few GB of capacity and you restrict your usage to the "start" of the disk then you might get close to 130MB/sec for real, i.e. cylinder 0 is on the outer edge.

==> SAS

Tom's says they are indeed faster: http://www.tomshardware.com/charts/enterprise-hard-drive-charts-2010/Streami...

So the OP could look into a Cheetah and see if it will work.

Greg
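The numbers in this message can be worked through in a short sketch. The unit conversion and the write-time figure follow directly from the text; the radius model is a simplification that assumes constant RPM and constant linear bit density, and the platter radii in the example are hypothetical values, not taken from any datasheet.

```python
def gb_per_min_to_mb_per_s(gb_per_min):
    """Convert GB/min to MB/s using decimal units (1 GB = 1000 MB)."""
    return gb_per_min * 1000 / 60


def seconds_to_write(size_mb, rate_mb_s):
    """Time needed to write size_mb at a sustained rate."""
    return size_mb / rate_mb_s


def zoned_rate(rate_outer_mb_s, radius_mm, outer_radius_mm):
    """Constant RPM and constant linear bit density: rate scales with track radius."""
    return rate_outer_mb_s * radius_mm / outer_radius_mm


sustained = gb_per_min_to_mb_per_s(6)
print(f"6 GB/min = {sustained:.0f} MB/s")
print(f"330 MB at that rate takes {seconds_to_write(330, sustained):.2f} s (target: 2.56 s)")
# Hypothetical geometry: outer track at 46 mm, a track halfway in at 23 mm.
print(f"rate at half the outer radius: {zoned_rate(127, 23, 46):.1f} MB/s")
```

At a sustained 100 MB/sec a single drive misses the 2.56 s target, which is consistent with the suggestion to stay on the outer cylinders or to look at faster drives.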
On Monday, 2010-11-15 at 19:31 -0800, Greg Freemyer wrote:
Timing buffered disk reads: 370 MB in 3.01 seconds = 123.02 MB/sec
Yes, the maximum I get is 126, I suppose better hardware can get more speed.
Not for SATA:
According to Tom's Hardware the fastest SATA disk is 127 MB/SEC.
Ah, then I'm happy, I'm getting the best possible :-)
http://www.tomshardware.com/charts/3.5-hard-drive-charts/Maximum-Write-Trans...
But that is measured at the fastest part of the disk (the outer cylinders).
I made some measurements, and mine is faster at 1/3.
As you move towards the center, you see a significant drop off because the rpms is fixed, but the data per rotation drops off significantly near the spindle.
Ah, that's why.
==> SAS
Tom's says they are indeed faster:
Well, that's what I meant by better hardware :-)
-- Cheers, Carlos E. R. (from 11.2 x86_64 "Emerald" at Telcontar)
participants (8)
- Anton Aylward
- Bogdan Cristea
- Carlos E. R.
- Ciro Iriarte
- Cristian Rodríguez
- Greg Freemyer
- Lew Wolfgang
- Randall R Schulz