[opensuse-factory] Btrfs under heavy write makes system sluggish-er than ext
So, the subject says it all.
I've been testing a huge database on 13.1 RC1 (well, actually Factory) with a btrfs partition (not the system partition, just a dedicated partition), and under heavy write it chokes the system much more than ext4 does.
It also seems to have lower throughput on what should be mostly parallel sequential access (i.e. multiple sequential threads), but I'll have to perform more accurate benchmarks to be sure. It may be an fsync issue.
It's not a critical issue, of course, but I'm in a position to assist with tests, benchmarks and diagnostics... so... what do I need to do to provide useful info here?
--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
On 31.10.2013 20:18, Claudio Freire wrote:
So, the subject says it all.
I've been testing a huge database on 13.1 RC1, well actually factory, with a btrfs partition (not the system partition, just a dedicated partition), and under heavy write it chokes the system much more than ext4 does.
It also seems to have lower throughput on what should be mostly parallel sequential access (i.e. multiple sequential threads), but I'll have to perform more accurate benchmarks to be sure. It may be an fsync issue.
It's not a critical issue, of course, but I'm in a position to assist with tests, benchmarks and diagnostics... so... what do I need to do to provide useful info here?
It is known (e.g. [1], [2]) that btrfs has a performance problem with writes to large files, which is the typical pattern for databases and virtual machines. As long as those problems don't get fixed, I guess there is nothing you can do about it if you want to use the features of btrfs.
Hendrik
[1] https://wiki.archlinux.org/index.php/Btrfs#Copy-On-Write_.28CoW.29
[2] http://www.ilsistemista.net/index.php/linux-a-unix/36-btrfs-mount-options-an...
On 31/10/13 16:18, Claudio Freire wrote:
It's not a critical issue, of course, but I'm in a position to assist with tests, benchmarks and diagnostics... so... what do I need to do to provide useful info here?
You have to disable COW for databases and virtual machines, this is a known issue.
On Thu, Oct 31, 2013 at 5:33 PM, Cristian Rodríguez <crrodriguez@opensuse.org> wrote:
On 31/10/13 16:18, Claudio Freire wrote:
It's not a critical issue, of course, but I'm in a position to assist with tests, benchmarks and diagnostics... so... what do I need to do to provide useful info here?
You have to disable COW for databases and virtual machines, this is a known issue.
Yeah, did that.
chattr -R +C pgdata
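A side note on verifying this: the chattr man page says that on btrfs the C flag should be set on new or empty files, and that the effect on files that already contain data is not well defined. A recursive `chattr +C` on an already populated directory may therefore leave some files still copy-on-write. The following is only a rough sketch for checking that, by parsing the output of `lsattr -R`. The helper names and the sample listing are made up for illustration, and the exact width of the flag column depends on the e2fsprogs version.

```python
import string
import subprocess

# Characters that may appear in the flag column printed by lsattr.
FLAG_CHARS = set(string.ascii_letters + "-")


def files_missing_nocow(lsattr_output):
    """Return the paths whose lsattr flag column lacks 'C' (No_COW).

    Heuristic parser: a listing line is "<flags> <path>"; blank lines and
    the "directory:" headers printed by `lsattr -R` are skipped.
    """
    missing = []
    for line in lsattr_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank line or "some/dir:" header
        flags, path = parts
        if "-" not in flags or not set(flags) <= FLAG_CHARS:
            continue  # first token does not look like a flag column
        if "C" not in flags:
            missing.append(path)
    return missing


def run_lsattr(directory):
    """Run `lsattr -R` on a directory and return its stdout (not called here)."""
    result = subprocess.run(
        ["lsattr", "-R", directory],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Hypothetical listing, only to exercise the parser.
SAMPLE = """\
---------------C-- pgdata/PG_VERSION
---------------C-- pgdata/base

pgdata/base:
------------------ pgdata/base/16384/2619
---------------C-- pgdata/base/16384/2620
"""

print(files_missing_nocow(SAMPLE))
```

On a real system one would pass `run_lsattr("pgdata")` instead of the sample text. The more reliable approach is probably to set +C on an empty directory first and then create or copy the database files into it, so that new files inherit the flag.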
On Thu, Oct 31, 2013 at 5:37 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Thu, Oct 31, 2013 at 5:33 PM, Cristian Rodríguez <crrodriguez@opensuse.org> wrote:
On 31/10/13 16:18, Claudio Freire wrote:
It's not a critical issue, of course, but I'm in a position to assist with tests, benchmarks and diagnostics... so... what do I need to do to provide useful info here?
You have to disable COW for databases and virtual machines, this is a known issue.
Yeah, did that.
chattr -R +C pgdata
I know nobody is paying attention, but I wanted to post some concrete results of a database restore of approximately 200GB, before I forgot:

btrfs restore, async commit off:

real 315m12.373s
user 6m41.757s
sys 0m39.985s

throughput sample

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz    await r_await   w_await svctm  %util
sda        8.20   18.47  118.78   83.15   34.53   40.34   759.35  1904.12  9155.32   49.37  22163.54  4.95 100.00
sda        1.12    0.22  163.68   24.13   80.89   11.81  1010.85  2491.56 14155.78   73.21 109670.21  5.32 100.00

ext4 restore, async commit off:

real 329m5.622s
user 6m41.842s
sys 0m40.509s

throughput sample

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz    await r_await   w_await svctm  %util
sda        5.77   68.65   90.58  136.20   11.74   66.05   702.52   367.35  1609.36   15.17   2669.62  3.88  88.09
sda       15.60    7.75  268.27   52.50   33.77   25.09   375.84   780.36  2081.75   10.70  12664.46  3.12 100.00
sda        0.80    0.40  879.40    0.42  109.17    0.00   254.13     1.99     2.27    1.94    691.84  1.11  97.35

This restore was performed with 4 parallel jobs, as usually recommended for this kind of task, which ought to be somewhat CPU-bound. Note that ext4 reflects this CPU-boundness by showing at least 2 samples out of 3 with %util below 100%.

So, at least one conclusion can be drawn from this:

Under heavy writing, btrfs induces noticeably higher read await times than ext4, resulting in the considerably sluggish system I reported initially. This could be due to the lower wrqm/s noticeable above, though those numbers could be predominantly noise.

I should emphasize that COW has been disabled on this partition, and the partition is dedicated to this database (but not the whole disk, just the partition).

I should repeat the test with async commit on, but I currently need this test DB for real work, so that could take some time.

Yes, ext4 took longer (14 minutes longer). That could be noise; the system wasn't totally idle (I was browsing with firefox, painfully slowly, but inducing some extra load).

It does look, though, to be an I/O scheduling issue more than a performance issue, because restore times are comparable.
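For anyone who wants to compare the samples numerically rather than by eye, here is a minimal sketch that parses the iostat rows quoted in this message and averages a few columns per filesystem. The numbers are copied from the message. The function and variable names are made up for illustration, and with only 2 and 3 samples the averages are indicative at best.

```python
from statistics import mean

# Column names as printed by `iostat -x` in the samples above (after "Device:").
COLUMNS = ("rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
           "await r_await w_await svctm %util").split()

# iostat rows copied verbatim from the message.
SAMPLES = {
    "btrfs": """
sda 8.20 18.47 118.78 83.15 34.53 40.34 759.35 1904.12 9155.32 49.37 22163.54 4.95 100.00
sda 1.12 0.22 163.68 24.13 80.89 11.81 1010.85 2491.56 14155.78 73.21 109670.21 5.32 100.00
""",
    "ext4": """
sda 5.77 68.65 90.58 136.20 11.74 66.05 702.52 367.35 1609.36 15.17 2669.62 3.88 88.09
sda 15.60 7.75 268.27 52.50 33.77 25.09 375.84 780.36 2081.75 10.70 12664.46 3.12 100.00
sda 0.80 0.40 879.40 0.42 109.17 0.00 254.13 1.99 2.27 1.94 691.84 1.11 97.35
""",
}


def parse_rows(text):
    """Turn iostat device rows into a list of {column: float} dicts."""
    rows = []
    for line in text.strip().splitlines():
        _device, *values = line.split()
        if len(values) != len(COLUMNS):
            raise ValueError(f"unexpected column count in: {line!r}")
        rows.append(dict(zip(COLUMNS, map(float, values))))
    return rows


def column_mean(rows, column):
    """Average one column over all sampled rows."""
    return mean(row[column] for row in rows)


def elapsed_seconds(minutes, seconds):
    """Convert the 'real' time reported by `time` into seconds."""
    return minutes * 60 + seconds


summary = {
    fs: {col: column_mean(parse_rows(text), col)
         for col in ("r_await", "w_await", "wrqm/s", "%util")}
    for fs, text in SAMPLES.items()
}

# Wall-clock difference between the two restores (ext4 minus btrfs).
elapsed_diff = elapsed_seconds(329, 5.622) - elapsed_seconds(315, 12.373)

for fs, stats in summary.items():
    print(fs, {col: round(value, 2) for col, value in stats.items()})
print("ext4 - btrfs elapsed seconds:", round(elapsed_diff, 3))
```

On these samples the mean r_await comes out at roughly 61 ms for btrfs against roughly 9 ms for ext4, while the restore times differ by a bit under 14 minutes out of more than five hours.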
Hello Claudio and all,
On 2013-11-18 T 16:57 -0200 Claudio Freire wrote:
I know nobody is paying attention,
Thanks for sharing the tests! I definitely do pay attention, and I know that others do as well, even if there is not always a direct response (based on time constraints for example).
but I wanted to post some concrete results of a database restore of approximately 200GB, before I forgot:
That said, allow me one comment or question:
So, at least one conclusion can be drawn from this:
Under heavy writing, btrfs induces noticeably higher read await times than ext4, resulting in the considerably sluggish system I reported initially. [...] Yes, ext4 took longer (14 minutes longer). That could be noise; the system wasn't totally idle (I was browsing with firefox, painfully slowly, but inducing some extra load). It does look, though, to be an I/O scheduling issue more than a performance issue, because restore times are comparable.
Well, so, in the end, btrfs was faster, but the system less responsive to user interaction. Is that the right summary? If yes, that would not be too bad of a result, as for a system dedicated to a database, direct user interaction is not the primary use case. :-/
Just for my curiosity: Which type of backend did you use (HDD or SSD?) and which I/O scheduler was associated with it?
so long - MgE
--
Matthias G. Eckermann, Senior Product Manager SUSE® Linux Enterprise
On Tue, Nov 19, 2013 at 2:59 PM, Matthias G. Eckermann <mge@suse.com> wrote:
So, at least one conclusion can be drawn from this:
Under heavy writing, btrfs induces noticeably higher read await times than ext4, resulting in the considerably sluggish system I reported initially. [...] Yes, ext4 took longer (14 minutes longer). That could be noise; the system wasn't totally idle (I was browsing with firefox, painfully slowly, but inducing some extra load). It does look, though, to be an I/O scheduling issue more than a performance issue, because restore times are comparable.
Well, so, in the end, btrfs was faster, but the system less responsive to user interaction. Is that the right summary?
Yep.
If yes, that would not be too bad of a result, as for a system dedicated to a database, direct user interaction is not the primary use case. :-/
Well, yes, but this effect would also be observed for other querying processes. So the issue itself is that the scheduler is less fair under btrfs. I know schedulers are separate from the filesystem, so I have no idea why this is the case. It shouldn't happen. But it does.
Just for my curiosity: Which type of backend did you use (HDD or SSD?) and which IO Scheduler associated?
Rotating rust (HDD), and CFQ. For a database, the recommended scheduler is deadline, but since this isn't a dedicated system (it's a development box), I kept CFQ. That actually makes the issue stranger, since CFQ is supposed to avoid this sluggishness (and it does succeed with ext4).
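For reference, the active I/O scheduler of a block device can be read from sysfs, where the kernel lists the available schedulers and marks the active one with square brackets. A minimal sketch follows; the helper names are made up for illustration, and the device name `sda` is taken from the iostat samples earlier in the thread.

```python
import re


def active_scheduler(sysfs_text):
    """Return the scheduler marked with [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device="sda"):
    """Read the active scheduler for a block device from sysfs (not called here)."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


# Example content of /sys/block/sda/queue/scheduler on a box running CFQ.
print(active_scheduler("noop deadline [cfq]\n"))
```

Switching to deadline for a test run would amount to writing the scheduler name into the same sysfs file as root, which makes it easy to repeat the restore under both schedulers.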
On Tue, Nov 19, 2013 at 3:09 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Tue, Nov 19, 2013 at 2:59 PM, Matthias G. Eckermann <mge@suse.com> wrote:
So, at least one conclusion can be drawn from this:
Under heavy writing, btrfs induces noticeably higher read await times than ext4, resulting in the considerably sluggish system I reported initially. [...] Yes, ext4 took longer (14 minutes longer). That could be noise; the system wasn't totally idle (I was browsing with firefox, painfully slowly, but inducing some extra load). It does look, though, to be an I/O scheduling issue more than a performance issue, because restore times are comparable.
Well, so, in the end, btrfs was faster, but the system less responsive to user interaction. Is that the right summary?
Yep.
I think I managed to find a clue to a cause. It seems heavy write activity on btrfs partitions pushes the system into swapping more than the same activity on ext4 does. Not sure why. A moderate decrease in swappiness improved responsiveness a great deal, so this seems to be the right direction for finding a fix.
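For anyone who wants to try the same experiment: the tunable in question is the `vm.swappiness` sysctl, exposed at `/proc/sys/vm/swappiness`. Below is a minimal sketch for reading and changing it. The helper names are made up, writing requires root, and the 0-100 range check is an assumption that matches kernels of this era (newer kernels may accept larger values). The thread does not say which value was used, so none is suggested here.

```python
from pathlib import Path

# Location of the vm.swappiness sysctl in procfs.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness as an integer."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Set swappiness (needs root on the real file); returns the previous value.

    The 0-100 bound is an assumption for illustration, see the note above.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness expected in the range 0-100")
    previous = read_swappiness(path)
    Path(path).write_text(f"{value}\n")
    return previous
```

Returning the previous value makes it easy to restore the original setting after a test run. A change made this way does not survive a reboot; a persistent setting would go into the sysctl configuration instead.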
participants (4)
- Claudio Freire
- Cristian Rodríguez
- Hendrik Woltersdorf
- Matthias G. Eckermann