[Bug 1030310] Regression in WAL and COMMIT times for pgioperf-bench on ext3

29 Mar 2017

http://bugzilla.suse.com/show_bug.cgi?id=1030310
http://bugzilla.suse.com/show_bug.cgi?id=1030310#c2

--- Comment #2 from Jan Kara <jack@suse.com> ---
OK, so I've been testing with SLE12-SP3 kernels on ives.arch.suse.de. One thing
I've noticed is that the writeback-throttling (wbt) patches have been
backported from upstream to SLE12-SP3 and they screw up this workload badly.
Read numbers look all nice and dandy, like:

read[21335]: avg: 6.3 msec; max: 303.8 msec
read[21335]: avg: 6.4 msec; max: 224.6 msec
read[21335]: avg: 6.2 msec; max: 286.5 msec
read[21335]: avg: 6.2 msec; max: 264.8 msec
read[21335]: avg: 6.4 msec; max: 258.2 msec
read[21335]: avg: 6.4 msec; max: 280.4 msec

However, the writer struggles hard to make any progress at all. Usually
pgioperf.log contains only entries like:

wal[27108]: avg: 0.0 msec; max: 6.1 msec
wal[27108]: avg: 0.0 msec; max: 10.9 msec
commit[27108]: avg: 2.6 msec; max: 2361.9 msec
wal[27108]: avg: 0.0 msec; max: 16.5 msec

which are from the time before the readers actually managed to start. After
that, the writers are blocked until the whole benchmark completes.
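
To make the gap between readers and writers easier to see at a glance, the log
lines can be aggregated per operation. The following is only a sketch: it
assumes every pgioperf.log entry has exactly the shape of the lines quoted
above, and the names LINE_RE and summarize are made up for this illustration,
not part of pgioperf-bench.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the entries quoted in this comment:
#   wal[27108]: avg: 0.0 msec; max: 6.1 msec
LINE_RE = re.compile(
    r"^(?P<op>\w+)\[(?P<pid>\d+)\]: "
    r"avg: (?P<avg>[\d.]+) msec; max: (?P<max>[\d.]+) msec$"
)


def summarize(lines):
    """Return {op: (sample_count, mean_of_avgs, worst_max)} for matching lines."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # silently skip anything that is not a latency entry
        entry = stats[m["op"]]
        entry[0] += 1
        entry[1] += float(m["avg"])
        entry[2] = max(entry[2], float(m["max"]))
    return {op: (n, total / n, worst) for op, (n, total, worst) in stats.items()}


# The four writer entries quoted above.
sample = """\
wal[27108]: avg: 0.0 msec; max: 6.1 msec
wal[27108]: avg: 0.0 msec; max: 10.9 msec
commit[27108]: avg: 2.6 msec; max: 2361.9 msec
wal[27108]: avg: 0.0 msec; max: 16.5 msec
""".splitlines()

for op, (n, mean_avg, worst) in summarize(sample).items():
    print(f"{op}: {n} samples, mean avg {mean_avg:.1f} msec, worst max {worst:.1f} msec")
```

A healthy run should show the wal and commit sample counts growing along with
the read ones; here they stop at the handful of entries shown.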

Analysis of blktrace data has shown that the wbt logic interacts badly with
CFQ. Because of wbt, CFQ only ever sees one write request at a time. When such
a request shows up, the async queue gets scheduled, eventually gets its time
slot, and submits that one write request. Once that request completes, the
readers are scheduled again since the async queue has no more IO. As a result
we complete about one write every couple of seconds, which is far too low: a
single writeback pass through the data file has more writes than we can
complete during the whole benchmark run.
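
The effect can be illustrated with a deliberately crude back-of-the-envelope
model. This is not CFQ's actual algorithm. It only assumes that the async queue
gets one slot per scheduling round and can dispatch no more writes than it can
see. All numbers below (600 s run, 2 s round, 32 writes per slot, 64 queued
writes) are hypothetical, chosen just to match the "one write every couple of
seconds" observation.

```python
def writes_completed(run_seconds, round_seconds, visible_writes, slot_capacity):
    """Toy estimate of writes finished during a benchmark run.

    Per scheduling round the async queue gets one slot and dispatches at most
    min(visible_writes, slot_capacity) writes; readers own the rest of the round.
    """
    rounds = int(run_seconds // round_seconds)
    return rounds * min(visible_writes, slot_capacity)


# With wbt, the scheduler sees a single write at a time.
print(writes_completed(600, 2, visible_writes=1, slot_capacity=32))   # 300
# Without wbt, the async queue could fill its slot (hypothetical capacity).
print(writes_completed(600, 2, visible_writes=64, slot_capacity=32))  # 9600
```

However long the slot is, throughput is capped by the single visible request,
so the write rate collapses to one request per round.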

TODO item: Disable writeback throttling for non-multiqueue devices by default.
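
Until the default changes, a per-device runtime workaround might look like the
sketch below. It assumes the backport exposes the upstream sysfs knob
/sys/block/<dev>/queue/wbt_lat_usec (where writing 0 turns throttling off) and
that blk-mq devices can be recognised by an mq/ directory under
/sys/block/<dev>/. Neither assumption has been verified against the SLE12-SP3
kernel here. The function names are made up for this example, and writing to
the real sysfs needs root.

```python
from pathlib import Path


def is_multiqueue(dev, sysfs_root="/sys"):
    """Assumption: blk-mq devices expose an mq/ directory in sysfs."""
    return (Path(sysfs_root) / "block" / dev / "mq").is_dir()


def disable_wbt_if_single_queue(dev, sysfs_root="/sys"):
    """Write 0 to wbt_lat_usec for a non-multiqueue device.

    Returns True if the knob was written, False if the device is multiqueue
    or does not expose the knob at all.
    """
    knob = Path(sysfs_root) / "block" / dev / "queue" / "wbt_lat_usec"
    if is_multiqueue(dev, sysfs_root) or not knob.exists():
        return False
    knob.write_text("0\n")  # assumed semantics: 0 disables throttling
    return True


# Example call (as root, device name is a placeholder):
#   disable_wbt_if_single_queue("sda")
```

The sysfs_root parameter exists only so the logic can be tried against a
scratch directory instead of the live /sys tree.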
