Comment # 9 on bug 1030310 from
OK, I've done more investigation into this issue. In the end, the culprit
behind the stalls is transaction commit time. During the benchmark run the
average commit time is ~18s with a standard deviation of ~41s! The top 5
commit times are:

274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.

And the reason why transaction commits are taking so long (although they are
pretty small) is that the flusher worker holds the transaction open in
ext4_writepages() while writing back pages. This writeback gets throttled by
CFQ, so it takes a long time for ext4_writepages() to complete and thus for
the transaction handle to be dropped, which in turn allows the transaction
commit to complete.
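
To make the dependency explicit, here is a toy model in Python (not kernel
code). It assumes only what is described above: a commit cannot proceed until
every handle on the transaction has been dropped. The function name, its
parameters and the sample numbers are all made up for illustration.

```python
def commit_latency(commit_requested_at, handle_drop_times, commit_io_seconds):
    """Seconds from the commit request until the commit completes (toy model)."""
    # The commit has to wait until the last open handle is dropped.
    last_drop = max(handle_drop_times, default=commit_requested_at)
    start = max(commit_requested_at, last_drop)
    # Only then can the (small) transaction actually be written out.
    return (start - commit_requested_at) + commit_io_seconds


# Hypothetical numbers: the commit itself needs 0.25 s of IO, but the
# flusher's handle stays open for 30 s because its writeback is throttled.
print(commit_latency(0.0, [30.0], 0.25))
```

In this model the commit latency is dominated by how long the handle is held,
not by the size of the transaction.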

A relatively simple solution to this problem is to start a transaction only
once we find a page that needs block allocation / extent conversion in
ext4_writepages(). With this change transaction commit times drop to 0.1s on
average with a standard deviation of 0.15s, and the top 5 commit times are:

0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s
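
The idea can be sketched with another toy model in Python. This is not the
actual patch: the Page class, handle_hold_time() and the per-page IO times are
hypothetical, and the model simply assumes that once started, the handle is
held until the writeback pass finishes.

```python
from dataclasses import dataclass


@dataclass
class Page:
    needs_journal: bool  # page needs block allocation / extent conversion
    io_seconds: float    # hypothetical (throttled) time to write this page


def handle_hold_time(pages, lazy):
    """How long the handle stays open during one writeback pass (toy model)."""
    handle_open = not lazy  # eager: handle started before the first page
    held = 0.0
    for page in pages:
        # Lazy: start the handle only at the first page that needs it.
        if lazy and not handle_open and page.needs_journal:
            handle_open = True
        if handle_open:
            held += page.io_seconds
    return held


# Nine plain overwrites followed by one page that needs allocation.
pages = [Page(False, 2.0)] * 9 + [Page(True, 2.0)]
print(handle_hold_time(pages, lazy=False))  # handle held for the whole pass
print(handle_hold_time(pages, lazy=True))   # handle held for the last page only
```

In this sketch a pass that writes only already-allocated pages never starts a
handle at all in the lazy case, so it cannot hold up a commit.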

Also the benchmark numbers themselves look better after the change. For
reads, the results look like:

read[23390]: avg: 10.7 msec; max: 358.5 msec
read[23387]: avg: 10.7 msec; max: 358.8 msec
read[23394]: avg: 10.7 msec; max: 358.9 msec
read[23392]: avg: 10.7 msec; max: 358.6 msec
read[23395]: avg: 10.7 msec; max: 358.6 msec
read[23382]: avg: 10.7 msec; max: 358.7 msec
read[23381]: avg: 10.7 msec; max: 358.9 msec
read[23385]: avg: 10.7 msec; max: 358.4 msec
read[23393]: avg: 10.7 msec; max: 359.0 msec
read[23389]: avg: 10.7 msec; max: 358.6 msec
read[23388]: avg: 10.7 msec; max: 358.7 msec
read[23386]: avg: 10.7 msec; max: 358.3 msec
read[23396]: avg: 10.7 msec; max: 359.0 msec
read[23383]: avg: 10.7 msec; max: 358.5 msec
read[23391]: avg: 10.7 msec; max: 358.9 msec
read[23384]: avg: 10.7 msec; max: 359.0 msec

with a maximum observed read latency of ~500 msec. Average wal times are 0.0
msec with maximums in the 10-20 msec range and one 300 msec sample. Also
commit times look reasonable. Averages are in the 30-50 msec range and
maximums peak at 10 seconds - that's still quite big but an order of
magnitude better than with the unpatched kernel.

