OK, so I've been testing with SLE12 SP3 kernels on ives.arch.suse.de. One thing I've noticed is that the writeback-throttling (wbt) patches have been backported from upstream to SLE12-SP3, and that screws this workload badly. Read numbers look all nice and dandy, like:

read[21335]: avg: 6.3 msec; max: 303.8 msec
read[21335]: avg: 6.4 msec; max: 224.6 msec
read[21335]: avg: 6.2 msec; max: 286.5 msec
read[21335]: avg: 6.2 msec; max: 264.8 msec
read[21335]: avg: 6.4 msec; max: 258.2 msec
read[21335]: avg: 6.4 msec; max: 280.4 msec

However, the writer struggles hard to make any progress at all. Usually pgioperf.log contains only entries like:

wal[27108]: avg: 0.0 msec; max: 6.1 msec
wal[27108]: avg: 0.0 msec; max: 10.9 msec
commit[27108]: avg: 2.6 msec; max: 2361.9 msec
wal[27108]: avg: 0.0 msec; max: 16.5 msec

which are from the time before the readers actually managed to start. After that, the writers are blocked until the whole benchmark completes.

Analysis of blktrace data has shown that the wbt logic interacts badly with CFQ. Because of wbt, CFQ only ever sees one write request at a time. When such a request is seen, the async queue gets scheduled, eventually gets its time slot, and submits that one write request. Once that completes, readers are scheduled again since the async queue has no more IO. As a result we complete about 1 write per couple of seconds, which is far too low: a single writeback pass through the data file has more writes than we can complete during the whole benchmark run.

TODO item: Disable writeback throttling for non-multiqueue devices by default.
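
Until the default is changed in the kernel, the TODO above could be approximated from userspace. The following is only a rough sketch, not the actual fix. It assumes the SLE12-SP3 backport exposes the same sysfs files as upstream: queue/wbt_lat_usec, where writing 0 disables throttling, and an mq/ directory that is present only for blk-mq devices. The helper names and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the bracketed name from a queue/scheduler string,
    e.g. "noop deadline [cfq]" -> "cfq". Returns None if nothing is bracketed."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def disable_wbt_if_legacy(dev, sysfs_root="/sys/block"):
    """Disable writeback throttling on `dev` unless it is a blk-mq device.

    Assumption: an mq/ directory under the device marks it as multiqueue.
    Returns (disabled, scheduler) so the caller can log what happened.
    """
    dev_dir = Path(sysfs_root) / dev
    queue = dev_dir / "queue"
    scheduler = active_scheduler((queue / "scheduler").read_text())

    if (dev_dir / "mq").is_dir():
        # Multiqueue device: leave wbt at whatever it is set to.
        return False, scheduler

    # Assumption: writing 0 to wbt_lat_usec turns throttling off (upstream behaviour).
    (queue / "wbt_lat_usec").write_text("0\n")
    return True, scheduler
```

Run as root, something like disable_wbt_if_legacy("sda") would then report whether wbt was switched off and which IO scheduler the device uses. This only changes the setting on a running system; the TODO itself is about changing the kernel default.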