ext4 looks good:

pgioperfbench ext4
                          4.11.0-rc5            4.11.0-rc5
                             vanilla         transact-v1r1
Min      commit       13.10 (  0.00%)       11.00 ( 16.03%)
Min      read       1053.00 (  0.00%)     1077.30 ( -2.31%)
Min      wal           0.00 (  0.00%)        0.00 (  0.00%)
Max-95%  commit     6154.80 (  0.00%)       78.90 ( 98.72%)
Max-95%  read       1473.40 (  0.00%)     1097.10 ( 25.54%)
Max-95%  wal        6359.70 (  0.00%)        0.10 (100.00%)
Max-99%  commit     8933.20 (  0.00%)      382.50 ( 95.72%)
Max-99%  read       1696.80 (  0.00%)     1097.10 ( 35.34%)
Max-99%  wal        7013.60 (  0.00%)        0.20 (100.00%)
Max      commit    10651.00 (  0.00%)     3090.50 ( 70.98%)
Max      read       1696.80 (  0.00%)     1097.10 ( 35.34%)
Max      wal       76206.20 (  0.00%)       41.40 ( 99.95%)
Mean     commit      828.89 (  0.00%)       57.06 ( 93.12%)
Mean     read       1111.46 (  0.00%)     1088.66 (  2.05%)
Mean     wal        1241.19 (  0.00%)        0.08 ( 99.99%)

This is a limited view of the report, but it is fairly obviously good: max wal latency is down from 76 seconds to roughly 41 ms, read latencies are very similar, and commit times are way down. However, there appears to be some read starvation going on, because the number of read samples is far lower (not shown in the report). A manual check shows 416 read samples with the vanilla kernel and 80 with the patches.

The story is much more severe for ext3:

                          4.11.0-rc5            4.11.0-rc5
                             vanilla         transact-v1r1
Min      commit       12.40 (  0.00%)        9.80 ( 20.97%)
Min      read       1046.90 (  0.00%)
Min      wal           0.00 (  0.00%)        0.00 (  0.00%)
Max-95%  commit     4156.80 (  0.00%)      101.40 ( 97.56%)
Max-95%  read       1296.10 (  0.00%)             (100.00%)
Max-95%  wal        4623.20 (  0.00%)        0.10 (100.00%)
Max-99%  commit     6352.20 (  0.00%)      521.90 ( 91.78%)
Max-99%  read       1296.20 (  0.00%)             (100.00%)
Max-99%  wal        5346.10 (  0.00%)        0.20 (100.00%)
Max      commit    36643.40 (  0.00%)     2212.40 ( 93.96%)
Max      read       1296.20 (  0.00%)             (100.00%)
Max      wal       45138.40 (  0.00%)      124.40 ( 99.72%)

The blank entries for read are somewhat of a reporting bug, but they occur because no read samples were recorded at all. A manual check verifies this: 304 samples with the vanilla kernel and 0 with the patches applied.
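The manual sample counts above come from the raw per-sample logs. As a sketch only, assuming a simple one-line-per-sample layout tagged by operation type (an assumption for illustration; pgioperf's actual log format may differ), the counting step looks like:

```python
from collections import Counter

def count_samples(log_lines):
    """Count recorded samples per operation type in a pgioperf-style log.

    Assumed format (hypothetical): one 'type latency_ms' pair per line,
    e.g. 'read 1053.0'. Malformed lines are ignored.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) == 2:
            counts[fields[0]] += 1
    return counts

# A run with few (or zero) read samples stands out immediately.
log = ["commit 13.1", "read 1053.0", "read 1077.3", "wal 0.0"]
print(count_samples(log)["read"])  # 2
```

With per-type counts in hand, the asymmetry between the vanilla and patched runs (416 vs 80 reads on ext4, 304 vs 0 on ext3) is visible without relying on the report generator.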
The per-sample graphs (not presented) show that commit and wal times are consistently very low, but the lack of reads is a concern. A partially manual rerun to see what the readers were doing was not particularly revealing. A reader is stuck in read, as you'd expect:

delboy:~ # cat /proc/3177/stack
[<ffffffff810b5876>] io_schedule+0x16/0x40
[<ffffffff811a8466>] wait_on_page_bit_common+0x116/0x1c0
[<ffffffff811ab367>] generic_file_read_iter+0x157/0x8b0
[<ffffffffa01afd9a>] ext4_file_read_iter+0x4a/0xd0 [ext4]
[<ffffffff8123b27e>] __vfs_read+0xbe/0x130
[<ffffffff8123c13e>] vfs_read+0x9e/0x170
[<ffffffff8123d666>] SyS_read+0x46/0xa0
[<ffffffff810039ae>] do_syscall_64+0x6e/0x180
[<ffffffff8176be2f>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

The readers are not completely stalled, because tracing one of them shows that reads are completing, but apparently not enough of them to meet the threshold at which pgioperf reports a sample. It could be another flaw in the benchmark, and the reason fewer reads are recorded may simply be that writes are no longer being stalled, but it's worth checking out. One major observation supporting the theory that this is a basic timing issue is that the time the benchmark takes to complete is greatly reduced:

                 4.11.0-rc5   4.11.0-rc5
                    vanilla   transact-v1r1
User                  14.51        8.91
System               188.92       97.41
Elapsed             4432.20     2660.06

That's way faster, and this may all be down to timing. Hence, there may be no problem with the patches here as such; what is needed is to adjust the benchmark to report stall times more frequently and to increase the number of samples it takes before the run is considered complete.
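The reporting-threshold theory can be illustrated with a small sketch (hypothetical Python, not pgioperf's actual code; the function and parameter names are assumptions): if a sample is only recorded when its latency exceeds a cutoff, then a kernel that services reads faster records fewer read samples even though more reads are completing.

```python
import time

def sample_reads(read_once, nr_samples=100, report_threshold_ms=1000.0):
    """Time each read; record it only if it exceeds the reporting
    threshold, mimicking a benchmark that only logs 'slow' samples.
    (Illustrative sketch; pgioperf's real logic may differ.)"""
    reported = []
    for _ in range(nr_samples):
        start = time.monotonic()
        read_once()
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if elapsed_ms >= report_threshold_ms:
            reported.append(elapsed_ms)
    return reported

# A fast reader never crosses the threshold, so nothing is recorded,
# even though all 100 reads completed successfully.
print(len(sample_reads(lambda: None)))  # 0
```

Under this model, zero recorded read samples on the patched kernel is consistent with reads completing quickly rather than with readers being starved, which is why lowering the reporting interval and raising the sample count would disambiguate the two.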