[Bug 1168089] New: reaim-io-disk-ext4 regression in 5.4 vs 5.5
http://bugzilla.suse.com/show_bug.cgi?id=1168089

            Bug ID: 1168089
           Summary: reaim-io-disk-ext4 regression in 5.4 vs 5.5
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: jack@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Marvin has identified a regression in the reaim-io-disk workload on an ext4
filesystem between the 5.4 and 5.5 kernels and has bisected it down to:

Last good/First bad commit
==========================
Last good commit:  09edf4d381957b144440bac18a4769c53063b943
First bad commit:  b1b4705d54abedfd69dcdf42779c521aa1e0fbd3
From b1b4705d54abedfd69dcdf42779c521aa1e0fbd3 Mon Sep 17 00:00:00 2001
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Date: Tue, 5 Nov 2019 23:01:37 +1100
Subject: [PATCH] ext4: introduce direct I/O read using iomap infrastructure
This patch introduces a new direct I/O read path which makes use of the iomap
infrastructure. The new function ext4_dio_read_iter() is responsible for
calling into the iomap infrastructure via iomap_dio_rw(). If the read
operation performed on the inode is not supported, which is checked via
ext4_dio_supported(), then we simply fall back and complete the I/O using
buffered I/O. The existing direct I/O read code path has been removed, as it
is now redundant.

Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/f98a6f73fadddbfbad0fc5ed04f712ca0b799f37.157294932... it.mbobrowski@mbobrowski.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

 fs/ext4/file.c  | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext4/inode.c | 38 +-------------------------------------
 2 files changed, 54 insertions(+), 39 deletions(-)

Comparison
==========
                           good                   bad
Hmean     disk-1        2081.89 (   0.00%)     2042.21 (  -1.91%)
Hmean     disk-41      93821.51 (   0.00%)    84536.08 *  -9.90%*
Hmean     disk-81     157689.81 (   0.00%)   149079.75 *  -5.46%*
Hmean     disk-121    193187.87 (   0.00%)   188473.52 *  -2.44%*
Hmean     disk-161    216591.93 (   0.00%)   214000.89 *  -1.20%*
Hmean     disk-201    241489.79 (   0.00%)   237963.69 (  -1.46%)
Hmean     disk-241    256201.28 (   0.00%)   252004.18 *  -1.64%*
Hmean     disk-281    281845.54 (   0.00%)   276665.57 *  -1.84%*
Hmean     disk-321    286181.28 (   0.00%)   281991.22 (  -1.46%)
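For context, the requests that the new path serves are ordinary userspace
O_DIRECT reads. Below is a minimal illustration (my own example, not part of
the patch or of reaim; the 4096-byte alignment is a conservative assumption
for the logical block size): a single aligned read through a descriptor
opened with O_DIRECT, which on 5.5 ext4 is handled by ext4_dio_read_iter()
and iomap_dio_rw() instead of the legacy direct IO code.

/*
 * Hypothetical standalone example: one aligned O_DIRECT read.
 * Not taken from the patch or from reaim.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char *buf;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT needs buffer, offset and length aligned; 4096 assumed. */
	if (posix_memalign((void **)&buf, 4096, 4096)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	if (pread(fd, buf, 4096, 0) < 0) {
		perror("pread");
		return 1;
	}
	printf("direct read of 4096 bytes completed\n");
	return 0;
}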
http://bugzilla.suse.com/show_bug.cgi?id=1168089
http://bugzilla.suse.com/show_bug.cgi?id=1168089#c1

Jan Kara <jack@suse.com> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
           Priority|P5 - None                                 |P3 - Medium
             Status|NEW                                       |CONFIRMED
                 CC|                                          |kernel-performance-bugs@suse.de
           Assignee|kernel-maintainers@forge.provo.novell.com |jack@suse.com

--- Comment #1 from Jan Kara <jack@suse.com> ---
This warrants further investigation when there's time...
https://bugzilla.suse.com/show_bug.cgi?id=1168089
https://bugzilla.suse.com/show_bug.cgi?id=1168089#c2

--- Comment #2 from Jan Kara <jack@suse.com> ---
Results before/after the commit b1b4705d54abedfd69dcdf42779c521aa1e0fbd3 on
marvin8:

                           before                 after
Hmean     disk-1        3861.00 (   0.00%)     3355.70 * -13.09%*
Hmean     disk-25     240384.62 (   0.00%)   163755.46 * -31.88%*
Hmean     disk-49     465189.88 (   0.00%)   318872.02 * -31.45%*
Hmean     disk-73     620396.60 (   0.00%)   477124.18 * -23.09%*
Hmean     disk-97     771883.29 (   0.00%)   590263.69 * -23.53%*
Hmean     disk-121    842227.38 (   0.00%)   655234.66 * -22.20%*
Hmean     disk-145    925531.91 (   0.00%)   751295.34 * -18.83%*
Hmean     disk-169    978764.48 (   0.00%)   794670.85 * -18.81%*
Hmean     disk-193   1054644.81 (   0.00%)   882621.95 * -16.31%*
https://bugzilla.suse.com/show_bug.cgi?id=1168089
https://bugzilla.suse.com/show_bug.cgi?id=1168089#c3

--- Comment #3 from Jan Kara <jack@suse.com> ---
Results before/after the commit b1b4705d54abedfd69dcdf42779c521aa1e0fbd3 on
marvin8 with the performance cpufreq governor:

                           before                 after
Hmean     disk-1        4166.67 (   0.00%)     4126.55 (  -0.96%)
Hmean     disk-25     242718.45 (   0.00%)   195822.45 * -19.32%*
Hmean     disk-49     472668.81 (   0.00%)   377892.03 * -20.05%*
Hmean     disk-73     640350.88 (   0.00%)   527710.84 * -17.59%*
Hmean     disk-97     778074.87 (   0.00%)   679906.54 * -12.62%*
Hmean     disk-121    868421.05 (   0.00%)   812080.54 *  -6.49%*
Hmean     disk-145    964523.28 (   0.00%)   882352.94 *  -8.52%*
Hmean     disk-169   1014000.00 (   0.00%)   984466.02 *  -2.91%*
Hmean     disk-193   1128654.97 (   0.00%)  1058500.91 *  -6.22%*

The difference is not that big, but still noticeable for, say, 25 processes.
https://bugzilla.suse.com/show_bug.cgi?id=1168089
https://bugzilla.suse.com/show_bug.cgi?id=1168089#c4

Jan Kara <jack@suse.com> changed:

           What    |Removed   |Added
----------------------------------------------------------------------------
             Status|CONFIRMED |RESOLVED
         Resolution|---       |WONTFIX

--- Comment #4 from Jan Kara <jack@suse.com> ---
OK, so I've tracked this down by "bisecting" the operations inside the reaim
workfile (which was a bit tedious but not too bad). The small workfile that
still reproduces the regression is:

20 disk_dio_rd
20 sync_disk_cp
20 sync_disk_update

You could probably still drop sync_disk_update from it, but I didn't really
try since at this point I understood what's going on. The culprit is that
disk_dio_rd opens the tmpa.common file with O_DIRECT and reads from it.
sync_disk_cp also opens the tmpa.common file and reads from it, but without
O_DIRECT. These files are different for different reaim processes (each
process runs in its own dedicated dir), but each process alternates between
these two workloads, so a direct IO read usually runs on a file with existing
page cache. And the iomap code (unlike legacy direct IO) evicts the page
cache even during direct IO reads, which then forces sync_disk_cp to reread
the file from disk. When I changed reaim not to share the same file for the
DIO and buffered tests, the regression went away.

Since mixing of buffered & direct IO is not really interesting wrt
performance, I think there's no kernel bug to fix. As a side note, there's
now 54752de928c "iomap: Only invalidate page cache pages on direct IO writes"
sitting in linux-next, which makes the iomap DIO code behave the same way as
the legacy DIO code in this regard, so that will make the regression go away
as well.

I'm somewhat undecided whether we should modify reaim to not use the same
file for direct IO read tests and buffered IO read tests. I don't think the
sharing of the file is a particularly interesting use case, but OTOH if all
the accesses are reads, it isn't completely insane either. So I'll probably
leave it for now.
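For reference, here is a minimal standalone sketch of the pattern described
above (my own program, not reaim code; the file path argument and the
4096-byte alignment are assumptions). It populates the page cache for a file
with a buffered read, reads the same range again with O_DIRECT, and uses
mincore() to report how many of the cached pages survived. On a 5.5 kernel
with the iomap DIO read path the resident count should drop after the direct
read; with the legacy DIO path, or once 54752de928c is merged, it should
stay roughly unchanged.

/* Hypothetical reproducer sketch: buffered read, then O_DIRECT read of the
 * same range, with mincore() showing whether the page cache survived. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (1 << 20)	/* look at no more than the first 1 MiB */

/* Count how many pages of the first 'len' bytes are in the page cache. */
static size_t resident_pages(int fd, size_t len)
{
	size_t pages = len / 4096, i, n = 0;
	unsigned char *vec = malloc(pages);
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	if (!vec || map == MAP_FAILED || mincore(map, len, vec)) {
		perror("resident_pages");
		exit(1);
	}
	for (i = 0; i < pages; i++)
		n += vec[i] & 1;
	munmap(map, len);
	free(vec);
	return n;
}

int main(int argc, char **argv)
{
	struct stat st;
	char *buf;
	size_t len;
	int fd, dfd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);		   /* like sync_disk_cp */
	dfd = open(argv[1], O_RDONLY | O_DIRECT);  /* like disk_dio_rd */
	if (fd < 0 || dfd < 0 || fstat(fd, &st)) {
		perror("open/fstat");
		return 1;
	}
	/* O_DIRECT needs an aligned length; round down and cap at CHUNK. */
	len = (size_t)st.st_size & ~4095UL;
	if (len > CHUNK)
		len = CHUNK;
	if (!len) {
		fprintf(stderr, "file smaller than one page\n");
		return 1;
	}
	if (posix_memalign((void **)&buf, 4096, len)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}

	/* Buffered read populates the page cache. */
	if (pread(fd, buf, len, 0) != (ssize_t)len) {
		perror("buffered pread");
		return 1;
	}
	printf("after buffered read: %zu pages resident\n",
	       resident_pages(fd, len));

	/* Direct read of the same range from the same file. */
	if (pread(dfd, buf, len, 0) != (ssize_t)len) {
		perror("direct pread");
		return 1;
	}
	printf("after O_DIRECT read: %zu pages resident\n",
	       resident_pages(fd, len));
	return 0;
}

Pointing it at one of the per-process tmpa.common files (or any regular file
on ext4) should show the behavior directly; whether those pages are still
resident is what decides if sync_disk_cp has to go back to disk.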
https://bugzilla.suse.com/show_bug.cgi?id=1168089
https://bugzilla.suse.com/show_bug.cgi?id=1168089#c5

Mel Gorman <mgorman@suse.com> changed:

           What    |Removed   |Added
----------------------------------------------------------------------------
                 CC|          |mgorman@suse.com

--- Comment #5 from Mel Gorman <mgorman@suse.com> ---
(In reply to Jan Kara from comment #4)
> I'm somewhat undecided whether we should modify reaim to not use the same
> file for direct IO read tests and buffered IO read tests. I don't think the
> sharing of the file is a particularly interesting use case, but OTOH if all
> the accesses are reads, it isn't completely insane either. So I'll probably
> leave it for now.
While the sharing is dubious, I think it is worth keeping: it can catch
differences in behavior with respect to page cache invalidation when direct
and buffered IO are mixed. I added a note to the configuration file about
this.