[Bug 1227180] New: Potential regression io-io_uring-nops reports 3-8% decrease in mainline
https://bugzilla.suse.com/show_bug.cgi?id=1227180

Bug ID: 1227180
Summary: Potential regression io-io_uring-nops reports 3-8% decrease in mainline
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: gabriel.bertazi@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

grid Infrastructure reports a 3-8% regression on io_uring nops test, found on Simba2 test potentially introduced by:

Report Message-ID (Internal: <667e4b04./dspYBfaSWvQlPH0%root@laplace.suse.de>)

Last Good/First Bad commit
==========================
Last good commit: 23fbdde6205d9351bb52a4b8f11ec38bdbc8561a
First bad commit: 0667db14e1f029d56243aa2509ebc5f944388200

From 0667db14e1f029d56243aa2509ebc5f944388200 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon, 18 Mar 2024 22:00:34 +0000
Subject: [PATCH] io_uring: refactor io_req_complete_post()

Comparison
==========
                              initial              initial                 last                penup                 last                penup                first
                    good-96fca68c4fbf     bad-afcd48134c58         bad-6498c5c9         bad-87585b05        good-23fbdde6        good-e5c12945         bad-0667db14
Min kIOPS        1429690.00 (  0.00%) 1321070.00 ( -7.60%) 1384300.00 ( -3.17%) 1387440.00 ( -2.96%) 1447660.00 (  1.26%) 1440670.00 (  0.77%) 1383010.00 ( -3.27%)
Hmean kIOPS      1593231.10 (  0.00%) 1465349.28 * -8.03%* 1538450.10 * -3.44%* 1529321.56 * -4.01%* 1594227.00 (  0.06%) 1595295.82 (  0.13%) 1539502.39 * -3.37%*
Stddev kIOPS       20306.39 (  0.00%)   18381.94 (  9.48%)   18745.79 (  7.69%)   16745.21 ( 17.54%)   17985.08 ( 11.43%)   18572.15 (  8.54%)   18746.99 (  7.68%)
CoeffVar kIOPS         1.27 (  0.00%)       1.25 (  1.58%)       1.22 (  4.40%)       1.09 ( 14.09%)       1.13 ( 11.48%)       1.16 (  8.66%)       1.22 (  4.46%)
Max kIOPS        1596750.00 (  0.00%) 1469160.00 ( -7.99%) 1541920.00 ( -3.43%) 1532590.00 ( -4.02%) 1597540.00 (  0.05%) 1598520.00 (  0.11%) 1542790.00 ( -3.38%)
BHmean-50 kIOPS  1596569.59 (  0.00%) 1468943.05 ( -7.99%) 1541505.71 ( -3.45%) 1532015.50 ( -4.04%) 1597234.47 (  0.04%) 1598279.59 (  0.11%) 1542587.95 ( -3.38%)
BHmean-95 kIOPS  1596456.58 (  0.00%) 1468444.20 ( -8.02%) 1541379.66 ( -3.45%) 1531885.72 ( -4.04%) 1597010.06 (  0.03%) 1598150.62 (  0.11%) 1542470.52 ( -3.38%)
BHmean-99 kIOPS  1595092.95 (  0.00%) 1466984.13 ( -8.03%) 1540200.20 ( -3.44%) 1530919.05 ( -4.02%) 1595875.70 (  0.05%) 1597044.89 (  0.12%) 1541282.00 ( -3.37%)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
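The percentages in parentheses in the comparison table appear to be gains relative to the first (baseline) column, with the sign flipped for metrics where lower is better, such as Stddev. A minimal sketch of that arithmetic follows; the function name `pct_gain` and the `higher_is_better` flag are illustrative assumptions, not part of mmtests.

```python
def pct_gain(baseline: float, value: float, higher_is_better: bool = True) -> float:
    """Signed percentage gain of `value` relative to `baseline`.

    For throughput-style metrics (Hmean, Min, Max) a higher value is a gain.
    For dispersion metrics (Stddev, CoeffVar) a lower value is a gain,
    so the sign is inverted.
    """
    delta = (value - baseline) / baseline * 100.0
    return delta if higher_is_better else -delta


# Values copied from the baseline column and the first bad commit column.
hmean_base, hmean_first_bad = 1593231.10, 1539502.39
stddev_base, stddev_first_bad = 20306.39, 18746.99

print(f"Hmean:  {pct_gain(hmean_base, hmean_first_bad):.2f}%")
print(f"Stddev: {pct_gain(stddev_base, stddev_first_bad, higher_is_better=False):.2f}%")
```

With the values above this reproduces the -3.37% and 7.68% shown for bad-0667db14 in the table.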
https://bugzilla.suse.com/show_bug.cgi?id=1227180

Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|kernel-bugs@opensuse.org    |gabriel.bertazi@suse.com
                 CC|                            |gabriel.bertazi@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1227180

Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernel-performance-bugs@suse.de
https://bugzilla.suse.com/show_bug.cgi?id=1227180
https://bugzilla.suse.com/show_bug.cgi?id=1227180#c1

--- Comment #1 from Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> ---
I originally identified this as a test fluctuation. The problem is that it continues to vary by up to 15%, which might be hiding real regressions. Raising priority. See, for example:
http://perf-vm-lp.arch.suse.cz/marvin/dashboard-SLE-15-SP6-openSUSE-LEAP-15....
https://bugzilla.suse.com/show_bug.cgi?id=1227180
https://bugzilla.suse.com/show_bug.cgi?id=1227180#c2

--- Comment #2 from Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> ---
This test has been plagued by variations, so I tried to characterize that first. It appears to fluctuate way more on SP6. I ran 10 iterations on hardy3 on SP5, and I'm able to get very consistent runs with 5.14.21-150500.55.80. Out of 10 runs of the nops test, I got only minor variations.

5.14.21-150500.55.80
Hmean kIOPS      949512.35 (   0.00%)  949248.45 (   0.08%)
https://bugzilla.suse.com/show_bug.cgi?id=1227180
https://bugzilla.suse.com/show_bug.cgi?id=1227180#c3

--- Comment #3 from Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> ---
In SP6, we have much more variation. Same machine, SP6.

mitigations=auto:
Hmean kIOPS      746353.26 (   0.00%)  742246.48 (  -0.55%)  748464.20 (   0.28%)  743324.07 (  -0.41%)  752230.04 (   0.79%)  748760.59 (   0.32%)  752005.82 (   0.76%)  744865.18 (  -0.20%)  746811.71 (   0.06%)  750949.33 (   0.62%)
Stddev kIOPS        718.33 (   0.00%)    1927.17 (-168.28%)     756.26 (  -5.28%)     717.91 (   0.06%)     816.84 ( -13.71%)     839.24 ( -16.83%)     982.03 ( -36.71%)    1491.61 (-107.65%)    1768.06 (-146.13%)     769.85 (  -7.17%)
CoeffVar kIOPS        0.10 (   0.00%)       0.26 (-169.77%)       0.10 (  -4.98%)       0.10 (  -0.35%)       0.11 ( -12.83%)       0.11 ( -16.46%)       0.13 ( -35.68%)       0.20 (-108.06%)       0.24 (-145.98%)       0.10 (  -6.52%)

Note the results here are 25% worse than SP5. But the variation is still stable.

I then disabled the mitigations:

mitigations=off
Hmean kIOPS     1539106.99 (   0.00%) 1539732.55 (   0.04%) 1538721.65 (  -0.03%) 1538612.99 (  -0.03%) 1537466.46 (  -0.11%) 1538740.83 (  -0.02%) 1539686.30 (   0.04%) 1540408.73 (   0.08%) 1538587.63 (  -0.03%) 1538455.36 (  -0.04%)
Stddev kIOPS       6159.42 (   0.00%)    3156.50 (  48.75%)    4129.19 (  32.96%)    3844.70 (  37.58%)    4154.63 (  32.55%)    7024.73 ( -14.05%)    3229.00 (  47.58%)    3099.93 (  49.67%)    4159.56 (  32.47%)    4051.66 (  34.22%)
CoeffVar kIOPS        0.40 (   0.00%)       0.21 (  48.77%)       0.27 (  32.94%)       0.25 (  37.56%)       0.27 (  32.48%)       0.46 ( -14.08%)       0.21 (  47.60%)       0.20 (  49.71%)       0.27 (  32.45%)       0.26 (  34.19%)

And here, finally, we see much more variation. I'm not sure why yet. From the io_uring point of view, this should be a simple benchmark where every operation executes inline. My initial thought is the scheduler change to where we initially place new tasks. The test runs multiple threads independently and aggregates the results, so perhaps we are seeing the effect of CPUs in deeper sleep states getting scheduled to handle this task? Let me try that next by reverting that scheduler change.

I'm also puzzled by the results of mitigations=auto, where SP5 is better than SP6. That makes no sense. Let me look into that.
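To put the three 10-run series from comments #2 and #3 side by side, one option is to compute the run-to-run spread of the Hmean values and the change in their mean between configurations. The sketch below does that with the Hmean figures copied from the tables; the helper name `spread_pct` and the choice of (max - min) / mean as the spread measure are assumptions made for illustration, not something mmtests reports.

```python
from statistics import mean

# Hmean kIOPS per run, copied from comments #2 (SP5) and #3 (SP6).
hmean_kiops = {
    "SP5 5.14.21-150500.55.80": [
        949512.35, 950248.45, 946624.42, 949864.34, 949096.65,
        949732.54, 948473.48, 949273.86, 948704.33, 949414.88,
    ],
    "SP6 mitigations=auto": [
        746353.26, 742246.48, 748464.20, 743324.07, 752230.04,
        748760.59, 752005.82, 744865.18, 746811.71, 750949.33,
    ],
    "SP6 mitigations=off": [
        1539106.99, 1539732.55, 1538721.65, 1538612.99, 1537466.46,
        1538740.83, 1539686.30, 1540408.73, 1538587.63, 1538455.36,
    ],
}


def spread_pct(values):
    """Run-to-run spread: (max - min) as a percentage of the mean."""
    return (max(values) - min(values)) / mean(values) * 100.0


for name, runs in hmean_kiops.items():
    print(f"{name:28s} mean={mean(runs):12.2f} spread={spread_pct(runs):.2f}%")

# Change of the SP6 mitigations=auto mean relative to the SP5 mean.
sp5 = mean(hmean_kiops["SP5 5.14.21-150500.55.80"])
sp6_auto = mean(hmean_kiops["SP6 mitigations=auto"])
print(f"SP6 auto vs SP5: {(sp6_auto - sp5) / sp5 * 100.0:.1f}%")
```

This only looks at the per-run Hmean values. The within-run Stddev and CoeffVar rows measure a different kind of variation and are not covered here.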
https://bugzilla.suse.com/show_bug.cgi?id=1227180
https://bugzilla.suse.com/show_bug.cgi?id=1227180#c4

--- Comment #4 from Gabriel Krisman Bertazi <gabriel.bertazi@suse.com> ---
(In reply to Gabriel Krisman Bertazi from comment #3)
> Hmean kIOPS      746353.26 (   0.00%)  742246.48 (  -0.55%)  748464.20 (   0.28%)  743324.07 (  -0.41%)  752230.04 (   0.79%)  748760.59 (   0.32%)  752005.82 (   0.76%)  744865.18 (  -0.20%)  746811.71 (   0.06%)  750949.33 (   0.62%)
> Stddev kIOPS        718.33 (   0.00%)    1927.17 (-168.28%)     756.26 (  -5.28%)     717.91 (   0.06%)     816.84 ( -13.71%)     839.24 ( -16.83%)     982.03 ( -36.71%)    1491.61 (-107.65%)    1768.06 (-146.13%)     769.85 (  -7.17%)
> CoeffVar kIOPS        0.10 (   0.00%)       0.26 (-169.77%)       0.10 (  -4.98%)       0.10 (  -0.35%)       0.11 ( -12.83%)       0.11 ( -16.46%)       0.13 ( -35.68%)       0.20 (-108.06%)       0.24 (-145.98%)       0.10 (  -6.52%)
> mitigations=off
> Stddev kIOPS       6159.42 (   0.00%)    3156.50 (  48.75%)    4129.19 (  32.96%)    3844.70 (  37.58%)    4154.63 (  32.55%)    7024.73 ( -14.05%)    3229.00 (  47.58%)    3099.93 (  49.67%)    4159.56 (  32.47%)    4051.66 (  34.22%)
> CoeffVar kIOPS        0.40 (   0.00%)       0.21 (  48.77%)       0.27 (  32.94%)       0.25 (  37.56%)       0.27 (  32.48%)       0.46 ( -14.08%)       0.21 (  47.60%)       0.20 (  49.71%)       0.27 (  32.45%)       0.26 (  34.19%)
> I'm also puzzled by the results of mitigations=auto, where SP5 is better than SP6. that makes no sense. Let me look into that.

Never mind. I forgot that these are actually percentages in mmtests. It is in the openQA dashboard that these are decimals. So we have 0.2% to 0.4% variance. not ba
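The observation that CoeffVar is already a percentage can be checked against the Stddev and Hmean rows quoted above. The following is a minimal sketch, assuming CoeffVar is computed as stddev / mean * 100 and using the reported Hmean as a stand-in for the mean (mmtests may use the arithmetic mean internally; for these tightly clustered samples the difference does not show at two decimals). The function name `coeff_var_pct` is hypothetical.

```python
def coeff_var_pct(stddev: float, mean_value: float) -> float:
    """Coefficient of variation expressed as a percentage of the mean."""
    return stddev / mean_value * 100.0


# (Stddev kIOPS, Hmean kIOPS) pairs from the first run of each SP6 series.
samples = {
    "mitigations=auto": (718.33, 746353.26),
    "mitigations=off": (6159.42, 1539106.99),
}

for name, (stddev, hmean) in samples.items():
    print(f"{name}: CoeffVar = {coeff_var_pct(stddev, hmean):.2f}%")
```

Under these assumptions the results round to 0.10 and 0.40, matching the first-column CoeffVar values in the tables, which is consistent with reading them as 0.10% and 0.40% rather than as fractions.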
https://bugzilla.suse.com/show_bug.cgi?id=1227180

Yuhu Zhao <zhao.yuhu@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhao.yuhu@suse.com