[kernel-bugs] [Bug 1179404] New: dbench4 regression on machines dobby
https://bugzilla.suse.com/show_bug.cgi?id=1179404

            Bug ID: 1179404
           Summary: dbench4 regression on machines dobby
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: jack@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Our grid machine dobby is seeing a heavy performance regression on dbench4 runs
for all filesystems.

Results for XFS:

            15-SP2-5.3.18-24.34-187  LEAP-15.2-PT-5.10.0-rc2-pt-master-20201117
Amean 1        45.23 (   0.00%)     39.53 (  12.61%)
Amean 2        53.92 (   0.00%)    121.29 (-124.96%)
Amean 4        70.61 (   0.00%)    612.66 (-767.72%)
Amean 8        83.34 (   0.00%)    621.63 (-645.89%)
Amean 16      105.08 (   0.00%)    399.54 (-280.23%)
Amean 32      137.76 (   0.00%)    331.95 (-140.96%)
Amean 64      254.23 (   0.00%)    393.61 ( -54.83%)
Amean 512   10252.94 (   0.00%)  10961.67 (  -6.91%)

Results for ext4:

Amean 1        40.71 (   0.00%)     52.90 ( -29.95%)
Amean 2       103.78 (   0.00%)    123.01 ( -18.52%)
Amean 4        99.57 (   0.00%)    505.56 (-407.74%)
Amean 8       106.81 (   0.00%)    341.80 (-220.01%)
Amean 16      126.87 (   0.00%)    359.92 (-183.68%)
Amean 32      212.60 (   0.00%)    624.00 (-193.51%)
Amean 64      358.64 (   0.00%)   1411.40 (-293.54%)
Amean 512    4158.79 (   0.00%)  11070.77 (-166.20%)

-- 
You are receiving this mail because:
You are the assignee for the bug.
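For readers of the tables above: the percentage column appears to be the relative change against the first (baseline) column, with a positive value meaning the second kernel has the lower Amean. The following is a minimal sketch under that assumption; the function name is made up for illustration. Because it works from the rounded means printed in the report, it matches the reported percentages only to within a few hundredths of a point.

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Percentage by which candidate improves on baseline.

    Assumes lower values are better, so a slower candidate
    yields a negative result.
    """
    return (baseline - candidate) / baseline * 100.0


# Two rows from the XFS table: (baseline Amean, new-kernel Amean).
xfs_rows = {1: (45.23, 39.53), 4: (70.61, 612.66)}

for clients, (old, new) in xfs_rows.items():
    print(f"Amean {clients}: {relative_gain(old, new):.2f}%")
```

Read this way, the 4-client XFS row means the new kernel's Amean is roughly 8.7 times the baseline value.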
https://bugzilla.suse.com/show_bug.cgi?id=1179404#c1

Jan Kara <jack@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hare@suse.com,
                   |                            |kernel-performance-bugs@suse.de
         QA Contact|qa-bugs@suse.de             |jack@suse.com

--- Comment #1 from Jan Kara <jack@suse.com> ---
Marvin has bisected the breakage down to commit 103fbf8e4020 ("scsi:
megaraid_sas: Added support for shared host tagset for cpuhotplug"). Hannes,
since you were handling the patch upstream, any idea how it could have caused
such a big regression? I'll try to verify the bisection by just reverting the
patch and doing a manual run when I have a bit of time...
https://bugzilla.suse.com/show_bug.cgi?id=1179404#c2

--- Comment #2 from Jan Kara <jack@suse.com> ---
FWIW I've verified that dobby is indeed using megaraid_sas driver.
https://bugzilla.suse.com/show_bug.cgi?id=1179404#c3

--- Comment #3 from Jan Kara <jack@suse.com> ---
So dbench4 numbers on dobby seem to be inherently volatile. Here is a
comparison of 5.10-rc5 with 5.10-rc5 + revert of 103fbf8e4020:

Amean 1        38.92 (   0.00%)     43.87 * -12.73%*
Amean 2       133.63 (   0.00%)    101.44 *  24.09%*
Amean 4       594.75 (   0.00%)    522.21 *  12.20%*
Amean 8       641.82 (   0.00%)    361.85 *  43.62%*
Amean 16      403.16 (   0.00%)    244.62 *  39.32%*
Amean 32      311.56 (   0.00%)    254.49 *  18.32%*
Amean 64      309.48 (   0.00%)    258.01 *  16.63%*
Amean 512   10766.08 (   0.00%)  10491.84 *   2.55%*

But regardless of the volatility, the trend that the revert helps dbench4
performance is pretty clear.

I had a look into what could be causing this. One unusual thing is that we have
a single rotating drive behind the Megaraid SAS controller and thus the BFQ IO
scheduler is used.

Now that I'm thinking about it, I'll probably queue a few more runs in
different configurations to rule out other usual issues with dbench - I'll make
sure cpu frequency scaling is turned off, and also try switching to the
mq-deadline IO scheduler to see what impact it has.
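As an aside on checking which IO scheduler a disk is using: the block layer exposes it in /sys/block/<dev>/queue/scheduler, where the active scheduler is shown in square brackets. The following is a minimal sketch of reading that file; the helper names and the device name "sda" are assumptions for illustration and are not taken from the bug report.

```python
import re
from pathlib import Path


def active_scheduler(text: str) -> str:
    """Return the bracketed (active) entry from a queue/scheduler listing."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError(f"no active scheduler marked in {text!r}")
    return match.group(1)


def read_scheduler(device: str) -> str:
    """Read the active scheduler for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the disk behind the controller.
    print(read_scheduler("sda"))
```

Switching schedulers is done by writing the desired name (for example mq-deadline) to the same file as root.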
https://bugzilla.suse.com/show_bug.cgi?id=1179404#c4

Jan Kara <jack@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium

--- Comment #4 from Jan Kara <jack@suse.com> ---
OK, I have another set of results from dobby, this time with the cpufreq
performance governor enabled to avoid frequency scaling surprises. TLDR: the
results from the first run are confirmed:

                5.10.0-rc5          5.10.0-rc5
                   vanilla     megaraid-revert
Amean 1        36.59 (   0.00%)     36.41 *   0.49%*
Amean 2       127.66 (   0.00%)     95.83 *  24.94%*
Amean 4       585.20 (   0.00%)    513.57 *  12.24%*
Amean 8       544.07 (   0.00%)    348.76 *  35.90%*
Amean 16      371.87 (   0.00%)    237.78 *  36.06%*
Amean 32      310.47 (   0.00%)    242.73 *  21.82%*
Amean 64      302.07 (   0.00%)    256.75 *  15.00%*
Amean 512   10390.57 (   0.00%)  10701.90 *  -3.00%*

Now what is interesting is a comparison with the mq-deadline IO scheduler:

Amean 1        33.50 (   0.00%)     34.97 *  -4.41%*
Amean 2        36.34 (   0.00%)     38.33 *  -5.47%*
Amean 4        40.00 (   0.00%)     40.96 *  -2.40%*
Amean 8        49.45 (   0.00%)     50.02 *  -1.15%*
Amean 16       70.48 (   0.00%)     71.70 *  -1.74%*
Amean 32      110.90 (   0.00%)    112.30 *  -1.26%*
Amean 64      200.85 (   0.00%)    200.80 (   0.03%)
Amean 512    9402.01 (   0.00%)   9880.16 *  -5.09%*

So indeed, with the mq-deadline IO scheduler the megaraid commit is neutral
(well, slightly positive, but for dbench that's mostly within the noise). This
means the megaraid change is somehow interacting badly with the BFQ IO
scheduler. On the positive side, higher end disks, and thus mq-deadline, are
probably more common with these kinds of storage controllers. But let's
investigate what upsets BFQ...

I suspect multiple HW queues (which is what the megaraid patch exposes AFAIU)
somehow upset the IO scheduling done by BFQ, but I'm not even sure how IO
scheduling is supposed to work with multiple HW queues, so I have to
investigate first.
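One way to check how many hardware queues a blk-mq device exposes is to count the numbered subdirectories of /sys/block/<dev>/mq, one per hardware queue. The following is a rough sketch of that check; the function name and the device name "sda" are placeholders, and nothing here was run on dobby.

```python
from pathlib import Path


def hw_queue_count(mq_dir: Path) -> int:
    """Count numbered subdirectories (one per HW queue) under an mq directory."""
    return sum(
        1 for entry in mq_dir.iterdir()
        if entry.is_dir() and entry.name.isdigit()
    )


if __name__ == "__main__":
    # "sda" is a placeholder for the disk behind the megaraid_sas controller.
    print(hw_queue_count(Path("/sys/block/sda/mq")))
```

Comparing this count with and without commit 103fbf8e4020 would show whether the number of exposed hardware queues actually changes on a given machine.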
https://bugzilla.suse.com/show_bug.cgi?id=1179404

Jan Kara <jack@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|kernel-bugs@opensuse.org    |jack@suse.com