[Bug 1131437] New: dbench4 regression with 5.0 kernel on anderson and marvin7
http://bugzilla.suse.com/show_bug.cgi?id=1131437

            Bug ID: 1131437
           Summary: dbench4 regression with 5.0 kernel on anderson and
                    marvin7
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: jack@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Our performance dashboard shows significant regressions (up to 20%) for
dbench4 on anderson and marvin7, especially on the XFS filesystem but to some
extent also on ext4. The regression appears both with and without security
mitigations.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c1

Jan Kara <jack@suse.com> changed:

           What     |Removed                                   |Added
----------------------------------------------------------------------------
           Priority |P5 - None                                 |P3 - Medium
           CC       |                                          |kernel-performance-bugs@suse.de
           Assignee |kernel-maintainers@forge.provo.novell.com |jack@suse.com

--- Comment #1 from Jan Kara <jack@suse.com> ---
Excerpt from marvin7 global-dhp__io-dbench4-async-xfs-nosecure results:

Amean     1       20.05 (  0.00%)      32.72 ( -63.16%)
Amean     2       21.07 (  0.00%)      29.75 ( -41.18%)
Amean     4       26.75 (  0.00%)      34.36 ( -28.42%)
Amean     8       38.10 (  0.00%)      44.60 ( -17.03%)
Amean     16      64.19 (  0.00%)      68.59 (  -6.86%)
Amean     32     123.59 (  0.00%)     129.22 (  -4.56%)
Amean     64     432.81 (  0.00%)     294.08 (  32.05%)
Stddev    1        1.21 (  0.00%)       4.70 (-288.82%)
Stddev    2        1.34 (  0.00%)       8.00 (-497.46%)
Stddev    4        2.94 (  0.00%)       7.38 (-151.14%)
Stddev    8        6.61 (  0.00%)       8.90 ( -34.55%)
Stddev    16      15.70 (  0.00%)      13.24 (  15.66%)
Stddev    32      39.10 (  0.00%)      21.01 (  46.27%)
Stddev    64     163.76 (  0.00%)      42.91 (  73.79%)
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c2

--- Comment #2 from Jan Kara <jack@suse.com> ---
Experimenting on marvin4 to see if I can reproduce this...
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c3

--- Comment #3 from Jan Kara <jack@suse.com> ---
So the first pair of runs didn't reproduce the issue. Results from
compare_kernels.sh:

Amean     1       24.35 (  0.00%)      25.89 (  -6.36%)
Amean     2       34.53 (  0.00%)      31.78 (   7.96%)
Amean     4       37.27 (  0.00%)      40.00 (  -7.32%)
Amean     8       48.15 (  0.00%)      48.73 (  -1.20%)
Amean     16      69.70 (  0.00%)      70.78 (  -1.54%)
Amean     32     130.43 (  0.00%)     130.74 (  -0.24%)
Amean     64     505.36 (  0.00%)     522.22 (  -3.34%)
Stddev    1        5.21 (  0.00%)       6.77 ( -29.92%)
Stddev    2        7.50 (  0.00%)       8.29 ( -10.58%)
Stddev    4        6.30 (  0.00%)       6.36 (  -0.85%)
Stddev    8        8.43 (  0.00%)       8.88 (  -5.32%)
Stddev    16      16.71 (  0.00%)      16.86 (  -0.94%)
Stddev    32      40.14 (  0.00%)      41.19 (  -2.60%)
Stddev    64     160.02 (  0.00%)     160.91 (  -0.56%)

So it really looks related to what Giovanni also noticed: dbench4 results
have become noisier recently. I'll see if I can nail that down.
http://bugzilla.suse.com/show_bug.cgi?id=1131437

Jan Kara <jack@suse.com> changed:

           What     |Removed      |Added
----------------------------------------------------------------------------
           Status   |NEW          |IN_PROGRESS
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c4

--- Comment #4 from Jan Kara <jack@suse.com> ---
So there's definitely noticeable run-to-run variance on marvin4 with the 5.0
kernel at 1, 2, and 4 processes:

Amean     1      39.99 (  0.00%)   30.06 ( 24.83%)   39.55 (  1.10%)   40.41 ( -1.06%)   27.10 ( 32.24%)
Amean     2      31.76 (  0.00%)   31.41 (  1.08%)   36.08 (-13.60%)   31.54 (  0.68%)   34.71 ( -9.31%)
Amean     4      39.32 (  0.00%)   34.54 ( 12.15%)   41.13 ( -4.60%)   42.66 ( -8.50%)   42.64 ( -8.44%)
Amean     8      49.04 (  0.00%)   48.64 (  0.82%)   48.49 (  1.13%)   48.90 (  0.29%)   48.61 (  0.89%)
Amean     16     70.39 (  0.00%)   70.45 ( -0.09%)   70.17 (  0.30%)   70.09 (  0.43%)   70.25 (  0.19%)
Amean     32    131.14 (  0.00%)  128.87 (  1.73%)  129.54 (  1.22%)  128.55 (  1.97%)  127.95 (  2.43%)
Amean     64    256.17 (  0.00%)  256.22 ( -0.02%)  254.46 (  0.67%)  255.01 (  0.45%)  255.38 (  0.31%)
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c5

--- Comment #5 from Jan Kara <jack@suse.com> ---
OK, when the machine is booted with intel_idle.max_cstate=1, the numbers are
much more stable (I ran only the 1, 2, and 4 client cases):

Amean     1      36.13 (  0.00%)   35.76 (  1.03%)   36.18 ( -0.14%)   36.44 ( -0.86%)   36.43 ( -0.82%)
Amean     2      34.63 (  0.00%)   31.68 (  8.50%)   31.83 (  8.08%)   31.59 (  8.77%)   31.70 (  8.45%)
Amean     4      40.73 (  0.00%)   38.24 (  6.13%)   39.73 (  2.46%)   40.53 (  0.51%)   40.29 (  1.09%)
Stddev    1       4.65 (  0.00%)    4.17 ( 10.35%)    4.44 (  4.43%)    3.59 ( 22.76%)    4.39 (  5.45%)
Stddev    2       6.42 (  0.00%)    6.18 (  3.81%)    6.16 (  4.09%)    6.38 (  0.58%)    6.10 (  4.97%)
Stddev    4       5.94 (  0.00%)    5.57 (  6.11%)    5.89 (  0.80%)    5.76 (  2.97%)    5.64 (  4.97%)

I'm going to verify now whether stock 4.20 dbench numbers are stable, to see
whether this instability is a recent thing.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c6

Mel Gorman <mgorman@suse.com> changed:

           What     |Removed      |Added
----------------------------------------------------------------------------
           CC       |             |mgorman@suse.com

--- Comment #6 from Mel Gorman <mgorman@suse.com> ---
(In reply to Jan Kara from comment #5)
> OK, when the machine is booted with intel_idle.max_cstate=1, the numbers
> are much more stable (I ran only the 1, 2, and 4 client cases):
> Amean     1      36.13 (  0.00%)   35.76 (  1.03%)   36.18 ( -0.14%)   36.44 ( -0.86%)   36.43 ( -0.82%)
> Amean     2      34.63 (  0.00%)   31.68 (  8.50%)   31.83 (  8.08%)   31.59 (  8.77%)   31.70 (  8.45%)
> Amean     4      40.73 (  0.00%)   38.24 (  6.13%)   39.73 (  2.46%)   40.53 (  0.51%)   40.29 (  1.09%)
> Stddev    1       4.65 (  0.00%)    4.17 ( 10.35%)    4.44 (  4.43%)    3.59 ( 22.76%)    4.39 (  5.45%)
> Stddev    2       6.42 (  0.00%)    6.18 (  3.81%)    6.16 (  4.09%)    6.38 (  0.58%)    6.10 (  4.97%)
> Stddev    4       5.94 (  0.00%)    5.57 (  6.11%)    5.89 (  0.80%)    5.76 (  2.97%)    5.64 (  4.97%)
> I'm going to verify now whether stock 4.20 dbench numbers are stable, to
> see whether this instability is a recent thing.
If 4.20 looks good, I would suggest taking a close look at, or reverting,
8e3b40395450 ("cpufreq: intel_pstate: Fix up iowait_boost computation") and
the base commit it relies on, b8bd1581aa61 ("cpufreq: intel_pstate: Rework
iowait boosting to be less aggressive"), because they have "potential to
regress workloads that pause on IO for short periods" written all over them.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c7

--- Comment #7 from Jan Kara <jack@suse.com> ---
Yeah, I'm aware of those two commits (one bisection on anderson landed
there), but they were merged only in 5.1-rc1, so they cannot be the cause of
this regression.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c8

--- Comment #8 from Jan Kara <jack@suse.com> ---
OK, 4.20 results aren't stable either:

Amean     1      38.35 (  0.00%)   38.01 (  0.90%)   39.70 ( -3.52%)   39.47 ( -2.91%)   38.26 (  0.24%)
Amean     2      35.02 (  0.00%)   31.31 ( 10.61%)   30.80 ( 12.05%)   35.20 ( -0.50%)   34.39 (  1.81%)
Amean     4      42.70 (  0.00%)   41.81 (  2.09%)   41.64 (  2.47%)   37.73 ( 11.63%)   34.50 ( 19.20%)
Stddev    1       5.17 (  0.00%)    5.29 ( -2.22%)    3.36 ( 35.04%)    5.24 ( -1.35%)    5.19 ( -0.23%)
Stddev    2       8.41 (  0.00%)    8.55 ( -1.64%)    8.53 ( -1.47%)    8.31 (  1.24%)    8.71 ( -3.60%)
Stddev    4       6.49 (  0.00%)    6.54 ( -0.87%)    6.22 (  4.14%)    7.07 ( -8.97%)    7.63 (-17.63%)

*But* this is with SCSI_MQ driving the disk. I'll check whether the variance
is lower with the old block layer, because up to and including 4.20 we were
using that for SCSI devices AFAIK.
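[Editor's note: a quick way to check which block layer is driving each disk,
sketched under the assumption that on these pre-5.0 kernels a blk-mq device
exposes an "mq" directory in sysfs while a legacy-block device does not:]

#!/usr/bin/env python3
# Minimal sketch: report whether each block device uses blk-mq or the
# legacy block layer, based on the presence of /sys/block/<dev>/mq.
import os

def uses_blk_mq(dev):
    # blk-mq devices expose per-hardware-queue directories under .../mq
    return os.path.isdir(f"/sys/block/{dev}/mq")

for dev in sorted(os.listdir("/sys/block")):
    print(dev, "blk-mq" if uses_blk_mq(dev) else "legacy")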
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c9

--- Comment #9 from Jan Kara <jack@suse.com> ---
OK, 4.20 with scsi_mod.use_blk_mq=0:

Amean     1      38.46 (  0.00%)   26.94 ( 29.95%)   25.55 ( 33.58%)   26.92 ( 30.00%)   26.01 ( 32.38%)
Amean     2      31.93 (  0.00%)   31.75 (  0.55%)   31.68 (  0.76%)   35.29 (-10.53%)   31.71 (  0.68%)
Amean     4      41.66 (  0.00%)   35.14 ( 15.65%)   38.41 (  7.81%)   38.30 (  8.08%)   36.31 ( 12.84%)
Stddev    1       5.87 (  0.00%)    7.16 (-21.96%)    5.83 (  0.74%)    6.77 (-15.34%)    6.20 ( -5.72%)
Stddev    2       9.03 (  0.00%)    8.73 (  3.30%)    8.85 (  1.98%)    8.06 ( 10.75%)    8.66 (  4.09%)
Stddev    4       6.11 (  0.00%)    7.96 (-30.25%)    7.69 (-25.90%)    8.18 (-33.85%)    9.60 (-57.09%)

So the variability is still there. Will try even older kernels next week.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c10

--- Comment #10 from Jan Kara <jack@suse.com> ---
Results with the Leap 15.0 kernel (4.12.14-lp150.11-default):

Amean     1      22.52 (  0.00%)   22.51 (  0.05%)   22.47 (  0.21%)   22.26 (  1.12%)   22.29 (  1.02%)
Amean     2      24.02 (  0.00%)   24.28 ( -1.08%)   24.19 ( -0.69%)   24.30 ( -1.16%)   24.72 ( -2.92%)
Amean     4      29.15 (  0.00%)   28.86 (  1.00%)   28.83 (  1.08%)   29.49 ( -1.17%)   29.12 (  0.10%)
Stddev    1       1.03 (  0.00%)    0.97 (  5.67%)    1.49 (-44.48%)    1.40 (-35.73%)    1.42 (-36.98%)
Stddev    2       1.49 (  0.00%)    1.83 (-22.74%)    1.70 (-14.18%)    1.83 (-23.22%)    1.75 (-17.88%)
Stddev    4       2.95 (  0.00%)    3.07 ( -3.97%)    3.00 ( -1.60%)    2.85 (  3.47%)    3.12 ( -5.57%)

Notice the results are both stable and noticeably better on this machine.
Will try to narrow it down...
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c11

--- Comment #11 from Jan Kara <jack@suse.com> ---
Numbers with the 4.13 vanilla kernel:

Amean     1      40.92 (  0.00%)   40.26 (  1.61%)   43.51 ( -6.34%)   44.32 ( -8.32%)   43.99 ( -7.52%)
Amean     2      45.70 (  0.00%)   45.43 (  0.60%)   44.86 (  1.85%)   44.96 (  1.63%)   44.70 (  2.19%)
Amean     4      34.34 (  0.00%)   35.75 ( -4.10%)   35.40 ( -3.09%)   34.83 ( -1.42%)   34.37 ( -0.08%)
Stddev    1       2.93 (  0.00%)    2.80 (  4.38%)    5.02 (-71.29%)    4.96 (-69.56%)    5.11 (-74.60%)
Stddev    2       3.68 (  0.00%)    4.26 (-15.58%)    4.83 (-31.30%)    4.84 (-31.59%)    5.33 (-44.69%)
Stddev    4       5.30 (  0.00%)    5.71 ( -7.89%)    5.89 (-11.26%)    5.42 ( -2.37%)    5.32 ( -0.53%)

So the numbers are much worse than with Leap 15.0 and not as stable (though
still more stable than with 4.20). Not much of a surprise - the cpufreq fixes
we carry in Leap 15.0 are likely missing. Will poke around some more.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c12

--- Comment #12 from Jan Kara <jack@suse.com> ---
4.18 vanilla kernel numbers:

Amean     1      27.80 (  0.00%)   26.42 (  4.99%)   26.98 (  2.94%)   25.19 (  9.41%)   26.81 (  3.56%)
Amean     2      31.14 (  0.00%)   30.74 (  1.30%)   34.20 ( -9.82%)   34.22 ( -9.88%)   31.36 ( -0.71%)
Amean     4      37.22 (  0.00%)   41.98 (-12.78%)   39.83 ( -7.01%)   33.91 (  8.90%)   41.87 (-12.49%)
Stddev    1       7.47 (  0.00%)    6.63 ( 11.34%)    7.09 (  5.14%)    5.18 ( 30.62%)    6.73 ( 10.00%)
Stddev    2       8.42 (  0.00%)    8.05 (  4.43%)    8.13 (  3.49%)    8.46 ( -0.48%)    8.37 (  0.53%)
Stddev    4       8.84 (  0.00%)    6.54 ( 26.01%)    6.06 ( 31.48%)    7.32 ( 17.27%)    6.94 ( 21.56%)

Better numbers overall, but still unstable. I'll check:
a) whether intel_idle.max_cstate=1 still fixes the instability with 4.18, and
b) what is missing wrt scheduler / cpufreq in 4.18 compared to what we have
   in Leap 15.0.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c13

--- Comment #13 from Jan Kara <jack@suse.com> ---
4.18 vanilla kernel numbers with intel_idle.max_cstate=1:

Amean     1      26.78 (  0.00%)   26.63 (  0.55%)   29.06 ( -8.51%)   26.06 (  2.70%)   26.81 ( -0.09%)
Amean     2      31.79 (  0.00%)   31.85 ( -0.19%)   32.54 ( -2.36%)   34.30 ( -7.90%)   33.63 ( -5.78%)
Amean     4      39.09 (  0.00%)   41.04 ( -4.99%)   37.31 (  4.55%)   38.27 (  2.09%)   35.77 (  8.49%)
Stddev    1       3.31 (  0.00%)    3.60 ( -8.60%)    5.65 (-70.56%)    2.67 ( 19.45%)    3.98 (-20.17%)
Stddev    2       6.04 (  0.00%)    5.86 (  2.84%)    5.76 (  4.61%)    6.33 ( -4.83%)    5.84 (  3.21%)
Stddev    4       5.90 (  0.00%)    5.69 (  3.56%)    6.69 (-13.35%)    6.26 ( -6.13%)    6.62 (-12.13%)

So not significantly different from the numbers without
intel_idle.max_cstate=1, but the variation is roughly comparable to the one
with 5.0 and intel_idle.max_cstate=1 (comment 5).

I also did a 4.18 run with intel_idle.max_cstate=1 and the 'performance'
frequency governor. Results are:

Amean     1      25.07 (  0.00%)   25.04 (  0.13%)   25.06 (  0.07%)   25.03 (  0.19%)   25.03 (  0.18%)
Amean     2      28.16 (  0.00%)   27.20 (  3.42%)   27.21 (  3.38%)   27.47 (  2.46%)   27.75 (  1.45%)
Amean     4      29.58 (  0.00%)   29.50 (  0.27%)   30.03 ( -1.50%)   30.57 ( -3.33%)   29.49 (  0.31%)
Stddev    1       0.59 (  0.00%)    0.70 (-19.15%)    0.55 (  6.57%)    0.56 (  5.15%)    0.54 (  9.03%)
Stddev    2       1.41 (  0.00%)    1.49 ( -5.43%)    1.46 ( -3.51%)    1.10 ( 22.32%)    1.24 ( 12.20%)
Stddev    4       2.62 (  0.00%)    2.71 ( -3.30%)    2.75 ( -4.91%)    2.78 ( -6.16%)    3.14 (-19.70%)

So in this case the variation is significantly reduced. I'll see if I can
identify which patches make the Leap 15.0 kernel similarly stable.
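[Editor's note: for anyone reproducing this setup, the 'performance' governor
can also be switched at runtime through the standard cpufreq sysfs interface
instead of a boot-time preset. A minimal sketch, assuming the usual
/sys/devices/system/cpu layout; requires root:]

#!/usr/bin/env python3
# Minimal sketch: put every CPU on the 'performance' cpufreq governor,
# mirroring the configuration used for the stable run above.
import glob

GOV_PATHS = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"

def set_governor(governor="performance"):
    for path in glob.glob(GOV_PATHS):
        with open(path, "w") as f:
            f.write(governor)

def current_governors():
    # Map e.g. 'cpu0' -> 'performance' for a quick visual check.
    return {path.split("/")[5]: open(path).read().strip()
            for path in glob.glob(GOV_PATHS)}

if __name__ == "__main__":
    set_governor("performance")
    print(current_governors())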
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c14

Jan Kara <jack@suse.com> changed:

           What     |Removed      |Added
----------------------------------------------------------------------------
           CC       |             |giovanni.gherdovich@suse.com
           Flags    |             |needinfo?(giovanni.gherdovich@suse.com)

--- Comment #14 from Jan Kara <jack@suse.com> ---
So I took 4.18 plus:

patches.suse/cpufreq-intel_pstate-use-setpoint-of-10-on-servers.patch
patches.suse/cpufreq-ondemand-set-default-up_threshold-to-30-on-multi-core-systems.patch
patches.suse/cpufreq-intel_pstate-Ramp-up-frequency-faster-when-utilisation-reaches-setpoint.patch
patches.suse/cpufreq-intel_pstate-Temporarily-boost-P-state-when-exiting-from-idle.patch

and the results are:

Amean     1      39.42 (  0.00%)   39.00 (  1.06%)   38.98 (  1.13%)   39.58 ( -0.40%)   38.93 (  1.25%)
Amean     2      34.35 (  0.00%)   36.44 ( -6.09%)   31.83 (  7.32%)   33.67 (  1.98%)   31.72 (  7.65%)
Amean     4      35.31 (  0.00%)   42.02 (-19.02%)   41.41 (-17.28%)   44.03 (-24.71%)   43.04 (-21.90%)

So actually worse than vanilla 4.18. It seems it won't be so easy, and we'll
need to figure out from scratch what's happening with frequency scaling and
how to fix it for dbench. Giovanni, any idea what to try?

BTW, feel free to use marvin4 for experiments (kernel sources are in
source/linux; the config I run is in
configs/config-global-dhp__io-dbench4-async-xfs).
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c15

--- Comment #15 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Created attachment 802894
  --> http://bugzilla.suse.com/attachment.cgi?id=802894&action=edit
2-clients dbench data from 5.0 versus the SLE-15-SP1 kernel

I've looked at the 2-clients case, ignoring the rest. To me it looks like the
problem in mainline 5.0 (booted with security mitigations disabled) is that
only one client can go fast, but not both at the same time. I'm attaching a
plot that shows this. Sometimes the fast/slow roles switch mid-test; see for
example the mainline-4 plot in my figure.

Jan already showed that frequency scaling plays a role here; I'll try to find
the event or situation that makes one client go fast but not the other. Jan's
discovery that setting governor=performance and max_cstate=1 fixes the
problem means that the key to solving this lies in on-cpu events, as opposed
to what happens off-cpu (waiting for stuff) -- that should make it slightly
easier to see in the traces.

For that I'll need to identify and trace the IO thread (which I guess is
called dbench-something) and the XFS journaling thread, as Jan once explained
to me that these are the ones that get the work done. Once I know which cpus
those run on, I can correlate with frequency changes from turbostat at a
sampling rate of around 500 ms; one way to collect the placement samples is
sketched below.

Something comforting about the diagram I'm attaching here is that the
phenomenon is rather macroscopic, meaning coarse sampling with turbostat can
be used: a dbench client can be fast or slow, but it stays like that for a
long period of time (seconds).
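[Editor's note: a minimal sketch of that 500 ms placement-sampling loop,
assuming the clients can be matched by a comm starting with "dbench" (an
assumption about the process name; adjust for the journaling kworker):]

#!/usr/bin/env python3
# Minimal sketch: every ~500 ms, record which CPU each dbench process last
# ran on, so the samples can be correlated with per-CPU frequency data.
import os
import time

def pids_by_name(prefix):
    """Return PIDs whose /proc/<pid>/comm starts with the given prefix."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip().startswith(prefix):
                    pids.append(int(entry))
        except OSError:
            pass  # process exited while we were scanning
    return pids

def last_cpu(pid):
    """Field 39 of /proc/<pid>/stat is the CPU the task last ran on."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm (field 2) may contain spaces; split after its closing paren,
    # so fields[0] is field 3 and the processor field is at index 36.
    fields = data.rsplit(")", 1)[1].split()
    return int(fields[36])

if __name__ == "__main__":
    while True:
        stamp = time.time()
        for pid in pids_by_name("dbench"):
            try:
                print(f"{stamp:.3f} pid={pid} cpu={last_cpu(pid)}")
            except OSError:
                pass
        time.sleep(0.5)  # ~500 ms sampling, as suggested in the comment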
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c16

--- Comment #16 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Created attachment 802895
  --> http://bugzilla.suse.com/attachment.cgi?id=802895&action=edit
Jan's results from comments 1 to 14 compiled into a plot

For my future quick reference: all of Jan's tables in a single plot,
considering mainly the cases of 1, 2, and 4 clients.

One thing that surprises me is that the Leap 15.0 kernel from comment #10 is
considerably worse than the SLE-15-SP1 kernel (comment #1, first column);
those two should be basically the same kernel. It isn't specified in the
previous comments, but I'm assuming the userspace for all measurements was
Leap 15.0 (as it is in the marvin dashboard from which this regression was
noticed).
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c17

--- Comment #17 from Jan Kara <jack@suse.com> ---
Giovanni, note that comments 1 and 10 are from different machines (marvin7 vs
marvin4); also, comment 1 is with security mitigations disabled, while I was
running my tests on marvin4 with mitigations enabled. And yes, the userspace
has always been Leap 15.0 in my tests.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c18

--- Comment #18 from Jan Kara <jack@suse.com> ---
BTW, impera has just bisected the dbench4 regression between 5.0 and 5.1 on
hardy2 to:

Last good commit: a8e1942d97dcc44d1425807c71a4252f9e3b53b6
First bad commit: b8bd1581aa6110eb234c0d424eccd3f32d7317e6
From b8bd1581aa6110eb234c0d424eccd3f32d7317e6 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Date: Thu, 7 Feb 2019 12:51:04 +0100
Subject: [PATCH] cpufreq: intel_pstate: Rework iowait boosting to be less
 aggressive

The current iowait boosting mechanism in intel_pstate_update_util() is
quite aggressive, as it goes to the maximum P-state right away, and may
cause excessive amounts of energy to be used, which is not desirable and
arguably isn't necessary too.

Follow commit a5a0809bc58e ("cpufreq: schedutil: Make iowait boost more
energy efficient") that reworked the analogous iowait boost mechanism in
the schedutil governor and make the iowait boosting in
intel_pstate_update_util() work along the same lines.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
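[Editor's note: to illustrate the behavioural difference the commit message
describes, here is a toy model of the two policies -- not the intel_pstate
source. The old policy jumps straight to the maximum P-state on any iowait
wakeup; the schedutil-style policy of a5a0809bc58e ramps the boost up by
doubling on consecutive iowait wakeups and decays it when they stop. The
constants are illustrative only:]

#!/usr/bin/env python3
# Toy model of the iowait-boost rework: illustrative, not driver code.
MIN_BOOST, MAX_BOOST = 8, 32  # hypothetical P-state bounds

def old_policy(boost, iowait_wake):
    # Pre-rework behaviour: any iowait wakeup boosts straight to max.
    return MAX_BOOST if iowait_wake else 0

def new_policy(boost, iowait_wake):
    # Schedutil-style: start low, double on each consecutive iowait
    # wakeup, halve once the wakeups stop.
    if iowait_wake:
        return min(max(boost * 2, MIN_BOOST), MAX_BOOST)
    return boost // 2

# A burst of iowait wakeups followed by quiet periods:
b_old = b_new = 0
for wake in [True, True, True, True, False, False]:
    b_old, b_new = old_policy(b_old, wake), new_policy(b_new, wake)
    print(wake, b_old, b_new)
# old: 32 immediately; new: 8, 16, 32, 32, then decays to 16, 8.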
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c19

--- Comment #19 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Created attachment 804342
  --> http://bugzilla.suse.com/attachment.cgi?id=804342&action=edit
migrations and frequency scaling plot for dbench / 2 clients on marvin4

I don't have answers yet, but I made some plots to show migrations and the
three turbostat metrics Avg_MHz, Busy% and Bzy_MHz. These numbers were not
taken from the userspace turbostat tool but computed from the tracepoint data
of power:pstate_sample (the intel_pstate freq scaling driver/governor).

What's apparent in the plots is that on the SLE15 kernel both clients see a
very high Bzy_MHz (average frequency not including idle time). This value is
extremely close to the 1-core-turbo frequency (3.1 GHz on marvin4), i.e. the
frequency available when all but one core are idle. On the other hand,
Bzy_MHz on v5.0 shows:

* the first client getting a value on par with SLE15
* the second client never getting more than the max non-turbo p-state (a.k.a.
  "base frequency", 2.3 GHz on marvin4). It's actually often less than that.

So this is consistent with the remark that "on v5.0 only one client at a time
gets low latency". The Busy% signal doesn't look very different on the two
kernels, which hints that idling shouldn't be the root cause of the problem.

The migration pattern isn't aberrant: except for an initial phase lasting
around 5 seconds (the first 10 pages of the flipbook), the clients don't roam
around too much but appear to stick to a small set of cpus. It remains to be
seen why v5.0 can't unlock 1-core-turbo for both clients as SLE15 does.

Some notes on the plots and frequency formulas:

* the most unorthodox of the plots is the migrations panel: it should be
  thought of as a bundle of NCPUS horizontal stripes, each representing one
  cpu's occupation over time.
* in the migrations panel cpus are sorted numerically, not according to
  topology. I may upload an additional diagram showing what a NUMA node looks
  like in such a plot; roughly speaking, 0-11 and 24-35 are NUMA node #0, and
  12-23 and 36-47 are NUMA node #1.
* the power:pstate_sample tracepoint gives delta_APERF, delta_MPERF and
  delta_TSC, which is all that's needed to compute the "turbostat metrics".
  The "delta" part means "since the previous pstate_sample hit". Quick recap:
  * APERF is a counter ticking at the actual frequency of the core. Stops at
    idle.
  * MPERF is a counter ticking at the constant frequency of the max non-turbo
    p-state, also called "base frequency". Stops at idle.
  * TSC, the Time Stamp Counter, is exactly like MPERF but doesn't stop at
    idle (on marvin4 -- older processors have a so-called "non-invariant
    TSC", which does stop at idle and is then exactly the same as MPERF).
* the formulas are (straight from the turbostat source code):
  * Avg_MHz = delta_APERF * base_freq / delta_TSC
    (here we compute the length of the time interval by counting TSC ticks,
    since we know there are 2300M of those per second)
  * Busy% = delta_MPERF / delta_TSC
    (here we use the fact that MPERF and TSC tick at the same speed, but the
    latter doesn't stop at idle)
  * Bzy_MHz = delta_APERF / delta_MPERF * base_freq
    (here we use the relative speed of APERF wrt MPERF as a multiplier to
    MPERF's frequency; also, APERF and MPERF don't tick when idle)
* the attached page is one out of 360, since each page shows half a second of
  activity and a dbench run is 3 minutes.
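[Editor's note: the three formulas above, written out as a minimal runnable
sketch using marvin4's 2300 MHz base frequency quoted in the comment:]

#!/usr/bin/env python3
# Minimal sketch of the turbostat formulas applied to one
# power:pstate_sample interval (delta_APERF, delta_MPERF, delta_TSC).
BASE_FREQ_MHZ = 2300  # max non-turbo p-state on marvin4

def turbostat_metrics(d_aperf, d_mperf, d_tsc):
    """Compute Avg_MHz, Busy% and Bzy_MHz for one sample interval."""
    avg_mhz = d_aperf * BASE_FREQ_MHZ / d_tsc    # actual cycles over wall time
    busy_pct = 100.0 * d_mperf / d_tsc           # fraction of time not idle
    bzy_mhz = d_aperf / d_mperf * BASE_FREQ_MHZ  # frequency while not idle
    return avg_mhz, busy_pct, bzy_mhz

# Example: a core busy for half of a 1 ms interval, running at 3.1 GHz
# (1-core turbo on marvin4) while busy:
print(turbostat_metrics(d_aperf=1_550_000, d_mperf=1_150_000, d_tsc=2_300_000))
# -> (1550.0, 50.0, 3100.0)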
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c20

--- Comment #20 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Created attachment 804345
  --> http://bugzilla.suse.com/attachment.cgi?id=804345&action=edit
second page extract from the full report

Another page to show that the patterns described previously hold in general.
Next I'm going to attach a 3rd page. The full report in the form of a
multi-page PDF file is at
https://w3.suse.de/~ggherdovich/BSC1131437/flipbook-dbench-2clients.pdf
(~50 MB). Code and data for the plots are at
https://gitlab.suse.de/ggherdovich/dbench-migrations

Things I forgot to mention in the previous comment:

* a shortcoming of these plots is that the xfs kworker journaling thread is
  not traced. I haven't thought yet about how to grab it; probably I could
  insert a static tracepoint at the XFS entry point and recognize the
  interesting kworker that way. After that is done, crossing the data with
  pstate migrations is the same as already done.
* tracing was done with systemtap, being careful not to do any IO while the
  benchmark was running (all the arrays are kept in memory).
* I verified the regression was present when collecting the data, i.e.
  tracing didn't interfere with the benchmark results.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c21

--- Comment #21 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Created attachment 804348
  --> http://bugzilla.suse.com/attachment.cgi?id=804348&action=edit
3rd sample page extracted from the full report

Here client #1 from the SLE15 kernel roams quite a bit across cpus, yet it
still manages to get the top frequency (Bzy_MHz). OTOH client #1 from v5.0
sticks to almost the same cpu for half a second and can't get the frequency
to go up.
http://bugzilla.suse.com/show_bug.cgi?id=1131437#c22

--- Comment #22 from Giovanni Gherdovich <giovanni.gherdovich@suse.com> ---
Regarding the plots: in the Avg_MHz, Busy% and Bzy_MHz panels, vertical
orange lines correspond to migrations, i.e. the client was moved to a
different cpu (and the plot then records the new cpu's frequency).
http://bugzilla.suse.com/show_bug.cgi?id=1131437

Jan Kara <jack@suse.com> changed:

           What     |Removed        |Added
----------------------------------------------------------------------------
           Assignee |jack@suse.com  |giovanni.gherdovich@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1131437#c28

Mian Yousaf Kaukab <yousaf.kaukab@suse.com> changed:

           What     |Removed                                  |Added
----------------------------------------------------------------------------
           Status   |IN_PROGRESS                              |CONFIRMED
           CC       |                                         |yousaf.kaukab@suse.com
           Flags    |needinfo?(giovanni.gherdovich@suse.com)  |

--- Comment #28 from Mian Yousaf Kaukab <yousaf.kaukab@suse.com> ---
Tracker bug to revert the following patch from various SLE releases:
patches.suse/cpufreq-intel_pstate-Revert-upstream-changes-to-iowa.patch