Created attachment 802894
2-clients dbench data from 5.0 versus the SLE-15-SP1 kernel
I've looked only at the 2-clients case, ignoring the rest.
To me it looks like the problem in mainline 5.0 (booted with security
mitigations disabled) is that only one client can go fast at a time, never
both at once.
I'm attaching a plot that shows that. Sometimes the fast/slow roles switch
mid-test, see for example the mainline-4 plot in my figure. Jan already showed
that frequency scaling plays a role here; I'll try to find out which event or
situation makes one client go fast but not the other. Jan's discovery that
setting governor=performance and max_cstate=1 fixes the problem means that the
key to solving this lies in on-cpu events, as opposed to what happens off-cpu
(waiting for stuff) -- that should make it slightly easier to see in the
traces.
For that I'll need to identify and trace the IO thread (which I guess is
called dbench-something) and the XFS journaling thread, as Jan once explained
to me that these are the ones that get the work done. Once I know which cpu
those run on, I can correlate that with frequency changes from turbostat at a
sampling interval of 500 ms or so. Something comforting about the diagram I'm
attaching here is that the phenomenon is rather macroscopic, meaning coarse
sampling with turbostat can be used: a dbench client can be fast or slow, but
it stays that way for a long period of time (seconds).
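
For illustration, here is a rough sketch of the "which cpu is the thread on,
and how fast is that cpu" part of the plan above. It is not the turbostat
workflow itself: it swaps in a plain poll of /proc/<pid>/task/<tid>/stat
(field 39, the cpu the task last ran on) and the cpufreq sysfs file
scaling_cur_freq, which may be missing or only approximate depending on the
cpufreq driver. The thread-name prefixes "dbench" and "xfsaild" are guesses on
my side, and the helper names are made up for this example.

```python
import glob
import time


def last_cpu_from_stat(stat_line):
    """Return the cpu a task last ran on (field 39 of /proc/<pid>/stat)."""
    # comm (field 2) is parenthesised and may itself contain spaces or ')',
    # so split after the LAST ')' before tokenising the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), hence field 39 is rest[36].
    return int(rest[36])


def find_tasks(name_prefixes):
    """Yield (tid, comm, stat_path) for tasks whose comm starts with a prefix."""
    for comm_path in glob.glob("/proc/[0-9]*/task/[0-9]*/comm"):
        try:
            with open(comm_path) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if comm.startswith(tuple(name_prefixes)):
            task_dir = comm_path.rsplit("/", 1)[0]
            yield int(task_dir.rsplit("/", 1)[1]), comm, task_dir + "/stat"


def cur_freq_khz(cpu):
    """Return the cpufreq-reported frequency in kHz, or None if unavailable."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq"
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def sample(name_prefixes=("dbench", "xfsaild"), interval=0.5, count=20):
    """Print timestamp, tid, comm, cpu and frequency, `count` times.

    The default prefixes are assumptions; adjust them to the real thread names.
    """
    for _ in range(count):
        now = time.time()
        for tid, comm, stat_path in find_tasks(name_prefixes):
            try:
                with open(stat_path) as f:
                    cpu = last_cpu_from_stat(f.read())
            except (OSError, IndexError, ValueError):
                continue  # task vanished or the stat line was unexpected
            print(f"{now:.3f} {tid} {comm} cpu={cpu} khz={cur_freq_khz(cpu)}")
        time.sleep(interval)
```

Calling sample() while dbench runs prints one line per matching thread every
500 ms. Since a client stays fast or slow for seconds at a time, that coarse
interval should be enough to line the output up against the throughput plot.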