[kernel-bugs] [Bug 1178318] New: RX 5600M shader clocks stuck at 300MHz with kernel 5.9.1
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318

Bug ID: 1178318
Summary: RX 5600M shader clocks stuck at 300MHz with kernel 5.9.1
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: aaron.zakhrov@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 843188
  --> http://bugzilla.opensuse.org/attachment.cgi?id=843188&action=edit
Unigine Heaven with GALLIUM HUD=shader-clock and DRI_PRIME=1 set

With kernel 5.9.1-default on my Dell G5 15 SE running openSUSE Tumbleweed, my discrete Radeon RX 5600M has its shader clock stuck at 300MHz. The relevant upstream bug reports are:

https://gitlab.freedesktop.org/drm/amd/-/issues/1272
https://gitlab.freedesktop.org/drm/amd/-/issues/1252

This happens regardless of GPU load. glxgears, vkcube and Unigine Heaven all show the same behavior, both plugged in and on battery. The clock is not being misreported either: I get very slow graphics performance on the discrete GPU when using kernel 5.9.1.

-- 
You are receiving this mail because:
You are the assignee for the bug.
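Editorial sketch, not part of the original report: besides the Gallium HUD, the amdgpu driver exposes the shader clock levels in the `pp_dpm_sclk` sysfs file, with the active level marked by `*`. The Python below shows one way to parse that file to confirm a stuck clock. The card index (`card1`), the helper names, and the sample levels are assumptions for illustration only; the real file contents depend on the GPU and kernel.

```python
import re
from pathlib import Path

# One pp_dpm_sclk line looks like "1: 800Mhz *"; the "*" marks the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_sclk_levels(text):
    """Return a list of (level, mhz, is_active) tuples from pp_dpm_sclk text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return levels


def active_sclk_mhz(text):
    """Return the clock (MHz) of the level marked active, or None if none is."""
    for _level, mhz, is_active in parse_sclk_levels(text):
        if is_active:
            return mhz
    return None


def read_sclk(card="card1"):
    """Read pp_dpm_sclk for one DRM card.

    Assumption: the discrete GPU is "card1"; on other systems the index differs.
    """
    return (Path("/sys/class/drm") / card / "device" / "pp_dpm_sclk").read_text()


if __name__ == "__main__":
    # Illustrative sample only, not output captured from the reporter's machine.
    sample = "0: 300Mhz *\n1: 800Mhz\n2: 1750Mhz\n"
    print("active shader clock (MHz):", active_sclk_mhz(sample))
```

On an affected system, calling `active_sclk_mhz(read_sclk())` repeatedly while a benchmark runs would be expected to keep returning the lowest level.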
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c1
--- Comment #1 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c2
--- Comment #2 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c4
--- Comment #4 from Aaron Dominick
> (In reply to Aaron Dominick from comment #1)
> > As far as I can tell, this seems to be a configuration mismatch when the RPM is built. Using the 5.9.1 kernel compiled with make localmodconfig has the discrete GPU running at the proper AMD-specified clock speeds (800 - 1700 MHz, with a max of 1750 MHz when the high power profile is forced).
>
> What exactly did you do with localmodconfig? Did you rebuild the kernel based on the TW 5.9.1 kernel, or with the config carried over from the older 5.8.x?
>
> In any case, please provide the kernel config and dmesg output of both the good and the bad 5.9.1 kernels, so that we can compare them in detail.

I built it based on the Tumbleweed 5.9.1 kernel. Attached are the Tumbleweed default kernel configuration and boot log.
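Editorial sketch, not part of the original thread: the config comparison requested above can be approximated with a few lines of Python. The function names and the sample options are hypothetical, and treating a missing option the same as "is not set" is a simplifying assumption that reduces noise after `make localmodconfig`.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value for the contents of a kernel .config file."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kconfig(bad, good):
    """Return {option: (bad_value, good_value)} for options that differ.

    Assumption: an option absent from one config is treated as "n".
    """
    changes = {}
    for name in sorted(set(bad) | set(good)):
        bad_value = bad.get(name, "n")
        good_value = good.get(name, "n")
        if bad_value != good_value:
            changes[name] = (bad_value, good_value)
    return changes


if __name__ == "__main__":
    # Made-up fragments for illustration, not the attached configs.
    bad_cfg = "CONFIG_DRM_AMDGPU=m\nCONFIG_EDAC=y\n"
    good_cfg = "CONFIG_DRM_AMDGPU=m\n# CONFIG_EDAC is not set\n"
    for name, (bad_value, good_value) in diff_kconfig(
        parse_kconfig(bad_cfg), parse_kconfig(good_cfg)
    ).items():
        print(f"{name}: {bad_value} -> {good_value}")
```

The kernel source tree also ships a `scripts/diffconfig` helper that serves the same purpose.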
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c5
--- Comment #5 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c6
--- Comment #6 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c7
--- Comment #7 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c8
--- Comment #8 from Aaron Dominick
> Thanks. Are the above from your kernel? Could you re-install the original openSUSE kernel, reproduce the issue, and attach the dmesg output from there?
>
> For the next build, I recommend changing CONFIG_LOCALVERSION to a different suffix, so that you can install both your kernel and the original kernel at the same time.

That is from the original openSUSE kernel (5.9.1-default) that came with the kernel-default RPM package.
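Editorial sketch, not part of the original thread: the CONFIG_LOCALVERSION suggestion above amounts to rewriting one line of the `.config` before building. The Python below is a minimal illustration; the function name and the `-localmod` suffix are hypothetical, and the sample config lines are illustrative rather than copied from the attachments.

```python
import re

# Matches either an existing assignment or the "is not set" form of the option.
# "CONFIG_LOCALVERSION=" does not match CONFIG_LOCALVERSION_AUTO, which is a
# separate option and is left untouched.
LOCALVERSION_RE = re.compile(
    r"^(?:CONFIG_LOCALVERSION=.*|# CONFIG_LOCALVERSION is not set)$",
    re.MULTILINE,
)


def set_localversion(config_text, suffix):
    """Return config_text with CONFIG_LOCALVERSION set to the given suffix."""
    new_line = f'CONFIG_LOCALVERSION="{suffix}"'
    if LOCALVERSION_RE.search(config_text):
        # A callable replacement avoids backslash processing in the suffix.
        return LOCALVERSION_RE.sub(lambda _match: new_line, config_text, count=1)
    # Option not present at all: append it as a new line.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + new_line + "\n"


if __name__ == "__main__":
    sample = 'CONFIG_LOCALVERSION="-default"\nCONFIG_LOCALVERSION_AUTO=y\n'
    print(set_localversion(sample, "-localmod"), end="")
```

With distinct suffixes, the two kernels install under different version strings, so both can stay installed side by side. In a kernel source tree, the bundled `scripts/config` tool can make the same change.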
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c9
--- Comment #9 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c10
--- Comment #10 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c11
--- Comment #11 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c12
--- Comment #12 from Aaron Dominick
> OK, then those are the results from the failing state, right? Now we need the results from the working state, i.e. from your own kernel.

I've attached the files for the working state. The kernel compiled with localmodconfig does not show the clock issue, but it includes only the modules that were loaded at compile time.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c13
--- Comment #13 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c14
--- Comment #14 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c15
--- Comment #15 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c16
--- Comment #16 from Aaron Dominick
> Was it really the pure result of make localmodconfig?
>
> I ask because the log shows more differences: e.g. the EDAC messages are gone on your kernel, and some RCU-related messages are also gone. The elantech i2c stuff might be missing as well.
>
> Please double-check whether you have the same modules loaded on both kernels.

It seems to be a conflict between openSUSE's kernel configuration and the ACPI-D3Cold patch that is mentioned here: https://gitlab.freedesktop.org/drm/amd/-/issues/1252

When I manually built a 5.9-rc7 kernel with that patch applied, I noticed the same behavior. On kernel 5.8.x, before the patch was applied, the discrete GPU's shader clock would drop from 800MHz down to 300-400MHz on Unigine Heaven or similarly intensive workloads. Lighter workloads like vkcube and glxgears would stay at 800MHz.

I am not sure which downstream patch causes this, but the issue seems to go away on kernel 5.10-rc2 built with olddefconfig. I can share those logs and config files as well.
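Editorial sketch, not part of the original thread: the module check requested in the quoted text above can be done by saving `lsmod` output (or `/proc/modules`) on each kernel and comparing the module names. The Python below is a minimal illustration; the function names and the sample listings are hypothetical.

```python
def module_names(listing):
    """Extract module names from `lsmod` output or /proc/modules contents.

    Both formats put the module name in the first whitespace-separated field;
    the `lsmod` header line ("Module  Size  Used by") is skipped.
    """
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def compare_modules(bad_listing, good_listing):
    """Return (only_in_bad, only_in_good) as sorted lists of module names."""
    bad = module_names(bad_listing)
    good = module_names(good_listing)
    return sorted(bad - good), sorted(good - bad)


if __name__ == "__main__":
    # Made-up listings for illustration, not the reporter's actual module sets.
    bad_listing = "Module Size Used by\namdgpu 5500928 27\nedac_mce_amd 32768 0\n"
    good_listing = "Module Size Used by\namdgpu 5500928 27\n"
    only_bad, only_good = compare_modules(bad_listing, good_listing)
    print("only on failing kernel:", only_bad)
    print("only on working kernel:", only_good)
```

Modules that appear on only one side are candidates for explaining differences in the boot logs, such as the missing EDAC messages.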
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c17
--- Comment #17 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c18
--- Comment #18 from Aaron Dominick
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c19
--- Comment #19 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c20
--- Comment #20 from Aaron Dominick
> There is a 5.10-rc kernel package in the OBS Kernel:HEAD repo. Could you check with it, too?

The kernel from OBS Kernel:HEAD works fine.
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c21
--- Comment #21 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1178318#c22
--- Comment #22 from Aaron Dominick
> Thanks for the confirmation. So it's indeed fixed in the 5.10 tree, which implies that the issue isn't about the config; the real root cause was something else that could be "fixed" properly :)
>
> The question is what the fix is...

I am not entirely sure myself, but it seems to be related to either the ACPI or the AMDGPU power management code. On kernel 5.8.15 (which I don't have anymore, unfortunately) the shader clocks would run between 300MHz and 800MHz depending on GPU load. On kernels 5.9.x the clock runs at a flat 300MHz, and on kernel 5.10 it runs at the proper "game" clock speed between 800MHz and 1750MHz.

I think it is one of the additional SUSE patches. The changelog for 5.10 says they have been removed, and two of those relate to AMDGPU. I think that is what fixed it? Either that or one of the new config options. The downstream patches should not have affected my manually built kernel from kernel.org.

In any case, I think we can close this bug, since it seems to be fixed in 5.10.