[kernel-bugs] [Bug 1174278] New: kernel-firmware-amdgpu 20200702 breaks video on Raven Ridge
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278

Bug ID: 1174278
Summary: kernel-firmware-amdgpu 20200702 breaks video on Raven Ridge
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: adam.reichold@t-online.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 839828
--> http://bugzilla.opensuse.org/attachment.cgi?id=839828&action=edit
Gzipped output of YaST's hardware information module

Updating kernel-firmware-amdgpu from 20200610-1.1 to 20200702-1.1 breaks video on an HP ProBook 455R G6 containing an AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx, i.e. AMD Raven Ridge.

The problem manifests itself as a blank screen at the point where the boot splash would be displayed, together with a continuously spinning system fan. The machine is unresponsive once it reaches this state, i.e. the escape and sysrq keys do not work.

Downgrading just the kernel-firmware-amdgpu packages works around the issue.

There is one kernel log message that is only visible with the new firmware:

kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).

--
You are receiving this mail because:
You are the assignee for the bug.
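Since the machine hangs during boot, a message like the one quoted in the report has to be read from the log of the previous boot. The following is a minimal sketch of how that could be done, not the reporter's actual method. The function name `amdgpu_errors` is hypothetical, and the sketch assumes a persistent systemd journal so that the failed boot is still available.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# lines that come from amdgpu and are flagged with the DRM "*ERROR*" marker.
amdgpu_errors() {
    grep -F 'amdgpu' | grep -F '*ERROR*'
}
```

After booting back into a working configuration, something like `journalctl -k -b -1 | amdgpu_errors` would show the amdgpu errors logged during the previous (failed) boot.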
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c1

Maximilian Trummer <opensuse@trummer.xyz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |opensuse@trummer.xyz

--- Comment #1 from Maximilian Trummer <opensuse@trummer.xyz> ---
*** Bug 1174277 has been marked as a duplicate of this bug. ***
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c2

--- Comment #2 from Maximilian Trummer <opensuse@trummer.xyz> ---
I got the same issue with a 3500U in a Lenovo T495. I marked my earlier bug as a dupe of this one because this one seems to have identified the cause and provides more detail. It's Picasso though, not Raven Ridge.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278

Maximilian Trummer <opensuse@trummer.xyz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|kernel-firmware-amdgpu      |kernel-firmware-amdgpu
                   |20200702 breaks video on    |20200702 breaks video on
                   |Raven Ridge                 |Picasso
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c3

--- Comment #3 from Maximilian Trummer <opensuse@trummer.xyz> ---
As a workaround, I also held back kernel-firmware-amdgpu, though I also kept all other firmware packages at their old version. With this, TW 20200716 works for now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c8

--- Comment #8 from Maximilian Trummer <opensuse@trummer.xyz> ---
I installed the kernel-firmware-amdgpu package from your repo now:
https://download.opensuse.org/repositories/home:/tiwai:/branches:/Kernel:/HE...
kernel-firmware-amdgpu-20200716-325.1.noarch.rpm

Everything works so far, including
- hardware video decoding
- 3D application with OpenGL
- external monitor
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c9

--- Comment #9 from Takashi Iwai <tiwai@suse.com> ---
Thanks for the quick testing. I have now submitted the package to FACTORY. I'll keep my branch package around for a while until it reaches TW. The equivalent package can also be found in the OBS Kernel:HEAD repo.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c23

Ingo Göppert <ingo.goeppert+suse@mailbox.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ingo.goeppert+suse@mailbox.org

--- Comment #23 from Ingo Göppert <ingo.goeppert+suse@mailbox.org> ---
I got the same issue with a 3700U in a Lenovo T495 with the latest Tumbleweed snapshot 20201011. After rolling back to 20201007, the GPU works fine.

I don't understand why the AMDGPU Picasso workaround was dropped in revision 151 of https://build.opensuse.org/package/show/openSUSE:Factory/kernel-firmware while this issue is still open. The firmware is still buggy and makes my laptop unusable.

Please tell me if I can provide any debug output or test something.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c24

--- Comment #24 from Takashi Iwai <tiwai@suse.com> ---
This does not seem to be related to the amdgpu firmware contents; rather, the firmware files somehow got messed up. See the later comments in bug 1177428, which discuss this recent regression rather than the resume problem. There it was confirmed that the latest 20.40 firmware worked for Picasso devices.

So, could you check the following?

- Uninstall kernel-firmware-amdgpu package once
  % zypper rm -u kernel-firmware-amdgpu

- Remove the stale files in /lib/firmware/amdgpu (if any)
  % rm -rf /lib/firmware/amdgpu

- Install the latest package from TW
  % zypper in kernel-firmware-amdgpu-20201005
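A more cautious variant of the removal step above moves the directory aside instead of deleting it, so the old files can be restored if the reinstall does not help. This is only a sketch under stated assumptions: the function name `backup_firmware_dir` and the `.bak` suffix are hypothetical and do not come from the thread.

```shell
# Hypothetical helper: move a firmware directory into a backup location
# instead of removing it with "rm -rf".
# Usage: backup_firmware_dir <source-dir> <backup-parent-dir>
backup_firmware_dir() {
    src=$1
    dest_parent=$2
    # Nothing to do if the directory is already gone ("if any" in the procedure).
    [ -d "$src" ] || return 0
    target="$dest_parent/$(basename "$src").bak"
    # Refuse to overwrite or nest into an earlier backup.
    if [ -e "$target" ]; then
        echo "backup target already exists: $target" >&2
        return 1
    fi
    mkdir -p "$dest_parent" || return 1
    mv "$src" "$target"
}
```

Run as root between the `zypper rm -u kernel-firmware-amdgpu` and `zypper in` steps, for example as `backup_firmware_dir /lib/firmware/amdgpu /root/firmware-backup` (the backup path is an arbitrary example).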
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c25

--- Comment #25 from Adam Reichold <adam.reichold@t-online.de> ---
I actually have a similar issue again too, but it appears sufficiently different (w.r.t. log messages and system fan activity) that I reported it as a separate upstream bug at https://gitlab.freedesktop.org/drm/amd/-/issues/1329.

But the timing w.r.t. dropping the old Picasso firmware admittedly looks a bit suspicious. I am currently trying to narrow down the affected kernel version as per upstream request...
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c26

--- Comment #26 from Adam Reichold <adam.reichold@t-online.de> ---
Sadly, I have to report that filing this as a separate issue was probably a mistake: downgrading the firmware package to 20200916-1.1 fixed the issue for me as well.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c27

--- Comment #27 from Adam Reichold <adam.reichold@t-online.de> ---
At this point, I suspect something SUSE-specific is the root cause here: While downgrading the firmware package did help and upgrading to kernel-firmware-20201005-337.1.noarch.rpm from https://build.opensuse.org/package/show/home:tiwai:branches:Kernel:HEAD/kern... did also work, just using the plain upstream amdgpu folder from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git and dumping it into /lib/firmware/updates works as well. (I used firmware_class.dyndbg=+p to check that the update files I fetched directly via Git are loaded.)
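The check described in comment 27 could be sketched roughly as follows. This assumes the system was booted with firmware_class.dyndbg=+p on the kernel command line, so that the firmware loader logs the paths it reads. The function names are hypothetical, and the exact wording of the debug messages varies between kernel versions, so the matching on "firmware" and "amdgpu/" is a heuristic rather than a documented format.

```shell
# Hypothetical helper: read kernel log text on stdin and print the
# firmware-loader lines that mention amdgpu files. Lines reporting a failed
# lookup are dropped, since the loader probes several directories in turn.
amdgpu_firmware_log_lines() {
    grep -i 'firmware' | grep 'amdgpu/' | grep -v 'failed'
}

# Succeeds if at least one amdgpu firmware file was read from below the
# given directory, e.g. /lib/firmware/updates.
loaded_from_dir() {
    amdgpu_firmware_log_lines | grep -F -q "$1/amdgpu/"
}
```

For example, `dmesg | loaded_from_dir /lib/firmware/updates && echo "override files in use"` would indicate that files placed under /lib/firmware/updates were picked up instead of the packaged ones.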
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c28

--- Comment #28 from Adam Reichold <adam.reichold@t-online.de> ---
(In reply to Adam Reichold from comment #27)
> At this point, I suspect something SUSE specific being the root cause here:
> While downgrading the firmware package did help and upgrading to
> kernel-firmware-20201005-337.1.noarch.rpm from
> https://build.opensuse.org/package/show/home:tiwai:branches:Kernel:HEAD/kernel-firmware
> did also work, just using the plain upstream amdgpu folder from
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
> and dumping it into /lib/firmware/updates works as well. (I used
> firmware_class.dyndbg=+p to check that the update files I fetched directly
> via Git are loaded).

I am sorry for spamming this bug report, really. But this is getting weirder still: I have now reverted to a clean Tumbleweed 20201011-722.1 and everything still works. So booting into the old firmware once seems to restore the system until something unknown happens that breaks the new firmware?!
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c29

--- Comment #29 from Ingo Göppert <ingo.goeppert+suse@mailbox.org> ---
(In reply to Takashi Iwai from comment #24)
[...]
> So, could you check the following?
>
> - Uninstall kernel-firmware-amdgpu package once
>   % zypper rm -u kernel-firmware-amdgpu
>
> - Remove the stale files in /lib/firmware/amdgpu (if any)
>   % rm -rf /lib/firmware/amdgpu

Directory was removed completely.

> - Install the latest package from TW
>   % zypper in kernel-firmware-amdgpu-20201005

Done.

Only on the first boot with the new firmware was the external monitor not working (the internal one was OK). I had to take the laptop out of the docking station and put it back in. Since then everything works fine. I have done several reboots, with no more display problems.

Thanks.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c30

Andras Szerencses <andrewlucky1@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrewlucky1@gmail.com

--- Comment #30 from Andras Szerencses <andrewlucky1@gmail.com> ---
Created attachment 842643
--> http://bugzilla.opensuse.org/attachment.cgi?id=842643&action=edit
After latest upgrade given Picasso graphics starts with nomodeset only

Is the problem on my system caused by the one in this bug report or should I report it separately? With the latest upgrade on openSUSE Tumbleweed I can boot the system with nomodeset only. This file is from the system with the issue. I did a full rollback, so the next file below has the details of the working system before the upgrade.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c31

--- Comment #31 from Andras Szerencses <andrewlucky1@gmail.com> ---
Created attachment 842644
--> http://bugzilla.opensuse.org/attachment.cgi?id=842644&action=edit
Before the latest upgrade given Picasso graphics was working fine

So this is my dmesg output after a rollback to the state in which the system was still working, before the actual upgrade. Note that the upgraded system does not work with the 5.8.12 kernel either, although that same kernel works without nomodeset on the non-upgraded system. Let me know if I should try the 5.8.10 kernel with the upgraded system, or whether you need any other information.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c32

--- Comment #32 from Takashi Iwai <tiwai@suse.com> ---
Try the procedure in comment 24. If it doesn't help, please open another bug report.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174278#c33

--- Comment #33 from Andras Szerencses <andrewlucky1@gmail.com> ---
(In reply to Takashi Iwai from comment #32)

Thanks for the suggestion, although I wouldn't remove files like this; in some cases newbies can totally break the system that way. Saving the files to another location is OK. With today's firmware upgrades the issue on my system was solved, so I assume it was something previously missed by openSUSE. Good luck with this Picasso case. Thanks again for the prompt response.