[Bug 1177428] New: AMDGPU resume fail
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428

Bug ID: 1177428
Summary: AMDGPU resume fail
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: X.Org
Assignee: gfx-bugs@suse.de
Reporter: karl.mistelberger@nefkom.net
QA Contact: gfx-bugs@suse.de
Found By: ---
Blocker: ---

Resume fails with a frozen desktop:

[ 24.398940] kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 24.398978] kernel: [drm] PSP is resuming...
[ 24.418828] kernel: [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[ 24.427721] kernel: [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[ 24.873381] kernel: ata2.00: supports DRM functions and may not be fully accessible
[ 24.873629] kernel: ata5.00: supports DRM functions and may not be fully accessible
[ 24.875318] kernel: ata5.00: supports DRM functions and may not be fully accessible
[ 24.875414] kernel: [drm] kiq ring mec 2 pipe 1 q 0
[ 24.875815] kernel: ata2.00: supports DRM functions and may not be fully accessible
[ 24.921903] kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 25.403984] kernel: [drm] Fence fallback timer expired on ring sdma0
[ 25.436018] kernel: [drm] Fence fallback timer expired on ring gfx
[ 25.436197] kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
[ 25.436204] kernel: [drm:process_one_work] *ERROR* ib ring test failed (-22).
[ 34.512086] kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 34.512139] kernel: [drm] PSP is resuming...
[ 34.532000] kernel: [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[ 34.541638] kernel: [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[ 34.986957] kernel: ata2.00: supports DRM functions and may not be fully accessible
[ 34.989592] kernel: ata2.00: supports DRM functions and may not be fully accessible
[ 34.990819] kernel: ata5.00: supports DRM functions and may not be fully accessible
[ 34.992628] kernel: ata5.00: supports DRM functions and may not be fully accessible
[ 35.013814] kernel: [drm] kiq ring mec 2 pipe 1 q 0
[ 35.060526] kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 35.541618] kernel: [drm] Fence fallback timer expired on ring sdma0
[ 36.085419] kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[ 36.085428] kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).

inxi -zaG:
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel bus ID: 06:00.0 chip ID: 1002:15d8
  Display: server: X.Org 1.20.9 compositor: kwin_x11 driver: amdgpu display ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2") s-diag: 582mm (22.9")
  Monitor-1: DVI-D-0 res: 1920x1080 hz: 60 dpi: 79 size: 621x341mm (24.4x13.4") diag: 708mm (27.9")
  OpenGL: renderer: AMD RAVEN (DRM 3.38.0 5.8.12-1-default LLVM 10.0.1) v: 4.6 Mesa 20.1.8 direct render: Yes

--
You are receiving this mail because:
You are on the CC list for the bug.
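The two resume attempts in the report end with different error codes, -22 and -110, which on Linux correspond to EINVAL and ETIMEDOUT. The following is a minimal sketch for picking the DRM error lines out of such a log and annotating them with the symbolic name. The function names `errno_name` and `annotate_drm_errors` are illustrative and not part of any tool mentioned in this bug, and only the two codes seen in the report are mapped.

```shell
#!/bin/sh
# Map the errno values seen in this report to their Linux names.
# Only -22 and -110 are covered; everything else is reported as unknown.
errno_name() {
    case $1 in
        -22|22)   echo EINVAL ;;
        -110|110) echo ETIMEDOUT ;;
        *)        echo unknown ;;
    esac
}

# Read kernel log lines on stdin, keep only DRM "*ERROR*" lines, and append
# the symbolic name of the trailing "(-NNN)" code, if the line has one.
annotate_drm_errors() {
    grep -F '*ERROR*' | while IFS= read -r line; do
        code=$(printf '%s\n' "$line" | sed -n 's/.*(\(-[0-9][0-9]*\))\.*$/\1/p')
        printf '%s  [%s]\n' "$line" "$(errno_name "$code")"
    done
}
```

It could be fed with something like `dmesg | annotate_drm_errors` after a failed resume, assuming the log is still reachable at that point.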
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c1
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c2
--- Comment #2 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c3
--- Comment #3 from Karl Mistelberger
Is this a regression?
I assembled the machine in July. IIRC suspend/resume worked then. Later I ran into trouble with graphics, which went away with further updating:
https://forums.opensuse.org/showthread.php/544219-Amdgpu-Trouble
drm is a great idea, but to me it seems it still has teething problems.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c15
--- Comment #15 from Karl Mistelberger
OK, thanks. Then could you try to crawl through the old journal and check which kernel version started showing the problem? The one you showed was with 5.7.7, and the last one tested was 5.7.12. There might be something between them that you've tested.
Last good resume from journal is with 8.4-1-default.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c16
--- Comment #16 from Karl Mistelberger
Also, please check whether "amd_iommu=off" boot option makes any improvement wrt this bug.
Booted with amd_iommu=off and got a freeze too.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c17
--- Comment #17 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c20
--- Comment #20 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c21
--- Comment #21 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c22
--- Comment #22 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c23
--- Comment #23 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c24
--- Comment #24 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c25
--- Comment #25 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c26
--- Comment #26 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c27
Christian Hartmann
BTW you only need to update the amdgpu firmware:
% zypper in --oldpackage --force http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
Downgrading the kernel-firmware-amdgpu package fixed the issue.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c28
--- Comment #28 from Takashi Iwai
Downgrading the kernel-firmware-amdgpu package fixed the issue.
Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too? We need to check which firmware is involved. In the case of Karl, it was amdgpu/picasso*.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c29
--- Comment #29 from Takashi Iwai
Created attachment 842492 [details] still freezing
Thanks. It's a Picasso board, and this was already a problem in the past, hence we shipped the older firmware as a workaround. At the latest kernel-firmware update, we removed the workaround as I was informed that the issue should have been fixed, but apparently it's not fixed. So I'm going to put the old firmware back.

However, the question is which old one; I'd really like to see whether the original issue (the GPU error at resume) comes from the firmware or not.

I have now uploaded various versions of the picasso firmware files taken from linux-firmware.git. The tarball contains a subdirectory for each version (e.g. 19.50, 20.10, ...). To test, try the following:

- Create the /lib/firmware/updates/amdgpu directory:
  % mkdir -p /lib/firmware/updates/amdgpu
- Copy the contents of the firmware version you want to test (e.g. 19.50):
  % cp 19.50/amdgpu/picasso* /lib/firmware/updates/amdgpu/
- Rebuild initrd and retest:
  % mkinitrd
  % reboot

Version 20.40 is the same as the one in the latest kernel-firmware package, hence it is supposed to be broken. I included it to be sure.

Please check each version and let me know the behavior. Thanks!
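The copy step in this procedure has to be repeated for every firmware version, so it can be wrapped in a small helper. This is only a sketch under assumptions: the function name `stage_picasso_firmware` and the `FW_ROOT` override are hypothetical (they allow a dry run against a scratch directory), and clearing previously staged picasso files before each copy is an added precaution that the comment does not ask for.

```shell
#!/bin/sh
# Sketch: stage one extracted picasso firmware version for testing.
# FW_ROOT is a hypothetical override; it defaults to the real /lib/firmware.
stage_picasso_firmware() {
    version_dir=$1                          # e.g. the extracted "19.50" directory
    fw_root=${2:-${FW_ROOT:-/lib/firmware}}
    dest="$fw_root/updates/amdgpu"

    # Stop early if the directory does not have the expected tarball layout.
    if ! ls "$version_dir"/amdgpu/picasso* >/dev/null 2>&1; then
        echo "no picasso firmware under $version_dir/amdgpu" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # Drop files left over from a previously tested version (added precaution).
    rm -f "$dest"/picasso*
    cp "$version_dir"/amdgpu/picasso* "$dest"/ || return 1
    echo "staged $(ls "$dest" | wc -l | tr -d ' ') file(s) in $dest"
}
```

On a real system this would be followed by `mkinitrd` and `reboot`, as in the steps listed in the comment; the helper deliberately does neither.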
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c30
--- Comment #30 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c31
--- Comment #31 from Karl Mistelberger
Please check each version and let me know the behavior. Thanks!
Tested all of them with 5.8.14-1-default and none of them works. Just to be sure:

hofkirchen:~ # journalctl -b 0 --no-h --grep amdgpu|grep Loading
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_gpu_info.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_sdma.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_asd.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ta.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_pfp.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_me.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ce.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_rlc_am4.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec2.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_vcn.bin
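In the journal output for this test, one file (raven_dmcu.bin) is loaded from /lib/firmware/amdgpu rather than from the updates directory. A small filter makes such entries stand out. This is a sketch only: the function name `firmware_outside_updates` is illustrative, and it assumes the path is the last whitespace-separated field of each "Loading firmware from" line, as it is in the output quoted in this comment.

```shell
#!/bin/sh
# Sketch: read kernel log lines on stdin and print the firmware paths that
# were NOT loaded from the /lib/firmware/updates override directory.
firmware_outside_updates() {
    awk '/Loading firmware from/ {
        path = $NF                      # assumed: path is the last field
        if (path !~ /\/updates\//) print path
    }'
}
```

It could be used as `journalctl -b 0 --no-h --grep amdgpu | firmware_outside_updates`, reusing the journalctl invocation from the comment.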
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c32
--- Comment #32 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c33
--- Comment #33 from Karl Mistelberger
Thank you for the quick testing. This shows that the resume problem is, at least, not a regression in the recent firmware files.
Also, could you tell which firmware version did boot properly? I suppose 20.40 caused the same problem?
Never assume anything. All five of them booted correctly and all of them froze upon resume.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c34
--- Comment #34 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c35
--- Comment #35 from Karl Mistelberger
Hmm. The latest TW kernel-firmware package contains 20.40, so this should have triggered the same problem.
Could you confirm the following?
- Remove /lib/firmware/updates/*
- Install again the latest TW kernel-firmware (20201008)
- mkinitrd and reboot
Boot fails.
If this shows the boot problem, try putting the picasso 20.40 firmware into /lib/firmware/updates again, then mkinitrd and retest.
Boot works. :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c36
--- Comment #36 from Christian Hartmann
(In reply to Christian Hartmann from comment #27)
Downgrading the kernel-firmware-amdgpu package fixed the issue.
Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too? We need to check which firmware is involved.
In the case of Karl, it was amdgpu/picasso*.
I've uploaded my dmesg output... And yes, it also looks like picasso...
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c37
--- Comment #37 from Takashi Iwai
(In reply to Takashi Iwai from comment #34)
Hmm. The latest TW kernel-firmware package contains 20.40, so this should have triggered the same problem.
Could you confirm the following?
- Remove /lib/firmware/updates/*
- Install again the latest TW kernel-firmware (20201008)
- mkinitrd and reboot
Boot fails.
If this shows the boot problem, try putting the picasso 20.40 firmware into /lib/firmware/updates again, then mkinitrd and retest.
Boot works. :-)
Weird... Could you try the following?
- Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the unbootable state again
- Boot with nomodeset, then run
  % unxz -f /lib/firmware/amdgpu/picasso*.xz
  mkinitrd, reboot and check whether it works now
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c38
--- Comment #38 from Takashi Iwai
I've uploaded my dmesg output... And yes, it also looks like picasso...
OK, then could you also check the test in comment 29?
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c39
--- Comment #39 from Karl Mistelberger
Weird... Could you try the following?
- Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the unbootable state again
- Boot with nomodeset, then run
  % unxz -f /lib/firmware/amdgpu/picasso*.xz
  mkinitrd, reboot and check whether it works now
Tried that:
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_gpu_info.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_sdma.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_asd.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ta.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_pfp.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_me.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ce.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_rlc_am4.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec2.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin.xz
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_vcn.bin
Uncompressed, ran mkinitrd and rebooted. Boot failed:
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Oct 11 20:00:30 kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c40
--- Comment #40 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c41
--- Comment #41 from Christian Hartmann
(In reply to Christian Hartmann from comment #36)
I've uploaded my dmesg output... And yes, it also looks like picasso...
OK, then could you also check the test in comment 29?
Just did that... I first tried 20.30 and after that 20.40. Both times the system booted just fine.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c42
Zbigniew Luszpinski
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c43
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c44
--- Comment #44 from Karl Mistelberger
Now can people with Picasso boards test the kernel-firmware-amdgpu package in the OBS Kernel:HEAD repo? I didn't revert the amdgpu firmware there yet, but there was an error in the split / copy script, and that was fixed first.
http://download.opensuse.org/repositories/Kernel:/HEAD/standard/noarch/kernel-firmware-amdgpu-20201005-334.1.noarch.rpm
Please make sure that you clear /lib/firmware/updates/* and the uncompressed /lib/firmware/amdgpu/picasso* files beforehand.
Still fails.
If the boot still fails with this version, let's check again the following:
- Uncompress /lib/firmware/amdgpu/picasso*.xz, and retest:
  % unxz -f /lib/firmware/amdgpu/picasso*.xz
  % mkinitrd
  % reboot
:~ # unxz -f /lib/firmware/amdgpu/picasso*.xz
unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_ce.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_gpu_info.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_me.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_mec.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_mec2.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_pfp.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_rlc.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_sdma.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_vcn.bin.xz: File format not recognized
:~ #
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c45
--- Comment #45 from Takashi Iwai
:~ # unxz -f /lib/firmware/amdgpu/picasso*.xz unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized
What does the following command output?
% file /lib/firmware/amdgpu/picasso_asd.bin.xz
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c46
--- Comment #46 from Karl Mistelberger
(In reply to Karl Mistelberger from comment #44)
:~ # unxz -f /lib/firmware/amdgpu/picasso*.xz unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized
What does the following command output?
% file /lib/firmware/amdgpu/picasso_asd.bin.xz
/lib/firmware/amdgpu/picasso_asd.bin.xz: empty

:~ # ll /lib/firmware/amdgpu/picasso*.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_asd.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_ce.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_me.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec2.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_pfp.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_rlc.bin.xz
-rw-r--r--  1 root root 9160 Oct  8 13:23 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_sdma.bin.xz
-rw-r--r--  1 root root 9548 Oct  8 13:23 /lib/firmware/amdgpu/picasso_ta.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_vcn.bin.xz
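The listing in this comment shows that most of the picasso files are zero bytes long, which is why unxz rejected them. A quick way to spot such files is sketched here. The function name `list_empty_firmware` and its optional directory argument are illustrative, not from the thread; the directory defaults to the path used throughout this bug.

```shell
#!/bin/sh
# Sketch: print zero-length picasso firmware files in a directory.
list_empty_firmware() {
    fw_dir=${1:-/lib/firmware/amdgpu}
    # -type f skips symlinks and directories; -size 0 matches empty files only.
    find "$fw_dir" -type f -name 'picasso*' -size 0 | sort
}
```

Any output from `list_empty_firmware` would indicate a damaged installation of the kind seen in the listing; no output means no empty picasso files were found.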
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c47
--- Comment #47 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c48
--- Comment #48 from Karl Mistelberger
Hm. Something wrong with the installation. Could you retry? At best:
% rm -rf /lib/firmware/amdgpu
% zypper rm -u --nodeps kernel-firmware-amdgpu
% zypper in --oldpackage -f kernel-firmware-amdgpu-XXX.rpm
(where XXX is filled with the actual package rpm you've downloaded.)
Then check the file command again.
:~ # ll /lib/firmware/amdgpu/picasso*.xz
-rw-r--r-- 3 root root  31836 Oct 11 23:38 /lib/firmware/amdgpu/picasso_asd.bin.xz
-rw-r--r-- 2 root root   3156 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ce.bin.xz
-rw-r--r-- 2 root root    112 Oct 11 23:38 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz
-rw-r--r-- 2 root root   6104 Oct 11 23:38 /lib/firmware/amdgpu/picasso_me.bin.xz
-rw-r--r-- 4 root root  26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec.bin.xz
-rw-r--r-- 4 root root  26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec2.bin.xz
-rw-r--r-- 2 root root   8312 Oct 11 23:38 /lib/firmware/amdgpu/picasso_pfp.bin.xz
-rw-r--r-- 2 root root   9292 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc.bin.xz
-rw-r--r-- 1 root root   9160 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz
-rw-r--r-- 2 root root   7360 Oct 11 23:38 /lib/firmware/amdgpu/picasso_sdma.bin.xz
-rw-r--r-- 1 root root   9548 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ta.bin.xz
-rw-r--r-- 3 root root 219540 Oct 11 23:38 /lib/firmware/amdgpu/picasso_vcn.bin.xz

:-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c49
--- Comment #49 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c50
--- Comment #50 from Karl Mistelberger
OK, now one step forward. And it still fails to boot?
It boots, however suspend/resume still freezes:

Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Oct 12 10:38:44 kernel: [drm] Fence fallback timer expired on ring sdma0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Oct 12 10:38:44 kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c51
--- Comment #51 from Takashi Iwai
(In reply to Takashi Iwai from comment #49)
OK, now one step forward. And it still fails to boot?
It boots, however suspend/resume still freezes:
OK, that's more or less expected from previous results. We couldn't kill two birds with one stone :)

Now the question is why the boot failure happened with the recent kernel-firmware package. It might be that the hard-linked contents in the package got broken via the update. Let's wait for the test results from other people, too.

I prepared another set of kernel-firmware packages in the OBS home:tiwai:test:fw-fix repo. It uses fdupes with the -s option to turn the duplicated files into symlinks instead of hard-links, as that might work better. Please test just a simple update of the kernel-firmware-amdgpu package from:
http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE...

It's just to confirm that switching to symlinks doesn't break things.
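To see whether an installed firmware file is a symlink, one of several hard links, or a plain file, its link count can be inspected. The file listings earlier in this thread show link counts above 1 for several picasso files, which is what hard-linked duplicates look like. The sketch below assumes GNU coreutils `stat` (the `-c %h` format prints the hard-link count); the function name `describe_link` is illustrative.

```shell
#!/bin/sh
# Sketch: classify how a file is stored on disk.
# Assumes GNU stat; "-c %h" prints the number of hard links.
describe_link() {
    f=$1
    if [ -L "$f" ]; then
        echo "symlink"
    elif [ "$(stat -c %h "$f")" -gt 1 ]; then
        echo "hardlink"
    else
        echo "regular"
    fi
}
```

Running `describe_link /lib/firmware/amdgpu/picasso_mec.bin.xz` before and after installing the test package would be one way to confirm that the duplicates were switched from hard links to symlinks.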
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c52
--- Comment #52 from Christian Hartmann
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c53
--- Comment #53 from Karl Mistelberger
Please test just a simple update of kernel-firmware-amdgpu package from:
http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
I reverted to broken firmware first by running zypper dup. Then I ran:

The following package is going to be upgraded:
  kernel-firmware-amdgpu

The following package is going to change vendor:
  kernel-firmware-amdgpu  openSUSE -> obs://build.opensuse.org/home:tiwai

1 package to upgrade, 1 to change vendor.
Overall download size: 7.6 MiB. Already cached: 0 B. No additional space will be used or freed after the operation.
Continue? [y/n/v/...? shows all options] (y):
Retrieving package kernel-firmware-amdgpu-20201005-336.1.noarch (1/1), 7.6 MiB ( 7.6 MiB unpacked)

Boot now works.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c54
--- Comment #54 from Takashi Iwai
(In reply to Takashi Iwai from comment #51)
Please test just a simple update of kernel-firmware-amdgpu package from:
http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
I reverted to broken firmware first by running zypper dup.
Actually I wonder whether the boot still fails at this moment after zypper dup. If the problem was some hard-link mess via the update, it might work magically even without fixing anything else, just because you once uninstalled and cleaned up.
Then I ran:
The following package is going to be upgraded: kernel-firmware-amdgpu
The following package is going to change vendor: kernel-firmware-amdgpu openSUSE -> obs://build.opensuse.org/home:tiwai
1 package to upgrade, 1 to change vendor. Overall download size: 7.6 MiB. Already cached: 0 B. No additional space will be used or freed after the operation. Continue? [y/n/v/...? shows all options] (y): Retrieving package kernel-firmware-amdgpu-20201005-336.1.noarch (1/1), 7.6 MiB ( 7.6 MiB unpacked)
Boot now works.
Thanks for the confirmation. I guess using symlinks is the better option anyway, so let's move toward this.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c55
--- Comment #55 from Karl Mistelberger
Actually I wonder whether the boot still fails at this moment after zypper dup. If the problem was some hard-link mess via the update, it might work magically even without fixing anything else, just because you once uninstalled and cleaned up.
Reverted from obs://build.opensuse.org/home:tiwai to openSUSE. Boot works now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c56
--- Comment #56 from Christian Hartmann
(In reply to Karl Mistelberger from comment #53)
(In reply to Takashi Iwai from comment #51)
Please test just a simple update of kernel-firmware-amdgpu package from:
http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
I reverted to broken firmware first by running zypper dup.
Actually I wonder whether the boot still fails at this moment after zypper dup. If the problem was some hard-link mess via the update, it might work magically even without fixing anything else, just because you once uninstalled and cleaned up.
If I'm not wrong, this would also explain the behaviour I faced when trying the old firmware releases and switching back to 20.40 still worked fine.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c57
--- Comment #57 from Christian Hartmann
I prepared another kernel-firmware packages in OBS home:tiwai:test:fw-fix repo. It uses fdupes with -s option to make the duped file symlinks instead of hard-links, as it might work better. Please test just a simple update of kernel-firmware-amdgpu package from:
http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
It's just to confirm that switching to symlink doesn't break things.
Using this version I was able to boot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c58
--- Comment #58 from Takashi Iwai
So, I've just checked going back to the official released version and boot fails.
OK. Did you uninstall the kernel-firmware-amdgpu package once? Or only upgrade/downgrade the package? If it's the latter case, try the following:

- Uninstall kernel-firmware-amdgpu once:
  % zypper rm -u kernel-firmware-amdgpu
- Remove stale files in /lib/firmware/amdgpu, if any:
  % rm -rf /lib/firmware/amdgpu
- Install the kernel-firmware-amdgpu package again from TW:
  % zypper in kernel-firmware-amdgpu-20201005
  (pass some option to specify the repo or specify the proper rpm release number to get the TW package.)

I guess this would make it work for you, too.

Anyway, it seems that the symlink version of fdupes works better, and I'm going to submit it now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c59
--- Comment #59 from Christian Hartmann
(In reply to Christian Hartmann from comment #57)
So, I've just checked going back to the official released version and boot fails.
OK. Did you uninstall kernel-firmware-amdgpu package once? Or only upgrade/downgrade the package? If it's the latter case, try the following:
- Uninstall kernel-firmware-amdgpu once:
  % zypper rm -u kernel-firmware-amdgpu
- Remove stale files in /lib/firmware/amdgpu, if any:
  % rm -rf /lib/firmware/amdgpu
- Install the kernel-firmware-amdgpu package again from TW:
  % zypper in kernel-firmware-amdgpu-20201005
  (pass some option to specify the repo or specify the proper rpm release number to get the TW package.)
I guess this would make it work for you, too.
Anyway, it seems that the symlink version of fdupes works better, and I'm going to submit it now.
Yes, after uninstalling and deleting the files my system boots normally with the version from the official repo.
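For anyone repeating the clean reinstall described in comment #58, the three steps can be sketched as a small script. This is only a sketch under assumptions: the `run` helper, the `DRY_RUN`, `FW_DIR` and `PKG` variables and the function name are made up for illustration, and `rm -rf` on the whole directory is an assumption (the comment itself only says to remove stale files). By default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the clean reinstall from comment #58.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"
FW_DIR="${FW_DIR:-/lib/firmware/amdgpu}"   # directory named in the comment
PKG="${PKG:-kernel-firmware-amdgpu}"       # package named in the comment

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reinstall_amdgpu_firmware() {
    # 1. Uninstall the package once (-u also drops dependencies that become unneeded).
    run zypper rm -u "$PKG"
    # 2. Remove stale files left behind by earlier package versions.
    #    Assumption: removing the whole directory is acceptable.
    run rm -rf "$FW_DIR"
    # 3. Install the package again from the configured repositories.
    run zypper in "$PKG"
}
```

Calling `reinstall_amdgpu_firmware` with the defaults lists the three commands so they can be reviewed before running them for real with `DRY_RUN=0`.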
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c61
--- Comment #61 from Zbigniew Luszpinski
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c62
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c65
--- Comment #65 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c66
--- Comment #66 from Takashi Iwai
Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
Did you test the kernel with your current openSUSE system? Also what about more recent kernels?
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c67
--- Comment #67 from Karl Mistelberger
(In reply to Karl Mistelberger from comment #65)
Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
Did you test the kernel with your current openSUSE system? Also what about more recent kernels?
Operating System: openSUSE Tumbleweed 20201014
KDE Plasma Version: 5.20.0
KDE Frameworks Version: 5.75.0
Qt Version: 5.15.1
Kernel Version: 5.9.1-1.g8abc535-default
OS Type: 64-bit
Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
Memory: 29.3 GiB of RAM
Graphics Processor: AMD RAVEN

kernel-stable freezes too:

3400G:~ # zypper se -is kernel-default
Loading repository data...
Reading installed packages...

S  | Name           | Type    | Version             | Arch   | Repository
---+----------------+---------+---------------------+--------+------------------
i+ | kernel-default | package | 5.8.15-1.1.gc680e93 | x86_64 | (System Packages)
i+ | kernel-default | package | 5.9.1-1.1.g8abc535  | x86_64 | kernel-stable
3400G:~ #
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c68
--- Comment #68 from Takashi Iwai
(In reply to Takashi Iwai from comment #66)
(In reply to Karl Mistelberger from comment #65)
Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
Did you test the kernel with your current openSUSE system? Also what about more recent kernels?
Operating System: openSUSE Tumbleweed 20201014 KDE Plasma Version: 5.20.0 KDE Frameworks Version: 5.75.0 Qt Version: 5.15.1 Kernel Version: 5.9.1-1.g8abc535-default
I meant the recent *Fedora* kernel. They must ship the newer version than the tad old 5.6.y. And I don't know yet what you exactly tested with Fedora kernel...
OS Type: 64-bit Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics Memory: 29.3 GiB of RAM Graphics Processor: AMD RAVEN
kernel-stable freezes too:
Do you mean the freeze at boot, or freeze after resume?
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c69
--- Comment #69 from Karl Mistelberger
(In reply to Karl Mistelberger from comment #67)
(In reply to Takashi Iwai from comment #66)
(In reply to Karl Mistelberger from comment #65)
Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
Did you test the kernel with your current openSUSE system? Also what about more recent kernels?
Operating System: openSUSE Tumbleweed 20201014 KDE Plasma Version: 5.20.0 KDE Frameworks Version: 5.75.0 Qt Version: 5.15.1 Kernel Version: 5.9.1-1.g8abc535-default
I meant the recent *Fedora* kernel. They must ship the newer version than the tad old 5.6.y. And I don't know yet what you exactly tested with Fedora kernel...
I tested suspend/resume with Fedora, Manjaro and openSUSE here:

3400G:~ # inxi -SMCG
System:    Host: 3400G Kernel: 5.9.1-1.g8abc535-default x86_64 bits: 64 Console: tty 2
           Distro: openSUSE Tumbleweed 20201014
Machine:   Type: Desktop Mobo: Gigabyte model: B450 AORUS ELITE v: x.x serial: N/A
           UEFI: American Megatrends v: F51 date: 12/18/2019
CPU:       Topology: Quad Core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64 type: MT MCP
           L2 cache: 2048 KiB
           Speed: 1291 MHz min/max: 1400/3700 MHz
           Core speeds (MHz): 1: 1361 2: 1328 3: 1300 4: 1309 5: 1258 6: 1368 7: 1302 8: 1342
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel
           Display: server: X.Org 1.20.9 driver: amdgpu FAILED: ati unloaded: fbdev,modesetting,vesa
           resolution: 1920x1080~60Hz
           OpenGL: renderer: AMD RAVEN (DRM 3.39.0 5.9.1-1.g8abc535-default LLVM 10.0.1) v: 4.6 Mesa 20.1.8
3400G:~ #
Do you mean the freeze at boot, or freeze after resume?
Freeze after suspend/resume with openSUSE. No issues with Fedora and Manjaro. I will try to test a newer Fedora kernel. Tested openSUSE: 5.8.14-1.2, 5.8.15-1.1.gc680e93, 5.9.1-1.1.g8abc535
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c70
--- Comment #70 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c71
--- Comment #71 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c72
--- Comment #72 from Takashi Iwai
Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend.
I'm not sure whether you can deploy the Fedora kernel package onto an openSUSE system, but it might be worth a try. Just install it via "rpm -ivh xxx.rpm --nodeps", run "mkinitrd" manually, and see whether it boots up.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c73
--- Comment #73 from Karl Mistelberger
(In reply to Karl Mistelberger from comment #71)
Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend.
I'm not sure whether you can deploy the Fedora kernel package onto openSUSE system, but it might be worth to try. Just install it via "rpm -ivh xxx.rpm --nodeps", run "mkinitrd" manually, and see whether it boots up.
I ran

curl https://repos.fedorapeople.org/repos/thl/kernel-vanilla.repo > /etc/zypp/repos.d/kernel-vanilla.repo

However I am lost with:

3400G:~ # zypper se -s kernel-vanilla
Error building the cache:
[kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201...] Valid metadata not found at specified URL
History:
 - [kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201...] Repository type can't be determined.

Warning: Skipping repository 'Linux vanilla kernels from mainline series' because of the above error.
Some of the repositories have not been refreshed because of an error.
3400G:~ # zypper lr kernel-vanilla-mainline
Alias          : kernel-vanilla-mainline
Name           : Linux vanilla kernels from mainline series
URI            : http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201...
Enabled        : Yes
GPG Check      : ( p) Yes
Priority       : 99 (default priority)
Autorefresh    : Off
Keep Packages  : Off
Type           : NONE
GPG Key URI    : https://repos.fedorapeople.org/repos/thl/RPM-GPG-KEY-knurd-kernel-vanilla
Path Prefix    :
Parent Service :
Keywords       : ---
Repo Info Path : /etc/zypp/repos.d/kernel-vanilla.repo
MD Cache Path  : /var/cache/zypp/raw/kernel-vanilla-mainline
3400G:~ #
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c74
--- Comment #74 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c75
--- Comment #75 from Karl Mistelberger
It's better not to add repo but just download the target *.rpm file and install it directly.
Here we are:

3400G:/var/cache/zypp/packages/Fedora # rpm -ivh * --nodeps
warning: kernel-5.9.1-36.vanilla.1.fc32.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 863625fa: NOKEY
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-5.9.1-36.vanilla.1.fc################################# [ 20%]
   2:kernel-modules-5.9.1-36.vanilla.1################################# [ 40%]
   3:kernel-5.9.1-36.vanilla.1.fc32   ################################# [ 60%]
   4:kernel-modules-extra-5.9.1-36.van################################# [ 80%]
   5:kernel-modules-internal-5.9.1-36.################################# [100%]
/var/tmp/rpm-tmp.Bw5jCe: line 1: /bin/kernel-install: No such file or directory
warning: %posttrans(kernel-core-5.9.1-36.vanilla.1.fc32.x86_64) scriptlet failed, exit status 127
3400G:/var/cache/zypp/packages/Fedora #
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c76
--- Comment #76 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c77
--- Comment #77 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c78
--- Comment #78 from Takashi Iwai
As the postinstall scripts fail no kernel is generated. :-(
Then try to install with rpm --noscripts option.
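Putting the suggestions from comments #72 and #78 together, the attempt could be sketched as follows. The `run` helper, the `DRY_RUN` variable and the function name are hypothetical and only for illustration; the rpm options and the manual `mkinitrd` call are the ones mentioned in this thread. Whether the resulting kernel actually boots on openSUSE is not something this sketch can guarantee.

```shell
#!/bin/sh
# Sketch: install downloaded kernel rpm files from another distribution
# without dependency checks and without running their scriptlets.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_foreign_kernel() {
    if [ "$#" -eq 0 ]; then
        echo "usage: install_foreign_kernel FILE.rpm..." >&2
        return 2
    fi
    # --nodeps skips dependency checks, --noscripts skips the failing %posttrans scriptlet.
    run rpm -ivh --nodeps --noscripts "$@"
    # The scriptlets were skipped, so the initrd has to be generated by hand.
    run mkinitrd
}
```

For example, `install_foreign_kernel /var/cache/zypp/packages/Fedora/*.rpm` prints the two commands for the files downloaded in comment #75.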
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c79
--- Comment #79 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c80
--- Comment #80 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c81
--- Comment #81 from Karl Mistelberger
Interesting, so it's likely either the kernel update to 5.9.x or the fix of kernel-firmware-amdgpu took effect.
There is the old kernel in http://download.opensuse.org/tumbleweed/repo/oss/ and new firmware in http://download.opensuse.org/update/tumbleweed/

i+ | kernel-default              | package | 5.8.14-1.2   | x86_64 | Haupt-Repository (OSS)
i+ | kernel-firmware-all         | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i+ | kernel-firmware-amdgpu      | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
...
i  | kernel-firmware-usb-network | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i  | purge-kernels-service       | package | 0-7.2        | noarch | Haupt-Repository (OSS)
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
Zbigniew Luszpinski
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c84
Karl Mistelberger
I guess it's kernel-firmware workaround, but hey, who knows :)
Anyway, let's assume that it'll keep working and close now. Feel free to reopen if you encounter the same problem again. Thanks.
Changed the monitor and the freeze upon suspend/resume is back:

3400G:~ # hwinfo --monitor
35: None 00.0: 10002 LCD Monitor
  [Created at monitor.125]
  Unique ID: rdCR.K1i5gxVmsEC
  Parent ID: GBI1.Tt0a+NI8vi1
  Hardware Class: monitor
  Model: "SAMSUNG LU28R55"
  Vendor: SAM "SAMSUNG"
  Device: eisa 0x1017 "LU28R55"
  Serial ID: "H4ZN302578"
  Resolution: 720x400@70Hz
  Resolution: 640x480@60Hz
  Resolution: 640x480@67Hz
  Resolution: 640x480@72Hz
  Resolution: 640x480@75Hz
  Resolution: 800x600@56Hz
  Resolution: 800x600@60Hz
  Resolution: 800x600@72Hz
  Resolution: 800x600@75Hz
  Resolution: 832x624@75Hz
  Resolution: 1024x768@60Hz
  Resolution: 1024x768@70Hz
  Resolution: 1024x768@75Hz
  Resolution: 1280x1024@75Hz
  Resolution: 1152x864@75Hz
  Resolution: 1280x720@60Hz
  Resolution: 1280x1024@60Hz
  Resolution: 3840x2160@60Hz
  Size: 632x360 mm
  Year of Manufacture: 2038
  Week of Manufacture: 50
  Detailed Timings #0:
    Resolution: 3840x2160
    Horizontal: 3840 4016 4104 4400 (+176 +264 +560) +hsync
    Vertical: 2160 2168 2178 2250 (+8 +18 +90) +vsync
    Frequencies: 594.00 MHz, 135.00 kHz, 60.00 Hz
  Driver Info #0:
    Max. Resolution: 3840x2160
    Vert. Sync Range: 50-75 Hz
    Hor. Sync Range: 30-135 kHz
    Bandwidth: 594 MHz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #12 (VGA compatible controller)
3400G:~ # journalctl -b -3 --grep amdgpu -o short-monotonic -p err
-- Logs begin at Wed 2020-10-21 16:58:25 CEST, end at Thu 2020-10-29 16:00:08 CET. --
[  274.870164] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
[  377.490546] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
3400G:~ #
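The two numbers in parentheses are negated Linux errno values: 22 is EINVAL and 110 is ETIMEDOUT. A small filter can pull ring name and error code out of such journal lines. This is a minimal sketch; the function name `decode_ib_failures` is made up for illustration, and it only knows the two codes seen in this bug.

```shell
#!/bin/sh
# Sketch: extract ring name and errno from "IB test failed" kernel messages on stdin.
decode_ib_failures() {
    # Keep only matching lines and reduce them to "<ring> <code>".
    sed -n 's/.*IB test failed on \([a-z0-9]*\) (-\([0-9]*\)).*/\1 \2/p' |
    while read -r ring code; do
        case "$code" in
            22)  name="EINVAL (invalid argument)" ;;
            110) name="ETIMEDOUT (timed out)" ;;
            *)   name="unknown" ;;
        esac
        printf 'ring=%s errno=%s %s\n' "$ring" "$code" "$name"
    done
}
```

It can be fed from the same query as above, e.g. `journalctl -b -3 --grep amdgpu -o short-monotonic -p err | decode_ib_failures`.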
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c85
--- Comment #85 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c86
--- Comment #86 from Karl Mistelberger
OK, then please report and track the bug on the upstream bug tracker, e.g. the gitlab.freedesktop.org issues. The package bug must have been fixed, so the rest is purely a driver or firmware bug, which we can't help much with from the distro side.
I did so: https://gitlab.freedesktop.org/drm/amd/-/issues/1354

However this morning I found suspend/resume doesn't work anymore with the old monitor. It worked with firmware in http://download.opensuse.org/update/tumbleweed/, but that's now gone and http://download.opensuse.org/tumbleweed/repo/oss/ is now used:

3400G:~ # zypper se -is kernel-firmware-amdgpu
Loading repository data...
Reading installed packages...

S  | Name                   | Type    | Version      | Arch   | Repository
---+------------------------+---------+--------------+--------+--------------------------------
i+ | kernel-firmware-amdgpu | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
3400G:~ #
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c87
--- Comment #87 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c89
Nikolai Nikolaevskii
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c90
--- Comment #90 from Karl Mistelberger
Using AMD Ryzen 3 3200G. In spring 2020 I was using Leap 15.1. To use the built-in graphics I needed a kernel newer than the 4.12 from Leap 15.1, so I used kernels from the kernel:stable repo. With Leap 15.1 + kernel 5.5.x suspend to RAM worked OK. With Leap 15.1 + kernel 5.6.x suspend to RAM stopped working. Then I used kernel 5.3 for Leap 15.1 from the Leap 15.2 developers repo to get suspend to RAM working. Now suspend to RAM is working OK with Leap 15.2 and the standard 5.3 kernel.
I tried Leap 5.3.18-lp152.66-default and still get the following messages on suspend to RAM:

Mar 13 09:46:17 Leap kernel: Non-boot CPUs are not disabled
Mar 13 09:46:17 Leap kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
Mar 13 09:46:17 Leap kernel: [drm:process_one_work] *ERROR* ib ring test failed (-22).

So I am wondering what your exact kernel version is. Mine are:

i+ | kernel-default      | package | 5.3.18-lp152.66.2 | x86_64 | Hauptaktualisierungs-Repository
i+ | kernel-default      | package | 5.3.18-lp152.63.1 | x86_64 | Hauptaktualisierungs-Repository
i+ | kernel-firmware-all | package | 20201120-35.1     | noarch | (System Packages)
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c91
--- Comment #91 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c92
--- Comment #92 from Karl Mistelberger
If you see the problem with the latest TW kernel and with the latest kernel-firmware-amdgpu package, you should report the problem to upstream and resolve the bug there at first.
I reported the bug here: https://gitlab.freedesktop.org/drm/amd/-/issues/1354 But I am still waiting for a response. Any idea how to proceed?
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c93
--- Comment #93 from Nikolai Nikolaevskii
It's hard to say. We used to have a workaround (keeping the old firmware file) for Vega10 in Leap 15.1 and Leap 15.2, but this was dropped in TW (hence also Kernel:HEAD and Kernel:stable) as well as Leap 15.3.
So, if you have the kernel-firmware-all package on your system (not kernel-firmware), it means you have the latest firmware from TW/Kernel:HEAD, and the workaround in the firmware is gone. And, IIRC, this problem depends on the hardware setup such as the backlight level, so upstream couldn't reproduce the issue.
If you see the problem with the latest TW kernel and with the latest kernel-firmware-amdgpu package, you should report the problem to upstream and resolve the bug there at first.
We can get firmware files from amdgpu-pro drivers, package "RPMS/noarch/amdgpu-dkms-firmware*".
The latest 20.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-5...
20.40: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-4...
20.10: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-1...
19.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux
Change last numbers to get another version.
What files to use? vega10*.bin or vega12*.bin or vega20*.bin or vegam*.bin? Ryzen 3200G has Vega 8, Ryzen 3400G has Vega 11 (Radeon™ RX Vega 11 Graphics).
To OP (Karl Mistelberger): try to use firmware from amdgpu-pro-20.10.
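On the question of which files matter, one way to narrow it down is to ask the installed amdgpu module which firmware files it declares and keep only those for the chip in question. The sketch below assumes that the names reported by inxi in this bug (Picasso, RAVEN) correspond to the file name prefixes `picasso_` and `raven_`; that mapping is an assumption and is not confirmed anywhere in this thread. The function name `filter_firmware` is made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only the firmware paths on stdin that belong to the given chip names.
# Input is expected in the form printed by "modinfo -F firmware amdgpu",
# i.e. one path such as amdgpu/<chip>_<block>.bin per line.
filter_firmware() {
    # Build an alternation like "picasso|raven" from the arguments.
    pattern=$(printf '%s|' "$@")
    pattern="${pattern%|}"
    grep -E "^amdgpu/(${pattern})_" | sort
}
```

A possible call is `modinfo -F firmware amdgpu | filter_firmware picasso raven`; the resulting list can then be compared against the files shipped in the amdgpu-dkms-firmware package.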
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c94
--- Comment #94 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c95
Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c96
--- Comment #96 from Karl Mistelberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c97
--- Comment #97 from Nikolai Nikolaevskii
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c98
--- Comment #98 from Nikolai Nikolaevskii
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c99
--- Comment #99 from Nikolai Nikolaevskii
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c100
--- Comment #100 from Karl Mistelberger
ILL OP solved his problem by changing motherboard from Gigabyte B450 Aorus Elite to Asus PRIME B450-PLUS. My Asus X570 + Picasso AMD Ryzen 3200G suspends to RAM OK.
Possible reasons: 1. EFI firmware. 2. Problems with LED subsystem.
The ASUSTeK model: PRIME B450-PLUS suspends/resumes flawlessly since moving from Gigabyte B450 Aorus. Spurious crashes of GPU with IO_PAGE_FAULTs observed.
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
Felipe Martinez
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428
http://bugzilla.opensuse.org/show_bug.cgi?id=1177428#c101
--- Comment #101 from Felipe Martinez