[Bug 1234320] New: Integrated AMD GPU randomly resets and crashes desktop
https://bugzilla.suse.com/show_bug.cgi?id=1234320

            Bug ID: 1234320
           Summary: Integrated AMD GPU randomly resets and crashes desktop
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: vortex@z-ray.de
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Hey there,

I have had this issue ever since I got this GPU. It is the integrated AMD GPU of the Ryzen 7 7800X3D. Every now and then the GPU randomly resets, causing a desktop freeze followed by a crash, and I find myself back at the GNOME login screen with all unsaved work lost.

Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=20380, emitted seq=20382
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Process information: process Xwayland pid 3428 thread Xwayland:cs0 pid 3429
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State Completed
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: MODE2 reset
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset succeeded, trying to resume
Dez 09 14:05:55 makron kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Dez 09 14:05:55 makron kernel: [drm] VRAM is lost due to GPU reset!
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: PSP is resuming...
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resuming...
Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resumed successfully!
Dez 09 14:05:55 makron kernel: [drm] DMUB hardware initialized: version=0x05001C00
Dez 09 14:05:56 makron kernel: [drm] kiq ring mec 2 pipe 1 q 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow start
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow done
Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset(2) succeeded!
Dez 09 14:05:56 makron kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Dez 09 14:05:56 makron gnome-shell[3428]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.

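For anyone triaging the attached journals, the two key lines of the excerpt above (the ring timeout and the process information that follows it) can be pulled out with a short script. This is only a minimal sketch and not part of the original report: the function name `find_ring_timeouts` is hypothetical, and the regular expressions assume the exact message wording seen in this log, which may differ on other kernel versions.

```python
import re

# Patterns based solely on the kernel messages quoted in this report
# (an assumption: other kernel versions may word these differently).
TIMEOUT_RE = re.compile(
    r"amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(
    r"amdgpu: Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def find_ring_timeouts(lines):
    """Return one dict per ring timeout, with the reported process if logged."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append({
                "ring": m["ring"],
                "signaled": int(m["signaled"]),
                "emitted": int(m["emitted"]),
                "process": None,
                "pid": None,
            })
            continue
        # Attach the first "Process information" line after a timeout to it.
        m = PROCESS_RE.search(line)
        if m and events and events[-1]["process"] is None:
            events[-1]["process"] = m["process"]
            events[-1]["pid"] = int(m["pid"])
    return events
```

One way to use it, assuming the journal has been saved to a text file first, is to pass the open file object to `find_ring_timeouts` and print the returned list. For the excerpt above it reports a single timeout on ring gfx_0.0.0 attributed to process Xwayland.
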
The GPU itself is called "AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)", at least according to the Vulkan info from radv. At this point I am not sure whether this is a general amdgpu driver bug that should be reported to the upstream kernel instead. I attached my full system log, covering the time from the last boot up to the crash.

Additionally, I'd like to state that I run a dual-GPU system with an NVIDIA GPU as the secondary GPU. If I plug all my displays into the NVIDIA GPU, so that it drives the whole desktop, none of this happens. On other AMD GPUs running Aeon with a recent kernel I have not observed this issue: one is a Radeon RX 7700 XT, the other the Steam Deck APU. But I would really like to make use of both GPUs for better power efficiency, even though this is a desktop PC.

Kind regards, V.

-- 
You are receiving this mail because:
You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1234320
https://bugzilla.suse.com/show_bug.cgi?id=1234320#c1

--- Comment #1 from Imo Hester <vortex@z-ray.de> ---
Created attachment 879029
  --> https://bugzilla.suse.com/attachment.cgi?id=879029&action=edit
journalctl since boot with amdgpu crash

https://bugzilla.suse.com/show_bug.cgi?id=1234320
https://bugzilla.suse.com/show_bug.cgi?id=1234320#c3

--- Comment #3 from Imo Hester <vortex@z-ray.de> ---
Hey there,

I was not successful in installing and running 6.12 (I somehow bricked Aeon along the way and even broke the rollback feature, ending up in an endless boot - crash - reboot cycle, probably caused by the Aeon health checker, and had to re-install the system). However, I found a "workaround" for the issue. Here are my findings:

1) I noticed the desktop crashes were far more frequent and immediate when running Steam via a Tumbleweed distrobox container. I barely managed to launch Steam without the desktop crashing at least once, or freezing up entirely so that only a hard reset was possible.

2) Later I pinpointed the freezes to XWayland: the desktop was stable when I ran GNOME in X11 mode instead of Wayland, so that no X11 client runs via XWayland. Other XWayland clients (e.g. Discord, VS Code) do not seem to be as problematic, but Steam really seems to do something unusual that causes the GPU to crash.

3) Most interesting, though: with Steam running as a Flatpak application the crashes were less frequent. There were still crashes, but not as immediate as with Steam via distrobox.

I am also starting to believe the issue lies somewhere in Mesa. Neither Steam nor the kernel had any noticeable updates in the few days before I reported the issue, but Mesa did, and the overall stability decreased, based on my subjective observations. Also, the Mesa inside the Tumbleweed distrobox container is a little more recent than the Mesa in the Flatpak runtime for Steam, which might be an indicator as well. Or is it some weird race condition between the amdgpu driver, Mesa, XWayland and whatever Steam does?

I'd like to add another crash log where the GPU did not successfully recover from the crash and I had to hard reset my system.

https://bugzilla.suse.com/show_bug.cgi?id=1234320
https://bugzilla.suse.com/show_bug.cgi?id=1234320#c4

--- Comment #4 from Imo Hester <vortex@z-ray.de> ---
Created attachment 879294
  --> https://bugzilla.suse.com/attachment.cgi?id=879294&action=edit
amdgpu crash without successful recover

https://bugzilla.suse.com/show_bug.cgi?id=1234320
https://bugzilla.suse.com/show_bug.cgi?id=1234320#c5

--- Comment #5 from Imo Hester <vortex@z-ray.de> ---
Never mind, it just crashed on X11 as well ...

https://bugzilla.suse.com/show_bug.cgi?id=1234320
https://bugzilla.suse.com/show_bug.cgi?id=1234320#c6

Imo Hester <vortex@z-ray.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(vortex@z-ray.de)  |

--- Comment #6 from Imo Hester <vortex@z-ray.de> ---
Created attachment 879299
  --> https://bugzilla.suse.com/attachment.cgi?id=879299&action=edit
AMDGPU crash with Kernel 6.12.6

As kernel 6.12.6 is now in the repos, I re-tested, and the GPU still crashes if one opens "too many" applications at the same time. I opened Discord, FluffyChat, Telegram and Steam in one go (probably observable from the logs) and the iGPU crashed again.