Bug ID | 1234320 |
---|---|
Summary | Integrated AMD GPU randomly resets and crashes desktop |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-bugs@opensuse.org |
Reporter | vortex@z-ray.de |
QA Contact | qa-bugs@suse.de |
Target Milestone | --- |
Found By | --- |
Blocker | --- |
Hey there, I have had this issue ever since I got this GPU. It is the integrated
AMD GPU of the Ryzen 7 7800X3D.
Every now and then the GPU randomly resets, causing a desktop freeze followed by
a crash, and I find myself back at the GNOME login screen with all unsaved
work lost.
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=20380, emitted seq=20382
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Process information: process Xwayland pid 3428 thread Xwayland:cs0 pid 3429
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State Completed
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: MODE2 reset
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset succeeded, trying to resume
> Dez 09 14:05:55 makron kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
> Dez 09 14:05:55 makron kernel: [drm] VRAM is lost due to GPU reset!
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: PSP is resuming...
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resuming...
> Dez 09 14:05:55 makron kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resumed successfully!
> Dez 09 14:05:55 makron kernel: [drm] DMUB hardware initialized: version=0x05001C00
> Dez 09 14:05:56 makron kernel: [drm] kiq ring mec 2 pipe 1 q 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow start
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow done
> Dez 09 14:05:56 makron kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset(2) succeeded!
> Dez 09 14:05:56 makron kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> Dez 09 14:05:56 makron gnome-shell[3428]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
The GPU itself is called "AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)", at
least according to the Vulkan info from RADV.
At this point I am not sure whether this is a general amdgpu driver bug that
should rather be reported to the upstream kernel.
I have attached my full system log from the boot in which the crash happened.
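To pick the ring timeouts out of a long journal export, a small script along these lines may help. It is a minimal sketch, assuming the log is a plain-text export (for example from `journalctl -k -b`) and that the message format matches the lines quoted above; the function name `find_ring_timeouts` is my own. In amdgpu's timeout message, "emitted seq" should be the last fence submitted to the ring and "signaled seq" the last one that completed, so their difference counts the jobs still outstanding when the timeout fired (here 20382 - 20380 = 2).

```python
import re
import sys

# Matches the amdgpu ring-timeout message as it appears in the log above.
TIMEOUT_RE = re.compile(
    r"amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return (ring, signaled, emitted, pending) for every ring timeout line."""
    results = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            signaled = int(m.group("signaled"))
            emitted = int(m.group("emitted"))
            # Fences emitted to the ring but not yet signaled at timeout time.
            results.append((m.group("ring"), signaled, emitted, emitted - signaled))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_timeouts.py kernel.log  (hypothetical file names)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ring, signaled, emitted, pending in find_ring_timeouts(f):
            print(f"{ring}: signaled={signaled} emitted={emitted} pending={pending}")
```

Each hit can then be matched against the "Process information" line that follows it to see which client (here Xwayland) had work on the hung ring.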
Additionally, I should mention that I run a dual-GPU system with an NVIDIA GPU
as the secondary GPU.
If I plug all my displays into the NVIDIA GPU, so that it drives the whole
desktop, none of this happens.
I have not observed this issue on other AMD GPUs running Aeon with a recent
kernel, namely a Radeon RX 7700 XT and the Steam Deck APU.
But I would really like to make use of both GPUs for better power efficiency,
even though this is a desktop PC.
Kind regards,
V.