[Bug 1212169] New: drm:amdgpu_job_timeout

https://bugzilla.suse.com/show_bug.cgi?id=1212169

            Bug ID: 1212169
           Summary: drm:amdgpu_job_timeout
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: azouhr@opensuse.org
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Created attachment 867480
  --> https://bugzilla.suse.com/attachment.cgi?id=867480&action=edit
output of hwinfo

After updating Tumbleweed from kernel 6.3.2 to 6.3.4, I encounter sporadic display hangs:

[11488.681228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3214852, emitted seq=3214854
[11488.681693] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 3679 thread firefox:cs0 pid 3746
[11488.682121] amdgpu 0000:07:00.0: amdgpu: GPU reset begin!

From what I read, KDE would have to be restarted to work again; however, the original issue has probably been seen before.

-- 
You are receiving this mail because:
You are the assignee for the bug.
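For anyone collecting several of these reports, the timeout line quoted above can be picked apart mechanically. The following is only a minimal sketch in Python: the regular expression is written against the single log line shown in this report, and the names `TIMEOUT_RE`, `parse_timeout` and the `pending` field are made up for illustration. Reading `emitted - signaled` as the number of submitted jobs that had not completed is an assumption, not something the report states.

```python
import re

# Pattern modelled on the one "ring ... timeout" line quoted in the report;
# other kernel versions may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_timeout(line):
    """Return the fields of an amdgpu ring timeout line, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return {
        "timestamp": float(m.group("ts")),
        "ring": m.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Hypothetical helper field: sequence numbers emitted but not signaled.
        "pending": emitted - signaled,
    }


# The line from the report, copied verbatim.
sample = (
    "[11488.681228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 "
    "timeout, signaled seq=3214852, emitted seq=3214854"
)

if __name__ == "__main__":
    print(parse_timeout(sample))
```

Feeding it the output of `dmesg` line by line and keeping the non-None results would give a list of timeouts per ring, which may help when comparing kernels.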

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c2

--- Comment #2 from Berthold Gunreben <azouhr@opensuse.org> ---
This issue is kind of hard to reproduce. Yesterday, I ran into it again; it seems to be tracked here:

https://gitlab.freedesktop.org/drm/amd/-/issues/1974

Notably, I only run into the issue when the screensaver is on, which points to a power management issue with the graphics card. I have never encountered it with screen saving manually disabled.

The kernel right now is 6.4.2-1-default.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c3

--- Comment #3 from Berthold Gunreben <azouhr@opensuse.org> ---
OK ... I should not have said that: just now, the machine crashed. I could not even use ssh to access it. I'll attach the messages from the start of the GPU problem until I rebooted.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c4

--- Comment #4 from Berthold Gunreben <azouhr@opensuse.org> ---
Created attachment 868392
  --> https://bugzilla.suse.com/attachment.cgi?id=868392&action=edit
messages from crash to hard reboot

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c7

Berthold Gunreben <azouhr@opensuse.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(azouhr@opensuse.o |
                   |rg)                         |

--- Comment #7 from Berthold Gunreben <azouhr@opensuse.org> ---
Just a heads-up: since the upgrade of the kernel and kernel-firmware packages, I have not had any crashes. Obviously I don't know whether this is by chance or whether the issue is fixed. I now run 6.5.0-rc5-2.g997a7e4-default and will continue with Kernel:stable for the time being.

One thing to notice (for what it's worth ... I don't know): nvtop now displays much lower utilization of the GPUs. Also, the usage looks more balanced, and the external GPU is used even when the internal one is not fully loaded. Seems to be an improvement anyway.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c8

--- Comment #8 from Berthold Gunreben <azouhr@opensuse.org> ---
Created attachment 868767
  --> https://bugzilla.suse.com/attachment.cgi?id=868767&action=edit
new occurance with current kernel and firmware

The issue just got me again, this time with current kernel 6.5.0-rc5-2.g997a7e4-default and current kernel-firmware-radeon-20230731-444.1.noarch

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c10

Berthold Gunreben <azouhr@opensuse.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(tiwai@suse.com)

--- Comment #10 from Berthold Gunreben <azouhr@opensuse.org> ---
Just a question: Is there a way to know what firmware has actually been loaded? The firmware has been installed, but I would like to double check that it is actually used.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c12

--- Comment #12 from Berthold Gunreben <azouhr@opensuse.org> ---
Created attachment 868840
  --> https://bugzilla.suse.com/attachment.cgi?id=868840&action=edit
boot.msg with dyndbg=+p

Looks good to me; however, I am adding the boot.msg taken with firmware_class.dyndbg=+p to document this and make sure everything is all right.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c13

B <reiokorn@tutanota.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |reiokorn@tutanota.com

--- Comment #13 from B <reiokorn@tutanota.com> ---
Hello, I just wanted to mention that I also get this type of error and it causes my system to freeze. I need to restart with [Alt]+[SysRq]+REISUB.

openSUSE Tumbleweed VERSION="20230821"
6.4.11-1-default
Mesa 23.1.5

Dmesg log (full log): https://paste.opensuse.org/pastes/321366d5a4e7
Journalctl amdgpu: https://paste.opensuse.org/pastes/05e50981b729

Reported upstream...
https://gitlab.freedesktop.org/drm/amd/-/issues/2801

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c14

--- Comment #14 from Berthold Gunreben <azouhr@opensuse.org> ---
Just wanted to mention, last night I had this crash with 6.5.0-rc7-1.g869afb7-default. I just updated to 6.5.0-7.gb5edcad-default and kernel-firmware-20230829-448.1 from Kernel:HEAD. I will mention here if I get a crash again.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c18

--- Comment #18 from Berthold Gunreben <azouhr@opensuse.org> ---
Created attachment 869434
  --> https://bugzilla.suse.com/attachment.cgi?id=869434&action=edit
dmesg with kernel 6.5.2-1.gfdde566-default

I spoke too early (again). I just had another crash related to this bug. I am now in the process of updating to the latest kernel, but 6.5.2 did not really fix the issue.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c20

--- Comment #20 from Berthold Gunreben <azouhr@opensuse.org> ---
OK, 6.6.0-rc1-2.g45a1ae6-default also shows the issue. I can add the dmesg output if you like, but I don't think there is anything new in it.

https://bugzilla.suse.com/show_bug.cgi?id=1212169#c22

--- Comment #22 from Berthold Gunreben <azouhr@opensuse.org> ---
I wanted to mention that the issue has not occurred for quite a few weeks now. Therefore, I think it might be solved in newer kernels. I am currently running kernel 6.7.0-rc1, but the one before it also did not show the issue.