[Bug 1212139] New: [amdgpu]] *ERROR* ring sdma0 timeout
https://bugzilla.suse.com/show_bug.cgi?id=1212139

            Bug ID: 1212139
           Summary: [amdgpu]] *ERROR* ring sdma0 timeout
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: monkeyboyted@yahoo.com
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Created attachment 867453
  --> https://bugzilla.suse.com/attachment.cgi?id=867453&action=edit
dmesg-mohamed-mehany-steps-reproduce

Linux steamdeck.lan 6.4.0-rc5-1.g2cab33e-default #1 SMP PREEMPT_DYNAMIC Sun Jun 4 20:15:10 UTC 2023 (2cab33e) x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce:
```
sudo -i
# as root: force the GPU to its lowest DPM performance level
echo "low" > /sys/class/drm/card1/device/power_dpm_force_performance_level
exit
# reading this debugfs file triggers a GPU recovery (reset)
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
```

LSB Version:    n/a
Distributor ID: openSUSE
Description:    openSUSE Tumbleweed
Release:        20230605
Codename:       n/a

Information for package plasma5-mobile:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : plasma5-mobile
Version        : 5.27.5-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 2.3 MiB
Installed      : Yes
Status         : out-of-date (version 5.27.4-1.1 installed)
Source package : plasma5-mobile-5.27.5-1.1.src
Upstream URL   : http://www.kde.org/
Summary        : Plasma Mobile
Description    : Plasma shell and components targeted for phones.

https://gitlab.freedesktop.org/drm/amd/-/issues/2220#note_1948249

--
You are receiving this mail because:
You are the assignee for the bug.
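The interactive steps above can also be run without opening a root shell. Below is a minimal sketch of a hypothetical helper (`set_dpm_level` is not part of any existing tool) that writes a level to the same sysfs file and reads it back. The `SYSFS_ROOT` prefix is an assumption added only so the helper can be exercised against a scratch directory; leave it unset on a real system, where the helper must run as root.

```shell
#!/bin/sh
# Hypothetical helper: force the amdgpu DPM performance level for one card.
# SYSFS_ROOT is an assumption for testing only; leave it unset on real hardware.
set_dpm_level() {
    card="$1"    # DRM card index, e.g. 1 for /sys/class/drm/card1
    level="$2"   # e.g. low, high or auto
    path="${SYSFS_ROOT:-}/sys/class/drm/card${card}/device/power_dpm_force_performance_level"
    if [ ! -w "$path" ]; then
        echo "cannot write $path (not root, or wrong card index?)" >&2
        return 1
    fi
    printf '%s\n' "$level" > "$path" || return 1
    # Read the value back so the caller can confirm the driver accepted it.
    cat "$path"
}
```

For the reproduction in this report that would be `set_dpm_level 1 low` as root, followed by the `amdgpu_gpu_recover` read; `set_dpm_level 1 auto` hands clock selection back to the driver afterwards. The card index 1 matches this reporter's Steam Deck and may differ on other machines.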
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c1

--- Comment #1 from ted chang <monkeyboyted@yahoo.com> ---
Created attachment 867454
  --> https://bugzilla.suse.com/attachment.cgi?id=867454&action=edit
dmesg-freezing-while-watching-videos

Firefox and Angelfish were open.
ted chang <monkeyboyted@yahoo.com> changed:

What    | Removed                              | Added
--------+--------------------------------------+---------------------------------------------------
Summary | [amdgpu]] *ERROR* ring sdma0 timeout | [amdgpu]] *ERROR* ring sdma0 timeout 0 - Steam deck
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c2

--- Comment #2 from ted chang <monkeyboyted@yahoo.com> ---
I wonder if my bug is similar to this one:
https://bugzilla.opensuse.org/show_bug.cgi?id=1209294
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c4

--- Comment #4 from ted chang <monkeyboyted@yahoo.com> ---
(In reply to Takashi Iwai from comment #3)
> Yes, very likely. Let's wait for the upstream resolution.

Power-management bugs are nasty to debug and fix; the fix might take years. I hope Valve gets roped into it. They might have more leverage to ask AMD to spend more engineering hours on it.
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c5

--- Comment #5 from ted chang <monkeyboyted@yahoo.com> ---
Apparently, my ina2xx controller is dead:
https://steamcommunity.com/app/1675200/discussions/1/3186864655209404156/?ct...

[ 7.430212] ina2xx i2c-PRP0001:02: supply vs not found, using dummy regulator
[ 7.441099] ina2xx i2c-PRP0001:02: error configuring the device: -121
[ 7.458895] ina2xx i2c-PRP0001:03: supply vs not found, using dummy regulator
[ 7.461193] ina2xx i2c-PRP0001:03: error configuring the device: -121
[ 7.471518] ina2xx i2c-PRP0001:04: supply vs not found, using dummy regulator
[ 7.473991] ina2xx i2c-PRP0001:04: error configuring the device: -121
[ 7.484221] thermal LNXTHERM:00: registered as thermal_zone0
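On Linux, error -121 is `-EREMOTEIO` ("Remote I/O error"); coming from an I2C device driver it typically means the transfer to the chip was not acknowledged, which is consistent with the sensors not responding. A minimal sketch for listing which ina2xx devices failed to configure, assuming kernel log lines in the format quoted above (the function name `failed_ina2xx` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each ina2xx
# device that logged "error configuring the device", followed by its errno.
failed_ina2xx() {
    sed -n 's/.*ina2xx \([^ ]*\): error configuring the device: \(-[0-9]*\).*/\1 \2/p'
}
```

Usage: `journalctl -k -b | failed_ina2xx` prints one line per failing device for the current boot, e.g. `i2c-PRP0001:02 -121`. The "supply vs not found" lines are ignored because they only report that a dummy regulator is used.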
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c6

--- Comment #6 from ted chang <monkeyboyted@yahoo.com> ---
The ina2xx controller issue is probably unrelated. I have to wait for the upstream fix.
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c7

--- Comment #7 from ted chang <monkeyboyted@yahoo.com> ---
An AMD employee reproduced the bug:
https://gitlab.freedesktop.org/drm/amd/-/issues/2220#note_2146527
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c14

--- Comment #14 from ted chang <monkeyboyted@yahoo.com> ---
Created attachment 871543
  --> https://bugzilla.suse.com/attachment.cgi?id=871543&action=edit
dmesg-sdma0

With kernel 6.6.6-1-default, recovery from the sdma0 timeout worked.

zypper info kernel-default
Loading repository data...
Reading installed packages...

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.6.6-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 238.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.6.6-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2023-12-11 09:46:39 +0000
    GIT Revision: a946a9f9d865a849717a570675413f097b229184
    GIT Branch: stable

[13921.428572] br-489b770a8295: port 5(vethd49d2d6) entered disabled state
[13921.430516] vethd49d2d6 (unregistering): left allmulticast mode
[13921.430530] vethd49d2d6 (unregistering): left promiscuous mode
[13921.430555] br-489b770a8295: port 5(vethd49d2d6) entered disabled state
[13971.699892] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=118486, emitted seq=118489
[13971.700466] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[13971.700981] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[13971.873638] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[13971.883852] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[13971.884507] [drm] PCIE GART of 1024M enabled (table at 0x000000F47FC00000).
[13971.884678] [drm] PSP is resuming...
[13971.907096] [drm] reserve 0xa00000 from 0xf47e000000 for PSP TMR
[13972.572603] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[13972.573350] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[13972.583240] [drm] DMUB hardware initialized: version=0x0300000A
[13972.662705] [drm] Failed to add display topology, DTM TA is not initialized.
[13972.810113] [drm] Failed to add display topology, DTM TA is not initialized.
[13972.830527] [drm] kiq ring mec 2 pipe 1 q 0
[13972.832594] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[13972.833161] [drm] JPEG decode initialized successfully.
[13972.833168] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[13972.833192] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[13972.833197] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[13972.833200] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[13972.833204] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[13972.833208] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[13972.833211] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[13972.833215] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[13972.833219] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[13972.833223] amdgpu 0000:04:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[13972.833227] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[13972.833231] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[13972.833235] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[13972.833238] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[13972.833241] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[13972.836573] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[13972.836579] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[13972.836616] amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
[13972.836941] [drm] Skip scheduling IBs!
[13972.837328] [drm] Skip scheduling IBs!
[13972.837479] [drm] Skip scheduling IBs!
[13972.844896] [drm] Skip scheduling IBs!
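In the timeout line, `signaled seq` is the last fence sequence number the ring completed and `emitted seq` is the last one submitted to it, so their difference (118489 - 118486 = 3 here) is roughly how many submissions were still outstanding on sdma0 when the job timed out. A minimal sketch that pulls these numbers out of kernel log text, assuming the message format shown above; the function name `ring_timeouts` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and, for every
# "ring <name> timeout, signaled seq=N, emitted seq=M" line, print the ring
# name, both sequence numbers and how many fences were still pending (M - N).
ring_timeouts() {
    sed -n 's/.*ring \([^ ,]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r ring signaled emitted; do
        echo "$ring signaled=$signaled emitted=$emitted pending=$((emitted - signaled))"
    done
}
```

Usage: `journalctl -k | ring_timeouts` would print `sdma0 signaled=118486 emitted=118489 pending=3` for the log above. Lines such as "ring sdma0 uses VM inv eng 12" do not match, because the pattern requires the word "timeout" right after the ring name.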
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c15

--- Comment #15 from ted chang <monkeyboyted@yahoo.com> ---
The timeout made both the internal and the external screen blink. It happened while watching YouTube videos.
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c16

--- Comment #16 from ted chang <monkeyboyted@yahoo.com> ---
I do not think the 6.6 kernel includes the fix yet.
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c17

--- Comment #17 from ted chang <monkeyboyted@yahoo.com> ---
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: Valve
        Version: F7A0120
        Release Date: 12/01/2023
https://bugzilla.suse.com/show_bug.cgi?id=1212139#c18

--- Comment #18 from ted chang <monkeyboyted@yahoo.com> ---
Created attachment 871544
  --> https://bugzilla.suse.com/attachment.cgi?id=871544&action=edit
journalctl-sdma0-timeout-recovery

Information for package kernel-firmware-amdgpu:
-----------------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-firmware-amdgpu
Version        : 20231214-1.1
Arch           : noarch
Vendor         : openSUSE
Installed Size : 22.4 MiB
Installed      : Yes (automatically)
Status         : up-to-date
Source package : kernel-firmware-20231214-1.1.src
Upstream URL   : https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/
Summary        : Kernel firmware files for AMDGPU graphics driver
Description    :
    This package contains compressed kernel firmware files for AMDGPU graphics driver.