On 08/12/2021 19:58, Patrik Jakobsson wrote:
On Wed, 2021-12-08 at 19:37 +0100, Frans de Boer wrote:
 LS,
 
 Recently I bought a Radeon RX 6600 XT card from Gigabyte, but I noticed quite
quickly that my system has issues. 95% of the time when I try to wake the
system (only the screen is switched off after some time, there is no system
suspend), the input and display freeze.
 I noticed that the drive LED is still working, so I assume that the rest of
the system is still working.
 
 Thinking that the PSU might not be up to its task with this new card, I
upgraded that too. Things seemed to improve, but I still get these freezes
from time to time.
 Below is a snippet from the log file around the time the system freezes. The
GPU seems to have issues from which the software does not seem to recover:
Hi Frans,

Is this on Tumbleweed or Leap? It's easier to track this if you file a bug
report at bugzilla.opensuse.org and add me to CC.

Thanks
Patrik

 ----------------------------->
 Dec  8 18:51:59 pws1 kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
 Dec  8 18:52:02 pws1 kernel: snd_hda_intel 0000:03:00.1: refused to change power state from D3hot to D0
 Dec  8 18:52:02 pws1 kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
 Dec  8 18:52:02 pws1 kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
 Dec  8 18:52:02 pws1 kernel: snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -5
 Dec  8 18:52:02 pws1 rtkit-daemon[6110]: Supervising 7 threads of 4 processes of 1 users.
 Dec  8 18:52:02 pws1 rtkit-daemon[6110]: Successfully made thread 23363 of process 6103 owned by 'frans' RT at priority 5.
 Dec  8 18:52:02 pws1 rtkit-daemon[6110]: Supervising 8 threads of 4 processes of 1 users.
 Dec  8 18:52:05 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
 Dec  8 18:52:08 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
 Dec  8 18:52:08 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
 Dec  8 18:52:12 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
 Dec  8 18:52:12 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
 Dec  8 18:52:12 pws1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2453204, emitted seq=2453206
 Dec  8 18:52:12 pws1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg.bin pid 1860 thread Xorg.bin:cs0 pid 1882
 Dec  8 18:52:12 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
 Dec  8 18:52:12 pws1 kernel: clocksource: Switched to clocksource acpi_pm
 Dec  8 18:52:12 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:52:16 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:52:15 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
 Dec  8 18:52:15 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
 Dec  8 18:52:20 pws1 kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
 Dec  8 18:52:29 pws1 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
 Dec  8 18:52:29 pws1 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
 Dec  8 18:52:29 pws1 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
 Dec  8 18:52:29 pws1 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
 Dec  8 18:52:29 pws1 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
 Dec  8 18:52:33 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
 Dec  8 18:52:33 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
 Dec  8 18:52:33 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
 Dec  8 18:52:33 pws1 kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
 Dec  8 18:52:33 pws1 kernel: [drm] free PSP TMR buffer
 Dec  8 18:52:34 pws1 kernel: [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x80000306)
 Dec  8 18:52:34 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
 Dec  8 18:52:34 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
 Dec  8 18:52:34 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
 Dec  8 18:52:37 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
 Dec  8 18:52:37 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
 Dec  8 18:52:37 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
 Dec  8 18:52:48 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
 Dec  8 18:52:48 pws1 kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080005A4000).
 Dec  8 18:52:48 pws1 kernel: [drm] VRAM is lost due to GPU reset!
 Dec  8 18:52:48 pws1 kernel: [drm] PSP is resuming...
 Dec  8 18:52:49 pws1 kernel: [drm] failed to load ucode SMC(0x18)
 Dec  8 18:52:49 pws1 kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x80000306)
 Dec  8 18:52:49 pws1 kernel: [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR
 Dec  8 18:52:51 pws1 kernel: [drm] psp gfx command AUTOLOAD_RLC(0x21) failed and response status is (0x0)
 Dec  8 18:52:51 pws1 kernel: [drm:psp_load_non_psp_fw [amdgpu]] *ERROR* Failed to start rlc autoload
 Dec  8 18:52:51 pws1 kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
 Dec  8 18:52:51 pws1 kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
 Dec  8 18:52:51 pws1 kernel: [drm] Skip scheduling IBs!
 Dec  8 18:52:52 pws1 kernel[1780]: Last message '[drm] Skip schedulin' repeated 3 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:52:51 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(2) failed
 Dec  8 18:52:51 pws1 kernel: [drm] Skip scheduling IBs!
 Dec  8 18:52:52 pws1 kernel[1780]: Last message '[drm] Skip schedulin' repeated 36 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:52:51 pws1 kernel: amdgpu_cs_ioctl: 22 callbacks suppressed
 Dec  8 18:52:51 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:52:52 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 5 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:52:51 pws1 kernel: snd_hda_intel 0000:03:00.1: refused to change power state from D3hot to D0
 Dec  8 18:52:51 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:52:52 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 3 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:52:52 pws1 kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
 Dec  8 18:52:52 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -22
 Dec  8 18:52:57 pws1 kernel: amdgpu_cs_ioctl: 43 callbacks suppressed
 Dec  8 18:52:57 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:02 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:02 pws1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=20241, emitted seq=20243
 Dec  8 18:53:02 pws1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=14909, emitted seq=14911
 Dec  8 18:53:02 pws1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
 Dec  8 18:53:02 pws1 kernel[1780]: Last message '[drm:amdgpu_job_time' repeated 1 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:02 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
 Dec  8 18:53:02 pws1 kernel[1780]: Last message 'amdgpu 0000:03:00.0:' repeated 1 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:02 pws1 kernel: amdgpu 0000:03:00.0: amdgpu: Bailing on TDR for s_job:4f11, as another already in progress
 Dec  8 18:53:02 pws1 kernel: amdgpu_cs_ioctl: 32 callbacks suppressed
 Dec  8 18:53:02 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:08 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:07 pws1 kernel: amdgpu_cs_ioctl: 33 callbacks suppressed
 Dec  8 18:53:07 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:13 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:12 pws1 kernel: amdgpu_cs_ioctl: 29 callbacks suppressed
 Dec  8 18:53:12 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:18 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:17 pws1 kernel: amdgpu_cs_ioctl: 31 callbacks suppressed
 Dec  8 18:53:17 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:23 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:22 pws1 kernel: amdgpu_cs_ioctl: 32 callbacks suppressed
 Dec  8 18:53:22 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:28 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:28 pws1 kernel: amdgpu_cs_ioctl: 36 callbacks suppressed
 Dec  8 18:53:28 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:33 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:33 pws1 kernel: amdgpu_cs_ioctl: 25 callbacks suppressed
 Dec  8 18:53:33 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 Dec  8 18:53:38 pws1 kernel[1780]: Last message '[drm:amdgpu_cs_ioctl' repeated 9 times, suppressed by syslog-ng on pws1.fransdb.local
 Dec  8 18:53:38 pws1 kernel: amdgpu_cs_ioctl: 36 callbacks suppressed
 Dec  8 18:53:38 pws1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
 <----------------------------
 After the last line the system is totally unresponsive.
 
 Does anybody have an idea whether this is due to a driver bug, a firmware
bug, or something else?
 
 System: Phenom II X4 965, 16 GB, Gigabyte Radeon RX 6600 XT pro with 4K
screen.
 
 Regards, Frans.
-- 
A: Yes, just like that
Q: Oh, just like reading a book backwards
A: Because it upsets the natural flow of a story
Q: Why is top-posting annoying?
 
This is on Tumbleweed.
I am in the (slow) process of installing a different distribution (Manjaro) because Leap 15.2 does not support the 4K resolution for this card.
I shall also post it via bugzilla, as requested.

--- Frans.