Mailinglist Archive: opensuse-bugs (6221 mails)

[Bug 1084767] 3D & games produce periodic GPU crashes (Radeon R7 370)
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Mon, 09 Apr 2018 19:47:50 +0000
  • Message-id: <bug-1084767-21960-iCqsbAFtb2@http.bugzilla.opensuse.org/>
http://bugzilla.opensuse.org/show_bug.cgi?id=1084767
http://bugzilla.opensuse.org/show_bug.cgi?id=1084767#c19

--- Comment #19 from Mircea Kitsune <sonichedgehog_hyperblast00@xxxxxxxxx> ---
(In reply to Max Staudt from comment #18)

The possibility of a hardware failure was ruled out by several tests:

- I've run the card at reduced GPU / VRAM clock frequencies (DPM disabled).
Underclocking does not affect the behavior of the issue in any way. When a
hardware issue is at play, people always report the frequency of the crashes
changing in proportion to the clock rates.

- I constantly monitor the temperature of the system. The highest I ever caught
the GPU at is 67°C, but typically it only reaches 64°C. The freezes are also
not influenced by the temperature in my room: the effect is the same whether
it's 23°C or 27°C in here.

- The freezes are only caused by 3D rendering, regardless of the complexity of
the scene (even simple scenes may cause them). If the VRAM or GPU were broken,
the same freezes would be caused by many other programs which put equal or more
strain on resources (GTK / Qt interfaces, desktop compositing, etc).

That being said, I've just gathered new evidence that the driver settings are
involved in this issue. Today I tested the last amdgpu parameters on the list,
and seem to have found a set that greatly mitigates the problem. Those
parameters have given me up to 144 minutes before experiencing the freeze, a
huge improvement over the previous 90 minutes! They are:

amdgpu.prim_buf_per_se=16
amdgpu.pos_buf_per_se=16
amdgpu.cntl_sb_buf_per_se=16
amdgpu.param_buf_per_se=16

By default, all 4 of those settings are set to 0 by the system. Setting them to
16 has, at least during one test case, reduced the problem to 1/5 of its
previous frequency. The descriptions of the variables are:

parm: prim_buf_per_se:the size of Primitive Buffer per Shader Engine (default depending on gfx) (int)
parm: pos_buf_per_se:the size of Position Buffer per Shader Engine (default depending on gfx) (int)
parm: cntl_sb_buf_per_se:the size of Control Sideband per Shader Engine (default depending on gfx) (int)
parm: param_buf_per_se:the size of Off-Chip Pramater Cache per Shader Engine (default depending on gfx) (int)
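
For anyone who wants to confirm which values the driver is actually running
with, here is a minimal sketch in Python. It assumes the loaded amdgpu module
exposes these four parameters as readable files under
/sys/module/amdgpu/parameters, which may not hold on every kernel. The helper
names read_buffer_params and as_cmdline are made up for illustration and are
not part of any existing tool.

```python
from pathlib import Path

# Parameter names taken from the list in this comment.
BUFFER_PARAMS = (
    "prim_buf_per_se",
    "pos_buf_per_se",
    "cntl_sb_buf_per_se",
    "param_buf_per_se",
)

# Assumption: the loaded amdgpu module exposes its parameters in this directory.
DEFAULT_PARAM_DIR = Path("/sys/module/amdgpu/parameters")


def read_buffer_params(param_dir=DEFAULT_PARAM_DIR):
    """Return {name: int value}, with None where a file is missing or unreadable."""
    values = {}
    for name in BUFFER_PARAMS:
        try:
            values[name] = int((Path(param_dir) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def as_cmdline(values):
    """Format the readable values as kernel command-line options."""
    return " ".join(
        f"amdgpu.{name}={value}"
        for name, value in values.items()
        if value is not None
    )


if __name__ == "__main__":
    current = read_buffer_params()
    for name, value in current.items():
        print(f"{name}: {'unavailable' if value is None else value}")
    print(as_cmdline(current))
```

Running it before and after changing the boot options should show whether the
new values were really picked up by the driver.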

I've obviously let the guys at freedesktop.org know about this as well, but
unfortunately that tracker seems to be very inactive. I'd appreciate it if
someone could at least check what those parameters do and let me know why they
had such a fundamental effect on the issue earlier.

If it's later agreed that those or other driver parameters are at fault, I
may reopen this or start a new issue to suggest changing their defaults, if
that is considered okay. If not, I will continue this in the upstream issue as
you suggested. I'd like to ask the openSUSE devs to please follow it if you
believe this can be customized or prioritized by the OS:

https://bugs.freedesktop.org/show_bug.cgi?id=105425
