Bug ID: 1180742
Summary: [amdgpu] An AMD Vega series GPU randomly crashes
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.2
Hardware: x86-64
OS: openSUSE Leap 15.2
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: firstname.lastname@example.org
Reporter: email@example.com
QA Contact: firstname.lastname@example.org
Found By: ---
Blocker: ---
Created attachment 844970 --> http://bugzilla.opensuse.org/attachment.cgi?id=844970&action=edit partial kernel log
The AMDGPU kernel driver randomly crashes the GPU, usually under load, on Radeon VII hardware. The GPU hang is relatively hard to hit: it usually takes 5 to 7 days before it occurs. After a hang the driver attempts to reset the GPU, but sometimes the reset fails and the system stays partly unresponsive. You can still access it over the network, and there is some reaction to keyboard events, but the display stays dead. The hang also seems to drop the PCIe link down to 1.0 mode, where it stays until reboot.
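To confirm the PCIe downgrade after a hang, the link speed can be read back from sysfs and compared with the maximum the device reports. The script below is only a rough sketch: it assumes the kernel exposes the current_link_speed and max_link_speed attributes for the device, and the PCI address is a placeholder to be replaced with the one lspci shows. The helper names are made up for illustration. On some cards the GPU sits behind an on-board PCIe bridge, so the upstream port may be the device to check instead.

```python
import re
import sys
from pathlib import Path

# Standard sysfs location for PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def parse_gts(text):
    """Return the GT/s figure from a sysfs link-speed string, or None."""
    match = re.search(r"([\d.]+)\s*GT/s", text)
    return float(match.group(1)) if match else None


def link_is_degraded(current, maximum):
    """True when both speeds parse and the current one is below the maximum."""
    cur, top = parse_gts(current), parse_gts(maximum)
    return cur is not None and top is not None and cur < top


def check_device(address):
    """Read both link-speed attributes for one PCI address (e.g. 0000:0b:00.0)."""
    dev = SYSFS_PCI / address
    current = (dev / "current_link_speed").read_text().strip()
    maximum = (dev / "max_link_speed").read_text().strip()
    return current, maximum, link_is_degraded(current, maximum)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        cur, top, degraded = check_device(sys.argv[1])
        print(f"current: {cur}  max: {top}  degraded: {degraded}")
    else:
        print("usage: linkcheck.py <pci-address>")
```

A PCIe 1.0 link runs at 2.5 GT/s, so a reading of 2.5 GT/s against a higher maximum would match the behaviour described above.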
There's an upstream bug open that may be related: https://gitlab.freedesktop.org/drm/amd/-/issues/716
That particular GPU works fine on a Windows machine.
openSUSE Leap 15.2, kernel 5.3.18-lp152.57-default #1 SMP Fri Dec 4 07:27:58 UTC 2020 (7be5551)