[Bug 1046962] New: radeonsi: 3D engines causing frequent GPU lockups
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962

            Bug ID: 1046962
           Summary: radeonsi: 3D engines causing frequent GPU lockups
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: X.Org
          Assignee: xorg-maintainer-bugs@forge.provo.novell.com
          Reporter: sonichedgehog_hyperblast00@yahoo.com
        QA Contact: xorg-maintainer-bugs@forge.provo.novell.com
          Found By: ---
           Blocker: ---

A GPU lockup has once again been introduced in Mesa and / or the RadeonSI driver. As is usual with this sort of thing, the image immediately freezes in place while audio stops and every form of input becomes unresponsive (including the NumLock / CapsLock keyboard LEDs), the only option being to power the machine off and back on. I started noticing this crash roughly a month ago after a distribution upgrade (openSUSE Tumbleweed).

The crash appears to be caused only by 3D rendering. It's intermittent but very frequent. It is triggered by a variety of 3D applications and game engines, and I've noticed it with at least the following ones:

- Blender 3D: When opening certain scenes in Blender and going into Weight Paint mode, the system is bound to crash within at most 5 minutes of use.
- Second Life: Native Linux viewers for Second Life also trigger this, by my estimate after somewhere between 5 and 30 minutes.
- Xonotic (Darkplaces engine): Starting a game will freeze the machine anywhere between instantly (the moment a game starts) and 30 minutes at most.
- The Dark Mod (idTech 4 engine): The same freeze occurs when playing The Dark Mod, anywhere between instantly and roughly 10 minutes at most.
- Minecraft: The native version of Minecraft can also trigger the crash, after at most 1 hour of play, especially on servers with a lot of geometry.

My OS is openSUSE Tumbleweed x64. My current Mesa version is 17.1.3. I can confirm first noticing this in 17.1.1, but I don't know whether the issue was introduced in 17.1.0 or earlier. My video card is a Radeon R7 370 (Gigabyte), Pitcairn GPU (Southern Islands), GCN 1.0, RadeonSI. Official product page: http://www.gigabyte.com/products/product-page.aspx?pid=5469

Please help to address this issue soon: I am unable to use several applications due to the risk they pose to my computer! Logs will be attached soon.

--
You are receiving this mail because:
You are on the CC list for the bug.
Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c1
--- Comment #1 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c2
--- Comment #2 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c3
--- Comment #3 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c4
--- Comment #4 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c5
--- Comment #5 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c6
--- Comment #6 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c7
--- Comment #7 from Mircea Kitsune
Stefan Dirsch
Stefan Dirsch
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c8
--- Comment #8 from Mircea Kitsune
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c9
Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c10
--- Comment #10 from Max Staudt
> Could we be talking about a VRAM memory leak?

Hmm, interesting. I don't have a radeon system at hand, but you can typically monitor memory usage through debugfs. Get a root shell and have a peek through the files in /sys/kernel/debug/dri/0/ - they're named differently for each driver, but radeon definitely has at least one that displays the amount of memory allocated to BOs (buffer objects).

Thus, you can SSH in, or run this on a second screen (as root!):

    watch -n 1 cat /sys/kernel/debug/dri/0/THE_FILE_NAME
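Because the file names differ per driver, one way to find the right one is to preview every file in the directory and pick out the one that mentions memory. A minimal sketch, assuming a POSIX shell run as root; the function name preview_dri_files is made up for illustration, and some entries may be binary or unreadable, in which case their preview is noise or empty:

```shell
#!/bin/sh
# Print the first few lines of every regular file in a directory, so the
# file that reports buffer-object memory can be picked out by eye.
# The directory argument defaults to the debugfs path mentioned above.
preview_dri_files() {
    dir="${1:-/sys/kernel/debug/dri/0}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        printf '== %s ==\n' "${f##*/}"   # header with the bare file name
        # Some debugfs entries cannot be read; ignore errors and move on.
        head -n 5 "$f" 2>/dev/null
    done
}
```

For example, `preview_dri_files /sys/kernel/debug/dri/0 | less` lets you page through the previews before settling on a file to pass to watch.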
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c11
--- Comment #11 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c12
--- Comment #12 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c13
--- Comment #13 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c14
--- Comment #14 from Mircea Kitsune
From what I remember seeing, all of them were 0-byte files at all times. Only radeon_vram and a few binary files in there seemed to have any size.

I'm also not sure how effective the test can be at this point: the SSH connection goes down the moment the system freezes, along with all other processes. As such, both the other computer and the console would only capture the last moment before the freeze. I initially assumed that the SSH connection and other background processes might stay up, but it seems the whole computer stops responding at the exact moment the freeze occurs.
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c15
--- Comment #15 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c16
--- Comment #16 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c17
--- Comment #17 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c18
--- Comment #18 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c19
--- Comment #19 from Max Staudt
> The problem is that I'm not sure which file to follow or how, so I might need more specific instructions. The little script you suggested looks like it might work, but there are over 20 files inside debugfs. Perhaps the watch command can be used to track changes in all of them?

Please look at all of the files I suggested in Comment 13 in turn, until you find something that looks like what I described in Comment 10. Then watch that file (or those files), possibly in parallel SSH sessions.

> This might, however, work if we're talking about a gradual leak, and not something that happens instantly (in less than a second) so that the system freezes before it gets recorded. I'm still assuming it's all instant, but I just realized the other option could still be the case too.

Exactly, let's hope it's a gradual thing.
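With more than 20 files to go through, one option is to copy the contents of all of them at two points in time and list only the files that changed, which narrows down the candidates worth watching for gradual growth. A sketch only, assuming a POSIX shell with cmp available; snapshot_dir and changed_files are hypothetical helper names:

```shell
#!/bin/sh
# Copy the current contents of every regular file in "src" into "dest".
snapshot_dir() {
    src="$1"; dest="$2"
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cat "$f" > "$dest/${f##*/}" 2>/dev/null
    done
}

# Print the names of files whose contents differ between two snapshots.
# A file that exists only in the newer snapshot is also reported.
changed_files() {
    old="$1"; new="$2"
    for f in "$new"/*; do
        [ -f "$f" ] || continue
        name="${f##*/}"
        cmp -s "$f" "$old/$name" || printf '%s\n' "$name"
    done
}
```

Taking one snapshot before starting a game and another a minute into it, then running changed_files on the two directories, should leave a much shorter list to follow with watch.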
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c20
--- Comment #20 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c21
--- Comment #21 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c22
--- Comment #22 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c23
--- Comment #23 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c24
--- Comment #24 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c25
--- Comment #25 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c26
--- Comment #26 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c27
--- Comment #27 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c28
--- Comment #28 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c29
--- Comment #29 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c30
--- Comment #30 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c31
--- Comment #31 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c32
--- Comment #32 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c33
--- Comment #33 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c34
--- Comment #34 from Max Staudt
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c35
--- Comment #35 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c36
--- Comment #36 from Max Staudt
> For the time being, I've decided to test whether this also happens with the RadeonSI scheduler. To make sure I'm applying it to all games across the system, I've added the following line to ~/.profile and restarted:
>
> export R600_DEBUG=sisched

Oh, I didn't know that. Thanks!

As for SuperTuxKart: I'm not sure it actually uses the functionality that breaks your system. It's just a guess since, AFAIK, it's a rather modern game using an engine that supports skeletal animations.
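Exporting the variable from ~/.profile affects every program in the session. To compare behaviour with and without the scheduler, it can also be set for a single process. A small sketch; run_with_sisched is a made-up helper name, and only the R600_DEBUG=sisched value comes from the comment quoted above:

```shell
#!/bin/sh
# Run one command with R600_DEBUG=sisched set only for that process,
# leaving the environment of the calling shell untouched.
run_with_sisched() {
    env R600_DEBUG=sisched "$@"
}
```

Calling `run_with_sisched sh -c 'echo "$R600_DEBUG"'` is a quick way to confirm that the child process sees the value before launching a game the same way.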
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c37
--- Comment #37 from Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c38
Mircea Kitsune
Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c39
--- Comment #39 from Mircea Kitsune
Mircea Kitsune
http://bugzilla.opensuse.org/show_bug.cgi?id=1046962#c40
Mircea Kitsune