[Bug 805300] New: X server locks up after a random time of gaming.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c0

           Summary: X server locks up after a random time of gaming.
    Classification: openSUSE
           Product: openSUSE 12.3
           Version: RC 1
          Platform: x86-64
        OS/Version: SUSE Other
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: X.Org
        AssignedTo: bnc-team-xorg-bugs@forge.provo.novell.com
        ReportedBy: novell.com@marekpasnikowski.name
         QAContact: xorg-maintainer-bugs@forge.provo.novell.com
          Found By: ---
           Blocker: ---

Created an attachment (id=526135)
 --> (http://bugzilla.novell.com/attachment.cgi?id=526135)
dmesg output

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) rekonq/2.1 Safari/534.34

I use a FirePro V3700.

glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RV620
OpenGL version string: 3.0 Mesa 9.0.2
OpenGL shading language version string: 1.30

After a random time (it can be quite long) the X server locks up and the display starts flickering.

Reproducible: Always

Steps to Reproduce:
Play a 3D game on Radeon V3700.

Actual Results:
Relevant snippet of dmesg:
[ 3630.449209] radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec
[ 3630.449223] radeon 0000:02:00.0: GPU lockup (waiting for 0x0000000000035101 last fence id 0x0000000000035100)
[ 3630.450306] radeon 0000:02:00.0: Saved 185 dwords of commands on ring 0.
[ 3630.450316] radeon 0000:02:00.0: GPU softreset: 0x00000003
[ 3630.462480] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA0003030
[ 3630.462483] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[ 3630.462484] radeon 0000:02:00.0: R_000E50_SRBM_STATUS = 0x200000C0
[ 3630.462486] radeon 0000:02:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3630.462488] radeon 0000:02:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3630.462490] radeon 0000:02:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
[ 3630.462492] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80028645
[ 3630.462494] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 3630.477453] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 3630.492412] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA0003030
[ 3630.492414] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[ 3630.492416] radeon 0000:02:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[ 3630.492418] radeon 0000:02:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3630.492420] radeon 0000:02:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3630.492421] radeon 0000:02:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[ 3630.492423] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80100000
[ 3630.496610] radeon 0000:02:00.0: GPU reset succeeded, trying to resume
[ 3630.513495] [drm] probing gen 2 caps for device 10de:3e8 = 1/0
[ 3630.515180] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 3630.515218] radeon 0000:02:00.0: WB disabled
[ 3630.515222] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000010000004 and cpu addr 0xffff880035232004
[ 3630.515224] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000010000c0c and cpu addr 0xffff880035232c0c
[ 3630.546315] [drm] ring test on 0 succeeded in 1 usecs
[ 3630.546371] [drm] ring test on 3 succeeded in 1 usecs
[ 3630.553355] [drm] ib test on ring 0 succeeded in 0 usecs
[ 3630.553392] [drm] ib test on ring 3 succeeded in 1 usecs
[ 3641.023715] radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec
[ 3641.023729] radeon 0000:02:00.0: GPU lockup (waiting for 0x0000000000035109 last fence id 0x0000000000035105)
[ 3641.254434] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.254446] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.254858] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.254865] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.255420] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.255428] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.256433] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.256441] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.257095] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.257105] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.258179] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.258190] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.258743] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.258750] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.260073] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.260081] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.262625] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.262636] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.262874] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.262879] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.263180] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.263186] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.263459] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.263465] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.270927] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.270941] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3641.277794] radeon 0000:02:00.0: couldn't schedule ib
[ 3641.277806] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3644.781769] radeon 0000:02:00.0: couldn't schedule ib
[ 3644.781781] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3644.842193] radeon 0000:02:00.0: couldn't schedule ib
[ 3644.842200] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3644.842608] radeon 0000:02:00.0: couldn't schedule ib
[ 3644.842611] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3644.848181] radeon 0000:02:00.0: couldn't schedule ib
[ 3644.848189] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3649.445044] type=1400 audit(1361556808.900:44): apparmor="DENIED" operation="change_hat" info="unconfined" error=-1 pid=740 comm="login"
[ 3653.936506] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.936513] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.937310] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.937326] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.938382] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.938392] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.938813] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.938820] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.940005] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.940014] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.941092] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.941104] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.942288] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.942301] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.942902] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.942910] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.947214] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.947226] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 3653.952740] radeon 0000:02:00.0: couldn't schedule ib
[ 3653.952754] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !

Expected Results:
Nothing; crash-free gaming.

I am willing to cooperate in order to solve this problem. With instructions, I can run any and all tests needed.

Also:
uname -a
Linux hirager-5.site 3.8.0-2-desktop #1 SMP PREEMPT Wed Feb 20 16:06:07 UTC 2013 (86d0404) x86_64 x86_64 x86_64 GNU/Linux

The kernel is from one of the official openSUSE repos (I am no longer sure which one; it had HEAD in its URL or name). I updated it in the hope of getting rid of the crashes. I didn't downgrade because of its radeon improvements.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
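The lockup lines in a dmesg capture like the one above can be pulled out mechanically. The following is a minimal Python sketch and is not part of the original report: `LOCKUP_RE` and `find_lockups` are hypothetical names, and the pattern assumes only the line format seen in this snippet. It reports the numeric gap between the two fence ids the driver prints, without claiming anything about what that gap means internally.

```python
import re

# Matches the "GPU lockup (waiting for ... last fence id ...)" line format
# seen in the dmesg snippet quoted in this report (an assumption: other
# kernel versions may word the message differently).
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon\s+(?P<dev>\S+):\s+GPU lockup "
    r"\(waiting for (?P<waiting>0x[0-9a-fA-F]+) "
    r"last fence id (?P<last>0x[0-9a-fA-F]+)\)"
)


def find_lockups(dmesg_text):
    """Return (timestamp, device, fence_gap) for each lockup line found."""
    events = []
    for m in LOCKUP_RE.finditer(dmesg_text):
        waiting = int(m.group("waiting"), 16)
        last = int(m.group("last"), 16)
        events.append((float(m.group("ts")), m.group("dev"), waiting - last))
    return events


# Two lines copied from the report above.
SAMPLE = """\
[ 3630.449223] radeon 0000:02:00.0: GPU lockup (waiting for 0x0000000000035101 last fence id 0x0000000000035100)
[ 3641.023729] radeon 0000:02:00.0: GPU lockup (waiting for 0x0000000000035109 last fence id 0x0000000000035105)
"""

if __name__ == "__main__":
    for ts, dev, gap in find_lockups(SAMPLE):
        print(f"{ts:.6f} {dev} fence gap: {gap}")
```

To run it against a real capture, the text of a saved `dmesg` output can be passed to `find_lockups` in place of `SAMPLE`.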
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c1

Stephan Kulow <coolo@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |coolo@suse.com
               Flag|                            |SHIP_STOPPER-

--- Comment #1 from Stephan Kulow <coolo@suse.com> 2013-02-23 16:55:43 CET ---
The radeon driver doesn't seem to be in good shape, I agree. Let's hope we get some fixes for it from upstream that we can ship through updates.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c2

--- Comment #2 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-02-23 16:17:51 UTC ---
I have hunted down a repository with Mesa 9.2, but the crash still happens.

I have one idea: this GPU has a 64-bit bus width. What would happen if the driver did not know about it and tried to transfer more data? Could that be a cause?

In a few days I am also going to have the card tested for hardware problems at a specialized workshop.

If only the blob from AMD worked with the current kernel and X, I would have tested it already...
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium
             Status|NEW                         |ASSIGNED
         AssignedTo|bnc-team-xorg-bugs@forge.pr |xorg-maintainer-bugs@forge.
                   |ovo.novell.com              |provo.novell.com
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c3

--- Comment #3 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-02-26 17:39:42 UTC ---
I had the card professionally examined - no hardware problems found.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c4

--- Comment #4 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-02-26 22:58:38 UTC ---
I have confirmed my find: Rochard is a game which crashes X 100% reliably.

So tell me, what can I do to help? I can fetch logs and collect verbose outputs - all I need are the commands.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c5

--- Comment #5 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-03-01 06:08:27 UTC ---
To clean up the test environment, I have reinstalled openSUSE from 12.2 NET KDE and upgraded it to the factory-tested repositories. I will stick to those for quite a while. If you need data, just tell me how to get it and I will provide it.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|X server locks up after a   |radeon [RV620] X server
                   |random time of gaming.      |locks up after a random
                   |                            |time of gaming.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c6

--- Comment #6 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-07-16 21:13:28 UTC ---
Recently I installed an AMD RV670 card, and I see similar behavior with it.

On a clean openSUSE 13.1 M3 I had weird hangs where I could see the cursor move, but nothing else I did to the system worked. I can't really say what the problem was, because I did not connect it with this bug at the time and did not observe it properly.

Now I run openSUSE 12.3 with repos providing current devel versions of the kernel and Mesa (3.11 and 9.2, respectively) plus radeon.dpm=1 (the new, proper power management fresh from AMD). This is what I see now (from dmesg):

radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec
[15360.084472] radeon 0000:02:00.0: GPU lockup (waiting for 0x0000000000186a34 last fence id 0x0000000000186a31)
[15360.084475] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[15360.084478] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[15360.084480] radeon 0000:02:00.0: ib ring test failed (-35).
[15360.085498] radeon 0000:02:00.0: GPU softreset: 0x00000019
[15360.085500] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA23334AC
[15360.085501] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15360.085503] radeon 0000:02:00.0: R_000E50_SRBM_STATUS = 0x200000C0
[15360.085505] radeon 0000:02:00.0: R_008674_CP_STALLED_STAT1 = 0x04000000
[15360.085506] radeon 0000:02:00.0: R_008678_CP_STALLED_STAT2 = 0x00000002
[15360.085508] radeon 0000:02:00.0: R_00867C_CP_BUSY_STAT = 0x00008484
[15360.085509] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80818645
[15360.085511] radeon 0000:02:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15360.135245] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[15360.135318] radeon 0000:02:00.0: SRBM_SOFT_RESET=0x00000100
[15360.137408] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA0003030
[15360.137423] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15360.137427] radeon 0000:02:00.0: R_000E50_SRBM_STATUS = 0x200080C0
[15360.137430] radeon 0000:02:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[15360.137433] radeon 0000:02:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[15360.137436] radeon 0000:02:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[15360.137442] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80120000
[15360.137447] radeon 0000:02:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15360.137453] radeon 0000:02:00.0: GPU reset succeeded, trying to resume
[15360.138845] [drm] PCIE gen 2 link speeds already enabled
[15360.140518] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[15360.140579] radeon 0000:02:00.0: WB enabled
[15360.140585] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8801101a5c00
[15360.140592] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8801101a5c0c
[15360.171528] [drm] ring test on 0 succeeded in 1 usecs
[15360.171599] [drm] ring test on 3 succeeded in 1 usecs
[15360.171617] [drm] ib test on ring 0 succeeded in 0 usecs
[15360.171634] [drm] ib test on ring 3 succeeded in 1 usecs

So this bug really contains two separate problems:
1. GPU lockup.
2. Failure to reset GPU completely and properly.

The HD3870 I currently use recovers properly, without lingering after-effects.
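Each soft reset in these logs prints the status registers twice, once before and once after the reset, so the two dumps can be compared mechanically. Below is a minimal Python sketch, not part of the original comment: `REG_RE` and `diff_register_dumps` are hypothetical names, and the code only assumes the `R_<offset>_<NAME> = 0x...` line format seen above. It makes no attempt to interpret individual register bits.

```python
import re

# Matches lines such as "R_008010_GRBM_STATUS = 0xA23334AC" as printed in the
# dmesg output above (assumed format; unprefixed names are ignored).
REG_RE = re.compile(r"(R_[0-9A-F]{6}_\w+)\s*=\s*(0x[0-9A-Fa-f]+)")


def diff_register_dumps(log_text):
    """Map register name -> (first value, last value) where the two differ."""
    seen = {}
    for name, value in REG_RE.findall(log_text):
        seen.setdefault(name, []).append(int(value, 16))
    return {
        name: (vals[0], vals[-1])
        for name, vals in seen.items()
        if len(vals) > 1 and vals[0] != vals[-1]
    }


# A subset of the lines from comment #6: before and after the soft reset.
SAMPLE = """\
[15360.085500] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA23334AC
[15360.085501] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15360.085509] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80818645
[15360.085511] radeon 0000:02:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[15360.137408] radeon 0000:02:00.0: R_008010_GRBM_STATUS = 0xA0003030
[15360.137423] radeon 0000:02:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[15360.137442] radeon 0000:02:00.0: R_008680_CP_STAT = 0x80120000
[15360.137447] radeon 0000:02:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
"""

if __name__ == "__main__":
    for name, (before, after) in diff_register_dumps(SAMPLE).items():
        print(f"{name}: {before:#010x} -> {after:#010x}")
```

The sketch compares only the first and last occurrence of each register, so a log containing several resets would need to be split per reset first.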
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c7

--- Comment #7 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-07-20 09:30:40 UTC ---
I have done some testing since then. I have rearranged my repositories a bit and now I can reliably trigger the crash by launching Steam. The GPU does have a chance of recovering successfully, which is what previously made me think radeon.dpm=1 had solved the issue.

# zypper lr -P
#  | Alias                     | Name                                                                           | Enabled | Refresh | Priority
---+---------------------------+--------------------------------------------------------------------------------+---------+---------+---------
11 | google-talkplugin         | google-talkplugin                                                              | Yes     | Yes     |   96
 5 | Kernel:stable             | Kernel:stable                                                                  | Yes     | Yes     |   97
 8 | X11:XOrg                  | X11:XOrg                                                                       | Yes     | Yes     |   97
 3 | KDE_Extra                 | Additional packages maintained by the KDE team (KDE_Release_410_openSUSE_12.3) | Yes     | Yes     |   98
 4 | KDE_Release_410           | KDE SC 4.10 releases (openSUSE_12.3)                                           | Yes     | Yes     |   98
 6 | Kernel:vanilla            | Kernel:vanilla                                                                 | Yes     | Yes     |   99
12 | openSUSE-12.3-1.7         | openSUSE-12.3-1.7                                                              | Yes     | Yes     |   99
14 | repo-debug                | openSUSE-12.3-Debug                                                            | Yes     | Yes     |   99
15 | repo-debug-update         | openSUSE-12.3-Update-Debug                                                     | Yes     | Yes     |   99
16 | repo-debug-update-non-oss | openSUSE-12.3-Update-Debug-Non-Oss                                             | Yes     | Yes     |   99
17 | repo-non-oss              | openSUSE-12.3-Non-Oss                                                          | Yes     | Yes     |   99
18 | repo-source               | openSUSE-12.3-Source                                                           | Yes     | Yes     |   99
19 | repo-update               | openSUSE-12.3-Update                                                           | Yes     | Yes     |   99
20 | repo-update-non-oss       | openSUSE-12.3-Update-Non-Oss                                                   | Yes     | Yes     |   99
 1 | Archiving                 | Archiving                                                                      | Yes     | Yes     |  100
 7 | M17N:fonts                | M17N                                                                           | Yes     | Yes     |  100
10 | games                     | games                                                                          | Yes     | Yes     |  100
 2 | GNOME:Ayatana             | GNOME:Ayatana                                                                  | Yes     | Yes     |  101
 9 | devel:languages:pascal    | devel:languages:pascal                                                         | Yes     | Yes     |  102
13 | packman                   | packman                                                                        | Yes     | Yes     |  103

I am willing to sacrifice my personal comfort and stick to this setup, so I can perform the tests required to hunt the bug down.
As I wrote previously, just tell me what tools to install and how to use them.
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c8

--- Comment #8 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-07-20 09:44:37 UTC ---
And just as I pressed the "Commit" button here, my effects-heavy, OpenGL 2-enabled KDE desktop went down with this crash. Unfortunately, this means the bug could be in a very broad range of places...

I suggest changing the bug's title to "Radeon R600g driver locks up the GPU."
https://bugzilla.novell.com/show_bug.cgi?id=805300

https://bugzilla.novell.com/show_bug.cgi?id=805300#c9

--- Comment #9 from Marek Paśnikowski <novell.com@marekpasnikowski.name> 2013-07-20 09:58:30 UTC ---
One more observation I forgot to share: what happens with the display during the lockup.

1. The image freezes.
2. The image changes (in one(?) frame) to a black screen.
3. IF reset successful THEN image returns
   ELSE IF reset semi-successful THEN image returns frozen and changes back to black
   ELSE the display reports loss of signal and goes to standby.

The V3700 mostly resets "semi-successfully", hence the flickering. The HD3870 goes down permanently after 1 or 2 cycles. It rarely resets properly.
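The three-way branch in step 3 above can be written down as a small lookup, which may help when labelling observations consistently across test runs. This is a minimal Python sketch, not part of the original comment: `ResetOutcome` and `display_after_reset` are hypothetical names, and the strings simply restate the reporter's description.

```python
from enum import Enum


class ResetOutcome(Enum):
    """Reset outcomes as named in comment #9 (labels are the reporter's)."""
    SUCCESSFUL = "successful"
    SEMI_SUCCESSFUL = "semi-successful"
    FAILED = "failed"


def display_after_reset(outcome):
    """Return the display behaviour the reporter observed for an outcome."""
    if outcome is ResetOutcome.SUCCESSFUL:
        return "image returns"
    if outcome is ResetOutcome.SEMI_SUCCESSFUL:
        return "image returns frozen, then changes back to black"
    return "display reports loss of signal and goes to standby"


if __name__ == "__main__":
    for outcome in ResetOutcome:
        print(f"{outcome.value}: {display_after_reset(outcome)}")
```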