http://bugzilla.novell.com/show_bug.cgi?id=1050256

            Bug ID: 1050256
           Summary: GPU hang
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.3
          Hardware: x86-64
                OS: SUSE Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: KDE Workspace (Plasma)
          Assignee: opensuse-kde-bugs@opensuse.org
          Reporter: davejplater@gmail.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

In plasma5 the workspace intermittently freezes for a period, and journalctl shows this output:

Jul 24 10:14:49 arbuthnot kernel: [drm] GPU HANG: ecode 6:0:0xbd73ffff, in plasmashell [3124], reason: Hang on render ring, action: reset
Jul 24 10:14:49 arbuthnot kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 24 10:14:49 arbuthnot kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 24 10:14:49 arbuthnot kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 24 10:14:49 arbuthnot kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 24 10:14:49 arbuthnot kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 24 10:14:49 arbuthnot kernel: drm/i915: Resetting chip after gpu hang

Sometimes it occurs continuously, but switching to a console with ctrl-alt-f1 and running init 3 is still possible.
This started after a zypper dup --no-allow-vendor-change from 42.2, where the problem didn't occur.
Setting nomodeset at boot makes the problem go away.
My graphics hardware is:

08: PCI 02.0: 0300 VGA compatible controller (VGA)
  [Created at pci.378]
  Unique ID: _Znp.Ek_1fzLhuA5
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel 2nd Generation Core Processor Family Integrated Graphics Controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x0102 "2nd Generation Core Processor Family Integrated Graphics Controller"
  SubVendor: pci 0x105b "Foxconn International, Inc."
  SubDevice: pci 0x0d8d
  Revision: 0x09
  Memory Range: 0xf7800000-0xf7bfffff (rw,non-prefetchable)
  Memory Range: 0xe0000000-0xefffffff (ro,non-prefetchable)
  I/O Ports: 0xf000-0xf03f (rw)
  IRQ: 11 (no events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00008086d00000102sv0000105Bsd00000D8Dbc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=no, avail=yes, need=no, active=unknown

Primary display adapter: #8

My CPU is an "Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz"

--
You are receiving this mail because:
You are on the CC list for the bug.
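When comparing several hangs, it can help to pull the fields out of the "GPU HANG" journal line quoted in the report. The following is a minimal sketch, not part of the original report: the regular expression is modelled only on the single line shown there, and the names `GPU_HANG_RE` and `parse_gpu_hang` are hypothetical. Other kernel versions may word the message differently.

```python
import re
from typing import Optional

# Pattern modelled on the one "GPU HANG" line quoted in the report (assumption:
# other kernels may format this message differently).
GPU_HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[dict]:
    """Return the fields of a GPU HANG journal line, or None if it does not match."""
    match = GPU_HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # the PID is the only numeric field
    return fields


# The journal line from the report, used here as sample input.
sample = (
    "Jul 24 10:14:49 arbuthnot kernel: [drm] GPU HANG: ecode 6:0:0xbd73ffff, "
    "in plasmashell [3124], reason: Hang on render ring, action: reset"
)
print(parse_gpu_hang(sample))
```

Lines that do not contain a hang record, such as the "Resetting chip" line, simply yield `None`, so the function can be applied to every line of a journal dump.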
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c2

--- Comment #2 from Dave Plater <davejplater@gmail.com> ---
As suggested on the factory ml, I've removed drm-kmp-default, removed nomodeset and rebooted.
So far I haven't had a gpu hang.
http://bugzilla.novell.com/show_bug.cgi?id=1050256

Dave Plater <davejplater@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tiwai@suse.com
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c3

--- Comment #3 from Dave Plater <davejplater@gmail.com> ---
I think I can safely state that removing drm-kmp-default definitely fixes the gpu hangs.
It shouldn't be pulled in for affected GPUs.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c4

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |davejplater@gmail.com
              Flags|                            |needinfo?(davejplater@gmail.com)

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
OK, could you describe the typical way to reproduce the bug?
I'll try to test on a SandyBridge machine here, but I'd like to know the procedure.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c5

Dave Plater <davejplater@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(davejplater@gmail.com)|

--- Comment #5 from Dave Plater <davejplater@gmail.com> ---
To reproduce: install drm-kmp-default in a 42.3 installation and boot into plasma5.
I normally have a saved session with firefox and konsole, both with multiple tabs, and one kwrite instance.
There are four desktops, with firefox on 1 and konsole/kwrite on 4.
The hanging starts when I open thunderbird on desktop 2 and go back to firefox.
I'm using the sddm display manager. I'll try xfce, which is my backup GUI.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c6

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |patrik.jakobsson@suse.com

--- Comment #6 from Takashi Iwai <tiwai@suse.com> ---
OK, thanks. I think I can reproduce the issue reliably on the local machine.
I just need to boot with less memory, e.g. with the mem=1G boot option, start KDE, then open Firefox.
That triggers the GPU hang immediately. It implies an issue in the page handling.

BTW, are you using the intel X driver (i.e. is xf86-video-intel installed)?
On a freshly installed Leap 42.3 system, I didn't have it; the modesetting driver is used instead.
The problem happens no matter which X driver is used, so it doesn't matter much, but the devil is always in the details, hence I'd like to make sure.

In any case, it'd be good to know whether this happens on XFCE, too (with or without compositor).
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c7

--- Comment #7 from Dave Plater <davejplater@gmail.com> ---
i+ | xf86-video-amdgpu  | package | 1.3.0-1.1                  | x86_64 | oss
i+ | xf86-video-amdgpu  | package | 1.3.0-1.1                  | x86_64 | Main Repository (OSS)
i+ | xf86-video-fbdev   | package | 0.4.4-9.4                  | x86_64 | oss
i+ | xf86-video-fbdev   | package | 0.4.4-9.4                  | x86_64 | Main Repository (OSS)
i+ | xf86-video-intel   | package | 2.99.917.770_gcb6ba2da-1.3 | x86_64 | oss
i+ | xf86-video-intel   | package | 2.99.917.770_gcb6ba2da-1.3 | x86_64 | Main Repository (OSS)
i+ | xf86-video-nouveau | package | 1.0.15-1.3                 | x86_64 | oss
i+ | xf86-video-nouveau | package | 1.0.15-1.3                 | x86_64 | Main Repository (OSS)
i+ | xf86-video-vesa    | package | 2.3.4-9.4                  | x86_64 | oss
i+ | xf86-video-vesa    | package | 2.3.4-9.4                  | x86_64 | Main Repository (OSS)

I just rebooted with drm-kmp-default reinstalled. After deleting nomodeset I set runlevel 3, tailed journalctl on tty2 and ran init 5 on tty1, and I haven't had a hang in plasma5 yet.
I've confirmed that "fbcon: inteldrmfb (fb0) is primary device" is in the journal; it only occurs when I boot without nomodeset and drm-kmp-default is installed.
I'll try a reboot straight into runlevel 5.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c8

--- Comment #8 from Dave Plater <davejplater@gmail.com> ---
It happened while I was in firefox and thunderbird's new mail notification popped up.
I'm not confident that I can reproduce it in xfce because I don't think that its load will be enough.
Now that I'm trying to reproduce it, and have even tried a full screen video, it's hard to reproduce.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c9

--- Comment #9 from Takashi Iwai <tiwai@suse.com> ---
Could you try to test XFCE with the smaller memory size as I did? In my case with KDE, mem=1G sufficed to trigger the problem quickly. You can try a slightly smaller value, too.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c10

--- Comment #10 from Dave Plater <davejplater@gmail.com> ---
(In reply to Takashi Iwai from comment #9)
> Could you try to test XFCE with the smaller memory size as I did? In my case
> with KDE, mem=1G sufficed to trigger the problem quickly. You can try a
> slightly smaller value, too.

You mean video memory? The lowest I can go is 32M framebuffer and 128M graphics.
I'm using xfce now with those minimums.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c11

--- Comment #11 from Takashi Iwai <tiwai@suse.com> ---
No, I meant the whole RAM size. You can limit the size by passing the mem=XXX boot option, where XXX is the size (e.g. 1G, 512M, etc).

The problem in the i915 driver is tied to the RAM size.
When user-space uses more memory and free pages become scarce, the system tries to swap out, and the i915 driver tries to shrink its page lists.
The problem seems to happen during that.
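To confirm that a mem= limit actually took effect after rebooting, one can compare the kernel command line with the MemTotal value the kernel reports. The following is a rough sketch under stated assumptions: the helper names `mem_limit_bytes` and `mem_total_bytes` are hypothetical, only the plain `mem=<number>[K|M|G]` form mentioned in the comment is handled, and the sample command line is made up for illustration. On a running system the two input strings would come from /proc/cmdline and /proc/meminfo.

```python
import re
from typing import Optional

# Binary size suffixes; only K, M and G are handled in this sketch.
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def mem_limit_bytes(cmdline: str) -> Optional[int]:
    """Return the mem= limit from a kernel command line in bytes, or None."""
    for token in cmdline.split():
        match = re.fullmatch(r"mem=(\d+)([KMG]?)", token, flags=re.IGNORECASE)
        if match:
            return int(match.group(1)) * _SUFFIX[match.group(2).upper()]
    return None


def mem_total_bytes(meminfo: str) -> Optional[int]:
    """Return MemTotal from /proc/meminfo-style text in bytes, or None."""
    match = re.search(r"^MemTotal:\s+(\d+) kB$", meminfo, flags=re.MULTILINE)
    return int(match.group(1)) * 1024 if match else None


# Hypothetical command line; on a real system read /proc/cmdline instead.
print(mem_limit_bytes("root=/dev/sda2 mem=1G quiet"))
```

MemTotal is normally somewhat below the requested limit because the kernel reserves part of the RAM for itself, so the comparison should allow for a margin rather than expect equality.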
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c12

--- Comment #12 from Dave Plater <davejplater@gmail.com> ---
plasma5 had a hang immediately with mem=1G.
After pressing ctrl-backspace twice I logged into xfce. Apart from it being very slow when switching applications/desktops (I had to close kicad), I haven't had a gpu hang yet, even with the thunderbird pop-up.
I'm going back to my normal 4G; I've got work to do.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c13

--- Comment #13 from Dave Plater <davejplater@gmail.com> ---
Created attachment 733736
  --> http://bugzilla.novell.com/attachment.cgi?id=733736&action=edit
gpu crash dump

Crash dump from the last hang with mem=1G
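The kernel log in the original report gives /sys/class/drm/card0/error as the location of the crash dump. A minimal sketch of copying it to a regular file for attaching to a bug is shown below. The function name `save_gpu_crash_dump` and the destination file name are hypothetical, and reading the sysfs file may require root privileges.

```python
from pathlib import Path


def save_gpu_crash_dump(dest: str, source: str = "/sys/class/drm/card0/error") -> int:
    """Copy the GPU crash dump to dest and return the number of bytes written.

    The default source path is the one named in the kernel log of the report.
    """
    data = Path(source).read_bytes()
    Path(dest).write_bytes(data)
    return len(data)


# Example call (hypothetical destination name), to be run right after a hang:
# save_gpu_crash_dump("gpu-crash-dump.txt")
```

The source path is a parameter so that a different card node can be passed if the hang was reported for something other than card0.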
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c14

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(davejplater@gmail.com)

--- Comment #14 from Takashi Iwai <tiwai@suse.com> ---
This looks like a regression caused by the recent PM fix.
I found a paper-over patch in recent upstream, so I tried to backport it, and this seems to work.

The test drm-kmp package is being built in the OBS home:tiwai:branches:openSUSE:Leap:42.3:Update/drm repo.
Retrieve the rpm via osc,
    osc getbinaries home:tiwai:branches:openSUSE:Leap:42.3:Update/drm/standard/x86_64

Could you test this kmp?
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c15

--- Comment #15 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Takashi Iwai from comment #14)
> Retrieve the rpm via osc,
>     osc getbinaries home:tiwai:branches:openSUSE:Leap:42.3:Update/drm/standard/x86_64

Now the package has finally been published, too:
    http://download.opensuse.org/repositories/home:/tiwai:/branches:/openSUSE:/L...
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c16

Dave Plater <davejplater@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(davejplater@gmail.com)|

--- Comment #16 from Dave Plater <davejplater@gmail.com> ---
I installed it via osc getbinaries and then booted into plasma5 with mem=1G: no gpu hangs, even with the thunderbird pop-up.
Looks like you fixed the bug. I'm now back on the normal 4G of memory.
http://bugzilla.novell.com/show_bug.cgi?id=1050256#c17

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #17 from Takashi Iwai <tiwai@suse.com> ---
Thanks. I submitted the fix now.
http://bugzilla.novell.com/show_bug.cgi?id=1050256

SMASH SMASH <smash_bz@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|obs:running:7039:important  |obs:running:7039:important
                   |                            |maint:planned:update
http://bugzilla.novell.com/show_bug.cgi?id=1050256

Bob Goddard <suse-20050616@bgcomp.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |suse-20050616@bgcomp.co.uk
http://bugzilla.novell.com/show_bug.cgi?id=1050256

SMASH SMASH <smash_bz@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|maint:planned:update        |