[Bug 1161720] New: i915 hang continues in 5.4.12-1-default
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720

Bug ID: 1161720
Summary: i915 hang continues in 5.4.12-1-default
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: cwdillon@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 828183
  --> http://bugzilla.opensuse.org/attachment.cgi?id=828183&action=edit
contents of /sys/class/drm/card0/error

I dropped down to 5.4.12-default today when the latest Tumbleweed release came out and have already experienced a graphical environment freeze with this error.

Perhaps I misunderstood that Tumbleweed release managers applied the drm/i915/gt patch to the 5.4.12 default kernel? I understand that this patch is not planned to be applied to 5.4 branch kernels by the i915 drm managers. (https://www.spinics.net/lists/stable/msg351278.html)

I hadn't experienced any freezes under the 5.4.13 pre-release from Kernel:next (but I also don't have any support for Bumblebee/NVIDIA in that kernel).
Also attaching the contents of dmesg | grep i915:

```
[ 3.845376] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 3.847415] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 3.848292] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[ 4.529697] [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
[ 4.681033] fbcon: i915drmfb (fb0) is primary device
[ 4.725228] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[ 9.671989] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 9.721605] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[ 3021.917516] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
[ 3021.917518] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3021.918523] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 3021.919248] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 3021.919351] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 3021.921095] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 3021.921814] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
```

--
You are receiving this mail because:
You are on the CC list for the bug.
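For anyone wanting to spot the same signature in their own logs, here is a minimal Python sketch. It assumes the message wording seen in the dmesg excerpt of this report (`GPU HANG: ecode ...` and `reset request timed out`); other kernel versions may phrase these lines differently, and the function name `find_i915_hang_events` is purely illustrative.

```python
import re

# Patterns modeled on the dmesg lines quoted in this bug report.
# Assumption: other kernel versions may word these messages differently.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 \S+ GPU HANG: "
    r"ecode (?P<ecode>\S+), hang on (?P<engine>\w+)"
)
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+.*\*ERROR\* "
    r"(?P<engine>\w+) reset request timed out"
)


def find_i915_hang_events(dmesg_text):
    """Return (timestamp, kind, engine) tuples for hang-related lines."""
    events = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            events.append((float(match.group("ts")), "hang", match.group("engine")))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), "reset-timeout", match.group("engine"))
            )
    return events
```

The function only takes text, so it can be fed the saved output of `dmesg | grep i915` without needing access to the kernel log itself.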
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c1

Takashi Iwai <tiwai@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |tiwai@suse.com

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Clarence Dillon from comment #0)
> Perhaps I misunderstood that Tumbleweed release managers applied the drm/i915/gt patch to 5.4.12 default kernel? I understand that this patch is not planned to be applied to 5.4 branch kernels by i915 drm managers. (https://www.spinics.net/lists/stable/msg351278.html)
It's not included in the TW kernel, either. Do you mean this has to be included to address your problem?
> I hadn't experienced any freezes under 5.4.13 pre-release from Kernel:next (but I also don't have any support for Bumblebee/NVIDIA in that kernel).
Do you mean the OBS Kernel:stable repo? The only difference regarding i915 between 5.4.12 and 5.4.13 is one patch that adds the inclusion of linux/math64.h, so it's likely irrelevant.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c2

--- Comment #2 from Clarence Dillon <cwdillon@gmail.com> ---
Sorry, you're correct. I got the 5.4.13 kernel from Kernel:stable/standard.

I saw that _some_ drm/i915/gt fix was applied to 5.4.12 because of this comment in Factory/kernel-source (line 56) and expected it to be included in the next TW release. ...Don't changes to Factory get released as the next TW release when they pass the automated testing?
https://build.opensuse.org/package/rdiff/openSUSE:Factory/kernel-source?linkrev=base&rev=521

Anyway, when I first reported the i915 hang to the Intel project team on freedesktop.org (https://gitlab.freedesktop.org/drm/intel/issues/993) they told me that it should be fixed by that patch.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c3

--- Comment #3 from Takashi Iwai <tiwai@suse.com> ---
The TW release testing is mostly performed on openQA, so i915 issues aren't covered.

So, you need a backport fix of the suggested patch? Then I can try to build a test kernel (if possible).
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c4

Takashi Iwai <tiwai@suse.com> changed:

What  |Removed |Added
----------------------------------------------------------------------------
CC    |        |cwdillon@gmail.com
Flags |        |needinfo?(cwdillon@gmail.com)

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
A test kernel package with the backported patch is being built in the OBS home:tiwai:bsc1161720 repo now. It'll take some time (an hour or so) until the build finishes. Please give it a try once it's done.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c5

--- Comment #5 from Clarence Dillon <cwdillon@gmail.com> ---
Thank you! I'll watch it and switch over when it's ready. Then I can follow up tomorrow when I've had some time to check. The hang is intermittent, so...
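Because the hang is intermittent, it may help to save the GPU error state as soon as a hang shows up, so it can be attached later. The following is a minimal Python sketch under stated assumptions: the path `/sys/class/drm/card0/error` is the one named in the attachment of comment #0, reading it usually requires root, and the helper name `save_error_state` and the output file naming are hypothetical.

```python
import time
from pathlib import Path

# Path taken from the attachment description in comment #0.
# Assumption: the GPU of interest is card0 on the machine in question.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(dest_dir, source=ERROR_STATE):
    """Copy the current GPU error state into dest_dir and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp the copy so repeated hangs do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"i915-error-{stamp}.txt"
    # Read the whole file in one go and write it out unchanged.
    target.write_bytes(Path(source).read_bytes())
    return target
```

Calling `save_error_state("/var/tmp/i915-dumps")` right after a freeze would leave a timestamped copy that can be uploaded to the bug; the destination directory here is only an example.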
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c6

Martin Wilck <martin.wilck@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |martin.wilck@suse.com

--- Comment #6 from Martin Wilck <martin.wilck@suse.com> ---
*** Bug 1161785 has been marked as a duplicate of this bug. ***
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c7

--- Comment #7 from Clarence Dillon <cwdillon@gmail.com> ---
So far today, I've had no desktop freezes, which makes me pretty happy.

There are some related problems with the Intel driver that cause other symptoms.

- Booting takes an unusually long time, then opens onto a blank screen, which turns out to be in some power saving mode. Mouse or keyboard input wakes the screen up to a login prompt, but no dots are visible in the fields. Still, I can log in and wait again for the desktop to appear.
- I'm running on a laptop + docking station + external monitor. After about 60 sec of inactivity on either screen, that screen enters a power save mode (fade to black), which wakes up if I move the mouse to that desktop. This is annoying since I often read from one screen and work on the other, so I have to wake up the laptop screen every other minute.
- Chromium (and all Chrome browser based apps) fail to start. The error is `libva error: /usr/local/lib/dri/iHD_drv_video.so init failed`. A web search shows some others getting the same error recently, but I have not looked into it enough to know whether the cause is related. This libva issue is not present on the current TW kernel 5.4.13, at least for me.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c8

--- Comment #8 from Takashi Iwai <tiwai@suse.com> ---
OK, I pushed the fix to the stable/for-next branch now. Hopefully it'll be merged soon and included in the TW kernel later. I also backported the fix to the SLE15-SP2 (i.e. Leap 15.2) branch.

Are the remaining issues regressions from earlier kernels?
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c9

--- Comment #9 from Clarence Dillon <cwdillon@gmail.com> ---
No, those are all new in this kernel. I suspect the underlying cause is that there is no bbswitch-kmp-default for 5.4.14 yet.

The specific error I gave you was the wrong line from my cut-paste history of issues and searches. (That error is what Ubuntu gives in most of the reports; of course, we have it at `/usr/lib64/dri/iHD_drv_video.so`.)

Should I open a new bug for that? Or just wait for the next TW release and see if it's still present with the rest of the libraries in alignment?
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c10

--- Comment #10 from Clarence Dillon <cwdillon@gmail.com> ---
I just added the bbswitch-kmp-default 5.4.14 from X11:/Bumblebee/Kernel_stable_standard and Chromium & Chromium-based apps are now working again, so that seems to have been the cause. As far as I can tell, everything is working properly.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c11

Takashi Iwai <tiwai@suse.com> changed:

What       |Removed                       |Added
----------------------------------------------------------------------------
Status     |NEW                           |RESOLVED
Resolution |---                           |FIXED
Flags      |needinfo?(cwdillon@gmail.com) |

--- Comment #11 from Takashi Iwai <tiwai@suse.com> ---
Thanks, then let's close now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c12

--- Comment #12 from Takashi Iwai <tiwai@suse.com> ---
It turned out that my backport fix had an off-by-one error and caused another regression. Meanwhile the stable branch has already moved to the 5.5 kernel base, and the upstream fix is included there. So this will be fixed anyway in the next release with the 5.5 kernel.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c13

--- Comment #13 from Clarence Dillon <cwdillon@gmail.com> ---
Thanks. I'll stay in this config until TW is released with 5.5. I have discovered a few glitchy things (like the laptop screen repeatedly falling asleep) but I can live with things like this for a while.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c16

--- Comment #16 from Martin Wilck <martin.wilck@suse.com> ---
It just happened again with 5.4.14-2.1.g3041591. I guess I can't keep the assertion that that patch actually fixed anything.

I have the vague impression though that the problems I see are related to chrome (RocketChat), which suggests that I may be looking at a Mesa-related issue. I don't feel qualified to dig much deeper, I'm more a dumb user than anything else in this area.

(In reply to Takashi Iwai from comment #15)
> > @Takashi, would updating SLE15-SP2 to the 5.5 code base be an option?
> Unlikely. We should ask upstream for fixing 5.4.y properly. 5.4.y is LTS stable kernel, so they are responsible for fixing it further.
Ack. I already said so in the Gitlab issue. Maybe you can reach out to Chris/Intel, too?
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c17

--- Comment #17 from Takashi Iwai <tiwai@suse.com> ---
Martin, could you confirm that the issue is still present on SLE15-SP2 / Leap 15.2 kernels? Since TW shall be fixed after moving to 5.5, we'd need to track the bug specifically for SLE15-SP2.

I wonder what the best way to trigger the bug is. Judging from the information in the gitlab issue, I can provide a hackish patch: a partial revert of f8c08d8faee5567803c8c533865296ca30286bbf.
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720
http://bugzilla.opensuse.org/show_bug.cgi?id=1161720#c18

--- Comment #18 from Clarence Dillon <cwdillon@gmail.com> ---
Takashi,

Since it looks like the patch for this is still in the pipeline for a fix in 5.5, I thought I'd try to get ahead of an issue that will arise there. I mentioned before that the only problem I was experiencing with 5.5 was that I needed to reinstall nvidia/bumblebee. It turns out that the _actual problem_ is that nvidia drivers will not build on 5.5. There is a patch, but nvidia seems to be planning to wait until <after v440.44> to implement it. Our bumblebee driver is currently at 418.113.
https://devtalk.nvidia.com/default/topic/1068332/linux/nvidia-driver-does-no...

So, not much to look forward to in 5.5, I'm afraid.