[Bug 1183872] New: Regression: System hang when connecting HMDI with i915
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872
Bug ID: 1183872
Summary: Regression: System hang when connecting HMDI with i915
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: milanfix@protonmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 847526 --> http://bugzilla.opensuse.org/attachment.cgi?id=847526&action=edit
Hardware info

This bug doesn't occur on Leap 15.2, but it happens very consistently on Tumbleweed.

How to reproduce:
1) Connect an external screen by HDMI *after* turning on the computer

Expected behavior: The screen is recognized.

Actual behavior: Both screens go blank and the system is unusable.

If the screen is connected before turning on the computer, it works as expected, but if it's re-plugged the system still hangs. This isn't specific to any desktop environment; it happens even without any X server at all.

Speculation: This is most likely a kernel issue, specifically a bug in the i915 driver. If so, it was introduced between Linux 5.3 and 5.11.

Things that were tried: i915.enable_psr=0, intel_idle.max_cstate=1 and i915.enable_dc=0 were all tried, but the bug still occurs. I also tried to get a dump with kdump, but nothing was written to /var/crash.

Any tips to debug this issue so it can be narrowed down would be really helpful. I'm currently trying to find a way to test every kernel from 5.3 to 5.11 without having to compile them myself on Tumbleweed, so that I can confirm beyond doubt that it's a kernel issue and also find the version in which the regression was introduced.

-- You are receiving this mail because: You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c1 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tiwai@suse.com --- Comment #1 from Takashi Iwai <tiwai@suse.com> --- Could you check the older kernels in the TW history repo and see whether the problem appears there? http://download.opensuse.org/history/ Some even older kernels can be found in my OBS kernel repos, too, e.g. OBS home:tiwai:kernel:5.9, home:tiwai:kernel:5.8, ...: http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.9/standard/
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c2 --- Comment #2 from Takashi Iwai <tiwai@suse.com> --- For kdump: might it help to set the panic_on_oops sysctl? And make sure beforehand that kdump really works as expected, e.g. by triggering a crash via "echo c > /proc/sysrq-trigger". Last but not least, if you find more information, please report it upstream to the gitlab.freedesktop.org issue tracker. Thanks.
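Before relying on kdump, it can help to confirm that a crash kernel is actually loaded and that oopses are escalated to panics. The following is a minimal Python sketch, not a tool used in this thread: it reads `/sys/kernel/kexec_crash_loaded` and `/proc/sys/kernel/panic_on_oops` and treats a missing file as "off". The function names are hypothetical, and the `root` argument exists only so the check can be pointed at another directory tree.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_flag(text: Optional[str]) -> bool:
    """Interpret a 0/1 procfs or sysfs value; a missing or non-numeric file counts as off."""
    if text is None:
        return False
    try:
        return int(text.strip()) != 0
    except ValueError:
        return False


def read_optional(path: Path) -> Optional[str]:
    """Return the file's contents, or None if this kernel does not provide it."""
    try:
        return path.read_text()
    except FileNotFoundError:
        return None


def kdump_readiness(root: Path = Path("/")) -> Dict[str, bool]:
    """Check two preconditions before a test crash via /proc/sysrq-trigger."""
    return {
        # "1" once a crash kernel has been loaded with kexec (normally by the kdump service)
        "crash_kernel_loaded": parse_flag(read_optional(root / "sys/kernel/kexec_crash_loaded")),
        # non-zero turns every oops into a panic, and a panic is what boots the crash kernel
        "panic_on_oops": parse_flag(read_optional(root / "proc/sys/kernel/panic_on_oops")),
    }
```

Calling `kdump_readiness()` on the affected machine returns both flags. Even with both set, a hang that never oopses or panics will not trigger kdump at all.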
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c3 Imnotgivingmy nametoamachine <milanfix@protonmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CONFIRMED --- Comment #3 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- Thank you very much for maintaining your repository with all those previous versions. I found that this regression was introduced between Linux 5.8.15-1.1.gc680e93 and 5.9.14-1.1.gc648a46. I then tested with 5.12.rc4-1.1.g094141b just to make sure it's still present. I'll report this upstream at https://gitlab.freedesktop.org/drm/intel now that I'm 100% sure it's a kernel bug.
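Testing one packaged kernel per release, as done here with the history and OBS repositories, is a bisection over versions. A minimal Python sketch, assuming the list is sorted oldest to newest, the oldest is known good, the newest is known bad, and there is a single regression point. `is_bad` is a hypothetical callback standing in for "install this kernel, boot it, and hot-plug HDMI"; both function names are illustrative only.

```python
from typing import Callable, Sequence, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key so "5.10" sorts after "5.9" (a plain string sort gets this wrong).

    Handles plain dotted versions only; tags like "5.12-rc4" are out of scope here.
    """
    return tuple(int(part) for part in version.split("."))


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is True.

    Assumes `versions` is sorted oldest to newest, versions[0] is known good,
    versions[-1] is known bad, and every version after the regression stays bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):  # one install + boot + HDMI hot-plug per call
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

For the nine releases 5.3 through 5.11, with both ends already tested, this needs three boots. Narrowing the resulting range (5.8 to 5.9 here) down to a single commit is what `git bisect` does on a kernel source tree.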
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c4 Felix Miata <mrmazda@earthlink.net> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mrmazda@earthlink.net --- Comment #4 from Felix Miata <mrmazda@earthlink.net> --- FWIW I think this is a laptop-only issue. I just booted TW20210320 on Kaby Lake graphics while connected to a display via DisplayPort. After boot completed, I plugged an HDMI cable into a second display, and both work as expected, producing a 2560x2520 desktop via the modesetting DDX. Is the external display connected only to the laptop's HDMI, or is some other cable simultaneously connected to some other device's output even though not powered up?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c6 --- Comment #6 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- Created attachment 847713 --> http://bugzilla.opensuse.org/attachment.cgi?id=847713&action=edit dmesg with call trace referencing i915 after connecting a screen by HDMI
FWIW I think this is a laptop-only issue.
Or a GeminiLake-only issue. But yes, it doesn't seem to affect every device using i915.
Is the external display connected only to the laptop's HDMI?
Yes, that's the only external screen, connected to the only HDMI port on that laptop. There's also the internal screen, of course.
or is some other cable simultaneously connected to some other device's output even though not powered up?
No.
Please give the gitlab issues URL
https://gitlab.freedesktop.org/drm/intel/-/issues/3285
And, any attempt to kdump wasn't successful?
Nope, I didn't have any luck with kdump. I ended up disabling it, since it never produced any logs but still took up 210MB of memory. On the bright side, just a few hours ago I finally got a dmesg log with a call trace that references i915. It can be found at the link, but I'll attach it here for completeness.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c7 --- Comment #7 from Takashi Iwai <tiwai@suse.com> --- (In reply to Imnotgivingmy nametoamachine from comment #6)
Created attachment 847713 [details] dmesg with call trace referencing i915 after connecting a screen by HDMI
The kernel stack trace shown there is from the wireless stack, so it's not directly something to do with i915 stuff. Are you enabling netconsole? It might be the reason.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c8 --- Comment #8 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Takashi Iwai from comment #7)
The kernel stack trace shown there is from the wireless stack, so it's not directly something to do with i915 stuff.
Are you enabling netconsole? It might be the reason.
Yes, I was told to use netconsole to capture the dmesg log while reproducing the bug. Dang, I was hoping this log would be it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c9 --- Comment #9 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- I'm very confused about the nature of this bug, and whether it's really a bug in the kernel rather than in some other component. I have tried Arch and Ubuntu: I was able to reproduce the bug on Arch with their install media ISO, which doesn't include Xorg, and so far that seemed normal. But I also tested on the same hardware with Ubuntu 18.04, 20.04, 20.10 and 21.04, all with the kernels they ship, and also with https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.12-rc4/ As far as I know, the mainline builds don't carry Ubuntu's patches, so they should be "vanilla". Yet I have this bug on openSUSE with 5.12-rc4-vanilla (and on Arch), but not on Ubuntu with 5.12-rc4-mainline. Takashi Iwai, is it possible for you to do an updated vanilla build of drm-tip like the one in http://download.opensuse.org/repositories/home:/tiwai:/kernel:/drm-tip/stand... but updated to include commits from the last two weeks? I'm very confused by not being able to reproduce this bug on Ubuntu, and testing drm-tip on openSUSE is one of the few things I haven't tried.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c10 --- Comment #10 from Takashi Iwai <tiwai@suse.com> --- It's being rebuilt now. I didn't notice that the automatic update stopped working...
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c11 Imnotgivingmy nametoamachine <milanfix@protonmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|CONFIRMED |IN_PROGRESS --- Comment #11 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Takashi Iwai from comment #10)
It's being rebuilt now. I didn't notice that the automatic update stopped working...
Thank you very much, Takashi! I tested with (drm-tip) kernel-vanilla-5.12.rc4-2.1.ga3c6ee1, but sadly the bug was still present.
I then went ahead and tested with kernel-vanilla-5.12.rc5-1.1.g5fe2d5c because I noticed it was available too, you know, just to make sure the bug was indeed still present and all... But the bug doesn't happen with 5.12.rc5-vanilla...
Just what the fuck? Was the bug really fixed in exactly the week I reported it, without anyone finding a useful log or call trace? And why didn't the bug happen on Ubuntu? To be honest, even though the bug looks fixed in 5.12.rc5, it's now more annoying not knowing why or how it was fixed haha
I'm legit going to read all the commits between 5.12.rc4 and 5.12.rc5.
That aside, what should the status of this bug be? Is it "Resolved" as soon as a patch is queued for a future stable version, or only once that patch actually reaches a stable kernel in openSUSE?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c13 --- Comment #13 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Takashi Iwai from comment #12)
You need to make sure that it's really fixed in the 5.12-rc5 code itself. For example, the 5.12-rc5 kernel you tested also contains a kernel configuration change that sets CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=n. That might be the cause, too.
And if that's the case, you could work around it in other kernels by passing the snd_hda_codec_hdmi.enable_silent_stream=0 boot option.
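To confirm after a reboot that this workaround is really in effect, one can look for the option in `/proc/cmdline`. A minimal Python sketch, assuming whitespace-separated parameters without quoting and that the last occurrence wins; the set of true/false spellings is a simplified subset of what the kernel's boolean parser accepts, and `silent_stream_override` is a hypothetical name.

```python
from typing import Optional

PARAM = "snd_hda_codec_hdmi.enable_silent_stream"

# Simplified subset of the spellings the kernel's boolean parser accepts
TRUE_WORDS = {"1", "y", "Y", "on"}
FALSE_WORDS = {"0", "n", "N", "off"}


def silent_stream_override(cmdline: str) -> Optional[bool]:
    """Return the enable_silent_stream value given on the command line, or None.

    Assumes whitespace-separated tokens without quoting and that the last
    occurrence of the parameter wins.
    """
    value: Optional[bool] = None
    for token in cmdline.split():
        name, sep, raw = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent
        if name.replace("-", "_") != PARAM:
            continue
        if not sep:
            value = True  # a bare boolean parameter means "enabled"
        elif raw in TRUE_WORDS:
            value = True
        elif raw in FALSE_WORDS:
            value = False
    return value
```

On the running system this would be called as `silent_stream_override(open("/proc/cmdline").read())`; `None` means the option was not passed, so the kernel's built-in default from CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM applies.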
This might be the issue:
(affected by this bug) config-5.11.6-1-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y
(affected by this bug) config-5.12.0-rc4-2.ga3c6ee1-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y
(drm-tip, affected by this bug) config-5.12.0-rc4-3.ge9c25f7-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y
All kernels built by Ubuntu (none affected):
# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set
(not affected) config-5.12.0-rc5-1.g5fe2d5c-vanilla:
# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set
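Comparing these options by hand across kernels is error-prone, because a disabled option is recorded as a `# ... is not set` comment rather than as `=n`. A minimal Python sketch for extracting one option from a kernel config file such as `/boot/config-<version>` (the file location is an assumption based on the config names above; `config_value` is a hypothetical helper):

```python
import re
from typing import Optional

# "CONFIG_FOO=y", "=m" or "=\"string\"" lines
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# Disabled options are recorded as a comment, not as "=n"
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the option's value ("y", "m", ...), "n" if it is "not set", or None if absent."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        match = _SET.match(line)
        if match and match.group(1) == option:
            return match.group(2)
        match = _UNSET.match(line)
        if match and match.group(1) == option:
            return "n"
    return None
```

`None` distinguishes a kernel that predates the option entirely from one where it exists but is disabled.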
Booting 5.12.0-rc4-2.ga3c6ee1-vanilla with snd_hda_codec_hdmi.enable_silent_stream=0 prevents the GPU freeze entirely.
Also, from https://cateee.net/lkddb/web-lkddb/SND_HDA_INTEL_HDMI_SILENT_STREAM.html:
found in Linux kernels: 5.9–5.11, 5.12-rc+HEAD
5.9 was the first kernel in which I hit the issue when testing the kernels from your repo.
It looks like silent_stream is indeed the issue. How did you find it?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c14 --- Comment #14 from Takashi Iwai <tiwai@suse.com> --- Heh, it was a blind shot. OK, then the next question is why this causes a problem. Your hardware has only the Intel graphics chip and no discrete GPU for rendering, right? Just to be sure which driver is involved in this game.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c15 --- Comment #15 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Takashi Iwai from comment #14)
Heh, it was a blind shot.
OK, then the next problem is why this causes a problem.
Does your hardware have only Intel graphics chip, and no discrete GPU for rendering, right? Just to be sure which driver is involved in this game.
Exactly so: only Intel UHD Graphics 600 and no discrete GPU. I'm not sure how the HDMI port is wired inside the laptop, i.e. whether it goes directly to the GPU or through some other controller. Would a motherboard schematic be useful?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 Imnotgivingmy nametoamachine <milanfix@protonmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Regression: System hang when connecting HMDI with i915 |System hang when connecting HMDI on i915 with CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c16 --- Comment #16 from Takashi Iwai <tiwai@suse.com> --- OK, then I need to ping the Intel audio people. Since the issue is with HDMI audio: can you play audio over HDMI at all? At least without a hang?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c17 --- Comment #17 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Takashi Iwai from comment #16)
OK, then I need to ping Intel audio people.
Since the issue is with the HDMI audio: can you play via HDMI audio at all? At least no hang happens?
Yes, I can play audio over HDMI on the TV just fine with Linux 5.8 or earlier, or with Linux 5.9 and later with CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM disabled. Should I close the issue on the drm/intel tracker?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c18 --- Comment #18 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Imnotgivingmy nametoamachine from comment #17)
Oh, another detail that might be useful: even with CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM enabled, this bug doesn't happen if the screen is connected *before* turning on the computer, and HDMI audio works too. But this isn't reliable: reconnecting the external screen, changing the resolution, or even just changing the screen mode (switching to mirroring, for example) will then trigger the bug and crash the system.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c19 Kai Vehmanen <kai.vehmanen@intel.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kai.vehmanen@intel.com --- Comment #19 from Kai Vehmanen <kai.vehmanen@intel.com> --- Kai from the Intel audio team joining the thread. Thanks for the bug report and for providing a lot of detail on the case. We'll try to reproduce this bug on a GeminiLake system. If that fails, we might need some further details. This is probably happening in linux/sound/pci/hda/patch_hdmi.c:sync_eld_via_acomp(). Commenting out the "silent_stream_enable()" and "silent_stream_disable()" calls in that function would be an interesting experiment on a setup that triggers this. It could be the snd_hda_power_up_pm() call, but that seems unlikely, so most probably something goes wrong in silent_stream_enable().
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c20 --- Comment #20 from Imnotgivingmy nametoamachine <milanfix@protonmail.com> --- (In reply to Kai Vehmanen from comment #19)
This is probably happening in linux/sound/pci/hda/patch_hdmi.c:sync_eld_via_acomp(). Commenting out "silent_stream_enable()" and "silent_stream_disable()" calls in that function would be interesting experiment on a setup that triggers this. It could be the snd_hda_power_up_pm() call, but that would seem unlikely, so most probably something goes wrong in silent_stream_enable().
Say no more! One kernel compilation that might yield some concrete results I can bear (I'm currently on very weak hardware, and compiling the kernel takes over an hour at best, not a few minutes or even seconds like it probably does on your development machines). I'll do it tomorrow since it's late here, or if Takashi can spoil me a bit more with a patched kernel from OBS, that would be awesome *wink* *wink* Also, speaking of OBS: is it open to everyone or just to SUSE developers? If I can use it, I can probably learn to patch and build test kernels myself, so I don't waste Takashi's time with boring one-off kernel builds for testing.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872 http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c21 --- Comment #21 from Takashi Iwai <tiwai@suse.com> --- Don't worry about the kernel build; it's an easy task for me if a proper patch is provided. After all, it's code I've been maintaining upstream, so I'd have to be involved anyway :) About OBS: yes, everyone can use it and build whatever they want. It's only that the (open)SUSE kernel build is a bit tricky compared with other packages.