[Bug 1183872] New: Regression: System hang when connecting HDMI with i915
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872

Bug ID: 1183872
Summary: Regression: System hang when connecting HDMI with i915
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: milanfix@protonmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 847526 --> http://bugzilla.opensuse.org/attachment.cgi?id=847526&action=edit
Hardware info

This bug doesn't occur on Leap 15.2, but it happens very consistently on Tumbleweed.

How to reproduce:
1) Connect an external screen by HDMI *after* turning on the computer.

Expected behavior: The screen is recognized.
Actual behavior: Both screens go blank and the system becomes unusable.

If the screen is connected before turning on the computer, it works as expected. Re-plugging it still causes the hang. This isn't specific to any desktop environment, and it happens even without an X server running.

Speculation: This is most likely a kernel issue, specifically a bug in the i915 driver. If so, it was introduced between Linux 5.3 and 5.11.

Things that were tried: the boot options i915.enable_psr=0, intel_idle.max_cstate=1 and i915.enable_dc=0 were all tried, but the bug still occurs. I also tried to get a dump with kdump, but nothing was written to /var/crash.

Any tips for debugging this issue, so it can be narrowed down, would be really helpful. I'm currently looking for a way to test every kernel from 5.3 to 5.11 without compiling them myself on Tumbleweed. That would confirm beyond doubt that it's a kernel issue and identify the version that introduced the regression.

-- You are receiving this mail because: You are the assignee for the bug.
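Testing every release from 5.3 to 5.11 one by one is not strictly necessary. If the regression persists once it has been introduced, a binary search over the ordered release list needs only about log2(n) boots. Below is a minimal Python sketch under that assumption. `first_bad_version` and the `is_bad` callback are hypothetical names. In practice, `is_bad` stands for manually booting that kernel, hot-plugging HDMI and noting whether the system hangs; here it is simulated. At commit granularity, `git bisect` automates the same kind of search inside a kernel git tree.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(versions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest version for which is_bad() is True, or None.

    Assumes `versions` is ordered oldest -> newest and that, once the
    regression appears, it stays in every later version.
    """
    lo, hi = 0, len(versions)  # first bad version is in versions[lo:hi], or absent
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # bug present: first bad is mid or earlier
        else:
            lo = mid + 1      # bug absent: first bad is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    releases = [f"5.{minor}" for minor in range(3, 12)]  # 5.3 .. 5.11
    tested = []

    def simulated_test(version: str) -> bool:
        # Stand-in for "boot this kernel, hot-plug HDMI, did it hang?"
        tested.append(version)
        return int(version.split(".")[1]) >= 9  # simulated boundary, not real data

    print(first_bad_version(releases, simulated_test), "after", len(tested), "boots")
```

With nine releases and the simulated boundary at 5.9, the search reports 5.9 after four boots instead of nine.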
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c1
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c2
--- Comment #2 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c3
Imnotgivingmy nametoamachine
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c4
Felix Miata
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c6
--- Comment #6 from Imnotgivingmy nametoamachine
FWIW, I think this is a laptop-only issue, or a Gemini Lake-only issue. But yes, it doesn't seem to affect every device using i915.
Is the external display connected only to the laptop's HDMI? Yes, that's the only external screen, connected to the only HDMI port on that laptop. There's also the internal screen, of course.
Or is some other cable simultaneously connected to some other device's output, even though it's not powered up? No.
Please give the gitlab issues URL https://gitlab.freedesktop.org/drm/intel/-/issues/3285
And any attempt at kdump wasn't successful? Nope, I didn't have any luck with kdump. I ended up disabling it, since I never got any logs but it still took up 210MB of memory. On the bright side, just a few hours ago I finally got a dmesg log with a call trace that references i915. It can be found at the link, but I'll attach it here for completeness.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c7
--- Comment #7 from Takashi Iwai
Created attachment 847713 [details] dmesg with call trace referencing i915 after connecting a screen by HDMI
The kernel stack trace shown there comes from the wireless stack, so it isn't directly related to i915. Are you using netconsole? It might be the reason.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c8
--- Comment #8 from Imnotgivingmy nametoamachine
(In reply to Imnotgivingmy nametoamachine from comment #6)
Created attachment 847713 [details] dmesg with call trace referencing i915 after connecting a screen by HDMI
The kernel stack trace shown there is from the wireless stack, so it's not directly something to do with i915 stuff.
Are you enabling netconsole? It might be the reason.
Yes, I was told to use netconsole to capture the dmesg log while reproducing the bug. Dang, I was hoping this log would be it.
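For reference, netconsole is configured with a boot or module parameter whose basic documented layout is `netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. Empty optional fields fall back to kernel defaults, and the default target port is 6666. Below is a minimal Python sketch that assembles such a string. The helper name `netconsole_param`, the IP addresses and the interface names are placeholders, not values from this thread. Whether a given interface keeps extra stack frames out of captured traces is not something this sketch addresses.

```python
from typing import Optional


def netconsole_param(tgt_ip: str, tgt_port: int = 6666,
                     src_port: Optional[int] = None, src_ip: str = "",
                     dev: str = "", tgt_mac: str = "") -> str:
    """Build a netconsole= parameter string (hypothetical helper).

    Layout: [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    Optional fields left empty are emitted blank so the kernel picks defaults.
    """
    src_port_str = "" if src_port is None else str(src_port)
    source = f"{src_port_str}@{src_ip}/{dev}"
    target = f"{tgt_port}@{tgt_ip}/{tgt_mac}"
    return f"netconsole={source},{target}"


if __name__ == "__main__":
    # Placeholder addresses: send kernel messages to a listener on 192.168.1.10.
    print(netconsole_param("192.168.1.10"))
    print(netconsole_param("10.0.0.2", tgt_port=9353, src_port=4444,
                           src_ip="10.0.0.1", dev="eth1",
                           tgt_mac="12:34:56:78:9a:bc"))
```

The receiving machine then only needs something listening for UDP on the target port to collect the messages.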
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c9
--- Comment #9 from Imnotgivingmy nametoamachine
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c10
--- Comment #10 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c11
Imnotgivingmy nametoamachine
It's being rebuilt now. I didn't notice that the automatic update stopped working...
Thank you very much Takashi! I tested with (drm-tip) kernel-vanilla-5.12.rc4-2.1.ga3c6ee1, but sadly the bug was still present. I then went ahead and tested with kernel-vanilla-5.12.rc5-1.1.g5fe2d5c because I noticed it was available too, you know, just to make sure the bug was indeed still present and all... But the bug doesn't happen with 5.12.rc5-vanilla...
Just what the fuck? Was the bug really fixed in exactly the week I reported it, without a useful log or call trace ever being found? And why didn't the bug happen on Ubuntu? To be honest, even though the bug looks fixed in 5.12.rc5, it's now more annoying not knowing why or how it was fixed, haha. I'm legit going to read all the commits between 5.12.rc4 and 5.12.rc5.
That aside, what should the status of this bug be? Is it "Resolved" as soon as a patch is queued for a future stable version, or only once that patch finally reaches a stable kernel in openSUSE?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c13
--- Comment #13 from Imnotgivingmy nametoamachine
(In reply to Imnotgivingmy nametoamachine from comment #11)
(In reply to Takashi Iwai from comment #10)
It's being rebuilt now. I didn't notice that the automatic update stopped working...
Thank you very much Takashi! I tested with (drm-tip) kernel-vanilla-5.12.rc4-2.1.ga3c6ee1, but sadly the bug was still present.
I then went ahead and tested with kernel-vanilla-5.12.rc5-1.1.g5fe2d5c because I noticed it was available too, you know, just to make sure the bug was indeed still present and all... But the bug doesn't happen with 5.12.rc5-vanilla...
Just what the fuck? Was the bug really fixed in exactly the week I reported it, without a useful log or call trace ever being found? And why didn't the bug happen on Ubuntu? To be honest, even though the bug looks fixed in 5.12.rc5, it's now more annoying not knowing why or how it was fixed, haha.
I'm legit going to read all the commits between 5.12.rc4 and 5.12.rc5.
That aside, what should the status of this bug be? Is it "Resolved" as soon as a patch is queued for a future stable version, or only once that patch finally reaches a stable kernel in openSUSE?
You need to really make sure that it's fixed in 5.12-rc5 code itself. For example, 5.12-rc5 kernel you've tested also contains the kernel configuration change to make CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=n. This might be the cause, too.
And if that's the case, you could work around it on other kernels by passing the snd_hda_codec_hdmi.enable_silent_stream=0 boot option, too.
This might be the issue:

(affected by this bug) config-5.11.6-1-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y

(affected by this bug) config-5.12.0-rc4-2.ga3c6ee1-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y

(drm-tip, affected by this bug) config-5.12.0-rc4-3.ge9c25f7-vanilla:
CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y

All kernels built by Ubuntu (none affected):
# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set

(not affected) config-5.12.0-rc5-1.g5fe2d5c-vanilla:
# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set
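A comparison like the one above can be automated by reading each kernel's config file and reporting how one option is set. Below is a minimal Python sketch. It assumes the usual Kconfig output conventions: `OPTION=value` when the option is set, and `# OPTION is not set` when it is disabled. `config_state` is a hypothetical helper name.

```python
import re
from typing import Optional


def config_state(config_text: str, option: str) -> Optional[str]:
    """Return how `option` appears in a kernel .config text.

    'y', 'm' or another literal value when set; 'n' for the
    '# OPTION is not set' form; None when the option is absent entirely
    (e.g. a kernel older than the option or with unmet dependencies).
    """
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")  # '=' must follow the exact name
    unset_line = f"# {option} is not set"
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == unset_line:
            return "n"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    opt = "CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM"
    affected = "CONFIG_SND_HDA_CODEC_HDMI=m\nCONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM=y\n"
    unaffected = "# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set\n"
    print(config_state(affected, opt), config_state(unaffected, opt))
```

On an installed system, the same function can be pointed at each `/boot/config-*` file to list the option's state per kernel.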
With 5.12.0-rc4-2.ga3c6ee1-vanilla booted with snd_hda_codec_hdmi.enable_silent_stream=0, the GPU freeze never happens. Also, from https://cateee.net/lkddb/web-lkddb/SND_HDA_INTEL_HDMI_SILENT_STREAM.html:
found in Linux kernels: 5.9-5.11, 5.12-rc+HEAD
5.9 was the first kernel I had issues with when testing the kernels in your repo.
Looks like silent_stream is indeed the issue. How did you find it?
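To confirm that the snd_hda_codec_hdmi.enable_silent_stream=0 workaround actually reached the running kernel, one can inspect the boot command line, which Linux exposes in `/proc/cmdline`. Below is a minimal Python sketch. The boolean parsing only approximates the kernel's rules (y/Y/1/on versus n/N/0/off, and a bare name meaning "on"). It also assumes that a later occurrence of the parameter overrides an earlier one. Both helper names are hypothetical.

```python
from typing import Optional

PARAM = "snd_hda_codec_hdmi.enable_silent_stream"


def parse_kernel_bool(raw: str) -> Optional[bool]:
    """Approximate kernel boolean parsing; None for an unrecognised spelling."""
    if raw[:1] in ("y", "Y", "1") or raw.lower() == "on":
        return True
    if raw[:1] in ("n", "N", "0") or raw.lower() == "off":
        return False
    return None


def silent_stream_override(cmdline: str) -> Optional[bool]:
    """Return the value forced for enable_silent_stream, or None if not given.

    None means the built-in default (from CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM)
    applies. Only the exact underscore spelling of PARAM is matched here.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key != PARAM:
            continue
        parsed = True if not sep else parse_kernel_bool(raw)
        if parsed is not None:
            value = parsed  # assumed: later occurrences override earlier ones
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("enable_silent_stream override:", silent_stream_override(f.read()))
    except OSError:
        print("/proc/cmdline not available on this system")
```

A result of `False` means the workaround is active. `None` means the kernel's compiled-in default is in effect.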
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c14
--- Comment #14 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c15
--- Comment #15 from Imnotgivingmy nametoamachine
Heh, it was a blind shot.
OK, then the next question is why this causes a problem.
Your hardware has only the Intel graphics chip and no discrete GPU for rendering, right? Just to be sure which driver is involved here.
Exactly so: only Intel UHD Graphics 600 and no discrete GPU. I'm not sure how the HDMI port is wired inside the laptop, i.e. whether it goes directly to the GPU or through some other controller. Would a motherboard schematic be useful?
Imnotgivingmy nametoamachine
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c16
--- Comment #16 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c17
--- Comment #17 from Imnotgivingmy nametoamachine
OK, then I need to ping Intel audio people.
Since the issue is with HDMI audio: can you play audio via HDMI at all, at least without a hang?
Yes, I can play audio over HDMI on the TV just fine with Linux 5.8 or earlier, or with Linux 5.9 and later when CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is disabled. Should I close the issue on DRM Intel?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c18
--- Comment #18 from Imnotgivingmy nametoamachine
(In reply to Takashi Iwai from comment #16)
OK, then I need to ping Intel audio people.
Since the issue is with HDMI audio: can you play audio via HDMI at all, at least without a hang?
Yes, I can play audio over HDMI on the TV just fine with Linux 5.8 or earlier, or with Linux 5.9 and later when CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is disabled.
Should I close the issue on DRM Intel?
Oh, another detail that might be useful: even with CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM enabled, this bug doesn't happen if the screen is connected *before* turning on the computer, and HDMI audio works too. But that isn't reliable. Reconnecting the external screen, changing the resolution, or even just changing the screen mode (switching to mirror, for example) will then trigger the bug and crash the system.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c19
Kai Vehmanen
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c20
--- Comment #20 from Imnotgivingmy nametoamachine
This is probably happening in linux/sound/pci/hda/patch_hdmi.c:sync_eld_via_acomp(). Commenting out the "silent_stream_enable()" and "silent_stream_disable()" calls in that function would be an interesting experiment on a setup that triggers this. It could be the snd_hda_power_up_pm() call, but that seems unlikely, so most probably something goes wrong in silent_stream_enable().
Say no more! That's one kernel compilation that might yield concrete results, so I can bear it (my current hardware is very weak, and compiling the kernel takes over an hour at best, not a few minutes or even seconds as it probably does on your development machines). I'll do it tomorrow since it's late here. Or, if Takashi can spoil me a bit more with a patched kernel from OBS, that would be awesome *wink* *wink*. Also, speaking of OBS, is it open to everyone or just to SUSE developers? If I can use it, I could probably learn to build patched test kernels myself, so I don't waste Takashi's time with boring one-off test builds.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183872#c21
--- Comment #21 from Takashi Iwai