[Bug 1217344] New: Linux 6.6.1 fails to boot
https://bugzilla.suse.com/show_bug.cgi?id=1217344

Bug ID: 1217344
Summary: Linux 6.6.1 fails to boot
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: neronio@outlook.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

The default 6.6.1 kernel installed on my Tumbleweed system does not boot. Upon selecting it in GRUB, I see:
"Loading Linux 6.6.1-1-default ..."
"Loading initial ramdisk ..."
and the system hangs on this screen forever.
My workstation's diagnostic LEDs turn on in a pattern that, according to the service manual, indicates an "other type of failure".

The workstation is a Dell T5600.
CPU: 2x Intel Xeon E5-2680 (v1)
RAM: 8x 8 GB DDR3 1600 MHz
GPU: Nvidia NVS510, I'm using nouveau drivers.

My kernel cmdline:
root=UUID=5136bb35-acaa-4796-b463-8c9ef307025d splash=silent quiet mitigations=auto

The latest default 6.5.9 kernel works fine on this machine.
I'm available for all kinds of testing/debugging, including kernel recompilation with experimental patches and the like.

--
You are receiving this mail because:
You are the assignee for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c2
--- Comment #2 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #1)
Could you try to boot without native graphics via nomodeset boot option? If it boots, the problem is in the graphics driver. If it still doesn't boot, it's something else.
Sorry, I forgot to mention I have already tried this, and it did not boot.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c4
--- Comment #4 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #3)
Hm, then keep nomodeset and also drop quiet and splash=silent options. This will show more lines at boot and we may see at which line it hangs up.
If this doesn't show any useful lines, we might need to enable the early printk.
... or it might be something to do with the CPU microcode loading. In that case, you can try to pass "dis_ucode_ldr" boot option to disable CPU microcode.
nomodeset without quiet and splash=silent changes nothing.
Adding dis_ucode_ldr changes nothing.
I've tried enabling earlyprintk (not sure I did it the right way) by removing quiet and splash=silent and adding
earlyprintk=vga,keep
and
earlyprintk=serial,ttyS0,keep
and this too changed nothing. During this earlyprintk test, I forgot to add nomodeset, though.
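The boot-option variants tried in this comment can be written down as a small sketch. The helper name `debug_cmdline` is hypothetical and not part of any tool; it only edits a command-line string, starting from the cmdline given in the original report, and assumes the result is then entered by hand in the GRUB editor. Each variant corresponds to one test boot, with the two earlyprintk targets kept in separate variants.

```python
def debug_cmdline(cmdline, drop=("quiet", "splash=silent"), add=("nomodeset",)):
    """Return cmdline without the options in `drop`, with `add` appended once.

    Hypothetical helper for illustration only: it manipulates a string and
    does not touch GRUB or the running system.
    """
    kept = [opt for opt in cmdline.split() if opt not in drop]
    for opt in add:
        if opt not in kept:
            kept.append(opt)
    return " ".join(kept)


# Kernel cmdline as given in the original report.
original = (
    "root=UUID=5136bb35-acaa-4796-b463-8c9ef307025d "
    "splash=silent quiet mitigations=auto"
)

# One variant per test boot.
variants = [
    debug_cmdline(original),
    debug_cmdline(original, add=("nomodeset", "dis_ucode_ldr")),
    debug_cmdline(original, add=("nomodeset", "earlyprintk=vga,keep")),
    debug_cmdline(original, add=("nomodeset", "earlyprintk=serial,ttyS0,keep")),
]

for variant in variants:
    print(variant)
```

Keeping nomodeset in every variant avoids the omission mentioned above, where the earlyprintk test was run without it.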
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c5
--- Comment #5 from Tommaso Fonda <neronio@outlook.com> ---
From my previous comment it sounds like I added both the vga and serial earlyprintk parameters at the same time. This is not the case: I've tested one first, and then the other, to no avail.
https://bugzilla.suse.com/show_bug.cgi?id=1217344

Tommaso Fonda <neronio@outlook.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Linux 6.6.1 fails to boot   |Linux 6.6.x fails to boot
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c6
--- Comment #6 from Tommaso Fonda <neronio@outlook.com> ---
The same occurs on Linux 6.6.2 too.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c7
--- Comment #7 from Tommaso Fonda <neronio@outlook.com> ---
Sorry for the bump, is there anything else we can try to debug this issue, or shall I start bisecting all the patches in the 6.5 -> 6.6 chain to find the problematic one? It sounds very time-consuming, but if it's my only option, I'll have to do it...
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c10
--- Comment #10 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #8)
At first, try to check whether the latest 6.7-rc still suffers from the same problem. If it boots, it's something specific to 6.6.x, and we need to find an upstream fix.
If 6.7-rc still doesn't boot, it has to be reported upstream. But without a proper log, it's a bit difficult to know to whom to report it. I guess you'd be asked to perform git bisect in such a case.
Thanks, I will build 6.7-rc as soon as possible and let you know.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c13
--- Comment #13 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #12)
There is already a kernel package for 6.7-rc available in OBS Kernel:HEAD repo. It can be tested quickly as well as the kernel in OBS Kernel:stable repo.
I suppose you've tested only the Leap standard kernels, so far, right? i.e. you didn't build kernels by yourself?
Nice to know! I'll test it later today. So far, I've only tested official Tumbleweed 6.6.x kernels. I did not build 6.6.x myself yet.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c16
--- Comment #16 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #12)
There is already a kernel package for 6.7-rc available in OBS Kernel:HEAD repo. It can be tested quickly as well as the kernel in OBS Kernel:stable repo.
I suppose you've tested only the Leap standard kernels, so far, right? i.e. you didn't build kernels by yourself?
6.7 rc4 from Kernel:HEAD doesn't boot either. Bisection, here I come... I guess it makes no sense to report this upstream without any log, right? I shall bisect and find the problematic patch in advance.
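For a rough idea of the effort involved: bisection is a binary search between a known-good and a known-bad kernel, so the number of test boots grows with the logarithm of the number of commits in the range. The sketch below illustrates that idea only and is not how git bisect is implemented. It assumes a linear history; the commit names, the position of the culprit, and the `boots` predicate are all hypothetical stand-ins for "build this commit and see whether it boots".

```python
def first_bad(commits, boots):
    """Binary search for the first commit that fails to boot.

    commits: ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad.
    boots:   callable returning True if the given commit boots.
    Returns (first bad commit, number of test boots performed).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if boots(commits[mid]):
            lo = mid  # regression was introduced after mid
        else:
            hi = mid  # regression is at mid or earlier
    return commits[hi], tests


# Hypothetical linear history of 1001 commits; the real 6.5 -> 6.6 range
# is not linear and its size is not taken from the report.
commits = [f"c{i}" for i in range(1001)]
culprit_index = 700  # hypothetical position of the regression


def boots(commit):
    # Stand-in for building and booting a kernel at this commit.
    return int(commit[1:]) < culprit_index


found, tests = first_bad(commits, boots)
print(found, tests)
```

With 1000 commits between the good and the bad end, the search needs at most 10 test boots, which is why a trimmed kernel configuration that shortens each build matters more than the size of the range.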
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c18
--- Comment #18 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #17)
Yeah, even if you can't chase at the end, it'd be helpful to narrow down the regression range.
I'm interested in whether you can boot your local built 6.5.x kernel properly. It might be something else, such as grub.
Below you can find some info on reducing the kernel build time: https://docs.kernel.org/admin-guide/quickly-build-trimmed-linux.html This will help especially for git bisection.
I've been building my custom kernels for a long time, and my custom 6.5.x boots fine (just like TW's 6.5.x). I'll keep you updated.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c21
--- Comment #21 from Tommaso Fonda <neronio@outlook.com> ---
(In reply to Takashi Iwai from comment #20)
(In reply to Frank Krüger from comment #19)
Just as a guess, this might be related: https://bugzilla.kernel.org/show_bug.cgi?id=218173#c20
Thanks, that looks suspicious, indeed. FWIW, the corresponding upstream commit is:
a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a
x86/efistub: Avoid legacy decompressor when doing EFI boot
Tommaso, could you try to revert the commit? It'll lead to a conflict in arch/x86/include/asm/efi.h, but it should be trivially resolvable.
Meanwhile I'm building a test kernel package with the revert in OBS home:tiwai:bsc1217344 repo.
Once we confirm it's the same problem, you can join the upstream bugzilla entry to help them resolve the issue properly.
Yes!!! Reverting that commit fixed the issue. I'll leave a message in the upstream bug report right now.
https://bugzilla.suse.com/show_bug.cgi?id=1217344#c24
--- Comment #24 from Frank Krüger <fkrueger@mailbox.org> ---
(In reply to Takashi Iwai from comment #22)
The upstream fix commit 50d7cdf7a9b1 landed in Linus' tree:
efi/x86: Avoid physical KASLR on older Dell systems
I'm going to backport the fix.
JFYI: The fix is in kernel-default-6.6.7-1.1.g6869d09.x86_64 from Kernel:stable.