[Bug 1017783] New: Boot issues with Tumbleweed and Kernel 4.9 (booting stops at Loading initial ramdisk; kernel 4.9 does not boot)
http://bugzilla.suse.com/show_bug.cgi?id=1017783

Bug ID: 1017783
Summary: Boot issues with Tumbleweed and Kernel 4.9 (booting stops at Loading initial ramdisk; kernel 4.9 does not boot)
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: r.pretki@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

My system: openSUSE Tumbleweed.
My hardware: Toshiba A200-209 laptop http://www.toshiba.eu/discontinued-products/satellite-a200-209/

The system worked properly with kernel 4.8.14-1-default. I updated the kernel to the latest versions currently available for Tumbleweed: kernel-default-4.9.0-1.1.x86_64 (http://download.opensuse.org/tumbleweed/repo/oss/suse/x86_64/) and kernel-default-4.9.0-2.1.x86_64 (http://download.opensuse.org/repositories/openSUSE:/Factory:/Update/standard...).

Every time I try to start the system, the kernel does not boot; instead the machine immediately reboots automatically (before loading GRUB2). Booting stops (hangs) at "Loading initial ramdisk" and kernel 4.9 does not boot. Only kernels older than 4.9 work.

I also tried installing kernel-default-4.9.0-3.1.gf4d3acd.x86_64 (http://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/), but the result is the same every time.

Reproducible: Always.

Anticipating possible questions:
(1) Yes, I ran zypper ref and zypper dup. It does not help.
(2) journalctl: there are no logs at all (I do not see anything). Removing the "quiet" option in GRUB2 did not help. No information or logs.

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c1
Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Max Lin
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c2
Emanuel Castelo
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c4
--- Comment #4 from Robert Pretki
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c5
--- Comment #5 from Robert Pretki
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c6
--- Comment #6 from Robert Pretki
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c7
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c8
Takashi Iwai
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c9
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c10
Guy Dawson
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c11
--- Comment #11 from Guy Dawson
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c12
Borislav Petkov
I've had exactly the same problem on an HP GL360 Gen G4p server.
Oh good, yours is not a laptop: can you catch serial console on it when booting 4.9 and upload it here?
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c13
--- Comment #13 from Guy Dawson
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c14
--- Comment #14 from Robert Pretki
You can make a video of the boot and upload it somewhere so that I can take a look at what exactly happens.
I have just sent a link to the video (MKV, x264) to your e-mail address. It includes:
(1) boot with the default settings;
(2) boot without "quiet";
(3) boot with "dis_ucode_ldr" (instead of "quiet");
(4) boot without the "initrd" line;
(5) boot without the "initrd" line and with the "log_buf_len=16M ignore_loglevel initcall_debug" parameters (instead of "quiet");
(6) boot with the "initrd" line and with the "log_buf_len=16M ignore_loglevel initcall_debug" parameters (instead of "quiet").
Unfortunately, there were no changes and no additional messages. I'm sorry if I'm doing something wrong. Please let me know if I should do something differently.
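The boot variants listed above all come down to the same edit of the kernel command line: drop "quiet" and append a set of debug parameters. A minimal sketch of that transformation follows. The helper name debug_cmdline and the sample root= value are hypothetical and not from the report; only the parameter names come from the thread.

```shell
# Hypothetical helper (not from the bug report): print a kernel command line
# with "quiet" removed and the given extra parameters appended.
# Usage: debug_cmdline "<original command line>" param1 param2 ...
debug_cmdline() {
    cmdline="$1"
    shift
    out=""
    # Word-split the original command line on whitespace and skip "quiet".
    for word in $cmdline; do
        [ "$word" = "quiet" ] && continue
        out="${out:+$out }$word"
    done
    # Append the extra parameters in the order given.
    for extra in "$@"; do
        out="${out:+$out }$extra"
    done
    printf '%s\n' "$out"
}

# Example with an assumed original command line (the root= value is made up):
debug_cmdline "root=/dev/sda2 quiet splash=silent" \
    log_buf_len=16M ignore_loglevel initcall_debug
```

The printed line is what would be typed on the "linux" line of the GRUB2 entry editor; the helper itself changes nothing on the system.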
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c16
--- Comment #16 from Borislav Petkov
I have just sent to your e-mail address a link to the video (MKV, x264), which includes: (1) boot with the default settings; (2) boot without "quiet"; (3) boot with "dis_ucode_ldr" (instead of "quiet"); (4) boot without "initrd" line;
Thanks for doing that. It doesn't tell us much but at least it doesn't look like the initrd changes anything so it must be something happening very early causing the machine to triple-fault.

The nasty thing is that debugging such an issue on a laptop is always a PITA as we can't log serial console.

One thing I could think of is to try booting with "nosmp nomodeset" so that we init only one CPU and thus avoid a whole lot of init code and see whether that works. The "nomodeset" should switch off the gfx modesetting so that we can maybe see more output from the kernel.

Before you reboot though, edit /etc/default/grub and change GRUB_TERMINAL to "console":

GRUB_TERMINAL="console"

Then do:

# grub2-mkconfig -o /boot/grub2/grub.cfg

and reboot. When you're back in the grub menu, remove "quiet" and add "nosmp nomodeset ignore_loglevel" and then press F10.

And upload the video again. Hopefully, we'll be able to see more output.

Thanks.
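The GRUB_TERMINAL change described above can also be scripted. The following is a minimal sketch and not part of the original comment: the function name set_grub_terminal is hypothetical, and it assumes GNU sed (for the -i option) and a file that uses the usual KEY="value" layout of /etc/default/grub.

```shell
# Hypothetical helper: force GRUB_TERMINAL="console" in a grub defaults file.
# Usage: set_grub_terminal /path/to/file
set_grub_terminal() {
    file="$1"
    if grep -q '^#\{0,1\}[[:space:]]*GRUB_TERMINAL=' "$file"; then
        # Replace an existing (possibly commented-out) GRUB_TERMINAL= line.
        # Other keys such as GRUB_TERMINAL_OUTPUT do not match because the
        # pattern requires "=" directly after GRUB_TERMINAL.
        sed -i 's/^#\{0,1\}[[:space:]]*GRUB_TERMINAL=.*/GRUB_TERMINAL="console"/' "$file"
    else
        # No such line yet: append one.
        printf '%s\n' 'GRUB_TERMINAL="console"' >> "$file"
    fi
}

# On a real system this would be run as root, followed by the command from
# the comment above:
#   set_grub_terminal /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Trying the function on a copy of the file first and comparing the result with diff is a cheap way to confirm it touches only the intended line.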
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c17
--- Comment #17 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c18
--- Comment #18 from Emanuel Castelo
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Dominique Leuenberger
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Joey Lee
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c19
Stan Kain
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Dominique Leuenberger
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c20
--- Comment #20 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c21
--- Comment #21 from Dominique Leuenberger
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c22
--- Comment #22 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c23
--- Comment #23 from Joey Lee
Possibly related upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=192111
In bko#192111, Ivan bisected the regression to commit 8b355e3bc. That result is plausible because acpi/osl.c has used synchronize_rcu_expedited() to speed up the grace period in acpi_os_map_cleanup() since v3.19: 74b51ee152b ACPI / osl: speedup grace period in acpi_os_map_cleanup. The details of 8b355e3bc need comment from an RCU expert.
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c26
--- Comment #26 from Joey Lee
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c27
Bruno Pesavento
Could you please help to test it?
FYI, successfully booted that kernel on Leap 42.2 on ASUS N551 (i7-4720HQ+GTX960M), everything apparently fine so far. Previously it was impossible to boot kernel 4.9.0-4.g1af4b0f-default from Kernel:/stable/standard/.
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c28
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c29
--- Comment #29 from Emanuel Castelo
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c32
--- Comment #32 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c33
--- Comment #33 from Stan Kain
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c34
--- Comment #34 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c35
--- Comment #35 from Stan Kain
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c36
--- Comment #36 from Emanuel Castelo
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c37
--- Comment #37 from Bruno Pesavento
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c38
--- Comment #38 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c39
--- Comment #39 from Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c40
--- Comment #40 from Stan Kain
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c41
--- Comment #41 from Bruno Pesavento
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c42
Borislav Petkov
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c44
--- Comment #44 from Borislav Petkov
I'm not sure if it's the same problem,
Well, "freezes at boot" is not the same as "RCU stalls".
but with kernel 4.9.0-2 I can boot fine, but after a few minutes I start getting RCU stalls. That makes processes hang and rebooting impossible (systemd shows that it's waiting for services to turn off, but never kills them, and pressing ctrl+alt+del makes it hang too). This is a desktop system with i5-3470.
Workaround is to install kernel 4.9.3 from the Kernel:stable repo.
Sounds like the fix for your issue went into 4.9.3 and you're good to go. Or what is the problem?
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c45
--- Comment #45 from Dainius Masiliunas
Well, "freezes at boot" is not the same as "RCU stalls".
Yes, though from what I can see this issue was related to RCU as well, hence why I'm wondering about that.
Sounds like the fix for your issue went into 4.9.3 and you're good to go. Or what is the problem?
Well, it's not in base Tumbleweed repository yet. And I'm not sure if the fix for this issue is in base Tumbleweed yet either (I see references to 4.9.2 here).
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c46
--- Comment #46 from Borislav Petkov
Well, it's not in base Tumbleweed repository yet. And I'm not sure if the fix for this issue is in base Tumbleweed yet either (I see references to 4.9.2 here).
You just have to be patient - they will all land in 4.9.x sooner rather than later. Also, there's nothing wrong with running 4.9.3 from Kernel:stable as long as your machine works fine with it. Actually, Kernel:stable is what is going to land in tumbleweed later so you're previewing it :-)
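To see whether a given machine is already running a kernel at or past the 4.9.3 release mentioned above, the release string can be compared against a minimum version. This is only a sketch: the function name kernel_at_least is hypothetical, it relies on the -V (version sort) option of GNU sort, and it compares only the upstream version in front of the first "-", ignoring the package release suffix.

```shell
# Hypothetical helper: succeed if kernel release $1 (e.g. "4.9.0-2-default")
# is at least version $2 (e.g. "4.9.3").
kernel_at_least() {
    base="${1%%-*}"   # strip everything from the first "-": "4.9.0-2-default" -> "4.9.0"
    # With version sort, the minimum comes out first exactly when base >= minimum.
    lowest=$(printf '%s\n%s\n' "$2" "$base" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Check the running kernel against the version named in the thread.
if kernel_at_least "$(uname -r)" 4.9.3; then
    echo "running kernel is 4.9.3 or newer"
else
    echo "running kernel is older than 4.9.3"
fi
```

The threshold is a parameter on purpose, since the thread refers to both 4.9.2 and 4.9.3 and does not pin down which stable release carries which fix.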
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c47
patrick shanahan
http://bugzilla.suse.com/show_bug.cgi?id=1017783
http://bugzilla.suse.com/show_bug.cgi?id=1017783#c48
Borislav Petkov