[Bug 890702] New: pae kernel doesn't boot - appears to cause NMI when unpacking initramfs
https://bugzilla.novell.com/show_bug.cgi?id=890702
https://bugzilla.novell.com/show_bug.cgi?id=890702#c0

Summary: pae kernel doesn't boot - appears to cause NMI when unpacking initramfs
Classification: openSUSE
Product: openSUSE 13.1
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: per@computer.org
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:11.0) Gecko/20100101 Firefox/11.0

Hardware: Proliant DL580G2, HT, 4 CPUs, 32bit, 12Gb RAM. ("hamburg")
Software: openSUSE 13.1+updates
Process: PXE+ssh install from download.opensuse.org

When trying to install, the initial boot kept failing. I hooked up a serial console, which showed the system generated an NMI apparently when unpacking initramfs. I then tried an installation with the -default kernel which worked fine.

Reproducible: Always

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=890702#c1
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=890702#c2
--- Comment #2 from Per Jessen
https://bugzilla.novell.com/show_bug.cgi?id=890702#c3
--- Comment #3 from Michal Hocko
[ 0.000000] Kernel command line: BOOT_IMAGE=openSUSE root=/dev/disk/by-id/cciss-3600508b100184155435050384d320013-part1 noresume maxcpus=0 console=ttyS0,115200,8n1
[...]
[ 1.459732] Failed to execute /init
[ 1.463297] Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
I do not see initrd in the command line. Are you sure you don't need it? Also have you tried to follow recommendations from Documentation/init.txt (in the kernel source tree)?
https://bugzilla.novell.com/show_bug.cgi?id=890702#c4
--- Comment #4 from Per Jessen
[ 0.000000] Kernel command line: BOOT_IMAGE=openSUSE root=/dev/disk/by-id/cciss-3600508b100184155435050384d320013-part1 noresume maxcpus=0 console=ttyS0,115200,8n1
[...]
[ 1.459732] Failed to execute /init
[ 1.463297] Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
I do not see initrd in the command line. Are you sure you don't need it?
With the kernel that works, the initrd isn't mentioned in the command line either:
Kernel command line: BOOT_IMAGE=openSUSE root=/dev/disk/by-id/cciss-3600508b100184155435050384d320013-part1 noresume
Looking at other systems, there is also no initrd mentioned in the command line arguments.
Also have you tried to follow recommendations from Documentation/init.txt (in the kernel source tree)?
Uh no - when one kernel works, and the other doesn't, it seems quite clear. Is there anything specific in Documentation/init.txt you believe will help?
https://bugzilla.novell.com/show_bug.cgi?id=890702#c5
--- Comment #5 from Michal Hocko
Uh no - when one kernel works, and the other doesn't, it seems quite clear.
I do not see why pae should make any difference that early during the boot. So it doesn't sound entirely clear to me.
Is there anything specific in Documentation/init.txt you believe will help?
At least debug cmd option might tell us more.
https://bugzilla.novell.com/show_bug.cgi?id=890702#c6
--- Comment #6 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=890702#c7
--- Comment #7 from Per Jessen
(In reply to comment #4) [...]
Uh no - when one kernel works, and the other doesn't, it seems quite clear.
I do not see why pae should make any difference that early during the boot. So it doesn't sound entirely clear to me.
I meant it is clearly a kernel issue, not related to the command line and not the initrd.
Is there anything specific in Documentation/init.txt you believe will help?
At least debug cmd option might tell us more.
Will do. Although doesn't that only affect the init processing? (which I never get to).
https://bugzilla.novell.com/show_bug.cgi?id=890702#c8
--- Comment #8 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=890702#c9
--- Comment #9 from Per Jessen
In the initial comment you've said that the -default kernel boots just fine. Was that a 32b -default kernel? I suppose so but wanted to be sure.
Sorry, yes, that was the 32bit kernel-default package.
Also have you tried to boot 64b kernel on that machine?
No, the machine doesn't support 64bit.
Finally have you ever tried to install different PAE kernels on that machine? E.g. the current upstream vanilla? There is a HEAD repository in build service where you can find it.
I am not certain, but I am pretty certain I have had 12.1 running with -pae on this machine earlier. I'll try out some older kernels and see what happens.
https://bugzilla.novell.com/show_bug.cgi?id=890702#c10
--- Comment #10 from Per Jessen
I am not certain, but I am pretty certain I have had 12.1 running with -pae on this machine earlier. I'll try out some older kernels and see what happens.
Have just installed and booted with 3.1.0-1.2-pae, works fine.
https://bugzilla.novell.com/show_bug.cgi?id=890702#c11
--- Comment #11 from Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=890702#c12
--- Comment #12 from Per Jessen
https://bugzilla.novell.com/show_bug.cgi?id=890702#c13
Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=890702#c14
--- Comment #14 from Per Jessen
Sounds like you didn't get the hw error this time, assuming it is a hw error this NMI reports. How reproducible is this issue?
I doubt if the NMI is a hardware error, but that's just my gut feeling. The issue is easily reproducible, except right now when I booted 3.11.10-17-pae with all the debug options you requested - this time it worked, no NMI. I'll upload both console logs in a minute, then try 3.11.10 again without the debug options.
https://bugzilla.novell.com/show_bug.cgi?id=890702#c15
--- Comment #15 from Per Jessen
https://bugzilla.novell.com/show_bug.cgi?id=890702#c16
--- Comment #16 from Per Jessen
https://bugzilla.novell.com/show_bug.cgi?id=890702#c17
Per Jessen
https://bugzilla.novell.com/show_bug.cgi?id=890702#c18
--- Comment #18 from Borislav Petkov
I doubt if the NMI is a hardware error, but that's just my gut feeling.
The NMI is used to report a hw error.
I noticed it automagically pulled in kernel-firmware. Presumably this means kernel-pae did not. Packaging issue?
Do you start getting the error again if you forcibly remove kernel-firmware and reboot the PAE kernel?
https://bugzilla.novell.com/show_bug.cgi?id=890702#c
Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=890702#c19
Per Jessen
I doubt if the NMI is a hardware error, but that's just my gut feeling.
The NMI is used to report a hw error.
I noticed it automagically pulled in kernel-firmware. Presumably this means kernel-pae did not. Packaging issue?
Do you start getting the error again if you forcibly remove kernel-firmware and reboot the PAE kernel?
Removed kernel-firmware, rebooted, no problem. I mistook kernel-firmware for being the microcode updates, so I also removed ucode-intel, and rebooted. Again no problem. I am unable to reproduce the problem, I am closing for now.