[Bug 1177009] New: Leap 15.2 stopped working in KVM with ovmf-ia32 firmware
http://bugzilla.opensuse.org/show_bug.cgi?id=1177009

            Bug ID: 1177009
           Summary: Leap 15.2 stopped working in KVM with ovmf-ia32 firmware
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.2
          Hardware: x86-64
                OS: openSUSE Leap 15.2
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: nwr10cst-oslnx@yahoo.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Build Identifier:

I'm reporting this as a kernel bug, but it might be a bug elsewhere. And yes, I know that this is not a supported configuration.

Situation: I have both Leap 15.2 and Tumbleweed installed side by side in a KVM virtual machine, using 32-bit efi for booting. Both Leap and Tumbleweed are 64-bit. I originally set up this VM under Leap 15.0 (as KVM host machine). Everything had been working well.

However, I recently tried to boot the Leap 15.2 system in that VM, and it failed to boot. I see the kernel and initrd being loaded. And then there is a reset and I get back to the boot screen. However, Tumbleweed continues to boot without a problem on that virtual machine.

I am booting with "grub2-i386-efi", which is installed for both Tumbleweed and for Leap 15.2. This does not appear to be a grub2 problem, because the kernel and initrd are being loaded.

Before today, I most recently booted Leap 15.2 on Sept 22, and I updated it at that time. I then rebooted (successfully) to check after the update. I have not been able to boot since, until my workaround today.

The KVM host is also running Leap 15.2, and I did apply some updates since Sept 22. My guess is that one of those updates affected how the virtualization is working.

I have a similar install of Leap 15.2 on an external USB drive, and I have that set up so that it can boot with legacy BIOS, with 64-bit efi and with 32-bit efi. Testing with that, it also fails to boot the same virtual machine with 32-bit efi, but it successfully boots with either legacy BIOS or with 64-bit efi. So it looks as if whatever is going wrong has to do with communication between the kernel and the 32-bit efi firmware.

Booting the virtual machine to Tumbleweed, then setting up for rescue/chroot, I have installed kernel 5.8.11-1.gf4bb27a-default from the stable kernels repo. And now Leap 15.2 does boot successfully with that kernel. It does not boot with any of the 5.3.x kernels for Leap 15.2 that I have tested.

Reproducible: Always

--
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c1

--- Comment #1 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Created attachment 841980
  --> http://bugzilla.opensuse.org/attachment.cgi?id=841980&action=edit
Output of "last" command

Here's output of the "last" command. Entries for today (Sept 26) are all with my workaround of using the 5.8.11 kernel. Before today, you can see that previous kernels worked. On Sept 22, you can see that I booted, then rebooted. The reboot was after an update.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c2

--- Comment #2 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Created attachment 841981
  --> http://bugzilla.opensuse.org/attachment.cgi?id=841981&action=edit
Software updates on KVM host since 9/21

This is selected from the output of

    grep '^2020-09-2.*install' /var/log/history

on the host system (where libvirt is running for the virtualization). I deleted lines with dates on 9/21 or earlier.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c4

--- Comment #4 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
> Per your description, the issue happened after kernel upgrade in the VM.

No, that's not quite right.

The system booted fine with all kernels, up through Sept 22. This included booting with 5.3.18-lp152.41. However, something changed (I'm not sure what), and now neither 5.3.18-lp152.41 nor 5.3.18-lp152.36 will boot. The kernel from the release iso also will not boot, but it did at one time.

I originally installed Leap 15.2 into this VM on Aug 30, 2019 when it was an alpha release. I'm checking that date from the date of "/lost+found". And it has worked well ever since, until a few days ago.

To check the possibility that the NVRAM might be corrupt, I recreated the VM (with a different name) using "virt-install". And it still would not boot those kernels. So I don't think it is an NVRAM corruption problem.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c6

--- Comment #6 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Thank you for that suggestion.

So grub2-i386-efi version 2.04-lp152.7.15.1 is broken. But everything works correctly with version 2.04-lp152.7.12.1.

My tests:

I downgraded all grub2 packages to the version in the main repo. I then ran grub2-install (this is not automatic for i386-efi). On reboot, all kernels work.

I then upgraded again to the latest version, but I did not run "grub2-install". Again, everything worked.

Then I ran "grub2-install", and after that I could not boot any of the Leap kernels, so I booted to kernel 5.8.11.

Finally, I downgraded only grub2-i386-efi to version 2.04-lp152.7.12.1, and ran "grub2-install". And everything works correctly (all kernels boot).
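
The last test in comment #6 (downgrade only grub2-i386-efi, then rerun grub2-install) could be scripted roughly as follows. This is a sketch under assumptions: the thread does not say how the downgrade was done, so the use of zypper with --oldpackage is a guess at one way to do it, and the function name and the DRY_RUN variable are invented for illustration.

```shell
# Hypothetical helper: install a known-good grub2-i386-efi version and then
# rewrite the i386-efi boot files. DRY_RUN is a name invented for this sketch;
# when it is set, the commands are printed instead of executed.
downgrade_grub_i386_efi() {
    version="$1"
    [ -n "$version" ] || { echo "usage: downgrade_grub_i386_efi VERSION" >&2; return 2; }

    run=
    [ -n "${DRY_RUN:-}" ] && run=echo

    # --oldpackage allows zypper to install a version older than the current one.
    $run zypper install --oldpackage "grub2-i386-efi=$version" || return 1

    # Per comment #6, updating the package does not rerun grub2-install for
    # i386-efi, so the boot files have to be rewritten explicitly.
    $run grub2-install --target=i386-efi
}
```

Running `DRY_RUN=1 downgrade_grub_i386_efi 2.04-lp152.7.12.1` only prints the two commands, which is a cheap way to check them before running the function as root without DRY_RUN.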

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c7

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mchang@suse.com

--- Comment #7 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
I am adding Michael Chang to the CC list, because I think he maintains grub.

Current situation:

grub2-i386-efi version 2.04-lp152.7.15.1 is broken. The fix is to install the previous version (2.04-lp152.7.12.1).

In Tumbleweed, version 2.04-17.1 is broken. To fix this, I installed 2.04-16.1, which I found in the history archives (I think for 20200915).

In both cases, the broken version of grub2-i386-efi will boot the Tumbleweed 5.8 kernels, but fails to boot the Leap 5.3 kernels.

In response, I am changing my testing procedure.

Previous testing procedure: When I saw that this package was updated, I ran:

    grub2-install --target=i386-efi

and then I checked whether it would boot the system.

New testing procedure: When the package has been updated:

    cd /boot/grub2
    rm -rf i386-efi.old
    mv i386-efi i386-efi.old
    grub2-install --target=i386-efi

and then check that it can boot both Tumbleweed and Leap 15.2.

Moving the directory is to allow easier recovery in case of problems.

Note that all of my testing is in a KVM virtual machine. I do not have a physical machine with this firmware, but such machines do exist. I'm guessing that I'm probably the only person routinely testing this package when it is updated.
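
The new testing procedure from comment #7 can be wrapped in a small function so that the backup step is never skipped. This is only a sketch: the function name and the GRUB_DIR and DRY_RUN variables are invented here, and GRUB_DIR exists only so that the directory rotation can be tried in a scratch directory together with DRY_RUN (grub2-install itself is not told about it).

```shell
# Hypothetical helper: keep one backup of the installed i386-efi modules,
# then reinstall. GRUB_DIR and DRY_RUN are names invented for this sketch.
refresh_i386_efi() {
    grub_dir="${GRUB_DIR:-/boot/grub2}"

    # Nothing to back up if the modules were never installed.
    [ -d "$grub_dir/i386-efi" ] || { echo "no $grub_dir/i386-efi" >&2; return 1; }

    # Keep exactly one previous copy for easy recovery.
    rm -rf "$grub_dir/i386-efi.old"
    mv "$grub_dir/i386-efi" "$grub_dir/i386-efi.old"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "would run: grub2-install --target=i386-efi"
    else
        grub2-install --target=i386-efi
    fi
}
```

If the new boot code turns out to be broken, the previous modules are still available in i386-efi.old, which is the recovery path the comment describes.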

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c8

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nwr10cst-oslnx@yahoo.com
              Flags|                            |needinfo?(nwr10cst-oslnx@yahoo.com)

--- Comment #8 from Michael Chang <mchang@suse.com> ---
Hi Neil,

Are you using the `linux` command in 32-bit efi for booting the kernel? If so, could you please revert to the old "working" grub, and try "linuxefi" there to see if the same problem can be reproduced?

The last modification in grub changes the linux command to use the efi handover protocol on efi platforms to boot the kernel, and the ia32 kernel is probably having some issues dealing with that.

TIA.
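
As background to the efi handover protocol mentioned in comment #8: the Linux x86 boot protocol has a two-byte xloadflags field at offset 0x236 of the kernel image, where bit 0 marks a 64-bit kernel and bits 2 and 3 advertise the 32-bit and 64-bit EFI handover entry points. The sketch below reads those bits from an image; the function name is invented. It only shows what the image advertises and cannot tell whether the entry point actually works on a given firmware.

```shell
# Hypothetical helper: print the xloadflags bits relevant to EFI handover.
# Offset 0x236 (decimal 566) and the bit numbers come from the Linux x86
# boot protocol; the function name is invented for this sketch.
efi_handover_bits() {
    image="$1"

    # Read the two bytes separately so the result does not depend on the
    # byte order of the host; the field itself is little-endian.
    set -- $(od -An -tu1 -j 566 -N 2 "$image")
    [ "$#" -eq 2 ] || { echo "cannot read xloadflags from $image" >&2; return 1; }

    flags=$(( $1 + ($2 << 8) ))
    echo "kernel_64=$(( flags & 1 )) handover_32=$(( (flags >> 2) & 1 )) handover_64=$(( (flags >> 3) & 1 ))"
}
```

It would be called with the path of a kernel image, for example `efi_handover_bits /boot/vmlinuz` (the path is only an example). A 64-bit kernel booted from 32-bit firmware needs the 32-bit handover entry, so handover_32 is the bit of interest here.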

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c9

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(nwr10cst-oslnx@yahoo.com) |

--- Comment #9 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Yes, I am using "linux".

I did just try with "linuxefi" (and "initrdefi") a few minutes ago, using the grub2-i386-2.04-lp152.7.15.1 and it does not work. It gives the same problem as using "linux".

This is Leap 15.2. The kernel is 64-bit (x64, not ia32). If I understand it correctly, "linuxefi" would try to load the kernel as an ia32 efi binary. But it is an x64 efi binary, so that should not work.

I don't know why the grub code does not recognize this and drop back to using "linux". That seems to work with the 5.8 kernel for Tumbleweed (also 64-bit), but not with the 5.3 kernel for Leap 15.2.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c10

--- Comment #10 from Michael Chang <mchang@suse.com> ---
(In reply to Neil Rickert from comment #9)
> I don't know why the grub code does not recognize this and drop back to using "linux". That seems to work with the 5.8 kernel for Tumbleweed (also 64-bit), but not with the 5.3 kernel for Leap 15.2

Because of "Secure Boot", grub just cannot provide any vehicle to bypass the efistub loader.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c11

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fvogt@suse.com
              Flags|                            |needinfo?(fvogt@suse.com)

--- Comment #11 from Michael Chang <mchang@suse.com> ---
If there's no secure boot support required for ia32, and since we used to come across some absurd x64 machines shipped with 32-bit efi firmware, maybe we should revert the last change made to i386-efi to make sure existing hardware can continue to work.

I am not sure whether this has any implication for kiwi again, since they might see different needs and prefer a different "default".

@Fabian, what do you think?

Thanks in advance.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c12

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(nwr10cst-oslnx@yahoo.com)

--- Comment #12 from Michael Chang <mchang@suse.com> ---
I have tested ovmf-ia32, and openSUSE Tumbleweed (64-bit) booted fine with the latest grub2-i386-efi (2.04-17.1). The kernel version is 5.8.12-1-default.

Hi Neil,

Just to make sure we're on the same page: it is not clear to me. In comment#7 you described grub version 2.04-17.1 as broken, but you also mentioned that

"In both cases, the broken version of grub2-i386-efi will boot the Tumbleweed 5.8 kernels, but fails to boot the Leap 5.3 kernels."

If some kernels work and others don't, maybe we have to include the kernel team to help, or we just revert the change made to i386-efi (comment#11), given that the 32-bit efi handover entry and 64/32 mixed mode is a bit too chaotic and we probably need to do more testing before switching.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c14

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(nwr10cst-oslnx@yahoo.com) |

--- Comment #14 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Responding to Michael at comment #12

Note that I originally reported this as a kernel bug, so Takashi is on the CC list for this bug.

Let me give a few more details that I may have missed.

I have both Tumbleweed and Leap 15.2 installed in the same VM. Currently, Leap 15.2 controls the booting. But it is easy to switch to have Tumbleweed control the booting.

Whichever controls the booting, the grub menu includes an entry to boot Tumbleweed and an entry to boot Leap 15.2. When I said that the Tumbleweed grub would not boot the 5.3 kernel, I was referring to booting Leap 15.2 from the Tumbleweed grub menu. I could try installing that kernel in Tumbleweed, but it wouldn't make any difference since the problem seems to be a failure to load the kernel. I did install a 5.8 kernel into Leap 15.2, and that was able to boot.

Using the working grub2-i386-efi (the previous version), I did try booting using "linuxefi" rather than "linux". And that successfully boots Tumbleweed (5.8.x kernel) but fails to boot Leap 15.2 (5.3.x kernel).

I have noticed, from another bug report, that Takashi has a repo with old Tumbleweed kernels back to 5.4.x. I may try some of those to see where the kernel behavior changes with "linuxefi". I have already tested kernel 5.7.11, which I still have in the Tumbleweed system. And that boots fine with "linuxefi".

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c15

--- Comment #15 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
A followup to my last comment.

In the final paragraph, I mentioned that Takashi has older kernels. I found that information in bug 1175908.

I have tried several of those older kernels in Tumbleweed with ia32 efi booting. I used "linuxefi" in my attempts to boot.

    kernel-default-5.6.15-1.1.gbfa465b.x86_64.rpm
    kernel-default-5.5.13-1.1.g0af205d.x86_64.rpm
    kernel-default-5.4.14-1.1.gfc4ea7a.x86_64.rpm

Those all boot without a problem. However:

    kernel-default-5.3.12-1.1.g60a2268.x86_64.rpm

will not boot with "linuxefi".

It looks as if there was a change between 5.3 kernels and 5.4 kernels.
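
The narrowing-down in comment #15 amounts to sorting the tested kernels by version and looking for the point where the result flips. A minimal sketch of that bookkeeping, assuming the results are recorded by hand as "version result" pairs (the function name and the "ok"/"fail" labels are invented here, and `sort -V` assumes GNU coreutils):

```shell
# Hypothetical helper: read "version result" lines on stdin and report each
# place where, in version order, a failing kernel is followed by a working one.
find_boundary() {
    sort -V | awk '
        prev_result == "fail" && $2 == "ok" {
            print "change between " prev_version " and " $1
        }
        { prev_version = $1; prev_result = $2 }
    '
}
```

Fed with the results reported in the thread, for example

    printf '%s\n' '5.6.15 ok' '5.5.13 ok' '5.4.14 ok' '5.3.12 fail' | find_boundary

it reports the 5.3.12 to 5.4.14 interval, which is the range later handed to the kernel team.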

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c16

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernel-bugs@opensuse.org
              Flags|                            |needinfo?(kernel-bugs@opensuse.org)

--- Comment #16 from Michael Chang <mchang@suse.com> ---
Hi Neil and Fabian,

Thanks a lot for all the valuable and informative feedback, especially Neil, who took the time to test different kernel versions; we can definitely use the results to pin down the cause.

Now setting needinfo to the kernel team, as this is about changes between the 5.3 and 5.4 kernels. Let's see what they come up with.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c17

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(nwr10cst-oslnx@yahoo.com)

--- Comment #17 from Michael Chang <mchang@suse.com> ---
Hi Neil,

I have verified that this kernel commit fixes the problem for me:

https://github.com/torvalds/linux/commit/4911ee401b7ceff8f38e0ac597cbf503d71...

The test package for kernel-5.3.12 with the above patch backported can be downloaded here:

https://download.opensuse.org/repositories/home:/michael-chang:/kernel/stand...

Would you please help to verify whether the test package also fixes the problem for you?

Thanks in advance.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c18

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(nwr10cst-oslnx@yahoo.com) |

--- Comment #18 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Yes, kernel-default-5.3.18-1.1.ge31647a from your repo is working fine in Leap 15.2, with the current grub2-i386-efi (version 2.04-lp152.7.15.1).

Thanks.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c19

--- Comment #19 from Michael Chang <mchang@suse.com> ---
Hi Neil,

Many thanks for taking the time to verify the patch. Now it's clear what's missing in the old kernel.

Let's wait for the response from the kernel team on whether they will cherry-pick the patch for a maintenance update.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c20

--- Comment #20 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
It turns out that this might be an issue also with x86_64.

I have Tumbleweed 32-bit installed on an external drive. And I have it set up so that it can boot with legacy mbr booting, with 32-bit efi booting and with 64-bit efi booting.

Yesterday, I connected the external drive to a system with 64-bit efi booting, but with secure-boot disabled. And the Tumbleweed 32-bit system would not boot. It used to boot. So I reverted grub2-x86_64-efi back to version 2.04-16.1 (the version just before the update around Sept 15th). And after going back to the old version, I can now boot it again with 64-bit efi.

By the way, to install x86_64-efi boot support, I use:

    grub2-install --target=x86_64-efi --removable --no-nvram

For further testing, I have set up a VM with Leap 15.2 and Tumbleweed 32-bit installed side by side. In the Leap install, I used version 2.04-lp152.7.12.1 of grub2-x86_64-efi so that it can boot Tumbleweed. Or I can directly use the grub2 boot code installed from Tumbleweed (version 2.04-16.1) to boot either. Creating that VM was tricky, because of bug 1177849.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c21

Michael Chang <mchang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(nwr10cst-oslnx@yahoo.com)

--- Comment #21 from Michael Chang <mchang@suse.com> ---
(In reply to Neil Rickert from comment #20)
> It turns out that this might be a issue also with x86_64.
> [snip]
> Yesterday, I connected the external drive to a system with 64-bit efi booting, but with secure-boot disabled. And the Tumbleweed 32-bit system would not boot. It used to boot. So I reverted grub2-x86_64-efi back to version 2.04-16.1 (the version just before the update around Sept 15th). And after going back to the old version, I can now boot it again with 64-bit efi.

I'm confused about why you would have to revert grub2-x86_64-efi in order to get 32-bit efi to boot. I suppose there was a typo in "And the Tumbleweed 32-bit system would not boot", where 32-bit should be replaced by 64-bit?

Is the Tumbleweed system fully updated? If the problem is reproducible only with "secure-boot disabled", does it have anything to do with bsc#1165773?

Thanks.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c22

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(nwr10cst-oslnx@yahoo.com) |

--- Comment #22 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
Sorry. I guess that was confusing.

This was about a different system, one that does not fit the title of this bug report. But it seems to be the same problem, except for grub2-x86_64-efi.

I was previously able to boot this system using the grub "linux" command. I doubt that it would ever boot using "linuxefi". With the latest grub2 changes, it no longer boots. Reverting to an earlier version of grub2-x86_64-efi allows it to boot again.

The system itself is running 32-bit Tumbleweed, and is installed on an external drive. I installed with the drive plugged into a system using BIOS/MBR booting. And then I configured it so that it could also boot on a UEFI (X64) system. And that last part is what stopped working with the latest grub2 updates.

http://bugzilla.opensuse.org/show_bug.cgi?id=1177009#c23

--- Comment #23 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
With the latest kernel in Leap 15.3 (kernel-default-5.3.18-59.5.2) I am now able to boot using the latest grub2-i386-efi.

It still won't work with Leap 15.2 unless I use an older "grub2". But, since it does work now with Leap 15.3, feel free to close this bug.