[Bug 1157366] New: grub stuck in loop when using '-display none' qemu option
http://bugzilla.opensuse.org/show_bug.cgi?id=1157366

            Bug ID: 1157366
           Summary: grub stuck in loop when using '-display none' qemu option
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.1
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: KVM
          Assignee: kvm-bugs@suse.de
          Reporter: jfehlig@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 824593
  --> http://bugzilla.opensuse.org/attachment.cgi?id=824593&action=edit
seabios debug log

When starting Leap 15.1 VM with qemu option '-display none', grub inside the VM will constantly loop between showing the grub menu and attempting to boot the default selection. AFAICT, the kernel is never loaded before the grub menu reappears. I see "Booting openSUSE Leap 15.1...", but never any "Loading Linux 4.12.14-lp151.28.32-default..." or "Loading initial ramdisk ..." messages.

Here's the qemu command line I'm using to reproduce the issue:

/usr/bin/qemu-system-x86_64 \
 -name guest=libvirt-opensuse-151,debug-threads=on \
 -S \
 -machine pc-i440fx-3.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
 -cpu host \
 -m 2048 \
 -smp 2,sockets=2,cores=1,threads=1 \
 -uuid 1c9ba5de-915a-4d96-8113-f7c268d52179 \
 -display none \
 -no-user-config \
 -nodefaults \
 -monitor stdio \
 -debugcon file:/tmp/seabios-debug.log -global isa-debugcon.iobase=0x402 \
 -serial unix:serial,server,nowait \
 -drive file=/home/jfehlig/virt/images/test/disk0.qcow2,format=qcow2,if=virtio \

--
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366

James Fehlig <jfehlig@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |brogers@suse.com,
                   |                            |glin@suse.com,
                   |                            |lyan@suse.com

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c1

Bruce Rogers <brogers@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jfehlig@suse.com
              Flags|                            |needinfo?(jfehlig@suse.com)

--- Comment #1 from Bruce Rogers <brogers@suse.com> ---
What I see in your seabios debug log is that after the normal "Booting from 0000:7c00" the next thing printed by the bios is "In resume (status=0)", which indicates the CPU was reset. So something bad is happening.

Since you tie this report to the -display none option, I assume that without "-display none", the guest boots ok?

For the record, I'm booting a Leap 15.1 guest with a qemu command line similar to yours, and don't see this problem, so more info is needed. I am using Tumbleweed as the host OS though. (If I remember right from our private conversation, you thought the host didn't matter. Is that right?)

What additional details can be provided?

Can you try w/out using -cpu host and see if that affects the behavior?
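
To make the reset loop described in comment #1 easier to spot, the seabios debug log can be scanned for the "In resume (status=0)" marker. The following is only a rough sketch: the helper name count_resets is made up for illustration, and it assumes the log was written by the -debugcon option from the original report (for example /tmp/seabios-debug.log).

```shell
#!/bin/sh
# Sketch: count how often seabios logged a CPU reset.
# count_resets is a hypothetical helper name, not part of seabios or qemu.
count_resets() {
    logfile=$1
    # -F matches the string literally, -c prints the number of matching lines.
    # grep exits non-zero when there are no matches but still prints 0.
    grep -cF 'In resume (status=0)' "$logfile"
}

# Example (path taken from the -debugcon option in the report):
#   count_resets /tmp/seabios-debug.log
```

A count that keeps growing while the guest sits at the grub menu would be consistent with the reset loop, although it does not say what triggers the reset.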

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c2

James Fehlig <jfehlig@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(jfehlig@suse.com) |

--- Comment #2 from James Fehlig <jfehlig@suse.com> ---
(In reply to Bruce Rogers from comment #1)
> What I see in your seabios debug log is that after the normal "Booting from 0000:7c00" the next thing printed by the bios is "In resume (status=0)", which indicates the CPU was reset. So something bad is happening.

> Since you tie this report to the -display none option, I assume that without "-display none", the guest boots ok?

Yes, it boots fine without '-display none'.

> For the record, I'm booting a Leap 15.1 guest similarly to what you have for a qemu commandline, and don't see this problem, so more info is needed. I am using Tumbleweed as the host OS though. (If I remember right in private conversation, you thought the host didn't matter. Is that right?)

Right. I've seen the problem on Leap 15.1 and TW hosts. IIRC, Liang has also seen the problem with a TW guest.

> What additional details can be provided?

Another way to reproduce is with virt-install. E.g.

virt-install --name leap15.1 --ram 2048 --vcpus 2 --machine pc \
 --virt-type kvm \
 --location http://download.opensuse.org/distribution/leap/15.1/repo/oss/ \
 --extra-args console=ttyS0 --arch x86_64 \
 --disk /home/jfehlig/virt/images/test/disk0.qcow2 \
 --network bridge=br0,mac=00:16:3e:0d:e4:15 --wait 0 \
 --initrd-inject /home/jfehlig/virt/images/test/autoinst.xml \
 --graphics none

Once the install starts, you can connect to the console with 'virsh console leap15.1' and watch the install progress. After the install the machine will shut down. Restart it with 'virsh start leap15.1 --console' and you will see the problem. You won't see the problem if '--graphics none' is dropped from the virt-install command. BTW, I'll attach the autoyast file I'm using, which installs a minimal Leap 15.1 system.

> Can you try w/out using -cpu host and see if that affects the behavior?

I've tried without and see the same problem.

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c3

--- Comment #3 from James Fehlig <jfehlig@suse.com> ---
Created attachment 824616
  --> http://bugzilla.opensuse.org/attachment.cgi?id=824616&action=edit
autoyast file for minimal Leap 15.1 system

Copy this to some path on your host and then have virt-install put it in the install initrd with the '--initrd-inject' option. The installer within the VM will automatically pick it up and use it for auto installation.

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c4

--- Comment #4 from James Fehlig <jfehlig@suse.com> ---
I was also able to reproduce by taking an existing libvirt-managed Leap 15.1 guest, 'virsh edit' to remove <graphics>, <video>, and related devices, then starting the guest and connecting to console with 'virsh start leap15.1 --console'.

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c5

--- Comment #5 from Gary Ching-Pang Lin <glin@suse.com> ---
The "-nodefaults" parameter seems to be the key.

Here is my qemu command for the VM with seabios:

qemu-system-x86_64 \
 -enable-kvm -s -S -smp 2 -machine type=q35 \
 -drive file=opensuse-15.1.img,if=virtio \
 -m 2048 \
 -virtfs local,id=fsdev,path=share,security_model=passthrough,mount_tag=v_share \
 -monitor stdio \
 -debugcon file:debug.log -global isa-debugcon.iobase=0x402 \
 -serial unix:serial,server,nowait \
 -display none \
 -netdev user,id=hostnet0 -device virtio-net-pci,romfile=,netdev=hostnet0

It worked fine until I appended "-nodefaults". Maybe some default device was disabled and caused the problem.

http://bugzilla.opensuse.org/show_bug.cgi?id=1157366#c6

James Fehlig <jfehlig@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mchang@suse.com

--- Comment #6 from James Fehlig <jfehlig@suse.com> ---
Michael Chang provided the following analysis in a private mail thread:

-----
Do you have gfxterm in grub.cfg? It looks like the installer sets it along with the serial console, which provides access via a locally attached monitor at the same time.

GRUB_TERMINAL="gfxterm serial"

In my case I always change it to

GRUB_TERMINAL="console serial"

The problem can only be reproduced if these three options are enabled at the same time:

1. nodefaults (seabios)
2. display none (seabios)
3. gfxterm (grub)

It is likely that the gfxterm initialization sequence, which tinkers with the display device, breaks the system with display none + nodefaults, a combination that leaves the VGA device effectively visible but disabled.

The failure in the linux command is a red herring: the system is in a broken state in which many other grub commands (insmod, ls, etc.) do not work either. In my other tests the guest did not even reach the boot menu (the system appeared to reset/reboot constantly).

Please use GRUB_TERMINAL="console serial" or GRUB_TERMINAL="serial" for the time being. Remember to run "update-bootloader --refresh" to update grub.cfg with the settings.

I think in the future we should propose that the installer set "console serial", because gfxterm has no input of its own, so it competes with serial for input and never works as expected. Besides, it now looks worse, since accessing the display device might have side effects when it is not available.
-----

Changing GRUB_TERMINAL from 'gfxterm serial' to 'console serial' indeed "fixes" the problem.
FTR, this can be done via autoyast with the following <bootloader> configuration:

<bootloader>
  <global>
    <terminal>console serial</terminal>
  </global>
</bootloader>

Perhaps we should create a jira that proposes changing the installer to s/gfxterm/console/ ? What would be the ramifications of that?
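
For an already installed guest, the GRUB_TERMINAL workaround from comment #6 could be applied with something like the sketch below. It is not taken from the bug report: set_grub_terminal is a hypothetical helper name, the sketch assumes GNU sed (for -i), and it assumes the setting lives in /etc/default/grub inside the guest.

```shell
#!/bin/sh
# Sketch: set GRUB_TERMINAL in a grub defaults file.
# set_grub_terminal is a hypothetical helper, not shipped by any package.
set_grub_terminal() {
    file=$1     # assumed to be /etc/default/grub inside the guest
    value=$2    # e.g. "console serial"
    if grep -q '^GRUB_TERMINAL=' "$file"; then
        # Replace the existing assignment in place (GNU sed -i).
        sed -i "s|^GRUB_TERMINAL=.*|GRUB_TERMINAL=\"$value\"|" "$file"
    else
        # No assignment yet: append one.
        printf 'GRUB_TERMINAL="%s"\n' "$value" >> "$file"
    fi
}

# Example, run as root inside the guest, followed by the refresh step
# mentioned in comment #6:
#   set_grub_terminal /etc/default/grub "console serial"
#   update-bootloader --refresh
```

The function only edits the defaults file; grub.cfg is not regenerated until "update-bootloader --refresh" is run, as noted in the analysis above.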