[Bug 1109471] New: VM hangs installing TW or Leap 15 on Intel Core 2 Duo
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471

Bug ID: 1109471
Summary: VM hangs installing TW or Leap 15 on Intel Core 2 Duo
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Factory
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Virtualization:Tools
Assignee: virt-bugs@suse.de
Reporter: tjcw@cantab.net
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 784008 --> http://bugzilla.opensuse.org/attachment.cgi?id=784008&action=edit
Screenshot of hang when booting TW 20180920 in a VM

I get a hang booting the install kernel in a VM (qemu/kvm) under TW 20180920 on an Intel Core 2 Duo processor. I am attaching 2 screenshots; one from booting TW 20180920 and one from booting Leap 15.0. I get success booting these install kernels under Leap 15.0 on the same hardware.

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c1
--- Comment #1 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c2
Fei Li
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c3
--- Comment #3 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c4
--- Comment #4 from Chris Ward
Created attachment 784140 [details] /var/log/libvirt/qemu log of attempt to install TW
I attempted to install TW from virt-manager. I think this is the log file that you want. Is there any virt-manager error message? I did not see any error in the first log file you provide except some Spice-WARNINGs. From the screenshot, it seems
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c5
--- Comment #5 from Fei Li
The log says libvirt version: 4.6.0, qemu version: 3.0.0
Is there any virt-manager error message? I did not see any error in the first log file you provide except some Spice-WARNINGs. From the screenshot, it seems the guest's processor has not been booted. No, I couldn't see any error messages either. According to the screenshot, the hang occurred 0.02 seconds after the guest's
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c6
--- Comment #6 from Chris Ward
Edit the /etc/libvirt/libvirtd.conf as follows and restart the libvirtd.service, and provide the /tmp/libvirtd.log would be much helpful. Log file attached.
A few questions:
- do you mean both fail, whether installing the TW-920 guest or the leap15.0 guest, while booting on the leap15.0 host on the Conroe processor is OK?
- have you enabled any spectre/meltdown mitigation on the host?
TW-920 guest and leap15.0 guest both fail on TW host, and both work on leap15.0 host. The host system was installed as leap 15.0 from DVD, brought up to date with 'zypper up', then moved to TW by changing the software repositories and doing 'zypper dup'. I don't know what that sets up for spectre/meltdown mitigation, but I have not done anything special.
openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180919-Media.iso, would you like to try the latest TW iso and see whether the problem still occurs?
The latest TW snapshot (20180924) has the same problem.
My theory is that there is some change in qemu between the version installed on leap15.0 and the version installed on TW that is incompatible with the Conroe processor. Is this likely? Did qemu intend to drop support for Conroe?
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c7
--- Comment #7 from Fei Li
Created attachment 784256 [details] 'tar -cJf' of libvirt.log log file
Is there any virt-manager error message? I did not see any error in the first log file you provided except some Spice-WARNINGs. From the screenshot, it seems the guest's processor has not been booted.
No, I couldn't see any error messages either. According to the screenshot, the hang occurred 0.02 seconds after the guest's processor was booted. 'isolinux' ran normally, and the OpenSUSE initial install boot screen was presented normally; I selected 'Installation'.
Sorry that I am a little confused by the hang phenomenon. Do you mean the hang occurs during the installation process after you selected 'Installation', or the 0.02s hang you mentioned above?
Edit the /etc/libvirt/libvirtd.conf as follows and restart the libvirtd.service; providing the /tmp/libvirtd.log would be very helpful.
Log file attached.
Thanks.
A few questions:
- do you mean both fail, whether installing the TW-920 guest or the leap15.0 guest, while booting on the leap15.0 host on the Conroe processor is OK?
- have you enabled any spectre/meltdown mitigation on the host?
TW-920 guest and leap15.0 guest both fail on TW host, and both work on leap15.0 host. The host system was installed as leap 15.0 from DVD, brought up to date with 'zypper up', then moved to TW by changing the software repositories and doing 'zypper dup'. I don't know what that sets up for spectre/meltdown mitigation, but I have not done anything special.
openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180919-Media.iso, would you like to try the latest TW iso and see whether the problem still occurs?
The latest TW snapshot (20180924) has the same problem.
My theory is that there is some change in qemu between the version installed on leap15.0 and the version installed on TW that is incompatible with the Conroe processor. Is this likely? Did qemu intend to drop support for Conroe?
Emm, no, I do not think we changed any qemu code about Conroe. BTW, could the TW-920 guest and leap15.0 guest be successfully installed just using the following brief qemu command line?
/usr/bin/qemu-system-x86_64 -monitor stdio \
    -serial none \
    -parallel none \
    -enable-kvm \
    -name "leap15-guest" \
    -smp sockets=1,cores=2,threads=2 \
    -m 8192 \
    -mem-prealloc \
    -usb \
    -device usb-ehci,id=ehci \
    -machine pc,accel=kvm,kernel_irqchip=on,mem-merge=off \
    -drive file=/path/leap15.qcow2,format=qcow2,if=none,id=drive-sata0-0-0 \
    -device virtio-blk-pci,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 \
    -cdrom /path/openSUSE-Tumbleweed-DVD-x86_64-Snapshotxxx-Media.iso \
    -boot order=c
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c8
--- Comment #8 from Chris Ward
According to the screenshot, the hang occurred 0.02 seconds after the guest's processor was booted. 'isolinux' ran normally, and the OpenSUSE initial install boot screen was presented normally; I selected 'Installation'.
Sorry that I am a little confused by the hang phenomenon. Do you mean the hang occurs during the installation process after you selected the 'Installation', or the 0.02s hang you mentioned above?
The hang occurs during the installation process, 0.02 seconds after the kernel is loaded and the console log starts.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c9
--- Comment #9 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c10
--- Comment #10 from Fei Li
Created attachment 784323 [details] Screenshot of hang when installing with qemu-system-x86_64 command
I ran your command, modified for '-m 1024' because I don't have 8GB of memory, and got the hang as before. Screenshot is attached, and run log of the qemu-system-x86_64 command is
tjcw@linux-ta3i:~> do_install
+ /usr/bin/qemu-system-x86_64 -monitor stdio -serial none -parallel none -enable-kvm -name tumbleweed-guest -smp sockets=1,cores=2,threads=2 -m 1024 -mem-prealloc -usb -device usb-ehci,id=ehci -machine pc,accel=kvm,kernel_irqchip=on,mem-merge=off -drive file=/home/tjcw/tumbleweed.qcow2,format=qcow2,if=none,id=drive-sata0-0-0 -device virtio-blk-pci,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 -cdrom /home/tjcw/Downloads/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180924-Media.iso -boot order=c
QEMU 3.0.0 monitor - type 'help' for more information
(qemu)
Sorry that I do not have a Conroe processor at hand and cannot reproduce this myself. So would you like to use gdb to debug in your environment and see whether any error message is printed in the console? Just add `gdb --args` before the above `/usr/bin/qemu-system-x86_64 ...`. Another way to check whether this is a qemu issue is to reinstall the former v2.11.2 qemu which is used in leap15.0. The URL is https://download.opensuse.org/distribution/leap/15.0/repo/oss/, but this requires you to uninstall the current higher version v3.0 and then reinstall the former v2.11.2.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c11
--- Comment #11 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c12
--- Comment #12 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c13
--- Comment #13 from Bruce Rogers
I replaced "kernel-default" with the corresponding package from Leap 15.0; this also replaces the kernel loadable modules including "kvm.ko". With this done, my TW guest is installing under my TW-with-leap-kernel host with QEMU 3.0.0
So it looks like there is some problem either with the TW kernel or more likely with "kvm.ko" which is breaking things for the Conroe processor.
Do you have a changelog for "kvm.ko" between Leap 15.0 and TW ?
I intended to work on this today, since I've had access to a Conroe for quite some time, but I now find it doesn't work right, so I'm trying to get it working again. Thanks for the analysis on your end. Indeed it does seem to be some change in the kvm kernel module which has broken things on your box. Just so you know, there was a time when kvm was broken on the Conroe, and it is possible it got broken again. I'll look at recent changes done in the TW kernel and see what pops up. Hopefully I can also get some testing going on a Conroe so I can also duplicate it and work on it better on my side.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c14
--- Comment #14 from Bruce Rogers
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c15
--- Comment #15 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c16
--- Comment #16 from Bruce Rogers
I installed the kernel from Leap 15.1 (4,2,14-lp1515.16-default) and got success.
I don't have an archive of TW kernels to try; this machine was running Leap 15.0 until I installed TW 20180920 and got the failure. So I can't say where the problem was introduced more closely than 'somewhere between Leap 15.1 and TW 20180920', I'm afraid.
Is there a web site with archives of TW that I could try a few kernels from ?
http://download.opensuse.org/history/ would have some recent snapshots you can grab from.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c17
Bruce Rogers
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c18
--- Comment #18 from Bruce Rogers
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c19
--- Comment #19 from Chris Ward
I'm finding that something in kvm is broken on this machine back to v4.16.0. It seems people don't use this old hardware very much for kvm virtualization!
For me, exploring http://download.opensuse.org/history pays dividends. I find that the 4.18.7-1.5 kernel (in TW 20180917) works, but the 4.18.8-1.3 kernel (in TW 20180919) doesn't work. Hope this ties it down to only a few kernel changes!
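A rough sketch of how the window between those two kernels could be inspected, under stated assumptions: the openSUSE packages 4.18.7-1.5 and 4.18.8-1.3 are taken to be based on the upstream stable tags v4.18.7 and v4.18.8 (any openSUSE-specific patches are not covered), and a local clone of the linux-stable tree is available. The helper name `kvm_changes` and the clone path are hypothetical.

```shell
#!/bin/sh
# kvm_changes REPO GOOD BAD
# Lists the commits in the range GOOD..BAD that touch arch/x86/kvm
# in the git checkout located at REPO.
kvm_changes() {
    git -C "$1" log --oneline "$2..$3" -- arch/x86/kvm
}

# Example (assumes a linux-stable clone at ~/linux-stable; path is hypothetical):
#   kvm_changes ~/linux-stable v4.18.7 v4.18.8
```

Restricting the log to arch/x86/kvm keeps the list short, at the cost of missing a regression introduced elsewhere in the tree.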
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c20
--- Comment #20 from Chris Ward
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c21
--- Comment #21 from Bruce Rogers
(In reply to Bruce Rogers from comment #18)
I'm finding that something in kvm is broken on this machine back to v4.16.0. It seems people don't use this old hardware very much for kvm virtualization!
For me, exploring http://download.opensuse.org/history pays dividends. I find that the 4.18.7-1.5 kernel (in TW 20180917) works, but the 4.18.8-1.3 kernel (in TW 20180919) doesn't work.
Hope this ties it down to only a few kernel changes!
I see a failure with the 4.18.7-1.5 kernel. The install does proceed further, though. I'm going to instrument the running kvm module to see where things go wrong. I had pursued a few other approaches which haven't panned out. As for your commit pointers, I'll look into it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1109471#c22
--- Comment #22 from Bruce Rogers
With the help of our awesome intern, I have a Conroe working again, and can reproduce. Here's the error we're getting:
[ 2869.709261] vmwrite error: reg 401e value c0c3d700 (err 12)
[ 2869.709269] CPU: 1 PID: 30898 Comm: CPU 0/KVM Not tainted 4.18.8-1-default #1 openSUSE Tumbleweed (unreleased)
[ 2869.709270] Hardware name: /DG965MQ, BIOS MQ96510J.86A.1612.2006.1227.1513 12/27/2006
[ 2869.709271] Call Trace:
[ 2869.709281] dump_stack+0x85/0xc0
[ 2869.709291] vmx_set_virtual_apic_mode+0x197/0x240 [kvm_intel]
[ 2869.709334] kvm_lapic_set_base+0x7b/0x190 [kvm]
[ 2869.709353] kvm_set_apic_base+0xac/0xd0 [kvm]
[ 2869.709371] kvm_set_msr_common+0x837/0xc00 [kvm]
...
So the issue we're seeing is that this processor doesn't support the secondary vm exec controls, but kvm is trying to access that register anyways. BUG. Trying to figure out what code needs to be changed to address this.
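For anyone wanting to check their own hardware, here is a minimal sketch, not something taken from this thread. Per the Intel SDM, VMCS field 0x401e is the secondary processor-based VM-execution controls field, and VM-instruction error 12 means a VMREAD/VMWRITE to an unsupported VMCS component. Whether a CPU allows those controls is reported in bit 63 of the IA32_VMX_PROCBASED_CTLS MSR (0x482), which is the allowed-1 setting for primary control bit 31, "activate secondary controls". The helper name `has_secondary_ctls` is hypothetical, and the usage comment assumes the msr-tools `rdmsr` utility with the msr kernel module loaded.

```shell
#!/bin/sh
# has_secondary_ctls HEXVALUE
# HEXVALUE is the raw 64-bit content of IA32_VMX_PROCBASED_CTLS (MSR 0x482)
# as hex digits without a 0x prefix. Returns 0 (true) when bit 63 is set,
# i.e. when the CPU allows the "activate secondary controls" bit to be 1.
has_secondary_ctls() {
    # Left-pad to 16 hex digits so the upper dword is always the first 8.
    padded=$(printf '%16s' "$1" | tr ' ' '0')
    hi=$(printf '%s' "$padded" | cut -c1-8)
    # Bit 31 of the upper dword is bit 63 of the whole MSR.
    [ $(( (0x$hi >> 31) & 1 )) -eq 1 ]
}

# Example (assumption: msr-tools installed, run as root after 'modprobe msr'):
#   if has_secondary_ctls "$(rdmsr 0x482)"; then
#       echo "secondary VM-execution controls supported"
#   else
#       echo "secondary VM-execution controls NOT supported"
#   fi
```

Testing the upper dword separately avoids relying on how the shell handles 64-bit values with the top bit set.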