[Bug 763898] New: KVM guests crash after several days (sometimes hours)
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c0

Summary: KVM guests crash after several days (sometimes hours)
Classification: openSUSE
Product: openSUSE 12.1
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: marcus.husar@googlemail.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5

After a period of time my KVM guests (openSUSE 12.1 and Debian Squeeze, both x86_64) become unresponsive. The host system runs openSUSE 12.1, too.

At first I thought it could be related to the virtio drivers, as described in a Red Hat bug report (https://bugzilla.redhat.com/show_bug.cgi?id=802118). But switching to ide/rtl8139 didn’t help.

After some additional searching I found a second matching bug report:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=757382

Some lines in my libvirtd.log also appear in the Fedora bug report above.

18:18:30.967: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
18:18:36.960: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
18:19:04.000: 2394: error : qemuDomainObjBeginJobInternal:772 : Timed out during operation: cannot acquire state change lock
18:19:05.893: 2393: error : qemuMonitorIO:577 : internal error End of file from monitor
18:19:15.232: 2395: warning : qemuDomainObjTaint:1110 : Domain id=3 name='virtual1' uuid=7c7f469c-3d57-7bf5-c81a-d815966a948b is tainted: high-privileges

Reproducible: Always

Steps to Reproduce:
1. Start a KVM guest with any operating system.
2. Wait a few days/hours.
3. The system becomes unresponsive.

Actual Results:
When I try to restart the guest (virsh restart virtual1) it says:
Error: Restart of domain virtual1 failed
Error: Timed out during operation: cannot acquire state change lock

Expected Results:
My KVM guests should work properly.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c

kk zhang <kkzhang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kkzhang@suse.com
         AssignedTo|bnc-team-screening@forge.pr |brogers@suse.com
                   |ovo.novell.com              |
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c1

Bruce Rogers <brogers@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |brogers@suse.com,
                   |                            |jfehlig@suse.com
       InfoProvider|                            |jfehlig@suse.com

--- Comment #1 from Bruce Rogers <brogers@suse.com> 2012-06-22 18:00:14 UTC ---
Jim, does this sound like any libvirt issue you are aware of?
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c2

James Fehlig <jfehlig@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
       InfoProvider|jfehlig@suse.com            |marcus.husar@googlemail.com

--- Comment #2 from James Fehlig <jfehlig@suse.com> 2012-06-22 22:13:39 UTC ---
(In reply to comment #0)
> After a period of time my KVM guests (OpenSuSE 12.1 and Debian Squeeze, both
> x86_64) appear to be unresponsive. The host system uses OpenSuSE 12.1, too.
Do you mean the guests are hung? E.g. can't ping, ssh, interact via serial console, etc? Do you notice any problems on the host? Is libvirt responsive? E.g. virsh list, virsh dominfo, ...?
> After some additional searching I found a second matching bug report:
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=757382
That bug is a libvirtd hang. Do you see that? Even so, guests shouldn't become unresponsive. You can start guests and then kill off libvirtd, after all.
> Some lines in my libvirtd.log are found in the Fedora bug report above.
>
> 18:18:30.967: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
> 18:18:36.960: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
> 18:19:04.000: 2394: error : qemuDomainObjBeginJobInternal:772 : Timed out during operation: cannot acquire state change lock
I don't think these are fatal. They should probably be logged as warnings or something similar.
> 18:19:05.893: 2393: error : qemuMonitorIO:577 : internal error End of file from monitor
This is always seen when a guest terminates.
> 18:19:15.232: 2395: warning : qemuDomainObjTaint:1110 : Domain id=3 name='virtual1' uuid=7c7f469c-3d57-7bf5-c81a-d815966a948b is tainted: high-privileges
And this is seen when the guest starts if /etc/libvirt/qemu.conf is configured with user="root" and group="root", which you must have set, since the default for 12.1 is to launch qemu instances as qemu:qemu.
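A minimal sketch, not part of the original thread, of how the quoted libvirtd.log lines could be grouped by level and function to see which messages recur. The line layout is inferred only from the excerpt in this report (a real libvirtd.log may carry a date prefix that is not shown here), and the names `LOG_LINE`, `parse_line`, `count_by_function` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the excerpt quoted in this report:
#   HH:MM:SS.mmm: <thread id>: <level> : <function>:<line> : <message>
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<thread>\d+): "
    r"(?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : "
    r"(?P<message>.*)$"
)


def parse_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def count_by_function(lines):
    """Count log entries per (level, function) pair, skipping unparsable lines."""
    counts = Counter()
    for text in lines:
        entry = parse_line(text)
        if entry is not None:
            counts[(entry["level"], entry["func"])] += 1
    return counts


# The five lines quoted in the report, used here as sample input.
SAMPLE = [
    "18:18:30.967: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error",
    "18:18:36.960: 2393: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error",
    "18:19:04.000: 2394: error : qemuDomainObjBeginJobInternal:772 : Timed out during operation: cannot acquire state change lock",
    "18:19:05.893: 2393: error : qemuMonitorIO:577 : internal error End of file from monitor",
    "18:19:15.232: 2395: warning : qemuDomainObjTaint:1110 : Domain id=3 name='virtual1' uuid=7c7f469c-3d57-7bf5-c81a-d815966a948b is tainted: high-privileges",
]

if __name__ == "__main__":
    for (level, func), n in sorted(count_by_function(SAMPLE).items()):
        print(f"{level:8} {func:32} {n}")
```

Grouping by function rather than by full message keeps entries that differ only in domain names or UUIDs together.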
> When I try to restart the guest (virsh restart virtual1) it says:
> Error: Restart of domain virtual1 failed
> Error: Timed out during operation: cannot acquire state change lock
Is a qemu-kvm process still running for the guest? What state does libvirt think the guest is in (virsh dominfo)?
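A minimal sketch, not from the original thread, of one way to capture the state that `virsh dominfo` reports so it can be attached to the bug. It assumes the usual "Key: value" output layout; the helper names and the sample text are hypothetical (Id, Name and UUID are taken from the log line in this report, the State value is made up for illustration). `LC_ALL=C` is set so the keys come back untranslated, which matters on a host that otherwise prints German messages.

```python
import os
import subprocess


def run_dominfo(domain):
    """Run `virsh dominfo <domain>` with an untranslated locale and return its stdout."""
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(
        ["virsh", "dominfo", domain],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout


def parse_dominfo(output):
    """Turn 'Key:   value' lines into a dict; lines without a colon are ignored."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Hypothetical sample in the assumed layout; the State value is illustrative only.
SAMPLE_OUTPUT = """\
Id:             3
Name:           virtual1
UUID:           7c7f469c-3d57-7bf5-c81a-d815966a948b
State:          running
"""

if __name__ == "__main__":
    info = parse_dominfo(SAMPLE_OUTPUT)
    print(info["Name"], "is reported as", info["State"])
```

On the affected host, `parse_dominfo(run_dominfo("virtual1"))` would use the live output instead of the sample. If libvirtd itself is hung, the call may block or fail, which is useful information for the report as well.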
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c3

--- Comment #3 from Bruce Rogers <brogers@suse.com> 2013-03-11 16:41:27 UTC ---
Hi Marcus, we haven't heard back from you on this issue. If this is still an issue for you, please respond to Jim's queries so we can help resolve it; otherwise we'll have to close it as "no response".
https://bugzilla.novell.com/show_bug.cgi?id=763898

https://bugzilla.novell.com/show_bug.cgi?id=763898#c4

--- Comment #4 from Bruce Rogers <brogers@suse.com> 2013-05-07 17:12:04 UTC ---
Marcus,
Any additional info you can provide? Otherwise we'll have to close as no response.