[Bug 1174365] New: VM does not shutdown on host reboot
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365

Bug ID: 1174365
Summary: VM does not shutdown on host reboot
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.2
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: KVM
Assignee: kvm-bugs@suse.de
Reporter: andihartmann@freenet.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Setup: KDE / Plasma 5 on Leap 15.2, with 4 VMs running on the same host.

Requirement: On shutdown or reboot via KDE, the VMs should be shut down (not suspended). This worked fine up to and including Leap 15.1.

Problem: On shutdown via KDE or sddm, the VMs are suspended before the systemd service that started them is called to shut them down. The "virsh shutdown" called by the systemd service then just reports that the VM is no longer running.

Analysis: KDE and sddm seem to reboot / shut down via "systemctl reboot / shutdown". If I reboot with "reboot" or "shutdown", the process works as expected again: all VMs can be shut down and are not suspended beforehand.

Solutions: Remove sddm and use kdm instead; kdm can be configured to control exactly how restart / shutdown are executed.

Questions:
- How can I configure KDE to use reboot / shutdown instead of "systemctl reboot ..."? or
- How can I block suspension of the VM in the VM's XML definition? (If a VM uses a PCIe device via vfio, the VM refuses to suspend and can afterwards be shut down as usual via the existing systemd service.) or
- Any other idea how to solve this problem?

--
You are receiving this mail because:
You are on the CC list for the bug.
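The reporter's systemd service is not shown in the bug. The following is a rough sketch of what its stop step might do, and it is an assumption rather than the reporter's actual unit. It asks every active domain to shut down with `virsh shutdown`, then polls `virsh list --name` until no domain is left or a timeout expires. The names `shutdown_all` and `running_domains` and the 120 s default timeout are hypothetical. The sketch also assumes `virsh` connects to the same libvirtd instance that started the VMs.

```python
import subprocess
import time
from typing import Callable, List

# Hypothetical stop helper; the reporter's real systemd service is not shown.
Runner = Callable[[List[str]], str]


def run_virsh(args: List[str]) -> str:
    """Run `virsh ARGS...` and return stdout; raises CalledProcessError on failure."""
    proc = subprocess.run(["virsh", *args], capture_output=True, text=True, check=True)
    return proc.stdout


def running_domains(run: Runner) -> List[str]:
    # `virsh list --name` prints the names of active domains, one per line.
    return [line.strip() for line in run(["list", "--name"]).splitlines() if line.strip()]


def shutdown_all(run: Runner = run_virsh, timeout: float = 120.0, poll: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Request a guest shutdown for every active domain and wait for them to stop.

    Returns the domains that are still active when the timeout expires.
    """
    for name in running_domains(run):
        try:
            run(["shutdown", name])  # asks the guest OS to shut down cleanly
        except subprocess.CalledProcessError:
            pass  # e.g. the domain stopped or was saved between `list` and `shutdown`
    waited = 0.0
    remaining = running_domains(run)
    while remaining and waited < timeout:
        sleep(poll)
        waited += poll
        remaining = running_domains(run)
    return remaining
```

If another actor has already saved the VMs, they no longer appear in `virsh list --name` and there is nothing left to shut down. This corresponds to the "not running" message a plain `virsh shutdown <name>` gives in the report.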
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c1
--- Comment #1 from Klaus Mueller
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c15
--- Comment #15 from James Fehlig
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c16
--- Comment #16 from Klaus Mueller
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c17
--- Comment #17 from James Fehlig
> The question here is: who or what (and how) informs a running libvirtd (completely outside the defined systemd shutdown process) about the host shutdown, which in turn suspends the VMs belonging to this libvirtd instance? And why can't this be disabled?

Yes, who is asking libvirtd to suspend the VMs is the key question. AFAIK, the libvirt-guests service is the only thing that will suspend VMs on host shutdown. It is disabled by default, and you also verified it was disabled on your system. And FYI, the libvirt-guests default action is to shut down running VMs at host shutdown, not to suspend them. Maybe we can get an idea about who is asking libvirtd to suspend the VMs by enabling debug logging in the remote driver and API, e.g. a log_filters setting in libvirtd.conf along the lines of

    log_filters="1:remote 1:libvirt"

> BTW: If the VM contains a passed-through PCIe card (e.g. a PCIe network card), the suspend isn't executed (it fails with an error), because VMs containing passed-through devices can't be suspended. Those VMs are therefore always processed by the subsequent service definition, not before!

Right. VMs with physical hardware passed through cannot be suspended (aka saved) or migrated.

> Another bad thing (maybe the primary reason why suspending VMs is a problem at all): if the suspended VM is restarted, it proceeds with the wrong time (the time at which it was stopped). I found no way to fix the time on resume.
Yeah, that's a classic problem with save/restore. NTP in the guest helps to some extent. If the guest has been suspended for long periods of time, 'virsh domtime ...' or the qemu guest agent 'guest-set-time' command can be used to correct the guest's clock https://qemu.readthedocs.io/en/latest/interop/qemu-ga-ref.html#qapidoc-19
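For a single guest, the agent command can be sent by hand through `virsh qemu-agent-command`. The sketch below only builds that command line, using the host's current time. It assumes that the qemu guest agent runs inside the guest and that its channel is configured in the domain XML. The function name `agent_set_time_cmd` is hypothetical.

```python
import json
import time
from typing import List, Optional


def agent_set_time_cmd(domain: str, now_ns: Optional[int] = None) -> List[str]:
    """Build a `virsh qemu-agent-command` argv that calls guest-set-time.

    guest-set-time takes `time` in nanoseconds since the Unix epoch (UTC);
    by default the host's current wall-clock time is used.
    """
    if now_ns is None:
        now_ns = time.time_ns()  # host wall clock, in nanoseconds
    payload = {"execute": "guest-set-time", "arguments": {"time": now_ns}}
    return ["virsh", "qemu-agent-command", domain, json.dumps(payload)]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. `virsh domtime DOMAIN --now` is the higher-level route to the same result.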
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c18
--- Comment #18 from Klaus Mueller
> Maybe we can get an idea about who is asking libvirtd to suspend the VMs by enabling debug logging in the remote driver and API, e.g. a log_filters setting in libvirtd.conf along the lines of
>     log_filters="1:remote 1:libvirt"
Very good idea, and thanks for this hint! The test gives:

    2021-10-12 06:32:50.862+0000: 8176: debug : handleSystemMessageFunc:579 : dmn=0x563dd634c820
    2021-10-12 06:32:50.863+0000: 9102: debug : daemonStopWorker:518 : Begin stop dmn=0x563dd634c820
    ...
    2021-10-12 06:32:50.863+0000: 9102: debug : virConnectOpenInternal:1128 : driver 8 QEMU returned SUCCESS
    2021-10-12 06:32:50.863+0000: 9102: debug : virConnectListAllDomains:6642 : conn=0x7fea100073c0, domains=0x7fe9d49eacb0, flags=0x1
    2021-10-12 06:32:50.863+0000: 9102: debug : virDomainGetState:2487 : dom=0x7fe9a0003410, (VM: name=CentOS-7.x, uuid=12f858d9-e0cd-a352-1a43-349ab1a7cd21), state=0x7fe9d49eacac, reason=(nil), flags=0x0
    2021-10-12 06:32:50.863+0000: 9102: debug : virDomainSuspend:623 : dom=0x7fe9a0003410, (VM: name=CentOS-7.x, uuid=12f858d9-e0cd-a352-1a43-349ab1a7cd21)
    2021-10-12 06:32:51.008+0000: 9102: debug : virDomainManagedSave:9547 : dom=0x7fe9a0003410, (VM: name=CentOS-7.x, uuid=12f858d9-e0cd-a352-1a43-349ab1a7cd21), flags=0x2
    2021-10-12 06:32:53.294+0000: 9102: debug : daemonStopWorker:522 : Completed stop dmn=0x563dd634c820

(Leap 15.3 host stopping a CentOS 7 VM)

Searching for handleSystemMessageFunc in https://github.com/paboldin/libvirt/blob/master/daemon/libvirtd.c gives:

    static DBusHandlerResult handleSystemMessageFunc(DBusConnection *connection ATTRIBUTE_UNUSED, DBusMessage *message, void *opaque) ...

Now it is certain that a message sent via D-Bus initiated the stop of the VM. This interferes with the systemd stop services (a race): whichever runs first wins.
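In a longer log, it helps to reduce each entry to thread ID, function and VM name. The reduced form makes it easy to see which thread does what, for example thread 8176 receiving the D-Bus message and thread 9102 doing the stop. The sketch below assumes the `<date> <time>: <thread>: <level> : <function>:<line> : <message>` layout seen in the excerpt above. It is an ad-hoc filter, not a libvirt tool, and `summarize` is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches libvirt debug lines of the form seen in the excerpt above:
# "<date> <time>: <thread>: <level> : <function>:<line> : <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)
VM_RE = re.compile(r"\(VM: name=([^,]+),")


def summarize(lines: Iterable[str]) -> List[Tuple[str, str, Optional[str]]]:
    """Reduce a debug log to (thread id, function, VM name or None) tuples."""
    out = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip elisions such as "..." and unrelated output
        vm = VM_RE.search(m["msg"])
        out.append((m["tid"], m["func"], vm.group(1) if vm else None))
    return out
```

Applied to an open log file, e.g. `summarize(open(path))`, the excerpt above reduces to one short line per call. The suspend and managed-save calls for CentOS-7.x all run on the same stop-worker thread.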
> Yeah, that's a classic problem with save/restore. NTP in the guest helps to some extent. If the guest has been suspended for long periods of time, 'virsh domtime ...' or the qemu guest agent 'guest-set-time' command can be used to correct the guest's clock
> https://qemu.readthedocs.io/en/latest/interop/qemu-ga-ref.html#qapidoc-19
That's a very helpful hint, too! Thanks for it! I tested it and it's working. Do I understand correctly that I have to execute guest-set-time myself? Maybe there is a switch to run it automatically on startup? Or maybe it is possible to add "scripts" that run after a VM has been resumed?
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c19
--- Comment #19 from James Fehlig
> Searching for handleSystemMessageFunc in https://github.com/paboldin/libvirt/blob/master/daemon/libvirtd.c gives:
>     static DBusHandlerResult handleSystemMessageFunc(DBusConnection *connection ATTRIBUTE_UNUSED, DBusMessage *message, void *opaque) ...
> Now it is certain that a message sent via D-Bus initiated the stop of the VM. This interferes with the systemd stop services (a race): whichever runs first wins.
Heh, and I'm happy to admit I was wrong about libvirt-guests being the only way to suspend a VM. It is true for the system libvirtd running as root, but not for session libvirtds running unprivileged:
https://gitlab.com/libvirt/libvirt/-/commit/b88b171731b6c00cd04c7ffd79b04ccd...

That commit registers callbacks with D-Bus, but only if the daemon is unprivileged. The callbacks eventually invoke virStateStop for each driver, and in the case of qemu, the function indeed suspends (saves) all running/paused VMs:
https://gitlab.com/libvirt/libvirt/-/commit/8f9a69317daca80c64e7734c5d08186e...

AFAICT, there is no way to change this behavior via config files, env vars, etc. It seems reasonable to allow it, though, e.g. similar to libvirt-guests, allow shutting down VMs instead of suspending them. Would you be willing to report this bug in the upstream issue tracker, where other libvirt devs can chime in? I can do it, but would like to avoid being a message broker who only slows down communication:
https://gitlab.com/libvirt/libvirt/-/issues/new
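The interaction between the two shutdown paths can be illustrated with a toy model built only from what this thread says. The session daemon saves every running VM it can when the D-Bus message arrives. VMs with passed-through PCIe devices refuse to be saved. The reporter's systemd unit calls `virsh shutdown`. The model deliberately ignores paused VMs, and none of the names below exist in libvirt.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy model of the shutdown race described in this bug; NOT libvirt code.
# Class and function names are hypothetical.


@dataclass
class Domain:
    name: str
    state: str = "running"     # "running", "saved" or "shutoff"
    passthrough: bool = False  # PCIe/VFIO passthrough: save is refused


def session_daemon_stop(domains: List[Domain]) -> None:
    """D-Bus-triggered stop of an unprivileged libvirtd: save every running VM."""
    for dom in domains:
        # Saving fails for passthrough VMs, so they stay running.
        if dom.state == "running" and not dom.passthrough:
            dom.state = "saved"


def systemd_unit_stop(domains: List[Domain]) -> Dict[str, str]:
    """The reporter's stop step: `virsh shutdown` each VM it started."""
    results = {}
    for dom in domains:
        if dom.state == "running":
            dom.state = "shutoff"
            results[dom.name] = "shut down"
        else:
            results[dom.name] = "not running"
    return results


def host_shutdown(domains: List[Domain], dbus_first: bool) -> Dict[str, str]:
    """Run both actors in the given order and report what the systemd unit saw."""
    if dbus_first:
        session_daemon_stop(domains)
        return systemd_unit_stop(domains)
    results = systemd_unit_stop(domains)
    session_daemon_stop(domains)
    return results
```

With `dbus_first=True`, an ordinary VM ends up saved and the unit reports it as not running, while a VFIO VM is still shut down normally, as observed in the report. With the opposite order, every VM is shut down.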
>> Yeah, that's a classic problem with save/restore. NTP in the guest helps to some extent. If the guest has been suspended for long periods of time, 'virsh domtime ...' or the qemu guest agent 'guest-set-time' command can be used to correct the guest's clock
>> https://qemu.readthedocs.io/en/latest/interop/qemu-ga-ref.html#qapidoc-19
> That's a very helpful hint, too! Thanks for it! I tested it and it's working. Do I understand correctly that I have to execute guest-set-time myself? Maybe there is a switch to run it automatically on startup? Or maybe it is possible to add "scripts" that run after a VM has been resumed?
I vaguely recall some facility for automating guest-set-time, but can't find any info on it now. My recollection could also be wrong. There are libvirt hooks, including one for "restore begin", but not "restore end". Also, I'm not sure whether hooks work with unprivileged daemons:
https://libvirt.org/hooks.html

There are probably other ways to automate it, e.g. registering for domain events from libvirt and running guest-set-time after the VM is running and the guest agent is alive. I'd also suggest asking on the libvirt-users@redhat.com list, where you might get a better answer :-).
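The sketch below shows a simpler alternative to the event-driven approach. It does not register for domain events. Instead, it polls the guest agent with `guest-ping` until the agent answers, then sets the guest clock with `virsh domtime DOMAIN --now`. It could be called from whatever script starts or resumes the VM. The timeout values and the function name are hypothetical. The sketch assumes that the guest agent channel is configured and that `virsh` talks to the right (e.g. session) libvirtd.

```python
import json
import subprocess
import time
from typing import Callable, List

# Hypothetical helper; polling stands in for the event-driven approach.
Runner = Callable[[List[str]], None]


def run_checked(argv: List[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(argv, capture_output=True, check=True)


def sync_clock_when_agent_ready(domain: str, run: Runner = run_checked,
                                timeout: float = 60.0, poll: float = 1.0,
                                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait until the guest agent answers guest-ping, then set the guest clock.

    Returns True if the clock was set, False if the agent never answered in time.
    """
    ping = ["virsh", "qemu-agent-command", domain, json.dumps({"execute": "guest-ping"})]
    waited = 0.0
    while True:
        try:
            run(ping)
            break  # agent is alive
        except subprocess.CalledProcessError:
            if waited >= timeout:
                return False
            sleep(poll)
            waited += poll
    # `virsh domtime DOMAIN --now` sets the guest clock to the current host time
    # (it goes through the guest agent, so the agent must be running).
    run(["virsh", "domtime", domain, "--now"])
    return True
```

Polling keeps the sketch short. A long-running event listener would avoid the fixed timeout but needs more machinery.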
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c20
--- Comment #20 from Klaus Mueller
> AFAICT, there is no way to change this behavior via config files, env vars, etc. It seems reasonable to allow it, though, e.g. similar to libvirt-guests, allow shutting down VMs instead of suspending them.

I'm now testing automation with a VM which can most probably be paused / saved without any issues on host shutdown. That's pretty easy for me, as the existing automation only had to be modified slightly (removing the killing of libvirtd after startup and adding guest-set-time instead).

> Would you be willing to report this bug in the upstream issue tracker, where other libvirt devs can chime in? I can do it, but would like to avoid being a message broker who only slows down communication

I have to think about it, as I'm currently not a member of GitLab.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c21
--- Comment #21 from James Fehlig
> I'm now testing automation with a VM which can most probably be paused / saved without any issues on host shutdown. That's pretty easy for me, as the existing automation only had to be modified slightly (removing the killing of libvirtd after startup and adding guest-set-time instead).

If this works out for you and you're hesitant to raise the issue upstream, maybe just close the bug. Configurable VM state handling on host shutdown when using an unprivileged libvirtd is really an upstream feature request IMO.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174365#c22
--- Comment #22 from Klaus Mueller