[Bug 987668] New: libvirtd: Failed to notify systemd
http://bugzilla.opensuse.org/show_bug.cgi?id=987668

Bug ID: 987668
Summary: libvirtd: Failed to notify systemd
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.1
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Virtualization:Other
Assignee: virt-bugs@suse.de
Reporter: ebischoff@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

After an upgrade of Leap 12.1, more or less on July 1st, a problem appeared when trying to restart libvirtd with

# systemctl restart libvirtd

This command times out after a while. In the background, the libvirtd daemon is restarted at regular intervals (the PID keeps changing). In /var/log/messages, one finds every two minutes:

2016-07-05T14:44:15.376938+02:00 tegan systemd[1]: Starting Virtualization daemon...
2016-07-05T14:44:15.425527+02:00 tegan libvirtd[15535]: libvirt version: 2.0.0
2016-07-05T14:44:15.426024+02:00 tegan libvirtd[15535]: hostname: tegan
2016-07-05T14:44:15.426555+02:00 tegan libvirtd[15535]: Failed to notify systemd

The problem is fairly similar, at least in its symptoms, to: https://bugzilla.redhat.com/show_bug.cgi?id=1314881

Version info:

$ rpm -q libvirt
libvirt-2.0.0-590.1.x86_64
$ rpm -q systemd
systemd-210-95.1.x86_64

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
Eric Bischoff
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c1
James Fehlig
After an upgrade of Leap 12.1,
I guess you mean 42.1. And what do you mean by "an upgrade"? Did you just update some packages from your configured repos? (BTW, the OBS Virtualization repo must be one of them to get libvirt 2.0.0.)
2016-07-05T14:44:15.426555+02:00 tegan libvirtd[15535]: Failed to notify systemd
Heh, that's not very helpful :). Starting libvirtd manually with 'export LIBVIRT_DEBUG=1; /usr/sbin/libvirtd -l' might give something useful.
Problem is fairly similar, at least in its symptoms, to:
Symptoms do sound the same, but the underlying problem must be different. In the RH bug, libvirtd would not start with systemd > v229. libvirt.git commit c0bc1723, which is included in libvirt 2.0.0, fixed that issue.
Version info:
$ rpm -q libvirt
libvirt-2.0.0-590.1.x86_64
$ rpm -q systemd
systemd-210-95.1.x86_64
Here we have a "fixed" libvirt with older systemd. Does systemd v210 support sending a ready message to $NOTIFY_SOCKET? Is the NOTIFY_SOCKET env variable set correctly on the system?
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c2
--- Comment #2 from Eric Bischoff
After an upgrade of Leap 12.1,
I guess you mean 42.1.
Yes, sorry.
And what do you mean by "an upgrade"?
zypper ref && zypper up
Did you just update some packages from your configured repos? (BTW, the OBS Virtualization repo must be one of them to get libvirt 2.0.0.)
Yes, among the repos: Virtualization (openSUSE_Leap_42.1)
2016-07-05T14:44:15.426555+02:00 tegan libvirtd[15535]: Failed to notify systemd
Heh, that's not very helpful :).
Yes, tell the guys who do such messages :-)
Starting libvirtd manually with 'export LIBVIRT_DEBUG=1; /usr/sbin/libvirtd -l' might give something useful.
It just works and the output contains:

2016-07-05 17:55:04.617+0000: 32720: debug : virSystemdNotifyStartup:504 : Skipping systemd notify, not requested

I understand the problem is not in libvirtd itself, but in its interaction with systemd: it fails to notify systemd that everything went fine. When the daemon is run manually, it just tells us that it does not try to notify systemd.
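The "not requested" message is consistent with a daemon that only attempts notification when NOTIFY_SOCKET is present in its environment. A minimal sketch of that decision follows; the enum and the function name notify_socket_kind are hypothetical and this is not libvirt's code. It assumes the usual convention that a value starting with '@' names an abstract socket and a value starting with '/' names a filesystem socket.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of the NOTIFY_SOCKET value. */
enum notify_kind {
    NOTIFY_NONE,      /* variable unset or empty: skip notification */
    NOTIFY_ABSTRACT,  /* "@name": abstract socket namespace */
    NOTIFY_PATH,      /* "/path": socket file in the filesystem */
    NOTIFY_INVALID    /* anything else */
};

/* Decide, from the current environment, whether and how to notify. */
static enum notify_kind notify_socket_kind(void)
{
    const char *p = getenv("NOTIFY_SOCKET");

    if (p == NULL || p[0] == '\0')
        return NOTIFY_NONE;        /* started by hand: nothing requested */
    if (p[1] == '\0')
        return NOTIFY_INVALID;     /* a lone "@" or "/" names nothing */
    if (p[0] == '@')
        return NOTIFY_ABSTRACT;
    if (p[0] == '/')
        return NOTIFY_PATH;
    return NOTIFY_INVALID;
}
```

Started from an interactive shell, such a check would return NOTIFY_NONE, which matches the manual run above succeeding without any notification attempt.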
Problem is fairly similar, at least in its symptoms, to:
Symptoms do sound the same, but the underlying problem must be different. In the RH bug, libvirtd would not start with systemd > v229. libvirt.git commit c0bc1723, which is included in libvirt 2.0.0, fixed that issue.
Yes.
$ rpm -q libvirt
libvirt-2.0.0-590.1.x86_64
$ rpm -q systemd
systemd-210-95.1.x86_64
Here we have a "fixed" libvirt with older systemd. Does systemd v210 support sending a ready message to $NOTIFY_SOCKET?
Probably yes, since man 3 sd_notify on that machine mentions NOTIFY_SOCKET environment variable.
Is the NOTIFY_SOCKET env variable set correctly on the system?
# env | grep NOTIFY
=> no answer

But a manual run of systemd-notify works.
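An empty result from a login shell is expected: systemd exports NOTIFY_SOCKET only into the environment of the service processes it starts with notification enabled, not into interactive sessions. One way to see the value the daemon actually received is to read /proc/<pid>/environ of the running service, which holds the environment as it was at exec time. The sketch below is illustrative only; the helper names env_block_get and print_notify_socket are hypothetical, and reading another user's /proc entry normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up NAME in a NUL-separated environment block, which is the layout
 * of /proc/<pid>/environ. Returns a pointer to the value, or NULL. */
static const char *env_block_get(const char *buf, size_t len, const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t nlen = strlen(name);
    size_t i = 0;

    while (i < len) {
        const char *entry = buf + i;
        const char *end = memchr(entry, '\0', len - i);
        if (end == NULL)
            break;                      /* unterminated tail: ignore it */
        size_t elen = (size_t)(end - entry);
        if (elen > nlen && entry[nlen] == '=' &&
            memcmp(entry, name, nlen) == 0)
            return entry + nlen + 1;    /* value is NUL-terminated */
        i += elen + 1;
    }
    return NULL;
}

/* Print the NOTIFY_SOCKET that process PID was started with.
 * Returns 0 on success, -1 if the environ file cannot be opened. */
static int print_notify_socket(long pid)
{
    static char buf[65536];
    char path[64];

    snprintf(path, sizeof(path), "/proc/%ld/environ", pid);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t len = fread(buf, 1, sizeof(buf), f);
    fclose(f);

    const char *v = env_block_get(buf, len, "NOTIFY_SOCKET");
    printf("NOTIFY_SOCKET=%s\n", v != NULL ? v : "(unset)");
    return 0;
}
```

Calling print_notify_socket() with the PID of the libvirtd instance started by systemd would show the socket address the daemon was asked to use, which is the piece of information missing from the debug log.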
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c3
Eric Bischoff
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c4
James Fehlig
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c5
Eric Bischoff
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c6
--- Comment #6 from Eric Bischoff
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virsystemd.c;h=871db7ee91735fe9ff1330fcbb71501fa4720a63;hb=HEAD#l503
Thanks, it's clearer now.
When notification is attempted a bit further down at line 524, you can see that sendmsg failed when starting libvirtd through systemd. Can you enable debug in /etc/libvirt/libvirtd.conf, start libvirtd via systemd, and see if we can determine why sendmsg failed?
Attached, but no more useful information. Any chance to get a recompiled version that would print the notify socket path? Or is there a way to know how systemd sets the NOTIFY_SOCKET environment variable? I googled and did not find so far.
Sorry I don't have advice on collecting debug info from systemd, but it would be nice to have such info from the systemd side as well.
If sendmsg failed, I doubt systemd even noticed anything...
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c7
--- Comment #7 from Eric Bischoff
Or is there a way to know how systemd sets the NOTIFY_SOCKET environment variable? I googled and did not find so far.
From systemd sources:

m->notify_socket = strappend(e, "/systemd/notify");

Curious to see whether we really get this path.
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c8
--- Comment #8 from Eric Bischoff
from systemd sources:
m->notify_socket = strappend(e, "/systemd/notify");
If I look at a SLE12 SP2 build 1451, there is a file /run/systemd/notify. If I look at the processes listening on this socket, it is the process with PID 1 (init, i.e. systemd).

But on my Leap 42.1 this file does not exist.

OK, we know the problem. Now, what is the root cause? :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c9
James Fehlig
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c10
Franck Bui
systemd-maintainers: libvirt recently switched to directly notifying systemd of its readiness instead of using sd_notify
http://libvirt.org/git/?p=libvirt.git;a=commit;h=c0bc172383c2c955394589e5808457935ae06f1d
IMHO, that's just the wrong thing to do...
This works fine on SLE12 SP2, where we've noticed the presence of /run/systemd/notify for $NOTIFY_SOCKET. But the file doesn't exist on SLE12 SP1, Leap 42.1, or 13.2, which all have systemd v210. Is directly notifying systemd of service readiness via $NOTIFY_SOCKET supported on v210? Thanks.
Well, they decide to use their own implementation of sd_notify() so you're basically on your own now, sorry. And it will surely break later when systemd will decide to change any bits of the implementation of the sd_notify() protocol. (And of course sd_notify() is supported by systemd v210).
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c11
--- Comment #11 from Eric Bischoff
IMHO, that's just the wrong thing to do...
Well, they decide to use their own implementation of sd_notify() so you're basically on your own now, sorry. And it will surely break later when systemd will decide to change any bits of the implementation of the sd_notify() protocol.
(And of course sd_notify() is supported by systemd v210).
I can understand why they did it: to remove the dependency on the systemd libraries. With respect to future changes, in the patch comments they say "this is a stable ABI from systemd's POV which explicitly allows independent implementations". It resolves https://bugzilla.redhat.com/show_bug.cgi?id=1314881. So it's not a gratuitous change just for the pleasure of reimplementing things :-) .
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c12
James Fehlig
(In reply to James Fehlig from comment #9)
systemd-maintainers: libvirt recently switched to directly notifying systemd of its readiness instead of using sd_notify
http://libvirt.org/git/?p=libvirt.git;a=commit;h=c0bc172383c2c955394589e5808457935ae06f1d
IMHO, that's just the wrong thing to do...
Why? systemd docs even claim it is independently reimplementable https://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndSta...
This works fine on SLE12 SP2, where we've noticed the presence of /run/systemd/notify for $NOTIFY_SOCKET. But the file doesn't exist on SLE12 SP1, Leap 42.1, or 13.2, which all have systemd v210. Is directly notifying systemd of service readiness via $NOTIFY_SOCKET supported on v210? Thanks.
Well, they decide to use their own implementation of sd_notify() so you're basically on your own now, sorry.
Why, when systemd supports other implementations?
And it will surely break later when systemd will decide to change any bits of the implementation of the sd_notify() protocol.
systemd broke it in the first place. To avoid future breakages, libvirt decided to provide its own implementation, which systemd supports.
(And of course sd_notify() is supported by systemd v210).
But what about using the NOTIFY_SOCKET?
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c13
--- Comment #13 from Franck Bui
With respect to future changes, in the patch comments they say "this is a stable ABI from systemd's POV which explicitly allows independent implementations".
OK, I didn't know that the protocol was part of the stable API.
So it's not a gratuitous change just for the pleasure of reimplementing things :-) .
Well, IMHO it is. The compat libs have been deprecated for a while and the required change was trivial, since it is just a matter of linking against libsystemd.so instead of libsystemd-daemon.so.
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c14
Franck Bui
Why? systemd docs even claim it is independently reimplementable
https://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart/
Ok thanks, I didn't know that the protocol is part of the stable API :)
systemd broke it in the first place. To avoid future breakages, libvirt decided to provide its own implementation, which systemd supports.
Well, systemd simply deprecated the old compat libs for a while and then decided that it was time to drop them. I'm not sure I understand why libvirt didn't cope with this change, since the modifications required were minimal and trivial. Instead libvirt decided to do their own implementation, which doesn't work on Leap for some reason...
But what about using the NOTIFY_SOCKET?
NOTIFY_SOCKET is part of the API...
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c15
James Fehlig
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c16
Ciro Iriarte
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c17
Franck Bui
Is it possible to get NOTIFY_SOCKET fixed in systemd v210? The problem not only affects Leap, but SLE12 SP1 and openSUSE13.2.
Huh? To fix what exactly? Why do you think systemd is the culprit here? sd_notify() works perfectly well with v210, doesn't it?

Eric already gave you a hint BTW: "If sendmsg failed, I doubt systemd even noticed anything..."

At least it would be interesting to know why sendmsg() failed and which return status it returned; that would be the very first step before blaming systemd...
BTW, one reason libvirt decided to do their own implementation is to drop the dependency on libsystemd{,-daemon}. Why link against the library for such a tiny function that is allowed to be reimplemented by services? It would make sense if libvirt was using other stuff from the library.
Because:
- when installing libvirt-daemon on a SUSE distro, it's very likely you'll find the lib already installed.
- no need to deal with new regressions/bugs (even if the function is "tiny")
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c18
James Fehlig
At least it would be interesting to know why sendmsg() failed and which return status it did return,
Connection refused. But it turned out to be a problem with the calculation of the size of sockaddr_un for abstract socket addresses. I've sent a patch upstream that fixes the issue in my testing https://www.redhat.com/archives/libvir-list/2016-July/msg00375.html
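For readers unfamiliar with the pitfall: with an abstract socket, every byte of sun_path covered by the address length is part of the name, so passing the full size of struct sockaddr_un appends trailing NUL bytes to the name and the kernel finds no matching listener. The sketch below illustrates a length calculation that avoids this. It is not the patch referenced above; the function names notify_addr_fill and notify_send are hypothetical, and it assumes NOTIFY_SOCKET uses a leading '@' for abstract addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill *sa from a NOTIFY_SOCKET-style string and return the address length
 * to hand to sendto()/sendmsg()/connect(), or 0 if the string is unusable. */
static socklen_t notify_addr_fill(const char *path, struct sockaddr_un *sa)
{
    assert(path != NULL && sa != NULL);
    size_t n = strlen(path);

    if (n < 2 || n >= sizeof(sa->sun_path))
        return 0;
    if (path[0] != '@' && path[0] != '/')
        return 0;

    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    memcpy(sa->sun_path, path, n);

    if (path[0] == '@') {
        /* Abstract namespace: leading NUL, and the length covers exactly
         * the name bytes, with no terminator and no padding. */
        sa->sun_path[0] = '\0';
        return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n);
    }
    /* Filesystem socket: include the terminating NUL. */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n + 1);
}

/* Send one datagram such as "READY=1" to the notify socket.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int notify_send(const char *path, const char *state)
{
    struct sockaddr_un sa;
    socklen_t len = notify_addr_fill(path, &sa);
    if (len == 0)
        return -1;

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    ssize_t r = sendto(fd, state, strlen(state), 0,
                       (const struct sockaddr *)&sa, len);
    close(fd);
    return r < 0 ? -1 : 0;
}
```

For filesystem sockets an over-long address length is harmless because the path stops at the first NUL, which would explain why the same code worked where /run/systemd/notify exists and failed with "Connection refused" only where an abstract address is handed out.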
that would be the very first step before blaming systemd...
Sorry, didn't mean to offend. Please accept my apologies.
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c19
--- Comment #19 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=987668
http://bugzilla.opensuse.org/show_bug.cgi?id=987668#c20
James Fehlig