On Tue, 2017-01-17 at 06:54 +0300, Andrei Borzenkov wrote:
You cannot restart D-Bus daemon at all without breaking running system. I am not sure also that logind is restartable without losing existing sessions.
I looked at logs and systemd was reloaded in the middle of update. Normally it should not cause issues, but there were known cases of misbehavior during daemon-reload. Unfortunately it is too late to find the reason.
Indeed, on my system, "systemctl daemon-reload" happened 9 times during
the update which eventually crashed. But that alone wasn't fatal. The
fatal problem, as you already conjectured, was the restart of the dbus
service, which caused various other services to be stopped and
restarted as well, including systemd-logind. systemd and other services
failed to create dbus connections. In the wake of these events, the gdm
session and X crashed. systemd was again reloaded while services were
being restarted. systemd started to emit the error message "Looping too
fast. Throttling execution a little" - probably while it was trying to
restart systemd-logind. The restart of the systemd-logind service
eventually failed, which explains why I wasn't able to log in on the
console to see what went wrong.
The logs show that rpm continued updating packages in spite of these
errors. Even the initrd seems to have been rebuilt. But no btrfs "post"
transaction snapshot has been created, so the zypper transaction didn't
fully succeed.
AFAICS, the problem was caused by the restart of the dbus service in the %postuninstall section of the dbus-1 package. On my system, DISABLE_RESTART_ON_UPDATE in /etc/sysconfig/services has the default value "no", so in a way this behaved as I configured it. But maybe the dbus service should be an exception from this rule, or should be controlled by a separate option, e.g. "ENABLE_DANGEROUS_RESTART_ON_UPDATE='yes'".
A log excerpt follows. I can provide a full journal if necessary.
Martin
Jan 16 09:30:13 systemd[1]: Reloading.
Jan 16 09:30:14 systemd[1]: Reloading.
Jan 16 09:30:40 systemd[1]: Reloading.
By itself, that didn't cause any trouble. As you suggested, the actual trouble started after the dbus update, as systemd commenced to stop all kinds of services depending on d-bus:
Jan 16 09:30:14 [RPM][7828]: install dbus-1-1.10.12-3.1.x86_64: success
Jan 16 09:30:14 systemd[1]: Reloading.
Jan 16 09:30:14 systemd[1]: Stopping D-Bus System Message Bus...
Jan 16 09:30:14 systemd[1]: Stopping Accounts Service...
Jan 16 09:30:14 systemd[1]: Stopping Bluetooth service...
Jan 16 09:30:14 systemd[1]: Stopping Network Manager...
Jan 16 09:30:14 systemd[1]: Stopping Login Service...
(... stopping more services ...)
Jan 16 09:30:14 /usr/lib/gdm/gdm-x-session[3025]: (EE)
Jan 16 09:30:14 /usr/lib/gdm/gdm-x-session[3025]: Fatal server error:
Jan 16 09:30:14 /usr/lib/gdm/gdm-x-session[3025]: (EE) systemd-logind disappeared (stopped/restarted?)
Jan 16 09:30:14 systemd[1]: Stopped D-Bus System Message Bus.
Jan 16 09:30:14 gdm[2073]: GdmLocalDisplayFactory: Failed to issue method call: The connection is closed
Jan 16 09:30:14 gdm[2073]: GLib: g_hash_table_find: assertion 'version == hash_table->version' failed
Jan 16 09:30:14 gdm[2073]: Tried to look up non-existent conversation gdm-launch-environment
Jan 16 09:30:14 gdm[2073]: Freeing conversation 'gdm-launch-environment' with active job
Jan 16 09:30:14 gdm[2073]: GdmLocalDisplayFactory: Failed to issue method call: The connection is closed
Jan 16 09:30:15 /usr/lib/gdm/gdm-x-session[4016]: (EE) systemd-logind: ReleaseControl failed: Connection was disconnected before a reply was received
Jan 16 09:30:15 /usr/lib/gdm/gdm-x-session[4016]: (EE) Server terminated with error (1). Closing log file.
Jan 16 09:30:15 unknown[4293]: gnome-software: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.
Jan 16 09:30:15 systemd[1]: Stopped Login Service.
Jan 16 09:30:15 systemd[1]: Closed D-Bus System Message Bus Socket.
Jan 16 09:30:15 systemd[1]: Stopping D-Bus System Message Bus Socket.
Jan 16 09:30:15 systemd[1]: Listening on D-Bus System Message Bus Socket.
Jan 16 09:30:15 ekiga.desktop[4356]: ekiga: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.
Jan 16 09:30:15 systemd[1]: Starting Daemon for power management...
Jan 16 09:30:15 systemd[1]: Started D-Bus System Message Bus.
Jan 16 09:30:15 systemd[1]: Failed to subscribe to NameOwnerChanged signal for 'org.freedesktop.NetworkManager': Device or resource busy
Jan 16 09:30:15 systemd[1]: Failed to subscribe to NameOwnerChanged signal for 'org.freedesktop.login1': Device or resource busy
Jan 16 09:30:15 systemd[1]: Failed to subscribe to NameOwnerChanged signal for 'org.bluez': Device or resource busy
Jan 16 09:30:15 [RPM][7828]: erase dbus-1-1.10.12-2.1.x86_64: success
Jan 16 09:30:15 [RPM][7828]: Transaction ID 587c8495 finished: 0
Jan 16 09:30:15 [RPM][7914]: Transaction ID 587c8497 started
Jan 16 09:30:15 [RPM][7914]: install libpulse0-32bit-9.99.1-1.1.x86_64: success
Jan 16 09:30:15 [RPM][7914]: erase libpulse0-32bit-9.0-2.3.x86_64: success
Jan 16 09:30:15 [RPM][7914]: Transaction ID 587c8497 finished: 0
Jan 16 09:30:15 [RPM][7917]: Transaction ID 587c8497 started
Jan 16 09:30:15 gnome-shell[4157]: gnome-shell: Fatal IO error 4 (Interrupted system call) on X server :1.
Jan 16 09:30:15 org.gtk.vfs.Daemon[4025]: A connection to the bus can't be made
...
Jan 16 09:30:21 [RPM][8088]: Transaction ID 587c849d started
Jan 16 09:30:21 [RPM][8088]: install python-xml-2.7.13-1.1.x86_64: success
Jan 16 09:30:21 [RPM][8088]: erase python-xml-2.7.12-1.5.x86_64: success
Jan 16 09:30:21 [RPM][8088]: Transaction ID 587c849d finished: 0
...
Jan 16 09:30:40 systemd[1]: Failed to subscribe to activation signal: Connection timed out
Jan 16 09:30:40 systemd[1]: Failed to register name: Connection timed out
Jan 16 09:30:40 systemd[1]: Failed to set up API bus: Connection timed out
Jan 16 09:30:40 systemd[1]: Starting WPA Supplicant daemon...
[...]
Jan 16 09:30:40 systemd[1]: Starting Login Service...
Jan 16 09:30:40 systemd[1]: Reloading.
Jan 16 09:30:40 [RPM][8895]: Transaction ID 587c84b0 started
Jan 16 09:30:40 [RPM][8895]: install xf86-video-nouveau-1.0.13-2.2.x86_64: success
Jan 16 09:30:40 [RPM][8895]: erase xf86-video-nouveau-1.0.13-2.1.x86_64: success
Jan 16 09:30:40 [RPM][8895]: Transaction ID 587c84b0 finished: 0
Jan 16 09:30:40 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:30:41 [RPM][8899]: Transaction ID 587c84b1 started
Jan 16 09:30:41 [RPM][8899]: install xf86-video-fbdev-0.4.4-3.10.x86_64: success
Jan 16 09:30:41 [RPM][8899]: erase xf86-video-fbdev-0.4.4-3.9.x86_64: success
Jan 16 09:30:41 [RPM][8899]: Transaction ID 587c84b1 finished: 0
Jan 16 09:30:42 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:30:44 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:30:46 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:30:59 os-prober[26917]: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sda1
Jan 16 09:31:04 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:31:05 systemd-logind[8821]: Failed to add match for JobRemoved: Connection timed out
Jan 16 09:31:05 systemd-logind[8821]: Failed to fully start up daemon: Connection timed out
Jan 16 09:31:05 systemd[1]: systemd-logind.service: Main process exited, code=exited, status=1/FAILURE
Jan 16 09:31:05 systemd[1]: Failed to start Login Service.
Jan 16 09:31:05 systemd[1]: systemd-logind.service: Unit entered failed state.
Jan 16 09:31:05 systemd[1]: systemd-logind.service: Failed with result 'exit-code'.
Jan 16 09:31:05 systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Jan 16 09:31:08 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:31:09 systemd[1]: Looping too fast. Throttling execution a little.
Jan 16 09:31:10 systemd[1]: Looping too fast. Throttling execution a little.
--
Dr. Martin Wilck