[Bug 1094762] New: Systemd timeouts while zypper dup upgrade from 42.3 - starts when old systemd package gets erased and replaced
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762

Bug ID: 1094762
Summary: Systemd timeouts while zypper dup upgrade from 42.3 - starts when old systemd package gets erased and replaced
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.0
Hardware: x86-64
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Upgrade Problems
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: abittner@opensuse.org
QA Contact: jsrain@suse.com
Found By: ---
Blocker: ---

I am having major trouble upgrading a normal 42.3 system to 15.0 via zypper dup. I have even found other people discussing this behavior in a very recent thread on the opensuse mailing list:
https://lists.opensuse.org/opensuse/2018-05/msg00801.html
I am having the very same trouble as the thread starter. All subsequent upgrades of packages that are systemd- or service-related seem to fail to talk to the crashed or non-responsive systemd main components. journalctl still works, and I have traced the problem to the following log lines:
.....
May 26 22:49:40 dpg-ppb-smu-tux01 [RPM][14252]: erase libldap-2_4-2-2.4.44-18.1.x86_64: success
May 26 22:49:40 dpg-ppb-smu-tux01 [RPM][14252]: install libldap-2_4-2-2.4.46-lp150.7.1.x86_64: success
May 26 22:49:40 dpg-ppb-smu-tux01 [RPM][14252]: Transaction ID 5b09c864 finished: 0
May 26 22:49:40 dpg-ppb-smu-tux01 [RPM][14255]: Transaction ID 5b09c864 started
May 26 22:49:40 dpg-ppb-smu-tux01 [RPM][14255]: erase libcurl4-7.37.0-36.1.x86_64: success
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14255]: install libcurl4-7.59.0-lp150.1.1.x86_64: success
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14255]: erase libcurl4-7.37.0-36.1.x86_64: success
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14255]: install libcurl4-7.59.0-lp150.1.1.x86_64: success
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14255]: Transaction ID 5b09c864 finished: 0
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14258]: Transaction ID 5b09c865 started
May 26 22:49:41 dpg-ppb-smu-tux01 [RPM][14258]: erase systemd-228-50.1.x86_64: success
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
.....
When zypper dup erases the systemd-228-50.1.x86_64 package from 42.3, it then installs the new systemd package from Leap 15.0, I suppose, and the trouble begins there.
....
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus[1011]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 dbus-daemon[1011]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
May 26 22:49:41 dpg-ppb-smu-tux01 polkitd[1099]: Reloading rules
May 26 22:49:42 dpg-ppb-smu-tux01 systemd-sysusers[14264]: Creating group systemd-coredump with gid 480.
May 26 22:49:42 dpg-ppb-smu-tux01 systemd-sysusers[14264]: Creating user systemd-coredump (systemd Core Dumper) with uid 480 and gid 480.
May 26 22:49:42 dpg-ppb-smu-tux01 nscd[1035]: 1035 monitored file `/etc/group` was moved into place, adding watch
May 26 22:49:42 dpg-ppb-smu-tux01 nscd[1035]: 1035 monitored file `/etc/passwd` was moved into place, adding watch
May 26 22:49:42 dpg-ppb-smu-tux01 nscd[1035]: 1035 monitored file `/etc/group` was written to
May 26 22:49:42 dpg-ppb-smu-tux01 nscd[1035]: 1035 monitored file `/etc/passwd` was written to
May 26 22:49:42 dpg-ppb-smu-tux01 polkitd[1099]: Collecting garbage unconditionally...
May 26 22:49:42 dpg-ppb-smu-tux01 systemd[1]: Reexecuting.
.....
Can I be of any help in tracking down the root cause of this bug? I am worried that my production system will be left in some limbo state when this whole zypper dup eventually finishes. The thread does mention successful reboots after a very long zypper dup upgrade process, but I have important data on this machine and I am unsure whether the dup process really goes through properly with all packages while systemd is unresponsive. Any emergency hints? Thanks.
--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c1
--- Comment #1 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c2
--- Comment #2 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c3
--- Comment #3 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c4
Tomáš Chvátal
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c5
--- Comment #5 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c6
Franck Bui
> I rarely do this, but how is a later and, more importantly, less extensive bug a duplicate of an earlier reported bug that references all the relevant places and discussions?
Indeed.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c7
Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c8
--- Comment #8 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c9
--- Comment #9 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c10
--- Comment #10 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c11
--- Comment #11 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c12
--- Comment #12 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c13
--- Comment #13 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c14
--- Comment #14 from Franck Bui
> Um, weird. I am inside the KDE desktop (still un-rebooted), in a konsole as root, and I am running this command to see the logs:
> dpg-ppb-smu-tux01:~ # journalctl -xb -S "2018-05-26 22:00"
> The -b is there, isn't it? Am I using the wrong parameters?
It should be fine then, but please don't use -x, as it makes the logs harder to read IMHO. As asked previously, please attach the full output of the command.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c15
--- Comment #15 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c16
--- Comment #16 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c17
--- Comment #17 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c18
--- Comment #18 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c19
--- Comment #19 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c20
--- Comment #20 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c21
--- Comment #21 from Franck Bui
> If you wish, I can send you the entire VM. I have taken a snapshot while this was happening.
Yes please.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c22
--- Comment #22 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c23
--- Comment #23 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c24
--- Comment #24 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c25
--- Comment #25 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c26
--- Comment #26 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c27
--- Comment #27 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c28
--- Comment #28 from Franck Bui
> Sorry, google drive sharing only works with G+ users.
Ok, to make it simple I sent you my gmail address privately.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c29
Franck Bui
May 27 11:39:24 linux-phe7 dbus-daemon[693]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
PID1 doesn't seem to have crashed. It might be worth noting that polkit and dbus are reloading their configuration while systemd is reexecuted (due to the package upgrade).
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c30
--- Comment #30 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c31
--- Comment #31 from Carlos Robinson
> (In reply to Carlos Robinson from comment #27)
> > Sorry, google drive sharing only works with G+ users.
> Ok, to make it simple I sent you my gmail address privately.
Ok, done. It is not only the snapshot; it is the whole VirtualBox machine directory: disk image, machine definition, suspend data, one snapshot, logs, all in a tar.bz2 archive. The machine is suspended (VBox halt with saved state) with zypper dup running and stalling. Unfortunately, I don't have a snapshot made *before* the upgrade :-( Perhaps a fresh install of the beta would exhibit the symptoms? I still have the beta ISO saved.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c32
--- Comment #32 from Franck Bui
> Perhaps a fresh install of the beta would exhibit the symptoms? I have the beta ISO still saved.
Could you try that? If you can reproduce it, it might be easier for me to reproduce with the beta ISO instead. Thanks!
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c33
--- Comment #33 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c34
--- Comment #34 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c35
--- Comment #35 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c36
--- Comment #36 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c37
--- Comment #37 from Franck Bui
> I could try to install minimal X instead, or Gnome, with network disabled. :-?
Yes, you could try with a minimal installation (without Xorg) and see if you can still reproduce. If so, I'll try to get the same ISO. Thanks.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c39
--- Comment #39 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c40
--- Comment #40 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c41
--- Comment #41 from Carlos Robinson
> (In reply to Carlos Robinson from comment #36)
> > I could try to install minimal X instead, or Gnome, with network disabled. :-?
> Yes you could try with a minimal installation (without Xorg) and see if you can still reproduce.
> If so I'll try to get the same ISO.
Well, bad luck. I tried the Server pattern, which is text mode, and the issue did not appear on "zypper dup". Maybe tomorrow I'll try with the GNOME pattern, time permitting. Did you find anything useful on my VM?
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762
Tomáš Chvátal
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c42
--- Comment #42 from Franck Bui
> Did you find anything useful on my VM?
Yes, systemd crashed when it was updated. But unfortunately I couldn't figure out why, because we had the "good" idea to disable coredump handling by default on Leap and, strangely, nothing was logged in the journal... So I couldn't get any useful information about the crash.
So far we know:
1. this happens when systemd is re-executed while being updated (v234 -> v234)
2. it isn't 100% reproducible
3. strangely, the crash is not logged in the journal
Something that might be related: dbus and polkit are reloading their configuration files while systemd is updated:
May 27 11:38:54 linux-phe7 dbus-daemon[9463]: Unable to set up transient service directory: XDG_RUNTIME_DIR "/run/user/0" not available: No such file or directory
May 27 11:38:54 linux-phe7 dbus-daemon[9463]: [session uid=0 pid=9461] Reloaded configuration
May 27 11:38:54 linux-phe7 dbus-daemon[7171]: [session uid=1000 pid=7171] Reloaded configuration
May 27 11:38:54 linux-phe7 dbus-daemon[693]: [system] Reloaded configuration
May 27 11:38:54 linux-phe7 dbus-daemon[693]: [system] Reloaded configuration
May 27 11:38:55 linux-phe7 polkitd[813]: Reloading rules
May 27 11:38:55 linux-phe7 polkitd[813]: Collecting garbage unconditionally...
May 27 11:38:55 linux-phe7 polkitd[813]: Loading rules from directory /etc/polkit-1/rules.d
May 27 11:38:55 linux-phe7 systemd[1]: Reexecuting.
May 27 11:38:55 linux-phe7 polkitd[813]: Loading rules from directory /usr/share/polkit-1/rules.d
May 27 11:38:56 linux-phe7 polkitd[813]: Finished loading, compiling and executing 4 rules
May 27 11:38:56 linux-phe7 polkitd[813]: Reloading rules
May 27 11:38:56 linux-phe7 polkitd[813]: Collecting garbage unconditionally...
May 27 11:38:56 linux-phe7 systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid)
May 27 11:38:56 linux-phe7 systemd[1]: Detected virtualization oracle.
May 27 11:38:56 linux-phe7 systemd[1]: Detected architecture x86-64.
May 27 11:38:56 linux-phe7 polkitd[813]: Loading rules from directory /etc/polkit-1/rules.d
May 27 11:38:56 linux-phe7 polkitd[813]: Loading rules from directory /usr/share/polkit-1/rules.d
May 27 11:38:56 linux-phe7 polkitd[813]: Finished loading, compiling and executing 4 rules
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c43
--- Comment #43 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c44
--- Comment #44 from Franck Bui
> If you give me detailed instructions on how to set loglevels that will help you, I still have that one unupgraded machine here, I need to do something about it.
If you want to enable systemd debug logs at runtime you can use "systemd-analyze set-log-level debug".
> I don't know whether the machine will be affected, though.
> What comes to my mind maybe is, is it possible that non-rebooted machines that e.g. had updates on 42.3 level (systemd and other stuff lately, not yet replaced and activated all files in use, e.g. non rebooted yet) and then going for a zypper dup with repositories swapped over to 15.0 URL addresses
Hmm, is this scenario supported? I mean, aren't such upgrades supposed to be done offline? To answer your question, I don't think it's related, because Marcos seems to reproduce the issue when simply updating Leap 15 Beta to Leap 15.0.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c45
--- Comment #45 from Franck Bui
> Unfortunately, I don't have a snapshot made *before* the upgrade :-(
Unfortunately it's going to be hard to find out what was going on if we are not able to reproduce. I'm afraid that trying to do so with the debug logs enabled will change the timings and prevent the crash from happening. The only idea I have currently would be to reproduce with the kernel coredump handling enabled (echo core >/proc/sys/kernel/core_pattern).
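A minimal sketch of checking where the kernel currently sends core dumps before retrying the upgrade. It only reads /proc/sys/kernel/core_pattern; the helper name classify_core_pattern is made up for this illustration, and the actual switch (as root) is the echo command from the comment above.

```shell
#!/bin/sh
# classify_core_pattern is a hypothetical helper name (not from the thread).
# A leading "|" in core_pattern means the kernel pipes the dump to a helper
# program (for example systemd-coredump); anything else is a file name
# template such as "core".
classify_core_pattern() {
    case "$1" in
        '|'*) echo "piped to helper" ;;
        '')   echo "empty" ;;
        *)    echo "written to file" ;;
    esac
}

# Report the current setting when the file is readable (Linux only).
if [ -r /proc/sys/kernel/core_pattern ]; then
    current=$(cat /proc/sys/kernel/core_pattern)
    echo "core_pattern: $current ($(classify_core_pattern "$current"))"
fi

# To switch to plain core files, as suggested in the thread (needs root):
#   echo core >/proc/sys/kernel/core_pattern
```

If the report says "piped to helper", the dump is handed to whatever program is named after the "|" rather than written as a plain core file.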
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c46
--- Comment #46 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c47
--- Comment #47 from Franck Bui
> Do I need to restart systemd for the debug level to take effect or anything?
You needn't. But beware that the journal is not persistent by default on Leap. That means that if for any reason you need to reboot, the journal contents from the previous boot will be lost. You can still make the journal persistent by doing:
1. mkdir /var/log/journal
2. systemctl restart systemd-journald.service
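A small sketch for checking which journal location would be in use before and after the two steps above. It assumes journald runs with its default Storage=auto setting, under which /var/log/journal is used only if that directory exists; the helper name journal_dir and its optional root-prefix argument are inventions for this example.

```shell
#!/bin/sh
# journal_dir is a hypothetical helper. The optional argument is a root
# prefix so the check can be tried against a scratch directory.
# Assumption: journald uses Storage=auto (its documented default).
journal_dir() {
    root=${1:-}
    if [ -d "$root/var/log/journal" ]; then
        echo "$root/var/log/journal (persistent)"
    else
        echo "$root/run/log/journal (volatile, lost on reboot)"
    fi
}

# Report for the running system.
journal_dir
```

Running it once before `mkdir /var/log/journal` and once after shows whether the directory that makes the journal survive a reboot is in place; journald still has to be restarted as described in step 2.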
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c48
--- Comment #48 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c49
--- Comment #49 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c50
--- Comment #50 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c51
--- Comment #51 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c52
--- Comment #52 from Franck Bui
> i think i made it :) -> ;(
Did you enable the debug logs? Did you enable the coredump handling?
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c53
--- Comment #53 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c54
--- Comment #54 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c55
--- Comment #55 from Franck Bui
> Only the systemd debug stuff you initially wrote; I saw that core dump variable too late.
Bad luck...
> But isn't the journalctl -b output I pasted the stuff you wanted to see? Isn't there anything in there?
It just seems that there are no debug logs in there. Could you attach the output of "journalctl -b"?
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c56
--- Comment #56 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c58
--- Comment #58 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c59
--- Comment #59 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c60
--- Comment #60 from Franck Bui
> Created attachment 771834 [details] journalctl -b output while still running zypper dup, systemd trouble around 15:17+ timestamp onward
> journalctl -b output while still running zypper dup, systemd package became installed and then troubled around 2018-05-30 15:17+ timestamp onward methinks.
Sadly the log level is not kept/saved by PID1 when it is re-executed. IOW your logs don't provide more info than the ones already provided.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c62
--- Comment #62 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c63
--- Comment #63 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c66
--- Comment #66 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c67
--- Comment #67 from Franck Bui
> I am doing the zypper dup from 42.2 to 42.3 atm, will only zypper up 42.3 afterwards and stop there, awaiting some more hints or requests I could help with for this bug.
Before upgrading to 15.0, please make sure to re-enable coredumps with: echo core >/proc/sys/kernel/core_pattern. That way, if systemd crashes again you should find a /core* coredump file.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c68
--- Comment #68 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c73
--- Comment #73 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c74
--- Comment #74 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c76
--- Comment #76 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c80
--- Comment #80 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c81
--- Comment #81 from Franck Bui
> Okay, so with this test repo, what additional or fundamental steps should I follow for the best debug results?
> I have the 42.2 -> 42.3 system here, can add (modify) all the 15.0 repos (oss, non-oss, update-oss, update-non-oss) and this special systemd repo
> and set the systemd special repo to highest prio
> and then zypper dup --download-in-advance
> right?
Correct.
> What kernel variables, scripts or debug stuff should I follow additionally? Or does the special debug repo already include all that is needed?
The special repo should be enough; we'll generate a (more useful) core dump later if you manage to reproduce the issue. But I strongly advise you to try to reproduce on a *testing* system only. Thanks!
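The steps confirmed above could be sketched roughly as follows. This is only an illustration: the repository URL and the alias systemd-test are placeholders (the real test repository was given in a comment that is not part of this excerpt), the priority value 90 is an arbitrary choice below zypper's default of 99, and the script assumes the regular repositories have already been switched to their 15.0 URLs. By default it only prints the commands.

```shell
#!/bin/sh
# Placeholder values, NOT taken from the thread.
TEST_REPO_URL=${TEST_REPO_URL:-http://example.invalid/systemd-test/}

# Print the command by default; execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_with_test_repo() {
    # In zypper a lower number means a higher priority (default is 99).
    run zypper addrepo --priority 90 "$TEST_REPO_URL" systemd-test
    run zypper refresh
    run zypper dup --download-in-advance
}

upgrade_with_test_repo
```

Keeping the dry-run default makes it possible to review the exact commands first, in line with the advice to try this on a testing system only.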
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c82
--- Comment #82 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c83
--- Comment #83 from Franck Bui
> yes it is a testing system.
> side note: my other production system that was in the unrebooted state, i had the idea to reboot it eventually via reboot -f (reboot alone did nothing) but it never came back since :( so I need to visit on site and see where it hangs or what went wrong.
Possibly some services weren't enabled/disabled during the upgrade... it would help if you could get the logs.
> my other production site has some mdraid and data that i dont dare yet to reboot maybe until we find out more about this bug.
At least let's see what the status is of your other system that doesn't boot anymore. That said, for production systems I wouldn't recommend doing such a major upgrade "online"; use the ISO instead. Isn't that the official way to do such upgrades, BTW?
> or would it be safe to manually fire up the systemd main (which exactly?) process somehow and re-execute all the processes from the zypper dup stage that wanted to interact with systemd?
Unfortunately I can't think of a way to restart systemd once it has exited or crashed. Maybe some people on the systemd mailing list have an idea...
> it is totally unclear to me what I can do about my hanging pending production system where this bug happened during dup.
> what is a possible workaround even now that the situation happened?
I'm afraid there are no workarounds at that point :(
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c84
--- Comment #84 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c85
--- Comment #85 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c86
--- Comment #86 from Franck Bui
> anyways it is now at around 17xx/27xx packages installing and didnt show timeouts or services and systemd problems I could notice.
> or would there be more inside some logs no matter what? but it is rather speedily installing upgrading so i figure it doesnt have systemd problems.
If systemd had exited unexpectedly, you would have noticed, but you can make sure by issuing any systemctl command: it should time out.
> this is btrfs system now, which i never use in my normal machines.
> what about that virtual machine image, did that use btrfs as well? maybe this is only on ext filesystems?
I doubt the file system has an impact here, but I'll retry with ext4.
> wondering of what to do next and how to somehow reproduce the bug situation?
I dunno :-/
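The "issue any systemctl command and see whether it times out" check mentioned above can be wrapped in a small probe. This is a sketch only: probe is a made-up helper name, it relies on the coreutils timeout utility (which exits with status 124 when the limit is hit), and the 10 second limit in the usage comment is an assumption chosen to be shorter than the 25 s D-Bus timeout seen in the logs earlier in the thread.

```shell
#!/bin/sh
# probe LIMIT CMD...: run CMD with a time limit and report how it ended.
# probe is a hypothetical helper; timeout(1) returns 124 on expiry.
probe() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "no answer within ${limit}s"
    else
        echo "answered with exit status $rc"
    fi
}

# On the machine being upgraded one might run (not executed here):
#   probe 10 systemctl show --property=Version
```

An "answered with exit status 0" result suggests PID 1 is still responding on the bus; "no answer" matches the hang described in this bug. A non-zero status without a timeout is inconclusive and needs a look at the actual error message.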
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c87
--- Comment #87 from Carlos Robinson
> Carlos, Andreas, any chance you can try to reproduce with the test package provided in comment #79?
I'm trying. I was somewhere else most of yesterday. The first step is cloning a VM of 42.3 that I have on VMware Player and creating a copy on VBox instead (VB has snapshots, VmwP does not). This is half done; I have to start the new machine and adjust things till it runs. Then I'll take a snapshot and finally try the zypper dup to 15.0.
> I tried to do an upgrade from 42.3 to 15.0 but it worked just fine.
> Carlos perhaps you can share the ISO you used to reproduce the issue ?
Sure. I'm uploading it to my google drive account and will share it publicly, for a limited time (till I need the space, probably). It has been saying one minute left for minutes. [...] Ok! https://drive.google.com/open?id=1SrYm55VppdehlfYylWyAHh0GHmRDrl4_
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c88
--- Comment #88 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c89
--- Comment #89 from Franck Bui
Sure. I'm uploading it to my Google Drive account and will share it publicly for a limited time (until I need the space, probably). It has been saying one minute left for several minutes. [...] Ok!
https://drive.google.com/open?id=1SrYm55VppdehlfYylWyAHh0GHmRDrl4_
Thanks Carlos !
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c90
--- Comment #90 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c91
--- Comment #91 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c92
--- Comment #92 from Franck Bui
2018-06-06T16:27:33.192040+02:00 Eleanor-423 kernel: [14171.371792] lvm2-activation[29061]: segfault at e0 ip 00007fed2151d006 sp 00007ffe20b2e460 error 4 in liblvm2app.so.2.2[7fed2150b000+ef000]
2018-06-06T16:27:33.680334+02:00 Eleanor-423 systemd-coredump[29070]: Due to PID 1 having crashed coredump collection will now be turned off.
Well, it seems that systemd-coredump assumed that PID 1 had crashed whereas it was simply reloading its config, which meant the crash of lvm2-activation was not recorded. But let's focus on the initial bug.
zypper dup started the rpm transactions about 2018-06-06T14:15:12, finished 2018-06-06T17:01:25
As you can see, about five hours, very slow even without crashes.
Well it's only 3 hours (not 5 hours) ;)
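For reference, the elapsed time can be checked from the two timestamps quoted above. This is a minimal sketch, assuming Python 3.7+ and that both timestamps are in the same time zone.

```python
from datetime import datetime

# Start and end of the rpm transactions, as quoted in the comment above.
start = datetime.fromisoformat("2018-06-06T14:15:12")
end = datetime.fromisoformat("2018-06-06T17:01:25")

elapsed = end - start
print(elapsed)  # 2:46:13
```

So the run took a little under three hours.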
I will try, this afternoon, to apply (undo?) the snapshot and redo dup with the test repo added. Even if it runs slow, I don't have to tend to it much :-)
Perhaps make sure to reproduce without the test repo first ?
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c93
--- Comment #93 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c94
--- Comment #94 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c95
--- Comment #95 from Carlos Robinson
(In reply to andreas bittner from comment #43)
...
What comes to my mind is this: is it possible that this affects machines that received updates at the 42.3 level (systemd and other things lately) but were not rebooted afterwards, so that the files in use were not yet replaced and activated, and that then went for a zypper dup with the repositories swapped over to the 15.0 URLs?
Hmm is this scenario supported ?
I mean, aren't such upgrades supposed to be done offline ?
Yes, IMHO, it is better to do them offline, but yes, that scenario is supported, and I think that it is in fact the more popular one.
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c96
--- Comment #96 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c97
--- Comment #97 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c98
--- Comment #98 from Eugene Suprun
Here are some log lines from the still ongoing zypper dup process, as an example, similar to the discussion thread, for example cups service failing after the upgrade thereof.
.....
(1527/3403) Installing: Mesa-dri-nouveau-18.0.2-lp150.17.2.x86_64 .....[done]
(1528/3403) Installing: cups-client-2.2.7-lp150.1.1.x86_64 .....[done]
(1529/3403) Installing: cups-2.2.7-lp150.1.1.x86_64 .....[done]
Additional rpm output:
SysV service cups-lpd@ does not exist, skipping
SysV service cups-lpd does not exist, skipping
Failed to reload daemon: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
systemd service cups-lpd.service does not exist.
Failed to reload daemon: Activation of org.freedesktop.systemd1 timed out
Failed to try-restart cups.service: Activation of org.freedesktop.systemd1 timed out
See system logs and 'systemctl status cups.service' for details.
.....
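To see how many scriptlet actions were hit by the timeout, the zypper output can be tallied by the action that failed. A minimal sketch, assuming the output has been saved as text; the helper name `summarize_timeouts` is made up for illustration, and the excerpt is copied from the log above.

```python
from collections import Counter

# Error text that appears once systemd stops answering on D-Bus.
TIMEOUT_MARKER = "Activation of org.freedesktop.systemd1 timed out"


def summarize_timeouts(output: str) -> Counter:
    """Count timed-out actions, keyed by the text before the first colon."""
    counts = Counter()
    for line in output.splitlines():
        line = line.strip()
        if line.endswith(TIMEOUT_MARKER) and ":" in line:
            action = line.split(":", 1)[0]
            counts[action] += 1
    return counts


excerpt = """\
SysV service cups-lpd does not exist, skipping
Failed to reload daemon: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
Failed to preset unit: Activation of org.freedesktop.systemd1 timed out
systemd service cups-lpd.service does not exist.
Failed to reload daemon: Activation of org.freedesktop.systemd1 timed out
Failed to try-restart cups.service: Activation of org.freedesktop.systemd1 timed out
"""

counts = summarize_timeouts(excerpt)
for action, n in counts.most_common():
    print(n, action)
```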
I had the same multi-hour zypper dup 42.3 -> 15.0 upgrade, fortunately on a test system, and I had made a backup. I ran that 42.3 -> 15.0 upgrade on the test system once again, but this time I first rebooted 42.3 into rescue mode, then ran rcnetwork start, zypper up, sed -i 's/42.3/15.0/g' /etc/zypp/repos.d/*, zypper ref, zypper dup. No problems this time. I'm sure an upgrade after booting from a 15.0 openSUSE DVD would not cause any problems either.
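The repository switch done by the sed step can be sketched as a literal string replacement. This is only an illustration: the helper name `retarget_repo_text` and the sample repo file content are assumptions, not taken from the reporter's system. Unlike the unescaped dot in the sed pattern, a literal replacement cannot match unrelated text.

```python
import re


def retarget_repo_text(text: str, old: str = "42.3", new: str = "15.0") -> str:
    """Replace the literal release string `old` with `new` in repo file text."""
    # str.replace treats `old` literally, so the dot only matches a dot.
    return text.replace(old, new)


# Hypothetical repo file content, for illustration only.
sample = (
    "[repo-oss]\n"
    "enabled=1\n"
    "baseurl=http://download.opensuse.org/distribution/leap/42.3/repo/oss/\n"
)

updated = retarget_repo_text(sample)
print(updated)

# For comparison: as a regular expression, "42.3" also matches "42x3".
loose = re.sub("42.3", "15.0", "id=42x3")
strict = retarget_repo_text("id=42x3")
print(loose, strict)
```

In practice the difference rarely matters for repo files, but escaping the dot (or replacing literally) avoids surprises.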
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c99
--- Comment #99 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c100
--- Comment #100 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c101
--- Comment #101 from Franck Bui
any more hints? requests?
Great you seem to reproduce this quite easily. Can you retry to reproduce again but with the test repo enabled (see comment #79) so the "special" systemd package is used during the upgrade ? Thanks !
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c102
--- Comment #102 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c103
--- Comment #103 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c104
--- Comment #104 from andreas bittner
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c105
--- Comment #105 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c106
--- Comment #106 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c113
--- Comment #113 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c115
--- Comment #115 from Carlos Robinson
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c116
--- Comment #116 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c117
--- Comment #117 from Chris Bradbury
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762
Nick Dordea
http://bugzilla.opensuse.org/show_bug.cgi?id=1094762#c119
--- Comment #119 from Franck Bui