[opensuse] Services not started after update/reboot
Hi List,

Last night I updated our server (Mail, Homedirs etc.). It is running Leap 42.2. The ones I did so far went smoothly: zypper up, then reboot. Not this time:

The system rebooted fine, but all client machines were dead because they didn't get the NFS directories: systemd had decided to not start nfs-server, although it is enabled :o

So it's not that it tried to start and failed with an error. The first mention of it in the logs was when I manually started it.

This morning I found out that the same had happened to postfix: enabled, but not started during reboot.

Has anyone seen a similar issue lately, or does anyone have suggestions on how to find out why that happened?

Pit

--
Dr. Peter "Pit" Suetterlin http://www.astro.su.se/~pit
Institute for Solar Physics
Tel.: +34 922 405 590 (Spain) P.Suetterlin@royac.iac.es
+46 8 5537 8559 (Sweden) Peter.Suetterlin@astro.su.se
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On Thu, Jun 29, 2017 at 2:59 PM, Peter Suetterlin <P.Suetterlin@royac.iac.es> wrote:
Hi List,
Last night I updated our server (Mail, Homedirs etc.). It is running Leap 42.2. The ones I did so far went smoothly: zypper up, then reboot. Not this time:
The system rebooted fine, but all client machines were dead because they didn't get the NFS directories: systemd had decided to not start nfs-server, although it is enabled :o
So it's not that it tried to start and failed with an error. The first mention of it in the logs was when I manually started it.
One possible reason could be an ordering dependency loop; in this case systemd picks some service to skip in order to break the loop. I would expect that to be logged, though.
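A quick way to check the dependency-loop theory is to search the boot log for the messages systemd writes when it breaks a loop. The following is only a minimal sketch: it assumes those messages contain the phrase "ordering cycle", the function name is made up, and the sample lines are invented for illustration (they are not from the affected machine). The match is deliberately case-insensitive.

```python
import re

# Assumption: systemd reports a broken dependency loop with the phrase
# "ordering cycle". The search is case-insensitive on purpose.
CYCLE_RE = re.compile(r"ordering cycle", re.IGNORECASE)


def find_cycle_messages(journal_text):
    """Return all journal lines that mention an ordering cycle."""
    return [line for line in journal_text.splitlines() if CYCLE_RE.search(line)]


# Hypothetical sample lines, invented for illustration only.
SAMPLE = """\
Jun 28 22:59:00 host systemd[1]: Found ordering cycle on example.service/start
Jun 28 22:59:00 host systemd[1]: Mounted /home.
Jun 28 22:59:00 host systemd[1]: Job other.service/start deleted to break ordering cycle starting with example.service/start
"""

if __name__ == "__main__":
    for line in find_cycle_messages(SAMPLE):
        print(line)
```

To try it on a real boot, the text could come from a file written with "journalctl -b --no-pager > boot.log". If nothing matches, a dependency loop is unlikely to be the cause.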
Andrei Borzenkov wrote: Thanks for the suggestion, Andrei!
One possible reason could be an ordering dependency loop; in this case systemd picks some service to skip in order to break the loop. I would expect that to be logged, though.
So I went through the systemd messages during boot. I didn't expect a dependency loop, as the system had booted fine before. Instead I noticed two things:

- The pager journalctl uses by default is case-sensitive :o (my less is always set to be case-insensitive on searches...), so I overlooked some messages.
- The services (NFS server and Postfix) had been stopped before they would even have been started. The issue was a problem with some directory:

----
Jun 28 22:59:18 royac6 systemd[1]: Started Purge old kernels.
Jun 28 22:59:30 royac6 systemd[1]: Stopped NFS server and services.
Jun 28 22:59:30 royac6 systemd[1]: Stopping NFSv4 ID-name mapping service...
Jun 28 22:59:30 royac6 systemd[1]: Stopped NFS Mount Daemon.
Jun 28 22:59:30 royac6 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Jun 28 22:59:30 royac6 systemd[1]: Starting Activate md array even though degraded...
Jun 28 22:59:30 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Jun 28 22:59:30 royac6 systemd[1]: Stopped NFSv4 ID-name mapping service.
Jun 28 22:59:30 royac6 systemd[1]: Started Activate md array even though degraded.
Jun 28 22:59:30 royac6 systemd[1]: Stopped target Local File Systems.
Jun 28 22:59:30 royac6 systemd[1]: Unmounting /home...
Jun 28 22:59:30 royac6 systemd[1]: Stopped (with error) /dev/md1.
Jun 28 22:59:30 royac6 systemd[1]: Unmounted /home.
----

The home RAID had been fine though:

----
Jun 28 22:59:00 royac6 kernel: md: bind<sda1>
Jun 28 22:59:00 royac6 kernel: md: bind<sdb1>
Jun 28 22:59:00 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Jun 28 22:59:00 royac6 kernel: md1: detected capacity change from 0 to 1024061145088
...
Jun 28 22:59:00 royac6 systemd[1]: Mounted /home.
Jun 28 22:59:00 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
----

So in the 30s between the two snippets, something must have marked the RAID as failed.
The only thing I can see in the logs is a huge block where os-prober runs the full phalanx of /usr/lib/os-probes/mounted/* scripts on /dev/md1. Not sure what had triggered this. However, the boot now just continues, and 2s later md1 is obviously fine again:

----
Jun 28 22:59:32 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
Jun 28 22:59:32 royac6 systemd[1]: Mounted /home.
----

The stopped services, however, are never (re)started. I'd call this a bug - just not sure where and why....
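The "stopped but never (re)started" pattern described above can be spotted mechanically instead of by paging through the journal. The sketch below is one possible way to do it, not an established tool: the function name and regular expression are made up, it only looks at systemd's "Started ...", "Stopped ..." and "Reached target ..." lines (mounts, which log "Mounted"/"Unmounted", are ignored), and it identifies units by the description text in the message. The sample is a subset of the log lines quoted in this thread.

```python
import re

# Matches systemd's own status lines. "Stopped (with error) X" counts as a stop.
EVENT_RE = re.compile(
    r"systemd\[1\]: (Started|Stopped|Reached)(?: \(with error\))? (.+?)\.?\s*$"
)


def stopped_and_not_restarted(journal_text):
    """Return descriptions whose last recorded event is 'Stopped'."""
    is_up = {}
    for line in journal_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            verb, name = match.groups()
            # The latest event for a description wins.
            is_up[name] = verb != "Stopped"
    return [name for name, up in is_up.items() if not up]


# Subset of the journal excerpt quoted in this thread.
SAMPLE = """\
Jun 28 22:59:18 royac6 systemd[1]: Started Purge old kernels.
Jun 28 22:59:30 royac6 systemd[1]: Stopped NFS server and services.
Jun 28 22:59:30 royac6 systemd[1]: Stopped NFS Mount Daemon.
Jun 28 22:59:30 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Jun 28 22:59:30 royac6 systemd[1]: Stopped target Local File Systems.
Jun 28 22:59:30 royac6 systemd[1]: Unmounted /home.
Jun 28 22:59:32 royac6 systemd[1]: Mounted /home.
"""

if __name__ == "__main__":
    for name in stopped_and_not_restarted(SAMPLE):
        print("never restarted:", name)
```

Run against the full output of "journalctl -b", anything that was started again later in the boot drops out of the list, so what remains is a reasonable starting point for services to check by hand.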
On Thu, Jun 29, 2017 at 4:29 PM, Peter Suetterlin <P.Suetterlin@royac.iac.es> wrote:
So in the 30s between the two snippets, something must have marked the RAID as failed.
Try booting without the "quiet" option on the kernel command line; you may also want to add printk.devkmsg=on (not sure in which exact kernel release it was added) to be sure messages are not lost. Then upload the output of "journalctl -b" to http://susepaste.org/
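For anyone who wants to make that command-line change permanent, here is a small sketch of the string edit involved. It is an illustration under assumptions, not a tested procedure: the function name is made up, the sample value is hypothetical, and it assumes the options live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, with the boot configuration regenerated afterwards (e.g. with grub2-mkconfig).

```python
def debug_cmdline(cmdline):
    """Drop 'quiet' and force printk.devkmsg=on in a kernel command line."""
    params = [
        p
        for p in cmdline.split()
        # Remove 'quiet' and any existing printk.devkmsg setting.
        if p != "quiet" and not p.startswith("printk.devkmsg=")
    ]
    params.append("printk.devkmsg=on")
    return " ".join(params)


if __name__ == "__main__":
    # Hypothetical current value, for illustration only.
    current = "splash=silent quiet showopts"
    print(debug_cmdline(current))
```

The function only rewrites the string; it does not touch any file. For a one-off test the same change can be made by editing the kernel line in the boot menu.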
Andrei Borzenkov wrote:
On Thu, Jun 29, 2017 at 4:29 PM, Peter Suetterlin <P.Suetterlin@royac.iac.es> wrote:
So in the 30s between the two snippets, something must have marked the RAID as failed.
Try booting without the "quiet" option on the kernel command line; you may also want to add printk.devkmsg=on (not sure in which exact kernel release it was added) to be sure messages are not lost. Then upload the output of "journalctl -b" to http://susepaste.org/
OK, I'll probably put those options in as default, but a reboot will have to wait at least until late tonight. And looking at the details, I somehow suspect a race condition between the purge-kernel and os-prober activity, so my prediction is rather that the next boot will just work again, unless there are kernel updates again. If I catch it breaking again, I'll happily make use of your kind offer to look at the logs!
participants (2)
- Andrei Borzenkov
- Peter Suetterlin