On 2017-10-23 13:05, Peter Suetterlin wrote:
Hi,
I'm running a server on Leap 42.2. Amongst other things, it is the mail server and the NFS server for the home directories.
After a reboot today neither postfix nor the nfs server were running. A look at the boot log:
Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
The actual reason will be before that.
Oct 23 07:44:43 royac6 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Notice that the RAID array has a problem, one disk missing.
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS server and services.
Oct 23 07:44:43 royac6 systemd[1]: Stopping NFSv4 ID-name mapping service...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS Mount Daemon.
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFSv4 ID-name mapping service.
Oct 23 07:44:43 royac6 systemd[1]: Started Activate md array even though degraded.
Oct 23 07:44:43 royac6 systemd[1]: Stopped target Local File Systems.
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
Oct 23 07:44:43 royac6 systemd[1]: Stopped (with error) /dev/md1.
Oct 23 07:44:43 royac6 systemd[1]: Unmounted /home.
It is also stopping the RAID array. Apparently /home is mounted from the RAID array, not over NFS. You have to clarify your setup.
So obviously it is because of a 'problem' with the home directories (/home is served by nfs-server, and postfix uses Maildir in the home directories). But /home is mounted... So I looked at that:
Oct 23 07:44:09 royac6 kernel: sdb: sdb1
Oct 23 07:44:09 royac6 kernel: sda: sda1
Oct 23 07:44:12 royac6 kernel: md: bind<sda1>
Oct 23 07:44:13 royac6 kernel: md: bind<sdb1>
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 kernel: created bitmap (8 pages) for device md1
Oct 23 07:44:13 royac6 kernel: md1: bitmap initialized from disk: read 1 pages, set 11 of 15260 bits
Oct 23 07:44:13 royac6 kernel: md1: detected capacity change from 0 to 1024061145088
Oct 23 07:44:13 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
The missing disk for the RAID array now appears. Nowhere does it talk about NFS. So now it tries again to mount /home.
Oct 23 07:44:13 royac6 systemd[1]: Mounting /home...
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): 1 orphan inode deleted
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): recovery complete
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.
No further mentions of md1 or /home until the first message block above.
A second after the umount I see
Oct 23 07:44:44 royac6 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Oct 23 07:44:44 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Oct 23 07:44:44 royac6 systemd[1]: Mounting /home...
What the **** is going on there? Sidenote: It is stopping the nfs and mail services before they are even started...
As I don't see any real error message, it seems to be some race condition in the systemd dependencies(?), but I'm not sure what to do to locate it. It seems to be the wait timer, but why is it still running? The device is already active, and has all its disks...
Any help appreciated!
Pit
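To see the ordering more easily, it may help to pull every line that mentions the array or /home out of the full log and read them in sequence. What follows is only a rough sketch, assuming the log has been saved as plain text in the same syslog-style format quoted above. The names parse_lines and mentions are made up for this example, and SAMPLE is just a handful of the lines already quoted in this thread, not new log output.

```python
import re

# Matches the syslog-style lines quoted in this thread, e.g.
# "Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<source>[^:]+): (?P<message>.*)$"
)


def parse_lines(text):
    """Return (stamp, source, message) tuples for every line that matches."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match["stamp"], match["source"], match["message"]))
    return entries


def mentions(entries, keywords):
    """Keep only entries whose message contains at least one keyword."""
    return [e for e in entries if any(k in e[2] for k in keywords)]


# A few lines copied from the excerpts in this thread (hypothetical input file
# contents; in practice, read the saved log instead).
SAMPLE = """\
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 systemd[1]: Mounting /home...
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.
Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
Oct 23 07:44:43 royac6 systemd[1]: Stopped (with error) /dev/md1.
Oct 23 07:44:43 royac6 systemd[1]: Unmounted /home.
Oct 23 07:44:44 royac6 systemd[1]: Mounting /home...
"""

if __name__ == "__main__":
    # Print only the array- and /home-related events, in log order.
    for stamp, source, message in mentions(
        parse_lines(SAMPLE), ["md1", "/home", "md array"]
    ):
        print(stamp, source, message)
```

Run against the whole boot log, this should show whether anything touches md1 or /home in the 30 seconds between the first mount and the "Activate md array even though degraded" job, which is the gap in question.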
-- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)