Re: [opensuse] systemd stops postfix/nfsserver for false(?) reasons

24 Oct 2017


Carlos E. R. wrote:
...
On 2017-10-23 19:16, Peter Suetterlin wrote:
...
...
If you look at the timestamps, this is 30 seconds *before* it stops/unmounts
/home and claims the disk is missing.
Sorry for posting them out-of-sync.
Ah!!
Uff...
Please, don't ever do that. Or if you do, please say clearly that the logs
are not in order.
Mea maxima culpa.  I posted it the way I was proceeding to find the issue :(
...
Please, can you repost a longer part of the log, all in correct order?
Even better, the full (minutes) log, from boot till minutes later when
you login?
http://www.royac.iac.es/~pit/Stuff/boot.log.xz
...
I can try to have a go at it after reordering.
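For reference, reordering such an excerpt by hand is error-prone. A minimal sketch of how it could be done, assuming classic syslog stamps of the form "Oct 23 07:44:09", an English locale for month names, and that any line without a stamp is a mail-wrapped continuation of the entry before it. The function names and the year default are illustrative only and not part of any tool mentioned in this thread:

```python
from datetime import datetime

def parse_stamp(line, year=2017):
    """Return the leading 'Mon DD HH:MM:SS' stamp as a datetime, or None."""
    try:
        # Syslog stamps carry no year, so one is supplied (assumption).
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def reorder(lines):
    """Sort log entries by time, keeping wrapped lines with their entry."""
    entries = []  # each item: [timestamp, [lines of that entry]]
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None and entries:
            entries[-1][1].append(line)  # continuation of a wrapped entry
        else:
            # A leading unstamped line sorts first.
            entries.append([stamp or datetime.min, [line]])
    # list.sort is stable: entries within the same second keep their order.
    entries.sort(key=lambda entry: entry[0])
    return [line for _, chunk in entries for line in chunk]
```

Because the stamps only have one-second resolution, entries sharing a second stay in the order they were pasted, so this cannot repair out-of-order lines within a single second.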
Oct 23 07:44:09 royac6 kernel:  sdb: sdb1
Oct 23 07:44:09 royac6 kernel:  sda: sda1
Oct 23 07:44:12 royac6 kernel: md: bind<sda1>
Oct 23 07:44:13 royac6 kernel: md: bind<sdb1>
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 kernel: created bitmap (8 pages) for device md1
Oct 23 07:44:13 royac6 kernel: md1: bitmap initialized from disk: read 1 pages, set 11 of 15260 bits
Oct 23 07:44:13 royac6 kernel: md1: detected capacity change from 0 to 1024061145088
The raid is assembled. The "capacity change" I don't understand, maybe
the disks are external?
No, this is normal and happens with all raid assemblies
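To put the numbers in that message into perspective, here is a small arithmetic sketch. The byte count and the bitmap size (15260 bits) come from the log above; the 64 MiB bitmap chunk size is only an assumption that happens to fit those two numbers and is not confirmed anywhere in the log:

```python
CAPACITY_BYTES = 1024061145088       # from "detected capacity change from 0 to ..."
ASSUMED_CHUNK_BYTES = 64 * 2**20     # assumption: 64 MiB per bitmap bit

def to_tb(size_bytes):
    """Size in decimal terabytes (10**12 bytes)."""
    return size_bytes / 10**12

def to_gib(size_bytes):
    """Size in binary gibibytes (2**30 bytes)."""
    return size_bytes / 2**30

def bitmap_bits(size_bytes, chunk_bytes):
    """Number of bitmap bits needed if each bit covers one chunk (rounded up)."""
    return -(-size_bytes // chunk_bytes)

print(f"{to_tb(CAPACITY_BYTES):.3f} TB, {to_gib(CAPACITY_BYTES):.2f} GiB")
print(bitmap_bits(CAPACITY_BYTES, ASSUMED_CHUNK_BYTES), "bits")
```

So the "change from 0" is just the array going from not-yet-assembled to its full size of roughly 1 TB, and under the assumed chunk size the bit count matches the "15260 bits" reported by the kernel.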
...
Oct 23 07:44:13 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Which disk is this one?
That's the one in question, to be mounted on /home, which is done in the next
step
...
Oct 23 07:44:13 royac6 systemd[1]: Mounting /home...
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): 1 orphan inode deleted
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): recovery complete
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.
Apparently it detects an error on the home filesystem and does recovery
on it, before actually mounting it.
Yes, the only unusual thing I could find.  But the mount succeeds without
further delay.
...
Shouldn't there be more log entries here? There is a hole, half a minute.
There are, but not related.  The 30s is (I suppose) the timeout of the
wait-for-more-disks.
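A gap like this is easy to miss when scrolling through thousands of lines. A minimal sketch for listing silent periods in a log excerpt, assuming the same "Oct 23 07:44:13" stamp format, an English locale, and an arbitrary 10-second threshold; the function name and its defaults are illustrative only:

```python
from datetime import datetime, timedelta

def find_gaps(lines, threshold=timedelta(seconds=10), year=2017):
    """Return (before, after, duration) for each silent period >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no stamp: a wrapped continuation line, skip it
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps

excerpt = [
    "Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.",
    "Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.",
]
for before, after, duration in find_gaps(excerpt):
    print(f"{before:%H:%M:%S} -> {after:%H:%M:%S}: {duration.total_seconds():.0f}s")
```

On the two lines quoted here it reports a single 30-second gap between 07:44:13 and 07:44:43. On the full log, where unrelated entries fill that interval, it would report nothing for this period, so it only helps on filtered excerpts.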
...
Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Oct 23 07:44:43 royac6 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS server and services.
Oct 23 07:44:43 royac6 systemd[1]: Stopping NFSv4 ID-name mapping service...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS Mount Daemon.
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFSv4 ID-name mapping service.
Oct 23 07:44:43 royac6 systemd[1]: Started Activate md array even though degraded.
Oct 23 07:44:43 royac6 systemd[1]: Stopped target Local File Systems.
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
Oct 23 07:44:43 royac6 systemd[1]: Stopped (with error) /dev/md1.
Oct 23 07:44:43 royac6 systemd[1]: Unmounted /home.
Yes, here it unmounts /home because the array is degraded, but the reason
for the degradation is missing.
Exactly.  There is absolutely nothing in the logs that suggests an issue with
the RAID.  Probably I have to put systemd in a more verbose mode?
...
Oct 23 07:44:44 royac6 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Oct 23 07:44:44 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Same device as before. What is it?
Not sure - I think it restarts the detection after the timeout and finds it (again).
...
And then mounts home again, possibly degraded.
Oct 23 07:44:44 royac6 systemd[1]: Mounting /home...
Yes, it mounts it again.  In perfectly sane state.  And I'm quite sure (from
the logs) that it was clean before, too.
...
...
So to sum up again:
Kernel detects both disks/partitions, md properly fires up the RAID clean,
mounts it and recovers from an orphaned inode.
Then suddenly systemd decides that there is a disk missing and unmounts again,
just to 'find' the RAID again directly after that.
We need to see the complete boot log, without "quiet", and also the list
of disks.
Booting without "quiet" will take a while, I'll have to schedule a reboot
without affecting the observatory operations...
...
...
lsblk --bytes --output NAME,KNAME,RA,RM,RO,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT,UUID,PARTUUID,WWN,MODEL,ALIGNMENT /dev/sd?
you can post those to susepaste.org for a limited time, and post here
the link.
Wanted to do that with the boot log, but that's a bit too long (5k lines).
Or is there an easy way to get that into the form?

The lsblk output is here:  http://paste.opensuse.org/94717111

