Carlos E. R. wrote:
On 2017-10-23 19:16, Peter Suetterlin wrote:
If you look at the timestamps, this is 30 seconds *before* it stops/unmounts /home and claims the disk is missing. Sorry for posting them out-of-sync.
Ah!!
Uff...
Please, don't ever do that. Or if you do, please say clearly that the logs are not in order.
Mea maxima culpa. I posted it the way I was proceeding to find the issue :(
Please, can you repost a longer part of the log, all in correct order? Even better, the full log (several minutes), from boot until a few minutes later when you log in?
http://www.royac.iac.es/~pit/Stuff/boot.log.xz
I can try to have a go at it after reordering.
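For the reordering itself, a minimal sketch, assuming the classic syslog layout used in this thread ("Mon DD HH:MM:SS host ...") and a sort that supports month ordering with -M (GNU and BSD sort do). The function name sort_syslog is only illustrative:

```shell
# sort_syslog: put syslog lines from stdin into chronological order.
# Keys: month name (field 1), day of month (field 2), HH:MM:SS (field 3).
# -s keeps lines with identical timestamps in their original order.
# LC_ALL=C makes -M use the English month abbreviations.
# Assumption: all lines are from the same year (syslog does not record it).
sort_syslog() {
    LC_ALL=C sort -s -k1,1M -k2,2n -k3,3
}
```

It could be used as "sort_syslog < excerpt.txt", where excerpt.txt is a placeholder for the lines pasted out of order.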
Oct 23 07:44:09 royac6 kernel: sdb: sdb1
Oct 23 07:44:09 royac6 kernel: sda: sda1
Oct 23 07:44:12 royac6 kernel: md: bind<sda1>
Oct 23 07:44:13 royac6 kernel: md: bind<sdb1>
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 kernel: created bitmap (8 pages) for device md1
Oct 23 07:44:13 royac6 kernel: md1: bitmap initialized from disk: read 1 pages, set 11 of 15260 bits
Oct 23 07:44:13 royac6 kernel: md1: detected capacity change from 0 to 1024061145088
The RAID is assembled. I don't understand the "capacity change"; maybe the disks are external?
No, this is normal and happens with all RAID assemblies.
Oct 23 07:44:13 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Which disk is this one?
That's the one in question, to be mounted on /home, which is done in the next step.
Oct 23 07:44:13 royac6 systemd[1]: Mounting /home...
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): 1 orphan inode deleted
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): recovery complete
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.
Apparently it detects an error on the home filesystem and does recovery on it, before actually mounting it.
Yes, the only unusual thing I could find. But the mount succeeds without further delay.
Shouldn't there be more log entries here? There is a hole, half a minute.
There are, but not related. The 30s is (I suppose) the timeout of the wait-for-more-disks.
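To see what else was logged in that half minute, the lines between two timestamps can be cut out of the full log. A minimal sketch, assuming the syslog layout shown above (time of day in the third field) and a log that covers a single day; log_window is only an illustrative name:

```shell
# log_window FROM TO: print the syslog lines from stdin whose time of day
# (third field, HH:MM:SS) lies between FROM and TO, both inclusive.
# Zero-padded HH:MM:SS strings compare correctly as plain strings.
# Assumption: no midnight rollover inside the log.
log_window() {
    awk -v from="$1" -v to="$2" '$3 >= from && $3 <= to'
}
```

For the posted file this might look like "xzcat boot.log.xz | log_window 07:44:13 07:44:44".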
Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Oct 23 07:44:43 royac6 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS server and services.
Oct 23 07:44:43 royac6 systemd[1]: Stopping NFSv4 ID-name mapping service...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS Mount Daemon.
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFSv4 ID-name mapping service.
Oct 23 07:44:43 royac6 systemd[1]: Started Activate md array even though degraded.
Oct 23 07:44:43 royac6 systemd[1]: Stopped target Local File Systems.
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
Oct 23 07:44:43 royac6 systemd[1]: Stopped (with error) /dev/md1.
Oct 23 07:44:43 royac6 systemd[1]: Unmounted /home.
Yes, here it unmounts /home because the array is degraded, but the reason for the degradation is missing.
Exactly. There is absolutely nothing in the logs that suggests an issue with the RAID. Probably I have to put systemd in a more verbose mode?
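On the verbosity question: systemd documents kernel command-line options that raise its log level for a single boot. A minimal sketch, assuming the options are appended by hand at the boot loader prompt (with "quiet" removed there); debug_boot_log is only an illustrative wrapper name, and none of this has been tried on the machine in question:

```shell
# Kernel command-line options for one debug boot (typed at the boot
# loader prompt, not run in a shell):
#   systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
#
# After that boot, dump the journal of the current boot with
# microsecond timestamps so the ordering of events is unambiguous.
debug_boot_log() {
    journalctl -b -o short-precise --no-pager
}
```

The output of "debug_boot_log > boot-debug.log" (file name is a placeholder) should then show why systemd considers /dev/md1 stopped.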
Oct 23 07:44:44 royac6 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Oct 23 07:44:44 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Same device as before. What is it?
Not sure - I think it restarts the detection after the timeout and finds it (again).
And then mounts home again, possibly degraded.
Oct 23 07:44:44 royac6 systemd[1]: Mounting /home...
Yes, it mounts it again. In perfectly sane state. And I'm quite sure (from the logs) that it was clean before, too.
So to sum up again: the kernel detects both disks/partitions, md properly fires up the RAID clean, and systemd mounts it after recovery from an orphaned inode. Then suddenly systemd decides that there is a disk missing and unmounts it again, only to 'find' the RAID again directly after that.
We need to see the complete boot log, without "quiet", and also the list of disks.
A boot without "quiet" will take a while; I'll have to schedule a reboot without affecting the observatory operations...
lsblk --bytes --output NAME,KNAME,RA,RM,RO,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT,UUID,PARTUUID,WWN,MODEL,ALIGNMENT /dev/sd?
You can post those to susepaste.org for a limited time, and post the link here.
Wanted to do that with the boot log, but that's a bit too long (5k lines). Or is there an easy way to get that into the form? The lsblk output is here: http://paste.opensuse.org/94717111