[Bug 1064887] New: raid1 falsely detected as degraded during boot, dependent services don't get started
http://bugzilla.opensuse.org/show_bug.cgi?id=1064887

            Bug ID: 1064887
           Summary: raid1 falsely detected as degraded during boot,
                    dependent services don't get started
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Basesystem
          Assignee: bnc-team-screening@forge.provo.novell.com
          Reporter: P.Suetterlin@royac.iac.es
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 745673
  --> http://bugzilla.opensuse.org/attachment.cgi?id=745673&action=edit
Complete boot log (journalctl -b) after logging in

Our home directory and mail server runs Leap 42.2, with the system and the
home directories each on a mirrored RAID.
For the second time now, neither postfix nor nfs-server was running after
booting the system. They had not been started because they depend on the
/home directory (nfs obviously; postfix delivers to Maildir in the home
directories).

According to the boot log (full version attached), the kernel properly
detects both disks of the RAID (/dev/sd[ab]1), the RAID is started with 2 out
of 2 disks, and /home gets mounted once the device file is found:

Oct 23 07:44:09 royac6 kernel:  sdb: sdb1
Oct 23 07:44:09 royac6 kernel:  sda: sda1
Oct 23 07:44:12 royac6 kernel: md: bind<sda1>
Oct 23 07:44:13 royac6 kernel: md: bind<sdb1>
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 kernel: created bitmap (8 pages) for device md1
Oct 23 07:44:13 royac6 kernel: md1: detected capacity change from 0 to 1024061145088
Oct 23 07:44:13 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Oct 23 07:44:13 royac6 systemd[1]: Started Timer to wait for more drives before activating degraded array..
Oct 23 07:44:13 royac6 systemd[1]: Mounting /home...
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): 1 orphan inode deleted
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): recovery complete
Oct 23 07:44:13 royac6 kernel: EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: discard
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.

But after 30s systemd decides that there are missing disks and unmounts /home
(stopping postfix and nfs-server before they even got started), only to
immediately find things OK again and mount /home a second time:

Oct 23 07:44:43 royac6 systemd[1]: Stopped Postfix Mail Transport Agent.
Oct 23 07:44:43 royac6 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS server and services.
Oct 23 07:44:43 royac6 systemd[1]: Stopping NFSv4 ID-name mapping service...
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFS Mount Daemon.
Oct 23 07:44:43 royac6 systemd[1]: Stopped NFSv4 ID-name mapping service.
Oct 23 07:44:43 royac6 systemd[1]: Started Activate md array even though degraded.
Oct 23 07:44:43 royac6 systemd[1]: Stopped target Local File Systems.
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
Oct 23 07:44:43 royac6 systemd[1]: Stopped (with error) /dev/md1.
Oct 23 07:44:43 royac6 systemd[1]: Unmounted /home.
Oct 23 07:44:44 royac6 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Oct 23 07:44:44 royac6 systemd[1]: Found device /dev/disk/by-uuid/133b616a-1100-4278-86a7-9eb677783e9b.
Oct 23 07:44:44 royac6 systemd[1]: Mounting /home...

However, the dependent services (postfix, nfs-server) do not get started
after this.

There is nothing in the logs that would suggest a problem with the RAID or
any of its disks. /home is mounted, and the RAID is active and clean.
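
The 30s gap between the last-resort timer being armed and the degraded
activation being started can be checked mechanically against a saved boot log.
What follows is only a minimal Python sketch: it assumes journal text in the
short syslog-style format of the excerpts above, and the helper names
(parse, last_resort_delay) are hypothetical, not part of mdadm or systemd.

```python
import re
from datetime import datetime

# Syslog-style prefix as seen in the excerpts above, e.g.
# "Oct 23 07:44:13 royac6 systemd[1]: Mounted /home."
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^:]+): (.*)$")

# Message prefixes copied from the boot log in this report.
TIMER_STARTED = "Started Timer to wait for more drives"
DEGRADED_START = "Starting Activate md array even though degraded"


def parse(line):
    """Return (timestamp, source, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The log carries no year; that does not matter for time differences.
    ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S")
    return ts, m.group(3), m.group(4)


def last_resort_delay(lines):
    """Seconds from the timer being started to the degraded activation, or None."""
    started = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, _source, msg = rec
        if started is None and msg.startswith(TIMER_STARTED):
            started = ts
        elif started is not None and msg.startswith(DEGRADED_START):
            return (ts - started).total_seconds()
    return None


SAMPLE = """\
Oct 23 07:44:13 royac6 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Oct 23 07:44:13 royac6 systemd[1]: Started Timer to wait for more drives before activating degraded array..
Oct 23 07:44:13 royac6 systemd[1]: Mounted /home.
Oct 23 07:44:43 royac6 systemd[1]: Starting Activate md array even though degraded...
Oct 23 07:44:43 royac6 systemd[1]: Unmounting /home...
"""

if __name__ == "__main__":
    print(last_resort_delay(SAMPLE.splitlines()))
```

For the lines quoted in this report the sketch yields 30.0 seconds, matching
the observation that the timer fired although the array had already come up
with 2 out of 2 mirrors.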

Here's the fstab entry for /home:

UUID=133b616a-1100-4278-86a7-9eb677783e9b /home ext4 defaults,discard 0 0

(Note that the later start of postfix at 08:07:15 was triggered manually.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1064887
http://bugzilla.opensuse.org/show_bug.cgi?id=1064887#c1

--- Comment #1 from Peter Sütterlin <P.Suetterlin@royac.iac.es> ---
Andrei Borzenkov posted some additional analysis on the mailing list:
https://lists.opensuse.org/opensuse/2017-10/msg00407.html

http://bugzilla.opensuse.org/show_bug.cgi?id=1064887
http://bugzilla.opensuse.org/show_bug.cgi?id=1064887#c2

--- Comment #2 from Peter Sütterlin <P.Suetterlin@royac.iac.es> ---
In the meantime I have upgraded the system to Leap 42.3.
The issue no longer shows up: the RAID is mounted properly and all services
are started.

http://bugzilla.opensuse.org/show_bug.cgi?id=1064887
http://bugzilla.opensuse.org/show_bug.cgi?id=1064887#c3

Peter Sütterlin <P.Suetterlin@royac.iac.es> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #3 from Peter Sütterlin <P.Suetterlin@royac.iac.es> ---
In the meantime I have upgraded the system to Leap 42.3.
The issue no longer shows up: the RAID is mounted properly and all services
are started.

Changed to worksforme (or wontfix? 42.2 has reached EOL...)