http://bugzilla.opensuse.org/show_bug.cgi?id=1099745

Bug ID: 1099745
Summary: [kubic][transactional server] systemd fails to boot, reports $subvolume already mounted
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Other
Assignee: fbui@suse.com
Reporter: rbrown@suse.com
QA Contact: qa-bugs@suse.de
CC: iforster@suse.com, kukuk@suse.com
Found By: ---
Blocker: ---

Created attachment 775783
  --> http://bugzilla.opensuse.org/attachment.cgi?id=775783&action=edit
journal -b with debug log enabled

This bug has already been debugged significantly by myself and Franck, but for completeness here is the report as it stands right now.

# SYMPTOMS
On Transactional Tumbleweed or Kubic systems there is a clear race condition: on some boots, mounting one or more of the .mount units required by local-fs.target fails with errors like the following:
mount[661]: mount: /boot/grub2/x86_64-efi: /dev/sda2 already mounted on /.
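When going through many boot logs, it can help to pull these errors out mechanically to see which subvolumes fail and how often. The following is only a minimal sketch, assuming the journal lines have the same shape as the one quoted above; the pattern and the function name `find_already_mounted` are illustrative and not part of any existing tool.

```python
import re

# Assumed message shape, based on the single error line quoted in this report:
#   mount[PID]: mount: <mountpoint>: <device> already mounted on <existing>.
ALREADY_MOUNTED = re.compile(
    r"mount\[(?P<pid>\d+)\]: mount: (?P<mountpoint>\S+): "
    r"(?P<device>\S+) already mounted on (?P<existing>\S+?)\.?$"
)


def find_already_mounted(lines):
    """Return (mountpoint, device, existing_mountpoint) for each matching line."""
    hits = []
    for line in lines:
        match = ALREADY_MOUNTED.search(line.strip())
        if match:
            hits.append(
                (match["mountpoint"], match["device"], match["existing"])
            )
    return hits


if __name__ == "__main__":
    sample = [
        "mount[661]: mount: /boot/grub2/x86_64-efi: /dev/sda2 already mounted on /.",
    ]
    for mountpoint, device, existing in find_already_mounted(sample):
        print(f"{mountpoint}: {device} reported as already mounted on {existing}")
```

The input could be the saved output of `journalctl -b` from a failed boot, read line by line.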
The subvolume that fails is not consistent, and on occasion more than one subvolume will fail to mount with the same error.

Once the user enters the Emergency shell, any attempt to mount the subvolume works perfectly fine - it is not "already mounted" by the time the sysadmin can log in to the Emergency shell.

# STEPS TO REPRODUCE
Install Tumbleweed with a transactional server role, all default settings
Create a cronjob to run "reboot -f" every minute
Wait until the system fails to boot

This has been reproduced on an Intel i5 NUC somewhat reliably. It occurs on average approximately every dozen reboots.

To rule out any kind of bus/hardware issues it has been successfully reproduced on the following devices holding the system's rootfs:
SATA 7200 HDD
SATA SSD
M2 SSD
USB SD Card (Multiple)
USB Pen Drive

Enabling systemd debug logging seems to slow things down enough that this bug is much harder to catch, but not impossible - debug logs are attached.

# USER IMPACT
Due to the failure of the boot at this point, systemd drops to the emergency shell. The system is therefore unusable, hence the critical severity of this bug.

Transactional Tumbleweed & Kubic machines reboot regularly (as a result of every package change, update, etc), further increasing the severity of this bug - if the bug occurs on average every 12 boots, users can expect one major outage of each system at least every month.

Neither distribution can be considered reliable in normal operation until this bug is mitigated or resolved.
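To put a rough number on the outage estimate, the arithmetic can be sketched as below. This assumes each boot fails independently with probability 1/12 (taken from the "every dozen reboots" observation); the number of reboots per month is not measured in this report and is a free parameter here.

```python
def p_at_least_one_failure(p_fail_per_boot, boots):
    """Probability that at least one of `boots` independent boots fails."""
    return 1.0 - (1.0 - p_fail_per_boot) ** boots


def expected_boots_until_failure(p_fail_per_boot):
    """Mean number of boots up to and including the first failure."""
    return 1.0 / p_fail_per_boot


if __name__ == "__main__":
    p = 1.0 / 12.0  # assumption: one failure per dozen boots on average
    for boots in (4, 12, 30):  # hypothetical reboot counts per month
        chance = p_at_least_one_failure(p, boots)
        print(f"{boots:2d} boots: {chance:.1%} chance of at least one failed boot")
```

Under these assumptions, a machine that reboots 12 times in a month has roughly a 65% chance of hitting the emergency shell at least once in that month.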
# AFFECTED SYSTEMS (Tested, able to reproduce)
Tumbleweed (Transactional Server Role)
Kubic (All Installations)

# SUSPECTED AFFECTED SYSTEMS
Future CaaS Platform versions and SLE 15 SPx (with Transactional updates) using systemd versions equal to or later than the one currently in Tumbleweed

# UNAFFECTED SYSTEMS (Tested, unable to reproduce)
Leap 15 (Including Transactional Server Role)
Tumbleweed (non transactional roles)

# SUSPECTED UNAFFECTED SYSTEMS
SLE 15 GA

# PRELIMINARY HYPOTHESIS
As this doesn't happen on all Tumbleweed systems, this bug is likely triggered by the presence of /var (btrfs subvolume) and /etc (overlayfs related to var) in fstab-sys in the initrd. This is needed on a transactional system so the contents of /etc's overlayfs can be read by the initrd. It is not present on non-transactional Tumbleweed system roles.

However, Leap 15 transactional systems have an identical initrd & transactional update configuration. As this doesn't happen on Leap 15 systems, this bug is clearly triggered by changes to systemd introduced in versions later than that in Leap 15 and SLE 15.

My hypothesis is that one of the recently introduced changes causes systemd to attempt to mount subvolumes too early. I suspect systemd might be unmounting the devices used by the initrd before remounting them as part of local-fs.target. If systemd is not waiting long enough for the unmount to complete before attempting to mount the first .mount unit, that could match the observed behaviour.

-- 
You are receiving this mail because:
You are on the CC list for the bug.