[Bug 1231836] New: [MicroOS]: emergency mode after attempting to mount /.snapshots
https://bugzilla.suse.com/show_bug.cgi?id=1231836

           Bug ID: 1231836
          Summary: [MicroOS]: emergency mode after attempting to mount
                   /.snapshots
   Classification: openSUSE
          Product: openSUSE Tumbleweed
          Version: Current
         Hardware: Other
               OS: Other
           Status: NEW
         Severity: Normal
         Priority: P5 - None
        Component: Kernel:Filesystems
         Assignee: kernel-fs@suse.de
         Reporter: egotthold@suse.com
       QA Contact: qa-bugs@suse.de
 Target Milestone: ---
         Found By: ---
          Blocker: ---

Created attachment 878122
  --> https://bugzilla.suse.com/attachment.cgi?id=878122&action=edit
Failing boot

During the nightly reboot of one of my MicroOS systems, I get intermittent
drops into the emergency shell. Rebooting the device helps in most cases; in
others, a second reboot is needed. I am using /etc/fstab with the following
content [1]. One of those failing boots can be observed in the attachment.
The community has apparently had this issue for a good while as well:
https://forums.opensuse.org/t/random-boot-failures-in-microos/176605

[1]: /etc/fstab

UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 / btrfs ro 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /usr/local btrfs subvol=/@/usr/local 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /srv btrfs subvol=/@/srv 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /root btrfs subvol=/@/root,x-initrd.mount 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /opt btrfs subvol=/@/opt 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /home btrfs subvol=/@/home 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /boot/writable btrfs subvol=/@/boot/writable 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /boot/grub2/x86_64-efi btrfs subvol=/@/boot/grub2/x86_64-efi 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /boot/grub2/i386-pc btrfs subvol=/@/boot/grub2/i386-pc 0 0
UUID=2e79ec65-78e5-4ba4-b549-cc97117e0954 /.snapshots btrfs subvol=/@/.snapshots 0 0
UUID=83c66cbc-e40b-4b67-987f-3b41f613932c /var btrfs defaults,x-initrd.mount 0 0
UUID=EEE3-4BD4 /boot/efi vfat utf8 0 2
UUID=1c5adbee-7e1a-471e-9d04-0e583d29d8ec /var/data auto nofail,subvol=/,x-parent=8d06eb2d:af57615e:15d28725:65d23c58 0 0
overlay /etc overlay defaults,lowerdir=/sysroot/var/lib/overlay/168/etc:/sysroot/etc,upperdir=/sysroot/var/lib/overlay/169/etc,workdir=/sysroot/var/lib/overlay/169/work-etc,x-systemd.requires-mounts-for=/var,x-systemd.requires-mounts-for=/sysroot/var,x-initrd.mount 0 0

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1231836

Enno Gotthold <egotthold@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |fvogt@suse.com,
                   |        |santiago.zarate@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c1

Santiago Zarate <santiago.zarate@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
             Status|NEW     |CONFIRMED
                 CC|        |iforster@suse.com
              Flags|        |needinfo?(iforster@suse.com)

--- Comment #1 from Santiago Zarate <santiago.zarate@suse.com> ---
Try adding `debug systemd.log_level=debug` to the system's command line;
Ignaz might have some other thoughts.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c2

Wenruo Qu <wqu@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |wqu@suse.com

--- Comment #2 from Wenruo Qu <wqu@suse.com> ---
I believe this bug is more about the init process than btrfs itself. The
dmesg shows nothing wrong from btrfs, and I'm very confident that btrfs will
be super noisy about anything unexpected (even for repairable corruption, it
will output something).
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c3

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(iforster@suse.com)|

--- Comment #3 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Wenruo Qu from comment #2)
> I believe this bug is more about the init process than btrfs itself.
> The dmesg shows nothing wrong from btrfs, and I'm very confident that btrfs
> will be super noisy about anything unexpected (even for repairable
> corruption, it will output something).

FWICT it's the opposite: it shows that it's likely a kernel issue.

    mount: /.snapshots: /dev/nvme0n1p2 already mounted on /.

That is printed by mount on -EBUSY, but btrfs explicitly handles mounting
subvolumes of the same filesystem separately, so unless /.snapshots is
already mounted this can't happen.

I had a quick look at the kernel code (git grep EBUSY fs/btrfs) and found
this comment:

        /*
         * We got an EBUSY because our SB_RDONLY flag didn't match the existing
         * super block, so invert our setting here and retry the mount so we
         * can get our vfsmount.
         */

That triggered some unpleasant memories about the superblock ro flag
flipping during boot (boo#1156421). In this case a stress test would confirm
that, and indeed it did! Here's a reproducer:

#!/bin/sh
set -eu

# Create a btrfs image
dd if=/dev/zero of=btrfs bs=1M count=128
mkfs.btrfs btrfs
loop=$(losetup --show -f btrfs)
mkdir mnt

# With /.snapshots and /.snapshots/snapshot/1 subvolumes
mount "$loop" mnt
btrfs subvol create mnt/.snapshots
btrfs subvol set-default mnt/.snapshots
mkdir -p mnt/.snapshots/snapshot/1
btrfs subvol create mnt/.snapshots/snapshot/1/snapshot
umount mnt
mount "$loop" mnt
mkdir mnt/.snapshots

# First loop: make mnt ro/rw/ro/rw/ro/...
while mount -o remount,ro mnt; do mount -o remount,rw mnt; done &
bg=$!

# Second loop: mount and umount mnt/.snapshots
while mount "$loop" mnt/.snapshots -o subvol=.snapshots; do
    umount mnt/.snapshots
done

# Cleanup
kill $bg
wait
umount -R mnt
losetup -d "$loop"
rmdir mnt

Here it fails within seconds:

...
Create subvolume 'mnt/.snapshots'
Create subvolume 'mnt/.snapshots/snapshot/1/snapshot'
mount: /tmp/mnt/.snapshots: /dev/loop0 already mounted on /tmp/mnt.
       dmesg(1) may have more information after failed mount system call.

Maybe SB_RDONLY flips while /.snapshots is being mounted? Just a guess:

1. / gets mounted ro in the initrd (SB readonly)
2. /.snapshots mount starts (SB readonly)
3. /usr/local mount starts (SB readonly)
4. /usr/local mount finishes (SB readwrite)
5. /.snapshots mount fails because SB is readwrite now?
https://bugzilla.suse.com/show_bug.cgi?id=1231836

Wenruo Qu <wqu@suse.com> changed:

           What    |Removed           |Added
----------------------------------------------------------------------------
           Assignee|kernel-fs@suse.de |wqu@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c4

--- Comment #4 from Wenruo Qu <wqu@suse.com> ---
OK, I can reproduce the problem now, and indeed btrfs does not output any
error message in this case. I'll look into the problem.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c5

--- Comment #5 from Wenruo Qu <wqu@suse.com> ---
The direct cause is: when the initial super RO flag mismatches, we get
-EBUSY and go through btrfs_reconfigure_for_mount(), which flips our RO flag
and retries.

But since the background process is also remounting, which can flip the RO
flag to a different value, we get -EBUSY again, because the newly flipped RO
flag conflicts with the newly remounted one.

This involves a lot of VFS calls, which can be a little complex, but I'll
see if we can use a mutex or something like that to prevent remount/mount
from racing on the same btrfs.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c6

--- Comment #6 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Wenruo Qu from comment #5)
> The direct cause is: when the initial super RO flag mismatches, we get
> -EBUSY and go through btrfs_reconfigure_for_mount(), which flips our RO
> flag and retries.
> But since the background process is also remounting, which can flip the RO
> flag to a different value, we get -EBUSY again, because the newly flipped
> RO flag conflicts with the newly remounted one.

Heh, so I got a lucky guess!

> This involves a lot of VFS calls, which can be a little complex, but I'll
> see if we can use a mutex or something like that to prevent remount/mount
> from racing on the same btrfs.

I saw your patch on the ML, which went in a different direction, with more
brute force... Your approach of just retrying until there is no more -EBUSY
probably works, but I wonder whether this retry loop is a good idea. In two
cases it might retry indefinitely:

a) -EBUSY might be returned for other reasons (if not possible right now,
   maybe in the future?)
b) Something flips ro/rw all the time (rather artificial, but technically
   possible)
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c7

--- Comment #7 from Wenruo Qu <wqu@suse.com> ---
(In reply to Fabian Vogt from comment #6)
> a) -EBUSY might be returned for other reasons (if not possible right now,
>    maybe in the future?)

So far I haven't seen any VFS function in the get_tree() and fc_mount()
chain return -EBUSY. But I see your point; maybe we can change the return
value to a more special one, so that it is much harder for the VFS layer to
return a conflicting error number.

> b) Something flips ro/rw all the time (rather artificial, but technically
>    possible)

In that case we will eventually win the race, and we only need to win it
once. The real blocker here is that we do not hold any super block, and thus
cannot use the s_umount rwsem for synchronization. Maybe we could introduce
an internal rwsem to handle this, but making that cooperate with the VFS is
pretty hard and very bug-prone, so I don't think it would be any better.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c8

--- Comment #8 from Wenruo Qu <wqu@suse.com> ---
(In reply to Wenruo Qu from comment #7)
> So far I haven't seen any VFS function in the get_tree() and fc_mount()
> chain return -EBUSY.

My bad: vfs_get_tree() itself can return -EBUSY if fc->root is already
populated. Thankfully, unless something has gone wrong, fc->root should only
be populated by the fs callbacks.

> But I see your point; maybe we can change the return value to a more
> special one, so that it is much harder for the VFS layer to return a
> conflicting error number.

Although this makes it much more convincing to change the error number to a
special one.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c9

--- Comment #9 from Wenruo Qu <wqu@suse.com> ---
BTW, I also explored another solution: allow a mismatch between sb->s_flags
and fc->sb_flags, then reconfigure. This method avoids the brute-force retry
loop and, in theory, uses a proper rwsem to prevent the race.

But it just doesn't work. If the fs is initially mounted RO, a new RW open
on the same device returns -EINVAL, rejecting the mount way before we can
get a proper super block to continue.

That's why we risk the race by retrying without holding a proper sb lock.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c10

--- Comment #10 from Wenruo Qu <wqu@suse.com> ---
(In reply to Wenruo Qu from comment #9)
> If the fs is initially mounted RO, a new RW open on the same device returns
> -EINVAL, rejecting the mount way before we can get a proper super block to
> continue.
> That's why we risk the race by retrying without holding a proper sb lock.

What an idiot I am: I was using the wrong script to test, and thus mounting
the wrong device. No wonder the device scan failed; it's not related to the
flag.

I'll do more testing to find out whether we can go with the new solution.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c11

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |kukuk@suse.com

--- Comment #11 from Fabian Vogt <fvogt@suse.com> ---
*** Bug 1230010 has been marked as a duplicate of this bug. ***
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c12

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed   |Added
----------------------------------------------------------------------------
             Status|CONFIRMED |IN_PROGRESS

--- Comment #12 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Wenruo Qu from comment #10)
> What an idiot I am: I was using the wrong script to test, and thus mounting
> the wrong device. No wonder the device scan failed; it's not related to the
> flag.

I've done even more stupid stuff, don't worry!

> I'll do more testing to find out whether we can go with the new solution.

FWICT the latest patch on the btrfs ML upstream looks like it would be OK to
merge. Can we get this into TW and SLM 6.x?
https://bugzilla.suse.com/show_bug.cgi?id=1231836

Santiago Zarate <santiago.zarate@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           See Also|        |https://bugzilla.suse.com/show_bug.cgi?id=1233693
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c13

Miller <charlesmillerspam@gmail.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |charlesmillerspam@gmail.com

--- Comment #13 from Miller <charlesmillerspam@gmail.com> ---
*** Bug 1233693 has been marked as a duplicate of this bug. ***
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c14

--- Comment #14 from Miller <charlesmillerspam@gmail.com> ---
I am biting my nails waiting for this fix, as I have some MicroOS
deployments that exhibit this issue, and I would like to be able to rely on
them when they automatically reboot. It seems to have gone quiet lately; has
anything changed since the last comments here? Is there another place or
mailing list I should watch to know when it's resolved?
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c15

David Sterba <dsterba@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |dsterba@suse.com

--- Comment #15 from David Sterba <dsterba@suse.com> ---
(In reply to Miller from comment #14)
> It seems to have gone quiet lately; has anything changed since the last
> comments here? Is there another place or mailing list I should watch to
> know when it's resolved?

I've sent the pull request with this patch to Linus, so it'll be in
6.13-rc3, but it will still take time before it appears in upstream stable
and then in Tumbleweed/MicroOS. I can, however, add it to our distro branch
in advance.
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c16

Fabian Vogt <fvogt@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |zluo@suse.com

--- Comment #16 from Fabian Vogt <fvogt@suse.com> ---
*** Bug 1232815 has been marked as a duplicate of this bug. ***
https://bugzilla.suse.com/show_bug.cgi?id=1231836

Felix Niederwanger <felix.niederwanger@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |felix.niederwanger@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1231836#c17

--- Comment #17 from Fabian Vogt <fvogt@suse.com> ---
(In reply to David Sterba from comment #15)
> I've sent the pull request with this patch to Linus, so it'll be in
> 6.13-rc3, but it will still take time before it appears in upstream stable
> and then in Tumbleweed/MicroOS. I can, however, add it to our distro branch
> in advance.

Yes, please! I actually assumed we had this in TW already, given the
severity.
https://bugzilla.suse.com/show_bug.cgi?id=1231836

Pavel Dostál <pdostal@suse.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |pdostal@suse.com