[opensuse] os-prober script corrupts btrfs
Hi -

I ran into an issue with the script /usr/lib/linux-boot-probes/50mounted-tests (os-prober package) mounting an already mounted btrfs volume and effectively corrupting that volume. This occurred automatically during a "zypper update" on a Leap 15.2 system. /var/log/messages entries indicate the volume was actually mounted. At that time the volume was already in use by a XEN VM. The volume is part of an LVM Volume Group and lvdisplay indicated the volume was already in use.

1. Why is this script needed when dracut/grub is updating the initrd files, and why does it touch (rw mount) volumes unrelated to the boot process?
2. Why did the btrfs partition not reject any further mount attempts (as it was already opened elsewhere)?
3. In order to prevent this from happening in the future, is it safe to remove "os-prober", or can the script be modified to leave certain volumes untouched?

Thanks
Holger
On 19/11/2020 17.19, Holger Jakob wrote:
[...]
You can certainly disable os-prober. Edit "/etc/default/grub":

GRUB_DISABLE_OS_PROBER="true"

Mounting something twice should not corrupt it. os-prober mounts everything to find out whether that partition is bootable, and then adds it to the boot menu.

--
Cheers / Saludos,
Carlos E. R.
(from 15.1 x86_64 at Telcontar)
On 19/11/2020 17.19, Holger Jakob wrote:
[...]
You can certainly disable os-prober. Edit "/etc/default/grub":
GRUB_DISABLE_OS_PROBER="true"
Thanks for this advice. I have made the change and hope it will just work. If this is really just for detecting foreign operating systems, I am not worried about disabling it at all.
Mounting something twice should not corrupt it. os-prober mounts everything to find out whether that partition is bootable, and then adds it to the boot menu.
I think this only applies to mounts within the same kernel. VMs with independent kernels could really mess things up.

Since I am still in the middle of "disaster recovery", I cannot confirm my theory, but the logs seem to indicate that the host was actually writing to the btrfs volume during the os-prober run.

And there is an old CentOS ticket describing a very similar situation: https://bugs.centos.org/view.php?id=10918

I think I will reach out to the os-prober maintainers to see what they can do. The latest version, 1.77, does not list any btrfs-related changes relative to 1.76, which is used in Leap 15.2.

Holger
On 20/11/2020 04.13, Holger Jakob wrote:
[...]
I think this only applies to mounts within the same kernel. VMs with independent kernels could really mess things up.
I had not realized different kernels were involved until Andrei described it.
Since I am still in the middle of "disaster recovery", I cannot confirm my theory, but the logs seem to indicate that the host was actually writing to the btrfs volume during the os-prober run.
And there is an old CentOS ticket describing a very similar situation: https://bugs.centos.org/view.php?id=10918
I think I will reach out to the os-prober maintainers to see what they can do. The latest version, 1.77, does not list any btrfs-related changes relative to 1.76, which is used in Leap 15.2.
As Andrei said, you should create an openSUSE bug report.

--
Cheers / Saludos,
Carlos E. R.
(from 15.1 x86_64 at Telcontar)
On 19.11.2020 19:19, Holger Jakob wrote:
Hi -
I ran into an issue with the script /usr/lib/linux-boot-probes/50mounted-tests (os-prober package) mounting an already mounted btrfs volume and effectively corrupting that volume.
What exactly does "corrupting" mean? Do you have any evidence? Could you be misinterpreting what you saw? Show the log entries, or the commands and their output, that led you to conclude that anything was corrupted, and explain what exactly in that output indicates corruption.
This occurred automatically during a "zypper update" on a Leap 15.2 system. /var/log/messages entries indicate the volume was actually mounted. At that time the volume was already in use by a XEN VM. The volume is part of an LVM Volume Group and lvdisplay indicated the volume was already in use.
So btrfs was mounted in the guest while os-prober mounted it on the host. Is that correct?
1. Why is this script needed when dracut/grub is updating the initrd
GRUB has absolutely nothing to do with the initrd, and dracut has absolutely nothing to do with os-prober. os-prober is called by grub2-mkconfig (through the /etc/grub.d/30_os-prober script) when grub.cfg is regenerated.
files and why does it touch (rw mount) volumes unrelated to the boot process?
Because os-prober searches for other installed operating systems, and to do that it has to mount each filesystem so it can look at what is inside. As for the rw mount, I am afraid nobody can say for sure today. It was already in the original patch ported from Fedora in 2013:

# note that the btrfs volume must not be mounted ro
...

and, looking at the Fedora Git history, it was this way in the very first version.
2. Why did the btrfs partition not reject any further mount attempts (as it was already opened elsewhere)?
Linux does not have any mandatory locking. And the host does not even know there *is* a btrfs filesystem on the device. I happily corrupted my VM by launching it twice.
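Since the kernel will not refuse the second mount, any protection has to come from the probing side. Holger's lvdisplay output already showed the LV as in use, and LVM reports the same thing as the "device open" flag, the sixth character of the lv_attr field printed by lvs. Below is a minimal sketch of a hypothetical pre-mount guard; the function names lv_attr_is_open and lv_should_skip are made up for illustration, os-prober has no such hook, and it assumes the Xen backend holding the LV open is what sets that flag.

```shell
#!/bin/sh
# Hypothetical guard, NOT part of os-prober: skip probing an LVM logical
# volume that something else on the host (e.g. a running VM) has open.

# lv_attr_is_open ATTR
# Succeeds when the 6th character of an lv_attr string is "o"
# ("device open" in lvs(8)), e.g. "-wi-ao----".
lv_attr_is_open() {
    case "$1" in
        ?????o*) return 0 ;;
        *) return 1 ;;
    esac
}

# lv_should_skip LV_PATH
# Succeeds (meaning "do not mount") when LVM reports LV_PATH as open,
# or when its status cannot be read at all (fail safe). Needs root.
lv_should_skip() {
    attr=$(lvs --noheadings -o lv_attr "$1" 2>/dev/null | tr -d ' ')
    [ -z "$attr" ] && return 0
    lv_attr_is_open "$attr"
}

# Example use inside a probe loop (hypothetical device path):
#   if lv_should_skip /dev/vg0/guest-root; then
#       echo "skipping /dev/vg0/guest-root: in use" >&2
#   else
#       : # otherwise carry on with the usual probing
#   fi
```

The check only covers LVM logical volumes, which matches the setup described here, and it cannot tell a read-only user from a writer. It also races with a VM that is started after the check has run.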
3. In order to prevent this from happening in the future is it safe to remove "os-prober" or can the script be modified to leave certain volumes untouched?
You can disable os-prober, e.g. using the YaST bootloader module (Probe for foreign OS). You can also try adding the filesystem UUID to the GRUB_OS_PROBER_SKIP_LIST variable in /etc/default/grub; this should skip running linux-boot-prober, if that is indeed the problem. But os-prober already mounts the filesystem before linux-boot-prober is called, so it may not help. If you do not need os-prober, you can remove and lock it.

What about opening a bug report? This sounds like a real problem, even if the solution is not obvious.
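Both settings mentioned in this thread live in /etc/default/grub, which grub2-mkconfig sources as shell. A minimal sketch of the fragment follows; the UUID is a placeholder, and treating the skip list as a space-separated list of UUIDs is an assumption. Per Andrei's caveat, the skip list is consulted only after os-prober has already mounted the filesystem, so disabling os-prober is the option that actually avoids the mount. The UUID can be read with `blkid -s UUID -o value /dev/<vg>/<lv>` (placeholder path), and the change takes effect the next time grub.cfg is regenerated, e.g. with `grub2-mkconfig -o /boot/grub2/grub.cfg`. Removing and locking the package instead would be `zypper remove os-prober` followed by `zypper addlock os-prober`.

```shell
# /etc/default/grub (fragment)

# Option 1: do not run os-prober at all when grub.cfg is regenerated.
GRUB_DISABLE_OS_PROBER="true"

# Option 2 (alternative, weaker): keep os-prober but skip one filesystem.
# The value below is a placeholder, not a real UUID.
#GRUB_OS_PROBER_SKIP_LIST="<uuid-of-the-guest-btrfs>"
```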
participants (3)
- Andrei Borzenkov
- Carlos E. R.
- Holger Jakob