What | Removed | Added |
---|---|---|
CC | | iforster@suse.com |
(In reply to Michael Chang from comment #3)
> (In reply to Jiri Srain from comment #2)
> > It does, even though I don't really understand the rationale behind two
> > locations of the data.
> >
> > Anyway, I would like to hear Michael's opinion
>
> I don't know what is "health-care" [...]

It's this one:
https://github.com/kubic-project/health-checker / https://software.opensuse.org/package/health-checker

It is an application used to check whether a system boots successfully after an update and to trigger automatic rollbacks to known good states in case of failures. There are several stages during boot which may fail, requiring a GRUB environment variable to store a flag:

1. The kernel itself or basic initrd components are failing
   To detect that case health-checker is setting a GRUB variable (`save_env -f "${env_block}" health_checker_flag`). That variable will be cleared later if the boot was successful, but if that variable is set when GRUB is starting, something seems to have gone wrong during a previous run and GRUB will automatically boot the previous snapshot.

2. initrd ends up in an emergency shell
   If the initrd should end up in an emergency shell (e.g. because of missing or broken drivers or modules or by reaching a timeout), then the default BTRFS snapshot will be reset to the previous working one and a reboot is triggered. The only GRUB-related thing in this stage is that the GRUB environment block variable is cleared, so GRUB doesn't select the previous snapshot any more (`grub2-editenv - set health_checker_flag=0`).

3. Failure in the actual system
   Mostly the same as 2 from a GRUB perspective: If any of the health-checker checker scripts fails, then the default BTRFS snapshot will be set to the previous one, the flag is reset and the system is rebooted.

If all stages were successful the GRUB environment block variable will also be cleared.
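The userspace half of this handshake could be sketched roughly as follows. This is an illustration only, not health-checker's actual code: the function names are made up, and it assumes that `grub2-editenv - list` prints one `name=value` pair per line and that a value of `0` (or a missing variable) means "cleared".

```shell
#!/bin/sh
# Illustrative sketch only; the function names are hypothetical and not
# taken from health-checker.

# Read "name=value" lines (the format printed by `grub2-editenv - list`)
# on stdin and print the value of health_checker_flag, or 0 if it is absent.
flag_value() {
    value=$(sed -n 's/^health_checker_flag=//p' | tail -n 1)
    printf '%s\n' "${value:-0}"
}

# Assumption: any value other than 0 means the previous boot did not complete.
previous_boot_failed() {
    [ "$(flag_value)" != "0" ]
}

# Clear the flag after a successful boot (the command quoted in stage 2).
# This fails if the environment block cannot be written from userspace.
clear_flag() {
    grub2-editenv - set health_checker_flag=0
}
```

It could be used as `grub2-editenv - list | previous_boot_failed && echo "previous boot failed"`, with `clear_flag` called once all checks have passed.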
health-checker is meant to be used on systems with a read-only root file system such as openSUSE Kubic or SUSE CaaSP.

> Anyway below is the comments from what I can tell.
>
> > This is "just" a cosmetic problem when using health-checker on a read-only root
> > file system. The value will still be stored into the btrfs header as long as
> > 'env_block' is set.
>
> This looks not correct to me, it is true for env_block= instructs for
> chaining to other environment block, but only some specific keys like
> 'next_entry' would be pointed to use it. Other ordinary keys, that is only
> requiring read by grub, would still use /boot/grub2/grubenv.

In this case userspace and GRUB have to communicate somehow: The variable set by GRUB has to be modified from userspace, as only the system knows whether the boot was successful or not and can set the flag accordingly.

> In this case, setting health_checker_flag=0 should require writing to
> /boot/grub2/grubenv and the error is sensible if ongoing task is on
> read-only file system. I really don't know anything else to expect from
> btrfs, since it is no difference to other file systems in this regard.

Indeed, that was something I hadn't realized until recently: While GRUB itself is *always* able to write the GRUB environment block (on supported file systems at least), it may not be possible to do so from userspace - in this case because the file system is mounted or flagged read-only.

Because of this, one workaround I was thinking of is to use a completely different file to store the GRUB environment block, e.g. some file below /var, as /var always has to be writeable. From what I can see the problem would be to find the correct partition or subvolume containing that file from within GRUB. I'll have to experiment with this a bit.

Regarding Jiri's question about the split into two sections: What I also don't understand is why /boot/grub2/grubenv is used on Btrfs at all. Couldn't all data be stored into the Btrfs header directly?
The contents of /boot/grub2/grubenv are inconsistent on Btrfs systems anyway, as GRUB won't update that file. (On https://www.spinics.net/lists/linux-btrfs/msg82209.html the idea to "implement its own damn file system and we give it its own partition" was brought up - this would obviously also solve the problems ;-))
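
For the workaround mentioned above (keeping the environment block in a file below /var), the userspace side might look roughly like this. It is only a sketch under assumptions: the helper names are made up, the candidate paths are supplied by the caller, and it does not address the open question of how GRUB would locate such a file.

```shell
#!/bin/sh
# Illustrative sketch only. Nothing here is an existing health-checker or
# GRUB interface; the helper names are hypothetical.

# Print the first candidate environment file whose parent directory exists
# and is writable from userspace; return non-zero if none qualifies.
pick_writable_env() {
    for candidate in "$@"; do
        dir=$(dirname -- "$candidate")
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Create the environment block if it does not exist yet, then clear the flag.
clear_flag_in() {
    env_file=$1
    [ -f "$env_file" ] || grub2-editenv "$env_file" create
    grub2-editenv "$env_file" set health_checker_flag=0
}
```

A call such as `env_file=$(pick_writable_env /boot/grub2/grubenv /var/lib/health-checker/grubenv) && clear_flag_in "$env_file"` would fall back to the second path on a read-only root; the path below /var is purely an example. GRUB would then have to read the same file (presumably via `load_env -f`), which is exactly the part that still needs experimenting.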