
On Thu, 2022-09-01 at 12:21 -0400, Chris Murphy wrote:
On Thu, Sep 1, 2022, at 4:40 AM, Richard Brown wrote:
Pinning versions only works if it's only the versions being changed that cause the problem.
One of the reasons btrfs snapshots beat every other automatic healing mechanism is that they also capture unexpected behaviour, such as issues caused by rpm scripts.
Sure, but I'm not suggesting the end to snapshots in the system root. I assume rpm scripts that could cause issues only target the system root, not $BOOT. And if scripts can act on $BOOT, I think it's limited to adding or removing vmlinuz, initrd, and bootloader config.
If that assumption is wrong, then I need to know more about how $BOOT is being modified, and I'm likely to be pretty skeptical of any sort of complexity. The whole idea of Boot Loader Spec is keeping things simple, almost to the point of silliness (which happens to be one of the more common arguments against BLS, funny enough).
Your assumption is wrong. Any rpm is installed as root. Any rpm can introduce new modules which result in modifications to /boot. Any rpm script can modify whatever it wants. No restriction we could impose in our own packaging would translate to 3rd parties. 3rd party kernel modules are, despite the wishes of the kernel development community, used by a large proportion of Linux users. Therefore any rollback mechanism needs to expect unanticipated alterations to /boot, which is why our typical btrfs snapshotting approach integrates it as part of the whole system.
No, I really think the solution is simplicity via constraints, and holding distros accountable to the spec. If the spec is lacking in clarity about proper or improper behaviors, we need to get that added into the spec itself.
Constraints cannot be imposed on 3rd parties, but 3rd party kernel modules are a thing, as already mentioned. Are you really going to tell everyone they can't have NVIDIA drivers any more?
As a MicroOS fan, I might say that such long uptimes are stupid and best avoided, but I'm also well aware that such long uptimes are VERY common among our userbase, especially on the enterprise side of things.
So, yeah, we need months of snapshots as long as there are months of potential boot-breaking changes which have not been validated by a boot.
OK, but would you expect to need to retain 3 kernels for this use case? 30? 300? I still think that question goes back to the loss of pooling with Btrfs, and how to properly estimate the size $BOOT needs to be.
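To put rough numbers on that sizing question, here is a minimal back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration (the per-kernel and per-initrd sizes, the fixed overhead, and the headroom factor are placeholders, not measurements from any distribution), and the function name is hypothetical.

```python
import math


def estimate_boot_size_mib(kernels_retained, kernel_mib, initrd_mib,
                           overhead_mib=64, headroom=1.25):
    """Estimate the size of a standalone $BOOT partition in MiB.

    kernels_retained: number of kernel/initrd pairs kept for rollback
    kernel_mib, initrd_mib: assumed size of one vmlinuz and one initrd
    overhead_mib: assumed fixed cost (bootloader, config, filesystem metadata)
    headroom: assumed safety factor, since a fixed partition cannot borrow
              free space from the root filesystem the way a shared pool can
    """
    per_entry = kernel_mib + initrd_mib
    return math.ceil((kernels_retained * per_entry + overhead_mib) * headroom)


# Placeholder sizes: 12 MiB per kernel image, 40 MiB per initrd.
for n in (3, 30, 300):
    print(n, "kernels ->", estimate_boot_size_mib(n, kernel_mib=12, initrd_mib=40), "MiB")
```

Under these assumed sizes the estimate grows linearly with the number of retained kernels, so the answer to "3? 30? 300?" directly decides whether $BOOT is a few hundred MiB or many GiB.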
But as Thorsten pointed out, just keeping excess kernels around is not enough. /boot is not some magical directory that exists in isolation from the rest of the operating system. It's entirely possible that a kernel, once loaded, may depend on other modules, libraries, and files elsewhere in the filesystem. Which is why we treat /boot as part of the whole operating system, and ensure that when someone rolls back they get the OS, including /boot, exactly as they used to have it.

No one wants to go from a known-working, bootable system to a broken system and then onward to a half-rolled-back hybrid in which / and /boot differ from how they had them before. People want their system restored to the state they knew worked, which means rolling back /boot in sync with the rest of their system.

--
Richard Brown
Linux Distribution Engineer - Future Technology Team
SUSE Software Solutions Germany GmbH, Frankenstraße 146, D-90461 Nuremberg, Germany
(HRB 36809, AG Nürnberg)
Managing Directors/Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman