On Thu, Sep 1, 2022, at 4:40 AM, Richard Brown wrote:
Pinning versions only works if it's only the versions being changed that caused the problem.
One of the reasons btrfs snapshots beat every other automatic healing mechanism is that they also capture unexpected behaviour, such as issues caused by rpm scripts.
Sure, but I'm not suggesting an end to snapshots in the system root. I assume rpm scripts that could cause issues only target the system root, not $BOOT. And if scripts can act on $BOOT, I think that's limited to adding or removing vmlinuz, initrd, and bootloader config. If that assumption is wrong, then I need to know more about how $BOOT is being modified, and I'm likely to be pretty skeptical of any sort of complexity. The whole idea of the Boot Loader Spec is keeping things simple, almost to the point of silliness (which happens to be one of the more common arguments against BLS, funnily enough).
And our rpm packaging scripts are just as likely to cause problems as actual version bumps of binaries, if not more so, as those scripts are often, by design, doing intelligent stuff based on what they detect on the installing host. We sometimes NEED packages to behave differently depending on what else they find on the system, but in doing so we NEED a way to capture those differences and roll them back if they prove to be invalid.
I agree, but I can't imagine what unique differences could happen on $BOOT specifically. If the constraints on $BOOT are sufficient, such differences seem pretty unlikely, i.e. only certain tools can create or delete files; no tool can modify files; and we can use mount options to enforce DAC and MAC.
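To make the "constraints" idea a bit more concrete, here is a minimal sketch of an audit that flags anything on $BOOT that isn't a kernel, an initrd, or bootloader config. The function name and the file name patterns are my own assumptions for illustration; nothing here is defined by the Boot Loader Spec.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical allowlist: these patterns are illustrative assumptions only,
# standing in for "vmlinuz, initrd, and bootloader config".
ALLOWED_PATTERNS = ("vmlinuz*", "initrd*", "*.conf", "grub.cfg")


def unexpected_files(boot_root):
    """Return sorted relative paths under boot_root that match no allowed pattern."""
    root = Path(boot_root)
    found = []
    for path in root.rglob("*"):
        # Only regular files are checked; directories are ignored.
        if path.is_file() and not any(fnmatch(path.name, p) for p in ALLOWED_PATTERNS):
            found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

An empty result would mean nothing outside the expected set has landed on $BOOT. Enforcing who may create or delete files is a separate matter for mount options and DAC/MAC, which this sketch does not attempt.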
We have that now with snapshots/transactional-updates, and nothing else. Any new augmentation/replacement to the status quo needs to at least keep parity with what we already have.
I tentatively agree, but would also counter that it suggests more constraints on $BOOT are needed, rather than snapshotting complexity in order to be able to roll back prior complexity. Perhaps the strongest argument I can make in favor of file system snapshots is having a fallback if some distro breaks the boot of another distro. But if we were to establish Btrfs as the preferred volume format for $BOOT, we don't really get around this problem, because any mistake in another distribution could just do something like a super aggressive garbage collection and delete a bunch of snapshots owned by another distro. No, I really think the solution is simplicity via constraints, and holding distros accountable to the spec. If the spec is lacking in clarity about proper or improper behaviors, we need to get that added into the spec itself.
I would strongly argue that would be a huge, detrimental regression compared to the status quo.
Were it to be dropped entirely, I agree. But the idea that anyone needs weeks, let alone months, of system snapshots isn't very compelling. And also, the further back the rollback, the more security and bug fixes are rolled back too.
Someone who doesn't reboot their system for months may need months of system snapshots.
Sure. The system root would be part of that, as I'm thinking you're still doing limited software updates and taking a snapshot before and after them - but reboot isn't required. However, in these cases, $BOOT is changing very little, if at all. The only time a snapshot would capture some change is if there's an addition or removal of a kernel. But we don't need Btrfs snapshots merely to retain three files. I could get on board with some optional fallback boot implementation. Maybe it's a distro-specific copy of everything that would go on $BOOT, but found in /var. Either the distro-specific grub.cfg, or individual BLS snippets, could point to this alternative/fallback location. It's not ridiculous to want $BOOT to be repairable. At some point it could be a spec enhancement.
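As a rough illustration of how little machinery "retention of three files" needs, here is a minimal sketch that copies a named set of files from $BOOT into a fallback directory while keeping their relative layout. The function name and any paths passed to it are hypothetical; no distro or spec defines them today.

```python
import shutil
from pathlib import Path


def save_fallback(boot_root, fallback_root, relative_paths):
    """Copy the named files from boot_root into fallback_root, keeping their layout."""
    copied = []
    for rel in relative_paths:
        src = Path(boot_root) / rel
        dst = Path(fallback_root) / rel
        # Create intermediate directories (e.g. loader/entries) as needed.
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 copies the contents and also tries to preserve file metadata.
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

A call might look like `save_fallback("/boot", "/var/lib/boot-fallback", ["vmlinuz-X", "initrd-X", "loader/entries/X.conf"])`, where every path is a placeholder. How a bootloader would actually find and read that location is the part that would need the spec enhancement.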
As a MicroOS fan, I might say that such long uptimes are stupid and best avoided, but I'm also well aware that such long uptimes are VERY common among our userbase, especially on the enterprise side of things.
So, yeah, we need months of snapshots as long as there are months of potential boot-breaking changes which have not been validated by a boot.
OK, but would you expect to need to retain 3 kernels for these use cases? 30? 300? I still think that question goes back to the loss of pooling with Btrfs, and how to properly estimate the size $BOOT needs to be.

-- Chris Murphy
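For the sizing question, a back-of-the-envelope sketch: without pooling, $BOOT has to be sized up front for the largest number of kernels you intend to retain. The per-file sizes below are placeholder assumptions, not measurements from any distro.

```python
def estimate_boot_size_mib(kernels, vmlinuz_mib, initrd_mib, overhead_mib=0, headroom=1.0):
    """Estimate the space $BOOT needs, in MiB, for a given number of retained kernels.

    overhead_mib covers fixed content such as bootloader files;
    headroom is a multiplier for slack (1.0 means none).
    """
    per_kernel = vmlinuz_mib + initrd_mib
    return (kernels * per_kernel + overhead_mib) * headroom


# Hypothetical sizes: 12 MiB per kernel image, 60 MiB per initrd.
for count in (3, 30, 300):
    print(count, "kernels:", estimate_boot_size_mib(count, 12, 60), "MiB")
```

The estimate grows linearly with the retention count, so the difference between keeping 3 and 300 kernels is the difference between a small partition and a very large one.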