On Tue, 21 Mar 2017, Joachim Werner wrote:
On 21.03.2017 at 17:33, Michael Matz wrote:
Hi,
On Tue, 21 Mar 2017, Thorsten Kukuk wrote:
As I wrote: switch your btrfs root subvolume to read-only and transactional-updates are 100% safe.
Yes. And as we were saying, if you don't do that you don't have anything at all.
And since you can apply them at any time, you don't even need to spend time waiting for your mission-critical server to come back up ;)
You mean with read-only / ? How could updates become active without reboot? You can't change the files already opened by running processes. So eventually you _have_ to wait for reboot/app-restart (let's ignore live patching for this thread :) ), but that's of course the same with all update approaches (in other words I don't see how transactional-updates specifically change anything in this regard).
I'm sure I'm missing something, but for a system where data, applications, and configuration are separated reasonably well into their own btrfs subvolumes (which I think is the case for a default SUSE Linux setup), what kind of writes exactly could happen, in the same subvolume as the one the transactional snapshot is done to, that could really lead to data loss?
Obviously none if the subvolume in question is mounted read-only. But even if it isn't, what kinds of (intended) writes would happen to /usr/* during a transactional update run?
I'm always surprised how well traditional RPM updates work in the running (server) system, although we are basically relying on old software that hasn't been updated yet to peacefully co-exist with new software that has been updated already.
But there are real problems, and from time to time we hit them. For example, we've run into cases where updating Salt with Salt fails because the running Salt process may lazy-load updated code that doesn't match the running code's APIs any more.
And of course it doesn't work smoothly at all if you try to update GNOME or KDE from within a running desktop session.
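The lazy-loading failure described above can be reproduced in miniature. This is only a sketch of the general mechanism, not Salt's actual loader: the module names demo_core and demo_plugin and the function simulate_in_place_update are made up for illustration. A process loads version 1 of a core module, the files on disk are then replaced with version 2, and a plugin that is imported afterwards expects an API the in-memory core does not have.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def simulate_in_place_update() -> str:
    """Return the error a running process hits when it lazy-loads updated code."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Version 1 on disk, as it was when the process started.
        (root / "demo_core.py").write_text("def old_api():\n    return 'v1'\n")
        (root / "demo_plugin.py").write_text(
            "import demo_core\n\ndef run():\n    return demo_core.old_api()\n"
        )
        sys.path.insert(0, tmp)
        try:
            # The running process loads the core eagerly; the plugin is not loaded yet.
            importlib.import_module("demo_core")

            # An in-place package update replaces both files with version 2.
            (root / "demo_core.py").write_text("def new_api():\n    return 'v2'\n")
            (root / "demo_plugin.py").write_text(
                "import demo_core\n\ndef run():\n    return demo_core.new_api()\n"
            )
            importlib.invalidate_caches()

            # Lazy load: the v2 plugin binds to the v1 core still held in memory.
            plugin = importlib.import_module("demo_plugin")
            try:
                return plugin.run()
            except AttributeError as exc:
                return str(exc)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_core", None)
            sys.modules.pop("demo_plugin", None)


if __name__ == "__main__":
    print(simulate_in_place_update())
```

With an update applied to a snapshot instead, the running process would keep seeing the version 1 files until the reboot, so this particular mismatch could not occur.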
Compared to the alternative approaches (Windows, Mac, iOS, Gnome, Android), I see the "Kukuk-approach" as the best choice if we can get those (IMHO rather hypothetical) snapshot-related data losses under control:
The downtime is really reduced to just the reboot, while the other approaches leave the system in a non-productive state during at least one reboot plus the time all updates take to be applied, which can take many minutes even if the updates have been completely downloaded before.
But as you are safely doing it at night anyway, that little extra
time doesn't matter.
I suppose that the transactional update should work this way:
1) download updates
2) force sub-volumes we are going to snapshot r/o
3) snapshot, apply updates
... time passes (hopefully running system is happy with r/o state)
4) you reboot, updated snapshot gets activated
then we're safe. But if you leave out 2), there's the possibility
of breakage (or, before 4), you need a verification step that checks
that the subvolumes you snapshotted didn't change, so you can roll
back in that case and try again).
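The steps above, including the verification before 4), can be sketched as a toy model. This is not how transactional-update is implemented: the Subvolume class, its generation counter, and the transactional_update function are hypothetical stand-ins, used only to show why either the read-only switch or the verification is needed.

```python
from dataclasses import dataclass


@dataclass
class Subvolume:
    """Toy stand-in for a subvolume; `generation` bumps on every write."""
    files: dict
    read_only: bool = False
    generation: int = 0

    def write(self, path, data):
        if self.read_only:
            raise PermissionError(f"read-only subvolume: {path}")
        self.files[path] = data
        self.generation += 1


def transactional_update(root, updates, force_read_only, writes_meanwhile):
    """Return the snapshot to boot next, or None if the update must be retried."""
    if force_read_only:
        root.read_only = True                     # step 2
    base_generation = root.generation
    snapshot = Subvolume(dict(root.files))        # step 3: snapshot ...
    for path, data in updates.items():
        snapshot.write(path, data)                # ... and apply updates to it
    for path, data in writes_meanwhile.items():   # "... time passes"
        try:
            root.write(path, data)
        except PermissionError:
            pass                                  # rejected because of step 2
    if root.generation != base_generation:        # verification before step 4
        return None                               # discard snapshot, try again
    return snapshot                               # step 4: activated on reboot


if __name__ == "__main__":
    updates = {"/usr/bin/tool": "v2"}
    late = {"/usr/share/data": "written after the snapshot"}
    safe = transactional_update(Subvolume({"/usr/bin/tool": "v1"}), updates, True, late)
    risky = transactional_update(Subvolume({"/usr/bin/tool": "v1"}), updates, False, late)
    print(safe is not None, risky is None)
```

With step 2 in place the late write is refused and the snapshot can be activated; without it the write lands only in the old root, and the verification is what catches that the snapshot would silently drop it.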
You can also easily do 2) and 3) during/after 4).
Your theory above suggests that 2) is not going to be an issue
for the running system or your productivity.
Richard.
--
Richard Biener