[opensuse-factory] systemd, btrfs, /var/lib/machines
Starting with systemd 219, it automatically creates subvolumes for nspawn containers at /var/lib/machines. And there's also a commit sometime in March/April that brings snapshot and rollback control of containers into machinectl, which leverages btrfs snapshotting.

Has anyone using Factory looked at how snapper behaves when doing rollbacks? I'm pretty sure the following will happen:

The snapshot of the top level of the file system done by snapper will stop at /var/lib/machines/ and not include any of the subvolumes in it. That's not the problem though. The problem happens if you do a rollback, and of course now the /var/lib/machines directory will be empty. All of your containers are in a different subvolume.

I don't think it's snapper's job to do a recursive snapshot in order to make sure these containers are present in every snapshot. This explodes the number of subvolumes. It duplicates snapshotting (both machinectl and snapper).

I think this is "yet another example" of why nested subvolumes usually aren't a good idea. There probably should be a systemd-machines subvolume at the top level of the file system, which is added to fstab to mount it at /var/lib/machines. And then snapper needs a way to know to exclude systemd-machines from its snapshotting management.

-- Chris Murphy
-- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
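To make the fstab proposal above concrete, here is a minimal sketch of what such an entry could look like, generated from Python. The subvolume name systemd-machines and the mount point come from the message; the helper name fstab_entry, the placeholder UUID, and the exact set of mount options are assumptions for illustration only.

```python
def fstab_entry(fs_uuid, subvol, mount_point):
    """Format one /etc/fstab line that mounts a Btrfs subvolume by name."""
    fields = [
        f"UUID={fs_uuid}",      # which file system (placeholder value below)
        mount_point,            # where the subvolume should appear
        "btrfs",                # file system type
        f"subvol={subvol}",     # subvolume path relative to the top level
        "0",                    # dump field
        "0",                    # fsck pass field
    ]
    return "  ".join(fields)


# The UUID is a placeholder; the subvolume name follows the proposal above.
print(fstab_entry("<filesystem-uuid>", "systemd-machines", "/var/lib/machines"))
```

With an entry like this, the containers would live outside the root subvolume and be mounted at the same place regardless of which root snapshot is booted.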
Hello Chris and all, thanks for raising this point. On 2015-07-15 T 22:05 -0600 Chris Murphy wrote:
I think this is "yet another example" of why nested subvolumes usually aren't a good idea.
Well, I am not sure whether the "nesting" is not actually beneficial in this case.
There probably should be a systemd-machines subvolume at the top level of the file system, which is added to fstab to mount it at /var/lib/machines.
Yes, indeed, that is a good proposal. Would you mind opening a FATE request at https://features.opensuse.org/ asking that the Installer/Partitioner create a subvolume for /var/lib/machines? Please include user "mge1512", and I will help drive this. Thanks in advance!
And then snapper needs a way to know to exclude systemd-machines from its snapshotting management.
It does anyway, as by default snapper is only configured to consider the "/" subvolume. And here the positive side of "nesting" comes into play ... :-)

So long - MgE
-- Matthias G. Eckermann - Senior Product Manager SUSE® Linux Enterprise SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
On 16 July 2015 at 06:05, Chris Murphy
Starting with systemd 219, it automatically creates subvolumes for nspawn containers at /var/lib/machines. And there's also a commit sometime in March/April that brings snapshot and rollback control of containers into machinectl, which leverages btrfs snapshotting.
Has anyone using Factory looked at how snapper behaves when doing rollbacks? I'm pretty sure the following will happen:
The snapshot of the top level of the file system done by snapper will stop at /var/lib/machines/ and not include any of the subvolumes in it. That's not the problem though. The problem happens if you do a rollback, and of course now the /var/lib/machines directory will be empty. All of your containers are in a different subvolume.
I don't think it's snapper's job to do a recursive snapshot in order to make sure these containers are present in every snapshot. This explodes the number of subvolumes. It duplicates snapshotting (both machinectl and snapper).
I think this is "yet another example" of why nested subvolumes usually aren't a good idea. There probably should be a systemd-machines subvolume at the top level of the file system, which is added to fstab to mount it at /var/lib/machines. And then snapper needs a way to know to exclude systemd-machines from its snapshotting management.
Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?

Everything you said sounded reasonable right up until you said "I'm pretty sure the following will happen:"

I think we should look into fixing it if it IS broken, but I think you're mistaken.

Snapper only backs up and restores the *root* subvolume, /, by default.

Any other subvolumes can be snapper snapshotted and restored, but they'll be ignored and *not emptied* when snapper restores the *root* subvolume.

Are you pontificating a possible problem, or actually saying that the behaviour of snapper breaks with systemd 219 and nspawn containers?
On Thu, Jul 16, 2015 at 9:44 AM, Richard Brown
Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?
It's not being messed up by snapper, it's messed up by not having an fstab entry that causes persistence in the contents of /var/lib/machines.
Everything you said sounded reasonable right up until you said " I'm pretty sure the following will happen:"
It's a logical argument because I don't happen to have either a Factory installation right now, or an fstab from a Factory installation. But I have a 13.2 installation so I'm familiar with the openSUSE fstab, and I have a Fedora 22 which has systemd 219 and it creates subvolumes for containers at /var/lib/machines/. Since a btrfs snapshot will stop at that subvolume, none of the nspawn containers will be in any of the snapshots of /. And because fstab doesn't contain either a /var or /var/lib or a /var/lib/machines mount point, there is no persistence of /var/lib/machines - systemd will see it's missing in any rollback and create a new subvolume there, which will be empty. So yes your containers will vanish, unless I'm missing something, and I don't think I am or I wouldn't have started the thread.
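The fstab argument above can be checked mechanically. The following is a minimal sketch, assuming a conventional whitespace-separated fstab; the function name covering_mounts and the sample fstab content are hypothetical and not taken from any real installation.

```python
def covering_mounts(fstab_text, path):
    """Return fstab mount points that are `path` itself or a non-root parent of it."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        fields = line.split()
        if len(fields) < 2:
            continue                      # malformed line, ignore
        mount_point = fields[1]
        if mount_point == "/":
            continue                      # the root entry is what gets rolled back
        if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
            hits.append(mount_point)
    return hits


# Hypothetical fstab with no entry at or above /var/lib/machines.
SAMPLE_FSTAB = """\
UUID=<fs-uuid>  /      btrfs  defaults     0 0
UUID=<fs-uuid>  /home  btrfs  subvol=home  0 0
"""

print(covering_mounts(SAMPLE_FSTAB, "/var/lib/machines"))  # prints []
```

An empty result corresponds to the situation described in the message: nothing in fstab makes the contents of /var/lib/machines persist across a rollback of /.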
I think we should look into fixing it if it IS broken, but I think you're mistaken. Snapper only backs up and restores the *root* subvolume, /, by default.
Snapper rollback depends on the fstab being properly populated to do rollbacks.
Any other subvolumes can be snapper snapshotted and restored, but they'll be ignored and *not emptied* when snapper restores the *root* subvolume
Are you pontificating a possible problem, or actually saying that the behaviour of snapper breaks with systemd 219 and nspawn containers?
I am hypothesizing based on available facts. But I find your pompous third way of asking the same question entertaining. -- Chris Murphy
On 16 July 2015 at 18:11, Chris Murphy
I am hypothesizing based on available facts. But I find your pompous third way of asking the same question entertaining.
Unreserved apologies, it's been a long day and I guess it's showing, though that's no excuse. Sorry again.
On Thu, Jul 16, 2015 at 10:11:03AM -0600, Chris Murphy wrote:
On Thu, Jul 16, 2015 at 9:44 AM, Richard Brown
wrote: Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?
It's not being messed up by snapper, it's messed up by not having an fstab entry that causes persistence in the contents of /var/lib/machines.
I agree. And if systemd creates that subvolume it also has to add
it to fstab just like YaST and snapper do with the subvolumes
they create.
BTW: It also creates the subvolume in a wrong way which makes
deleting snapshot #1 impossible (see bug #910602 - sorry internal
only).
So please open a bug report (and add me to CC).
Regards,
Arvin
--
Arvin Schnell,
On Fri, Jul 17, 2015 at 03:23:08PM +0200, Arvin Schnell wrote:
On Thu, Jul 16, 2015 at 10:11:03AM -0600, Chris Murphy wrote:
On Thu, Jul 16, 2015 at 9:44 AM, Richard Brown
wrote: Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?
It's not being messed up by snapper, it's messed up by not having an fstab entry that causes persistence in the contents of /var/lib/machines.
I agree. And if systemd creates that subvolume it also has to add it to fstab just like YaST and snapper do with the subvolumes they create.
BTW: It also creates the subvolume in a wrong way which makes deleting snapshot #1 impossible (see bug #910602 - sorry internal only).
So please open a bug report (and add me to CC).
Where does systemd create a subvolume? AFAICS from udev builtin code there is an ioctl BTRFS_IOC_DEVICES_READY. Also there are some rules in /usr/lib/udev/rules.d/64-btrfs.rules to get btrfs ready with the help of the builtin code. Werner -- "Having a smoking section in a restaurant is like having a peeing section in a swimming pool." -- Edward Burr
On Fri, Jul 17, 2015 at 10:44 AM, Dr. Werner Fink
On Fri, Jul 17, 2015 at 03:23:08PM +0200, Arvin Schnell wrote:
On Thu, Jul 16, 2015 at 10:11:03AM -0600, Chris Murphy wrote:
On Thu, Jul 16, 2015 at 9:44 AM, Richard Brown
wrote: Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?
It's not being messed up by snapper, it's messed up by not having an fstab entry that causes persistence in the contents of /var/lib/machines.
I agree. And if systemd creates that subvolume it also has to add it to fstab just like YaST and snapper do with the subvolumes they create.
BTW: It also creates the subvolume in a wrong way which makes deleting snapshot #1 impossible (see bug #910602 - sorry internal only).
So please open a bug report (and add me to CC).
Where does systemd create a subvolume?
AFAICS from udev builtin code
No, this is done from tmpfiles.d using the "v" type. /var/lib/machines should not be altered in any way whatsoever by any tool other than systemd-nspawn/machinectl. It is private property, and what is going on there is unspecified/implementation-defined and may change at any time; no assumptions shall be made about it.
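For readers unfamiliar with tmpfiles.d, here is a minimal sketch that picks out "v" entries from tmpfiles.d-style text. As far as I understand the tmpfiles.d documentation, the "v" type asks systemd-tmpfiles to create a subvolume when the path does not exist yet on Btrfs and to fall back to a plain directory otherwise. The function name subvolume_paths and the sample lines are illustrative assumptions, not copied from any shipped configuration file.

```python
def subvolume_paths(tmpfiles_text):
    """Return the paths of all entries whose type character is 'v'."""
    paths = []
    for line in tmpfiles_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        fields = line.split()
        # Field 0 is the type (possibly followed by modifiers), field 1 the path.
        if len(fields) >= 2 and fields[0][0] == "v":
            paths.append(fields[1])
    return paths


# Illustrative content only.
SAMPLE_TMPFILES = """\
# Type Path              Mode User Group Age
d      /var/log          0755 -    -     -
v      /var/lib/machines 0700 -    -     -
"""

print(subvolume_paths(SAMPLE_TMPFILES))  # prints ['/var/lib/machines']
```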
On Fri, Jul 17, 2015 at 7:23 AM, Arvin Schnell
On Thu, Jul 16, 2015 at 10:11:03AM -0600, Chris Murphy wrote:
On Thu, Jul 16, 2015 at 9:44 AM, Richard Brown
wrote: Are you actually sure that the snapshots of /var/lib/machines are being managed and messed up by snapper?
It's not being messed up by snapper, it's messed up by not having an fstab entry that causes persistence in the contents of /var/lib/machines.
I agree. And if systemd creates that subvolume it also has to add it to fstab just like YaST and snapper do with the subvolumes they create.
The problem with that is there's no standardization or agreement yet
on how to organize Btrfs subvolumes. Upstream has been fairly clear in
the Btrfs wiki and linux-btrfs@ that it's preferable to
a. put subvolumes only in subvolid 5, and assemble them via fstab,
rather than nesting
b. not put any system root on subvolid 5, rather those should go in
their own subvolume (subvolid 5 cannot be (re)named or deleted so its
purpose is neither discoverable nor removable)
c. not have utilities changing the default subvolume since that's a
user domain (and user space changeable) shortcut feature
SUSE is totally contrary to a, b, and c. And as a consequence you
can't expect systemd to create
On Fri, Jul 17, 2015 at 09:13:58AM -0600, Chris Murphy wrote:
The problem with that is there's no standardization or agreement yet on how to organize Btrfs subvolumes.
Indeed, people and distributions are still experimenting with subvolumes.
Upstream has been fairly clear in the Btrfs wiki and linux-btrfs@ that it's preferable to a. put subvolumes only in subvolid 5, and assemble them via fstab, rather than nesting b. not put any system root on subvolid 5, rather those should go in their own subvolume (subvolid 5 cannot be (re)named or deleted so its purpose is neither discoverable or removable) c. not have utilities changing the default subvolume since that's a user domain (and user space changeable) shortcut feature
SUSE is totally contrary to a, b, and c.
This is not true for Factory where the system is installed into a subvolume so that it can be deleted easily.
And as a consequence you can't expect systemd to create
/var/lib/machines and mount it at /var/lib/machines, rather than creating a /systemd-machines subvolume mounted at /var/lib/machines. Let alone systemd doing this differently for each distribution.
If systemd (or any other program) would follow the
recommendations and thus add subvolumes to fstab and create them
under subvolid 5 (and not the currently mounted root), things might
already be fine.
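The difference between a flat layout (all subvolumes directly under subvolid 5) and a nested one can be illustrated with a small check. This is a minimal sketch, assuming subvolume paths are given relative to the top level; the function name nested_subvolumes and the example paths are hypothetical.

```python
def nested_subvolumes(paths):
    """Map each nested subvolume path to the closest other subvolume containing it."""
    known = set(paths)
    result = {}
    for path in paths:
        parts = path.split("/")
        # Walk from the longest proper prefix to the shortest one.
        for i in range(len(parts) - 1, 0, -1):
            parent = "/".join(parts[:i])
            if parent in known:
                result[path] = parent
                break
    return result


# Hypothetical layout: one nested subvolume and one flat alternative.
layout = ["root", "root/var/lib/machines", "systemd-machines"]
print(nested_subvolumes(layout))  # prints {'root/var/lib/machines': 'root'}
```

In this toy layout, root/var/lib/machines lives inside the root subvolume, so a snapshot of root stops at it, whereas systemd-machines sits at the top level and would be assembled via fstab.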
Regards,
Arvin
--
Arvin Schnell,
On Wed, 15 Jul 2015 22:05:44 -0600
Chris Murphy
Starting with systemd 219, it automatically creates subvolumes for nspawn containers at /var/lib/machines. And there's also a commit sometime in March/April that brings snapshot and rollback control of containers into machinectl, which leverages btrfs snapshotting.
Has anyone using Factory looked at how snapper behaves when doing rollbacks? I'm pretty sure the following will happen:
The snapshot of the top level of the file system done by snapper will stop at /var/lib/machines/ and not include any of the subvolumes in it. That's not the problem though. The problem happens if you do a rollback, and of course now the /var/lib/machines directory will be empty. All of your containers are in a different subvolume.
Well, as soon as you start messing around with snapshots and rollbacks, you need to have an explicit indication of the mount point for each subvolume. /etc/fstab is as good as anything. ZFS solves it differently and IMNSHO much more elegantly. It has no issues with rollbacks.
I don't think it's snapper's job to do a recursive snapshot in order to make sure these containers are present in every snapshot. This explodes the number of subvolumes. It duplicates snapshotting (both machinectl and snapper).
I think this is "yet another example" of why nested subvolumes usually aren't a good idea.
Tell that to the ZFS folks. Do not confuse implementation with idea.
There probably should be a systemd-machines subvolume at the top level of the file system, which is added to fstab to mount it at /var/lib/machines. And then snapper needs a way to know to exclude systemd-machines from its snapshotting management.
On Fri, Jul 17, 2015 at 11:27 AM, Andrei Borzenkov
On Wed, 15 Jul 2015 22:05:44 -0600 Chris Murphy
wrote: Starting with systemd 219, it automatically creates subvolumes for nspawn containers at /var/lib/machines. And there's also a commit sometime in March/April that brings snapshot and rollback control of containers into machinectl, which leverages btrfs snapshotting.
Has anyone using Factory looked at how snapper behaves when doing rollbacks? I'm pretty sure the following will happen:
The snapshot of the top level of the file system done by snapper will stop at /var/lib/machines/ and not include any of the subvolumes in it. That's not the problem though. The problem happens if you do a rollback, and of course now the /var/lib/machines directory will be empty. All of your containers are in a different subvolume.
Well, as soon as you start messing around with snapshots and rollbacks you need to have explicit indication of mount point for each subvolume. /etc/fstab is as good as anything. ZFS solves it differently and IMNSHO much more elegant. It has no issues with rollbacks.
It has lots of issues with rollbacks, seeing as it has such a concept and Btrfs doesn't. And this rollback concept provides a clearer path for what should be done, rather than it being rather self-guided like Btrfs.
I don't think it's snapper's job to do a recursive snapshot in order to make sure these containers are present in every snapshot. This explodes the number of subvolumes. It duplicates snapshotting (both machinectl and snapper).
I think this is "yet another example" of why nested subvolumes usually aren't a good idea.
Tell that ZFS folks. Do not confuse implementation with idea.
The reason why it is more sane on ZFS is because there is an explicit
parent-child relationship with ZFS snapshots, and no such thing as
writable snapshots. A writable clone on ZFS is different than a Btrfs
writable (by default) snapshot. And on Btrfs there is no explicit
parent-child relationship with snapshots, this is included in the
metadata in the form of UUID, but it's completely valid to delete the
"parent" or origin subvolume for a "child" snapshot - that snapshot
remains unchanged. This can't be done on ZFS where you have to delete
all children before the volume can be deleted.
When doing rollbacks on ZFS, intermediate snapshots are deleted,
because you can only rollback to the most recent snapshot. Changes
since the snapshot are discarded. And again Btrfs doesn't have this
concept at all. Every snapshot is merely a pre-populated subvolume,
that's completely independent from the original. The one thing that
indirectly ties it to the original is the parent subvolume UUID.
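The Btrfs behaviour described above can be mimicked with a toy in-memory model. This is only a sketch of the semantics as stated in this message (writable snapshots, independence from the origin, a parent UUID as the only link); the class names Subvolume and ToyBtrfs are hypothetical and nothing here talks to a real file system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from uuid import uuid4


@dataclass
class Subvolume:
    name: str
    files: Dict[str, str] = field(default_factory=dict)
    uuid: str = field(default_factory=lambda: str(uuid4()))
    parent_uuid: Optional[str] = None     # the only link back to the origin


class ToyBtrfs:
    def __init__(self):
        self.subvols: Dict[str, Subvolume] = {}

    def create(self, name):
        self.subvols[name] = Subvolume(name)
        return self.subvols[name]

    def snapshot(self, source, name):
        src = self.subvols[source]
        # A snapshot is a pre-populated subvolume that records its origin's UUID.
        snap = Subvolume(name, files=dict(src.files), parent_uuid=src.uuid)
        self.subvols[name] = snap
        return snap

    def delete(self, name):
        # Deleting the origin is allowed even while snapshots of it exist.
        del self.subvols[name]


fs = ToyBtrfs()
root = fs.create("root")
root.files["etc/hostname"] = "factory"
snap = fs.snapshot("root", "snap1")
snap.files["etc/hostname"] = "changed"    # snapshots are writable by default
fs.delete("root")                         # the origin can go away
print(sorted(fs.subvols), snap.parent_uuid == root.uuid)
```

After the origin is deleted, the snapshot is still present and still carries the origin's UUID, which matches the "no explicit parent-child relationship" point made above.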
The bootloader and its configuration could specify a subvolume to boot
by path (which should always be relative to
Hello Chris and all, On 2015-07-20 T 10:35 -0600 Chris Murphy wrote:
The bootloader and its configuration could specify a subvolume to boot by path (which should always be relative to
, i.e. subvolid 5); or by subvolid, or by subvolume UUID. But none of those three are being done right now by (open)SUSE, instead a completely esoteric and hidden default subvolume switch is used, which totally masks how the system boots.
Fortunately, that is not true. If you boot into a read-only snapshot via Grub2, what you are suggesting is exactly what is used. Only if you advise snapper to make this snapshot the permanent boot target (via "snapper rollback") is the "btrfs subvol set-default" call used (internally). It's neither esoteric nor hidden. My suggestion: don't touch the "btrfs" cmdline tool for snapshots, but do everything via "snapper", and you'll see some light :-)
The bottom line for Btrfs though, is scalability of subvolumes is still a problem apparently. Upstream is loosely saying in the realm of a few hundred subvolumes (which includes snapshots) is sane. Many hundreds or thousands are not a good idea. If it's sane to have 20 nspawn containers, each with their own subvolume, each with multiple snapshots, the idea of snapper doing recursive snapshots is rather premature because it easily means thousands of snapshots inside of a week.
Snapper does not do "recursive snapshots", as explained in other e-mails already. It seems there is some misconception about how snapshots with snapper on btrfs work. And with respect to the number of snapshots: size is the limit. So long - MgE
On Mon, Jul 20, 2015 at 11:13 AM, Matthias G. Eckermann
Hello Chris and all,
On 2015-07-20 T 10:35 -0600 Chris Murphy wrote:
The bootloader and its configuration could specify a subvolume to boot by path (which should always be relative to
, i.e. subvolid 5); or by subvolid, or by subvolume UUID. But none of those three are being done right now by (open)SUSE, instead a completely esoteric and hidden default subvolume switch is used, which totally masks how the system boots. fortunately, that is not true. If you are booting into a read-only snapshot via Grub2, it is used what you are suggesting.
If only the snapper persistent rollback worked the same way.
Only if you advice snapper to make this current snapshot the permanent boot target (via "snapper rollback"), then the "btrfs subvol set-default" call is used (internally).
It's neither esoteric nor hidden.
Ok well saying things doesn't make them true. It's esoteric because it requires a.) knowing that you have a Btrfs file system; b.) that it uses a concept of subvolumes; c.) that it uses the concept of default subvolumes; d.) that there's a user space command to determine and set the default; e.) that there's no indication in dmesg or journal what subvolume is actually used to boot, so if you either have short term memory or you're not the user who did the rollback, you have next to no chance of figuring out how your system boots without hours of research. Yes it's esoteric, yes it's hidden.
My suggestion: don't touch the "btrfs" cmdline tool for snapshots, but do everything via "snapper", and you'll see some light:-)
The bottom line for Btrfs though, is scalability of subvolumes is still a problem apparently. Upstream is loosely saying in the realm of a few hundred subvolumes (which includes snapshots) is sane. Many hundreds or thousands are not a good idea. If it's sane to have 20 nspawn containers, each with their own subvolume, each with multiple snapshots, the idea of snapper doing recursive snapshots is rather premature because it easily means thousands of snapshots inside of a week.
Snapper does not do "recursive snapshots", as explained in other E-Mails already. It seems there is some mis-conception, how snapshots with snapper on btrfs work.
No, the conversation has digressed somewhat to the "what if" snapper, and by extension btrfs user space tools, offered a recursive snapshot option, which is what systemd developers have asked btrfs upstream for. So no one is saying snapper does recursive snapshots today. I'm saying it's a specious idea that it should be capable of doing so.
And with respect to the number of snapshots: size is the limit.
In theory that's true, in practice Btrfs itself has problems with many snapshots at the moment irrespective of available free space in the volume. -- Chris Murphy
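The snapshot-count concern can be made concrete with back-of-the-envelope arithmetic. The 20 containers come from the example in the thread; the hourly schedule is purely an assumed figure, and the function name is hypothetical.

```python
def snapshots_per_week(containers, snapshots_per_day):
    """Snapshots accumulated in one week if nothing is cleaned up."""
    return containers * snapshots_per_day * 7


# 20 containers as in the example above; one snapshot per hour is an assumption.
print(snapshots_per_week(20, 24))  # prints 3360
```

Under that assumed schedule the count passes the "few hundred" range mentioned above within the first day (20 x 24 = 480) and reaches the thousands within a week.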
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy
Yes it's esoteric, yes it's hidden.
More examples of this:
- posting grub prefix, grub.cfg, dmesg or journal, no one can have any idea what root the system actually boots from. This is unlike non-Btrfs installations.
- from GRUB, editing the grub menu, I have no way to determine what subvolume will be booted.
- from GRUB CLI, I can't determine what the default subvolume is, even if I have the esoteric knowledge that is subvolumes and subvolume ID and the concept of default subvolumes.

There's no way to know what subvolume was or will be booted, unless you have a working Linux installation with btrfs-progs in order to a.) mount the btrfs volume, and then b.) btrfs sub get-default.

It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.

-- Chris Murphy
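For reference, the last step mentioned above (btrfs sub get-default) prints a single line that can be parsed as follows. This is a minimal sketch, assuming the two output shapes I am aware of from btrfs-progs: "ID 5 (FS_TREE)" when no default has been set, and "ID <n> gen <g> top level <t> path <p>" otherwise. The function name and the sample snapshot path are hypothetical.

```python
import re

_PATTERN = re.compile(r"ID (\d+)(?: gen (\d+) top level (\d+) path (.+))?")


def parse_get_default(output):
    """Parse one line of `btrfs subvolume get-default` output into a dict."""
    match = _PATTERN.match(output.strip())
    if match is None:
        raise ValueError(f"unrecognised output: {output!r}")
    # `path` is None when the top-level subvolume (ID 5) is the default.
    return {"id": int(match.group(1)), "path": match.group(4)}


print(parse_get_default("ID 5 (FS_TREE)"))
print(parse_get_default("ID 259 gen 40 top level 258 path @/.snapshots/1/snapshot"))
```

Note that this only works on a running system with the volume mounted, which is exactly the limitation the message is pointing out.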
I wonder: why do you care about the specific subvolume
which the system is booting from?
That information is "irrelevant" in the sense
that if you roll back you don't care about btrfs or subvolumes,
but you care about the specific configuration
(date, kernel version, config status)
you want to go back to.
Don't you?
MgE
On 20 July 2015 21:26:56 CEST, Chris Murphy wrote
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy
wrote: Yes it's esoteric, yes it's hidden.
More examples of this:
- posting grub prefix, grub.cfg, dmesg or journal, no one can have any idea what root the system actually boots from. This is unlike non-Btrfs installations.
- from GRUB, editing the grub menu, I have no way to determine what subvolume will be booted.
- from GRUB CLI, I can't determine what the default subvolume is, even if I have the esoteric knowledge that is subvolumes and subvolume ID and the concept of default subvolumes.
There's no way to know what subvolume was or will be booted, unless you have a working Linux installation with btrfs-progs in order to a.) mount the btrfs volume, and then b.) btrfs sub get-default.
It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
-- Sent from my mobile phone. Please excuse my brevity.
On Mon, Jul 20, 2015 at 3:27 PM, Matthias G. Eckermann
I wonder: why do you care about the specific subvolume which the system is booting from?
That information is "irrelevant" in the sense that if you roll back you don't care about btrfs or subvolumes,
It's like saying what drive, what partition/volume, what directory is booted is also irrelevant. For most people, as long as it just works, it's not immediately relevant. But for free and open source software in particular, I think even when things are working it's also important that how they work, their discrete steps, can be followed and understood logically, and are discoverable and self-describing as much as practical. This is a regression in that sense because it lacks all of those things.
But you care about the specific configuration (Date, kernel version, config status) you want to go back to. Don't you?
Sure. I also care about a consistent domain of file system features. And this is a user domain feature, but it's been taken away by design for the exclusive use by a utility for just the boot volume. That's also confusing and non-obvious. The bigger issue here is that the way Linux OSes boot is diverging in increasingly incompatible ways. It's the opposite of standardization. -- Chris Murphy
On Mon, Jul 20, 2015 at 01:26:56PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy
wrote: It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
That basically means you have to update those self-contained grub.cfg files whenever they end up in newly created snapshots, and also reinstall the bootloader to point its prefix to the new rollback subvolume. We want to avoid that, as the requirement is for snapshot and rollback operations to be atomic and plain file system operations. Thanks, Michael
On Mon, Jul 20, 2015 at 10:33 PM, Michael Chang
On Mon, Jul 20, 2015 at 01:26:56PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy
wrote: It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
That basically means you have to update those self contained grub.cfg whenever it's in new created snapshots and also reinstalling bootloader for pointing prefix to new rollback subvolume. We want to avoid that as the requirement is snapshot and rollback operations to be atomic and as plain file system operations.
OSTree manages to do atomic updates and rollbacks on ext4 and XFS, without any of the negatives you list. There is no need to recreate or modify grub.cfg from scratch, it becomes a static configuration file, augmented by a drop-in snippet per boot entry. These are human readable, and easier to understand than grub.cfg. Updates and rollbacks are completely atomic operations, even on non-COW file systems.

And ostree/rpm-ostree aren't the only way to do this correctly. However that project proves it's possible to:
- use BLS snippets to augment grub.cfg, obviating the need to replace or modify grub.cfg;
- avoid reinstalling the bootloader;
- maintain subvol set-default as the user feature it was intended to be;
- provide clarity of what fs tree will be booted, is booted, and has been booted;
- present an explicit way forward for better dual boot cooperation among the distros.

None of those things are true with the current implementation. None are addressed by your response. I think these things should be requirements too, not just atomicity of updates and rollbacks.

-- Chris Murphy
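To illustrate why the drop-in snippets mentioned above are easy to read, here is a minimal sketch of a parser for a Boot Loader Specification style entry, which is a plain list of "key value" lines. The sample entry, its file names, and the function name parse_bls_entry are made up for illustration; they do not come from ostree or from any real system.

```python
def parse_bls_entry(text):
    """Parse 'key value' lines into a dict of lists (a key may repeat)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blanks and comments
        parts = line.split(None, 1)           # key, then the rest of the line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entry.setdefault(key, []).append(value)
    return entry


# Hypothetical snippet; every value here is a placeholder.
SAMPLE_ENTRY = """\
title   Example OS
version 4.1.0
linux   /vmlinuz-4.1.0
initrd  /initrd-4.1.0
options root=UUID=<fs-uuid> rootflags=subvol=root
"""

entry = parse_bls_entry(SAMPLE_ENTRY)
print(entry["linux"], entry["options"])
```

In a snippet of this shape, the kernel, initrd, and root arguments for each boot entry are spelled out in one short file, which is the discoverability point being argued.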
On Mon, Jul 20, 2015 at 11:57:23PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 10:33 PM, Michael Chang
wrote: On Mon, Jul 20, 2015 at 01:26:56PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy
wrote: It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
That basically means you have to update those self contained grub.cfg whenever it's in new created snapshots and also reinstalling bootloader for pointing prefix to new rollback subvolume. We want to avoid that as the requirement is snapshot and rollback operations to be atomic and as plain file system operations.
OSTree manages to do atomic updates and rollbacks on ext4 and XFS, without any of the negatives you list. There is no need to recreate or modify grub.cfg from scratch, it becomes a static configuration file, augmented by a drop-in snippet per boot entry. These are human readable, and easier to understand than grub.cfg. Updates and rollbacks are completely atomic operations, even on non-COW file systems.
And ostree/rpm-ostree aren't the only way to do this correctly. However that project proves it's possible to:
- use BLS snippets to augment grub.cfg, obviating the need to replace or modify grub.cfg;
Actually, creating BLS snippets that boot the kernel and initrd, compared with the approach that needs to boot from a snapshotted grub.cfg, should be viewed as different questions.
- avoid reinstalling the bootloader; - maintain subvol set-default as the user feature it was intended to be; - provide clarity of what fs tree will be booted, is booted, and has been booted; - present an explicit way forward for better dual boot cooperation among the distros.
None of those things are true with the current implementation. None are addressed by your response. I think these things should be requirements too, not just atomicity of updates and rollbacks.
As long as ostree does not perform btrfs set-default and also doesn't need to boot from a btrfs-snapshotted grub.cfg, I agree it may work without the considerations we have. Thanks, Michael
On Tue, Jul 21, 2015 at 2:10 AM, Michael Chang
On Mon, Jul 20, 2015 at 11:57:23PM -0600, Chris Murphy wrote:
OSTree manages to do atomic updates and rollbacks on ext4 and XFS, without any of the negatives you list. There is no need to recreate or modify grub.cfg from scratch, it becomes a static configuration file, augmented by a drop-in snippet per boot entry. These are human readable, and easier to understand than grub.cfg. Updates and rollbacks are completely atomic operations, even on non-COW file systems.
And ostree/rpm-ostree aren't the only way to do this correctly. However that project proves it's possible to:
- use BLS snippets to augment grub.cfg, obviating the need to replace or modify grub.cfg;
Actually, having the generated BLS snippets boot the kernel and initrd, versus our approach's need to boot from a snapshotted grub.cfg, should be viewed as different questions.
The grub.cfg containing paths relative to the default subvolume
instead of
- avoid reinstalling the bootloader;
- maintain subvol set-default as the user feature it was intended to be;
- provide clarity of what fs tree will be booted, is booted, and has been booted;
- present an explicit way forward for better dual boot cooperation among the distros.
None of those things are true with the current implementation. None are addressed by your response. I think these things should be requirements too, not just atomicity of updates and rollbacks.
As long as ostree does not perform btrfs set-default and also doesn't need to boot from a btrfs-snapshotted grub.cfg, I agree it may work without the considerations we have.
ostree doesn't require btrfs, let alone changing the default subvolume. It works the same on ext4 and XFS. There's no snapshotting of grub.cfg. At the moment ostree doesn't snapshot anything, but it does make reflink copies of /etc when on Btrfs (instead of hard links); other optimizations for Btrfs are possible but not yet done. -- Chris Murphy
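For readers unfamiliar with the drop-in snippets discussed above: a Boot Loader Specification (BLS) entry is a small key/value text file, one per boot entry, which is why grub.cfg itself can stay static. The following is only a rough sketch of what such a snippet contains, not ostree's actual code. The function name and every value passed to it (title, kernel and initrd paths, kernel arguments) are made up for illustration.

```python
def render_bls_entry(title, version, linux, initrd, options):
    """Render one BLS-style drop-in entry as text.

    The keys (title, version, linux, initrd, options) come from the
    Boot Loader Specification; all values are supplied by the caller
    and are hypothetical in this sketch.
    """
    fields = [
        ("title", title),
        ("version", version),
        ("linux", linux),
        ("initrd", initrd),
        # Kernel command-line arguments are joined into a single line.
        ("options", " ".join(options)),
    ]
    return "".join(f"{key} {value}\n" for key, value in fields)


if __name__ == "__main__":
    # Hypothetical values; a real entry would point at an installed kernel.
    print(
        render_bls_entry(
            title="Example OS",
            version="4.1.0",
            linux="/vmlinuz-4.1.0",
            initrd="/initrd-4.1.0",
            options=["root=UUID=0000", "ro"],
        ),
        end="",
    )
```

Adding or removing a boot entry then amounts to adding or removing one such file, without regenerating grub.cfg.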
On Tue, Jul 21, 2015 at 7:33 AM, Michael Chang wrote:
On Mon, Jul 20, 2015 at 01:26:56PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy wrote:
It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
That basically means you have to update those self-contained grub.cfg files whenever new snapshots are created, and also reinstall the bootloader to point its prefix at the new rollback subvolume. We want to avoid that, as the requirement is for snapshot and rollback operations to be atomic, plain file system operations.
Have you seen my reply which outlines a possible implementation that avoids it but also does not require fiddling with the default subvolume? http://lists.opensuse.org/opensuse/2015-07/msg00795.html
On Tue, Jul 21, 2015 at 09:46:12AM +0300, Andrei Borzenkov wrote:
On Tue, Jul 21, 2015 at 7:33 AM, Michael Chang wrote:
On Mon, Jul 20, 2015 at 01:26:56PM -0600, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 12:20 PM, Chris Murphy wrote:
It's so esoteric and hidden I can't imagine on what basis someone could describe it as being obvious and found in plain sight.
That basically means you have to update those self-contained grub.cfg files whenever new snapshots are created, and also reinstall the bootloader to point its prefix at the new rollback subvolume. We want to avoid that, as the requirement is for snapshot and rollback operations to be atomic, plain file system operations.
Have you seen my reply which outlines a possible implementation that avoids it but also does not require fiddling with the default subvolume?
Somehow it got skipped in my mailbox; I have just replied to it. Sorry about that. Regards, Michael
Hello Chris,
On 20 July 2015 20:20:36 CEST, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 11:13 AM, Matthias G. Eckermann wrote:
And with respect to the number of snapshots: size is the limit.
In theory that's true, in practice Btrfs itself has problems with many snapshots at the moment irrespective of available free space in the volume.
my experience with btrfs is different, in a positive way. Can you point us to SUSE bug reports indicating the opposite? MgE
On Mon, Jul 20, 2015 at 3:21 PM, Matthias G. Eckermann wrote:
Hello Chris,
On 20 July 2015 20:20:36 CEST, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 11:13 AM, Matthias G. Eckermann wrote:
And with respect to the number of snapshots: size is the limit.
In theory that's true, in practice Btrfs itself has problems with many snapshots at the moment irrespective of available free space in the volume.
my experience with btrfs is different, in a positive way.
Can you point us to SUSE bug reports indicating the opposite?
No, that information is on the upstream linux-btrfs@ list. It's come up several times, in particular when deleting many snapshots because the metadata for all extents involved must be visited and rewritten so a lot of deletion can cause a lot of writing. One of the better-known cases is with VM images and databases, even when not snapshotting and even when using chattr +C. vmm-libvirt places its images in /var/lib/libvirt/images and isn't excluded from snapper, so that's a problem source. But snapshotting makes the problem worse, quickly. http://www.spinics.net/lists/linux-btrfs/msg40563.html -- Chris Murphy
On 2015-07-20 T 16:51 -0600 Chris Murphy wrote:
On Mon, Jul 20, 2015 Matthias G. Eckermann wrote:
On 20 July 2015 20:20:36 CEST, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 11:13 AM, Matthias G. Eckermann wrote:
And with respect to the number of snapshots: size is the limit.
In theory that's true, in practice Btrfs itself has problems with many snapshots at the moment irrespective of available free space in the volume.
my experience with btrfs is different, in a positive way.
Can you point us to SUSE bug reports indicating the opposite?
No, that information is on the upstream linux-btrfs@ list. It's come up several times, in particular when deleting many snapshots because the metadata for all extents involved must be visited and rewritten so a lot of deletion can cause a lot of writing.
While this is true, it does not limit the number of snapshots. It only determines the time to delete snapshots (which is a different question).
One of the better-known cases is with VM images and databases, even when not snapshotting and even when using chattr +C.
That is a different question than the _number_ of snapshots, and frankly, the same is true for all CoW filesystems (including for example ZFS).
vmm-libvirt places its images in /var/lib/libvirt/images and isn't excluded from snapper, so that's a problem source. But snapshotting makes the problem worse, quickly. http://www.spinics.net/lists/linux-btrfs/msg40563.html
As far as I am aware, VM images under /var/lib/libvirt are automatically created with NoCOW on openSUSE (>13.1) and SUSE Linux Enterprise 12 systems. Please check. If this is not the case, it's worth a bug report. So long - MgE -- Matthias G. Eckermann - Senior Product Manager SUSE® Linux Enterprise
On Tue, Jul 21, 2015 at 2:33 AM, Matthias G. Eckermann wrote:
On 2015-07-20 T 16:51 -0600 Chris Murphy wrote:
On Mon, Jul 20, 2015 Matthias G. Eckermann wrote:
On 20 July 2015 20:20:36 CEST, Chris Murphy wrote:
On Mon, Jul 20, 2015 at 11:13 AM, Matthias G. Eckermann wrote:
And with respect to the number of snapshots: size is the limit.
In theory that's true, in practice Btrfs itself has problems with many snapshots at the moment irrespective of available free space in the volume.
my experience with btrfs is different, in a positive way.
Can you point us to SUSE bug reports indicating the opposite?
No, that information is on the upstream linux-btrfs@ list. It's come up several times, in particular when deleting many snapshots because the metadata for all extents involved must be visited and rewritten so a lot of deletion can cause a lot of writing.
While this is true, it does not limit the number of snapshots. It only determines the time to delete snapshots (which is a different question).
I said "in particular when", not "only when" or "especially when".
One of the better-known cases is with VM images and databases, even when not snapshotting and even when using chattr +C.
That is a different question than the _number_ of snapshots, and frankly, the same is true for all CoW filesystems (including for example ZFS).
It isn't orthogonal to the number, because new writes are COW even on nocow files when they are reflink copied or the containing subvolume is snapshotted. So yes, the number of snapshots will affect fragmentation and performance, and it also depends on the concurrency of writes to the copies. Pretty much the only thing that's next to free is the snapshot or reflink copy operation itself. Any additional operations are going to be impacted, and not always in obvious ways.
vmm-libvirt places its images in /var/lib/libvirt/images and isn't excluded from snapper, so that's a problem source. But snapshotting makes the problem worse, quickly. http://www.spinics.net/lists/linux-btrfs/msg40563.html
As far as I am aware, VM images under /var/lib/libvirt are automatically created with NoCOW on openSUSE (>13.1) and SUSE Linux Enterprise 12 systems. Please check. If this is not the case, it's worth a bug report.
That does matter for performance reasons, but that directory should also be a subvolume so it doesn't get snapshotted along with /, and that subvolume needs to be in fstab so that upon rollback the VM images don't disappear. I filed that bug as: https://features.opensuse.org/319299 I haven't tested whether virt-manager on openSUSE defaults to -nocow=on for images; that's not the case with qemu-img or virt-manager upstream. -- Chris Murphy
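The fstab point above can be sketched roughly as follows. This is an illustration only, assuming the subvolume lives on the same Btrfs file system as /. The helper name, the UUID, and the subvolume name are hypothetical, not what the installer actually writes. Only the `subvol=` mount option and the six-field fstab layout are standard.

```python
def fstab_line_for_subvolume(fs_uuid, subvol, mountpoint, extra_options=()):
    """Build an fstab line that mounts a Btrfs subvolume at a fixed path.

    Because the subvolume is mounted explicitly by name, it stays at
    the same path no matter which snapshot of / is booted after a
    rollback.
    """
    options = ",".join([f"subvol={subvol}", *extra_options])
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    return f"UUID={fs_uuid} {mountpoint} btrfs {options} 0 0"


if __name__ == "__main__":
    # Hypothetical UUID and subvolume name, for illustration only.
    print(
        fstab_line_for_subvolume(
            "1234-abcd", "libvirt-images", "/var/lib/libvirt/images"
        )
    )
```

The same pattern would apply to /var/lib/machines from the start of this thread.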
participants (8)
- Andrei Borzenkov
- Arvin Schnell
- Chris Murphy
- Cristian Rodríguez
- Dr. Werner Fink
- Matthias G. Eckermann
- Michael Chang
- Richard Brown