[opensuse] Btrfs clone with dd, gotcha
I'm catching up with the archives and found a thread on cloning. I wanted to point out that on Btrfs in particular this can be a problem:

"Do not make a block-level copy of a btrfs filesystem to a block device, and then try to mount either the original or the copy while both are visible to the same kernel."
https://btrfs.wiki.kernel.org/index.php/Gotchas

The same problem applies to LVM snapshots (thick or thin). Similar problems likely apply to ext4 and XFS when using metadata checksumming. This isn't the default yet in ext4, but is the default with XFS format v5 starting with xfsprogs 3.2.

btrfs-progs 4.1 now has a way to change the fs volume UUID. So after cloning, if there is *any* possibility that the two file systems will be visible to the kernel at the same time while one of them is mounted, it's best to change the UUID on one of them first. This will take a while, as the fs volume UUID is used constantly in the metadata format. In effect the whole fs (minus data) has to be read and written.

There might still be bugs, but another way to clone a Btrfs volume is the seed device. Make A a seed device, so that it mounts read-only; add a new device to the volume; then umount and remount, and it mounts read-write; then delete the seed device, and the data is migrated to the new device. In addition, the UUID on the new device is unique. After it's complete, the seed device can be made a non-seed volume again (it no longer mounts read-only).

Just an idea.

--
Chris Murphy
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
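A minimal sketch of the check and the UUID change described above. The function names and the DRY_RUN switch are made up for illustration. The input format assumed is the "NAME UUID" pairs printed by `lsblk -rno NAME,UUID`. Note that the members of one multi-device Btrfs volume legitimately share a filesystem UUID, so a match only means "look closer", not "this is a clone". `btrfstune -u` (btrfs-progs 4.1 or later) must be run on an unmounted device and may ask for confirmation.

```shell
#!/bin/sh
# Sketch only. Reads "NAME UUID" lines on stdin and prints every UUID that
# shows up on more than one device, followed by the device names.
find_duplicate_uuids() {
    awk 'NF >= 2 { devs[$2] = devs[$2] " " $1; count[$2]++ }
         END { for (u in count) if (count[u] > 1) print u ":" devs[u] }'
}

# Hypothetical wrapper: prints the command by default (DRY_RUN=1) and only
# runs it when DRY_RUN=0. The clone must NOT be mounted when this runs.
change_clone_uuid() {
    dev=$1
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "btrfstune -u $dev"      # -u: write a new random fs UUID
    else
        btrfstune -u "$dev"
    fi
}

# Typical use (as root), left commented out on purpose:
#   lsblk -rno NAME,UUID | find_duplicate_uuids
#   DRY_RUN=0 change_clone_uuid /dev/sdX1
```

Printing the command first is a deliberate choice here: rewriting the UUID touches all metadata, so it is worth confirming the target device before anything runs.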
On 06/29/2015 02:48 PM, Chris Murphy wrote:
I'm catching up with archives and found a thread on cloning and wanted to point out that in particular on Btrfs this can be a problem:
"Do not make a block-level copy of a btrfs filesystem to a block device, and then try to mount either the original or the copy while both are visible to the same kernel." https://btrfs.wiki.kernel.org/index.php/Gotchas
The same problem applies to LVM snapshots (thick or thin). And likely similar problems apply to ext4 and XFS when using metadata checksumming. This isn't the default yet in ext4, but is the default with XFS format v5 starting with xfsprogs 3.2.
btrfs-progs 4.1 now has a way to change the fs volume UUID. So after cloning if there is *any* possibility the two file systems will be visible to the kernel at the same time while one will be mounted, first it's best to change the UUID on one of them. This will take a while as the fs volume UUID is used constantly in the metadata format. In effect the whole fs (minus data) has to be read and written.
There might still be bugs, but another way to clone a Btrfs volume is the seed device. Make A a seed device, and it mounts read-only, add a new device to the volume, then umount and remount, and it mounts read-write, then delete the seed device and data is migrated to the new device. But also, the UUID is unique on the new device. After it's complete, the seed device can be made a non-seed volume again (no longer mounts read only).
Just an idea.
Wow. Who will remember that? What an amazing gotcha.

--
After all is said and done, more is said than done.
On 06/29/2015 04:53 PM, John Andersen wrote:
On 06/29/2015 02:48 PM, Chris Murphy wrote: <snip>
btrfs-progs 4.1 now has a way to change the fs volume UUID. So after cloning if there is *any* possibility the two file systems will be visible to the kernel at the same time while one will be mounted, first it's best to change the UUID on one of them. This will take a while as the fs volume UUID is used constantly in the metadata format. In effect the whole fs (minus data) has to be read and written.
There might still be bugs, but another way to clone a Btrfs volume is the seed device. Make A a seed device, and it mounts read-only, add a new device to the volume, then umount and remount, and it mounts read-write, then delete the seed device and data is migrated to the new device. But also, the UUID is unique on the new device. After it's complete, the seed device can be made a non-seed volume again (no longer mounts read only).
<snip>
JA,

Haven't you figured it out yet? It's a plot I tell ya'. In order to preserve data integrity, we are supposed to spend every hour of every day keeping up with so-called 'advancements' in new filesystems, to the point that nefarious interests can easily mount a world takeover without notice or resistance. Resistance is futile....

--
David C. Rankin, J.D.,P.E.
On 2015-06-29 23:48, Chris Murphy wrote:
btrfs-progs 4.1 now has a way to change the fs volume UUID. So after cloning if there is *any* possibility the two file systems will be visible to the kernel at the same time while one will be mounted, first it's best to change the UUID on one of them. This will take a while as the fs volume UUID is used constantly in the metadata format. In effect the whole fs (minus data) has to be read and written.
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On Mon, Jun 29, 2015 at 4:37 PM, Carlos E. R. <carlos.e.r@opensuse.org> wrote:
On 2015-06-29 23:48, Chris Murphy wrote:
btrfs-progs 4.1 now has a way to change the fs volume UUID. So after cloning if there is *any* possibility the two file systems will be visible to the kernel at the same time while one will be mounted, first it's best to change the UUID on one of them. This will take a while as the fs volume UUID is used constantly in the metadata format. In effect the whole fs (minus data) has to be read and written.
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
You might be thinking of this bug, where changing the UUID on XFS v5 file systems destroys them.
http://oss.sgi.com/archives/xfs/2015-04/msg00021.html

This commit fixes that problem by disabling the change UUID feature in xfsprogs, and is found in xfsprogs 3.2.3 now. I don't know if this is temporary, or what the timeline is for making it v5 compatible.
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=commit;h=609...

--
Chris Murphy
On 2015-06-30 04:11, Chris Murphy wrote:
On Mon, Jun 29, 2015 at 4:37 PM, Carlos E. R. wrote:
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
You might be thinking of this bug, where changing the UUID on XFS v5 file systems destroys them. http://oss.sgi.com/archives/xfs/2015-04/msg00021.html
This commit fixes that problem by disabling the change UUID feature in xfsprogs, and is found in xfsprogs 3.2.3 now. I don't know if this is temporary, or what the timeline is for making it v5 compatible. http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=commit;h=609...
They were considering a tool to change the UUID across the whole metadata. I don't read all the posts, so I don't know what the final result was.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On 30/06/2015 00:37, Carlos E. R. wrote:
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
In fact, "cloning with dd" is meant to produce an exact replacement, a disk that can be used if the first one breaks, not a copy (use rsync for that).

jdd
On Tue, Jun 30, 2015 at 8:08 AM, jdd <jdd@dodin.org> wrote:
On 30/06/2015 00:37, Carlos E. R. wrote:
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
In fact, "cloning with dd" is meant to produce an exact replacement, a disk that can be used if the first one breaks, not a copy (use rsync for that).
If the first one breaks, any copy made this way will be hopelessly outdated to be of any use.
On 30/06/2015 08:36, Andrei Borzenkov wrote:
If the first one breaks, any copy made this way will be hopelessly outdated to be of any use.
Why?

It's only useful for a system disk, of course, but data has to be backed up anyway.

jdd
On Tue, Jun 30, 2015 at 1:40 AM, jdd <jdd@dodin.org> wrote:
On 30/06/2015 08:36, Andrei Borzenkov wrote:
If the first one breaks, any copy made this way will be hopelessly outdated to be of any use.
Why?
It's only useful for a system disk, of course, but data has to be backed up anyway.
Yeah, I think it's better that the work on stateless systems actually happens. And maybe once the Btrfs seed device stuff stabilizes it can help out, and for some people even obviate the traditional installer.

So the idea of the seed device with a live media install is that there is a Btrfs image set as a seed device (which means it mounts read-only). The installer's job is fairly simple: just set up a standard layout. (Behind the scenes it is of course fairly not simple, since it has to deal with myriad possible pre-existing layouts, but it is simple to the user, with far fewer options than most Linux installers currently have.) For example, one partition for LUKS+Btrfs, another partition for LUKS+swap, and extra partitions for booting as needed. The LUKS block device is added, so the sequence is like this:

  mount -t btrfs /run/media/live-os/opensuse<version>btrfsseed.img /mnt/sysimage
  btrfs device add /dev/sda2 /mnt/sysimage
  mount -o remount,rw /dev/sda2 /mnt/sysimage
  btrfs device delete /run/media/live-os/opensuse<version>btrfsseed.img /mnt/sysimage

So now this is a multiple device Btrfs volume: seed device + local user device. The delete step live migrates chunks from the seed to the user's drive. The user can continue to use the live environment normally, web browsing and setting application preferences. There is no need to reboot from this installation.

After the seed is removed, it's possible to snapshot the current state as a reset point, and to be able to do a system reset or refresh without removing user data on /home. Better, with some more discipline it would be possible to better separate OS binaries from OS settings, from OS applications (bundled), from extra installed applications, user settings and user data. There's still a lot of confusion around /var and /etc data, and around the work to make it possible to boot with them empty and have them populated from default data found in /usr. This means less dependence on snapshotting to do a system reset, and using snapshots mainly as a means to roll back OS subversions (from a bad update, which arguably shouldn't happen, but...).

dd as a way to get to statelessness is really tedious and slow, because you have to read and write all sectors on the partition even if they're useless. The seed device approach (and of course even rsync) is much faster. Also, it's possible to concatenate seeds.

https://btrfs.wiki.kernel.org/index.php/Seed-device

--
Chris Murphy
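The seed-device clone described at the start of the thread (mark A as seed, mount, add a device, remount read-write, delete the seed, then optionally clear the seed flag) could be written out roughly as below. This is a sketch under assumptions: the function name and the three arguments are hypothetical, the function only prints the commands for review and executes nothing, and `btrfstune -S 0` may warn and ask for confirmation before clearing the flag.

```shell
#!/bin/sh
# Sketch only: print the seed-device clone sequence; nothing is executed.
# $1 = existing volume to use as seed, $2 = new empty device, $3 = mount point.
seed_clone_plan() {
    seed=$1 target=$2 mnt=$3
    cat <<EOF
btrfstune -S 1 $seed
mount $seed $mnt
btrfs device add $target $mnt
mount -o remount,rw $mnt
btrfs device delete $seed $mnt
btrfstune -S 0 $seed
EOF
}

# Example with placeholder device names:
#   seed_clone_plan /dev/sdA /dev/sdB /mnt/clone
```

The first command has to run while the seed is unmounted. The last line is optional and only needed if the original volume should become writable again.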
On June 29, 2015 11:36:40 PM PDT, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On Tue, Jun 30, 2015 at 8:08 AM, jdd <jdd@dodin.org> wrote:
On 30/06/2015 00:37, Carlos E. R. wrote:
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
In fact, "cloning with dd" is meant to produce an exact replacement, a disk that can be used if the first one breaks, not a copy (use rsync for that).
If the first one breaks, any copy made this way will be hopelessly outdated to be of any use. ..
How is that different than any other full Image backup?
On 30/06/2015 09:42, John Andersen wrote:
How is that different than any other full Image backup?
I don't know what you call a "full image backup", but it keeps the UUID and GRUB info.

jdd
On Tue, Jun 30, 2015 at 10:42 AM, John Andersen <jsamyth@gmail.com> wrote:
On June 29, 2015 11:36:40 PM PDT, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On Tue, Jun 30, 2015 at 8:08 AM, jdd <jdd@dodin.org> wrote:
On 30/06/2015 00:37, Carlos E. R. wrote:
Rings a bell, I think I read about something similar on the XFS mail list. About XFS, of course.
In fact, "cloning with dd" is meant to produce an exact replacement, a disk that can be used if the first one breaks, not a copy (use rsync for that).
If the first one breaks, any copy made this way will be hopelessly outdated to be of any use. ..
How is that different than any other full Image backup?
Mmm ... yes, you are right of course. Sorry for the noise.
participants (6)
- Andrei Borzenkov
- Carlos E. R.
- Chris Murphy
- David C. Rankin
- jdd
- John Andersen