Re: btrfs deduplication and rsync

3 Nov 2023


On Fri, Nov 3, 2023 at 11:52 AM jdd@dodin.org <jdd@dodin.org> wrote:
...
btrfs deduplication and rsync to other file system...
Hello,
I'm looking into btrfs deduplication, because I have *data* (not
system) in archives with many duplicate texts. The total is now around
1 TB and growing, and I would like to keep it on 1 TB disks (I have
several copies). It is mostly old text work or mail archives.
I tried fdup or a similar deduplication application, but then I got a
scattered archive. I still have the data, but the files that belong
together are now in various folders. I had to stop doing so.
If I understand well, btrfs deduplication works with links (the removed
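You are mistaken. It works at the data extent level and has nothing to do
with links. You are probably confused by the "link" in "reflink".
Deduplicated files remain separate, ordinary files that share data
extents on disk; nothing is removed or replaced by a symlink.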
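To illustrate the difference from links, here is a minimal Python sketch
of a reflink copy. It assumes Linux and the FICLONE ioctl number as
encoded on common architectures such as x86-64 and arm64; the helper
name clone_or_copy is made up for this example. The result is always a
second regular file with its own inode, never a symlink or hard link.

```python
import fcntl
import os
import shutil

# FICLONE ioctl request number from <linux/fs.h> (assumed: x86-64 / arm64 encoding).
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> bool:
    """Make dst an independent file with the same content as src.

    Returns True if the data extents were shared via a reflink,
    False if the filesystem refused and a plain byte copy was made.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to let dst share the extents of src.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # No reflink support here (e.g. ext4): copy the bytes instead.
            shutil.copyfileobj(fsrc, fdst)
            return False
```

On btrfs this should return True, and the two files share extents until
one of them is modified. On a filesystem without reflink support it
returns False and you simply get a full second copy.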
...
files are replaced by a symlink). duperemove:
https://btrfs.readthedocs.io/en/latest/Deduplication.html
This needs a btrfs file system (or a similar high-end one; I will stay with
btrfs).
Question: what happens if I rsync the data to an ext4 file system (on
another disk)? Is deduplication lost or kept?
btrfs deduplication cannot be kept on ext4 by definition, because ext4
does not support reflinks. And even if it did, rsync does not support
them (it works in units of files, not extents), so deduplication would
be lost even if you rsync from btrfs to btrfs.
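If you want a rough idea of what that costs you, the following Python
sketch compares the bytes a file-by-file copy would write with the bytes
of distinct file content in a tree. It is only an estimate under a
stated assumption: it hashes whole files, whereas btrfs shares extents,
so partially identical files are not counted as duplicates here. The
function name logical_and_unique_bytes is made up for this example.

```python
import hashlib
import os


def logical_and_unique_bytes(root: str) -> tuple[int, int]:
    """Return (bytes a file-by-file copy writes, bytes of distinct file content)."""
    logical = 0
    unique = 0
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Only regular files carry data; skip symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            logical += size
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB blocks so large files do not fill memory.
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            digest = h.digest()
            if digest not in seen:
                seen.add(digest)
                unique += size
    return logical, unique
```

The first number is roughly what the destination needs after rsync; the
difference between the two is the space taken by whole-file duplicates.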
...
Same question if I sync this
to the cloud.
Ask your cloud provider what they do with the data they receive. If you
are concerned with the amount of data you upload and its cost, that is
a question for the tool you use to "sync this to the cloud". Your
tool may perform client-side deduplication. But whatever happens, it
is completely independent of and unaware of what btrfs does.
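For what "client-side deduplication" can mean, here is a minimal Python
sketch. It is not how any particular tool or provider works: ChunkStore
is a hypothetical in-memory stand-in for a remote store, and the
fixed 64 KiB chunk size is an arbitrary choice for illustration. The
point is that the client hashes the data itself and sends only chunks
the store has not seen, regardless of the filesystem underneath.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # arbitrary illustrative chunk size


class ChunkStore:
    """Hypothetical in-memory stand-in for a remote store keyed by chunk hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return (chunk hashes describing it, bytes actually sent)."""
        manifest = []
        sent = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                # Only chunks the store has never seen are transferred.
                self.chunks[key] = chunk
                sent += len(chunk)
            manifest.append(key)
        return manifest, sent

    def download(self, manifest: list[str]) -> bytes:
        """Rebuild the original data from its list of chunk hashes."""
        return b"".join(self.chunks[key] for key in manifest)
```

Uploading the same content a second time sends no new chunks, yet each
upload still gets its own manifest and can be restored on its own.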
...
Why? Because I wonder whether my archives will be readable by another (Linux)
system in the future; not every system supports btrfs, whereas ext4 is universal.
By the way, I can't find up-to-date documentation about btrfs backup
(the wiki is flagged "obsolete").
btrfs is just a filesystem which can be backed up as any other filesystem.
...
If I have explained my problem correctly, you will understand that I want to
*archive data*. No incremental backup is needed. I keep 4 TB of data in
total (but the other 3 TB are photos, videos and ISO files, kept
unique and already compressed, so there is no chance to gain space).
Finally, if anyone has a better solution for what I am trying to
do, I'll take it :-)
thanks
jdd
--
https://artdagio.fr

Andrei Borzenkov