On Fri, Nov 3, 2023 at 11:52 AM jdd@dodin.org <jdd@dodin.org> wrote:
btrfs deduplication and rsync to other file system...
Hello,
I'm looking into btrfs deduplication because I have *data* (not system files) in archives with many duplicate texts, mostly old text work or mail archives. The total is now around 1 TB and growing, and I would like to keep it on 1 TB disks (I have several copies).
I tried fdup and similar deduplication applications, but I ended up with a scattered archive. I still have the data, but files that belong together are now spread across various folders, so I had to stop.
If I understand correctly, btrfs deduplication works with links (the removed files are replaced by a symlink).
You are mistaken. It works at the data extent level and has nothing to do with links. You are probably confused by the "link" in "reflink".
duperemove:
https://btrfs.readthedocs.io/en/latest/Deduplication.html
This needs a btrfs file system (or a similar high-end one; I will stay with btrfs).
Question: what happens if I rsync the data to an ext4 file system (on another disk)? Is deduplication lost or kept?
btrfs deduplication cannot be kept on ext4 by definition, because ext4 does not support reflinks. And even if it did, rsync does not support them (it works in units of files, not extents), so deduplication would be lost even if you rsync from btrfs to btrfs.
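To get a feel for how much space a file-by-file copy would need compared with a deduplicated original, here is a rough Python sketch. It is only an illustration under assumptions: `file_digest` and `dedup_estimate` are hypothetical helper names, and it detects whole-file duplicates only, whereas btrfs shares extents, so the real figures on btrfs may differ.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def dedup_estimate(root):
    """Return (total_bytes, unique_bytes) for regular files under root.

    total_bytes:  sum of all file sizes, roughly what a file-by-file
                  copy to a file system without reflinks has to store.
    unique_bytes: each distinct file content counted only once.
    """
    size_by_digest = {}
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            total += size
            size_by_digest.setdefault(file_digest(path), size)
    return total, sum(size_by_digest.values())
```

Calling `dedup_estimate("/path/to/archive")` (the path is a placeholder) only reads the files and changes nothing on disk. The difference between the two numbers is a lower bound on what whole-file deduplication could share, and the first number is roughly what the archive would occupy after being copied file by file.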
Same question if I sync this to the cloud.
Ask your cloud provider what they do with the data they receive. If you are concerned about the amount of data you upload and its cost, that is a question for the tool you use to "sync this to the cloud". Your tool may perform client-side deduplication. But whatever happens, it is completely independent of and unaware of what btrfs does.
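For what "client-side deduplication" can mean in principle, here is a minimal Python sketch. It does not describe any particular sync tool: `plan_upload`, `known_chunks` and the fixed 4096-byte chunk size are assumptions made for illustration, and real tools differ in how they split and index data.

```python
import hashlib


def plan_upload(data, known_chunks, chunk_size=4096):
    """Split data into fixed-size chunks and decide what must be sent.

    Returns (recipe, new_chunks):
      recipe:     ordered list of chunk digests needed to rebuild data
      new_chunks: dict mapping digest -> bytes for chunks whose digest
                  is not already in known_chunks
    """
    recipe = []
    new_chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        # Only chunks never seen before need to be uploaded.
        if digest not in known_chunks and digest not in new_chunks:
            new_chunks[digest] = chunk
    return recipe, new_chunks
```

The point of the sketch is that the decision is made from the file contents alone, so it gives the same result whether the source file system is btrfs, ext4 or anything else.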
Why? Because I wonder whether my archives will be readable by another (Linux) system in the future; not every system supports btrfs, whereas ext4 is universal.
By the way, I can't find up-to-date documentation about btrfs backup (the wiki is flagged "obsolete").
btrfs is just a filesystem which can be backed up as any other filesystem.
If I explained my problem correctly, it should be clear that I want to *archive data*; no incremental backup is needed. I keep 4 TB of data in total (but the other 3 TB are photos, videos and ISO files, kept unique and already compressed, so there is no space to gain there).
And finally, obviously, if anyone has a better solution for what I am trying to do, I'll take it :-)
thanks jdd -- https://artdagio.fr