[SLE] cp and hardlinks
I'm trying to get cp to copy some directory structures that contain hard links (yes, yet another backup scenario). cp has two features that seem nice:

(1) It can copy structures with hard links correctly (--preserve=links, or perhaps even just -d).

(2) It can copy structures incrementally. That is, if you copy a structure, then add some more to the source structure, then copy the structure again, it will handle it.

But it looks like (1) and (2) don't coexist. If some new files are hard links to old ones, they aren't linked in the copy. Does anybody know a way to make this work?

Thanks, Dave

PS Before anybody says "use rsync", I was, but ... there are rather a lot of files. I had to give it 12 GB of swap space to build its list and it was taking well over 24 hours to run. I can't give it 24 hours :(
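A minimal sketch of the two cp behaviours in question, with hypothetical paths (on GNU cp, -a implies -d, and -u gives the incremental behaviour):

  # First copy: hard links inside /source are reproduced in /backup.
  cp -a /source /backup
  # ...more files are added to /source, some hard-linked to old ones...
  # Incremental copy: -u only copies missing/newer files, but the new
  # hard links arrive in /backup as independent files.
  cp -au /source /backup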
> PS Before anybody says "use rsync", I was, but ... there are rather a lot of files. I had to give it 12 GB of swap space to build its list and it was taking well over 24 hours to run. I can't give it 24 hours :(
How many files would that be? The 355049 files I have make the rsync process take up 27 MB RSS (`ps u`). Considering that ftp4, for example, has 3962138 files, that makes roughly 301 MB. If it took 12 GB of swap (plus the regular RAM), that would mean you have nearly 160 MILLION files. (On 64-bit it may be fewer, since the per-file structures are larger.)

If it takes too long, remove the -c option. Or split it up by walking down some dirs, IOW

  cd /source; for i in *; do rsync ... "$i" "/target/$i"; done

Jan Engelhardt
Jan Engelhardt wrote:
> PS Before anybody says "use rsync", I was, but ... there are rather a lot of files. I had to give it 12 GB of swap space to build its list and it was taking well over 24 hours to run. I can't give it 24 hours :(

> How many files would that be? The 355049 files I have make the rsync process take up 27 MB RSS (`ps u`). Considering that ftp4, for example, has 3962138 files, that makes roughly 301 MB. If it took 12 GB of swap (plus the regular RAM), that would mean you have nearly 160 MILLION files. (On 64-bit it may be fewer, since the per-file structures are larger.)
There are 15 M files, occupying 183 GB. It actually seems to use a bit over 8 GB of swap space - 12 GB allows a safety margin. It is a 64-bit system. Perhaps there are other factors than just the number of files - 8 GB over 15 M files is roughly 570 bytes per file, against the ~80 bytes per file your numbers imply.
> If it takes too long, remove the -c option.
I don't use the -c option :(
> Or split it up by walking down some dirs, IOW
>   cd /source; for i in *; do rsync ... "$i" "/target/$i"; done
I was thinking about doing something like that, but the rsync doc says it can only preserve hardlinks if both files are in its source list. So it's a bit more complicated to see if I can find a usable subset. I guess --link-dest might help.

Thanks, Dave
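A minimal sketch of the --link-dest idea, with hypothetical snapshot directories: unchanged files in the new snapshot become hard links into the previous one, so each run only stores what actually changed.

  # -a preserves attributes, -H preserves hard links among the source files
  rsync -aH --link-dest=/backup/2006-07-24 /source/ /backup/2006-07-25/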
> There are 15 M files, occupying 183 GB. It actually seems to use a bit over 8 GB of swap space - 12 GB allows a safety margin. It is a 64-bit system. Perhaps there are other factors than just the number of files.
It would be interesting to see what memory footprint `linux32 rsync_32` (a 32-bit rsync you compile yourself) has.
> I was thinking about doing something like that, but the rsync doc says it can only preserve hardlinks if both files are in its source list.
That is right. But maybe you find that there are no hardlinks across /src/a and /src/b. Another way of doing it: patch rsync to only consider (A) single-hardlink files, or (B) multiple-hardlink files, and then do e.g.

  for ...; do rsync ... --only-consider-A /src/... /dst/...; done
  rsync --only-consider-B /src/ /dst/

(--only-consider-A/B being the hypothetical options such a patch would add.)

Jan Engelhardt
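A similar split can be approximated without patching rsync, using find's link-count tests to build file lists. A sketch with hypothetical paths; note it only reduces peak memory if the single-link list is then processed in chunks:

  cd /src
  # Files with exactly one link can safely be synced in separate runs.
  find . -type f -links 1 -print0 > /tmp/single.lst
  # Files with more than one link must all be seen by one rsync -H run.
  find . -type f -links +1 -print0 > /tmp/multi.lst
  rsync -a  --from0 --files-from=/tmp/single.lst . /dst/
  rsync -aH --from0 --files-from=/tmp/multi.lst  . /dst/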
Dave Howorth wrote:
> There are 15 M files, occupying 183 GB. It actually seems to use a bit over 8 GB of swap space - 12 GB allows a safety margin.
Might this be an opportunity to use find or something to only rsync files that have actually changed? I know it sort of defeats the idea in this kind of backup-scenario, but still.

/Per Jessen, Zürich
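A rough sketch of that idea, assuming a timestamp file touched after each run (all names hypothetical). It only shrinks the list rsync has to build: it would miss deletions, and hard-link preservation still needs the linked files together in one run.

  cd /source
  # List only files changed since the last run, NUL-delimited.
  find . -newer /var/run/last-backup -print0 > /tmp/changed.lst
  rsync -a --from0 --files-from=/tmp/changed.lst . /target/
  touch /var/run/last-backup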
> There are 15 M files,
That's "just" 721 kernel trees.
> occupying 183 GB.
Just 694 ktrees. I can't believe rsync takes so much :p
> It actually seems to use a bit over 8 GB of swap space - 12 GB allows a safety margin.

> Might this be an opportunity to use find or something to only rsync files that have actually changed?
That's the point of rsync, actually: to sync only files that have actually changed.
> I know it sort of defeats the idea in this kind of backup-scenario, but still.
Jan Engelhardt
On 7/25/06, Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk> wrote:
> I'm trying to get cp to copy some directory structures that contain hard links (yes, yet another backup scenario). cp has two features that seem nice:
> (1) It can copy structures with hard links correctly (--preserve=links, or perhaps even just -d).
> (2) It can copy structures incrementally. That is, if you copy a structure, then add some more to the source structure, then copy the structure again, it will handle it.
> But it looks like (1) and (2) don't coexist. If some new files are hard links to old ones, they aren't linked in the copy.
> Does anybody know a way to make this work?
> Thanks, Dave
> PS Before anybody says "use rsync", I was, but ... there are rather a lot of files. I had to give it 12 GB of swap space to build its list and it was taking well over 24 hours to run. I can't give it 24 hours :(
Dave, I assume my previous suggestion to try rdiff-backup also blew up? It does not use rsync, but instead uses a very rsync-like algorithm, so it may not have the same issue with a large number of files.

Also, the rdiff-backup list has at least some traffic, so a post there might get you a workaround or a developer looking into it.

http://www.nongnu.org/rdiff-backup/savannah.html#mailing_list

HTH
Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
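For reference, basic rdiff-backup usage is one source/target pair per run; it keeps a current mirror plus reverse diffs, so older states stay restorable. A sketch with hypothetical paths:

  rdiff-backup /source /backup
  # restore a file as it was three days ago
  rdiff-backup -r 3D /backup/some/file /tmp/file.old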
Greg Freemyer wrote:
> I assume my previous suggestion to try rdiff-backup also blew up? It does not use rsync, but instead uses a very rsync-like algorithm, so it may not have the same issue with a large number of files.
> Also, the rdiff-backup list has at least some traffic, so a post there might get you a workaround or a developer looking into it.
> http://www.nongnu.org/rdiff-backup/savannah.html#mailing_list
Hi Greg,

This is a very different system to my other question. But perhaps it's worth experimenting with rdiff-backup to see if it's faster/smaller than rsync.

Thanks, Dave
On Tuesday 25 July 2006 17:06, Dave Howorth wrote:
> PS Before anybody says "use rsync", I was, but ... there are rather a lot of files. I had to give it 12 GB of swap space to build its list and it was taking well over 24 hours to run. I can't give it 24 hours :(
I believe you may have had a recursive link in there somewhere. I've used rsync a lot, on large data amounts (50k+ files), and never seen it swap like that. See rsync's options for following/copying symlinks.

--
----- stephan@s11n.net http://s11n.net
"...pleasure is a grace and is not obedient to the commands of the will." -- Alan W. Watts
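For reference, the relevant rsync switches (by default rsync skips symlinks entirely):

  rsync -a ...    # -a includes -l: copy symlinks as symlinks, never follow them
  rsync -rL ...   # -L follows symlinks; a symlink loop here can make the file list explode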
participants (5)
- Dave Howorth
- Greg Freemyer
- Jan Engelhardt
- Per Jessen
- stephan beal