Feature changed by: Jan Engelhardt (jengelh)
Feature #306379, revision 9
Title: Use rsync when refreshing repositories

openSUSE-11.2: Unconfirmed
Priority
Requester: Important

Requested by: Piotrek Juzwiak (benderbendingrodriguez)

Description:
It would be a great idea to use rsync when refreshing repositories, since refresh speed is one of the weak points, and it gets worse for people who have many repositories. I'm not sure, but I believe the refresh already checks whether something has changed in the repo; still, using rsync would speed things up. Big repositories such as Packman, for example, are downloaded again every time the default 10 minutes (in the zypp settings) are over, even though nothing much changes there.

Discussion:
#1: Roberto Mannai (robermann79) (2009-05-08 14:57:32)
To the best of my knowledge, the best way to incrementally download only the diff of a binary file is the GDIFF protocol, which was submitted to the W3C ten years ago: http://www.w3.org/TR/NOTE-gdiff-19970901

I know for sure that a commercial configuration management product (Marimba, now bought by BMC; see http://www.marimba.com) uses it, implemented in Java. It is very useful on low-bandwidth networks, for example when downloading a service pack. I don't know whether anyone could reuse that Java implementation, though, since it is a commercial application. Other implementations exist in Perl and Ruby:
http://search.cpan.org/~geoffr/Algorithm-GDiffDelta-0.01/GDiffDelta.pm
http://webscripts.softpedia.com/script/Development-Scripts-js/gdiff-gpatch-1...
An open source .NET (C#) implementation, under the MPL license: http://gdiff.codeplex.com/

I cannot understand why that algorithm is not widely used, given its quality. It would be useful when downloading large files such as ISOs or VM images, or repository information.

#2: Roberto Mannai (robermann79) (2009-05-08 15:08:01) (reply to #1)
In your use case, the repository could provide a GDIFF file describing the change in its content metadata, i.e. the delta between two known "versions" of it over time.

#3: Piotrek Juzwiak (benderbendingrodriguez) (2009-06-18 23:16:55)
Hmm, I guess Packman wouldn't implement that only for me ;) Though it would speed things up, as it is widely known and often said that refreshing repositories in openSUSE is slow.

#4: Luc de Louw (delouw) (2009-07-04 18:21:31)
Why not rsync? Because it does not work over http(s). This is important, since in many companies the only way to get data from the internet is via an HTTP proxy. The GDIFF approach sounds promising.

#5: Roberto Mannai (robermann79) (2009-07-04 18:39:56) (reply to #4)
For a "GDIFF on HTTP" implementation, see http://www.w3.org/TR/NOTE-drp-19970825

+ #6: Jan Engelhardt (jengelh) (2009-07-05 15:23:26)
+ Making use of rsync would bring zypper the checksumming, automatic
+ download resuming/repairing at no cost ;-)

--
openSUSE Feature:
https://features.opensuse.org/306379