Mailinglist Archive: opensuse-features (893 mails)

[openFATE 306379] Use rsync when refreshing repositories
  • From: fate_noreply@xxxxxxx
  • Date: Tue, 11 Aug 2009 16:03:17 +0200 (CEST)
  • Message-id: <feature-306379-11@xxxxxxxxxxxxxx>
Feature changed by: Michael Löffler (michl19)
Feature #306379, revision 11
Title: Use rsync when refreshing repositories

- openSUSE-11.2: Evaluation
+ openSUSE-11.2: Rejected by Michael Löffler (michl19)
+ reject date: 2009-08-11 16:01:45
+ reject reason: too late for 11.2, moved to 11.3
Priority
Requester: Important

+ openSUSE-11.3: Evaluation
+ Priority
+ Requester: Important

Requested by: Piotrek Juzwiak (benderbendingrodriguez)

Description:
It would be a great idea to use rsync when refreshing repositories,
because refresh speed is one of the weak points, and it gets worse when
people have many repositories. I'm not sure, but zypp may already check
whether something has changed in the repository; even so, rsync would
speed things up. For example, big repositories like Packman are
downloaded again every time the default 10-minute interval (set in the
zypp settings) expires, even though not much has changed there.
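What rsync would save here is re-sending unchanged data: the side with
the old file sends per-block checksums, and the side with the new file
scans it with a cheap rolling checksum to find blocks the other side
already has. A minimal in-memory sketch of that idea, assuming an
rsync-style weak checksum (modulo 2**16, as in the rsync technical
report) plus SHA-256 standing in for rsync's strong checksum (MD4 or MD5
depending on the version). The function names and block size are
illustrative assumptions; this is not zypper's or rsync's actual code.

```python
import hashlib

M = 1 << 16  # the weak checksum parts are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the (a, b) parts of an rsync-style weak checksum of one block."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1) instead of rehashing."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def signature(old: bytes, block_len: int) -> dict:
    """Client side: weak checksum -> [(block index, strong hash)] per full block."""
    sig: dict = {}
    for idx in range(len(old) // block_len):
        block = old[idx * block_len:(idx + 1) * block_len]
        sig.setdefault(weak_checksum(block), []).append(
            (idx, hashlib.sha256(block).digest()))
    return sig


def delta(sig: dict, new: bytes, block_len: int) -> list:
    """Server side: describe `new` as ("copy", index) and ("data", bytes) ops."""
    ops: list = []
    literal = bytearray()
    pos = 0
    ab = weak_checksum(new[:block_len]) if len(new) >= block_len else None
    while pos + block_len <= len(new):
        match = None
        # Cheap weak-checksum lookup first, strong hash only on candidates.
        for idx, strong in sig.get(ab, []):
            if hashlib.sha256(new[pos:pos + block_len]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += block_len
            if pos + block_len <= len(new):
                ab = weak_checksum(new[pos:pos + block_len])
        else:
            literal.append(new[pos])
            if pos + block_len < len(new):
                ab = roll(ab[0], ab[1], new[pos], new[pos + block_len], block_len)
            pos += 1
    literal += new[pos:]  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block_len: int) -> bytes:
    """Client side: rebuild the new file from local blocks plus literals."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * block_len:(arg + 1) * block_len] if kind == "copy" else arg
    return bytes(out)
```

For a metadata file where only a few entries changed, most of `ops`
would be small `("copy", index)` references and only the changed bytes
would travel as `("data", ...)`.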

Discussion:
#1: Roberto Mannai (robermann79) (2009-05-08 14:57:32)
To the best of my knowledge, the best way to download only the diff of
a binary file incrementally is the GDIFF format, which was submitted to
the W3C about ten years ago (1997):
http://www.w3.org/TR/NOTE-gdiff-19970901
I know for sure that a commercial configuration management product
(Marimba, since bought by BMC; see http://www.marimba.com) uses it,
implemented in Java. It is very useful on low-bandwidth networks, for
example when downloading a service pack. Since that is a commercial
application, though, I don't know whether anyone could reuse its Java
implementation.
Other implementations exist in Perl and Ruby:
http://search.cpan.org/~geoffr/Algorithm-GDiffDelta-0.01/GDiffDelta.pm
http://webscripts.softpedia.com/script/Development-Scripts-js/gdiff-gpatch-18695.html
An open source .NET (C#) implementation under the MPL license:
http://gdiff.codeplex.com/
I cannot understand why that algorithm is not more widely used, given
its quality; it would be useful when downloading large files like ISOs
or VM images, or repository information.
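To make the GDIFF idea concrete: a GDIFF file is a sequence of commands
that either insert literal bytes (DATA) or copy a range from the file
the client already has (COPY). A minimal decoder sketch, based on my
reading of the opcode table in the W3C note (magic 0xD1FFD1FF, version
4, big-endian integers; 0 = EOF, 1-246 = DATA of that many bytes,
247/248 = DATA with ushort/int length, 249-255 = COPY with the listed
position/length sizes). The function name is hypothetical and it does no
bounds checking beyond what Python slicing gives.

```python
import struct

MAGIC = b"\xd1\xff\xd1\xff"
VERSION = 4

# COPY opcode -> struct format of (position, length), big-endian.
COPY_FORMATS = {
    249: ">HB",  # ushort position, ubyte length
    250: ">HH",  # ushort position, ushort length
    251: ">Hi",  # ushort position, int length
    252: ">iB",  # int position, ubyte length
    253: ">iH",  # int position, ushort length
    254: ">ii",  # int position, int length
    255: ">qi",  # long position, int length
}


def apply_gdiff(source: bytes, diff: bytes) -> bytes:
    """Rebuild the target file from `source` and a GDIFF command stream."""
    if diff[:4] != MAGIC or diff[4] != VERSION:
        raise ValueError("not a GDIFF version 4 stream")
    out = bytearray()
    pos = 5
    while True:  # a stream without EOF ends with an IndexError
        op = diff[pos]
        pos += 1
        if op == 0:  # EOF
            return bytes(out)
        if op <= 246:  # DATA: the opcode itself is the length
            length = op
        elif op == 247:  # DATA with ushort length
            (length,) = struct.unpack_from(">H", diff, pos)
            pos += 2
        elif op == 248:  # DATA with int length
            (length,) = struct.unpack_from(">i", diff, pos)
            pos += 4
        else:  # COPY a range of the old file
            fmt = COPY_FORMATS[op]
            offset, length = struct.unpack_from(fmt, diff, pos)
            pos += struct.calcsize(fmt)
            out += source[offset:offset + length]
            continue
        out += diff[pos:pos + length]  # literal bytes follow the DATA header
        pos += length
```

Only the literal bytes and a few bytes per COPY command cross the
network; everything else is taken from the local copy.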

#2: Roberto Mannai (robermann79) (2009-05-08 15:08:01) (reply to #1)
In your use case, the repository could provide a GDIFF file of the
metadata changes, i.e. the delta between two known versions of it over
time.
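On the repository side, such a delta could be produced from the
previous and the current metadata file. The sketch below is a naive
GDIFF encoder built on Python's `difflib.SequenceMatcher`; the function
names are hypothetical, it always uses COPY opcode 254 (int position,
int length), so it assumes files smaller than 2 GiB, and it can be slow
on large inputs. It only illustrates the format, not how a real mirror
would generate deltas.

```python
import struct
from difflib import SequenceMatcher

MAGIC = b"\xd1\xff\xd1\xff"
VERSION = 4


def _data(chunk: bytes) -> bytes:
    """DATA command: opcodes 1-246 carry the length inline, 248 uses an int."""
    if len(chunk) <= 246:
        return bytes([len(chunk)]) + chunk
    return bytes([248]) + struct.pack(">i", len(chunk)) + chunk


def _copy(offset: int, length: int) -> bytes:
    """COPY command, always opcode 254 (int position, int length)."""
    return bytes([254]) + struct.pack(">ii", offset, length)


def make_gdiff(old: bytes, new: bytes) -> bytes:
    """Encode `new` as COPY ranges of `old` plus literal DATA."""
    out = bytearray(MAGIC + bytes([VERSION]))
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += _copy(i1, i2 - i1)
        elif tag in ("replace", "insert"):
            out += _data(new[j1:j2])
        # "delete": the old bytes are simply not copied
    out.append(0)  # EOF
    return bytes(out)
```

For example, `make_gdiff(b"abcdef", b"abcXYZdef")` yields a COPY of
`abc`, a 3-byte DATA command `XYZ` and a COPY of `def`.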

#3: Piotrek Juzwiak (benderbendingrodriguez) (2009-06-18 23:16:55)
Hmm, I guess Packman wouldn't implement that just for me ;) Still, it
would speed things up, as it is widely known that refreshing
repositories in openSUSE is slow.

#4: Luc de Louw (delouw) (2009-07-04 18:21:31)
Why not rsync? Because it does not work over HTTP(S). This is important
since in many companies the only way to get data from the internet is
via an HTTP proxy.
The GDIFF approach sounds promising.

#5: Roberto Mannai (robermann79) (2009-07-04 18:39:56) (reply to #4)
For a "GDIFF on HTTP" implementation, see
http://www.w3.org/TR/NOTE-drp-19970825

#6: Jan Engelhardt (jengelh) (2009-07-05 15:23:26)
Using rsync would give zypper checksumming and automatic download
resuming/repairing at no cost ;-)
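The "repairing" part works because integrity is checked per block, not
only per file: a truncated or corrupted download only needs the bad
blocks fetched again. A minimal sketch, assuming the server publishes a
list of SHA-256 hashes per fixed-size block (a hypothetical format; the
function names are illustrative and not rsync's or zypper's API):

```python
import hashlib


def block_hashes(data: bytes, block_len: int) -> list[bytes]:
    """SHA-256 of each block of `data`; the last block may be shorter."""
    return [hashlib.sha256(data[i:i + block_len]).digest()
            for i in range(0, len(data), block_len)]


def blocks_to_refetch(local: bytes, reference: list[bytes], block_len: int) -> list[int]:
    """Indices of blocks that are missing locally or whose hash does not match."""
    local_hashes = block_hashes(local, block_len)
    return [i for i, ref in enumerate(reference)
            if i >= len(local_hashes) or local_hashes[i] != ref]
```

An interrupted download resumes from its first partial block, and a
corrupted one re-fetches only the damaged blocks. Extra trailing local
bytes beyond the reference are not detected here and would need to be
truncated separately.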



--
openSUSE Feature:
https://features.opensuse.org/306379
