[openFATE 306379] Use rsync when refreshing repositories
Feature added by: Piotrek Juzwiak (BenderBendingRodriguez)
Feature #306379, revision 1
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Priority Requester: Important
Requested by: Piotrek Juzwiak (benderbendingrodriguez)
Description: It would be a great idea to use rsync when refreshing repositories; one of the weak points is the refresh speed, and it gets worse when people have many repositories. I'm not sure, but zypper may already check whether something has changed in the repo; even so, rsync would be a great way to speed things up.
-- openSUSE Feature: https://features.opensuse.org/306379
Feature changed by: Piotrek Juzwiak (BenderBendingRodriguez)
Feature #306379, revision 2
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description: added to the text above:
+ For example, big repositories like Packman are downloaded again every time the default 10-minute interval (in the zypp settings) has elapsed, even though little has changed there.
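The behaviour the requester is guessing at ("it already compares if something has changed") can be reduced to comparing a digest of the small repository index before re-downloading the full metadata. A minimal sketch of that idea; the function name and the notion of hashing the raw `repomd.xml` bytes are illustrative assumptions, not zypper's actual implementation:

```python
import hashlib


def needs_refresh(cached_index: bytes, remote_index: bytes) -> bool:
    """Return True when the repo index changed since the last refresh.

    Comparing digests of the small index file (e.g. repomd.xml) lets a
    client skip re-downloading the large package metadata when nothing
    has changed on the server.
    """
    return (hashlib.sha256(cached_index).digest()
            != hashlib.sha256(remote_index).digest())
```

With such a check, the 10-minute refresh interval the requester complains about would only cost one small download per repository in the common no-change case.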
Feature changed by: Roberto Mannai (robermann79)
Feature #306379, revision 4
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description: unchanged.
Discussion:
+ #1: Roberto Mannai (robermann79) (2009-05-08 14:57:32)
+ To the best of my knowledge, the best way to incrementally download only the diff of a binary file is the GDIFF protocol, which was submitted to the W3C ten years ago: http://www.w3.org/TR/NOTE-gdiff-19970901
+ I know for sure that a commercial configuration-management product (Marimba, since bought by BMC; see http://www.marimba.com/) uses it, implemented in Java: it is very useful on low-bandwidth networks, for example when downloading a service pack. I don't know whether that Java implementation could be reused anyway, it being part of a commercial application. Other implementations exist in Perl and Ruby:
+ http://search.cpan.org/~geoffr/Algorithm-GDiffDelta-0.01/GDiffDelta.pm
+ http://webscripts.softpedia.com/script/Development-Scripts-js/gdiff-gpatch-1...
+ An open-source .NET (C#) implementation under the MPL license: http://gdiff.codeplex.com/
+ I cannot understand why that algorithm is not widely used, given its quality; it would be useful when downloading large files such as ISOs, VM images, or repository metadata.
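The core of GDIFF (and of delta encodings generally) is that the new file is expressed as a stream of "copy" commands referencing bytes the client already has, plus "data" commands carrying new literal bytes. A simplified sketch of applying such a delta; the tuple-based op format here is an illustrative stand-in, not the GDIFF binary wire encoding:

```python
def apply_delta(source: bytes, delta: list) -> bytes:
    """Rebuild a target file from a source file plus a delta.

    delta is a list of ops:
      ("copy", offset, length) - reuse bytes the client already has
      ("data", payload)        - insert new literal bytes
    """
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        elif op[0] == "data":
            out += op[1]
        else:
            raise ValueError(f"unknown op: {op[0]!r}")
    return bytes(out)
```

For mostly-unchanged repository metadata, the delta is dominated by cheap copy commands, which is where the bandwidth saving comes from.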
Feature changed by: Roberto Mannai (robermann79)
Feature #306379, revision 5
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description and discussion #1: unchanged.
+ #2: Roberto Mannai (robermann79) (2009-05-08 15:08:01) (reply to #1)
+ In your use case, the repository could provide a GDIFF file of the content-metadata variation: the delta between two known "versions" of it over time.
Feature changed by: Piotrek Juzwiak (BenderBendingRodriguez)
Feature #306379, revision 6
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description and discussion #1-#2: unchanged.
+ #3: Piotrek Juzwiak (benderbendingrodriguez) (2009-06-18 23:16:55)
+ Hmm, I guess Packman wouldn't implement that only for me ;) Though it would speed things up, as it is widely known that refreshing repositories in openSUSE is slow.
Feature changed by: Luc de Louw (delouw)
Feature #306379, revision 7
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description and discussion #1-#3: unchanged.
+ #4: Luc de Louw (delouw) (2009-07-04 18:21:31)
+ Why not rsync? Because it does not work over HTTP(S). This is important, since in many companies the only way to get data from the internet is via an HTTP proxy.
+ The GDIFF approach sounds promising.
Feature changed by: Roberto Mannai (robermann79)
Feature #306379, revision 8
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description and discussion #1-#4: unchanged.
+ #5: Roberto Mannai (robermann79) (2009-07-04 18:39:56) (reply to #4)
+ For a "GDIFF on HTTP" implementation, see http://www.w3.org/TR/NOTE-drp-19970825
Feature changed by: Jan Engelhardt (jengelh)
Feature #306379, revision 9
Title: Use rsync when refreshing repositories
openSUSE-11.2: Unconfirmed
Description and discussion #1-#5: unchanged.
+ #6: Jan Engelhardt (jengelh) (2009-07-05 15:23:26)
+ Making use of rsync would give zypper checksumming and automatic download resuming/repairing at no cost ;-)
Feature changed by: Andreas Jaeger (a_jaeger)
Feature #306379, revision 10
Title: Use rsync when refreshing repositories
- openSUSE-11.2: Unconfirmed
+ openSUSE-11.2: Evaluation
Description and discussion: unchanged.
Feature changed by: Michael Löffler (michl19)
Feature #306379, revision 11
Title: Use rsync when refreshing repositories
- openSUSE-11.2: Evaluation
+ openSUSE-11.2: Rejected by Michael Löffler (michl19)
+ reject date: 2009-08-11 16:01:45
+ reject reason: too late for 11.2, moved to 11.3
+ openSUSE-11.3: Evaluation
+ Priority Requester: Important
Description and discussion: unchanged.
Feature changed by: Roberto Mannai (robermann79)
Feature #306379, revision 12
Title: Use rsync when refreshing repositories
openSUSE-11.2: Rejected; openSUSE-11.3: Evaluation
Description and discussion #1-#6: unchanged.
+ #7: Roberto Mannai (robermann79) (2009-11-30 21:58:19) (reply to #6)
+ "delouw" says that rsync does not support HTTP. This is a real blocker.
Feature changed by: Robert Davies (robopensuse)
Feature #306379, revision 13
Title: Use rsync when refreshing repositories
openSUSE-11.2: Rejected; openSUSE-11.3: Evaluation
Description and discussion #1-#7: unchanged.
+ #8: Robert Davies (robopensuse) (2009-11-30 22:23:41)
+ Why not make the transferred refresh file delta-based by definition?
+ The first time through, the client has the current version versus an empty file, so the repo can publish a delta against the empty file, plus monthly/weekly change files with deltas for the changes made against those. A refresh can then check for updates from the last week if a current weekly file exists (download only if it exists), fall back to the monthly file if the weekly one is out of date, and fall back to the delta against the empty file if both the local monthly and weekly files are out of date. A sanity check based on the server's idea of the date can prevent clients from getting things too horribly wrong.
+ Then most of the time it's a small file that is easily cached for a short time; the monthly file can be cached for longer with a predictable TTL, and the delta against the empty file could be cached for, say, a day.
+ I'm not sure about the repo format, but if delta handling involves compression, wouldn't a simple text format be natural and efficient for the transferred repo contents? Any binary file should probably be a cache generated locally.
+ Though this sounds much more complicated, presumably there are tools for generating the repo contents file and for processing the downloaded repo file, so it ought not to be so difficult in principle.
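The cascade in #8 boils down to picking the smallest published delta that still covers the client's last-known state. A sketch of that selection, assuming the repo publishes weekly, monthly, and full ("delta vs. empty") files; the file names and the 7/30-day thresholds are illustrative assumptions, not an existing zypp mechanism:

```python
def choose_download(local_age_days: int) -> str:
    """Pick the smallest delta file that covers the local snapshot.

    A client whose metadata is under a week old only needs the weekly
    delta; under a month old, the monthly delta; anything older falls
    back to the full index, i.e. the delta against an empty file.
    """
    if local_age_days < 0:
        # Server/client clock disagreement: the sanity check from #8.
        raise ValueError("local snapshot appears newer than the server")
    if local_age_days <= 7:
        return "weekly.delta"
    if local_age_days <= 30:
        return "monthly.delta"
    return "full.delta"
```

Because the weekly file changes most often but is smallest, and the full file changes least but is largest, each tier can be given its own cache TTL, as the comment suggests.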
Feature changed by: Jan Engelhardt (jengelh) Feature #306379, revision 16
Title: Use rsync when refreshing repositories
openSUSE-11.2: Rejected by Michael Löffler (michl19); reject date: 2009-08-11 16:01:45; reject reason: too late for 11.2, moved to 11.3. Priority Requester: Important
openSUSE-11.3: Evaluation. Priority Requester: Important
Requested by: Piotrek Juzwiak (benderbendingrodriguez)
Description: (unchanged)
Discussion #1-#8: (unchanged)
+ #9: Jan Engelhardt (jengelh) (2009-12-05 13:30:59) (reply to #7)
+ Even if the rsync *program* could do HTTP, it would not help you much, because HTTP does not implement the rolling checksum and all the other fluffy things of rsync. Also, it seems obvious to me that rsync support would be an optional extra feature that you can choose to ignore when refreshing your repositories.
-- openSUSE Feature: https://features.opensuse.org/306379
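For context on the rolling checksum mentioned in #9: rsync's weak checksum is an Adler-32-style pair that can be "rolled" in O(1) as the window slides one byte, something plain HTTP transfers cannot replicate. Below is a minimal sketch of such a rolling checksum; it is a simplified variant for illustration, not rsync's actual implementation.

```python
# Simplified rsync-style weak rolling checksum. weak_sum() computes the
# (a, b) pair over a block; roll() slides the window one byte in O(1),
# producing the same pair as recomputing weak_sum() on the shifted block.

MOD = 1 << 16

def weak_sum(block):
    """Full computation of the weak checksum pair over a block of bytes."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, blocklen):
    """Slide the window: drop out_byte at the front, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - blocklen * out_byte + a) % MOD
    return a, b
```

Rolling is what makes rsync cheap on the receiving side: it can test every byte offset of a file against the sender's block checksums without rereading each window.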
Feature changed by: Robert Davies (robopensuse) Feature #306379, revision 17
Title: Use rsync when refreshing repositories
openSUSE-11.2: Rejected by Michael Löffler (michl19); reject date: 2009-08-11 16:01:45; reject reason: too late for 11.2, moved to 11.3. Priority Requester: Important
openSUSE-11.3: Evaluation. Priority Requester: Important
Requested by: Piotrek Juzwiak (benderbendingrodriguez)
Description: (unchanged)
Discussion #1-#9: (unchanged)
+ #10: Robert Davies (robopensuse) (2009-12-17 11:27:59) (reply to #6)
+ There would be a cost, but it would be borne by mirrors and their admins. Currently mirrors can offer HTTP and traditional FTP, and MirrorBrain distributes downloads transparently and re-uses general caching infrastructure. Gentoo used a separate infrastructure for rsync-ed Portage data, rather than the usual high-bandwidth/high-storage traditional mirrors, because rsync support was niche. It would be rare to enable the rsync checksum for daemon access on a public server, because of the high CPU load of that feature and the potential for a DoS attack.
+ Implementing it optionally risks new ways for refresh to be slow, e.g. rsync protocol requests being tried and then dropped silently by uncooperative firewalls.
+ If checksums and deltas are desirable additions to the repo format, then a more general solution that works over HTTP would be better: it would benefit more users and automatically re-use local proxy caches.
-- openSUSE Feature: https://features.opensuse.org/306379
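The HTTP-friendly delta approach argued for in #10, and the GDIFF protocol cited in #1 and #5, boil down to the same idea: a delta file is a sequence of COPY commands (reuse a span of the old file) and DATA commands (literal new bytes). The sketch below uses a toy tuple encoding for clarity; the real GDIFF format (W3C NOTE-gdiff-19970901) is a compact binary encoding, and the client would fetch the delta file over plain HTTP.

```python
# Toy illustration of GDIFF-style delta application: reconstruct the new
# file from the old file plus a list of ("copy", offset, length) and
# ("data", bytes) commands. The encoding here is illustrative only.

def apply_delta(old, delta):
    out = bytearray()
    for cmd in delta:
        if cmd[0] == "copy":
            _, offset, length = cmd
            out += old[offset:offset + length]   # reuse bytes the client already has
        elif cmd[0] == "data":
            out += cmd[1]                        # literal new bytes from the delta
        else:
            raise ValueError("unknown command: %r" % (cmd,))
    return bytes(out)
```

Because the delta is an ordinary static file, it passes through HTTP proxies and caches unchanged, which addresses the corporate-proxy objection raised in #4.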
Feature changed by: Christoph Thiel (cthiel1) Feature #306379, revision 18
Title: Use rsync when refreshing repositories
openSUSE-11.2: Rejected by Michael Löffler (michl19); reject date: 2009-08-11 16:01:45; reject reason: too late for 11.2, moved to 11.3. Priority Requester: Important
- openSUSE-11.3: Evaluation
+ openSUSE-11.3: Rejected by (cthiel1)
+ reject date: 2010-03-02 15:45:59
+ reject reason: This feature is out of scope for 11.3.
Priority Requester: Important
Requested by: Piotrek Juzwiak (benderbendingrodriguez)
Description: (unchanged)
Discussion #1-#10: (unchanged)
-- openSUSE Feature: https://features.opensuse.org/306379