Hi, since most mirrors (except gwdg) seemed to have a hard time syncing the betas, I have created a wiki page http://www.opensuse.org/How_to_mirror in the hope that this will help mirror admins to speed up syncing the openSUSE trees to a point where a full propagation takes less than 6 hours. Eberhard, did I forget anything important? Regards, Carl-Daniel
Hi, On Thu, 8 Sep 2005, Carl-Daniel Hailfinger wrote:
since most mirrors (except gwdg) seemed to have a hard time syncing the betas, I have created a wiki page
http://www.opensuse.org/How_to_mirror
in the hope that this will help mirror admins to speed up syncing the openSUSE trees to a point where a full propagation takes less than 6 hours.
Eberhard, did I forget anything important?
To reach the point where a local inst-source/ is usable as quickly as possible, one could also exclude the inst-source/suse/src/ directory. Maybe you can create a "real" example scenario and add all the commands for it? BTW: http://ftp.gwdg.de/pub/linux/people/emoenke/rsync.suse_update is a sample rsync script which allows easy addition of exclusions. It has a few more features, such as logging, mailing the results and, most important, preserving the deleted files. Furthermore, it shows how to migrate an ftp mirror to rsync mirroring without refetching all files because of wrong timestamps, by using "--dry-run", "--size-only" and "--existing". Cheers -e -- Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
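A minimal sketch of such an ftp-to-rsync migration, assuming a hypothetical upstream module `rsync://ftp.example.org/opensuse/` and a hypothetical local path `/srv/mirror/opensuse/`. It is not the gwdg script; it only combines the three flags named above with `-a` and runs a preview pass before the real one.

```python
import subprocess

# Hypothetical source and destination; replace with a real upstream module and local path.
SOURCE = "rsync://ftp.example.org/opensuse/"
DEST = "/srv/mirror/opensuse/"


def migration_argv(source: str, dest: str, dry_run: bool) -> list[str]:
    """Build an rsync command that only touches files already present locally
    (--existing) and treats equal sizes as equal content (--size-only), so files
    fetched earlier via ftp are not downloaded again just because their
    timestamps differ."""
    argv = ["rsync", "-a", "--size-only", "--existing"]
    if dry_run:
        argv.append("--dry-run")  # report what would happen, change nothing
    return argv + [source, dest]


def migrate(source: str = SOURCE, dest: str = DEST) -> None:
    # Pass 1 previews the changes, pass 2 applies them; check=True raises if rsync fails.
    subprocess.run(migration_argv(source, dest, dry_run=True), check=True)
    subprocess.run(migration_argv(source, dest, dry_run=False), check=True)
```

Because of `--existing`, this pass creates no new files; a regular rsync run without that flag afterwards fetches whatever is still missing.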
Eberhard Moenkeberg wrote:
Hi,
On Thu, 8 Sep 2005, Carl-Daniel Hailfinger wrote:
since most mirrors (except gwdg) seemed to have a hard time syncing the betas, I have created a wiki page
http://www.opensuse.org/How_to_mirror
in the hope that this will help mirror admins to speed up syncing the openSUSE trees to a point where a full propagation takes less than 6 hours.
Eberhard, did I forget anything important?
To reach the point where a local inst-source/ is usable as quickly as possible, one could also exclude the inst-source/suse/src/ directory.
Done. Also excluded all debuginfo RPMs and the suse/nosrc/ directory.
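A sketch of how these exclusions could be handed to rsync, assuming the patterns below are relative to the module root; the exact paths are assumptions and may need adjusting to the real tree layout.

```python
# Exclusions discussed in the thread. An rsync pattern without a leading slash
# is matched against the end of the path; a trailing slash restricts it to directories.
EXCLUDES = [
    "inst-source/suse/src/",  # source RPMs
    "suse/nosrc/",            # nosrc RPMs
    "*debuginfo*.rpm",        # debuginfo packages
]


def exclude_args(patterns: list[str]) -> list[str]:
    """Turn each pattern into one --exclude=PATTERN option."""
    return [f"--exclude={p}" for p in patterns]


def sync_argv(source: str, dest: str, patterns: list[str] = EXCLUDES) -> list[str]:
    """Build the rsync argument list; pass it to subprocess.run() to execute it."""
    return ["rsync", "-a", *exclude_args(patterns), source, dest]
```

Keeping the patterns in one list makes it easy to add further exclusions, which is the property Eberhard's script is described as having.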
Maybe you can create a "real" example scenario and add all the commands for it?
BTW: http://ftp.gwdg.de/pub/linux/people/emoenke/rsync.suse_update
is a sample rsync script which allows easy addition of exclusions. It has a few more features, such as logging, mailing the results and, most important, preserving the deleted files. Furthermore, it shows how to migrate an ftp mirror to rsync mirroring without refetching all files because of wrong timestamps, by using "--dry-run", "--size-only" and "--existing".
I took a look at it and lifted a few ideas. However, --size-only is not needed when converting from ftp to rsync. rsync will use its sliding checksum algorithm (NOT the dreaded MD4 checksum) to compare the files and not redownload them.
The script I wrote syncs in five stages, with a flash cutover directly after each stage. That means users will only see stages that are already complete and never have to worry about getting incomplete files or an incomplete installation source.
Stage 1: Delta-ISO
Stage 2: all ISOs
Stage 3: inst-source without *src.rpm and *debuginfo*.rpm
Stage 4: inst-source-java without *src.rpm and *debuginfo*.rpm
Stage 5 (optional): complete sync
The flash cutover is imho the best feature of my script. And the fact that Delta-ISOs are synced first makes mirrors usable after transferring less than 5% of the overall content.
After the script has passed my local tests I will make it available via http://www.opensuse.org/How_to_mirror
Regards, Carl-Daniel
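The script itself is not shown in the thread, so the following is only a hypothetical sketch of one way a "flash cutover" could work: each stage is synced into its own directory, and a public symlink is then switched with `os.replace`, which renames a temporary link over the old one atomically on POSIX. The symlink layout is an assumption, not a description of Carl-Daniel's script.

```python
import os

# Stage order from the mail above; these are descriptive labels only.
STAGES = [
    "Delta-ISO",
    "all ISOs",
    "inst-source without *src.rpm and *debuginfo*.rpm",
    "inst-source-java without *src.rpm and *debuginfo*.rpm",
    "complete sync (optional)",
]


def flash_cutover(public_link: str, new_tree: str) -> None:
    """Point public_link at new_tree in one atomic step.

    A temporary symlink is created next to public_link and renamed over it,
    so readers see either the old tree or the new one, never a missing link.
    """
    tmp_link = f"{public_link}.tmp-{os.getpid()}"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted run
    os.symlink(new_tree, tmp_link)
    os.replace(tmp_link, public_link)  # rename(2): replaces the link itself
```

Renaming over the link, rather than deleting and recreating it, avoids a short window in which the public path does not exist. To avoid copying the whole tree for every stage, each new stage directory could be seeded from the previous one, for example with rsync's `--link-dest` option.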
On Fri, Sep 09, 2005 at 03:13:46AM +0200, Carl-Daniel Hailfinger wrote:
I took a look at it and lifted a few ideas. However, --size-only is not needed when converting from ftp to rsync. rsync will use its sliding checksum algorithm (NOT the dreaded MD4 checksum) to compare the files and not redownload them.
Sure, but this still forces the server to read the full file from disk. Robert -- Robert Schiele Tel.: +49-621-181-2214 Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de
Hi, On Fri, 9 Sep 2005, Robert Schiele wrote:
On Fri, Sep 09, 2005 at 03:13:46AM +0200, Carl-Daniel Hailfinger wrote:
I took a look at it and lifted a few ideas. However, --size-only is not needed when converting from ftp to rsync. rsync will use its sliding checksum algorithm (NOT the dreaded MD4 checksum) to compare the files and not redownload them.
Sure, but this still forces the server to read the full file from disk.
Exactly; it would force the server to behave as if "--checksum" had been given, even if the server has disabled it. There is a special situation when an ftp mirror converts to rsync: via ftp, only crippled and/or "time zone shifted" time stamps are available, but rsync communicates the true inode time stamp values. So here we "know" the file contents are equal and only the time stamps differ. That is the best scenario for "--size-only". Cheers -e -- Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
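To make the trade-off concrete: by default, rsync's "quick check" skips a file when size and modification time both match; with `--size-only` it compares the size alone. A file that fails the quick check goes through the delta-transfer algorithm, which reads the whole file on both ends. The following is a simplified model of that decision (it ignores rsync's many other options); the `FileInfo` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int   # bytes
    mtime: int  # seconds since the epoch


def quick_check_skips(local: FileInfo, remote: FileInfo, size_only: bool = False) -> bool:
    """Return True if the quick check would treat the local file as up to date."""
    if local.size != remote.size:
        return False
    return size_only or local.mtime == remote.mtime


# A file fetched earlier via ftp: same size, timestamp shifted by one hour.
remote = FileInfo(size=4_700_000, mtime=1_126_180_800)
local = FileInfo(size=4_700_000, mtime=1_126_180_800 - 3600)
print(quick_check_skips(local, remote))                  # False
print(quick_check_skips(local, remote, size_only=True))  # True
```

Without `--size-only`, every such file would be read in full by the server, which is the load Robert points out.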
Eberhard Moenkeberg wrote:
Hi,
On Fri, 9 Sep 2005, Robert Schiele wrote:
On Fri, Sep 09, 2005 at 03:13:46AM +0200, Carl-Daniel Hailfinger wrote:
I took a look at it and lifted a few ideas. However, --size-only is not needed when converting from ftp to rsync. rsync will use its sliding checksum algorithm (NOT the dreaded MD4 checksum) to compare the files and not redownload them.
Sure, but this still forces the server to read the full file from disk.
Exactly; it would force the server to behave as if "--checksum" had been given, even if the server has disabled it.
Except that it needs fewer processor cycles. But I see your point about server IO bottlenecks.
There is a special situation when an ftp mirror converts to rsync: via ftp, only crippled and/or "time zone shifted" time stamps are available, but rsync communicates the true inode time stamp values. So here we "know" the file contents are equal and only the time stamps differ. That is the best scenario for "--size-only".
Yes. My past experience has shown me that some ftp client/server combinations corrupt resumed downloads; that's why I don't use --size-only. Thinking about it again, an MD5SUMS.gz file covering every file in the tree would help to check for such problems. After that, rsync could be run with --size-only. This way the server would not suffer additional load and the client could still verify the correctness of all files. My script has a few features which are only desirable if you do not convert from ftp. On second thought, I'll add support for an MD5SUMS.gz file if one is present. That would combine the best of both worlds and keep the load on the server low. Regards, Carl-Daniel
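A minimal sketch of the verification step proposed here, assuming the (so far hypothetical) MD5SUMS.gz uses the usual md5sum output format, one `<hex digest>  <relative path>` line per file. Files that come back as bad would then be refetched without `--size-only`, while everything else can be synced with it.

```python
import gzip
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large ISOs do not have to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bad_files(tree_root: str, sums_file: str) -> list[str]:
    """Return relative paths whose local MD5 does not match MD5SUMS.gz.

    Files listed in MD5SUMS.gz but missing locally also count as bad.
    """
    bad = []
    with gzip.open(sums_file, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, rel_path = line.split(None, 1)
            if rel_path.startswith("*"):  # md5sum marks binary mode with '*'
                rel_path = rel_path[1:]
            local = os.path.join(tree_root, rel_path)
            if not os.path.isfile(local) or md5_of(local) != digest.lower():
                bad.append(rel_path)
    return bad
```

This moves the reading cost to the client, which has to read its own copy once, instead of to the server.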
On Fri, Sep 09, 2005 at 02:32:05PM +0200, Carl-Daniel Hailfinger wrote:
Eberhard Moenkeberg wrote:
Exactly; it would force the server to behave as if "--checksum" had been given, even if the server has disabled it.
Except that it needs fewer processor cycles. But I see your point about server IO bottlenecks.
Why does it need fewer processor cycles? Have you ever seen a pure file server where processor usage is the limiting factor? Robert -- Robert Schiele Tel.: +49-621-181-2214 Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de
Robert Schiele wrote:
On Fri, Sep 09, 2005 at 02:32:05PM +0200, Carl-Daniel Hailfinger wrote:
Eberhard Moenkeberg wrote:
Exactly; it would force the server to behave as if "--checksum" had been given, even if the server has disabled it.
Except that it needs fewer processor cycles. But I see your point about server IO bottlenecks.
Why does it need fewer processor cycles?
I assumed CRC32 needs fewer processor cycles than MD4, but you are right: this was an assumption and is probably wrong.
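For reference, the weak "sliding" checksum described in the rsync algorithm is neither CRC32 nor MD4: it is a pair of 16-bit sums that can be updated in constant time when the window moves by one byte. In rsync it is paired with a stronger per-block checksum (MD4 in rsync 2.x) that is computed only for candidate matches. Either way, every byte of the file is read, which is Robert's point. The sketch below models the checksum as the algorithm describes it, not rsync's exact source code.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for one block from scratch: a is the sum of the bytes,
    b weights each byte by its distance from the end of the block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> list[int]:
    """Combined 32-bit weak checksum (a + 2**16 * b) of every window of length n."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [a + (b << 16)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(a + (b << 16))
    return sums
```

The rolling update is what makes scanning for matches at every byte offset cheap; it does not reduce the disk I/O.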
Have you ever seen a pure file server where processor usage is the limiting factor?
Yes, but that machine had a fast NIC and its working set fit completely into RAM. Since ftp.gwdg.de (and probably every public mirror server) does not have enough RAM to keep its working set completely in memory (and even if it had, it would probably be network-bound), --size-only is the way to go for syncing. So you're right on both counts. Thanks for correcting me, Carl-Daniel
Hi, On Fri, 9 Sep 2005, Robert Schiele wrote:
On Fri, Sep 09, 2005 at 02:32:05PM +0200, Carl-Daniel Hailfinger wrote:
Eberhard Moenkeberg wrote:
Exactly; it would force the server to behave as if "--checksum" had been given, even if the server has disabled it.
Except that it needs fewer processor cycles. But I see your point about server IO bottlenecks.
Why does it need fewer processor cycles?
Have you ever seen a pure file server where processor usage is the limiting factor?
Yes; just think about Linux NFS... Or an FTP daemon which allows ASCII-mode downloads or on-the-fly tar-and-gzip via "get directory.tar.gz"... Or just rsync with --checksum. But I second your first question. Cheers -e -- Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
participants (3)
- Carl-Daniel Hailfinger
- Eberhard Moenkeberg
- Robert Schiele