[suse-mirror] uploading openSUSE 11.1
Hi, We're not there yet, but this afternoon I'll start uploading 11.1 to the stage server. Some numbers:
31G distribution/11.1/ (the deltas are not there yet; they will add ~2G)
15G source/distribution/11.1/
12G debug/distribution/11.1/
The public release is due on the 18th. BTW: we considered moving the source/ and debug/ trees of 11.0 and older into the subtrees too. Would this be fine with you? 11.0 is currently 62G, and we would remove around the same amount as with 11.1. Greetings, Stephan -- To unsubscribe, e-mail: mirror+unsubscribe@opensuse.org For additional commands, e-mail: mirror+help@opensuse.org
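Stephan's figures can be tallied to see what a mirror needs for 11.1 and what the proposed move would take out of 11.0. A small Python sketch; it assumes "around the same amount as with 11.1" means the 15G source plus 12G debug share, which is an interpretation rather than a figure stated in the mail.

```python
# Sizes in GB as quoted in the 11.1 upload announcement.
DIST_11_1 = 31    # distribution/11.1/ without deltas
DELTAS_11_1 = 2   # "will add ~2G"
SOURCE_11_1 = 15  # source/distribution/11.1/
DEBUG_11_1 = 12   # debug/distribution/11.1/
TREE_11_0 = 62    # 11.0 today, source and debug still inside


def staged_total_11_1() -> int:
    """Everything staged for 11.1 once the deltas have arrived."""
    return DIST_11_1 + DELTAS_11_1 + SOURCE_11_1 + DEBUG_11_1


def split_off_11_1() -> int:
    """The source + debug share that lives outside distribution/."""
    return SOURCE_11_1 + DEBUG_11_1


def tree_11_0_after_split() -> int:
    """Assumption: 11.0 sheds about as much as 11.1's source + debug share."""
    return TREE_11_0 - split_off_11_1()


print(staged_total_11_1(), split_off_11_1(), tree_11_0_after_split())  # 60 27 35
```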
Hello, Stephan Kulow wrote:
We're not there yet, but this afternoon I'll start uploading 11.1 to the stage server.
Some numbers: 31G distribution/11.1/ (the deltas are not there yet, will add ~2G) 15G source/distribution/11.1/ 12G debug/distribution/11.1/
Does this also include PPC, or will that come from powerpc.opensuse.org? Bye, CzP
On Thursday, 11 December 2008, Peter Czanik wrote:
Hello,
Stephan Kulow wrote:
We're not there yet, but this afternoon I'll start uploading 11.1 to the stage server.
Some numbers: 31G distribution/11.1/ (the deltas are not there yet, will add ~2G) 15G source/distribution/11.1/ 12G debug/distribution/11.1/
Does this also include PPC, or will that come from powerpc.opensuse.org?
powerpc.o.o stage is also prepared, but I'm not sure powerpc does the staging. I'm fine with releasing the powerpc repos earlier. But yes, the ppc ISOs are part of the above 31G. Greetings, Stephan
On Thu, Dec 11, 2008 at 10:30:51AM +0100, Stephan Kulow wrote:
On Thursday, 11 December 2008, Peter Czanik wrote:
Hello,
Stephan Kulow wrote:
We're not there yet, but this afternoon I'll start uploading 11.1 to the stage server.
Some numbers: 31G distribution/11.1/ (the deltas are not there yet, will add ~2G) 15G source/distribution/11.1/ 12G debug/distribution/11.1/
Does this also include PPC, or will that come from powerpc.opensuse.org?
powerpc.o.o stage is also prepared, but I'm not sure powerpc does the staging. I'm fine with releasing the powerpc repos earlier. But yes, the ppc ISOs are part of the above 31G.
Greetings, Stephan
powerpc.o.o doesn't do staging. We should be fine without it, I guess. By the way, at some point in the future I could set up a redirector on powerpc for the mirrors that do mirror it. (Could be a nice playground for the next-generation redirector. :-) Peter -- Contact: admin@opensuse.org (a.k.a. ftpadmin@suse.com) #opensuse-mirrors on freenode.net Info: http://en.opensuse.org/Mirror_Infrastructure SUSE LINUX Products GmbH Research & Development
Hi, On Thu, 11 Dec 2008, Stephan Kulow wrote:
We're not there yet, but this afternoon I'll start uploading 11.1 to the stage server.
Some numbers: 31G distribution/11.1/ (the deltas are not there yet, will add ~2G) 15G source/distribution/11.1/ 12G debug/distribution/11.1/
The public release is due on the 18th.
BTW: we considered moving the source/ and debug/ trees of 11.0 and older into the subtrees too. Would this be fine with you? 11.0 is currently 62G, and we would remove around the same amount as with 11.1.
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer. Best regards, Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org) -- Eberhard Mönkeberg, IT Infrastructure Working Group E-Mail: emoenke@gwdg.de Tel.: +49 (0)551 201-1551 ------------------------------------------------------------------------- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) Am Fassberg 11, 37077 Göttingen URL: http://www.gwdg.de E-Mail: gwdg@gwdg.de Tel.: +49 (0)551 201-1510 Fax: +49 (0)551 201-2150 Managing Director: Prof. Dr. Bernhard Neumair Chairman of the Supervisory Board: Prof. Dr. Christian Griesinger Registered office: Göttingen Registration court: Göttingen Commercial register no. B 598 -------------------------------------------------------------------------
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard! Peter
Hi Peter, On Fri, 12 Dec 2008, Peter Poeml wrote:
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard!
Why is the PPC distribution "double-published"? At least the repo seems to be, and with "too open" permissions. Best regards, Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org)
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
Hi Peter,
On Fri, 12 Dec 2008, Peter Poeml wrote:
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard!
Why is the PPC distribution "double-published"? At least the repo seems to be, and with "too open" permissions.
Not sure what you mean by double-published? And for the "too open", see my mail to Peter C.: powerpc.opensuse doesn't do the staging for the repos, as it's currently the only host; no redirector is involved. Greetings, Stephan
Hi, On Fri, 12 Dec 2008, Stephan Kulow wrote:
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
On Fri, 12 Dec 2008, Peter Poeml wrote:
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard!
Why is the PPC distribution "double-published"? At least the repo seems to be, and with "too open" permissions.
Not sure what you mean by double-published? And for the "too open", see my mail to Peter C.: powerpc.opensuse doesn't do the staging for the repos, as it's currently the only host; no redirector is involved.
OK, I see the ppc repo is not with the other repos, and all ISOs are gone again. Please send a note when the new ISOs arrive. Best regards, Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org)
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
Hi,
On Fri, 12 Dec 2008, Stephan Kulow wrote:
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
On Fri, 12 Dec 2008, Peter Poeml wrote:
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard!
Why is the PPC distribution "double-published"? At least the repo seems to be, and with "too open" permissions.
Not sure what you mean by double-published? And for the "too open", see my mail to Peter C.: powerpc.opensuse doesn't do the staging for the repos, as it's currently the only host; no redirector is involved.
OK, I see the ppc repo is not with the other repos, and all ISOs are gone again.
Please send a note when the new ISOs arrive.
Huh? What ISOs are gone? You scare me. I didn't touch the ISOs after the announcement; only a small number of deltas are missing. Greetings, Stephan
Hi, On Fri, 12 Dec 2008, Stephan Kulow wrote:
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
On Fri, 12 Dec 2008, Stephan Kulow wrote:
On Friday, 12 December 2008, Eberhard Moenkeberg wrote:
On Fri, 12 Dec 2008, Peter Poeml wrote:
On Fri, Dec 12, 2008 at 01:14:15AM +0100, Eberhard Moenkeberg wrote:
To support stage, ftp5.gwdg.de has the rsync module os111 with the old (known from older distributions) user/pass authentication. I will not repost it here, so if you are new, ask an old-timer.
Thank you, Eberhard!
Why is the PPC distribution "double-published"? At least the repo seems to be, and with "too open" permissions.
Not sure what you mean by double-published? And for the "too open", see my mail to Peter C.: powerpc.opensuse doesn't do the staging for the repos, as it's currently the only host; no redirector is involved.
OK, I see the ppc repo is not with the other repos, and all ISOs are gone again.
Please send a note when the new ISOs arrive.
Huh? What ISOs are gone? You scare me. I didn't touch the ISOs after the announcement; only a small number of deltas are missing.
Sorry, my error. But I think something has gone: the size has shrunk to 34 GB, and I think it was more yesterday. Best regards, Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org)
Hi all, Since we should all be properly synced by now and the release is only 3 days away, I have a quick question: at what time are you planning to do the bitflip? I'd like to shift my rsync schedule to about 15-30 minutes after that. My rsync currently runs at 1:23 AM. Regards, -- Jaroslaw Zachwieja Centre for Scientific Computing, University of Warwick, UK
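Jaroslaw's plan, starting the mirror sync 15-30 minutes after the bitflip, is simple time arithmetic plus a crontab entry. A minimal Python sketch; the bitflip time and the sync script path below are hypothetical placeholders, since the thread does not state them.

```python
from datetime import datetime, timedelta


def sync_window(bitflip: datetime, min_delay_min: int = 15, max_delay_min: int = 30):
    """Earliest and latest start for the mirror rsync after the bitflip."""
    return (bitflip + timedelta(minutes=min_delay_min),
            bitflip + timedelta(minutes=max_delay_min))


def cron_entry(start: datetime, command: str) -> str:
    """Daily crontab line; fields are minute, hour, day of month, month, weekday."""
    return f"{start.minute} {start.hour} * * * {command}"


# The current schedule from the mail (1:23 AM) as a crontab line
# (the script path is hypothetical):
print(cron_entry(datetime(2008, 12, 15, 1, 23), "/usr/local/bin/sync-opensuse"))
# Hypothetical bitflip time, only to show the window arithmetic:
earliest, latest = sync_window(datetime(2008, 12, 18, 12, 0))
print(earliest.strftime("%H:%M"), latest.strftime("%H:%M"))  # 12:15 12:30
```

Once the actual bitflip time is announced, passing `earliest` to `cron_entry` gives the line to install.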
Hi, AFAIK there is no rsync module for the openSUSE sources on stage.opensuse.org. Am I the only one who misses it so much? I'd hate to set up an FTP mirror these days. Please give us an rsync module. In my opinion, adding the sources to the opensuse-full module would be best, but an additional opensuse-sources module would be just as fine. Thanks a lot, Bing -- Bernd 'Bing' Leibing KIZ Infrastruktur, University of Ulm, Germany Email: <bernd.leibing@uni-ulm.de> Tel. 0731-50-22516 Homepage: http://www.uni-ulm.de/~leibing O26/5215
Hi Bernd, On Tue, Feb 10, 2009 at 04:19:58PM +0100, Bernd Leibing wrote:
afaik there is no rsync module for the opensuse sources on stage.opensuse.org
Yes, true. And the public rsync server doesn't have the tree at all.
Am I the only one who misses it so much?
There was one other person asking for it around Christmas, and I didn't get around to setting up an rsync module yet. Sorry to everyone who has been waiting, or who looked around and didn't find anything.

I pointed someone to hosteurope.de, which mirrors the sources (rsync URL listed on http://mirrors.opensuse.org/list/11.1.html). The reason that hosteurope has the sources is that they use an old rsync module that I wanted to deprecate, but I'm in fact quite glad that these things are available in *some* place. Hosteurope also does us the favour of storing discontinued trees that we can't archive ourselves for space reasons. (Thanks, Tobi!)

Note that the /source tree contains only the sources of 11.1, because we moved them there recently (splitting them from the main file tree, just as with the debug packages). The intent is to keep this kind of content, which is infrequently used, away from normal mirrors, because it is obvious that we won't have many mirrors when our rsync module is a TB in size. I would think that one mirror carrying the /source tree per continent would be appropriate. Anything beyond that is a pure waste of space. What we need is *many* mirrors for the popular files.
I'd hate to set up an FTP mirror these days. Please give us an rsync module. In my opinion, adding the sources to the opensuse-full module would be best, but an additional opensuse-sources module would be just as fine.
I see no need to mirror 26G around the world for something that is hardly ever downloaded. Thus, I won't add it to the opensuse-full module. In addition, /source contains only one release so far, so it'll grow about fourfold during the next two years.

And in addition, source rpms are becoming less and less important, and less practical also. It is much more convenient to check out sources from the openSUSE build service, and do reproducible builds with that. (Anyone can do that; an account on build.opensuse.org is easy to get, and 12,000 people have done so. The sources are exactly the same, and they can be checked out in versioned form, worked on, and contributions can be submitted back to the Factory tree via the build service.) The time of source rpms is largely over, so to speak. Of course they are sometimes useful, but not often enough to warrant mirroring them around the world.

And an rsync module for /source should be public, I guess - there is no point in staging it, nor in restricting access to it to registered mirrors (except maybe to protect our bandwidth). So I would rather have the rsync module on rsync.opensuse.org than on stage.opensuse.org, since it would be more obvious to find there, but we don't have the tree there, and space is limited.

So, all in all, an opensuse-source module on stage.opensuse.org without restrictions seems to be the best thing to do for now - right? I would like to hear the opinions of all of you. In addition, /source contains /source/factory, which again is a candidate for exclusion, I guess. Thanks! Peter -- "WARNING: This bug is visible to non-employees. Please be respectful!" SUSE LINUX Products GmbH Research & Development
Hi Peter, On Tue, 10 Feb 2009, Peter Poeml wrote:
On Tue, Feb 10, 2009 at 04:19:58PM +0100, Bernd Leibing wrote:
afaik there is no rsync module for the opensuse sources on stage.opensuse.org
Yes, true. And the public rsync server doesn't have the tree at all.
Am I the only one who misses it so much?
There was one other person asking for it around Christmas, and I didn't get around to setting up an rsync module yet. Sorry to everyone who has been waiting, or who looked around and didn't find anything.
It was me.
I pointed someone to hosteurope.de, which mirrors the sources (rsync URL listed on http://mirrors.opensuse.org/list/11.1.html). The reason that hosteurope has the sources is that they use an old rsync module that I wanted to deprecate, but I'm in fact quite glad that these things are available in *some* place. Hosteurope also does us the favour of storing discontinued trees that we can't archive ourselves for space reasons. (Thanks, Tobi!)
I'd like to know that old module name too. The discontinued tree is even more important to me than the sources. Getting everything within one module would have the benefit that moves to discontinued would not cause retransmits; it can be done in two steps, first hardlinking, later deleting, and rsync with "-H" would understand it.
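Eberhard's two-step move can be sketched in Python as below. As he describes it, step 1 creates the new name as a hard link, so mirrors syncing the whole tree with `rsync -aH` see two names for one file they already hold and can link it locally rather than download it again; step 2, once the mirrors have caught up, removes the old name, and `--delete` propagates that. The directory names are hypothetical, and the saving assumes both paths are part of the same rsync transfer (the same module).

```python
import os
import tempfile


def move_to_discontinued(old_path: str, new_path: str, step: int) -> None:
    """Move a file in two steps so that `rsync -H` can avoid re-sending it.

    Step 1 adds new_path as a hard link to old_path (one inode, two names).
    Step 2, run only after the mirrors have synced step 1, drops old_path.
    """
    if step == 1:
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        os.link(old_path, new_path)
    elif step == 2:
        os.unlink(old_path)
    else:
        raise ValueError("step must be 1 or 2")


def demo() -> tuple:
    """Run both steps in a scratch directory and report what happened."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout, for illustration only.
        old = os.path.join(root, "distribution", "10.2", "example.iso")
        new = os.path.join(root, "discontinued", "distribution", "10.2", "example.iso")
        os.makedirs(os.path.dirname(old))
        with open(old, "wb") as fh:
            fh.write(b"iso payload")

        move_to_discontinued(old, new, step=1)
        same_inode = os.stat(old).st_ino == os.stat(new).st_ino

        move_to_discontinued(old, new, step=2)
        with open(new, "rb") as fh:
            content = fh.read()
        return same_inode, os.path.exists(old), content


print(demo())  # (True, False, b'iso payload')
```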
Note that the /source tree contains only the sources of 11.1, because we moved them there recently (splitting them from the main file tree, just as the debug packages). The intent is to keep this kind of content, which is infrequently used, away from normal mirrors, because it is obvious that we won't have many mirrors when our rsync module is a TB in size.
I would think that one mirror carrying the /source tree per continent would be appropriate. Anything beyond that is a pure waste of space.
What we need is *many* mirrors for the popular files.
I'd hate to set up an FTP mirror these days. Please give us an rsync module. In my opinion, adding the sources to the opensuse-full module would be best, but an additional opensuse-sources module would be just as fine.
I see no need to mirror 26G around the world for something that is hardly ever downloaded. Thus, I won't add it to the opensuse-full module.
In addition, /source contains only one release so far, so it'll grow about fourfold during the next two years.
And in addition, source rpms are becoming less and less important, and less practical also. It is much more convenient to check out sources from the openSUSE build service, and do reproducible builds with that. (Anyone can do that; an account on build.opensuse.org is easy to get, and 12,000 people have done so. The sources are exactly the same, and they can be checked out in versioned form, worked on, and contributions can be submitted back to the Factory tree via the build service.) The time of source rpms is largely over, so to speak. Of course they are sometimes useful, but not often enough to warrant mirroring them around the world.
And an rsync module for /source should be public, I guess - there is no point in staging it, nor in restricting access to it to registered mirrors (except maybe to protect our bandwidth). So I would rather have the rsync module on rsync.opensuse.org than on stage.opensuse.org, since it would be more obvious to find there, but we don't have the tree there, and space is limited.
So, all in all, an opensuse-source module on stage.opensuse.org without restrictions seems to be the best thing to do for now - right? I would like to hear the opinions of all of you.
In addition, /source contains /source/factory, which again is a candidate for exclusion, I guess.
If you give us separate module names for source, debug and discontinued - guess what we will do? "Naturally" we will place our copies below /pub/opensuse/, so Joe User who wants to use one of the "many mirrors" would need to exclude unwanted directories anyway. Better to give hints for the use of rsync's exclude option and some examples with sizes. I would not mind if you said that in rsync.motd... stage.opensuse.org should have it all, have it all in place, and give access to the topmost point. Everything else is an attempt to serve virtual fools that only creates one more real fool. ;-)) Best regards, Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org)
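Eberhard asks for exclude hints "with sizes". A Python sketch of what such a table could look like; the sizes are illustrative only (the 11.1 figures from the upload announcement and Peter's 26G for /source), not a measurement of the real top-level tree, and the layout of a single all-in-one module is an assumption.

```python
# Illustrative sizes in GB, borrowed from this thread: 11.1 binaries and
# debug packages, and Peter's figure for /source. A real top-level tree
# would hold more than this.
TREE_SIZES_GB = {
    "distribution": 31,
    "source": 26,
    "debug": 12,
}


def exclude_options(excluded) -> list:
    """One rsync --exclude per top-level directory.

    The leading "/" anchors the pattern at the root of the transfer and the
    trailing "/" restricts it to directories.
    """
    return [f"--exclude=/{name}/" for name in sorted(excluded)]


def mirrored_size(excluded) -> int:
    """Size a mirror would carry after applying the excludes."""
    return sum(size for name, size in TREE_SIZES_GB.items() if name not in excluded)


for excluded in (set(), {"source"}, {"source", "debug"}):
    print(exclude_options(excluded), mirrored_size(excluded), "GB")
```

With these numbers the three choices come to 69, 43 and 31 GB, which is the kind of table a module's motd could carry.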
Peter Poeml wrote:
Hi Bernd,
On Tue, Feb 10, 2009 at 04:19:58PM +0100, Bernd Leibing wrote:
afaik there is no rsync module for the opensuse sources on stage.opensuse.org
Yes, true. And the public rsync server doesn't have the tree at all.
Am I the only one who misses it so much?
There was one other person asking for it around Christmas, and I didn't get around to setting up an rsync module yet. Sorry to everyone who has been waiting, or who looked around and didn't find anything.
I pointed someone to hosteurope.de, which mirrors the sources (rsync URL listed on http://mirrors.opensuse.org/list/11.1.html). The reason that hosteurope has the sources is that they use an old rsync module that I wanted to deprecate, but I'm in fact quite glad that these things are available in *some* place. Hosteurope also does us the favour of storing discontinued trees that we can't archive ourselves for space reasons. (Thanks, Tobi!)
I noticed that myself, but they have no rsync module for the openSUSE content. Tobi, please create an openSUSE rsync module.
Note that the /source tree contains only the sources of 11.1, because we moved them there recently (splitting them from the main file tree, just as the debug packages). The intent is to keep this kind of content, which is infrequently used, away from normal mirrors, because it is obvious that we won't have many mirrors when our rsync module is a TB in size.
Agreed, the splitting was indeed needed.
I would think that one mirror carrying the /source tree per continent would be appropriate. Anything beyond that is a pure waste of space.
Most users don't care about the sources, but some do. And they should be accessible as easily and reliably as the binary packages. Open Source!
What we need is *many* mirrors for the popular files.
I'd hate to set up an FTP mirror these days. Please give us an rsync module. In my opinion, adding the sources to the opensuse-full module would be best, but an additional opensuse-sources module would be just as fine.
I see no need to mirror 26G around the world for something that is hardly ever downloaded. Thus, I won't add it to the opensuse-full module.
OK
In addition, /source contains only one release so far, so it'll grow about fourfold during the next two years.
Sources for 11.0 use 11GB of disk space; four times that amount would be OK for some mirrors.
And in addition, source rpms are becoming less and less important, and less practical also. It is much more convenient to check out sources from the openSUSE build service, and do reproducible builds with that. (Anyone can do that; an account on build.opensuse.org is easy to get, and 12,000 people have done so. The sources are exactly the same, and they can be checked out in versioned form, worked on, and contributions can be submitted back to the Factory tree via the build service.) The time of source rpms is largely over, so to speak. Of course they are sometimes useful, but not often enough to warrant mirroring them around the world.
Working with the sources may be easier (= better) with the openSUSE build service. But mirroring is also a matter of trust. If you have independent mirrors, a site being taken down is much less of a problem. Oh, by the way, I tried to get the 11.1 cups sources via http://software.opensuse.org/search and got the address: http://download.opensuse.org/repositories/openSUSE:/11.1/standard/src/cups-1... but I got a 404 error when I tried to download it ....
And an rsync module for /source should be public, I guess - there is no point in staging it, nor in restricting access to it to registered mirrors (except maybe to protect our bandwidth). So I would rather have the rsync module on rsync.opensuse.org than on stage.opensuse.org, since it would be more obvious to find there, but we don't have the tree there, and space is limited.
Too bad, rsync.opensuse.org is the most obvious place.
So, all in all, an opensuse-source module on stage.opensuse.org without restrictions seems to be the best thing to do for now - right? I would like to hear the opinions of all of you.
OK
In addition, /source contains /source/factory, which again is a candidate for exclusion, I guess.
Arrghh, opensuse-source should be without factory, indeed.
Thanks! Peter
Thanks, Bernd
Hi Eberhard & Bernd, and hi list, On Tue, Feb 10, 2009 at 10:05:51PM +0100, Eberhard Moenkeberg wrote:
I pointed someone to hosteurope.de, which mirrors the sources (rsync URL listed on http://mirrors.opensuse.org/list/11.1.html). The reason that hosteurope has the sources is that they use an old rsync module that I wanted to deprecate, but I'm in fact quite glad that these things are available in *some* place. Hosteurope also does us the favour of storing discontinued trees that we can't archive ourselves for space reasons. (Thanks, Tobi!)
I'd like to know that old module name too. The discontinued tree is even more important to me than the sources.
If you are after the discontinued stuff, the bad news is that we don't have disk space to keep it (we keep it *somewhere*, but we have no disk space to make it available), so there is actually no rsync module that contains anything of that kind.
If you give us separate module names for source, debug and discontinued - guess what we will do? "Naturally" we will place our copies below /pub/opensuse/, so Joe User who wants to use one of the "many mirrors" would need to exclude unwanted directories anyway.
Better to give hints for the use of rsync's exclude option and some examples with sizes. I would not mind if you said that in rsync.motd...
stage.opensuse.org should have it all, have it all in place, and give access to the topmost point. Everything else is an attempt to serve virtual fools that only creates one more real fool. ;-))
Yes, in a way this would make a lot of things easier. I appreciate your thoughts. I lack the time to really do something about this right now, but I'll keep it noted. On Wed, Feb 11, 2009 at 04:57:29PM +0100, Bernd Leibing wrote:
Sources for 11.0 use 11GB of disk space; four times that amount would be OK for some mirrors.
11.1 is 15.47 GB right now.
Working with the sources may be easier (= better) with the openSUSE build service. But mirroring is also a matter of trust. If you have independent mirrors, a site being taken down is much less of a problem.
That's all true. (By the way, we also plan to make the build service accessible without login, but I don't know when this will happen.) So, I have created opensuse-source as an rsync module at rsync://stage.opensuse.org/opensuse-source now. I can't create the same on the public server, rsync.opensuse.org, because of a lack of disk space on that host. But I made the module on stage.opensuse.org publicly readable without restrictions.
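For mirrors that want this public module but, as Bernd suggested, not /source/factory, the pull could look like the Python sketch below, which only builds the rsync argument list. The local destination path is hypothetical, and it assumes factory/ sits directly under the module root. Because `--delete` leaves excluded files on the receiving side alone unless `--delete-excluded` is also given, a factory/ directory mirrored earlier would stay in place.

```python
import shlex

# Module URL as announced in the thread; the local destination is hypothetical.
SOURCE_MODULE = "rsync://stage.opensuse.org/opensuse-source/"


def source_sync_argv(dest: str, skip_factory: bool = True) -> list:
    """argv for pulling the public opensuse-source module."""
    argv = ["rsync", "-aH", "--delete", "--delay-updates"]
    if skip_factory:
        # Assumes factory/ sits directly below the module root.
        argv.append("--exclude=/factory/")
    argv += [SOURCE_MODULE, dest]
    return argv


print(shlex.join(source_sync_argv("/srv/mirror/opensuse/source/")))
```

The list can be handed straight to `subprocess.run(argv, check=True)` from a cron job.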
Oh, by the way, I tried to get the 11.1 cups sources via http://software.opensuse.org/search
I got the address:
http://download.opensuse.org/repositories/openSUSE:/11.1/standard/src/cups-1...
but got error 404 when I tried to download it ....
That works now; those source packages are not really at that URL, and they need a rewrite at the Apache level to point them to the /source tree. I fixed that a few weeks ago. Thanks! Peter
Peter Poeml (poeml@suse.de) wrote on 30 March 2009 17:02:
So, I have created opensuse-source as an rsync module at rsync://stage.opensuse.org/opensuse-source now.
It's rather cumbersome to have to do separate syncs for parts of the same repository. Priority is not measured only by downloads, importance also counts, and sources are essential for free software. That's why it should be possible to mirror them together.
stage.opensuse.org should have it all, have it all in place, and give access to the topmost point.
Yes.
Yes, in a way this would make a lot of things easier. I appreciate your thoughts. I lack the time to really do something about this right now, but I'll keep it noted.
One possibility that seems easy enough is to include the sources in the larger modules, like full-with-factory, on stage. It looks like just removing some exclusions would do. Is that possible, Peter?
On Mon, Mar 30, 2009 at 02:45:22PM -0300, Carlos Carvalho wrote:
Peter Poeml (poeml@suse.de) wrote on 30 March 2009 17:02:
So, I have created opensuse-source as an rsync module at rsync://stage.opensuse.org/opensuse-source now.
It's rather cumbersome to have to do separate syncs for parts of the same repository.
Yes, I see that. I am not sure, though, whether it would be better to have only one rsync module for the entire tree, because you would still need to set up different syncs: some parts of the tree change frequently (updates) and others change nearly never (released products). It wouldn't make sense to sync the released products every four hours, and in addition, we would not be able to handle that with our resources. Or do you have a different opinion? A trigger-based sync mechanism might be a way around this. I have some things in mind, and know some ways other projects deal with this, but apart from ideas there are not many resources to work on it. Maybe you have ideas to contribute here, examples of how others manage to do better, and such. Please continue to let me/us know.
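Peter's point that even a single module would still need several sync schedules can be sketched as a per-tree policy. The four-hour interval and the update/released-product split come from his mail; the tree names and the "sync once" rule are hypothetical illustrations, not how stage.opensuse.org or any mirror actually schedules pulls.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical policy: trees that change often are pulled every four hours,
# a released product is pulled once and then left alone.
SYNC_INTERVALS: Dict[str, Optional[timedelta]] = {
    "update": timedelta(hours=4),
    "factory": timedelta(hours=4),
    "distribution/11.1": None,  # None means "sync once"
}


def due_trees(last_sync: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the trees whose next pull is due at `now`."""
    due = []
    for tree, interval in SYNC_INTERVALS.items():
        last = last_sync.get(tree)
        if last is None:
            due.append(tree)  # never synced yet
        elif interval is not None and now - last >= interval:
            due.append(tree)
    return due


now = datetime(2009, 3, 30, 20, 0)
print(due_trees({"update": now - timedelta(hours=5),
                 "factory": now - timedelta(hours=1),
                 "distribution/11.1": now - timedelta(days=90)}, now))  # ['update']
```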
Priority is not measured only by downloads, importance also counts, and sources are essential for free software. That's why it should be possible to mirror them together.
I might be missing the context you refer to, but what do you mean by "Priority" here? And "importance"? By the way, maybe you also followed the argument about the openSUSE build service for source access. It is the "modern", convenient and powerful way to access the sources, for openSUSE 11.1 onwards. We basically need source RPMs right now only for those people who don't know that yet, and because the build service doesn't allow anonymous access yet. People tend to be irritated when they don't find the source packages and think that something is wrong, even though it isn't. I checked today's download numbers, and the handful of people who have downloaded a source package today is negligible, so it doesn't make sense to push this stuff out to mirrors. Anyway, sources are a sensitive issue, and the tree is now available for everybody who's interested!
stage.opensuse.org should have it all, have it all in place, and give access to the topmost point.
Yes.
Yes, in a way this would make a lot of things easier. I appreciate your thoughts. I lack the time to really do something about this right now, but I'll keep it noted.
One possibility that seems easy enough is to include the sources in the larger modules, like full-with-factory, on stage. It looks like just removing some exclusions would do. Is that possible, Peter?
That's not possible without telling all mirrors that use that module that it suddenly contains additional stuff, and giving them the opportunity to exclude it. This is especially true since the module is already overly large. And it would not be easy to achieve, because I have no contact address for many of the mirrors.

Note, there is a LOT of stuff we can't just add to full-with-factory. The module would be 1.4 TB in size, and hardly any mirrors would be interested in that. We mirror too much already; what helps openSUSE *users* is *many* mirrors mirroring what's popular, not a few mirrors that have "everything". I believe that this is one reason why Ubuntu has more mirrors than openSUSE. It is simply easier to find mirrors for 30GB than for 300GB.

Please also note that an rsync over the full tree (even with -n) takes a LONG time, and for practical reasons we can't just make the full tree available in a single rsync module. It would be simple enough to do, but the stage server would not be able to cope with it (and still be useful for what we need it for); it's just totally impractical for us.

The direction that is most useful to develop, in my humble opinion, is a push sync, the way we already have it for the /repositories tree. I want to implement the same for /factory and /update, which would benefit most because these trees are short-lived, and sync lags could be avoided. Unneeded pull syncs (which cost resources too) can also be avoided that way. The push sync could start feeding the mirrors the changed content right after it changes on stage.opensuse.org. Of course, this makes matters even more complicated than what you have now (syncing from separate rsync modules). So for this to work, we need to make it easy for everybody and provide a working, ready solution. One way could be to implement it the same way we implemented the push for /repositories: by getting write access to the mirror.
That makes it easiest for the mirrors and actually saves them work. Another way could be to provide ssh triggers, together with mirroring scripts that are prepared to handle the triggers (and already know and use the modules).

Thanks!
Peter
--
Contact: admin@opensuse.org (a.k.a. ftpadmin@suse.com)
#opensuse-mirrors on freenode.net
Info: http://en.opensuse.org/Mirror_Infrastructure

SUSE LINUX Products GmbH
Research & Development
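The ssh-trigger idea means a mirror has to cope with triggers that arrive while it is still syncing, without starting overlapping rsync runs. Below is a minimal Python sketch. It assumes a long-running daemon on the mirror side. `CoalescingSyncRunner` and the `sync_fn` callback are hypothetical names, not part of any openSUSE tooling.

```python
import threading
from typing import Callable


class CoalescingSyncRunner:
    """Run a sync callback on trigger, never letting two syncs overlap.

    Triggers that arrive while a sync is running are coalesced into a single
    follow-up sync, so a burst of triggers costs at most one extra run.
    """

    def __init__(self, sync_fn: Callable[[], None]) -> None:
        self._sync_fn = sync_fn        # hypothetical: e.g. a wrapper that runs rsync
        self._lock = threading.Lock()  # guards the two flags below
        self._running = False          # a sync is currently in progress
        self._pending = False          # a trigger arrived during that sync

    def trigger(self) -> None:
        """Handle one incoming trigger; the sync runs in the calling thread."""
        with self._lock:
            if self._running:
                self._pending = True   # remember it; the active run loops once more
                return
            self._running = True
        while True:
            try:
                self._sync_fn()
            except BaseException:
                with self._lock:
                    self._running = False  # let the next trigger retry
                raise
            with self._lock:
                if not self._pending:
                    self._running = False  # nothing new arrived: done
                    return
                self._pending = False      # consume the coalesced trigger, sync again
```

Each incoming trigger calls `trigger()`. Triggers that arrive mid-sync collapse into one follow-up run, so the mirror catches up with everything published during the previous sync. If every ssh trigger instead starts a fresh process (for example through a forced command), the in-memory flags would have to be replaced by a file lock such as `fcntl.flock`.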
Peter Poeml (poeml@suse.de) wrote on 30 March 2009 20:49:
On Mon, Mar 30, 2009 at 02:45:22PM -0300, Carlos Carvalho wrote:
It's rather cumbersome to have to do separate syncs for parts of the same repository.
Yes, I see that. I am not sure, though, that a single rsync module for the entire tree would be better, because you would still need to set up different syncs: some parts of the tree change frequently (updates) and others change almost never (released products). It wouldn't make sense to sync the released products every four hours, and in addition, our resources could not cope with that.
I agree that putting them in full-with-factory is not the best idea. However, sources are no different from the rest: part doesn't change, part changes often, for example in factory. So update frequency is not a reason to separate them.

How about creating another module: full-with-factory-and-sources? This way you'll be sure that only those who *really* want them will bother you. Module contents and size are not a problem if there is choice and an explanation of the tree architecture. Choice allows mirrors to use a module that directly fits their interest; explanation allows them to use any module that has the contents they want and exclude what they don't want. Therefore there's no conflict between having many mirrors and much content; let the mirrors decide. And it's not rocket science; it's standard practice for most mirrors of all distributions, particularly for hardware architectures.

About update frequency, I usually sync a release only once, when it appears, and never again, because releases should NOT change. What do you mean by "nearly never"? Aren't ALL changes done in updates????? Anyway, if changes do happen they should be announced here. This separation between releases and the rest needs manual intervention only at release times and should be enough to avoid overloading stage.
A trigger-based sync mechanism might be a way around this. I have some things in mind, and know some ways other projects deal with this, but beyond ideas there are not many resources to work on it.
Perhaps the easiest and most effective way is a social one: mirror tiering. You choose the bigger, better connected and better managed mirrors spread around the world, and ask them to commit to being tier-1 mirrors for openSUSE. They'd need to carry at least full-with-factory-and-sources [oh!! :-)], plus factory ppc, and allow public access via rsync. Only these would have access to stage; the others would use the tier-1s, so you keep the crowd off your machine. Tiering would give you a solid distribution network without consuming your resources. This is what most distros do. Nothing would change in using MirrorBrain to monitor all mirrors and send clients to them. You could perhaps also count on the tier-1s to implement some of the technical methods below.

In the context of reducing rsync load on the master, triggering is only useful if it avoids full rsyncs. The only way to avoid them is to deal with the changes only. This can be done in several ways. One is what kernel.org does: emailing only the changes. We use it here; it keeps us very close to the master with negligible load. Another possibility, as you say, is to have write access, which is equivalent to having an account on the machine. We also do it here; SourceForge is a very big example. They're very good at keeping a tree 10 times bigger than a distro's in sync with minimal load. A third method is to use rsync in a better way. We don't do disk scanning here when we update; only the master is hammered :-) However, if you give me a list of the files on your site, such as the one created by find or

rsync localhost::a-[hidden]module-with-everything > filelist

then we'll do *no* disk scans at *either* end, and only pull the necessary files. We do this for another distro... Even better, if you give a list of checksums, we'll use it both for updating and for verifying that our repo is correct. We also do that with another distro.
The disadvantage of this method is that it needs a complicated script, so mirrors are unlikely to use it.
Hi Carlos & hi mirror list, On Tue, Mar 31, 2009 at 04:06:22PM -0300, Carlos Carvalho wrote:
Peter Poeml (poeml@suse.de) wrote on 30 March 2009 20:49: How about creating another module: full-with-factory-and-sources? This way you'll be sure that only those who *really* want them will bother you.
Yes, that makes a lot of sense. You are right, choice is good and it is easy to arrange. However, a module with "everything" in it is something I always tried to get rid of, to avoid load problems and save resources. You are a very considerate person and would not abuse the module, but it isn't obvious to everybody how to use it sensibly. In the past we sometimes had too many syncs on the complete "everything" module. That module contained both stuff that is huge but never changes, and other stuff that is even larger and changes constantly. Such a module can only be used sensibly when the mirror accesses only parts of it. You can use a module named opensuse-full-really-everything, which isn't publicly documented but is there; I can't remember exactly whether it has always existed or whether I created it at some point but never announced it. (I think the former.) I'm sure that problems can be avoided by proper documentation and good guidance, and I'm looking forward to implementing a proper sync framework that automates these issues (or at least helps), instead of letting everybody solve this on their own.
About update frequency, I usually sync a release only once, when it appears, and never again, because they should NOT change. What do you mean by "nearly never"? Aren't ALL changes done in updates????? Anyway, if changes do happen they should be announced here.
With "nearly never" I meant: yes, released products aren't supposed to change, and you should not need to resync later. However, I have seen cases where we needed to correct some problem, so there might actually have been a change. Yes, that should be announced. On the other hand, since we host the alphas, betas and final product in the same tree, I assume most mirrors have so far been syncing /distribution daily, so a change would propagate to them.
This separation between releases and the rest needs manual intervention only at release times and should be enough to avoid overloading stage.
The distribution tree gives us fewer headaches than the buildservice tree, which is much, much larger, but yes, you are right: the distribution tree could be split into a short-lived part for daily syncing and a frozen part that never needs to be synced. (An alternative to "never" could be "weekly", which would allow problems to be corrected, and cleanups to be done, without notice.)
A trigger-based sync mechanism might be a way around this. I have some things in mind, and know some ways other projects deal with this, but beyond ideas there are not many resources to work on it.
Perhaps the easiest and most effective way is a social one: mirror tiering. You choose the bigger, better connected and better managed mirrors spread around the world, and ask them to commit to being tier-1 mirrors for openSUSE. They'd need to carry at least full-with-factory-and-sources [oh!! :-)], plus factory ppc, and allow public access via rsync. Only these would have access to stage; the others would use the tier-1s, so you keep the crowd off your machine. Tiering would give you a solid distribution network without consuming your resources. This is what most distros do. Nothing would change in using MirrorBrain to monitor all mirrors and send clients to them. You could perhaps also count on the tier-1s to implement some of the technical methods below.
Yes. It's amazing that we have survived this long, bandwidth-wise, without tiering. One issue I see with tiering is a slight loss of control. Part of that doesn't matter much (knowing who syncs; discovering new mirrors that would otherwise go unnoticed), but some things could be a loss (the possibility of updating the mirror database based on rsync log parsing, which I wanted to explore a little further, but which is probably better done by some form of triggering or pushing anyway).
In the context of reducing rsync load on the master, triggering is only useful if it avoids full rsyncs. The only way to avoid them is to deal with the changes only. This can be done in several ways. One is what kernel.org does: emailing only the changes. We use it here; it keeps us very close to the master with negligible load.
Dealing with the changes only is the way to go IMO. That's what we do for the buildservice tree, and nothing else would scale there. And the problem exists in more places than just mirror syncing. For example, we have

buildservice -> stage.opensuse.org -> mirrors
buildservice -> stage.opensuse.org -> rsync.opensuse.org -> mirrors

We have the same problem with updates (/update):

buildsystem -> internal stage host -> stage.o.o -> mirrors
buildsystem -> internal stage host -> stage.o.o -> rsync.o.o -> mirrors

Same for factory, and so on. (Tier 1 could be inserted there ;)

At each of these stages, the sync needs to be timely (triggered pull or push), and must not overlap a sync at the stage before or after. At the same time, at each stage load can be wasted, or saved by avoiding unneeded syncs. At each stage, the order in which files are synced could matter (packages before repodata is better...). Once content reaches the end points (the mirrors), the mirror database should be updated simultaneously, or soon afterwards.

I don't know the details of how kernel.org detects changes (it may depend on their tree), but I have two things in mind for that:

1) a generic way to detect changes in a file tree based on the inotify mechanism. I'm thinking of a recursive directory watch which notices when changes occur and awaits the end of the changes (by waiting long enough without further changes happening), thereby detecting the moment when an incoming sync has finished. Then it could trigger the next-stage mirror, or start a push sync to it.

2) a more specialized mechanism for trees too large for inotify to scale (that's the case with the buildservice with its 70,000 directories). This mechanism is specific to the build service tree and its layout.
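The quiet-period rule in 1) can be separated from the watching itself. Below is a minimal sketch. It assumes the event timestamps (in seconds) come from a recursive inotify watcher, which is not shown. `settled_times` is a hypothetical helper. It returns the moments at which a burst of changes counts as finished, that is, when the next stage could be triggered.

```python
from typing import Iterable, List


def settled_times(event_times: Iterable[float], quiet: float) -> List[float]:
    """Return the moments at which each burst of changes is considered finished.

    A burst ends once `quiet` seconds pass with no further event; the returned
    time is last_event + quiet, i.e. when a watcher would fire the next stage.
    A gap of exactly `quiet` seconds counts as the end of a burst.
    """
    result: List[float] = []
    last = None
    for t in sorted(event_times):
        if last is not None and t - last >= quiet:
            result.append(last + quiet)  # gap long enough: previous burst settled
        last = t
    if last is not None:
        result.append(last + quiet)      # the final burst settles after its last event
    return result
```

The choice of `quiet` is the tradeoff. If it is too short, a slow incoming rsync looks finished halfway through. If it is too long, every stage adds that delay to the pipeline.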
I am not sure if 1) and 2) can be combined, but I'm sure that I would like to have something that is not a hacky local solution, but rather something reusable enough for other people to use, and, last but not least, for myself to easily install on other systems where an "organized" sync is needed.

Our publishing of the factory tree and update tree would greatly benefit from this. With 2) (and the buildservice tree), we now have very tight integration with the mirror database and redirection. If a new build shows up on download.opensuse.org, it's pushed out directly (well, free slots provided...); md5 and sha1 hashes are created at the same time, and once a project has been pushed to a mirror, the files are entered into the mirror database right away, so download.opensuse.org starts redirecting to the mirror immediately. The latter is really essential for us: we might not survive even an hour if we couldn't push content out in a timely way. And for /update, where we only have "random" pull syncs, we often see periods "without mirrors", which can keep download.o.o very busy (and that's slow for customers).
Another possibility, as you say, is to have write access, which is equivalent to having an account on the machine. We also do it here; SourceForge is a very big example. They're very good at keeping a tree 10 times bigger than a distro's in sync with minimal load.
SourceForge is also very good at selecting what to actually sync to mirrors. That's also a direction I want to explore (and talk to them about whether we can share something).
A third method is to use rsync in a better way. We don't do disk scanning here when we update; only the master is hammered :-) However, if you give me a list of the files on your site, such as the one created by find or
rsync localhost::a-[hidden]module-with-everything > filelist
then we'll do *no* disk scans at *either* end, and only pull the necessary files. We do this for another distro...
That's a nice idea that I haven't looked into so far. I need to familiarize myself with it a bit. So far, I tended to believe that it wouldn't work for us, at least not for the large trees. Collecting the filelist locally is a job that takes hours, which makes it a hopeless undertaking for a tree that changes every 30 seconds. Nice for trees that don't change, I guess... It sounds doable if the filelists are collected not from the filesystem, but e.g. from the buildservice (which already knows what it updates).
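Carlos's filelist method comes down to comparing two listings instead of two disks. Below is a minimal sketch. It assumes a hypothetical tab-separated `path<TAB>size<TAB>mtime` listing, not the exact output format of `find` or `rsync`. It also assumes the mirror keeps its own listing up to date from its previous runs, so neither side is scanned.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (size in bytes, mtime as Unix seconds)


def parse_filelist(text: str) -> Dict[str, Entry]:
    """Parse a hypothetical 'path<TAB>size<TAB>mtime' listing into a dict."""
    entries: Dict[str, Entry] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        path, size, mtime = line.split("\t")
        entries[path] = (int(size), int(mtime))
    return entries


def plan_sync(master: Dict[str, Entry],
              local: Dict[str, Entry]) -> Tuple[List[str], List[str]]:
    """Compare two listings without touching either disk.

    Returns (to_fetch, to_delete): paths that are new on the master or differ
    in size/mtime, and paths that exist locally but no longer on the master.
    """
    to_fetch = sorted(p for p, e in master.items() if local.get(p) != e)
    to_delete = sorted(p for p in local if p not in master)
    return to_fetch, to_delete
```

The `to_fetch` list could then be written to a file and passed to `rsync --files-from=FILE`, which transfers only the named paths. `to_delete` would be removed locally after the transfer has finished.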
Even better, if you give a list of checksums we'll use it both for updating and for verifying that our repo is correct. We also do it with another distro. The disadvantage of this method is that it needs a complicated script, so mirrors are unlikely to use it.
I have a local tree of checksums for most files, but unfortunately that tree has as many directories and as many files as the actual file tree. So collecting them is also much too slow for anything which looks at the complete tree. Maybe there are smarter ways to use them for something that I haven't thought about yet. Thank you very much for the suggestions and the time to write them up, this is much appreciated!

Peter
Hi, On Fri, 3 Apr 2009, Peter Poeml wrote:
On Tue, Mar 31, 2009 at 04:06:22PM -0300, Carlos Carvalho wrote:
Peter Poeml (poeml@suse.de) wrote on 30 March 2009 20:49:
How about creating another module: full-with-factory-and-sources? This way you'll be sure that only those who *really* want them will bother you.
Yes, that makes a lot of sense. You are right, choice is good and it is easy to arrange.
However, a module with "everything" in it is something I always tried to get rid of, to avoid load problems and save resources. You are a very considerate person and would not abuse the module, but it isn't obvious to everybody how to use it sensibly. In the past we sometimes had too many syncs on the complete "everything" module. That module contained both stuff that is huge but never changes, and other stuff that is even larger and changes constantly. Such a module can only be used sensibly when the mirror accesses only parts of it.
You can use a module named opensuse-full-really-everything, which isn't publicly documented but is there; I can't remember exactly whether it has always existed or whether I created it at some point but never announced it. (I think the former.)
Very good (long appreciated), but:

receiving file list ...
rsync: The server is configured to refuse --hard-links (-H)
rsync error: requested action not supported (code 4) at clientserver.c(839) [sender=3.0.5]

You should allow -H, especially at this "top level".
I'm sure that problems can be avoided by proper documentation and good guidance, and I'm looking forward to implementing a proper sync framework that automates these issues (or at least helps), instead of letting everybody solve this on their own.
About update frequency, I usually sync a release only once, when it appears, and never again, because they should NOT change. What do you mean by "nearly never"? Aren't ALL changes done in updates????? Anyway, if changes do happen they should be announced here.
With "nearly never" I meant: yes, released products aren't supposed to change, and you should not need to resync later. However, I have seen cases where we needed to correct some problem, so there might actually have been a change. Yes, that should be announced.
On the other hand, since we host the alphas, betas and final product in the same tree, I assume most mirrors have so far been syncing /distribution daily, so a change would propagate to them.
This separation between releases and the rest needs manual intervention only at release times and should be enough to avoid overloading stage.
The distribution tree gives us fewer headaches than the buildservice tree, which is much, much larger, but yes, you are right: the distribution tree could be split into a short-lived part for daily syncing and a frozen part that never needs to be synced. (An alternative to "never" could be "weekly", which would allow problems to be corrected, and cleanups to be done, without notice.)
A trigger-based sync mechanism might be a way around this. I have some things in mind, and know some ways other projects deal with this, but beyond ideas there are not many resources to work on it.
Perhaps the easiest and most effective way is a social one: mirror tiering. You choose the bigger, better connected and better managed mirrors spread around the world, and ask them to commit to being tier-1 mirrors for openSUSE. They'd need to carry at least full-with-factory-and-sources [oh!! :-)], plus factory ppc, and allow public access via rsync. Only these would have access to stage; the others would use the tier-1s, so you keep the crowd off your machine. Tiering would give you a solid distribution network without consuming your resources. This is what most distros do. Nothing would change in using MirrorBrain to monitor all mirrors and send clients to them. You could perhaps also count on the tier-1s to implement some of the technical methods below.
Yes. It's amazing that we have survived this long, bandwidth-wise, without tiering.
No wonder. From the beginning, SUSE could rely on the German university network infrastructure (DFN). It simply matched the needs of the university data centers...
One issue I see with tiering is a slight loss of control. Part of that doesn't matter much (knowing who syncs; discovering new mirrors that would otherwise go unnoticed), but some things could be a loss (the possibility of updating the mirror database based on rsync log parsing, which I wanted to explore a little further, but which is probably better done by some form of triggering or pushing anyway).
You already have this tiering, by tradition. The German DFN network always was (and still is) the "technological leader" in bandwidth, so it has always been the best choice to use a tier-1 mirror within the DFN.
In the context of reducing rsync load on the master, triggering is only useful if it avoids full rsyncs. The only way to avoid them is to deal with the changes only. This can be done in several ways. One is what kernel.org does: emailing only the changes. We use it here; it keeps us very close to the master with negligible load.
Dealing with the changes only is the way to go IMO.
YES, YES, YES! Peter, please extend the push service soon. /factory and /update would gain the most, but essentially every rsync target can be sped up with a push service.
That's what we do for the buildservice tree, and nothing else would scale there. And the problem exists in more places than just mirror syncing. For example, we have

buildservice -> stage.opensuse.org -> mirrors
buildservice -> stage.opensuse.org -> rsync.opensuse.org -> mirrors

We have the same problem with updates (/update):

buildsystem -> internal stage host -> stage.o.o -> mirrors
buildsystem -> internal stage host -> stage.o.o -> rsync.o.o -> mirrors
Same for factory, and so on.
(tier 1 could be inserted there ;)
At each of these stages, the sync needs to be timely (triggered pull or push), and must not overlap a sync at the stage before or after. At the same time, at each stage load can be wasted, or saved by avoiding unneeded syncs. At each stage, the order in which files are synced could matter (packages before repodata is better...).
A good point - Peter, you are the only one who could help to make this better...
Once content reaches the end points (the mirrors), the mirror database should be updated simultaneously, or soon afterwards.
With a full push service, no problem.
I don't know the details of how kernel.org detects changes (it may depend on their tree), but I have two things in mind for that:
1) a generic way to detect changes in a file tree based on the inotify mechanism. I'm thinking of a recursive directory watch which notices when changes occur and awaits the end of the changes (by waiting long enough without further changes happening), thereby detecting the moment when an incoming sync has finished. Then it could trigger the next-stage mirror, or start a push sync to it.
Can't you watch the incoming syncs by inspecting the logs?
2) a more specialized mechanism for trees too large for inotify to scale (that's the case with the buildservice with its 70,000 directories). This mechanism is specific to the build service tree and its layout.
I'm sure you already _do_ have a good mechanism for the buildservice tree.
I am not sure if 1) and 2) can be combined, but I'm sure that I would like to have something that is not a hacky local solution, but rather something reusable enough for other people to use, and, last but not least, for myself to easily install on other systems where an "organized" sync is needed.
Our publishing of the factory tree and update tree would greatly benefit from this. With 2) (and the buildservice tree), we now have very tight integration with the mirror database and redirection. If a new build shows up on download.opensuse.org, it's pushed out directly (well, free slots provided...); md5 and sha1 hashes are created at the same time, and once a project has been pushed to a mirror, the files are entered into the mirror database right away, so download.opensuse.org starts redirecting to the mirror immediately. The latter is really essential for us: we might not survive even an hour if we couldn't push content out in a timely way. And for /update, where we only have "random" pull syncs, we often see periods "without mirrors", which can keep download.o.o very busy (and that's slow for customers).
Yes, the buildservice pushes are working as expected! Please try to extend this mechanism to _everything_!
Another possibility, as you say, is to have write access, which is equivalent to having an account on the machine. We also do it here; SourceForge is a very big example. They're very good at keeping a tree 10 times bigger than a distro's in sync with minimal load.
SourceForge is also very good at selecting what to actually sync to mirrors. That's also a direction I want to explore (and talk to them about whether we can share something).
Good. But for the biggest tree you already have your solution.
A third method is to use rsync in a better way. We don't do disk scanning here when we update; only the master is hammered :-) However, if you give me a list of the files on your site, such as the one created by find or
rsync localhost::a-[hidden]module-with-everything > filelist
then we'll do *no* disk scans at *either* end, and only pull the necessary files. We do this for another distro...
That's a nice idea that I haven't looked into so far. I need to familiarize myself with it a bit. So far, I tended to believe that it wouldn't work for us, at least not for the large trees. Collecting the filelist locally is a job that takes hours, which makes it a hopeless undertaking for a tree that changes every 30 seconds. Nice for trees that don't change, I guess...
A full directory scan ("ls -lR") on ftp5.gwdg.de (7.7 TB, 11.5 million files) takes 38 minutes. Can you add some more RAM to your machine? ftp5 has 32 GB, and the only bottleneck is the network. The filespace is LVM'ed over a single Transtec/Infortrend RAID array with two controllers (each serving a RAID6 set of 11 disks). Filesystem type: xfs.
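Here is a back-of-the-envelope reading of the ftp5.gwdg.de figures, taking 1 TB as 10^12 bytes. It gives the scan rate and the average file size:

```latex
\[
\frac{11.5\times10^{6}\ \text{files}}{38 \times 60\ \text{s}}
  = \frac{11.5\times10^{6}\ \text{files}}{2280\ \text{s}}
  \approx 5.0\times10^{3}\ \text{files/s},
\qquad
\frac{7.7\times10^{12}\ \text{B}}{11.5\times10^{6}\ \text{files}}
  \approx 6.7\times10^{5}\ \text{B/file}
\]
```

So even at roughly 5,000 files per second, a full scan of a tree that size takes tens of minutes. That is still far longer than the 30-second change interval Peter mentions.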
Sounds doable if the filelists are collected not from the filesystem, but e.g. from the buildservice (which already knows what it updates).
Surely the best way.
Even better, if you give a list of checksums we'll use it both for updating and for verifying that our repo is correct. We also do it with another distro. The disadvantage of this method is that it needs a complicated script, so mirrors are unlikely to use it.
I have a local tree of checksums for most files, but unfortunately that tree has as many directories and as many files as the actual file tree. So collecting them is also much too slow for anything which looks at the complete tree. Maybe there are smarter ways to use them for something that I haven't thought about yet.
In my experience, xfs is about twice as fast as ext3 for "ls -lR" on big directories.

Best regards,
Eberhard Mönkeberg (emoenke@gwdg.de, em@kki.org)

--
Eberhard Mönkeberg
IT Infrastructure Working Group
E-Mail: emoenke@gwdg.de
Tel.: +49 (0)551 201-1551
-------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Fassberg 11, 37077 Göttingen
URL: http://www.gwdg.de
E-Mail: gwdg@gwdg.de
Tel.: +49 (0)551 201-1510
Fax: +49 (0)551 201-2150
Managing Director: Prof. Dr. Bernhard Neumair
Chairman of the Supervisory Board: Dipl.-Kfm. Markus Hoppe
Registered office: Göttingen
Registration court: Göttingen
Commercial register no.: B 598
-------------------------------------------------------------------------
Eberhard Moenkeberg (emoenke@gwdg.de) wrote on 3 April 2009 22:49:
On Fri, 3 Apr 2009, Peter Poeml wrote:
You can use a module named opensuse-full-really-everything, which isn't publicly documented but is there; I can't remember exactly whether it has always existed or whether I created it at some point but never announced it. (I think the former.)
Very good (long appreciated), but:
receiving file list ...
rsync: The server is configured to refuse --hard-links (-H)
rsync error: requested action not supported (code 4) at clientserver.c(839) [sender=3.0.5]
You should allow -H, especially at this "top level".
Certainly. It shows that it was indeed not done by Peter, because he obviously knows much better :-) I'm waiting for this to be fixed so that I can pull the sources without having to do a separate sync.
Perhaps the easiest and most effective way is a social one: mirror tiering. You choose the bigger, better connected and better managed mirrors spread around the world, and ask them to commit to being tier-1 mirrors for openSUSE. They'd need to carry at least full-with-factory-and-sources [oh!! :-)], plus factory ppc, and allow public access via rsync. Only these would have access to stage; the others would use the tier-1s, so you keep the crowd off your machine. Tiering would give you a solid distribution network without consuming your resources. This is what most distros do. Nothing would change in using MirrorBrain to monitor all mirrors and send clients to them. You could perhaps also count on the tier-1s to implement some of the technical methods below.
Yes. It's amazing that we have survived this long, bandwidth-wise, without tiering.
No wonder. From the beginning, SUSE could rely on the German university network infrastructure (DFN). It simply matched the needs of the university data centers...
One issue I see with tiering is a slight loss of control. Part of that doesn't matter much (knowing who syncs; discovering new mirrors that would otherwise go unnoticed), but some things could be a loss (the possibility of updating the mirror database based on rsync log parsing, which I wanted to explore a little further, but which is probably better done by some form of triggering or pushing anyway).
I think this is unrelated. You can continue to request rsync access for the scanner from all mirrors, as you do now.
You already have this tiering, by tradition. The German DFN network always was (and still is) the "technological leader" in bandwidth, so it has always been the best choice to use a tier-1 mirror within the DFN.
Sure. However, this isn't enough; the idea is that only tier-1 mirrors sync from stage, to reduce its load, so more tier-1 mirrors, geographically spread, are necessary. It seems that the one(s) in the DFN are enough for Europe, but it'd be convenient to have others on the other side of the Atlantic and in Asia and Australia.
At each of these stages, the sync needs to be timely (triggered pull or push), and must not overlap a sync at the stage before or after. At the same time, at each stage load can be wasted, or saved by avoiding unneeded syncs. At each stage, the order in which files are synced could matter (packages before repodata is better...).
A good point - Peter, you are the only one who could help to make this better...
This is feasible with mirrors that have a good connection to stage and are well managed, that is, tier-1s. If you want to be very fancy, you'll have to manage your own account on the mirrors, like SourceForge does. We already do some of the things above. About packages before repodata, I think this is largely unnecessary with --delay-updates and --delete-delay or --delete-after (depending on the mirror's rsync version).
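For illustration, the flags Carlos mentions can be assembled into a command line that picks the deletion flag by rsync version. The sketch also includes the -H that Eberhard asked the server to allow. It is a minimal Python sketch with two assumptions: the helper name `rsync_argv` is hypothetical, and the version cut-off (--delete-delay first appeared in rsync 3.0.0) should be checked against the local rsync man page.

```python
from typing import List, Tuple


def rsync_argv(source: str, dest: str, rsync_version: Tuple[int, int]) -> List[str]:
    """Build a mirror rsync command line that narrows the inconsistency window.

    --delay-updates keeps each updated file in a holding directory and renames
    all of them into place in quick succession at the end of the transfer, so
    new repodata and new packages appear at almost the same moment.
    """
    argv = ["rsync", "-a", "-H", "--delay-updates"]  # -H preserves hard links
    # --delete-delay (rsync >= 3.0) computes deletions during the transfer and
    # performs them afterwards; older rsync only offers --delete-after.
    argv.append("--delete-delay" if rsync_version >= (3, 0) else "--delete-after")
    argv += [source, dest]
    return argv
```

On a mirror, this could be run as `subprocess.run(rsync_argv(src, dst, (3, 0)), check=True)`. The source URL would be whichever module the mirror is allowed to use. Note that the renames at the end are quick but not atomic, which is why "largely unnecessary" is the right hedge.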
I don't know the details of how kernel.org detects changes (it may depend on their tree), but I have two things in mind for that:
1) a generic way to detect changes in a file tree based on the inotify mechanism. I'm thinking of a recursive directory watch which notices when changes occur and awaits the end of the changes (by waiting long enough without further changes happening), thereby detecting the moment when an incoming sync has finished. Then it could trigger the next-stage mirror, or start a push sync to it.
Can't you watch the incoming syncs by inspecting the logs?
2) a more specialized mechanism for trees too large for inotify to scale (that's the case with the buildservice with its 70,000 directories). This mechanism is specific to the build service tree and its layout.
The master tree should only be a repository for the tier-1 mirrors. When you change it, you should know what has changed without having to scan it all or use inotify.
I'm sure you already _do_ have a good mechanism for the buildservice tree.
I am not sure if 1) and 2) can be combined, but I'm sure that I would like to have something that is not a hacky local solution, but rather something reusable enough for other people to use
I disagree. This is highly specific to the process used to change the master repository. Each software distribution has its own.
Another possibility, as you say, is to have write access, which is equivalent to having an account on the machine. We also do it here; SourceForge is a very big example. They're very good at keeping a tree 10 times bigger than a distro's in sync with minimal load.
SourceForge is also very good at selecting what to actually sync to mirrors. That's also a direction I want to explore (and talk to them about whether we can share something).
I don't understand. You have to send what's in the master tree, period. I suppose you already know how to build the master tree...
A third method is to use rsync in a better way. We don't do disk scanning here when we update; only the master is hammered :-) However, if you give me a list of the files on your site, such as the one created by find or
rsync localhost::a-[hidden]module-with-everything > filelist
then we'll do *no* disk scans at *either* end, and only pull the necessary files. We do this for another distro...
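The filelist approach can be sketched as follows. The master's published listing is compared with the mirror's own listing, which is produced the same way or kept from the previous run. From the two listings alone, the sketch derives what to fetch and what to delete, with no disk scan on either side. It assumes the usual five-column output of rsync's list mode (permissions, size, date, time, name). The helper names `parse_listing` and `plan_sync` are hypothetical, and file names with leading spaces are not handled.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (size in bytes, "YYYY/MM/DD hh:mm:ss")


def parse_listing(text: str) -> Dict[str, Entry]:
    """Parse rsync list-only output into {path: (size, mtime)}.

    Assumes five columns: permissions, size, date, time, name.
    Only regular files are kept; directories and symlinks are skipped.
    """
    files: Dict[str, Entry] = {}
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) != 5 or not parts[0].startswith("-"):
            continue
        _perms, size, date, hms, name = parts
        # Some rsync versions print sizes with thousands separators.
        files[name] = (int(size.replace(",", "")), f"{date} {hms}")
    return files


def plan_sync(master: Dict[str, Entry],
              local: Dict[str, Entry]) -> Tuple[List[str], List[str]]:
    """Return (paths to fetch, paths to delete) from the two listings alone."""
    fetch = sorted(p for p, entry in master.items() if local.get(p) != entry)
    delete = sorted(p for p in local if p not in master)
    return fetch, delete
```

The resulting fetch list could then be handed to rsync through its `--files-from` option, so that only those files are transferred.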
That's a nice idea that I haven't looked into so far. I need to familiarize myself with it a bit. So far, I tended to believe that it wouldn't work for us, at least not for the large trees. Collecting the filelist locally is a job that takes hours, which makes it a hopeless undertaking for a tree that changes every 30 seconds. Nice for trees that don't change, I guess...
As I said above, you shouldn't need to scan the whole tree; you should know what has changed from the internal update process, and patch the filelist. And you don't need to patch it every 30 seconds; you may decide that changes will only be visible to mirrors in larger intervals, like 5 or 15 minutes.
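Here is one way the filelist could be patched from the update process instead of rescanning the tree. This is a minimal sketch that assumes the update process reports each added, changed or removed file. The `FileList` class, its `size mtime path` output format and the interval used in the test are hypothetical. The list is rewritten at most once per publication interval. It is replaced atomically, so mirrors never fetch a half-written file.

```python
import os
import tempfile
import time
from typing import Callable, Dict, Optional, Tuple


class FileList:
    """File list kept up to date by the update process, not by disk scans."""

    def __init__(self, publish_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.entries: Dict[str, Tuple[int, int]] = {}  # path -> (size, mtime)
        self.publish_interval = publish_interval
        self.clock = clock
        self.dirty = False
        self.last_publish = float("-inf")

    def update(self, path: str, size: int, mtime: int) -> None:
        # Called by the update process for every added or changed file.
        self.entries[path] = (size, mtime)
        self.dirty = True

    def remove(self, path: str) -> None:
        # Called by the update process for every deleted file.
        if self.entries.pop(path, None) is not None:
            self.dirty = True

    def maybe_publish(self) -> Optional[str]:
        """Return the new list text if it changed and the interval has passed."""
        now = self.clock()
        if not self.dirty or now - self.last_publish < self.publish_interval:
            return None
        self.dirty = False
        self.last_publish = now
        return "".join(f"{size} {mtime} {path}\n"
                       for path, (size, mtime) in sorted(self.entries.items()))


def write_atomically(target: str, text: str) -> None:
    """Write to a temporary file in the same directory, then rename over target."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(target)))
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)
        raise
```

With a `publish_interval` of 300 or 900 seconds, changes become visible to mirrors every 5 or 15 minutes, however often the tree itself changes.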
A full directory scan ("ls -lR") at ftp5.gwdg.de (7.7 TB, 11.5 million files) takes 38 minutes. Can you add some more RAM to your machine? ftp5 has 32 GB, and the only bottleneck is the network. The filespace is LVM'ed over a single Transtec/Infortrend RAID array with two controllers (each serving a RAID6 set of 11 disks). Filesystem type: xfs.
Timing ls -lR does measure how long it takes to scan the disk, so I agree with you. However, it's worth noting that its format is quite inadequate for mirroring. With the right options ls can be improved, but it's always cumbersome; rsync or find are always better. ls -lR was used more than a decade ago...
Sounds doable if the filelists are collected not from the filesystem, but e.g. from the buildservice (which already knows what it updates).
Surely the best way.
Exactly.
Even better, if you give us a list of checksums, we'll use it both for updating and for verifying that our repo is correct. We also do this with another distro. The disadvantage of this method is that it needs a complicated script, so mirrors are unlikely to use it.
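Verifying a mirror against a single checksum list can be sketched like this. The sketch assumes SHA-256 digests in the `sha256sum` output format (`<digest>  <path>`, with `*` marking binary mode). The thread does not say which algorithm or file format would be used, and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List


def file_sha256(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large ISOs don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse sha256sum-style lines: '<digest>  <path>' or '<digest> *<path>'."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        sums[rest[1:]] = digest.lower()  # rest[0] is the ' ' or '*' flag
    return sums


def verify_tree(root: str, sums: Dict[str, str]) -> List[str]:
    """Return listed paths that are missing locally or whose checksum differs."""
    bad = []
    for rel, digest in sorted(sums.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != digest:
            bad.append(rel)
    return bad
```

`verify_tree` reads every listed file in full, so it suits a periodic consistency check rather than every update. For updating, comparing the new list with the previous copy already shows which files changed.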
I have a local tree of checksums for most files, but unfortunately that tree has as many directories and as many files as the actual file tree.
This is nearly useless. Put them all in a single file.
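Turning a per-file checksum tree into one file takes a single walk over the tree. This sketch assumes a hypothetical layout in which `<root>/<path>` holds the digest of the data file `<path>` as its first whitespace-separated token. The real layout of the checksum tree is not described in the thread. The output uses `sha256sum`-style `<digest>  <path>` lines, sorted by path.

```python
import os
from typing import Iterator, Tuple


def walk_checksum_tree(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (relative path, digest) for every file under a checksum tree.

    Hypothetical layout: <root>/<path> holds the digest of the data file
    <path> as the first whitespace-separated token of its content.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, encoding="ascii") as f:
                tokens = f.read().split()
            if tokens:
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                yield rel, tokens[0]


def build_manifest(root: str) -> str:
    """Concatenate the whole checksum tree into one text, sorted by path."""
    entries = sorted(walk_checksum_tree(root))  # tuples sort by path first
    return "".join(f"{digest}  {rel}\n" for rel, digest in entries)
```

The walk still touches as many files as the tree holds, but only once on the master. Mirrors then fetch a single file instead of crawling tens of thousands of directories.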
Maybe there are smarter ways
The smarter way is to put them in a single file, like other distros do.
to use them for something that I haven't thought about yet.
The master has no use for them; it's just a nice service you offer to your dear mirrors :-) I'd certainly appreciate having them. In return, [some of] your dear mirrors wouldn't scan your whole tree at every update :-)
Hi! On Mon, Mar 30, 2009 at 05:02:24PM +0200, Peter Poeml wrote:
So, I have created opensuse-source as rsync module at rsync://stage.opensuse.org/opensuse-source now.
I can't create the same on the public server, rsync.opensuse.org, because of lack of disk space on that host. But I made the module on stage.opensuse.org publicly readable without restrictions.
Likewise, I created stage.opensuse.org::opensuse-debug today. We have no space for these packages on our public rsync server, but this module on stage.opensuse.org is publicly available. The size is 23G. I have excluded the *-test update repositories, as well as the 10.3 and 10.2 debug packages of released updates. Altogether it would be 70G; if someone wishes to mirror that, we can of course add it. Peter -- "WARNING: This bug is visible to non-employees. Please be respectful!" SUSE LINUX Products GmbH Research & Development
Hi, On Wed, 13 May 2009, Peter Poeml wrote:
On Mon, Mar 30, 2009 at 05:02:24PM +0200, Peter Poeml wrote:
So, I have created opensuse-source as rsync module at rsync://stage.opensuse.org/opensuse-source now.
I can't create the same on the public server, rsync.opensuse.org, because of lack of disk space on that host. But I made the module on stage.opensuse.org publicly readable without restrictions.
Likewise, I created stage.opensuse.org::opensuse-debug today.
We have no space for these packages on our public rsync server, but this module on stage.opensuse.org is publicly available.
The size is 23G. I have excluded the *-test update repositories, as well as the 10.3 and 10.2 debug packages of released updates. Altogether it would be 70G; if someone wishes to mirror that, we can of course add it.
ftp5.gwdg.de will now carry the directories /pub/opensuse/debug/ and /pub/opensuse/source/, so please exclude these from your rsync runs if you are using ftp5.gwdg.de (--exclude /debug/ --exclude /source/ if you are mirroring rsync://ftp5.gwdg.de/pub/opensuse/). Best regards, Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org) -- Eberhard Moenkeberg IT Infrastructure Working Group E-Mail: emoenke@gwdg.de Tel.: +49 (0)551 201-1551 ------------------------------------------------------------------------- Gesellschaft fuer wissenschaftliche Datenverarbeitung mbH Goettingen (GWDG) Am Fassberg 11, 37077 Goettingen URL: http://www.gwdg.de E-Mail: gwdg@gwdg.de Tel.: +49 (0)551 201-1510 Fax: +49 (0)551 201-2150 Managing Director: Prof. Dr. Bernhard Neumair Chairman of the Supervisory Board: Dipl.-Kfm. Markus Hoppe Registered office: Goettingen Registration court: Goettingen Commercial register no. B 598 -------------------------------------------------------------------------
Participants (7):
- Bernd Leibing
- carlos@fisica.ufpr.br
- Eberhard Moenkeberg
- Jaroslaw Zachwieja
- Peter Czanik
- Peter Poeml
- Stephan Kulow