Peter Poeml (poeml@suse.de) wrote on 30 March 2009 20:49:
On Mon, Mar 30, 2009 at 02:45:22PM -0300, Carlos Carvalho wrote:
It's rather cumbersome to have to do separate syncs for parts of the same repository.
Yes, I see that. I am not sure, though, that a single rsync module for the entire tree would be better: you would still need to set up separate syncs, because some parts of the tree change frequently (updates) while others almost never change (released products). It wouldn't make sense to sync the released products every four hours, and on top of that, we could not cope with the load given our resources.
I agree that putting them in full-with-factory is not the best idea. However, sources are no different from the rest: part of them doesn't change and part changes often, for example in factory. So update frequency is not a reason to separate them. How about creating another module, full-with-factory-and-sources? That way you can be sure that only those who *really* want them will bother you.
Module contents and size are not a problem as long as there is choice and the tree layout is explained. Choice lets mirrors use a module that directly fits their interest; an explanation lets them use any module that has the contents they want and exclude what they don't want. So there's no conflict between having many mirrors and having a lot of content: let the mirrors decide. And it's not rocket science; it's standard practice for most mirrors of all distributions, particularly for hardware architectures.
About update frequency: I usually sync a release only once, when it appears, and never again, because releases should NOT change. What do you mean by "nearly never"? Aren't ALL changes done in updates? Anyway, if changes do happen, they should be announced here. Separating releases from the rest needs manual intervention only at release time and should be enough to avoid overloading stage.
A trigger-based sync mechanism might be a way around this. I have some things in mind, and I know of some ways other projects deal with this, but beyond ideas there are not many resources to work on it.
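A trigger can be as light as a small stamp file that the master rewrites whenever the tree changes; a mirror fetches only that file and starts a full sync when its content differs from what it saw last time. The sketch below assumes such a stamp file exists (its name and format are hypothetical, not something stage is known to provide) and shows only the mirror-side decision; fetching the stamp itself is left out.

```python
import os
import tempfile
from pathlib import Path


def should_sync(remote_stamp: str, state_file: Path) -> bool:
    """Return True when the master's stamp (hypothetical trigger file content)
    differs from the stamp recorded after this mirror's last completed sync."""
    try:
        last_seen = state_file.read_text().strip()
    except FileNotFoundError:
        return True  # no record yet: this mirror has never completed a sync
    return remote_stamp.strip() != last_seen


def record_sync(remote_stamp: str, state_file: Path) -> None:
    """Remember the stamp, to be called only after a sync finished successfully."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(remote_stamp.strip() + "\n")
    os.replace(tmp, state_file)  # rename over the old file so it is never half-written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        state = Path(d) / "last-sync"  # hypothetical local state file
        print(should_sync("2009-03-30 20:49", state))  # True: never synced
        record_sync("2009-03-30 20:49", state)
        print(should_sync("2009-03-30 20:49", state))  # False: nothing new
```

On its own this only saves the polls that find nothing new; when the stamp has changed, the mirror still runs its usual full rsync.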
Perhaps the easiest and most effective way is a social one: mirror tiering. You choose the bigger, better connected and better managed mirrors spread around the world, and ask them to commit to being tier-1 mirrors for openSUSE. They'd need to carry at least full-with-factory-and-sources [oh!! :-)], plus factory ppc, and allow public access via rsync. Only these would have access to stage; the others would sync from the tier-1s, so that you keep the crowd off your machine. Tiering would give you a solid distribution network without consuming your resources. This is what most distros do. Nothing would change in using MirrorBrain to monitor all mirrors and send clients to them. You could perhaps also count on the tier-1s to implement some of the technical methods below.
In the context of reducing rsync load on the master, triggering is only useful if it avoids full rsyncs, and the only way to avoid them is to deal with the changes alone. This can be done in several ways. One is what kernel.org does: emailing only the changes. We use it here; it keeps us very close to the master with negligible load. Another possibility, as you say, is to have write access, which is equivalent to having an account on the machine. We also do this here; SourceForge is a very big example. They're very good at keeping a tree 10 times bigger than a distro's in sync with minimal load.
A third method is to use rsync in a better way. We don't do disk scanning here when we update; only the master is hammered :-) However, if you give me a list of the files on your site, such as the one created by find or by rsync localhost::a-[hidden]module-with-everything > filelist, then we'll do *no* disk scans at *either* end and only pull the necessary files. We do this for another distro... Even better, if you give a list of checksums, we'll use it both for updating and for verifying that our repo is correct. We also do that with another distro.
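A minimal Python sketch of the file-list method, under stated assumptions: the master publishes a listing in the format rsync prints when listing a module (permissions, size, date, time, path), and the mirror keeps the listing from its last successful sync, so neither side scans a disk to decide what to transfer. The differences become a fetch list suitable for rsync's `--files-from` option plus a delete list. The checksum part assumes a list in `sha256sum` output format; SHA-256 is an assumption, since no algorithm was named. All function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_listing(text: str) -> dict[str, tuple[int, str]]:
    """Map each regular file in an rsync-style listing to (size, "date time").

    Assumed line shape: '-rw-r--r--      1,234 2009/03/30 20:49:00 update/a.rpm'.
    Directories, symlinks and other non-regular entries are skipped.
    """
    entries: dict[str, tuple[int, str]] = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5 or not fields[0].startswith("-"):
            continue
        _perms, size, date, time, path = fields
        entries[path] = (int(size.replace(",", "")), f"{date} {time}")
    return entries


def plan_transfer(master: dict, mirror: dict) -> tuple[list[str], list[str]]:
    """Compare two listings without touching either disk.

    master: the listing published by the master.
    mirror: the master listing this mirror saved after its last successful sync.
    Returns (paths to fetch, paths to delete).
    """
    to_fetch = sorted(p for p, meta in master.items() if mirror.get(p) != meta)
    to_delete = sorted(p for p in mirror if p not in master)
    return to_fetch, to_delete


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(checksum_text: str, root: Path) -> list[str]:
    """Return paths from a sha256sum-style list that are missing or differ under root."""
    bad = []
    for line in checksum_text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        target = root / name
        if not target.is_file() or sha256_of(target) != digest.lower():
            bad.append(name)
    return sorted(bad)


if __name__ == "__main__":
    master = parse_listing("-rw-r--r--          1,234 2009/03/30 20:49:00 update/a.rpm\n")
    fetch, delete = plan_transfer(master, {"update/old.rpm": (5, "2009/03/01 00:00:00")})
    # The fetch list could be written to a file and passed to rsync --files-from=FILE
    print("\n".join(fetch))  # update/a.rpm
    print(delete)  # ['update/old.rpm']
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.rpm").write_bytes(b"payload")
        sums = f"{sha256_of(root / 'a.rpm')}  a.rpm\n{'0' * 64}  b.rpm\n"
        print(verify_checksums(sums, root))  # ['b.rpm']
```

Comparing size and timestamp is the same quick check rsync applies by default; a published checksum list is what lets a mirror verify content independently of timestamps.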
The disadvantage of this method is that it needs a complicated script, so mirrors are unlikely to use it.
-- 
To unsubscribe, e-mail: mirror+unsubscribe@opensuse.org
For additional commands, e-mail: mirror+help@opensuse.org