On Tue, Mar 06, 2007 at 04:17:07PM +0100, Marcus Rueckert wrote:
On 2007-03-02 15:14:43 +0100, Robert Schiele wrote:
Currently there is no way to link to a package built on the build service because if we do so this link will break on every rebuild since the release number will change.
Because of that I recommend automatically creating symbolic links without the version number, as we have in the update directories on ftp.suse.com. For example:
zypper.rpm -> zypper-0.6.15-0.1.i586.rpm
If we do so we can just link to http://software.opensuse.com/download/.../packagename.rpm. This link will never break due to an automatic rebuild.
but it will make the redirector harder.
The redirector works fine with symlinks, as far as I can see. The scanner we use to update the redirector database simply ignores symlinks. The redirector canonicalizes every path before the database lookup. Thus, the database doesn't care about symlinks.

We currently have a lot of them. Most are in a "full-names-something" directory of some released products, which may be interesting or not. They are also used in some convenience places, like linking foo-current to something else. They could be replaced with Apache redirects in some cases, I guess, but that would make them "invisible", which would be against their purpose. Symlinks are now also starting to appear throughout the update/10.3/rpm tree, where we traditionally offer those "unversioned" symlinks, in exactly the way Robert proposes.
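To illustrate the scan-then-canonicalize behaviour described above, here is a minimal Python sketch. It is not the redirector's actual code: the function names `scan_tree` and `lookup` are hypothetical, and a plain Python set stands in for the redirector database.

```python
import os

def scan_tree(root):
    """Collect regular files under root, skipping symlinks.

    Stand-in for the scanner: symlinks never reach the "database".
    """
    known = set()
    # os.walk does not follow directory symlinks by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # ignore file symlinks such as zypper.rpm
            known.add(os.path.relpath(full, root))
    return known

def lookup(root, request_path, known):
    """Canonicalize the requested path first, then look it up.

    Returns the real relative path, or None if it is unknown
    (including paths that resolve outside root).
    """
    real_root = os.path.realpath(root)
    real = os.path.realpath(os.path.join(real_root, request_path))
    rel = os.path.relpath(real, real_root)
    return rel if rel in known else None
```

With a tree containing zypper-0.6.15-0.1.i586.rpm and a symlink zypper.rpm pointing at it, `lookup(root, "zypper.rpm", known)` resolves to the versioned file name, so the lookup table itself never needs to know about the symlink.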
we aim for a redirector that doesn't need local file access anymore but works with a sql db as backend. to check what file is currently behind the symlink we would
But hoo... what a mess of work that would be :-)

Veto. I neither aim for such a setup, nor do I agree that we _should_ aim for it, because I know that it'd be a substantial amount of work. Here are my thoughts behind this:

It would be required to come up with a database representing all needed data, and to maintain it in a way that it stays pretty consistent with the backend file storage -- otherwise it'll only make things harder. A number of new tools would be needed to handle tasks which are now done by existing tools (like rsync).

Of course, the whole matter _would_ be much easier if we were dealing with a different kind of files, ones which don't change at such a fast pace while at the same time being so sensitive to small inconsistencies. Like released distributions. But we have buildservice repositories and security updates, which keep on coming much faster than mirrors are fed.

Running without file storage would btw prevent useful things like:

* serving files which are on no mirror. Factory snapshots, for instance...
* serving certain stuff (metadata) directly (thus, never redirecting) to prevent issues with inconsistent repositories and similar.
* serving small files directly, because it is faster for us and for the clients than querying the database and doing redirection for them
* letting someone else debug and understand the system in a reasonable time :-)

I would compare the effort of reimplementing the required functionality from scratch with, for instance, the rewrite of Apache's mod_proxy* and mod_cache*. I have been watching this project for years, and it is interesting to see that it literally takes years to mature. HTTP may seem simple, but ...
need to read the link and then look up the path to the real file in the DB. of course we could transfer that symlink -> file mapping into the DB as well. but that would complicate the query. and for performance reasons i would like to keep the query count per request low (ideally 1, most likely it will be 2)
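As a rough illustration of the trade-off mentioned here, the following Python sketch stores a symlink -> file mapping in the database and resolves it within a single query. Everything in it is an assumption made for the example: the table names `file_mirrors` and `symlinks`, the sample rows, the mirror URL, and the use of an in-memory SQLite database (chosen only to keep the example self-contained). It resolves one level of symlink only.

```python
import sqlite3

def build_db():
    """Create a tiny in-memory database with hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE file_mirrors (path TEXT, mirror TEXT);
        CREATE TABLE symlinks (link TEXT PRIMARY KEY, target TEXT);
    """)
    db.execute(
        "INSERT INTO file_mirrors VALUES (?, ?)",
        ("repo/zypper-0.6.15-0.1.i586.rpm", "http://mirror.example/"),
    )
    db.execute(
        "INSERT INTO symlinks VALUES (?, ?)",
        ("repo/zypper.rpm", "repo/zypper-0.6.15-0.1.i586.rpm"),
    )
    return db

def mirrors_for(db, path):
    """Return mirrors for path, following at most one symlink.

    The subquery yields NULL when path is not a symlink, so COALESCE
    falls back to the requested path itself: one round trip, but a
    more complicated query than a plain path lookup.
    """
    rows = db.execute(
        """
        SELECT mirror FROM file_mirrors
        WHERE path = COALESCE(
            (SELECT target FROM symlinks WHERE link = ?), ?)
        ORDER BY mirror
        """,
        (path, path),
    )
    return [row[0] for row in rows]
```

Here `mirrors_for(db, "repo/zypper.rpm")` and `mirrors_for(db, "repo/zypper-0.6.15-0.1.i586.rpm")` return the same mirror list. The alternative of two separate queries (first the symlink table, then the file table) keeps each query simple at the cost of a second round trip.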
Performance-wise, I am pretty sure that everything which increases the complexity on the database end is prone to become a problem. I'm glad when we survive with what we have ;)

Alas, when _I_ think about symlinks ;-) I have a totally different level of optimization in mind. I would rather like to get rid of them to avoid the necessity of the FollowSymlinks option, which requires Apache to stat all directories above the file. I _don't_ even want to think about multiple queries to the database, and things like that. We get by with a single query now (plus queries counting packages downloaded from the buildservice repositories).

IMO, development time can be well spent on things like improved handling of failure conditions, and a better redirection scheme. There is still no way to maintain the mirror database other than with a mysql commandline client. Some poor idiot has to live with that: me... We have a lot to do there. And: Synchronise repo pushes with rsync pull runs, so they don't step on each other's toes. Implement fallback mirror redirection, mirror preference by network prefix, memcache query lookups for the most frequently requested objects. Count redirects so we can analyze what's going on.

Really, I don't see a major rewrite of the redirector and database backend as feasible in the near future. High effort, low win. What would be the advantage, anyway, other than saving a few bucks for disks? I don't see it.

Peter
-- 
"WARNING: This bug is visible to non-employees. Please be respectful!"
SUSE LINUX Products GmbH
Research & Development