Dr. Peter Poeml wrote:
It definitely needs discussion and further refinement -- that's why I posted it here -- and I'm thankful for your input.
I'm sorry I somehow overlooked this thread you started a month ago: http://lists.opensuse.org/zypp-devel/2008-03/msg00020.html
Just a quick thought. Two things that cross my mind are:
- the idea of downloading & parsing a mirror list for each file doesn't sound appealing to me.
Parsing the mirror list in the client is an affordable effort, in the context of a network-bound operation. Each file download involves an HTTP request anyway.
Consider that the download server itself is able to do the same 1000 times per second. The client will never download more than a few files per second, even from a local server.
Remember that a client's download request typically involves more than a single HTTP request anyway: a name lookup, then an HTTP request whose response is parsed and usually turns out to be an HTTP redirect, which leads to a second name lookup and a second HTTP request.
What would be different from today is that the client chooses the mirror itself, instead of the server choosing it.
In addition, there are a number of reasons to do all this on file level. Only a few mirrors are complete and up to date in all regards, and we are working with highly dynamic repositories (like KDE from the buildservice) as well as with the classic, more static ones (like the 10.3 repo). The static ones, that many of you guys are familiar with, are only one part of what we deal with today.
I could tell you pretty exactly which files have short turnaround times, and in which ways the client "breaks" if it gets outdated files (and therefore ends up in an inconsistent state). I have seen all the bugs resulting from it. And I have tuned the cache control headers which we serve to take it into account.
Therefore, the cost of parsing a mirror list per request seems absolutely reasonable to me, considering what the client can do with it.
It isn't much more work than parsing the HTTP redirect, anyway ;)
Nice. I wasn't aware this was true :O) Additionally, you said in some other mail in this thread that the mirror list would be sorted and that libzypp just needs to take the first one if everything goes well and fall back to the next on error (maybe I overlooked this in your proposal on the wiki?). *That* sounds really good.
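To make the "take the first one, fall back to the next on error" behaviour concrete, here is a minimal sketch in Python. It is not libzypp code: the function name, the mirror list as a plain list of base URLs, and the injected `fetch` callable are all assumptions made for illustration.

```python
def fetch_with_fallback(mirrors, path, fetch):
    """Try mirrors in the (already sorted) order given by the server.

    mirrors: list of base URLs, best candidate first (assumed format).
    path:    file path relative to the repository top-level directory.
    fetch:   hypothetical callable that returns the file content for a URL
             and raises OSError on any transfer error.
    Returns (base_url, data) for the first mirror that succeeds.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return base, fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next mirror in the list.
            errors.append((base, exc))
    raise OSError(f"all {len(errors)} mirrors failed for {path}")
```

In the common case this costs exactly one request to the first mirror; the rest of the list is only touched when something goes wrong.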
I'm open to be convinced of anything else. And I'm grateful for your input!
- downloading a mirror list for files as small as $repo/media.1/media is pointless
I don't agree with this -- on the contrary, the client needs _all_ files for correct operation, and a way to fall back for each of them. It is independent of file size.
My point was that these files could always be fetched right from downloads.opensuse.org and never redirected to or requested from a mirror. The drawback would be that this wouldn't cover an outage of downloads.o.o.
You may suggest to do all this on directory level. The problem though is manifold.
- not every mirror carries all parts of a repository. Think of a mirror that excludes debuginfos, ppc, or sources when mirroring. In fact, mirrors do that, will do it and must do that because our repositories are simply too large.
- repositories change over time -- and some do often. Only because rpm filenames change with each rebuild are we able to redirect for those at all. We would _not_ be able to redirect for the metadata at all -- and in fact we don't. There is no efficient way to make sure that we know when those files have been updated on a mirror.
- We like to keep file level requests to the download server because it gives us insight into repository usage (statistics)
In the presentation I gave at FOSDEM I went into some more detail on this, and why it is important. http://www.poeml.de/~poeml/talks/redirector/
- it would be fine if the fetching of the mirror list happens only in case of error, BUT this is also not easy - an error can occur outside of the media back-end at various places (e.g. checksum failure is something which is handled outside of the media back-end, in the Fetcher)
This is an interesting idea -- I need to think about it. I believe it would only make things more complicated.
Given the additional info you mentioned, I agree.
In addition, I believe we would lose some interesting possibilities that my proposal would give us. The idea is that the client saves all base URLs (the part which points to the repository toplevel directory) that it sees. Thereby it would accumulate a list of those base URLs. This can enable the client to try them autonomously, should the redirector itself be unreachable.
Being able to continue to work if the redirector can't be reached is an essential part of the proposal.
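The base URL accumulation could look roughly like the following Python sketch. The class and method names are hypothetical, and it assumes the client knows both the full mirror URL it downloaded from and the path of the file relative to the repository toplevel directory.

```python
class BaseUrlCache:
    """Collects repository base URLs seen in mirror lists or redirects."""

    def __init__(self):
        self._bases = []  # insertion order is kept, duplicates are skipped

    def remember(self, file_url, repo_path):
        """Derive the base URL from a full file URL and store it.

        Returns the base URL, or None if file_url does not end in repo_path.
        """
        suffix = "/" + repo_path.lstrip("/")
        if not file_url.endswith(suffix):
            return None
        base = file_url[: -len(suffix)] + "/"
        if base not in self._bases:
            self._bases.append(base)
        return base

    def candidates(self, repo_path):
        """Full URLs to try on our own if the redirector cannot be reached."""
        return [base + repo_path.lstrip("/") for base in self._bases]
```

Note that a saved base URL is only a candidate: as said above, not every mirror carries every part of a repository, so the client still has to be prepared for a failure on each of them.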
Your concern about the handling of checksums is valid and important.
I suggest that the client blacklists a mirror which returned a "broken" file, for the duration of the "session". (Every mirror has an ID and an identifier string, which could be attached to the locally cached object, which could be used for blacklisting the mirror on retrying.)
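As a rough illustration of the per-session blacklist, here is a Python sketch. Everything in it is an assumption for the sake of the example: the class name, mirrors given as (ID, base URL) pairs, the injected `fetch` callable, and SHA-256 as the checksum type.

```python
import hashlib


class DownloadSession:
    """Remembers for one session which mirrors delivered a broken file."""

    def __init__(self):
        self.blacklisted = set()  # mirror IDs, valid for this session only

    def fetch_verified(self, mirrors, path, expected_sha256, fetch):
        """Return (mirror_id, data) from the first mirror whose file verifies.

        mirrors: list of (mirror_id, base_url) pairs, best first.
        fetch:   hypothetical callable, returns bytes or raises OSError.
        """
        for mirror_id, base_url in mirrors:
            if mirror_id in self.blacklisted:
                continue  # this mirror already served a broken file
            url = base_url.rstrip("/") + "/" + path.lstrip("/")
            try:
                data = fetch(url)
            except OSError:
                continue  # transfer error: try the next mirror, no blacklisting
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                return mirror_id, data
            # Checksum mismatch: skip this mirror for the rest of the session.
            self.blacklisted.add(mirror_id)
        raise OSError(f"no usable mirror left for {path}")
```

The sketch only blacklists on a checksum mismatch, not on a plain transfer error; whether both cases should be treated the same is open for discussion.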
BTW: Checksumming *could* be done at a lower level (with each file request) *if* the mirror lists were metalinks (http://metalinker.org), or had similar capabilities. Those contain checksums which can be used to ensure transfer integrity. I'm contemplating adding metalink support to the redirector, and whether that could be a way to achieve the goal we are discussing here. There are a number of clients out there which understand metalinks, and that would help for ISO downloads just as well -- not only the specialized libzypp client.
This is an interesting area which calls for exploration.
Interesting indeed. Something like this could replace the need to store checksums into the metadata.
- the feature is specific to downloads.opensuse.org (for now)
- we would need to hardcode an is_download_opensuse_org condition to avoid useless requests for other URLs (or introduce a mechanism to query for availability of such capability and check that when starting zypp). This would not apply if the mirror list were requested and processed only on errors.
It is not necessary to have such a hard-coded condition. The client can indicate (in the HTTP request) with an HTTP/1.1 Accept header that it is able to accept a mirror list, instead of a file or a redirect to the file. (Older clients would continue to work.)
My idea was the other way around - the server would indicate that it can provide a mirror list. If not, the client would use the old way of fetching files from that server throughout the session. Would this be possible? Cheers, jano
The server can then reply with a list to those clients that send that Accept header. The client will be able to tell by the MIME type if it got a mirror list or a file. (And of course it will still transparently follow redirects, regardless.)
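The Accept-based negotiation could be sketched in Python like this. The MIME type below is a made-up placeholder, since no type has been agreed on in this thread, and the function names are hypothetical as well.

```python
# Hypothetical MIME type for a mirror list; not defined anywhere yet.
MIRRORLIST_TYPE = "application/x-mirrorlist"


def request_headers():
    """Headers a mirror-list-aware client would send with each file request."""
    # Prefer a mirror list, but still accept the plain file (lower q-value).
    return {"Accept": f"{MIRRORLIST_TYPE}, */*;q=0.9"}


def classify_response(status, headers):
    """Tell a redirect, a mirror list and a plain file apart."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in (301, 302, 303, 307):
        return "redirect"  # follow transparently, as clients do today
    if status != 200:
        return "error"
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    return "mirrorlist" if media_type == MIRRORLIST_TYPE else "file"
```

A server that knows nothing about mirror lists simply ignores the unknown type in the Accept header and replies with the file or a redirect, so no hard-coded host check is needed on the client side.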