On Thu, Apr 03, 2008 at 01:07:26PM +0200, Jan Kupec wrote:
Therefore, the cost of parsing a mirror list per request seems absolutely reasonable to me, considering what the client can do with it.
It isn't much more work than parsing the HTTP redirect, anyway ;)
Nice. I wasn't aware this was true :O) Additionally, you said in some other mail in this thread that the mirror list would be sorted and that libzypp just needs to take the first one if everything goes well and fall back to the next on error (maybe I overlooked this in your proposal on the wiki?). *That* sounds really good.
Exactly. I should update the proposal to make that clearer. (I'm doing so right now.)
- downloading a mirror list for files as small as $repo/media.1/media is pointless
I don't agree with this -- on the contrary, the client needs _all_ files for correct operation, and a way to fall back for each of them. It is independent of file size.
My point was that these files could always be fetched right from downloads.opensuse.org and never redirected/requested from a mirror. The drawback would be that this wouldn't cover an outage of downloads.o.o.
Ah, I see. That's true. In fact, the redirector is able to make an exception for those smaller files (ZrkadloMinSize), although we don't actually use it. For small files, it is as cheap to return the file as to look up mirrors in the database and return a redirect or a mirror list. Or cheaper. So far, ZrkadloMinSize is 0 in our setup, so this remains as headroom for further scalability. To be honest, I haven't used it so far, because I reckon that the consistency that the client is going to see might be marginally higher _without_ that exception. But thinking about it, we have reached a high level of correctness now, so we should be able to run with it just fine (and should try it).
You may suggest to do all this on directory level. The problem though is manifold. [...]
Given the additional info you mentioned, I agree.
Fine!
BTW: Checksumming *could* be done at lower level (with each file request) *if* the mirrorlist would be metalinks (http://metalinker.org),
Interesting indeed. Something like this could replace the need to store checksums into the metadata.
I think it could only supplement it. The metadata needs to be verifiable for other clients which are not metalink-enabled, so it needs to contain the checksums. And libzypp itself needs to be able to use it like today if the server doesn't reply with a metalink. The main advantage for libzypp would be that it is able to detect a "broken" transfer much earlier (and actually fix it already during download).
2) the feature is specific to downloads.opensuse.org (for now) - we would need to hardcode an is_download_opensuse_org condition to avoid useless requests for other URLs (or introduce a mechanism to query for availability of such capability and check that when starting zypp). This would not apply if the mirror list would be requested and processed only on errors.
It is not necessary to have such a hard-coded condition. The client can indicate (in the HTTP request) with an HTTP/1.1 Accept header that it is able to accept a mirror list, instead of the file or a redirect to the file. (Older clients would continue to work.)
My idea was the other way around - the server would indicate that it can provide a mirror list. If not, the client would use the old way of fetching files from that server throughout the session. Would this be possible?
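To illustrate the Accept-header idea, here is a minimal sketch of how a client might build such a request. The media type application/mirrorlist is the one used later in this thread; the q-value, the function name build_request and the example URL are assumptions made only for illustration, not part of the proposal.

```python
import urllib.request

# Media type as named in this thread; everything else here is an assumption.
MIRRORLIST_TYPE = "application/mirrorlist"


def build_request(url: str, accept_mirrorlist: bool = True) -> urllib.request.Request:
    """Build a GET request that optionally advertises mirror-list support."""
    if accept_mirrorlist:
        # Prefer a mirror list, but still accept the file itself.
        accept = f"{MIRRORLIST_TYPE}, */*;q=0.8"
    else:
        # What an older client would effectively send.
        accept = "*/*"
    return urllib.request.Request(url, headers={"Accept": accept})


# Placeholder URL, for illustration only.
req = build_request("http://example.org/repo/media.1/media")
print(req.get_header("Accept"))
```

A server that does not know the media type can simply ignore it and answer with the file or a redirect, which is why older clients and servers would keep working.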
Possible, yes, but since

 - HTTP is stateless and can be intercepted by intermediate caches,
 - mirrorlists are valid only per file (don't forget this),
 - the client typically works with more than one repository, possibly hosted on different servers,
 - the server might have the _ability_ to send mirror lists, but it might not want to actually do that for files that the client will request (because it wants to deliver the file to the client on its own, or there is no known mirror),

I suggest keeping this per request, not per session. It is the most flexible and also the simplest approach, in my opinion.

Since the client is the one who initiates all communication, it also saves an additional request if the client is the one who indicates its willingness to accept a mirror list.

The client needs to be able to handle three possible cases:

 - 200 OK, Content-Type != application/mirrorlist: receive the file.
 - 200 OK, Content-Type == application/mirrorlist: follow first URL.
 - 302 Found: follow the Location header (standard redirect).

That's assuming a healthy redirector. To handle a non-reachable redirector (or one returning garbage), it should:

 - in case of failure (timeout, garbage), use one of the cached baseurls

Peter
-- 
"WARNING: This bug is visible to non-employees. Please be respectful!"
SUSE LINUX Products GmbH
Research & Development
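
The three response cases and the failure fallback listed in this message could be sketched roughly as follows. This is only an illustration under assumptions: the function name next_step, the action labels, and the mirror-list body format (one URL per line) are hypothetical; the thread does not define the body format.

```python
from typing import List, Optional, Tuple

MIRRORLIST_TYPE = "application/mirrorlist"  # media type named in this thread


def next_step(status: Optional[int], content_type: Optional[str],
              location: Optional[str], body: str,
              cached_baseurls: List[str]) -> Tuple[str, Optional[str]]:
    """Decide what the client does next; status None means timeout/no reply."""
    if status == 302 and location:
        return ("follow", location)            # standard redirect
    if status == 200:
        # Strip parameters such as "; charset=utf-8" before comparing.
        media_type = (content_type or "").split(";")[0].strip().lower()
        if media_type != MIRRORLIST_TYPE:
            return ("receive", None)           # the body is the file itself
        # Assumption: one mirror URL per line, best mirror first.
        urls = [line.strip() for line in body.splitlines() if line.strip()]
        if urls:
            return ("follow", urls[0])
    # Timeout, garbage, or an empty mirror list: use a cached baseurl.
    if cached_baseurls:
        return ("use_cached", cached_baseurls[0])
    return ("fail", None)
```

Because the decision is made per response, the client keeps no per-server state, which matches the per-request approach suggested above.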