Mailinglist Archive: zypp-devel (227 mails)
Re: [zypp-devel] [SoC-student] libzypp HTTP download failover
- From: "Dr. Peter Poeml" <poeml@xxxxxxx>
- Date: Thu, 3 Apr 2008 14:02:35 +0200
- Message-id: <20080403120235.GM23636@xxxxxxx>
On Thu, Apr 03, 2008 at 01:07:26PM +0200, Jan Kupec wrote:
>> Therefore, the cost of parsing a mirror list per request seems
>> absolutely reasonable to me, considering what the client can do with it.
>> It isn't much more work than parsing the HTTP redirect, anyway ;)
> Nice. I wasn't aware this was true :O) Additionally, you said in some
> other mail in this thread that the mirror list would be sorted and that
> libzypp just needs to take the first one if everything goes well and
> fall back to the next on error (maybe I overlooked this in your proposal
> on the wiki?). *That* sounds really good.
Exactly.
I should update the proposal to make that clearer. (I'm doing so right
now.)
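The failover rule agreed on here can be sketched in a few lines of Python: take the first entry of the sorted mirror list, and move to the next one on error. This is a minimal illustration, not libzypp code. `fetch_with_failover`, `AllMirrorsFailed` and the `fetch` callable are hypothetical names. "Error" is assumed to mean any `OSError`, which covers `urllib.error.URLError`, `HTTPError` and socket timeouts.

```python
from typing import Callable, List, Sequence, Tuple


class AllMirrorsFailed(Exception):
    """Raised when no URL in the mirror list could be fetched (hypothetical)."""


def fetch_with_failover(mirrors: Sequence[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try mirrors in the given order (best first) and return the first success.

    `fetch` is an assumed downloader that raises OSError on any failure.
    """
    errors: List[Tuple[str, OSError]] = []
    for url in mirrors:
        try:
            return fetch(url)
        except OSError as exc:
            # Remember why this mirror failed, then fall back to the next one.
            errors.append((url, exc))
    raise AllMirrorsFailed(errors)
```

Because the list arrives sorted, the common case costs exactly one download. The fallback loop only runs when a mirror is broken.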
>>> - downloading a mirror list for files as small as $repo/media.1/media
>>>   is pointless
>> I don't agree with this -- on the contrary, the client needs _all_ files
>> for correct operation, and a way to fall back for each of them. It is
>> independent of file size.
> My point was that these files could always be fetched right from
> downloads.opensuse.org and never redirected to/requested from a mirror. The
> drawback would be that this wouldn't cover an outage of downloads.o.o.
Ah, I see.
That's true. In fact, the redirector is able to make an exception for
those smaller files (ZrkadloMinSize), although we don't actually use it.
For small files, it is as cheap to return the file as to look up
mirrors in the database and return a redirect or a mirror list. Or
cheaper.
So far, ZrkadloMinSize is 0 in our setup, so this remains as headroom
for further scalability. To be honest, I haven't used it so far, because I
reckon that the consistency that the client is going to see might be
marginally higher _without_ that exception. But thinking about it, we
have reached a high level of correctness now, so we should be able to
run with it just as fine (and should try it).
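The ZrkadloMinSize exception described above amounts to a size check that the redirector makes before any mirror lookup. The sketch below assumes two things: files strictly smaller than the threshold are served directly, and a value of 0 disables the check. `redirector_response` is a hypothetical name, and the real mod_zrkadlo logic may differ.

```python
from typing import Sequence


def redirector_response(file_size: int, mirrors: Sequence[str],
                        min_size: int = 0) -> str:
    """Decide how a redirector might answer one request (illustrative only).

    `min_size` plays the role of ZrkadloMinSize; with 0, no file is small
    enough to bypass the mirror lookup.
    """
    if file_size < min_size:
        return "deliver"   # small file: cheaper to send it than to query mirrors
    if not mirrors:
        return "deliver"   # no known mirror: the server sends the file itself
    return "redirect"      # otherwise answer with a redirect or a mirror list
```

With `min_size=0` every file that has a mirror is redirected, which matches the current setup. Raising the threshold trades some load on the main server for fewer database lookups.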
>> [...] You may suggest doing all this at directory level. The problem, though,
>> is manifold.
> Given the additional info you mentioned, I agree.
Fine!
>> BTW: Checksumming *could* be done at a lower level (with each file
>> request) *if* the mirror lists were metalinks (http://metalinker.org),
> Interesting indeed. Something like this could replace the need to store
> checksums in the metadata.
It could only supplement it. The metadata needs to be verifiable for
other clients which are not metalink-enabled, so it needs to contain the
checksums. And libzypp itself needs to be able to use it as it does today if
the server doesn't reply with a metalink.
The main advantage for libzypp would be that it is able to detect a
"broken" transfer much earlier (and actually fix it already during
download).
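Per-file checksums in a metalink would let the client catch a broken transfer during the download rather than afterwards. The sketch below illustrates this by checking each piece as it arrives. It assumes the metalink supplies one hex SHA-1 digest per fixed-size piece. `first_bad_piece` is a hypothetical helper, and the checksums in the repository metadata would still be verified as today.

```python
import hashlib
from typing import Iterable, Optional, Sequence


def first_bad_piece(chunks: Iterable[bytes],
                    piece_hashes: Sequence[str]) -> Optional[int]:
    """Return the index of the first corrupt or missing piece, or None if all match.

    Checking pieces while they arrive lets the client drop a broken mirror
    mid-transfer and re-fetch only the bad range from the next mirror.
    """
    received = 0
    for index, chunk in enumerate(chunks):
        if index >= len(piece_hashes):
            return index  # more data than the metalink describes
        if hashlib.sha1(chunk).hexdigest() != piece_hashes[index]:
            return index  # corrupt piece: stop here instead of at the end
        received += 1
    if received != len(piece_hashes):
        return received   # transfer ended early: first missing piece
    return None
```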
>>> 2) the feature is specific to downloads.opensuse.org (for now)
>>>  - we would need to hardcode an is_download_opensuse_org condition to
>>>    avoid useless requests for other URLs (or introduce a mechanism to
>>>    query for availability of such capability and check that when
>>>    starting zypp). This would not apply if the mirror list were
>>>    requested and processed only on errors.
>> It is not necessary to have such a hard-coded condition. The client can
>> indicate (in the HTTP request) with an HTTP/1.1 Accept header that it is
>> able to accept a mirror list, instead of the file or a redirect to the file.
>> (Older clients would continue to work.)
> My idea was the other way around - the server would indicate that it can
> provide a mirror list. If not, the client would use the old way of fetching
> files from that server throughout the session. Would this be possible?
Possible, yes, but since
- HTTP is stateless and can be intercepted by intermediate caches,
- mirror lists are valid only per file (don't forget this),
- the client typically works with more than one repository, possibly
  hosted on different servers,
- the server might have the _ability_ to send mirror lists, but it
  might not want to actually do that for files that the client will
  request (because it wants to deliver the file to the client itself,
  or there is no known mirror),
I suggest keeping this per request, not per session. It is the most flexible
and also the simplest approach, in my opinion.
Since the client is the one who initiates all communication, it also
saves an additional request, if the client is the one who indicates its
willingness to accept a mirror list.
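Per-request negotiation needs only one extra header on each request. The sketch below builds such a request with Python's `urllib.request` without sending it. `application/mirrorlist` is the working media type used in this thread. The exact `Accept` value is an assumption, including the `q=0.5` entry that still accepts the plain file. `build_request` is a hypothetical name.

```python
import urllib.request

# Working media type from this discussion, not a registered MIME type.
MIRRORLIST_TYPE = "application/mirrorlist"


def build_request(url: str, accept_mirrorlist: bool = True) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single file.

    A capable client advertises mirror-list support on every request; an
    older client omits the header and keeps receiving the file or a 302.
    """
    headers = {}
    if accept_mirrorlist:
        # Prefer a mirror list, but still accept the file itself.
        headers["Accept"] = f"{MIRRORLIST_TYPE}, */*;q=0.5"
    return urllib.request.Request(url, headers=headers)
```

Because the capability travels with each request, no session state or server probing is needed. Intermediate caches also see exactly what was asked for.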
The client needs to be able to handle three possible cases:
- 200 OK, Content-Type != application/mirrorlist: receive the file.
- 200 OK, Content-Type == application/mirrorlist: follow first URL.
- 302 Found: follow the Location header (standard redirect).
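The three cases can be handled by a small dispatch function. The thread does not fix the mirror list format, so this sketch assumes a plain-text body with one URL per line, best mirror first. `next_step` is a hypothetical name, and the caller is assumed to pass header names in lower case.

```python
from typing import Mapping, Tuple

MIRRORLIST_TYPE = "application/mirrorlist"


def next_step(status: int, headers: Mapping[str, str],
              body: bytes) -> Tuple[str, object]:
    """Classify a redirector response into one of the three cases.

    Returns ("mirrors", [url, ...]), ("file", body) or ("redirect", location);
    raises ValueError for anything else, so the caller can fall back.
    """
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if status == 200 and content_type == MIRRORLIST_TYPE:
        lines = (line.strip() for line in body.decode("utf-8").splitlines())
        urls = [line for line in lines if line]  # assumed: one URL per line
        if not urls:
            raise ValueError("empty mirror list")
        return ("mirrors", urls)   # follow urls[0], fall back to the rest
    if status == 200:
        return ("file", body)      # the server delivered the file itself
    if status == 302 and "location" in headers:
        return ("redirect", headers["location"])  # standard redirect
    raise ValueError(f"unexpected response: HTTP {status}")
```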
That's assuming a healthy redirector. To handle a non-reachable
redirector (or one returning garbage), it should:
- in case of failure (timeout, garbage), use one of the cached baseurls
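The fallback for an unreachable or misbehaving redirector could look like the sketch below, which rests on several assumptions. `fetch_file` and the `fetch` callable are hypothetical. `fetch` is assumed to raise `OSError` on timeouts and `ValueError` on a garbage answer. For brevity, the redirector round trip is collapsed into a single call.

```python
from typing import Callable, Sequence
from urllib.parse import urljoin


def fetch_file(path: str, redirector: str, cached_baseurls: Sequence[str],
               fetch: Callable[[str], bytes]) -> bytes:
    """Fetch `path` via the redirector, falling back to cached base URLs."""
    for base in [redirector, *cached_baseurls]:
        # Normalise so "http://host/repo" + "media.1/media" keeps the repo dir.
        url = urljoin(base.rstrip("/") + "/", path.lstrip("/"))
        try:
            return fetch(url)
        except (OSError, ValueError):
            continue  # timeout or garbage: try the next cached base URL
    raise OSError(f"could not fetch {path} from any source")
```

Usage: `fetch_file("media.1/media", "http://redirector.example/repo", ["http://mirror.example/repo/"], fetch)` tries the redirector first. If that call fails, it requests `http://mirror.example/repo/media.1/media`.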
Peter
--
"WARNING: This bug is visible to non-employees. Please be respectful!"
SUSE LINUX Products GmbH
Research & Development