Mailinglist Archive: zypp-devel (227 mails)
Re: [zypp-devel] [SoC-student] libzypp HTTP download failover
- From: Jan Kupec <jkupec@xxxxxxx>
- Date: Thu, 03 Apr 2008 13:07:26 +0200
- Message-id: <47F4BA6E.4070400@xxxxxxx>
Dr. Peter Poeml wrote:
It definitely needs discussion and further refinement -- that's why I
posted it here -- and I'm thankful for your input.
I'm sorry, I somehow overlooked this thread you started a month ago:
http://lists.opensuse.org/zypp-devel/2008-03/msg00020.html
Just a quick thought. Two things that cross my mind are:
1) the idea of downloading & parsing a mirror list for each file doesn't
sound appealing to me.
Parsing the mirror list in the client is an affordable effort, in the
context of a network-bound operation. Each file download involves an
HTTP request anyway.
Just consider that the download server itself is able to do the same 1000
times per second. The client will never download more than a few files
per second, even from a local server.
Remember that a client's download request typically involves more than a
simple HTTP request anyway. Typically it is a name lookup and an HTTP
request, whose response is parsed and usually turns out to be an HTTP
redirect, which results in a second name lookup and HTTP request.
What would be different to today is that the client chooses the mirror
itself, instead of the server choosing it.
In addition, there are a number of reasons to do all this on file level.
Only a few mirrors are complete + up to date in all regards, and we are
working with highly dynamic repositories (like KDE from the
buildservice) as well as with the classic, more static ones (like the
10.3 repo). The static ones, that many of you guys are familiar with,
are only one part of what we deal with today.
I could tell you pretty exactly which files have short turnaround times,
and in which ways the client "breaks" if it gets outdated files (and
therefore an inconsistent state). I have seen all the bugs resulting from
it. And I have tuned the cache control headers which we serve to take
it into account.
Therefore, the cost of parsing a mirror list per request seems
absolutely reasonable to me, considering what the client can do with it.
It isn't much more work than parsing the HTTP redirect, anyway ;)
Nice. I wasn't aware this was true :O) Additionally, you said in some
other mail in this thread that the mirror list would be sorted and that
libzypp just needs to take the first one if everything goes well and
fall back to the next on error (maybe I overlooked this in your proposal
on the wiki?). *That* sounds really good.
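
To make the "take the first one, fall back to the next on error" idea concrete, here is a minimal Python sketch. It is not libzypp code: the function name, the exception class, and the injected `fetch` callable are all hypothetical, and the mirror list is assumed to arrive already sorted by the server.

```python
from typing import Callable, Iterable, List, Tuple


class AllMirrorsFailed(Exception):
    """Raised when no mirror in the list could deliver the file."""


def fetch_with_failover(
    mirrors: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try mirrors in the given order; return (mirror, data) for the first success.

    `fetch` is a hypothetical transfer function that raises OSError on failure.
    """
    errors: List[str] = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return base, fetch(url)
        except OSError as exc:
            # Network-level failure: remember it and move on to the next mirror.
            errors.append(f"{url}: {exc}")
    raise AllMirrorsFailed("; ".join(errors))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real transfer: the first example mirror is 'down'."""
    if "mirror-a" in url:
        raise OSError("connection refused")
    return b"payload from " + url.encode()


if __name__ == "__main__":
    mirror, data = fetch_with_failover(
        ["http://mirror-a.example/repo", "http://mirror-b.example/repo"],
        "media.1/media",
        fake_fetch,
    )
    print(mirror, data)
```

In the common case the loop body runs exactly once, so the extra cost over following a plain redirect is small.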
I'm open to be convinced of anything else. And I'm grateful for your
input!
- downloading a mirror list for files as small as $repo/media.1/media
is pointless
I don't agree with this -- on the contrary, the client needs _all_ files
for correct operation, and a way to fall back for each of them. It is
independent of file size.
My point was that these files could always be fetched right from
downloads.opensuse.org and never redirected/requested from a mirror. The
drawback would be that this wouldn't cover an outage of downloads.o.o.
You may suggest to do all this on directory level. The problem though is
manifold.
- not every mirror carries all parts of a repository. Think of a mirror
that excludes debuginfos, ppc, or sources when mirroring. In fact,
mirrors do that, will do it and must do that because our repositories
are simply too large.
- repositories change over time -- and some do often. Only because rpm
filenames change with each rebuild are we able to redirect for those
at all. We would _not_ be able to redirect for the metadata at all --
and in fact we don't. There is no efficient way to make sure that we
know when those files have been updated on a mirror.
- We like to keep file level requests to the download server because it
gives us insight into repository usage (statistics)
In the presentation I gave at FOSDEM I went into some more detail on
this, and why it is important.
http://www.poeml.de/~poeml/talks/redirector/
- it would be fine if the fetching of the mirror list happens only in
case of error, BUT this is also not easy - an error can occur
outside of the media back-end at various places (e.g. checksum
failure is something which is handled outside of the media back-end
- in the Fetcher)
This is an interesting idea -- I need to think about it. I believe it
would only make things more complicated.
Given the additional info you mentioned, I agree.
In addition, I believe we would lose some interesting possibilities that
my proposal would give us. The idea is that all base URLs (the part
which points to the repository toplevel directory) would be saved by the
client. Thereby it would accumulate a list of those base URLs. This can
enable the client to try them autonomously, should the redirector itself
be unreachable.
Being able to continue to work if the redirector can't be reached is
an essential part of the proposal.
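
A rough Python sketch of how a client might accumulate base URLs and use them when the redirector cannot be reached. Everything here is an assumption for illustration: the class name, the rule that a base URL is the mirror URL minus the repository-relative path, and the injected `fetch` callable.

```python
from typing import Callable, List, Optional


class BaseUrlCache:
    """Remembers repository base URLs in the order they were first seen."""

    def __init__(self) -> None:
        self.bases: List[str] = []

    def remember(self, file_url: str, repo_path: str) -> None:
        """Strip the repository-relative path from a mirror URL and keep the rest."""
        if not repo_path or not file_url.endswith(repo_path):
            return  # URL does not match the expected layout; nothing to learn
        base = file_url[: -len(repo_path)]
        if base not in self.bases:
            self.bases.append(base)

    def fetch_without_redirector(
        self, repo_path: str, fetch: Callable[[str], bytes]
    ) -> Optional[bytes]:
        """Try every remembered base URL in turn; return None if all fail."""
        for base in self.bases:
            try:
                return fetch(base + repo_path)
            except OSError:
                continue  # this mirror failed too; try the next known base
        return None
```

A real client would presumably also persist this list between runs and expire stale entries, which the sketch leaves out.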
Your concern about the handling of checksums is valid and important.
I suggest that the client blacklists a mirror which returned a "broken"
file, for the duration of the "session". (Every mirror has an ID and an
identifier string, which could be attached to the locally cached object,
which could be used for blacklisting the mirror on retrying.)
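
The session-scoped blacklisting could look roughly like the following Python sketch. The `Session` class, the dictionary layout of a mirror entry (`id`, `base_url`), the choice of SHA-1, and the injected `fetch` callable are assumptions made for the example, not part of the proposal or of libzypp.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Set


class Session:
    """Holds state that lives only for the duration of one client session."""

    def __init__(self) -> None:
        self.blacklist: Set[str] = set()  # IDs of mirrors that served a broken file


def download_verified(
    session: Session,
    mirrors: List[Dict[str, str]],
    path: str,
    expected_sha1: str,
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Return the first download whose SHA-1 matches; blacklist mirrors that fail."""
    for mirror in mirrors:
        if mirror["id"] in session.blacklist:
            continue  # this mirror already returned a broken file in this session
        data = fetch(mirror["base_url"] + path)
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data
        # Checksum mismatch: remember the mirror ID so later requests skip it.
        session.blacklist.add(mirror["id"])
    return None
```

Because the blacklist hangs off the session object, it disappears when the session ends and the mirror gets another chance next time.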
BTW: Checksumming *could* be done at lower level (with each file
request) *if* the mirrorlist would be metalinks (http://metalinker.org),
or have similar capabilities. Those contain checksums which can be used
to ensure transfer integrity. I'm contemplating adding metalink
support to the redirector and whether that could be a way to achieve the
goal we are discussing here. There are a number of clients out there
which understand metalinks, and that would help for iso downloads just
as well -- not only the specialized libzypp client.
This is an interesting area which calls for exploration.
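
As a sketch of what per-file checksums from a metalink would give the client, the Python below parses a small document and returns, for each file, its hashes and its URLs ordered by preference. The sample is modelled on my understanding of the Metalink 3.0 layout (namespace, `verification/hash`, `resources/url` with a `preference` attribute); treat the element names as assumptions to check against the specification, and the mirror hosts as made up.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List, TypedDict

NS = {"ml": "http://www.metalinker.org/"}  # assumed Metalink 3.0 namespace
PAYLOAD = b"example payload"  # stands in for the downloaded file contents

SAMPLE = f"""<metalink version="3.0" xmlns="http://www.metalinker.org/">
  <files>
    <file name="media">
      <verification>
        <hash type="sha1">{hashlib.sha1(PAYLOAD).hexdigest()}</hash>
      </verification>
      <resources>
        <url type="http" preference="50">http://mirror-b.example/repo/media.1/media</url>
        <url type="http" preference="100">http://mirror-a.example/repo/media.1/media</url>
      </resources>
    </file>
  </files>
</metalink>"""


class FileEntry(TypedDict):
    hashes: Dict[str, str]
    urls: List[str]


def parse_metalink(xml_text: str) -> Dict[str, FileEntry]:
    """Map each file name to its hashes and its URLs, highest preference first."""
    root = ET.fromstring(xml_text)
    result: Dict[str, FileEntry] = {}
    for file_el in root.findall("ml:files/ml:file", NS):
        hashes = {
            h.get("type", ""): (h.text or "").strip()
            for h in file_el.findall("ml:verification/ml:hash", NS)
        }
        url_els = sorted(
            file_el.findall("ml:resources/ml:url", NS),
            key=lambda u: -int(u.get("preference", "0")),
        )
        urls = [(u.text or "").strip() for u in url_els]
        result[file_el.get("name", "")] = {"hashes": hashes, "urls": urls}
    return result


def matches(data: bytes, entry: FileEntry) -> bool:
    """Check downloaded bytes against the SHA-1 listed for the file."""
    return hashlib.sha1(data).hexdigest() == entry["hashes"].get("sha1")
```

With the checksum travelling alongside the URL list, the integrity check can sit right next to the transfer instead of in a later stage.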
Interesting indeed. Something like this could replace the need to store
checksums into the metadata.
2) the feature is specific to downloads.opensuse.org (for now)
- we would need to hardcode an is_download_opensuse_org condition to
avoid useless requests for other URLs (or introduce a mechanism to
query for availability of such capability and check that when
starting zypp). This would not apply if the mirror list would be
requested and processed only on errors.
It is not necessary to have such a hard-coded condition. The client can
indicate (in the HTTP request) with an HTTP/1.1 Accept header that it is
able to accept a mirror list, instead of a file or a redirect to the file.
(Older clients would continue to work.)
My idea was the other way around - the server would indicate that it can
provide a mirror list. If not, the client would use the old way of fetching
files from that server throughout the session. Would this be possible?
Cheers,
jano
The server can then reply with a list to those clients that send that
Accept header. The client will be able to tell by the MIME type if it
got a mirror list or a file. (And of course it will still transparently
follow redirects, regardless.)
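
The Accept-header negotiation described above might be sketched in Python as follows. The media type `application/x-example-mirrorlist` is invented for the example (the thread does not name one), and the helper names are hypothetical; only `urllib.request.Request` is a real library call. No request is actually sent.

```python
import urllib.request

# Hypothetical media type for a mirror list; the thread does not define one.
MIRRORLIST_TYPE = "application/x-example-mirrorlist"


def build_request(url: str) -> urllib.request.Request:
    """Advertise mirror-list support while still accepting plain files."""
    accept = f"{MIRRORLIST_TYPE}, */*;q=0.5"
    return urllib.request.Request(url, headers={"Accept": accept})


def classify_response(content_type: str) -> str:
    """Tell a mirror list from an ordinary file by the response's MIME type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "mirrorlist" if media_type == MIRRORLIST_TYPE else "file"
```

A server that does not know the media type simply ignores it and answers with the file or a redirect, so no hard-coded host check is needed on the client side.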
--
To unsubscribe, e-mail: zypp-devel+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: zypp-devel+help@xxxxxxxxxxxx