Peter Poeml wrote:
Hello, good afternoon :-)
There are some new ideas and changes regarding the implementation details of the failover concept [1].
Before, our idea was to modify MediaCurl class [2] to parse a list of mirrors like this [3].
But now, the new idea is to use only metalink files ([4] [5]) and to stop using this type of mirror list altogether.
Libzypp could download files using an external program (e.g. aria2c [6]) to avoid reinventing the wheel.
I am not sure that is a good idea. It would make sense only if you can reimplement all the progress callbacks, authorization callbacks, error handling and error callbacks that MediaCurl currently provides. In libzypp, it seems to be much more complicated.
It is; the Media subsystem is not a simple backend. download.opensuse.org is an HTTP-only world, but Media:: is an abstraction layer for accessing http, ftp, iso, nfs, hard disks, the local filesystem, etc. Adding an HTTP header is not even in the API!
Our question is:
How can we easily add a new handler (fetching files with an external program) with as little intrusion into libzypp's media handling as possible?
I would say the best way would be to reimplement MediaHandler just like MediaDISK and MediaCurl do. Then replace the code in MediaManager to use your new MediaHandler for http. However, as I said, you will need to take care of proxy, progress, etc. You may look into MediaCurl for examples on how to report that.
Is there a central place where one could hook into?
It would be great if it is possible to use e.g. aria2c as external downloader, which already implements nearly everything that we need. It meanwhile seems to me that implementing more stuff in libzypp will not only reinvent the wheel in many regards but also increase the media handling's complexity even further.
I also think so. However, is aria2c a library, or just a command line tool?
An underlying assumption that I have is that there always is some kind of package caching directory where files are downloaded (and used later), so it wouldn't matter if the files are put there by libcurl or by the external process. Is this assumption correct?
No. There are multiple directories where files are copied over and over (because of the layers).

Media always has an attach point (MediaManager::localRoot()). This is passed from the media manager to the specific media handler. Some implementations omit it; for example MediaDIR, which handles access to local files, just passes up url_r.getPathName(), as the url is always a local path and there is no need to copy:

  MediaDIR::MediaDIR( const Url & url_r,
                      const Pathname & /*attach_point_hint_r*/ )
      : MediaHandler( url_r, url_r.getPathName(),

However, this directory is only valid for the lifetime of the media access. Once the media is closed, the directory is deleted (though not in the MediaDIR case).

The caching is handled at a different level. Right now MediaCurl has support so that, if the file it is downloading already exists in its attach point, the right If-Modified-Since header is used. That works if you download the file twice in the same session. However, we don't use this feature; in other words, usually no files are present when downloading.

In the upper layer, MediaSetAccess handles different Media attachments for one url with different media numbers. And the Fetcher layer manages a queue of requests, taking care that the complete queue is either consistent or not applied at all. Fetcher downloads to a tmp directory, and Media does too (depending on the handler); then Media files are put into Fetcher's tmp directory. Fetcher checks whether files with the same checksum already exist in other Fetcher caches. If yes, they get hardlinked into Fetcher's tmp directory. At the end, the directories are swapped with the Fetcher target directory (which usually is a repo cache directory), so the metadata is either complete or not; there is no middle point.
Why this change?
- Because we think that we are reinventing the wheel in things like:
  * Parsing HTTP codes and acting accordingly.
  * Choosing the fastest mirror.
How to implement?
I think we can do the implementation with a few changes:
- Modify media/MediaAccess.cc [7] to check whether tools like aria2c are available on the target system.
- If the check is negative, keep using MediaCurl (as now).
- If the check is positive, use a "new" class called MediaArise (or something like this) which uses aria2c to download files from the network using Metalink.
What do you think about this idea? Any comments on this implementation? Dr. Poeml, please feel free to correct me if there are any errors.
Obviously, any suggestion or comment will be more than welcome :-).
Thanks :-)
Gerard
Overall I like the idea. We have already had good experience using external tools for parsing repos, and we do the same for rpm.
Duncan
--
To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org
For additional commands, e-mail: zypp-devel+help@opensuse.org