On 3/2/20 4:32 PM, Niklas Edmundsson wrote:
On Mon, 2 Mar 2020, Benjamin Zeller wrote:
Hi,
Sorry for top posting but I just subscribed to this ML for this conversation.
You're forgiven :-) Phew, it was a tough decision to do this ;).
With metalink files we get the metalink description over https, which includes the checksums for the individual chunks, so we can then use http connections for the data as well, because we can verify that we really got what we asked for.
You also have the checksum for the entire file if I read correctly, so for smaller files it would make the most sense to just fetch the entire file and verify only the whole-file checksum.
I'm not sure we always have a checksum for the entire file. Of course, once we have downloaded the index files we have the checksums for the subsequent downloads.
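For the cases where a whole-file checksum is available, verification after download is a simple streaming hash. A minimal Python sketch (the function name and parameters are mine, not zypp's):

```python
import hashlib

def verify_file(path, expected_sha256):
    """Stream the downloaded file and compare its SHA-256 digest
    against the whole-file checksum from the metalink description."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 kB blocks so large packages don't need to fit in RAM.
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest() == expected_sha256
```

If the digest doesn't match, the file would be discarded and re-fetched, ideally from a different mirror.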
<snip>
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
That would make sense. Most mirrors today support https, and it would be least surprising for users if a request initiated as https stayed that way.
You could get the metalink file over https and then fetch all data over http, verifying only the checksums. For most users who are not concerned with the privacy implications of downloading OS packages over http, this is good enough. For those with privacy concerns there needs to be an end-to-end solution that stays on https.
This is basically what happens now: the Metalink file is downloaded over https, then, depending on the URLs listed in the Metalink file, other transports can be used...
<snap>
Not sure if we could drop Metalinks completely; however, supporting them makes the code much more complex, and if they actually do more harm than good we should think about something else.
As to the suggestion to use dynamic or bigger chunks: the metalink description file we download at the beginning includes the list of chunks, and they are fixed. We could probably try to download multiple chunks in the same request, though.
Multiple chunks in the same request would indeed alleviate the OS read-ahead-is-not-used problem; from the OS point of view it would look like a bigger chunk being read.
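Fetching several consecutive chunks in one request wouldn't have to give up per-chunk verification: the combined response body can be sliced back into chunks and each slice checked against the hashes from the metalink file. A rough Python sketch (the fixed 256 kB size and the hash-list layout are assumptions on my part, not zypp internals):

```python
import hashlib

CHUNK = 256 * 1024  # fixed chunk size as listed in the metalink file

def verify_chunks(body, first_chunk_index, chunk_hashes):
    """Slice one response body covering several consecutive chunks and
    check each slice's SHA-256 against the per-chunk metalink hashes."""
    for i in range(0, len(body), CHUNK):
        expected = chunk_hashes[first_chunk_index + i // CHUNK]
        if hashlib.sha256(body[i:i + CHUNK]).hexdigest() != expected:
            return False  # this chunk is bad; only it needs re-fetching
    return True
```

A nice property is that a failed check still only invalidates the bad 256 kB chunk, not the whole multi-chunk request.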
This would also be much more in sync with the planned zchunk download support.
Clumping chunks would also ease the artificial speed limit for those with high-bandwidth connections who are hampered by the RTT of fetching one chunk at a time...
I need to look at the code; maybe we could support that somehow, but it's a good point!
However, I'd recommend doing some experiments on various clients and see what speeds you get with varying download strategies. I'm quite convinced that a lot of the mirrors are perfectly capable of pushing enough bandwidth to saturate whatever connection the downloading client has without resorting to parallel downloads, chunking, etc. Especially for bigger files and fast connections.
Depending on the client's bandwidth that is probably right. But doing some more experiments is always a good idea...
Cheers,
Benjamin
On 3/2/20 3:04 PM, Michael Andres wrote:
-------- Forwarded Message -------- Subject: [mirror] Suboptimal (for mirrors) download pattern by opensuse clients Date: Fri, 21 Feb 2020 14:36:26 +0100 (CET) From: Niklas Edmundsson
To: mirror@opensuse.org Hi,
am I the only mirror admin who finds the current behavior of openSUSE clients suboptimal?
Requests by "ZYpp 17.11.4 (curl 7.60.0) openSUSE-Leap-15.1-x86_64" etc. seem to always be done with a 256 kB chunk size, for example:
GET bytes=0-262143 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
GET bytes=262144-524287 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
That's a silly small size, since TCP won't be able to ramp its window and reach good speed before those 256 kB are done. Also, we get int($filesize/256k) entries in our logs for each download.
To make matters worse, the thing seems to do some kind of round-robin between sites, with this pattern being the least effective from a mirror admin's standpoint:
GET bytes=2097152-2359295 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
GET bytes=2621440-2883583 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
Since the OS normally does read-ahead on file system reads, it will read ahead past byte 2359295 in preparation for the next read(). In this case, though, that's in vain: that request never comes, and the next read instead starts at byte 2621440... OS read-ahead is most commonly in the 64 kB-1 MB range, so it's not unlikely that the entire 256 kB gap in between is read from disk without ever being used...
Downloading files this way is just plain stupid, IMHO.
I don't know what problem this behavior is supposed to solve, but it's definitely not beneficial for us as a mirror, and I think it's hurting your end users as well.
If you want more bandwidth from us, request larger chunks (or whole files). The TCP window will grow and you'll get the performance (within the limits of 10-gigabit networking for a single download).
If you want to spread the load between mirrors, use larger chunks, and specifically avoid small chunks and striped access.
In any case, merge requests! If you're going to request a number of consecutive chunks, do it in one request, preferably as one range, to make the most of the TCP connection you've set up.
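Coalescing consecutive chunk ranges before issuing the GET is cheap to do; a Python sketch of the idea (not zypp code, just an illustration):

```python
def merge_ranges(ranges):
    """Coalesce back-to-back (start, end) byte ranges so N consecutive
    chunk requests become one request with a single range."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged

def range_header(ranges):
    """Build the HTTP Range header value for the coalesced ranges."""
    return "Range: bytes=" + ", ".join(f"{s}-{e}" for s, e in merge_ranges(ranges))
```

With the 256 kB example above, the first two requests collapse into a single `Range: bytes=0-524287`, which is exactly the access pattern the OS read-ahead expects.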
My minimum suggestion would be to bump the chunk size to multiple megabytes at least, possibly varying it with download performance, aiming for each GET to take at least a couple of seconds so TCP can ramp up (and to reduce the noise in our logs). In extreme cases we're seeing many tens of GETs per second for a single download; I'm guessing the rate is throttled by RTT latency (ping time) and not by any real bandwidth limit...
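A simple feedback rule could size each GET from the throughput of the previous one, aiming for a few seconds per request. A sketch (all numbers and names are illustrative, not anything zypp currently does):

```python
def next_chunk_size(current_size, last_duration_s,
                    target_s=3.0, min_size=4 << 20, max_size=256 << 20):
    """Pick the next request size so each GET takes roughly target_s
    seconds, giving TCP time to ramp its window. Bounds are arbitrary:
    4 MiB minimum, 256 MiB maximum."""
    if last_duration_s <= 0:
        return min_size  # no measurement yet, start conservatively
    throughput = current_size / last_duration_s  # bytes per second
    size = int(throughput * target_s)
    return max(min_size, min(max_size, size))
```

For example, if an 8 MiB request completed in one second, the next request would grow to 24 MiB; a slow link would stay pinned at the 4 MiB floor.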
/Nikke - admin of ftp.acc.umu.se
--
Benjamin Zeller