Mailinglist Archive: mirror (15 mails)

Re: Fwd: [mirror] Suboptimal (for mirrors) download pattern by opensuse clients
On Mon, 2 Mar 2020, Benjamin Zeller wrote:

Clumping chunks would also alleviate the artificial speed limit for
those with high bandwidth connections that are hampered by the RTT of
getting one chunk at a time...
I need to look at the code; maybe we could support that somehow, but it's
a good point!

However, I'd recommend doing some experiments on various clients to
see what speeds you get with varying download strategies. I'm quite
convinced that a lot of the mirrors are perfectly capable of pushing
enough bandwidth to saturate whatever connection the downloading
client has without resorting to parallel downloads, chunking, etc.,
especially for bigger files and fast connections.
Depending on the client's bandwidth that is probably right. But doing some
more experiments is always a good idea...

I can actually take my home setup as an example of why I think the current scheme is lacking:

I have GigE at home, whee!

Annoyance: The brilliant ISP has no peering in Umeå, so I have a 17 ms RTT via Stockholm to my closest high-bandwidth mirror (if multiple 10GigE is considered high-bandwidth nowadays, that is).

Anyhow, if I do
wget -O /dev/null
I download that 77 MB file at a rate of approx 85 MB/s.

However, if I were to download it using the current opensuse download method, in 256 KB chunks one at a time with a 17 ms RTT, I'd get approx 59 chunks per second, or around 15 MB/s. And that's assuming the GET returns the file contents directly: the mirror redirects requests for large files to dedicated offload hosts, so you pay another 17 ms RTT per chunk and you're down to, say, 7 MB/s out of the 120 MB/s theoretical bandwidth.
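The back-of-envelope numbers above can be reproduced with a small model. This is a sketch under the worst-case assumption that each chunk costs one full round trip (plus one more per redirect) before any payload arrives; the function name and structure are illustrative, not from any actual client.

```python
# Model of RTT-limited sequential chunk downloads: one chunk per
# round trip, no pipelining, payload transfer time ignored.

def chunked_throughput(chunk_bytes: float, rtt_s: float, redirects: int = 0) -> float:
    """Upper bound on throughput (bytes/s) when each chunk costs
    (1 + redirects) round trips before any data arrives."""
    return chunk_bytes / (rtt_s * (1 + redirects))

MB = 1024 * 1024
chunk = 256 * 1024   # 256 KB chunks
rtt = 0.017          # 17 ms RTT

print(f"direct:     {chunked_throughput(chunk, rtt) / MB:.1f} MB/s")
print(f"redirected: {chunked_throughput(chunk, rtt, redirects=1) / MB:.1f} MB/s")
```

This yields roughly 15 MB/s direct and about half that through a redirect, matching the figures in the text.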

Hence, my statement is that just doing plain http(s) GETs would likely give you better performance on high-bandwidth connections.

For really low bandwidth connections the RTT penalty is drowned by the actual data transfer times, so the performance hit will be smaller.

In all honesty, I have a really hard time figuring out use cases where the chunked download strategy would be a performance gain for package downloads, since you already have the redirector in MirrorBrain to do rough load balancing between mirrors...

I could understand it for big .iso files and doing torrent-style downloads from multiple slow/overloaded mirrors, but that's another use case IMHO.

So, do some tests with various client setups and see where that takes you. Sometimes the best solution/default is actually the simplest, and my gut feeling is that this might be one of those times...



On 3/2/20 3:04 PM, Michael Andres wrote:

-------- Forwarded Message --------
Subject: [mirror] Suboptimal (for mirrors) download pattern by opensuse clients
Date: Fri, 21 Feb 2020 14:36:26 +0100 (CET)
From: Niklas Edmundsson <nikke@xxxxxxxxxx>
To: mirror@xxxxxxxxxxxx


Am I the only mirror admin who finds the current behavior of opensuse
clients suboptimal?

Requests by "ZYpp 17.11.4 (curl 7.60.0) openSUSE-Leap-15.1-x86_64" etc.
seem to be done with a 256 KB chunk size, always. As an example:

GET bytes=0-262143

GET bytes=262144-524287

That's a silly small size, since TCP won't be able to ramp its window
and reach good speed before those 256 KB are done. Also, we get
int($filesize/256k) entries in our logs for each download.
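A sketch of how sequential fixed-size ranges like the ones above could be generated, and of how many requests (and log entries) a single file produces; the helper name is hypothetical:

```python
# Each GET carries a Range header covering one fixed-size chunk.
CHUNK = 256 * 1024  # 262144 bytes

def range_headers(filesize: int, chunk: int = CHUNK):
    """Yield the Range header value for each sequential chunk."""
    for start in range(0, filesize, chunk):
        end = min(start + chunk, filesize) - 1
        yield f"bytes={start}-{end}"

# A 77 MB file turns into 308 separate requests (and log entries):
headers = list(range_headers(77 * 1024 * 1024))
print(headers[0])    # bytes=0-262143
print(headers[1])    # bytes=262144-524287
print(len(headers))  # 308
```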

To make matters worse, the thing seems to do some kind of round robin
between sites, with this pattern being the most ineffective-looking
from a mirror admin's standpoint:

GET bytes=2097152-2359295

GET bytes=2621440-2883583

Since the OS normally does read-ahead on file system reads, it will
read ahead past byte 2359295 in preparation for the next read(). In
this case, though, that's in vain, as that request never comes; the
data read next is instead byte 2621440 and onward... OS read-ahead is
most commonly in the 64 kB-1 MB range, so it's not unlikely that the
entire 256 KB gap in between is read from disk without being used...

Downloading files this way is just plain stupid, IMHO.

I don't know what problem this behavior is supposed to solve, but it's
definitely not beneficial for us as a mirror, and I think it's hurting
your end users as well.

If you want more bandwidth from us, request larger chunks (or whole
files). The TCP window will grow and you'll get the performance (within
the limits of 10 gigabit networking for one download).

If you want to spread the load between mirrors, use larger chunks, and
specifically avoid small chunks and striped access.

In any case, merge requests! If you're going to request a number of
consecutive chunks, do it in one request, preferably as one range, to
make the most of the TCP connection you've set up.
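The merging suggested above can be sketched as collapsing adjacent byte ranges before issuing the GET. This is a minimal illustration, not anything from the actual client code:

```python
def merge_ranges(ranges):
    """Merge adjacent or overlapping (start, end) byte ranges
    into the fewest possible spans."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Contiguous with the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Three consecutive 256 KB chunks collapse into one Range request:
chunks = [(0, 262143), (262144, 524287), (524288, 786431)]
print(merge_ranges(chunks))  # [(0, 786431)] -> "Range: bytes=0-786431"
```

Note that the striped pattern from the log above, bytes 2097152-2359295 followed by 2621440-2883583, would *not* merge, since the 256 KB gap keeps the spans non-adjacent.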

My minimum suggestion would be to bump the chunk size to multiple
megabytes at the very least, possibly varying it depending on download
performance, aiming for each GET to take at least a couple of seconds
to allow TCP to ramp up speed (and to reduce the noise in our logs).
In extreme cases we're seeing multiple tens of GETs per second for
some downloads; I'm guessing the rate throttles due to RTT latency
(ping time) and not any real bandwidth limit...
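The adaptive sizing suggested above could look something like the sketch below: pick the next chunk so each GET takes a few seconds at the currently observed throughput, within sane bounds. All the names, bounds, and the 3-second target are illustrative assumptions, not from any real client:

```python
MIN_CHUNK = 4 * 1024 * 1024    # floor: never below 4 MB (assumed)
MAX_CHUNK = 64 * 1024 * 1024   # cap per request (assumed)
TARGET_SECONDS = 3.0           # aim for ~3 s per GET (assumed)

def next_chunk_size(observed_bytes_per_s: float) -> int:
    """Size the next chunk so the GET lasts ~TARGET_SECONDS,
    clamped to [MIN_CHUNK, MAX_CHUNK]."""
    size = int(observed_bytes_per_s * TARGET_SECONDS)
    return max(MIN_CHUNK, min(MAX_CHUNK, size))

# A slow link (1 MB/s) stays at the 4 MB floor; a fast one (50 MB/s)
# gets much larger requests, giving TCP time to ramp its window.
print(next_chunk_size(1 * 1024 * 1024) // (1024 * 1024))   # 4
print(next_chunk_size(50 * 1024 * 1024) // (1024 * 1024))  # 64
```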

/Nikke - admin of


Niklas Edmundsson, Admin @ {acc,hpc2n} | nikke@xxxxxxxxxx
A victim of a prank, Geordi puts a banana over his eyes
To unsubscribe, e-mail: mirror+unsubscribe@xxxxxxxxxxxx
To contact the owner, email: mirror+owner@xxxxxxxxxxxx
