Re: Fwd: [mirror] Suboptimal (for mirrors) download pattern by opensuse clients
Hi,
Sorry for top-posting, but I just subscribed to this ML for this conversation.
I'm currently working on the libzypp/zypper http/https media backend and have suspected something like that myself. Currently we do not download multiple files at once but download one file in multiple chunks.
I have a new downloader implemented that would support multiple downloads in parallel, but I suffer from the problem that mirrorbrains forwards me from HTTPS to HTTP if I disable the metalink downloads in favour of downloading full files only; curl errors out in that case. I know we could disable that error, but I'm not really sure that this is what we want. With metalink files we get the metalink description over https, which includes all checksums for the several chunks, and can then use http connections as well because we can check whether we really got what we asked for.
It would be interesting to know whether we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
Not sure if we could drop Metalinks completely; however, supporting them makes the code much more complex, and if it actually does more harm than good we should think about something else.
As to the suggestion to use dynamic or bigger chunks: the metalink description file we download at the beginning has the list of chunks included, and they are fixed. We probably could try to download multiple chunks in the same request though.
Cheers,
Benjamin
On 3/2/20 3:04 PM, Michael Andres wrote:
-------- Forwarded Message --------
Subject: [mirror] Suboptimal (for mirrors) download pattern by opensuse clients
Date: Fri, 21 Feb 2020 14:36:26 +0100 (CET)
From: Niklas Edmundsson <nikke@acc.umu.se>
To: mirror@opensuse.org
Hi,
am I the only mirror admin that finds the current behavior of opensuse clients suboptimal?
Requests by "ZYpp 17.11.4 (curl 7.60.0) openSUSE-Leap-15.1-x86_64" etc seem to be done with 256 kb chunk size, always, as an example:
GET bytes=0-262143 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
GET bytes=262144-524287 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
That's a silly small size, since TCP won't be able to ramp window sizes and get good speed before those 256k are done. Also, we get int($filesize/256k) entries in our logs for each download.
To make matters worse, the thing seems to do some kind of round robin between sites, with this pattern being the most ineffective looking from a mirror admin standpoint:
GET bytes=2097152-2359295 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
GET bytes=2621440-2883583 /mirror/opensuse.org/tumbleweed/repo/oss/x86_64/libqt5-qtwebengine-5.14.1-1.5.x86_64.rpm
Since the OS normally does read-ahead on file system reads, it will read ahead after byte 2359295 in preparation for the next read(). In this case though, that's in vain, as that request never comes and the next data read instead starts at byte 2621440... OS read-ahead is most commonly in the 64kB-1MB range, so it's not unlikely that the entire 256k gap in between is read from disk without being used...
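To put a number on the read-ahead argument, here is a minimal Python sketch. The function name `wasted_readahead` and the model itself (one fixed read-ahead window, no page cache reuse) are assumptions made for illustration; real kernels size their read-ahead dynamically, so treat the result as an order-of-magnitude estimate only.

```python
def wasted_readahead(ranges, readahead_bytes):
    """Estimate bytes read from disk but never sent to the client.

    ranges: list of (first_byte, last_byte) pairs, inclusive, in the
            order one mirror served them for a single file.
    readahead_bytes: assumed size of the OS read-ahead window.
    """
    wasted = 0
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        # Bytes skipped between the end of one request and the start of the next.
        gap = next_first - (prev_last + 1)
        if gap > 0:
            # Read-ahead that falls inside the gap is never used; anything
            # beyond the gap overlaps the next request and is not counted.
            wasted += min(gap, readahead_bytes)
    return wasted


# The striped pattern from the log excerpt above, with an assumed 512 KiB window.
striped = [(2097152, 2359295), (2621440, 2883583)]
print(wasted_readahead(striped, 512 * 1024))
```

With back-to-back ranges the gap is zero, so nothing is counted as wasted; with the striped pattern the whole 256k gap is.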
Downloading files this way is just plain stupid, IMHO.
I don't know what problem this behavior is supposed to solve, but it's definitely not beneficial for us as a mirror, and I think it's hurting your end users as well.
If you want more bandwidth from us, request larger chunks (or whole files). The TCP window will grow and you'll get the performance (within the limits of 10 gigabit networking for one download).
If you want to spread the load between mirrors, use larger chunks, and specifically avoid small chunks and striped access.
In any case, merge requests! If you're going to request a number of consecutive chunks, do it in one request, preferably as one range, to make the most of the TCP connection you've set up.
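Merging could look roughly like the following Python sketch. The helper names `merge_ranges` and `range_header` are hypothetical and not taken from libzypp; the only thing assumed about the client is that it knows which inclusive byte ranges it wants from a given mirror.

```python
def merge_ranges(ranges):
    """Coalesce adjacent or overlapping inclusive byte ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def range_header(ranges):
    """Build the value of an HTTP Range header from the merged ranges."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in merge_ranges(ranges))


# Two consecutive 256k chunks collapse into a single range.
print(range_header([(0, 262143), (262144, 524287)]))
# Non-adjacent chunks stay separate but still travel in one request.
print(range_header([(2097152, 2359295), (2621440, 2883583)]))
```

A request with several ranges is answered as multipart/byteranges, which the client then has to parse, so a single merged range is the simpler case on both ends.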
My minimum suggestion would be to bump the chunk size to multiple megabytes, possibly varying it depending on download performance, aiming for each GET to take at least a couple of seconds so that TCP can ramp up speed (and to reduce the noise in our logs). In extreme cases we're seeing multiple tens of GETs each second for some downloads; I'm guessing the rate is throttled by the RTT latency (ping time) and not by some real bandwidth limit...
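One way such a performance-dependent chunk size could be chosen is sketched below in Python. Everything here is an assumption for illustration: the function name, the 2-second target, the 4 MiB floor, the 256 MiB ceiling, and rounding to the 256 KiB piece size seen in the logs.

```python
PIECE_SIZE = 256 * 1024  # piece size observed in the requests above


def next_chunk_size(measured_bytes_per_s,
                    target_seconds=2.0,
                    min_size=4 * 1024 * 1024,
                    max_size=256 * 1024 * 1024):
    """Pick a chunk size so that one GET lasts roughly target_seconds."""
    wanted = int(measured_bytes_per_s * target_seconds)
    # Never go below a few megabytes, never above the chosen ceiling.
    wanted = max(min_size, min(wanted, max_size))
    # Round up to a whole number of pieces so chunk borders line up
    # with the borders of the checksummed pieces.
    pieces = -(-wanted // PIECE_SIZE)
    return pieces * PIECE_SIZE


# A slow client stays at the floor, a fast one gets much larger requests.
print(next_chunk_size(1_000_000))
print(next_chunk_size(85_000_000))
```

The measured rate would come from the previous GET; for the very first request the floor is used because nothing has been measured yet.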
/Nikke - admin of ftp.acc.umu.se
-- Benjamin Zeller <bzeller@suse.de> Systems Programmer SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuremberg, Germany Tel: +49-911-74053-0; Fax: +49-911-7417755; https://www.suse.com/ (HRB 36809, AG Nürnberg) Managing Director: Felix Imendörffer -- To unsubscribe, e-mail: mirror+unsubscribe@opensuse.org To contact the owner, email: mirror+owner@opensuse.org
On Mon, 2 Mar 2020, Benjamin Zeller wrote:
Hi,
Sorry for top posting but I just subscribed to this ML for this conversation.
You're forgiven :-)
With metalink files we get the metalink description over https , which includes all checksums for the several chunks and then can use http connections as well because we can check if we really got what we asked for.
You also have the checksum for the entire file if I read correctly, so for smaller files it would make most sense to just get the entire file and verify the file checksum only.
<snip>
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
That would make sense. Most mirrors today support https, and it would be least surprising for users if a request initiated as https stayed that way.
You could get the metalink file over https and then get all data over http and just verify the checksum. For most users that are not concerned with privacy issues of http os package downloads this is good enough. For those that have privacy concerns there needs to be an end-to-end solution that stays on the https track.
<snap>
Not sure if we could drop Metalinks completely, however it makes the code much more complex supporting it and if it actually does more harm than good we should think about something else.
As to the suggestion to use dynamic or bigger chunks: The metalink description file we download at the beginning has the list of chunks included and they are fixed. We probably could try and download multiple chunks in the same request though.
Multiple chunks in the same request would indeed alleviate the OS read-ahead-is-not-used problem, from the OS point of view it would look like a bigger chunk being read.
Clumping chunks would also alleviate the artificial speed limit for those with high bandwidth connections that are hampered by the RTT of getting one chunk at a time...
However, I'd recommend doing some experiments on various clients and see what speeds you get with varying download strategies. I'm quite convinced that a lot of the mirrors are perfectly capable of pushing enough bandwidth to saturate whatever connection the downloading client has without resorting to parallel downloads, chunking, etc. Especially for bigger files and fast connections.
/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke@acc.umu.se --------------------------------------------------------------------------- "Chemistry is fun; it's a lot like witchcraft, only less newt." - Willow =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -- To unsubscribe, e-mail: mirror+unsubscribe@opensuse.org To contact the owner, email: mirror+owner@opensuse.org
On 3/2/20 4:32 PM, Niklas Edmundsson wrote:
On Mon, 2 Mar 2020, Benjamin Zeller wrote:
Hi,
Sorry for top posting but I just subscribed to this ML for this conversation.
You're forgiven :-)
Phew, it was a tough decision to do this ;).
With metalink files we get the metalink description over https , which includes all checksums for the several chunks and then can use http connections as well because we can check if we really got what we asked for.
You also have the checksum for the entire file if I read correctly, so for smaller files it would make most sense to just get the entire file and verify the file checksum only.
I'm not sure we always have a checksum for the entire file. Of course once we have downloaded the index files we have the checksums for the following downloads.
<snip>
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
That would make sense. Most mirrors today support https, and it would be least surprising for users if a request initiated as https stayed that way.
You could get the metalink file over https and then get all data over http and just verify the checksum. For most users that are not concerned with privacy issues of http os package downloads this is good enough. For those that have privacy concerns there needs to be an end-to-end solution that stays on the https track.
This is basically what happens now, the Metalink file is downloaded over https, then depending on the URLs that are listed in the Metalink file other means of connection can be used...
<snap>
Not sure if we could drop Metalinks completely, however it makes the code much more complex supporting it and if it actually does more harm than good we should think about something else.
As to the suggestion to use dynamic or bigger chunks: The metalink description file we download at the beginning has the list of chunks included and they are fixed. We probably could try and download multiple chunks in the same request though.
Multiple chunks in the same request would indeed alleviate the OS read-ahead-is-not-used problem, from the OS point of view it would look like a bigger chunk being read.
This would also be much more in sync with the planned zchunk download support.
Clumping chunks would also alleviate the artificial speed limit for those with high bandwidth connections that are hampered by the RTT of getting one chunk at a time...
I need to look at the code, maybe we could support that somehow, but it's a good point!
However, I'd recommend doing some experiments on various clients and see what speeds you get with varying download strategies. I'm quite convinced that a lot of the mirrors are perfectly capable of pushing enough bandwidth to saturate whatever connection the downloading client has without resorting to parallel downloads, chunking, etc. Especially for bigger files and fast connections.
Depending on the client's bandwidth that is probably right. But doing some more experiments is always a good idea...
-- Benjamin Zeller <bzeller@suse.de>
On Mon, 2 Mar 2020, Benjamin Zeller wrote:
Clumping chunks would also alleviate the artificial speed limit for those with high bandwidth connections that are hampered by the RTT of getting one chunk at a time...
I need to look at the code, maybe we could support that somehow, but it's a good point!
However, I'd recommend doing some experiments on various clients and see what speeds you get with varying download strategies. I'm quite convinced that a lot of the mirrors are perfectly capable of pushing enough bandwidth to saturate whatever connection the downloading client has without resorting to parallel downloads, chunking, etc. Especially for bigger files and fast connections.
Depending on the client's bandwidth that is probably right. But doing some more experiments is always a good idea...
I can actually take my home setup as an example of why I think the current scheme is lacking:
I have GigE at home, whee! Annoyance: the brilliant ISP has no peering in Umeå, so I have 17 ms RTT via Stockholm to ftp.acc.umu.se, which is my closest high-bandwidth mirror (if multiple 10GigE is considered high-bandwidth nowadays, that is).
Anyhow, if I do
wget -O /dev/null http://ftp.acc.umu.se/mirror/opensuse.org/tumbleweed/repo/oss/x86_64/kernel-...
I download that 77 MB file at a rate of approx 85 MB/s.
However, if I downloaded it using the current opensuse download method, in 256k chunks one at a time with 17 ms RTT, I'd get approx 59 chunks per second, or around 15 MB/s.
But that's assuming that the GET gives you a response with file contents. ftp.acc.umu.se redirects requests for large files to dedicated offload hosts, so you pay another 17 ms RTT and you're down to say 7 MB/s out of the 120 MB/s theoretical bandwidth.
Hence, my statement is that just doing plain http(s) GETs would likely land you with better performance for high-bandwidth connections. For really low-bandwidth connections the RTT penalty is drowned out by the actual data transfer times, so the performance hit will be smaller.
In all honesty, I have a real hard time figuring out use cases for when the chunked download strategy would be a performance gain for package downloads since you already have redirector in mirrorbrain to do a rough load balancing between mirrors... I could understand it for big .iso files and doing torrent-style downloads from multiple slow/overloaded mirrors, but that's another use case IMHO.
So, do some tests with various client setups and see where that takes you. Sometimes the best solution/default is actually the simplest, and my gut feeling is that this might be one of those times...
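The arithmetic behind those figures can be reproduced with a small Python sketch. The function `effective_rate` and its model are assumptions for illustration: each chunk costs a fixed number of round trips plus its transfer time at full link speed, and TCP slow start is ignored, which would only make small chunks look worse.

```python
def effective_rate(chunk_bytes, rtt_s, link_bytes_per_s, round_trips=1):
    """Bytes per second when chunks are fetched strictly one at a time."""
    time_per_chunk = round_trips * rtt_s + chunk_bytes / link_bytes_per_s
    return chunk_bytes / time_per_chunk


CHUNK = 256 * 1024   # chunk size from the logs
RTT = 0.017          # 17 ms, as in the example above
LINK = 120e6         # about 120 MB/s, the theoretical GigE figure quoted above

# Pure latency bound: one RTT per chunk, transfer time ignored (about 15 MB/s).
print(effective_rate(CHUNK, RTT, float("inf")) / 1e6)
# One RTT per chunk plus transfer time on the link (roughly 13.7 MB/s).
print(effective_rate(CHUNK, RTT, LINK) / 1e6)
# A redirect adds a second RTT per chunk (roughly 7 MB/s).
print(effective_rate(CHUNK, RTT, LINK, round_trips=2) / 1e6)
```

Raising `CHUNK` to a few megabytes in the same model shows how quickly the RTT penalty shrinks relative to the transfer time.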
/Nikke
Niklas Edmundsson wrote:
In all honesty, I have a real hard time figuring out use cases for when the chunked download strategy would be a performance gain for package downloads since you already have redirector in mirrorbrain to do a rough load balancing between mirrors...
My understanding is that the chunking is to enable optimal use of the client-side bandwidth. Much depends on the available mirrors and the client's connection. For instance, if I download the file you mentioned
wget -O /dev/null http://ftp.acc.umu.se/mirror/opensuse.org/tumbleweed/repo/oss/x86_64/kernel-...
I get slightly less than 10MB/sec. (also 1Gbit fibre) If that was spread over 5 mirrors and chunked, I would likely see a significantly faster download.
Of course, I wouldn't normally be given a mirror in Sweden, but we have countries that have no mirrors at all. If you are in Spain, your 1Gbit fibre won't do you much good because everything comes from Switzerland or Germany or Sweden. Or elsewhere:
For instance - again for the package above, a Vodafone user in Spain (81.203.0.0) might be given the following mirrors:
a) Cyprus
b) Iran
c) China
d) China
-- Per Jessen, Zürich (5.6°C) member, openSUSE Heroes.
On 3/3/20 8:07 AM, Per Jessen wrote:
Niklas Edmundsson wrote:
In all honesty, I have a real hard time figuring out use cases for when the chunked download strategy would be a performance gain for package downloads since you already have redirector in mirrorbrain to do a rough load balancing between mirrors...
My understanding is that the chunking is to enable optimal use of the client-side bandwidth. Much depends on the available mirrors and the client's connection. For instance, if I download the file you mentioned
wget -O /dev/null http://ftp.acc.umu.se/mirror/opensuse.org/tumbleweed/repo/oss/x86_64/kernel-...
I get slightly less than 10MB/sec. (also 1Gbit fibre) If that was spread over 5 mirrors and chunked, I would likely see a significantly faster download.
Of course, I wouldn't normally be given a mirror in Sweden, but we have countries that have no mirrors at all. If you are in Spain, your 1Gbit fibre won't do you much good because everything comes from Switzerland or Germany or Sweden. Or elsewhere:
For instance - again for the package above, a Vodafone user in Spain (81.203.0.0) might be given the following mirrors:
a) Cyprus b) Iran c) China d) China
I'd think that for those cases downloading multiple files at the same time would have a similar effect. For chunked downloads, too small a chunk size would still hurt your performance though, so maybe we can start by optimizing there and generate Metalink files with a much bigger chunk size?
-- Benjamin Zeller <bzeller@suse.de>
On Tue, 3 Mar 2020, Benjamin Zeller wrote:
I'd think that for those cases downloading multiple files at the same time would have a similar effect. For chunked downloads, too small a chunk size would still hurt your performance though, so maybe we can start by optimizing there and generate Metalink files with a much bigger chunk size?
I think the checksum block/chunking should be disconnected from the download chunking. Checksum chunk size might well vary over time, and optimal download chunk size might vary depending on connections and conditions, so decoupling them is likely the best way forward if the chunking is to be kept.
I'd say go for the easy way of downloading the file in parts that you see fit, and afterwards do the checksum.
/Nikke
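The decoupling suggested above could be sketched in Python as follows. The helper names `assemble` and `verify_pieces` are hypothetical, and SHA-1 per-piece hashes are only an assumption about what the metalink description carries; the point is that download parts of any size are assembled first and the fixed-size checksum pieces are verified afterwards.

```python
import hashlib


def assemble(total_size, parts):
    """Place downloaded parts, given as (offset, data) pairs in any order
    and of any size, into one buffer. Parts are assumed to lie within
    total_size and to cover the whole file."""
    buf = bytearray(total_size)
    for offset, data in parts:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


def verify_pieces(content, piece_size, piece_hashes, algo="sha1"):
    """Return the indices of checksum pieces that do not match."""
    bad = []
    for index, expected in enumerate(piece_hashes):
        piece = content[index * piece_size:(index + 1) * piece_size]
        if hashlib.new(algo, piece).hexdigest() != expected.lower():
            bad.append(index)
    return bad


# Demo with made-up data: 1 KiB checksum pieces, two unevenly sized download parts.
original = bytes(range(256)) * 10
hashes = [hashlib.sha1(original[i:i + 1024]).hexdigest()
          for i in range(0, len(original), 1024)]
parts = [(1500, original[1500:]), (0, original[:1500])]
content = assemble(len(original), parts)
print(verify_pieces(content, 1024, hashes))
```

If a piece fails, only the byte span of that piece needs to be fetched again, regardless of how the original download was split up.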
Benjamin Zeller wrote:
I have a new downloader implemented that would support multiple downloads in parallel but I suffer from the problem that mirrorbrains forwards me from HTTPS to HTTP if I disable the metalink downloads in favour of downloading full files only, curl errors out in that case.
I know we could disable that error but I'm not really sure that this is what we want. With metalink files we get the metalink description over https , which includes all checksums for the several chunks and then can use http connections as well because we can check if we really got what we asked for.
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
mirrorbrain does currently not support https. It simply does not know about it.
-- Per Jessen, Zürich (7.1°C) member, openSUSE Heroes
On 3/2/20 4:36 PM, Per Jessen wrote:
Benjamin Zeller wrote:
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
mirrorbrain does currently not support https. It simply does not know about it.
That explains, can we somehow fix or work around that? AFAIK when I request a file from opensuse.org I always get a redirect, so currently all we could do if we want to download full files is to allow redirects to HTTP ... which, as said, is not exactly optimal if we do not have a checksum for the file.
-- Benjamin Zeller <bzeller@suse.de>
Benjamin Zeller wrote:
On 3/2/20 4:36 PM, Per Jessen wrote:
Benjamin Zeller wrote:
Would be interesting to know here if we can configure mirrorbrains to just redirect to HTTPS if that's what the incoming connection is using.
mirrorbrain does currently not support https. It simply does not know about it.
That explains, can we somehow fix or work around that?
Basically someone would need to do some work on mirrorbrain itself, the code. It would be very nice to support https as well as ipv6.
-- Per Jessen, Zürich (4.8°C) member, openSUSE Heroes.
On 3/3/20 7:42 AM, Per Jessen wrote:
mirrorbrain does currently not support https. It simply does not know about it.
That explains, can we somehow fix or work around that?
Basically someone would need to do some work on mirrorbrain itself, the code. It would be very nice to support https as well as ipv6.
Or you could switch from mirrorbrain to mirrorbits, which does support both, and to which most of the rest of the world switched by now, due to the complete lack of any mirrorbrain development (or even merging of existing patches) for years now.
-- Michael Meier, FTP-Admin Friedrich-Alexander-Universitaet Erlangen-Nuernberg Regionales Rechenzentrum Erlangen Martensstrasse 1, 91058 Erlangen, Germany Tel.: +49 9131 85-28973, Fax: +49 9131 302941 rrze-ftp-admins@fau.de blogs.fau.de/ftp/
On Tue, 3 Mar 2020, Michael Meier (FTP-Admin) wrote:
Or you could switch from mirrorbrain to mirrorbits, which does support both, and to which most of the rest of the world switched by now, due to the complete lack of any mirrorbrain development (or even merging of existing patches) for years now.
That's my impression as well, it's likely less effort than trying to beat mirrorbrain into shape.
/Nikke
-- 
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke@acc.umu.se
Picard to his Singer repairman: Make it sew.
On 3/3/20 8:58 AM, Niklas Edmundsson wrote:
That's my impression as well, it's likely less effort than trying to beat mirrorbrain into shape.
/Nikke
From what I see, there is no metalink support in mirrorbits. That might be the reason why it has not been adopted yet.
-- 
Benjamin Zeller <bzeller@suse.de>
Systems Programmer
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuremberg, Germany
Tel: +49-911-74053-0; Fax: +49-911-7417755; https://www.suse.com/
(HRB 36809, AG Nürnberg) Managing Director: Felix Imendörffer
Benjamin Zeller wrote:
From what I see there is no metalink support in mirrorbits. Might be the reason why it was not adopted yet.
Personally I've never heard of mirrorbits, but I have also not had reason to look. Despite some minor flaws, mirrorbrain is actually running quite well :-)
-- 
Per Jessen, Zürich (7.2°C)
Member, openSUSE Heroes.
On Tue, 3 Mar 2020 08:58:00 +0100 (CET) Niklas Edmundsson <nikke@acc.umu.se> wrote:
That's my impression as well, it's likely less effort than trying to beat mirrorbrain into shape.
1. We looked at mirrorbits and also at MirrorManager from Fedora. Neither solves our main problem.
2. Actually, you want to turn off metalink support, or at least the partial fetching in zypper, to get rid of the suboptimal pattern that started this thread.
3. https://github.com/openSUSE/mirrorbrain/wiki/Roadmap
We have a working proof of concept for a much faster scanning process, which allows us to implement other things from the roadmap.
-- 
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
participants (5)
- Benjamin Zeller
- Marcus Rückert
- Michael Meier (FTP-Admin)
- Niklas Edmundsson
- Per Jessen