[opensuse-factory] Proposal for openSUSE zypper download cache support
Hi, in the light of current events, I would like to start discussing a scheme for improving regular Tumbleweed upgrade performance. Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.

I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relay the downloads to that system (server), if set
II) a zypper caching server that uses the zypper download lib, but implicitly keeps the downloads for later reuse.

If ZYPPERCACHE is set and reachable, all files must be fetched through it. If ZYPPERCACHE is set and not reachable, zypper may (interactively) warn about it and proceed as usual.

Such a scheme would greatly relieve the load on the public infrastructure and improve the upgrade performance of local systems significantly.

I'm already using a shared /var/cache/zypp/packages scheme, which improves the situation, but it suffers from a couple of issues: two zypper processes are treading on each other's feet, the repo definitions have to be synced carefully, it's hacky, etc...

I think that this great product deserves a decent solution to this problem. If it works well, other openSUSE distributions may profit as well.

Cheers, Pete
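[A minimal sketch of how (I) could look from the client side. Nothing in zypper/libzypp implements ZYPPERCACHE today; the host name, port and fallback behaviour below are pure assumptions for illustration.]

  # on every client in the LAN; cachebox.lan:8080 stands in for the
  # hypothetical caching server from (II)
  export ZYPPERCACHE=http://cachebox.lan:8080
  zypper ref && zypper dup
  # if cachebox.lan were unreachable, zypper would warn interactively
  # and fall back to the normal repo URLs, as proposed above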
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote: Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relay the downloads to that system (server), if set
II) a zypper caching server that uses the zypper download lib, but implicitly keeps the downloads for later reuse.

I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems would access it just through its proxy settings.

In the past I used wwwoffle to achieve such local caching, because it was very easy to set up. The systems on the LAN used it for the updates. But wwwoffle is gone from the official repos, squid seems too much for this purpose, and I found no proper replacement yet.

Regards, Dieter
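[For reference, pointing zypper at such a proxy needs no changes in zypper itself: libzypp downloads through libcurl, so the usual proxy settings apply. A minimal sketch; cachebox.lan:3128 is a placeholder proxy address.]

  # /etc/sysconfig/proxy on each client
  PROXY_ENABLED="yes"
  HTTP_PROXY="http://cachebox.lan:3128"
  NO_PROXY="localhost, 127.0.0.1"

  # or just for a single run
  http_proxy=http://cachebox.lan:3128 zypper dup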
On Saturday, 15 June 2019, 10:52:02 CEST, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse.
I fully support the idea of a local caching mechanism.
Great, thanks for the feedback, Dieter.
But I am not convinced it is a good solution to pack this functionality into zypper.
Obviously, I didn't express myself correctly. The idea was to either
* reuse the zypper download library
* or recreate it in some high-level programming language.
In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
In the past I used wwwoffle to achieve such local caching because it was very easy to setup. The systems on the LAN used it for the updates. But wwwoffle is gone from the official repos, squid seems too much for this purpose, and I found no proper replacement yet.
I even created a squid plugin for this purpose once, but with squid alone this isn't the real McCoy either, because intercepting SSL connections is no fun... But, given (I) is available, we could use squid as a proxy and point ZYPPERCACHE to a local http address.

BTW, what I don't like much about squid is the way it stores the cached items. I would much prefer a directory layout like the one zypper uses itself.

Cheers, Pete
Hans-Peter Jansen wrote:
I even created a squid plugin for this purpose once, but with squid alone, this isn't the real McCoy either, because intercepting SSL connections is no fun..
Very true, but there shouldn't be much SSL needed. Nothing worth storing, I would say. -- Per Jessen, Zürich (20.4°C)
On Saturday, 15 June 2019, 10:52:02 CEST, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse.
But I am not convinced it is a good solution to pack this functionality into zypper.
On Sat, 15 Jun 2019 12:11:32 +0200 Hans-Peter Jansen wrote:

Obviously, I didn't express myself correctly. The idea was to either
* reuse the zypper download library
* or recreate it in some high-level programming language.

Sorry, I also was not precise with my answer; I meant zypper or the zypper (download) library. I have to admit I am not familiar with the existing functionality of the zypper download library; it just seems to me that this would be quite a big piece of additional functionality, especially serving its local zypper cache directory to other hosts.
In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
I even created a squid plugin for this purpose once, but with squid alone this isn't the real McCoy either, because intercepting SSL connections is no fun... But, given (I) is available, we could use squid as a proxy, and point ZYPPERCACHE to a local http address. BTW, what I don't like much about squid is the way it stores the cached items. I would much prefer a directory layout like zypper is using itself.

In my experience up to now SSL interception is not necessary; the openSUSE download URLs are still accessible via the http protocol.
Regards, Dieter
On 15/06/2019 10.52, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse. I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
In the past I used wwwoffle to achieve such local caching because it was very easy to setup. The systems on the LAN used it for the updates. But wwwoffle is gone from the official repos, squid seems too much for this purpose, and I found no proper replacement yet.
The problem is that the actual download URL changes, depending on which mirror MirrorBrain considers best each time. This makes using a proxy server more difficult.

We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would, saves it locally in a structure that mimics the upstream directories, and serves the requests to the local LAN zyppers or yasts.

It would be good for anyone maintaining several machines.
-- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
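[The effect is easy to observe by asking download.opensuse.org for a package directly: it answers with a redirect, and the mirror in the Location header varies over time. The package path is a placeholder and the response line is only illustrative.]

  curl -sI http://download.opensuse.org/tumbleweed/repo/oss/x86_64/<some-package>.rpm | grep -i '^location'
  # Location: http://mirror.example.net/opensuse/tumbleweed/repo/oss/x86_64/<some-package>.rpm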
On 15.06.2019 13:46, Carlos E. R. wrote:
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts.
That is exactly what RMT (formerly SMT) does. I do not know whether it can serve arbitrary repositories from d.o.o, but the sources are there for anyone to extend. https://github.com/SUSE/rmt
On 15/06/2019 10.52, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse. I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
On Sat, 15 Jun 2019 12:46:14 +0200 Carlos E. R. wrote:

The problem is that the actual download URL changes, depending on what MirrorBrain answers each time is the best mirror. This makes using a proxy server more difficult.

Yes, you are right, the hit rate is not 100%. But when the updates of all systems happen within a limited time frame, most of the packages are still fetched from the same download URL and therefore downloaded only once and then taken from the proxy cache.
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts.

Agreed, a specialized solution would probably have the best possible hit rate, but it would need a lot of functionality which already exists elsewhere, e.g. some cleanup or age-out mechanism to delete packages which are no longer needed.
It would be good for anyone maintaining several machines.

Definitely.
Regards, Dieter
On 15/06/2019 18.08, dieter wrote:
On Sat, 15 Jun 2019 12:46:14 +0200 Carlos E. R. wrote:
On 15/06/2019 10.52, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse. I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
The problem is that the actual download URL changes, depending on what the MirrorBrain answers each time is the best mirror. This makes using a proxy server more difficult. Yes you are right, the hit rate is not 100%. But when the update of all systems happens within a limited time frame then most of the packages are still fetched from the same download URL and therefore downloaded only once and then taken from the proxy cache.
Not good enough. Take the kernel: it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing. And to be of use, if I download the same kernel in two weeks I want it to be cached. MirrorBrain could point me to 3 other mirrors and I would download it again 3 times more, not saving any resources. As it is, a download proxy would be worse than nothing.
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts. Agreed, a specialized solution would probably have the best possible hit rate, but it would need a lot of functionality which already exists elsewhere, e.g. some cleanup or age out mechanism to delete packages which are no longer needed.
Per managed to convince Squid to do the job, but he said it was not easy. He wrote about it somewhere.
It would be good for anyone maintaining several machines. Definitely.
-- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
On Sat, 15 Jun 2019 22:34:31 +0200 Carlos E. R. wrote:
On 15/06/2019 18.08, dieter wrote:
On Sat, 15 Jun 2019 12:46:14 +0200 Carlos E. R. wrote:
On 15/06/2019 10.52, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse. I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
The problem is that the actual download URL changes, depending on what the MirrorBrain answers each time is the best mirror. This makes using a proxy server more difficult. Yes you are right, the hit rate is not 100%. But when the update of all systems happens within a limited time frame then most of the packages are still fetched from the same download URL and therefore downloaded only once and then taken from the proxy cache.
Not good enough. Say the kernel, it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing.

I have the impression that this can be avoided by setting ZYPP_MULTICURL=0. At least I am absolutely sure that the amount of upstream data transfer was significantly less than the data transfer between the proxy system and the updating system once the packages were cached after updating another/the first system. For me this solution worked very well.
And to be of use, if I download the same kernel in two weeks I want it to be cached. I could get MirrorBrain pointing me to 3 other mirrors and downloading it again 3 times more, not saving any resources. As it is, a download proxy would be worse than nothing.

Maybe I was lucky and in my region there was one mirror with a significantly better rating than the others, so that this one was the main source chosen by MirrorBrain.
Regards, Dieter
dieter wrote:
On Sat, 15 Jun 2019 22:34:31 +0200 Carlos E. R. wrote:
On 15/06/2019 18.08, dieter wrote:
On Sat, 15 Jun 2019 12:46:14 +0200 Carlos E. R. wrote:
[snip]
Not good enough. Say the kernel, it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing.
I have the impression that this can be avoided by setting ZYPP_MULTICURL=0
Yes, I think this will prevent the chunked download, but with squid you would still be caching multiple mirrors. -- Per Jessen, Zürich (17.2°C) member, openSUSE Heroes
On 6/16/19 10:17 AM, Per Jessen wrote:
dieter wrote:
I have the impression that this can be avoided by setting ZYPP_MULTICURL=0
Yes, I think this will prevent the chunked download, but with squid you would still be caching multiple mirrors.
Is there documentation describing ZYPP_MULTICURL=0? I'm asking because over the last months I have been testing an Apache-based reverse proxy setup (mod_proxy, mod_cache). In this setup zypper does not use an HTTP proxy like squid; rather, the repo URLs point directly to the Apache server. I do this because of the reasons Per gave in [1], and I did not know Per's squid setup.

The Apache-based approach somewhat works, but the cache hit rate is less than optimal. Especially larger packages like the kernel do not get cached, although the relevant mod_cache directive is set quite high. So my hope is that ZYPP_MULTICURL=0 helps with that.

BTW: Having such an Apache setup also allows explicitly pulling different update repos from different upstream servers, like this:

ProxyPass /repositories https://download.opensuse.org/repositories
ProxyPass /tumbleweed https://ftp.uni-erlangen.de/opensuse/tumbleweed

Ciao, Michael.

[1] https://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid#T...
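[A minimal sketch of such a caching reverse-proxy vhost, assuming mod_proxy, mod_proxy_http, mod_ssl, mod_cache and mod_cache_disk are enabled; the server name, cache path and size limit are examples, not Michael's actual configuration.]

  <VirtualHost *:80>
      ServerName zyppercache.lan
      SSLProxyEngine on
      ProxyPass        /tumbleweed https://ftp.uni-erlangen.de/opensuse/tumbleweed
      ProxyPassReverse /tumbleweed https://ftp.uni-erlangen.de/opensuse/tumbleweed
      CacheEnable disk /tumbleweed
      CacheRoot /var/cache/apache2/zypp
      # must exceed the largest rpm you want cached (kernel, texlive, ...)
      CacheMaxFileSize 300000000
      CacheIgnoreNoLastMod On
  </VirtualHost>

Clients would then point their repo baseurl at http://zyppercache.lan/tumbleweed/... instead of a mirror.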
On 6/16/19 11:30 AM, Michael Ströder wrote:
On 6/16/19 10:17 AM, Per Jessen wrote:
dieter wrote:
I have the impression that this can be avoided by setting ZYPP_MULTICURL=0
Yes, I think this will prevent the chunked download, but with squid you would still be caching multiple mirrors.
Is there a documentation describing ZYPP_MULTICURL=0?
I found https://doc.opensuse.org/projects/libzypp/HEAD/zypp-envars.html but this text is a bit terse: "Turn off multicurl (metalink and zsync) and fall back to plain libcurl.". Ciao, Michael.
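[In other words, it is just an environment variable read by libzypp at runtime, so switching off the metalink/zsync path is a one-liner. Where to persist it is a local choice; /etc/environment below is only a suggestion and assumes root's sessions pick it up.]

  # single run
  ZYPP_MULTICURL=0 zypper dup

  # persistently (assumption: the login environment reads /etc/environment)
  echo 'ZYPP_MULTICURL=0' >> /etc/environment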
On 16/06/2019 15.23, Michael Ströder wrote:
On 6/16/19 11:30 AM, Michael Ströder wrote:
On 6/16/19 10:17 AM, Per Jessen wrote:
dieter wrote:
I have the impression that this can be avoided by setting ZYPP_MULTICURL=0
Yes, I think this will prevent the chunked download, but with squid you would still be caching multiple mirrors.
Is there a documentation describing ZYPP_MULTICURL=0?
I found https://doc.opensuse.org/projects/libzypp/HEAD/zypp-envars.html but this text is a bit terse: "Turn off multicurl (metalink and zsync) and fall back to plain libcurl.".
It is clear if you remember the history :-) -- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
On Sun, 16 Jun 2019 10:17:57 +0200 Per Jessen wrote:
dieter wrote:
On Sat, 15 Jun 2019 22:34:31 +0200 Carlos E. R. wrote:
On 15/06/2019 18.08, dieter wrote:
On Sat, 15 Jun 2019 12:46:14 +0200 Carlos E. R. wrote:
[snip]
Not good enough. Say the kernel, it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing.
I have the impression that this can be avoided by setting ZYPP_MULTICURL=0
Yes, I think this will prevent the chunked download, but with squid you would still be caching multiple mirrors.
A caching solution as suggested by Pete would be really great. But I am a fan of the UNIX philosophy "do one thing and do it well", and in this respect adding the caching/serving functionality to the zypper download library does not feel right to me.

I do not want to sell "my solution"; I am only describing what has worked for me so far. As I already mentioned, I did not use squid but wwwoffle. For me it was not relevant to actually see the cached rpms in the cache directory, as Pete would prefer. For Leap updates (my main use case) a single mirror can easily saturate my download link, so ZYPP_MULTICURL=0 is no limitation. This is also a difference to Pete's intended focus, Tumbleweed, where it could be more desirable to use several mirrors in parallel when everybody is downloading a new snapshot. Without setting ZYPP_MULTICURL=0, wwwoffle served corrupted RPMs.

Just some weeks ago I updated several installations, partly VMs, from Leap 15.0 to 15.1 via "zypper dup". I used a Leap 42.3 VM with wwwoffle (it was still part of the distro then) as caching proxy. These updates happened within a few days. The given numbers are from memory, so I may be wrong, but I checked them at the time because I was interested in the data and found them satisfying. The updated systems are not identical installations, but similar. According to zypper dup, the "Overall download size" varied between 1.7GB and 2.4GB. For the first update the ifconfig output of transferred data (RX/TX) in the 42.3 wwwoffle VM was very close to the amount predicted by zypper (slightly higher, possibly IP overhead and the like). For the following updates the TX amount was again close to the amount predicted by zypper (e.g. 2.3GB) and the RX amount was something like 380MB, which could be package differences (themes, fonts, ...) or repeated downloads of packages from different mirrors. For me it was good enough not to invest time to find out exactly. Also, watching the download process of zypper dup, most packages were received by the updating system at LAN speed, not at download speed. And looking now in my wwwoffle cache directory, which has not been purged yet: yes, not all packages were downloaded from the same mirror, but at least the bigger files I checked exist only once.

But this approach is dated: Leap 42.3 will be out of maintenance soon, and as soon as the openSUSE download links become https it will probably stop working anyway. When wwwoffle was removed with Leap 15.0 I checked whether squid could be a replacement for my use case, and decided it is way beyond what I need and not worth the hassle.

Regards, Dieter
Carlos E. R. wrote:
Not good enough. Say the kernel, it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing.
And to be of use, if I download the same kernel in two weeks I want it to be cached. I could get MirrorBrain pointing me to 3 other mirrors and downloading it again 3 times more, not saving any resources. As it is, a download proxy would be worse than nothing.
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts. Agreed, a specialized solution would probably have the best possible hit rate, but it would need a lot of functionality which already exists elsewhere, e.g. some cleanup or age out mechanism to delete packages which are no longer needed.
Per managed to convince Squid to do the job, but he said it was not easy. He wrote about it somewhere.
Yes, I still think my solution is a little more complex than it ought to be, but for anyone doing regular installations on a slow(ish) link, I think it's worth the hassle. Back then I saw a 60% improvement, and we were still on 100Mbit ethernet. -- Per Jessen, Zürich (17.4°C) https://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid
On Sunday, 16 June 2019, 10:12:42 CEST, Per Jessen wrote:
Carlos E. R. wrote:
Not good enough. Say the kernel, it is downloaded simultaneously from 3 mirrors. A proxy cache would end up storing 3 identical copies of the kernel, wasting space and download pipe. It does not recognize that they are the same thing.
And to be of use, if I download the same kernel in two weeks I want it to be cached. I could get MirrorBrain pointing me to 3 other mirrors and downloading it again 3 times more, not saving any resources. As it is, a download proxy would be worse than nothing.
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts.
Agreed, a specialized solution would probably have the best possible hit rate, but it would need a lot of functionality which already exists elsewhere, e.g. some cleanup or age out mechanism to delete packages which are no longer needed.
Per managed to convince Squid to do the job, but he said it was not easy. He wrote about it somewhere.
Yes, I still think my solution is a little more complex than it ought to be, but for anyone doing regular installations on a slow(ish) link, I think it's worth the hassle. Back then I saw a 60% improvement, and we were still on 100Mbit ethernet.
I need to analyse these processes again, but as far as I remember, the problem with this setup is that the more mirrors are active, the less effective it is. You end up with a lot of redundancy and will miss any connections via https (e.g. gwdg), which is a global trend.

I've attempted to remove some of the redundancy with https://github.com/frispete/squid_dedup where the idea is to relocate CDN URLs to some (internal) common name. Subsequent accesses will find the objects, no matter which CDN URL they use (given that all possible CDN URLs are configured correctly). The major unsolved issue here is https again.

Again, with the ZYPPERCACHE/ZYPPERRELAY idea, we would be able to raise the hit rate to 100% (after a package is downloaded once), no matter which proxy technology we finally use.

Cheers, Pete
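[For the curious: squid_dedup plugs into squid's StoreID interface. A toy shell version of the same idea might look roughly like the untested sketch below; the helper path and the internal key host zypp.dedup.internal are made up. As noted above, this only helps for plain http mirrors, since https traffic never reaches squid's cache without ssl-bump.]

  # squid.conf
  store_id_program /usr/local/bin/zypp-storeid.sh
  store_id_children 5 startup=1
  refresh_pattern -i \.rpm$ 10080 90% 43200

  #!/bin/sh
  # /usr/local/bin/zypp-storeid.sh
  # map any mirror URL for the same repo path to one internal key,
  # so squid caches each rpm only once regardless of the mirror
  while read url rest; do
      key=$(printf '%s\n' "$url" | sed -nE \
          's!^https?://[^/]+/(.*/)?((tumbleweed|distribution|update|repositories)/.+)!http://zypp.dedup.internal/\2!p')
      if [ -n "$key" ]; then
          echo "OK store-id=$key"
      else
          echo "ERR"
      fi
  done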
On 16.06.2019 13:32, Hans-Peter Jansen wrote:
Again, with the ZYPPERCACHE/ZYPPERRELAY idea, we would be able to raise the hit rate to 100% (after downloaded once), no matter which proxy technology we finally use.
There is the --keep-packages option; is it not sufficient?
On 16/06/2019 12.41, Andrei Borzenkov wrote:
On 16.06.2019 13:32, Hans-Peter Jansen wrote:
Again, with the ZYPPERCACHE/ZYPPERRELAY idea, we would be able to raise the hit rate to 100% (after downloaded once), no matter which proxy technology we finally use.
There is --keep-packages option; is it not sufficient?
That is what I use (keep packages and share /var/cache/zypp/packages via NFS), but it has issues - Hans-Peter Jansen mentioned them:
* You cannot update two systems at the same time; there can be crashes.
* You cannot have any install not using --keep-packages while sharing, because then the entire cache is deleted.
* You need to take care that all distributions use the exact same "alias" name for the repository, or use symlinks when not.
* It is a hack, and as such gets ignored on bugzilla.
* Some zypper operations can delete the cache, such as a repo name change.
-- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
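[For reference, this is roughly what the shared-cache hack looks like on each client. The repo alias is an example, and the per-alias cache directory is exactly why the shared machines must agree on alias names, as noted in the list above.]

  # keep downloaded rpms instead of deleting them after install
  zypper mr --keep-packages --all
  # equivalent per-repo switch in /etc/zypp/repos.d/<alias>.repo:
  #   keeppackages=1
  # packages then accumulate under /var/cache/zypp/packages/<alias>/,
  # which is the directory shared (e.g. via NFS) between the machines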
Hans-Peter Jansen wrote:
Am Sonntag, 16. Juni 2019, 10:12:42 CEST schrieb Per Jessen:
Carlos E. R. wrote:
Per managed to convince Squid to do the job, but he said it was not easy. He wrote about it somewhere.
Yes, I still think my solution is a little more complex than it ought to be, but for anyone doing regular installations on a slow(ish) link, I think it's worth the hassle. Back then I saw a 60% improvement, and we were still on 100Mbit ethernet.
I need to analyse these processes again, but as far as I remember, the problem with this setup is that the more mirrors are active, the less effective it is. You end up with a lot of redundancy and will miss any connections via https (e.g. gwdg), which is a global trend.
I wonder if you may have misunderstood my solution. (Apologies for going off-topic, maybe this is better placed elsewhere.)

Redundancy - yes, when a complete file is fetched (to get it cached), that is redundant. With a clean cache, that does lead to up to twice the amount of traffic. On install #2 you will recoup that, and any subsequent install is for free.

Less effective? No, I don't think so. All mirrors are mapped to one, never mind how many there might be. How do you see it becoming less effective?

https - well, as long as our mirror infrastructure doesn't support https, it doesn't matter much what the global trend might be :-)
I've attempted to remove some redundancy with: https://github.com/frispete/squid_dedup
where the idea is to relocate CDN URLs to some (internal) common name. Subsequent accesses will find the objects, no matter which CDN URL they use (given, all possible CDN URLs are configured correctly).
So, exactly what I do? -- Per Jessen, Zürich (18.4°C) https://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid
Carlos E. R. wrote:
The problem is that the actual download URL changes, depending on what the MirrorBrain answers each time is the best mirror. This makes using a proxy server more difficult.
We would need some kind of special proxy cache that gets the requests from zypper directly, then does the download in the same manner that zypper would do it, save it locally in a structure that mimics the upstream directories, and serve the requests to the local LAN zyppers or yasts.
AFAIR, I have mentioned this to Hans-Peter already, but just in case - https://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid

My company has since been running a public openSUSE mirror, but when we used the above, it worked really well. The squid setup is still running - once it is set up, no maintenance is required.
-- Per Jessen, Zürich (21.9°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 15/06/2019 20:16, Carlos E. R. wrote:
On 15/06/2019 10.52, dieter wrote:
On Sat, 15 Jun 2019 09:06:18 +0200 Hans-Peter Jansen wrote:
Hi,
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse. I fully support the idea of a local caching mechanism.
But I am not convinced it is a good solution to pack this functionality into zypper. In principle "some" local caching (http) proxy on one of the systems would be sufficient for this purpose, and zypper on the different systems accesses it just by proxy settings.
In the past I used wwwoffle to achieve such local caching because it was very easy to setup. The systems on the LAN used it for the updates. But wwwoffle is gone from the official repos, squid seems too much for this purpose, and I found no proper replacement yet.
The problem is that the actual download URL changes, depending on what the MirrorBrain answers each time is the best mirror. This makes using a proxy server more difficult.
A solution could be to just pick the best local mirror; for years I had my ISP's mirror hard-coded rather than the official ones, because data from there didn't count toward my usage quota.
-- Simon Lees (Simotek) http://simotek.net Emergency Update Team keybase.io/simotek SUSE Linux Adelaide Australia, UTC+10:30 GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
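[A hedged example of hard-coding one mirror for the Tumbleweed OSS repo; the repo alias and the mirror host are placeholders, and the path assumes the usual /tumbleweed/repo/oss/ layout.]

  # replace the MirrorBrain-backed URL with a fixed mirror
  zypper rr repo-oss
  zypper ar -f http://mirror.example.net/opensuse/tumbleweed/repo/oss/ repo-oss
  zypper ref repo-oss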
On 15.06.2019 at 09:06, Hans-Peter Jansen wrote:
Hi,
in the light of current events, I would like to start to discuss a scheme for improving the regular Tumbleweed upgrade performance.
Given the following scenario: a local LAN with a couple of Tumbleweed installations, and optionally a server.
I) a modification of the zypper download lib to honor an environment variable, e.g. ZYPPERCACHE, and relaying the downloads to that system (server), if set II) a zypper caching server, that uses the zypper download lib, but implicitly keeps the downloads for later reuse.
If ZYPPERCACHE is set and reachable, all files must be fetched through it. If ZYPPERCACHE is set and not reachable, zypper may (interactively) warn about it and proceed as usual.
Such a scheme would greatly relieve the load on the public infrastructure, and improve the upgrade performance of local systems significantly.
I'm already using a shared /var/cache/zypp/packages scheme, which improves the situation, but it suffers from a couple of issues: two zypper processes are treading on each other's feet, the repo definitions have to be synced carefully, it's hacky, etc...
I think that this great product deserves a decent solution to this problem. If it works well, other openSUSE distributions may profit as well.
Cheers, Pete
I'd like to remind everyone about this: https://lizards.opensuse.org/2019/04/03/experimental-opensuse-mirror-via-ipf... It's currently limited to only the main Tumbleweed OSS repo and sometimes data is not immediately available, but if further developed and improved I think it could be an interesting alternative.
participants (8)
- Adam Mizerski
- Andrei Borzenkov
- Carlos E. R.
- dieter
- Hans-Peter Jansen
- Michael Ströder
- Per Jessen
- Simon Lees