Feature changed by: Jim Henderson (hendersj)
Feature #306896, revision 27
Title: Zypp-proxy - A proxy cache server for zypper updates

Hackweek IV: Unconfirmed
Priority
Requester: Important

Requested by: Alex Tsariounov (tsariounov)
Product Manager: Federico Lucifredi (flucifredi)
Developer: Alex Tsariounov (tsariounov)
Partner organization: openSUSE.org

Description:
This project will create zypp-proxy, a proxy server that caches update packages for machines on the local network. It runs on a locally designated host machine that acts as a proxy to the openSUSE updates repositories. The project is similar in function and requirements to the Debian project apt-proxy, details of which can be found here: http://apt-proxy.sourceforge.net/

This project is useful for those who run many local (both physical and virtual) openSUSE machines and like to keep them up to date, but do not wish to waste bandwidth downloading the same updates over and over to all local machines, whether keeping existing machines up to date or building new machines and having to re-download all updates yet again.

For some people, simply mirroring the entire openSUSE updates repository is sufficient to provide local network updates; however, for most, since they do not use nearly as many packages as that repository provides, doing this simply wastes disk space. These people will find zypp-proxy most useful.

Both server and client setup will be quite simple. The server will use the public openSUSE updates repositories to check for updates. The clients will point to the local server (the proxy) machine for updates rather than the public servers. When a client requests an update, the zypp-proxy server first checks whether the public server has a more up-to-date package than what it has cached locally. If the public server doesn't, then zypp-proxy serves the locally cached package.
If the public server does have a more up-to-date package, then zypp-proxy first downloads it to its local cache and then serves it to the client. How many old versions of packages to keep will be configurable.

The first implementation will support openSUSE 11.1 only, with support for other openSUSE releases following suit.

Discussion:

#3: Alex Tsariounov (tsariounov) (2009-07-16 18:25:40)
I believe, and I could be wrong, that SMT actually creates a complete mirror of the updates repo locally. This may be OK for a datacenter SLES customer or install, but since openSUSE's repos are so much bigger, this will trade network bandwidth wastage for disk space wastage. Neither one sits well with the primary target of openSUSE, who is the Linux enthusiast. Secondarily, SMT's name is "Subscription Management Tool"; for openSUSE there are no subscriptions, so the name becomes misleading. Third, SMT is built and installed as an Add-on product; this complexity is not needed for a simple proxy server. A simple rpm install is all that should happen.

Having said that, perhaps there is some code that can be shared. Does SMT use libzypp? I was planning on using libzypp and hence implementing zypp-proxy in either C++ or Python. Python is preferred, but I don't know the status of libzypp's Python bindings. Perhaps SMT could stand some modifications so that it does not create a complete mirror of the updates repository, but only mirrors the updates that are actually used?

#9: Peter Bowen (pzb) (2009-07-19 09:04:00) (reply to #3)
How would this be different than just squid? If you are only opportunistically (or passively) caching the data, then this seems just like a normal HTTP cache.

#10: Alex Tsariounov (tsariounov) (2009-07-19 22:27:37) (reply to #9)
There are many reasons why zypp-proxy is different from squid. Most of them hinge on the fact that zypp-proxy understands packages.

First, squid caches all HTTP objects, not just packages.
If you clean out the cache for privacy, you'll lose your package cache. Zypp-proxy caches only packages, so there's no need to clean out the cache; it keeps it clean automatically as per the next item.

Second, since squid does not know anything about packages, you cannot keep, for example, the last 3 versions of packages in the cache. Zypp-proxy does that automatically; it should default to keeping the last 3 versions, but you can set that as a configurable to keep only the latest version or the last 10 versions around. You can potentially also do things like freeze a package, a set of packages, or even a pattern at a specific version level, or a pattern of version levels. This last bit is out of scope for this hack week project though.

Third, squid is hard to set up. How do you specify how much disk space to use, how often to clean out the cache, what to cache, etc.? Zypp-proxy's goal is to be a zero-conf app in that you will only need to install it and start using it. It can be such because its purpose is so specific, unlike squid's.

#4: Federico Lucifredi (flucifredi) (2009-07-17 21:43:15)
We have considered and are planning to open up SMT more to the community, and as such to be able to leverage it for openSUSE as well. SMT has always been entirely GPL, so there are no licensing issues at all. SMT-11 has mirror filtering, so the full-repo question is no longer relevant.

A proxy re-implementation from scratch is a waste of time, to be perfectly honest, and certainly one that we as Novell should not spend time on. If you want to work on a cache for openSUSE, you should really speak to the SMT team on how best to contribute to make SMT useful for the community distribution as well. Duncan is probably your best bet for guidance there.

#5: Alex Tsariounov (tsariounov) (2009-07-17 23:01:51)
Hi Federico, I have a couple of questions for you.

How does "mirror filtering" work?
What I have in mind for zypp-proxy is that only updates that are actually used by clients are cached. This minimizes disk usage. This also has the nice property of being automatically configured, so, for example, the admin does not have to set up any kind of "mirroring rules" for the server.

How are you going to address that SMT stands for "Subscription," and on openSUSE there are no subscriptions? This will create user confusion.

Are you going to remove the burden of SMT being an Add-on product? IMHO, there really is no need to go to that extent to install a caching server. Simply making the package (in the case of zypp-proxy it would only be one package), or a pattern of packages if you use more than one package, as you do for SMT, would be sufficient to install the server. For example, if I want to install squid, I simply say "# zypper in squid", that's all, and possibly squid is more complex than SMT, and for sure it is more complex than what zypp-proxy would be.

I have waited for a long time for a caching updates server to become available for openSUSE. This type of function is fundamental to a distro, and I am somewhat confused that it still does not exist. Apt-proxy was in Debian from the beginning because there was and continues to be a need for it. The same is true of openSUSE. Even yum has a caching mechanism for Fedora. Just search online for others looking for this functionality on openSUSE; you will find a lot of emails, just as I did.

I think SMT is a very nice addition for our SLES/SLED product lines. However, the zypp-proxy project is my itch and I do not see how SMT can solve it until I have an understanding of the questions I posed above.

Thanks.

#6: Federico Lucifredi (flucifredi) (2009-07-17 23:49:27) (reply to #5)
Alex, I cannot stop you from creating more duplication, that's the way the community works - but to do so internally, with Novell-sponsored time, itch or not, is simply nonsense.
I would *strongly* encourage that you use your ITO for something actually useful, and since Duncan wants to get community involvement in SMT, that would be something where you can scratch your itch in a constructive way.

The naming is a minor question. Packaging SMT so that it can be used for openSUSE as well, that is the interesting bit we need to tackle. Marketing or naming is not a valid reason to start something else.

Filtering works by selecting patterns or severity levels for what needs to be mirrored. If you want to look into automating selection of dependencies, that may be interesting as well -- if you can make it happen.

Proxy caches are fundamental to a distro used in production. As a company, we try to have the distros used in production be our paid-for offering, since the business unit both you and I work for still has to break even. That is why SMT for the openSUSE community has been something that has had to wait... but if you want to help on this topic, we can definitely use a hand!

#7: Federico Lucifredi (flucifredi) (2009-07-17 23:50:24) (reply to #6)
To be clear, "select patterns" means selecting *name* patterns, not zypper patterns.

#8: Alex Tsariounov (tsariounov) (2009-07-18 00:50:08) (reply to #6)
Seems that the wind has gone out of the zypp-proxy sails.

However, I don't see that SMT's mirror filtering is close to the cache-proxy model. I suppose I don't see the use case. The use case for the cache-proxy is as follows: I have two identical virtual machines and a fresh proxy server. I update one of the VMs and all the updates get cached. I update the next virtual machine and no external network bandwidth gets used, and so on. A configurable on the server sets how many old versions of packages survive the periodic cleanup thread.

Do you have a preliminary schedule for the openSUSE release of SMT? Would your team be open to implementation of the cache-proxy model?
And, finally, Duncan, do you have a git tree somewhere with the SMT code so I can take a peek?

Thanks.

#12: Ján Kupec (jkupec) (2009-07-20 15:45:21) (reply to #8)
Hi Alex, you're right that current SMT repo filtering does not suit your use case. It still mirrors the full repository and creates a new, filtered one, based on the admin's current update selections. I agree that this is not very useful for small home networks or a few virtual machines.

Maybe SMT could be improved to set up filters automatically based on packages on clients, and avoid mirroring the unneeded packages. Also, SMT does not keep older versions, but this would also be a very nice addition to SMT! Do you have some ideas on how to do the caching that you could share with us? E.g. how would the updates be published to clients, how would they be installed, what about the differences between individual clients?

BTW, I would not worry about changing the naming and packaging of SMT to fit openSUSE. Based on what I know from colleagues, I believe we're all open to this. After all, SMT (the enterprise repo caching thingy which wants to talk to NCC) can be just a layer on top of this openSUSE thingy, for example.

I, for instance, planned to pull the repo mirroring and filtering code out of SMT and make a GPL Perl library out of it during the hack week (I plan to put it on git.opensuse.org). Maybe we should join forces.

#13: Alex Tsariounov (tsariounov) (2009-07-20 19:53:35) (reply to #12)
Hi Jan, it does seem that mirroring will not fill the bill. One could set up such automatic filtering; however, proxy-caching is much simpler. The basic idea is to set up a proxy for updates. This simplifies a number of things, and the server simply caches whatever packages it has to serve.

One sets up the clients to point to the local proxy server instead of http://download.opensuse.org/update/11.1/, and the server points to the real update site. The server then:

1. On a request for available packages from a client, forwards the package list that the server downloads from the updates site.

2. On a request for a set of packages, first checks that these packages exist in its cache. If they do not, then the packages are downloaded from the update site and cached by the server. The set of packages is then sent to the client.

3. Periodically, a thread or process runs on the server and "cleans" out the cache. It does this by making a list of packages cached and their versions. Any version that is "X" or more versions older than the current version gets deleted. I was thinking that 3 would be the default for "X"; however, perhaps 2 makes a better default.

This strategy is simplistic; however, it's all one really needs. If there are, for example, two clients with completely different packages installed, then the server will simply cache all updates for both clients. This will increase disk space requirements, but that's OK, because you need all those packages. The overall goal for the proxy-caching server is to reduce network bandwidth by removing the need to re-download the same stuff over and over again.

I would think that libzypp and libsatsolver would have a lot of functions that can be made use of profitably in implementing such a proxy server. I'm not familiar with any language bindings for libzypp. Satsolver seems to have a nice set of bindings, however.

#11: Alex Tsariounov (tsariounov) (2009-07-19 22:56:10) (reply to #6)
Federico, I do not see zypp-proxy as duplication. But even if it is, we have a number of projects in SUSE that "duplicate" each other to some extent, and that's OK since they usually cater to different audiences. The audience for zypp-proxy is different from that for SMT. SMT caters toward the enterprise subscription customer. Zypp-proxy caters more toward the individual user and developer.

Zypp-proxy is different enough from SMT to be very useful indeed, and certainly it is not "nonsense," as you say.
Just look at Debian's apt-proxy; just look at people asking for it online and being puzzled why it's not available and why no one is working on it. Why did you not set up SMT from the beginning with this type of functionality? After all, the need was known a long time ago.

Naming is actually an important question; it is not minor. And while naming or marketing may not be a valid reason for starting something else, a technical reason usually is, at least for engineering. So far, you have not shown that SMT, even for the public openSUSE release, will contain the functionality that I described for zypp-proxy.

#14: Alex Tsariounov (tsariounov) (2009-07-22 18:43:36)
I have other Novell customer commitments that are taking my time during hack week, so I will not be participating in it. Thanks.

#15: Peter Poeml (poeml) (2009-07-29 20:29:47)
Sounds basically like IntelligentMirror (https://fedorahosted.org/intelligentmirror/wiki/IntelligentMirror).

Yes, this is something that we lack for openSUSE. It would be very good to have it in the future.

And yes, HTTP 1.1 caching semantics are perfectly fine for this purpose.

And no, SMT server is something completely different. (It's just a mirror.)

And yes, this can also be done as a Squid plugin (called a "redirector" in Squid lingo). I know somebody who has worked on a metalink redirector for Squid, and it might be good to connect with him, because our downloads will be metalink-based beginning with 11.2.

#16: Rudi Pittman (famewolf) (2009-10-19 06:56:53)
I would like to know how this can be done with squid/squirm. I installed intelligentmirror in squid with yum and it appeared to work, but it always considered the package "new".

#17: Carlos Robinson (robin_listas) (2014-04-16 01:17:57)
I would find this feature quite useful.
About squid, see this link: http://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid but it is quite complicated to set up.
2014-04-16 01:15:50

#18: Jim Henderson (hendersj) (2016-10-14 23:58:23)
I realize this is a very old request, but this is something I'm going to have a need for myself starting fairly shortly. I just started looking into using Squid for this purpose myself - download speed isn't a problem, but my ISP has announced they're introducing bandwidth caps shortly, and as I have several systems that I update locally, I'd prefer to only download each update once. rsync mirroring isn't practical, since I don't install all packages, and downloading the full openSUSE 42.1 repo set, Packman, and other repos I use (some of which, like Chrome, seem more difficult to mirror) would likely take more data than opportunistic caching of packages. The use case here is different from what SMT would provide (which basically is full repo mirrors, which we can do today with rsync anyways).

Changing ISPs is not really an option - there's no other provider where I live that can deliver the speeds I get with my current provider. So I have to live with a bandwidth cap and manage my traffic more proactively.

--
openSUSE Feature: https://features.opensuse.org/306896
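
Illustration (not part of the original thread): the fetch-on-miss and periodic cleanup behaviour described in comment #13 (steps 2 and 3) could be sketched roughly as below. This is only a minimal in-memory sketch under stated assumptions, not an implementation of zypp-proxy. The names PackageCache, parse_version and fake_fetch are hypothetical, the upstream download is an injected function rather than real HTTP or libzypp code, and the version ordering is a simplified stand-in for real RPM version comparison. Forwarding the package list (step 1) and checking upstream for newer packages are not modelled.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Ordering key built from the integer runs in a version string.

    Assumption: this is a simplification; real RPM version comparison
    has more rules (epochs, letters, tildes) than this sketch handles.
    """
    return tuple(int(part) for part in re.findall(r"\d+", text))


class PackageCache:
    """In-memory stand-in for the proxy's on-disk package cache."""

    def __init__(self, fetch: Callable[[str, str], bytes], keep: int = 3) -> None:
        self._fetch = fetch  # hypothetical upstream download function
        self._keep = keep    # versions of each package that survive cleanup
        self._store: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def get(self, name: str, version: str) -> bytes:
        """Serve from the cache, downloading from upstream only on a miss."""
        versions = self._store[name]
        if version not in versions:
            versions[version] = self._fetch(name, version)
        return versions[version]

    def prune(self) -> None:
        """Drop all but the newest `keep` cached versions of each package."""
        for versions in self._store.values():
            newest_first = sorted(versions, key=parse_version, reverse=True)
            for old in newest_first[self._keep:]:
                del versions[old]

    def cached_versions(self, name: str) -> List[str]:
        """Cached versions of one package, oldest first."""
        return sorted(self._store.get(name, {}), key=parse_version)


# Usage: two clients asking for the same update cause one download.
downloads = []


def fake_fetch(name: str, version: str) -> bytes:
    downloads.append((name, version))  # record each upstream download
    return f"{name}-{version}.rpm".encode()


cache = PackageCache(fake_fetch, keep=2)
for v in ("1.0-1", "1.0-2", "1.0-10"):
    cache.get("zypper", v)
cache.get("zypper", "1.0-10")  # second client: served from the cache
cache.prune()                  # the periodic cleanup from step 3
```

Injecting the download function keeps the sketch runnable without network access; a real server would replace it with a request to the update site and store packages on disk.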