Mailinglist Archive: opensuse-features (28 mails)

[openFATE 306896] Zypp-proxy - A proxy cache server for zypper updates
Feature changed by: Carlos Robinson (robin_listas)
Feature #306896, revision 26
Title: Zypp-proxy - A proxy cache server for zypper updates

Hackweek IV: Unconfirmed
Priority
Requester: Important

Requested by: Alex Tsariounov (tsariounov)
Product Manager: Federico Lucifredi (flucifredi)
Developer: Alex Tsariounov (tsariounov)
Partner organization: openSUSE.org

Description:
This project will create zypp-proxy which is a server proxy used for
caching update packages that are used by machines on the local network
on a locally designated host machine that acts as a proxy to the
openSUSE updates repositories.  The project is similar in function and
requirements to the Debian project apt-proxy, details of which can be
found here: http://apt-proxy.sourceforge.net/
This project is useful for those who run many local (both physical and
virtual) openSUSE machines and like to keep them up to date with
updates; however, they do not wish to waste bandwidth downloading
the same updates over and over to all local machines, whether when
keeping existing machines up to date or when building new machines and
having to re-download all updates yet again.
For some people, simply mirroring the entire openSUSE updates
repository is sufficient to provide local network updates; however, for
most, since they do not use nearly as many packages as that repository
provides, doing this simply wastes disk space.  These people will find
zypp-proxy most useful.
Both server and client setup will be quite simple.  The server will use
the public openSUSE updates repositories to check for updates.  The
clients will point to the local server (the proxy) machine for updates
rather than the public servers.
When a client requests an update, the zypp-proxy server first checks if
the public server has a more up to date package than what it has cached
locally.  If the public server doesn't, then zypp-proxy serves the
locally cached package.  If the public server does have a more up to
date package, then zypp-proxy first downloads it to its local cache and
then serves it to the client.  How many old versions of packages to
keep will be configurable.  The first implementation will support
openSUSE 11.1 only, with support for other openSUSE releases following
suit.
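As an illustration only (not part of the original proposal), the
decision described above could be sketched in Python roughly as
follows.  The function names are hypothetical, and the numeric version
ordering is a deliberate simplification; real RPM version comparison
has more rules than this.

```python
import re


def version_key(version):
    """Simplified ordering key: the numeric fields of a version string.

    Assumption for this sketch only; it is not RPM's comparison algorithm.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def choose_action(cached_version, upstream_version):
    """Decide how the proxy answers a request for one package."""
    if cached_version is None:
        return "download"      # nothing cached yet
    if version_key(upstream_version) > version_key(cached_version):
        return "download"      # upstream is newer: refresh the cache first
    return "serve_cached"      # cache is current: no external traffic


if __name__ == "__main__":
    print(choose_action("1.0-2", "1.0-2"))
    print(choose_action("1.0-2", "1.0-3"))
```

In both "download" cases the package would be stored in the local cache
before being sent to the client, so the next request for it stays on
the local network.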

Discussion:
#3: Alex Tsariounov (tsariounov) (2009-07-16 18:25:40)
I believe, and I could be wrong, that SMT actually creates a complete
mirror of the updates repo locally.  This may be ok for a datacenter
SLES customer or install, but since openSUSE's repos are so much
bigger, this will trade network bandwidth wastage for disk space
wastage.  Either one does not sit well with the primary target of
openSUSE who is the Linux enthusiast. Secondarily, SMT's name is
"Subscription Management Tool", for openSUSE there are no
subscriptions, so the name becomes misleading.
Third, SMT is built and installed as an Add-on product, this complexity
is not needed for a simple proxy server. A simple rpm install is all
that should happen.
Having said that, perhaps there is some code that can be shared.  Does
SMT use libzypp?  I was planning on using libzypp and hence
implementing zypp-proxy in either C++ or python.  Python is preferred
but I don't know the status of libzypp's python bindings.
Perhaps SMT can stand some modifications to not create a complete
mirror of the updates repository, but only mirror the updates that are
actually used?

#9: Peter Bowen (pzb) (2009-07-19 09:04:00) (reply to #3)
How would this be different than just squid? If you are only
opportunistically (or passively) caching the data, then this seems just
like a normal HTTP cache.

#10: Alex Tsariounov (tsariounov) (2009-07-19 22:27:37) (reply to #9)
There are many reasons why zypp-proxy is different from squid. Most of
them hinge on the fact that zypp-proxy understands packages.
First, squid caches all HTTP objects, not just packages.  If you clean
out the cache for privacy, you'll lose your package cache.  Zypp-proxy
caches only packages, so there's no need to clean out the cache; it
keeps it clean automatically as per the next item.
Second, since squid does not know anything about packages, you cannot
keep, for example, the last 3 versions of packages in the cache.
Zypp-proxy does that automatically; it should default to keeping the
last 3 versions, but you can set that as a configurable to keep only
the latest version or the last 10 versions around.  You can potentially
also do things like freeze a package, a set of packages, or even a
pattern at a specific version level, or a pattern of version
levels.  This last bit is out of scope for this hack week project
though.
Third, squid is hard to set up.  How do you specify how much disk space
to use, how often to clean out the cache, what to cache, etc?
Zypp-proxy's goal is to be a zero-conf app in that you will only need
to install it and start using it.  It can be such because its purpose
is so specific, unlike squid.

#4: Federico Lucifredi (flucifredi) (2009-07-17 21:43:15)
We have considered and are planning to open up SMT more to the
community, and as such to be able to leverage it for openSUSE as
well.  SMT has always been entirely GPL, so there are no licensing
issues at all.
SMT-11 has mirror filtering, so the full-repo question is no longer
relevant.
A proxy re-implementation from scratch is a waste of time, to be
perfectly honest, and certainly one that we as Novell should not spend
time on. If you want to work on a cache for openSUSE, you should really
speak to the SMT team on how to best contribute to make SMT useful for
the community distribution as well. Duncan is probably your best bet
for guidance there.

#5: Alex Tsariounov (tsariounov) (2009-07-17 23:01:51)
Hi Federico,
I have a couple of questions for you.
How does "mirror filtering" work?  What I have in mind for zypp-proxy
is that only updates that are actually used by clients are
cached.  This minimizes disk usage.  This also has the nice property
of having an automatic configuration, so for example, the admin does
not have to set up any kind of "mirroring rules" for the server.
How are you going to address that SMT stands for "Subscription," and on
openSUSE there are no subscriptions?  This will create user confusion.
Are you going to remove the burden of SMT being an Add-on
product?  IMHO, there really is no need to go to that extent to install
a caching server.  Simply making the package (in the case of zypp-proxy
it would only be one package), or a pattern of packages if you use more
than one package, as you do for SMT, would be sufficient to install the
server.  For example, if I want to install squid, I simply say "#
zypper in squid", that's all, and possibly squid is more complex than
SMT, and for sure it is more complex than what zypp-proxy would be.
I have waited for a long time for a caching updates server to become
available for openSUSE.  This type of function is fundamental to a
distro, and I am confused somewhat that it still does not exist.
Apt-proxy was in Debian from the beginning because there was and
continues to be a need for it.  The same with openSUSE.  Even yum has a
caching mechanism for Fedora.  Just search online for others looking
for this functionality on openSUSE; you will find a lot of emails, just
as I did.
I think SMT is a very nice addition for our SLES/SLED product
lines.  However, the zypp-proxy project is my itch and I do not see how
SMT can solve it until I have an understanding of the questions I posed
above.
Thanks.

#6: Federico Lucifredi (flucifredi) (2009-07-17 23:49:27) (reply to
#5)
Alex, I cannot stop you from creating more duplication, that's the way the
community works - but to do so internally, with Novell-sponsored time,
itch or not, is simply nonsense. I would *strongly* encourage that you
use your ITO for something actually useful, and since Duncan wants to
get community involvement in SMT, that would be something where you can
scratch your itch in a constructive way.
The naming is a minor question. Packaging SMT so that it can be used
for openSUSE as well, that is the interesting bit we need to tackle.
Marketing or naming is not a valid reason to start something else.
Filtering works by letting you select patterns or severity levels for
what needs to be mirrored. If you want to look into automating selection
of dependencies, that may be interesting as well -- if you can make it
happen.
Proxy caches are fundamental to a distro used in production. As a
company, we try to have distros used in production to be our paid for
offering, since the business unit both you and I work for still has to
break even. That is why SMT for the openSUSE community has been
something that has had to wait... but if you want to help on this
topic, we can definitely use a hand!


#7: Federico Lucifredi (flucifredi) (2009-07-17 23:50:24) (reply to
#6)
select patterns meaning selecting *name* patterns. Not zypper patterns.

#8: Alex Tsariounov (tsariounov) (2009-07-18 00:50:08) (reply to #6)
Seems that the wind has let down on the zypp-proxy sails.  However, I
don't see that SMT's mirror filtering is close to the cache-proxy
model.  I suppose I don't see the use case.  The use case for the cache-
proxy is as follows: I have two identical virtual machines on a fresh
proxy server, I update one of the VM's and all the updates get cached,
I update the next virtual machine and no external network bandwidth
gets used, and so on.  A configurable on the server sets how many old
versions of packages survive the periodic clean up thread.
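To make the use case concrete, here is a small illustrative simulation
in Python.  It is only a sketch: the class name, the fake upstream
function and the package file names are all made up for the example and
do not come from any existing tool.

```python
class OnDemandCache:
    """Caches a file the first time any client asks for it."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream  # callable: filename -> bytes
        self._store = {}                       # filename -> cached bytes
        self.upstream_downloads = 0            # counts external transfers

    def get(self, filename):
        if filename not in self._store:
            # Cache miss: this is the only path that uses external bandwidth.
            self._store[filename] = self._fetch_upstream(filename)
            self.upstream_downloads += 1
        return self._store[filename]


def fake_upstream(filename):
    """Stand-in for a download from the public update repository."""
    return ("contents of " + filename).encode()


# Hypothetical update set, identical for both virtual machines.
updates = ["kernel-default-2.6.27-1.rpm", "zypper-1.0.8-2.rpm"]

proxy = OnDemandCache(fake_upstream)
for vm in ("vm1", "vm2"):
    for filename in updates:
        proxy.get(filename)

print(proxy.upstream_downloads)  # 2: only the first VM triggered downloads
```

The second VM is served entirely from the cache, which is the whole
point of the cache-proxy model as opposed to a filtered mirror.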
Do you have a preliminary schedule for the openSUSE release of
SMT?  Would your team be open to implementation of the cache-proxy
model?  And, finally, Duncan, do you have a git tree somewhere with the
SMT code so I can take a peek?
Thanks.

#12: Ján Kupec (jkupec) (2009-07-20 15:45:21) (reply to #8)
Hi Alex, you're right that current SMT repo filtering does not suit
your use case. It still mirrors the full repository and creates a new,
filtered one, based on current admin's update selections. I agree that
this is not very useful for small home networks or a few virtual
machines. Maybe SMT could be improved to set up filters automatically
based on packages on clients, and avoid mirroring the unneeded
packages.
Also, SMT does not keep older versions, but this would also be very
nice addition to SMT!
Do you have some ideas how to do the caching that you could share with
us? E.g. how would the updates be published to clients, how would they
be installed, what about the differences between individual clients?
BTW, I would not worry about changing the naming and packaging of
SMT to fit openSUSE. Based on what I know from colleagues, I believe
we're all open to this. After all SMT (the enterprise repo caching
thingy which wants to talk to NCC), can be just a layer on top of this
openSUSE thingy, for example. I, for instance, planned to pull the repo
mirroring and filtering code out of SMT and make a GPL Perl library out
of it during the hack week (I plan to put it on git.opensuse.org).
Maybe we should join forces.

#13: Alex Tsariounov (tsariounov) (2009-07-20 19:53:35) (reply to #12)
Hi Jan, it does seem that mirroring will not fill the bill.  One could
set up such automatic filtering; however, proxy-caching is much
simpler.
The basic idea is to set up a proxy for updates.  This simplifies a
number of things and the server simply caches whatever packages that it
has to serve.  One sets up the clients to point to the local proxy
server instead of http://download.opensuse.org/update/11.1/, and the
server points to the real update site.
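A rough sketch of that redirection, for illustration: the proxy takes
the path a client asked for and maps it onto the real update site.  The
function name, the "proxy.example" host and the idea that the proxy
mirrors the same /update/11.1/ path layout are assumptions made for this
example.

```python
from urllib.parse import urlsplit

# The real update site named in the text above.
UPSTREAM_BASE = "http://download.opensuse.org/update/11.1/"


def upstream_url(client_url, proxy_prefix="/update/11.1/",
                 upstream_base=UPSTREAM_BASE):
    """Map a URL requested from the local proxy to the upstream URL.

    Assumes (for this sketch) that the proxy exposes the repository
    under the same path prefix as the public server.
    """
    path = urlsplit(client_url).path
    if not path.startswith(proxy_prefix):
        raise ValueError("not an update repository path: " + path)
    relative = path[len(proxy_prefix):]
    return upstream_base + relative


if __name__ == "__main__":
    # Hypothetical client request against a hypothetical proxy host.
    print(upstream_url(
        "http://proxy.example/update/11.1/rpm/x86_64/foo-1.0-1.x86_64.rpm"))
```

On the client side nothing else changes: the repository URL simply
names the local proxy host instead of download.opensuse.org.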
The server then:
1. On a request for available packages from a client will forward the
package list that the server downloads from the updates site.
2. On a request for a set of packages, the server will first check that
these packages exist in its cache.  If they do not, then the packages
are downloaded from the update site and cached by the server.  The set
of packages is then sent to the client.
3. Periodically, a thread or process runs on the server and "cleans"
out the cache.  It does this by making a list of packages cached and
their versions.  Any version that is "X" or more older than current
version gets deleted.  I was thinking that 3 would be the default for
"X"; however, perhaps 2 makes a better default.
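The cleaning rule in step 3 could be sketched in Python like this.  It
is an illustration under stated assumptions: the function names are
hypothetical, the cache is modeled as a plain list of (name, version)
pairs, and versions are ordered by their numeric fields only, which is
simpler than real RPM version comparison.

```python
import re
from collections import defaultdict


def version_key(version):
    """Simplified ordering key (assumption): numeric fields only."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def versions_to_delete(cached, max_age=3):
    """Return the (name, version) pairs the cleaner would remove.

    A version is removed when it is `max_age` or more versions older
    than the newest cached version of the same package ("X" above).
    """
    by_name = defaultdict(list)
    for name, version in cached:
        by_name[name].append(version)

    doomed = []
    for name, versions in by_name.items():
        newest_first = sorted(set(versions), key=version_key, reverse=True)
        # Index 0 is the current version; index max_age and beyond go.
        doomed.extend((name, v) for v in newest_first[max_age:])
    return sorted(doomed)


if __name__ == "__main__":
    cache = [("foo", "1.0-1"), ("foo", "1.0-2"), ("foo", "1.0-3"),
             ("foo", "1.0-4"), ("bar", "2.1-1")]
    print(versions_to_delete(cache, max_age=3))  # [('foo', '1.0-1')]
```

With X = 3 the three newest versions of each package survive; with
X = 2 only the current and the previous one do.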
This strategy is simplistic; however, it's all one really needs.  If
there are for example two clients with completely different packages
installed, then the server will simply cache all updates for both
clients.  This will increase disk space requirements, but that's ok,
because you need all those packages.  The overall goal for the proxy-
caching server is to reduce network bandwidth by removing the need to
re-download the same stuff over and over again.
I would think that libzypp and libsatsolver would have a lot of
functions that can be made use of profitably in implementing such a
proxy-server.  I'm not familiar with any language bindings for
libzypp.  Satsolver seems to have a nice set of bindings however.

#11: Alex Tsariounov (tsariounov) (2009-07-19 22:56:10) (reply to #6)
Federico, I do not see zypp-proxy as duplication.  But even if it is,
we have a number of projects in suse that "duplicate" each other to
some extent, and that's ok since they usually cater to different
audiences.  The audience for zypp-proxy is different than for SMT.  SMT
caters toward the enterprise subscription customer.  Zypp-proxy caters
more toward the individual user and developer. Zypp-proxy is different
enough from SMT to be very useful indeed, and certainly it is not
"nonsense," as you say.  Just look at Debian's apt-proxy; just look at
people asking for it online and being puzzled why it's not available
and why no one is working on it.  Why did you not set up SMT from the
beginning with this type of functionality?  After all, the need was
known a long time ago.
Naming is actually an important question, it is not minor.  And while
naming or marketing may not be a valid reason for starting something
else, the technical reason usually is, at least for engineering.  So
far, you have not shown that SMT, even for the public openSUSE release,
will contain the functionality that I described for zypp-proxy.

#14: Alex Tsariounov (tsariounov) (2009-07-22 18:43:36)
I have other Novell customer commitments that are taking my time during
hack week, so I will not be participating in it.  Thanks.

#15: Peter Poeml (poeml) (2009-07-29 20:29:47)
Sounds basically like IntelligentMirror
(https://fedorahosted.org/intelligentmirror/wiki/IntelligentMirror). Yes,
this is something that we lack for openSUSE. It would be very good to
have it in the future.
And yes, HTTP 1.1 caching semantics are perfectly fine for this
purpose. And no, SMT server is something completely different. (It's
just a mirror.)
And yes, this can also be done as Squid plugin (called "redirector" in
Squid lingo). I know somebody who has worked on a metalink redirector
for Squid, and it might be good to connect with him, because our
downloads will be metalink-based beginning with 11.2.

#16: Rudi Pittman (famewolf) (2009-10-19 06:56:53)
I would like to know how this can be done with squid/squirm.  I
installed intelligentmirror in squid with yum and it appeared to work
but always considered the package "new".

+ #17: Carlos Robinson (robin_listas) (2014-04-16 01:17:57)
+ I would find this feature quite useful.
+ About squid, see this link
+ (http://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid),
+ but it is quite complicated to set up.
+ 2014-04-16 01:15:50




--
openSUSE Feature:
https://features.opensuse.org/306896
