[Bug 307249] New: zypper "refresh" is a misnomer
https://bugzilla.novell.com/show_bug.cgi?id=307249

Summary: zypper "refresh" is a misnomer
Product: openSUSE 10.3
Version: Beta 2
Platform: Other
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: libzypp
AssignedTo: kkaempf@novell.com
ReportedBy: poeml@novell.com
QAContact: kkaempf@novell.com
Found By: ---

This bug report is twofold.
1) zypper refresh doesn't work in any useful way for me.
2) it severely wastes bandwidth on download.o.o

root@linux-103:~ # zypper sl
# | Enabled | Refresh | Type   | Alias      | Name
--+---------+---------+--------+------------+-----------
1 | Yes     | Yes     | rpm-md | home:poeml | home:poeml

I have just rebuilt the repository. Thus, zypper cannot have an up-to-date copy. It has a command with the promising name "refresh", which can be expected to download new files. However, it doesn't do that:

root@linux-103:~ # zypper refresh
Repository 'home:poeml' is up to date.
All repositories have been refreshed.

Lie. Nothing downloaded! The repo is not up to date!

root@linux-103:~ # zypper refresh -b
Repository 'home:poeml' is up to date.
Forcing building of repository cache
* Cleaning repository 'home:poeml' cache
* Building repository 'home:poeml' cache
All repositories have been refreshed.

Lie. Nothing downloaded! The repo is not up to date!
Let's try force...

root@linux-103:~ # zypper refresh -f
Forcing raw metadata refresh
Forcing building of repository cache
* Cleaning repository 'home:poeml' cache
* Building repository 'home:poeml' cache
All repositories have been refreshed.
root@linux-103:~ #

84.44.185.90 - - [03/Sep/2007:22:04:34 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml HTTP/1.1" 200 1198 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:34 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 206 2 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:35 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 206 2 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:35 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 200 893 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:36 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 200 189 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:36 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 200 893 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:36 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 200 189 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:36 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml HTTP/1.1" 200 1198 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:38 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/filelists.xml.gz HTTP/1.1" 200 1859 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:38 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/primary.xml.gz HTTP/1.1" 200 5295 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:04:38 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/patterns.test2.xml.gz HTTP/1.1" 200 99 "-" "Novell ZYPP Installer" "-"

It downloaded something, finally.

Now, let's do the same again. This time, metadata has _not_ changed in between.
So let's see how the "refresh" command deals with this situation...

root@linux-103:~ # zypper refresh
Repository 'home:poeml' is up to date.
All repositories have been refreshed.

Oh no, how can it know? It didn't check.

root@linux-103:~ # zypper refresh -b
Repository 'home:poeml' is up to date.
Forcing building of repository cache
* Cleaning repository 'home:poeml' cache
* Building repository 'home:poeml' cache
All repositories have been refreshed.

It didn't check either. It claims to have refreshed something? Wtf?
How should one force it to check whether the locally cached files are up to date? Let's try force...

root@linux-103:~ # zypper refresh -f
Forcing raw metadata refresh
Forcing building of repository cache
* Cleaning repository 'home:poeml' cache
* Building repository 'home:poeml' cache
All repositories have been refreshed.
root@linux-103:~ #

84.44.185.90 - - [03/Sep/2007:22:13:12 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml HTTP/1.1" 200 1198 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:12 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 206 2 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 206 2 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 200 893 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 200 189 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.key HTTP/1.1" 200 893 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml.asc HTTP/1.1" 200 189 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:13 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/repomd.xml HTTP/1.1" 200 1198 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:15 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/filelists.xml.gz HTTP/1.1" 200 1859 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:15 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/primary.xml.gz HTTP/1.1" 200 5295 "-" "Novell ZYPP Installer" "-"
84.44.185.90 - - [03/Sep/2007:22:13:15 +0200] "GET /repositories/home:/poeml/openSUSE_Factory/repodata/patterns.test2.xml.gz HTTP/1.1" 200 99 "-" "Novell ZYPP Installer" "-"

Gosh, it did download the whole stuff again. Although the files are the same. It didn't even care to check whether the cached copies are still up to date. Even though it has hashes over all files and can validate that they are unchanged...

So, first, this behaviour doesn't make sense to me. 'refresh' doesn't seem to do anything useful to the user, and 'refresh -f' doesn't refresh at all, but re-downloads everything again, and should be named like that, or fixed...

Second, I assume there must be some kind of rule behind this behaviour, and presumably sometimes the equivalent of a "refresh -f" will happen. After all, somehow zypp must be able to notice that my home:poeml repo offers new packages after a while, no!? So, if yast/zypp/zypper then goes ahead and bluntly downloads all _unchanged_ metadata again, with each "refresh", this is becoming a severe problem for our infrastructure. Our resources are expensive and we must use them economically. Downloading unchanged files again and again increases the likelihood that we run into serious problems that affect all customers.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
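The report notes that repomd.xml carries hashes over all metadata files, so a client could check its cached copies against them instead of downloading everything again. A minimal sketch of that check follows. It is an illustration only, not libzypp's code: the function name `stale_files` and the in-memory `cached` mapping are hypothetical, and the repomd.xml layout (namespace, `location`/`checksum` elements, "sha" meaning SHA-1) is assumed from the common rpm-md format.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed rpm-md namespace used by repomd.xml.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Assumption: older repomd.xml files label SHA-1 as "sha".
HASH_NAMES = {"sha": "sha1", "sha1": "sha1", "sha256": "sha256"}


def stale_files(repomd_xml: str, cached: Dict[str, bytes]) -> List[str]:
    """Return the hrefs whose cached copy is missing or fails its checksum.

    `cached` maps an href such as "repodata/primary.xml.gz" to the bytes
    of the locally cached file (hypothetical stand-in for the raw cache).
    """
    stale = []
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        href = data.find("repo:location", REPO_NS).get("href")
        checksum = data.find("repo:checksum", REPO_NS)
        algo = HASH_NAMES[checksum.get("type")]
        expected = checksum.text.strip()
        blob = cached.get(href)
        # Only files that are absent or changed need to be fetched again.
        if blob is None or hashlib.new(algo, blob).hexdigest() != expected:
            stale.append(href)
    return stale
```

With such a check, an unchanged repository would cost one small repomd.xml request per refresh (plus signature verification), and only the files listed by `stale_files` would be downloaded.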
https://bugzilla.novell.com/show_bug.cgi?id=307249#c2
Klaus Kämpf
https://bugzilla.novell.com/show_bug.cgi?id=307249#c3
Klaus Kämpf
https://bugzilla.novell.com/show_bug.cgi?id=307249#c4
--- Comment #4 from Klaus Kämpf
https://bugzilla.novell.com/show_bug.cgi?id=307249#c5
--- Comment #5 from Stanislav Visnovsky
https://bugzilla.novell.com/show_bug.cgi?id=307249#c6
--- Comment #6 from Peter Poeml
Please explain the waste of bandwidth on download.o.o in more detail. Doesn't the redirector point clients to mirrors instead of the main host?
We do redirect requests to metadata of released products, but we don't redirect for metadata for repositories with high turnover rate like
1) update tree
2) buildservice tree
3) factory tree
because metadata changes frequently and needs to be revalidated often. There is no way to reliably provide a consistent state to the clients, since mirrors do not set appropriate HTTP headers. Inappropriate or missing cache control headers lead to arbitrary caching on the way. See http://en.opensuse.org/Build_Service/Mirror_List for more information about the rationale.

Then, please note that for files that we _do_ redirect for, zypp puts the bandwidth load on the mirrors instead, _and_ causes a hit on our redirector, which results in geolocation of the IP address, database lookup of available mirrors, choice of best mirrors, ... you see?

Here is an explanation of how the redirection is done: http://en.opensuse.org/Build_Service/Redirector

The way the redirector works is highly optimized and scalable. But nevertheless, it is dangerous to waste these resources. They are never endless. It pays to use them with caution!
https://bugzilla.novell.com/show_bug.cgi?id=307249#c7
--- Comment #7 from Peter Poeml
I suggest adding a proper message in case of 'refresh within the don't check period', e.g. "The repository was checked already, skipping the check. Use -f to override this behavior."
That would give a valuable hint about the behaviour, and address the "end-user" aspect of this bug (misleading feedback). (I heard from Klaus that the documentation has undergone improvements after Beta2. All the above referred to Beta2 level.)
I like the fact that forced refresh will again download everything. We need a robust mechanism for the case where the raw cache is in such an inconsistent state that even libzypp is unable to detect its status. This will save us from instructing users to remove and re-add the repo.
This behaviour may make sense in some extreme cases, but something in-between is missing: Currently, there seem to be only two possible behaviours: "don't check at all" or "re-download everything". This results in wasted bandwidth (or in outdated repo data which refers to packages which don't exist anymore...).

Missing is: "validate freshness of cached files, and replace stale files with up-to-date copies" for every-day operation. That would address my other issue.

The --force mode should be reserved as a last resort for error conditions. Please do not make the mistake of doing this by default. If this is/was ever needed to work around a problem on the server end (redirector), like inappropriate HTTP headers or redirection or whatever, it is a problem to be fixed on the _server_ end. Working around it in the client, e.g. by re-downloading files, will _not_scale_! In fact, it can lead to more problems on the server end. Let's play together... ;)
https://bugzilla.novell.com/show_bug.cgi?id=307249#c8
Klaus Kämpf
https://bugzilla.novell.com/show_bug.cgi?id=307249#c9
--- Comment #9 from Peter Poeml
The main problem from my POV is that there is no distinction between automatic and user-initiated refresh:
- automatic: check if we're within the 'dont check' period. If yes, do nothing. If no, update as needed
Yes. That would match the yum behaviour.
- If the age of the cached metadata is lower than $metadata_expire, regard it as fresh. Don't validate it.
- If the cached metadata is older than $metadata_expire, validate freshness by downloading repomd.xml. If the hashes in repomd.xml still match the cached files, don't download anything else...

$metadata_expire is configurable (globally and per repo) and is set to 30 minutes by default, and can be set to smaller values for repositories which are known to change often. (I find that, in practice, lowering this value is only required for developers in some situations. Like trying out a buildservice package which you just built, _and_ having updated less than 30 minutes before.)
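The expiry rule described above can be sketched as a small predicate. This is a hedged illustration, not yum's or zypper's implementation: the names `effective_expire` and `needs_validation` and the per-repo override dictionary are hypothetical, and only the 30-minute default is taken from the text. Treating an age exactly equal to the limit as expired is also an assumption.

```python
import time
from typing import Dict, Optional

# 30 minutes, the default mentioned in the comment above.
DEFAULT_METADATA_EXPIRE = 30 * 60  # seconds


def effective_expire(repo: str, per_repo: Dict[str, int],
                     global_default: int = DEFAULT_METADATA_EXPIRE) -> int:
    """Per-repo setting wins over the global one (hypothetical config model)."""
    return per_repo.get(repo, global_default)


def needs_validation(cache_mtime: float, metadata_expire: int,
                     now: Optional[float] = None) -> bool:
    """True if the cached metadata is old enough that repomd.xml should be
    fetched and compared; False means the cache is regarded as fresh and
    no request is made at all."""
    if now is None:
        now = time.time()
    return (now - cache_mtime) >= metadata_expire
```

For example, a developer testing freshly built packages could lower the limit for a single repository, say `{"home:poeml": 60}`, while every other repository keeps the 30-minute default.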
- user-initiated: update as needed, regardless of the 'dont check' period
That's what I expect, as a user, if the documentation talks about "forced refresh". There is no direct yum equivalent for this, although it would definitely be useful. With yum, this can be achieved by

rm /var/cache/yum/$repo/cachecookie
- forced: download everything
Yes. Needed for the (unwanted) cases where nothing less will do. That matches the behaviour of the following yum operations:

yum clean all
yum <some command>

But, this option should be renamed from "forced refresh" to "re-cache metadata", or similar, to avoid the ambiguity with the other option, see above. Note, there is already a commandline switch in zypper named "--force-download" which would fit perfectly well for this purpose. (The behaviour of --force-download which is described in the (Beta2) man page should probably be named --only-download instead.)
Would that fulfill everyone's requirement ?
There is one more. In addition, there should be a "use-only-cache" mode. Like yum -C. If there is no internet connection, or for people who don't want their dialin connection go online :-)
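The four modes discussed in this comment (automatic, user-initiated, forced, and cache-only) can be summarised as a small decision function. This is only a sketch of the proposal as worded in this thread; the enum names, the `plan_refresh` function and the 30-minute default for the 'dont check' period are hypothetical and do not describe what zypper actually does.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"            # background or implicit refresh
    USER_INITIATED = "user-initiated"  # user explicitly asked for a refresh
    FORCED = "forced"                  # last resort for a broken cache
    CACHE_ONLY = "cache-only"          # like 'yum -C': never go online


class Action(Enum):
    USE_CACHE = "use the cache without contacting the server"
    VALIDATE = "fetch repomd.xml, download only files whose hashes changed"
    REDOWNLOAD = "discard the cache and download everything"


def plan_refresh(mode: Mode, cache_age: float,
                 dont_check_period: float = 30 * 60) -> Action:
    """Pick an action for one repository (ages and periods in seconds)."""
    if mode is Mode.CACHE_ONLY:
        return Action.USE_CACHE
    if mode is Mode.FORCED:
        return Action.REDOWNLOAD
    if mode is Mode.USER_INITIATED:
        # Ignore the 'dont check' period, but still only update as needed.
        return Action.VALIDATE
    # Automatic: do nothing while within the 'dont check' period.
    if cache_age < dont_check_period:
        return Action.USE_CACHE
    return Action.VALIDATE
```

The point of separating VALIDATE from REDOWNLOAD is that every-day refreshes, automatic or user-initiated, never transfer unchanged files; only the forced mode pays the full download cost.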
https://bugzilla.novell.com/show_bug.cgi?id=307249#c10
--- Comment #10 from Ján Kupec
- automatic: check if we're within the 'dont check' period. If yes, do nothing. If no, update as needed
<snip>
zypper works that way now; the only difference is that '$metadata_expire' is not configurable per repo.
- user-initiated: update as needed, regardless of the 'dont check' period
That's what I expect, as a user, if the documentation talks about "forced refresh".
Not for me. A forced refresh is a full download and parse, regardless of what libzypp's or zypper's suggestion is. However, that's just wording; I think we agree on functionality. So as I suggested above, this is the only thing (apart from that improper status message) to change in zypper.
- forced: download everything
But, this option should be renamed from "forced refresh" to "re-cache metadata", or similar, to avoid the ambiguity with the other option, see above.
I don't agree here. See above.
Note, there is already a commandline switch in zypper named "--force-download" which would fit perfectly well for this purpose. (The behaviour of --force-download which is described in the (Beta2) man page should probably be named --only-download instead.)
It does not work that way. --force-download forces the download, but not parsing (building the sqlite cache). So the metadata will be downloaded, but parsing takes place only if needed. You can argue about how useful this feature is, but this is the way it works right now.
Would that fulfill everyone's requirement ?
There is one more. In addition, there should be a "use-only-cache" mode. Like yum -C. If there is no internet connection, or for people who don't want their dialin connection go online :-)
The 'refresh delay' can be used to achieve this. Also, the repos can be set to autorefresh=no; then they are refreshed only on user request.
https://bugzilla.novell.com/show_bug.cgi?id=307249
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249#c12
Stanislav Visnovsky
https://bugzilla.novell.com/show_bug.cgi?id=307249#c13
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249#c14
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249
Ján Kupec
https://bugzilla.novell.com/show_bug.cgi?id=307249
Josef Reidinger
https://bugzilla.novell.com/show_bug.cgi?id=307249
User jreidinger@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=307249#c16
--- Comment #16 from Josef Reidinger
https://bugzilla.novell.com/show_bug.cgi?id=307249
User jreidinger@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=307249#c17
Josef Reidinger