[opensuse] a broad appeal to fix openSUSE's package-management repository/redirector failure & recovery mechanism
In my experience, package management's routine failure to find/use a 'healthy' repository is a long-standing production problem.

I'm requesting a fix.

Here's a summary:

I run openSUSE 13.1 on numerous machines

    lsb_release -rd
    Description: openSUSE 13.1 (Bottle) (x86_64)
    Release: 13.1

Current count is ~200.

The machines are installed at multiple locations around the globe. They're connected to the 'net via a variety of different network providers. Some of the machines are directly connected to the 'net, some are behind LAN routers, switches & firewalls.

Package management for all of the machines is handled exclusively via the zypper CLI.

Each machine has a common core of repositories defined in /etc/zypp/repos.d, and frequently has a number of additional @openSUSE dev (!'home') repos defined.

In ALL cases, the default repo sets have been installed with the meta-director as baseurl,

    baseurl=http://download.opensuse.org/...

Regular package maintenance consists of

    zypper clean --all
    zypper (d)up

The maintenance frequency is nominally 1/wk, often 1/dy, and in devs' cases, often more frequent.

In virtually ALL cases, the update process regularly fails @ retrieving/refreshing the repos' (meta)data.

For example, a typical result is:

    ...
    Checking whether to refresh metadata for KDE4-Extra-Unstable
    Retrieving: repomd.xml .......................................................................................[error]
    File '/repodata/repomd.xml' not found on medium 'http://download.opensuse.org/repositories/KDE:/Unstable:/Extra/KDE_Current_o...'
    Abort, retry, ignore? [a/r/i/? shows all options] (a):
    ...

This occurs occasionally for any/all repos, whether the standard distribution repos (security, update, etc), core DM (e.g. KDE*) additional repos, or the more 'esoteric' !home OBS-hosted repos (e.g., security:netfilter).

The failure rate for overall update/upgrade process attempts is, very roughly, ~15%.

The error is NON-recoverable. 'Abort' & 'retry' *never* work.

Chats @ IRC re: the issue typically result in the same '(non)responses': "wait", "works for me", "prove it", etc.

The ONLY solution(s) that work are:

(1) wait some random amount of time -- typically hours, occasionally days -- until the system magically heals itself,
(2) visit the download.opensuse.org link for the repo, click 'details' for a target page, identify a specific working/available repo for the package(s) of interest, and manually edit baseurl= for the problematic repo.

Neither is tenable for a reliable operating environment. It is simply unmanageable in either a single, local or widely-distributed environment.

(2) is further confounded by the fact that, at any given time, a previously-working, manually-selected repo may, itself, fail, requiring -- yet again -- another manual intervention.

Within the scope of our environment, no other distro's package management system has anywhere near the failure rate demonstrated here. (We've ~600+ other machines running a mix of CentOS, Fedora, Debian & Ubuntu).

This has been occurring for literally years, across multiple openSUSE versions, and remains unaddressed. I know, without any doubt, that others experience similar/frequent failures -- it's been a frequent discussion with our partners, as well as in openSUSE* IRC channels.

This needs a fix. As to what, specifically, that fix can/should be -- I'm unclear. If a solution already exists, I'm unaware.

One idea -- a fallback mechanism *within* a repo's definition would be useful.

For example, allow in a given repo's def'n, having multiple, numbered baseurls

    baseurl1=http://direct/url/to/specific/site/1/...
    baseurl2=http://direct/url/to/specific/site/2/...
    baseurl3=http://download.opensuse.org/...
    ...
    baseurlN=http://direct/url/to/specific/site/3/...

and add a function to zypper so that for each repo, the baseurls would be tried in order for any given failure.

By adding, e.g., a

    failcount2abort=X

to either/both a given repo's def'n, or /etc/zypp(er).conf, the overall process could be terminated if there were "X" # of successive failures, indicating a likely systemic problem requiring further intervention.

I'd appreciate hearing from "those responsible for keeping the redirector & repos working" re:

 * acknowledgement, or refusal thereof, of the failure issue
 * clarification as to why it occurs in the first place
 * ideas/suggestions as to what can/should be done to fix it

Thanks.

Grant

-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
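The ordered-fallback idea from the original post could be prototyped outside zypper as a small wrapper. What follows is only a sketch of the requested behavior, not something zypper or libzypp provides: FAILCOUNT2ABORT, PROBE_CMD, probe_url and first_working_url are hypothetical names, and "working" is assumed to mean that repodata/repomd.xml can be fetched with curl.

```shell
#!/bin/sh
# Sketch only: emulates the proposed ordered-baseurl fallback outside zypper.
# FAILCOUNT2ABORT, probe_url and first_working_url are hypothetical names.

# Give up after this many consecutive failures (the proposed failcount2abort=X).
FAILCOUNT2ABORT=${FAILCOUNT2ABORT:-3}

# Succeed if <baseurl>/repodata/repomd.xml can be fetched.
# PROBE_CMD may be overridden, e.g. to test without network access.
probe_url() {
    ${PROBE_CMD:-curl --silent --fail --head --location --max-time 15} \
        "${1%/}/repodata/repomd.xml" >/dev/null 2>&1
}

# Print the first candidate base URL that answers.
# Returns 1 if every candidate failed, 2 if the failure limit was reached first.
first_working_url() {
    fails=0
    for url in "$@"; do
        if probe_url "$url"; then
            printf '%s\n' "$url"
            return 0
        fi
        fails=$((fails + 1))
        if [ "$fails" -ge "$FAILCOUNT2ABORT" ]; then
            echo "giving up after $fails consecutive failures" >&2
            return 2
        fi
    done
    return 1
}
```

Called with the candidate URLs in preference order, first_working_url prints the one that could be written into the repo's baseurl= line. Distinguishing "all candidates failed" from "limit reached" lets a caller treat the second case as the likely systemic problem described above.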
Perhaps your 15% failure rate is due to selecting the same repos for all your 200 machines scattered around the globe, conflicting with scheduled downtime of the repositories or even downtime on each of your site's own networks.
Maybe you would be better off running your own cloned repositories on your own network, or preceding your update attempt with a simple wget to see if the network is up and running and the XML file exists.
I've never been comfortable running anything from @openSUSE dev (!'home') repos, because the existence of those depends on the whim of that particular developer.
On 10/1/2014 7:10 AM, grantksupport@operamail.com wrote:
On Wed, Oct 1, 2014, at 02:31 PM, John Andersen wrote:
Perhaps your 15% failure rate is due to selecting the same repos for all your 200 machines scattered around the globe, conflicting with scheduled down time of the repositories or even down-time on each of your site's own networks.
The machines are configured with http://download.opensuse.org, with ZYPP_ARIA2C=1, not (necessarily) with a given repo URL. zypper's supposed to query the redirector links, and find/use a 'best' (by some criterion) repo. There's no downtime logged on any of my networks at the time of any of the failures; each also has fully redundant connectivity. If there were a network failure at the time of zypper (d)up, it'd fail for ALL the repos in, e.g., a refresh, not just one/some.
Maybe you would be better off running your own cloned repositories on your own network, or preceding your update attempt with a simple wget to see if the network is up and running and the XML file exists.
Yes, there are alternatives. Including other distros. I'm interested in the proper, standard function of zypper on openSUSE. In general, when repos are 'up', it works fine. It does NOT recover well, or at all, when an individual repo fails for whatever reason.
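The wget pre-check suggested above might look roughly like this. It is a sketch under assumptions: repo_baseurl, repo_is_healthy and CHECK_CMD are hypothetical names, each .repo file is assumed to carry a plain baseurl= line without unexpanded variables, and a reachable repodata/repomd.xml is taken as the sign of a healthy repo.

```shell
#!/bin/sh
# Sketch only: check that a repo's metadata index is reachable before updating.
# repo_baseurl, repo_is_healthy and CHECK_CMD are hypothetical names.

# Print the first uncommented baseurl= value found in a .repo file.
repo_baseurl() {
    sed -n 's/^baseurl=//p' "$1" | head -n 1
}

# Succeed if <baseurl>/repodata/repomd.xml exists on the server.
# CHECK_CMD may be overridden, e.g. to test without network access.
repo_is_healthy() {
    base=$(repo_baseurl "$1")
    [ -n "$base" ] || return 1
    ${CHECK_CMD:-wget --quiet --spider --tries=1 --timeout=15} \
        "${base%/}/repodata/repomd.xml"
}

# Example loop, left commented out so sourcing this file has no side effects:
# for f in /etc/zypp/repos.d/*.repo; do
#     repo_is_healthy "$f" || echo "unreachable: $f" >&2
# done
```

A check like this only reports the problem earlier; it does not make zypper recover, which is the behavior being asked for in this thread.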
I've never been comfortable running anything from @openSUSE dev (!'home') repos, because the existence of those depends on the whim of that particular developer.
That's a choice. Not one that we make. I don't consider security:netfilter, nor any of the other non-'home' repos we use, to be managed 'on a whim'. As I'd previously mentioned, this issue is NOT limited to non-distro repositories. In any case, it's irrelevant. How zypper fails/recovers should have absolutely no dependency on which repo it's failing on.
On 2014-10-02 00:19, grantksupport@operamail.com wrote:
On Wed, Oct 1, 2014, at 02:31 PM, John Andersen wrote:
The machines are configured with http://download.opensuse.org, with ZYPP_ARIA2C=1, not (necessarily) with a given repo URL.
zypper's supposed to query the redirector links, and find/use a 'best' (by some criterion) repo.
...
I'm interested in the proper, standard function of zypper on openSUSE.
Well, for that, you should first remove the "ZYPP_ARIA2C=1", because it is non-standard. As I mentioned to you on another list, it was something they tried years ago, I think before the redirector was set up. It was an experiment (it is a hidden setting, as you can see, not in a configuration file).
In general, when repos are 'up', it works fine. It does NOT recover well, or at all, when an individual repo fails for whatever reason.
Yes, there is no local failover method - except what MirrorBrain does, server side. I think that aria2c, used with zypper, was not working right, so the default is not to use it, but curl.
I've never been comfortable running anything from @openSUSE dev (!'home') repos, because the existence of those depends on the whim of that particular developer. ... In any case, it's irrelevant. How zypper fails/recovers should have absolutely no dependency on which repo it's failing on.
It has, in a way: home repos are simply removed by their owners any time they wish and without telling any one. They are private playgrounds, they can do whatever they wish. More or less. If zypper is configured to query a particular repo and it fails, you can not failover to another repository. They could add to zypper code to switch to an alternative URL for the SAME repository, never for another one. That's completely impossible. So, if a particular home repo disappears you have to manually choose an adequate replacement. That's intentional and I don't expect it to change (hint: vendor stickiness).
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
Well, for that, you should first remove the "ZYPP_ARIA2C=1"
Testing now on a small subset of machines.
It has, in a way: home repos (snip)
We don't use any 'home' repos. They're just not relevant here.
They could add to zypper code to switch to an alternative URL for the SAME repository
Agreed. As suggested at the end of the OP.
On 2014-10-02 02:59, grantksupport@operamail.com wrote:
Well, for that, you should first remove the "ZYPP_ARIA2C=1"
Testing now on a small subset of machines.
It has, in a way: home repos (snip)
We don't use any 'home' repos. They're just not relevant here.
Ah, ok. Your post was unclear in that respect: we thought you were using them.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Wed, Oct 1, 2014, at 06:14 PM, Carlos E. R. wrote:
We don't use any 'home' repos. They're just not relevant here.
Ah, ok. Your post was unclear in that respect: we thought you were using them.
"... and frequently has a number of additional @openSUSE dev (!'home') repos defined. ..." "... nor any of the other non-'home' repos we use ..." Again, what type of repo is irrelevant. If the repo exists and is populated at a published-as-available URL for the repo -- and zypper fails -- it should fail reliably. Ideally, with useful fallback. That behavior has no dependency on the type of repo, its location, etc; only that it exists, is accessible, and is populated with correct/complete data.
On 2014-10-02 03:35, grantksupport@operamail.com wrote:
On Wed, Oct 1, 2014, at 06:14 PM, Carlos E. R. wrote:
We don't use any 'home' repos. They're just not relevant here.
Ah, ok. Your post was unclear in that respect: we thought you were using them.
"... and frequently has a number of additional @openSUSE dev (!'home') repos defined. ..."
Please don't. That's programmesse.
Again, what type of repo is irrelevant. If the repo exists and is populated at a published-as-available URL for the repo -- and zypper fails -- it should fail reliably. Ideally, with useful fallback. That behavior has no dependency on the type of repo, its location, etc; only that it exists, is accessible, and is populated with correct/complete data.
It is relevant: home repos disappear without notice. There is no graceful fail mode for that, except abort.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Wed, Oct 1, 2014, at 06:40 PM, Carlos E. R. wrote:
It is relevant: home repos disappear without notice. There is no graceful fail mode for that, except abort.
If you'd like to have a discussion about zypper's behavior in that case, feel free to start a new topic. I'm not interested in it, and haven't asked for any changes regarding it. zypper's current abort/retry behavior is completely sufficient in the case of a repo -- any repo -- that disappears without notice. Not only is fallback in that case not needed, it's nonsensical.
* grantksupport@operamail.com
On Wed, Oct 1, 2014, at 06:40 PM, Carlos E. R. wrote:
It is relevant: home repos disappear without notice. There is no graceful fail mode for that, except abort.
If you'd like to have a discussion about zypper's behavior in that case, feel free to start a new topic.
I'm not interested in it, and haven't asked for any changes regarding it. zypper's current abort/retry behavior is completely sufficient in the case of a repo -- any repo -- that disappears without notice. Not only is fallback in that case not needed, it's nonsensical.
Well, *you* cannot have it only one way. There happen to be many packages duplicated with differing version numbers (and build numbers which appear to be different versions). Your first-stated problem probably comes from mirrors not being updated when you request the packages. Perhaps your interest will move you to investigate the possibility of automagically searching for another mirror or applicable package in an alternate repo to fulfill your requests. I too have observed occasions where packages show as published but aren't available, which is usually solved by some patience. In fact, I cannot install/upgrade to the just released digikam 4.2 because the updated kipi-plugins are not yet available, at least to the mirrors my system is utilizing. patience, ma-man...
-- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri http://wahoo.no-ip.org Photo Album: http://wahoo.no-ip.org/gallery2 Registered Linux User #207535 @ http://linuxcounter.net
On 2014-10-02 03:46, grantksupport@operamail.com wrote:
On Wed, Oct 1, 2014, at 06:40 PM, Carlos E. R. wrote:
It is relevant: home repos disappear without notice. There is no graceful fail mode for that, except abort.
If you'd like to have a discussion about zypper's behavior in that case, feel free to start a new topic.
You do not need to tell me how to do that - should I wish to have such a discussion :-| Look, I'm trying to help. But I'm getting the feeling that you are somewhat aggressive towards me, and in that case I would not like to continue.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Thu, 2014-10-02 at 01:18 +0200, Carlos E. R. wrote:
It has, in a way: home repos are simply removed by their owners any time they wish and without telling any one. They are private playgrounds, they can do whatever they wish. More or less.
Generally speaking, if you want to use anything from any playgrounds (and there might be very good reasons for it), just make a snapshot on a local http-server that can act as a local repository. And once in a while, re-rsync the lot _after_ you have verified that the bits-and-pieces still exist. There are gems out there, free gems, but one should not become dependent on them.
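A local snapshot along those lines could be refreshed with something like the sketch below. The names refresh_snapshot and RSYNC_CMD are hypothetical, and the rsync source is whatever upstream location you mirror from. One deliberate deviation from the description above: instead of checking upstream first, the sketch syncs into a staging directory and verifies the result there, so a vanished repo never overwrites the last good copy.

```shell
#!/bin/sh
# Sketch only: refresh a local snapshot of a repo without clobbering the last
# good copy. refresh_snapshot and RSYNC_CMD are hypothetical names.

# Usage: refresh_snapshot <rsync-source/> <snapshot-dir>
# (<snapshot-dir> must be given without a trailing slash)
refresh_snapshot() {
    src=$1
    dest=$2
    staging="$dest.new"

    # Pull the upstream tree into a staging directory first.
    ${RSYNC_CMD:-rsync -a --delete} "$src" "$staging/" || return 1

    # Only accept the new copy if the metadata index actually arrived.
    [ -s "$staging/repodata/repomd.xml" ] || return 1

    # Swap the verified copy into place, keeping one previous generation.
    rm -rf "$dest.old"
    if [ -d "$dest" ]; then
        mv "$dest" "$dest.old" || return 1
    fi
    mv "$staging" "$dest"
}
```

Because the staging directory starts empty after each successful swap, every refresh transfers the whole tree again; that keeps the sketch short at the cost of bandwidth.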
On Wednesday 01 October 2014 16:10:32 grantksupport@operamail.com wrote:
One idea -- a fallback mechanism *within* a repos' definition would be useful
For example, allow in a given repo's def'n, having multiple, numbered baseurls
    baseurl1=http://direct/url/to/specific/site/1/...
    baseurl2=http://direct/url/to/specific/site/2/...
    baseurl3=http://download.opensuse.org/...
    ...
    baseurlN=http://direct/url/to/specific/site/3/...
and add a function to zypper so that for each repo, the baseurls would be tried in order for any given failure.
By adding, e.g., a
failcount2abort=X
to either/both a given repo's defn, or /etc/zypp(er).conf, the overall process could be terminated if there were "X" # of subsequent fails, indicating a likely systemic problem requiring further intervention.
This should not be too hard to achieve. Since libzypp-8.8.0 (openSUSE 11.4), using a 'mirrorlist' URL instead of 'baseurl' within the .repo file is already supported:
    #baseurl=http://direct/url/to/specific/site
    mirrorlist=url://server/path/to/mirrorlist.file
Defining multiple URLs for a repo this way is possible. I can't remember any feedback related to mirrorlist, so this feature either works or isn't used. Probably the latter, as a quick check reveals that a local file can't be used as mirrorlist (file:/localpath/to/mirrorlist.file) and zypper does not switch non-interactively between the URLs on error. I filed a bug report to track this. [https://bugzilla.suse.com/show_bug.cgi?id=899510]
-- cu, Michael Andres, SUSE LINUX Products GmbH, Development, ma@suse.de
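For anyone wanting to try the mirrorlist route, the two files involved could be generated roughly as follows. This is a sketch under assumptions: write_mirrorlist and write_mirrorlist_repo are hypothetical helper names, the mirrorlist file is assumed to be plain text with one URL per line, and since a file: URL reportedly does not work, the list is assumed to be published on an HTTP server of your own.

```shell
#!/bin/sh
# Sketch only: generate a mirrorlist file and a .repo stanza that points at it.
# write_mirrorlist and write_mirrorlist_repo are hypothetical helper names.

# Write the given URLs to a mirrorlist file, one per line (assumed format).
write_mirrorlist() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Write a .repo stanza that uses mirrorlist= instead of baseurl=.
write_mirrorlist_repo() {
    repo_alias=$1
    list_url=$2
    out=$3
    cat > "$out" <<EOF
[$repo_alias]
name=$repo_alias
enabled=1
autorefresh=1
mirrorlist=$list_url
EOF
}
```

The order of URLs passed to write_mirrorlist would be the order of preference, mirroring the numbered baseurls in the proposal. Whether zypper moves on to the next entry without prompting is exactly what the bug report above tracks.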
participants (6)
- Carlos E. R.
- grantksupport@operamail.com
- Hans Witvliet
- John Andersen
- Michael Andres
- Patrick Shanahan