Mailinglist Archive: opensuse-buildservice (105 mails)

[opensuse-buildservice] a broad appeal to fix openSUSE's package-management's repository/redirector failure & recovery mechanism
In my experience, package management's routine failure to find/use a 'healthy'
repository is a long-standing production problem.

I'm requesting a fix.

Here's a summary:

I run openSUSE 13.1 on numerous machines:

lsb_release -rd
Description: openSUSE 13.1 (Bottle) (x86_64)
Release: 13.1

Current count is ~200.

The machines are installed at multiple locations around the globe.

They're connected to the 'net via a variety of different network providers.

Some of the machines are directly connected to the 'net, some are behind LAN
routers, switches & firewalls.

Package management for all of the machines is handled exclusively via zypper
cli.

Each machine has a common core of repositories defined in /etc/zypp/repos.d,
and frequently has a number of additional @openSUSE dev (!'home') repos defined.

In ALL cases, the default sets of repos were installed with the
meta-director as baseurl,

baseurl=http://download.opensuse.org/...

Regular package maintenance consists of

zypper clean --all
zypper (d)up

The maintenance frequency is nominally 1/wk, often 1/dy, and in devs' cases,
often more frequent.

In virtually ALL cases, the update process regularly fails @
retrieving/refreshing the repos' (meta)data.

For example, a typical result is:

...
Checking whether to refresh metadata for KDE4-Extra-Unstable
Retrieving: repomd.xml
.......................................................................................[error]
File '/repodata/repomd.xml' not found on medium
'http://download.opensuse.org/repositories/KDE:/Unstable:/Extra/KDE_Current_openSUSE_13.1'

Abort, retry, ignore? [a/r/i/? shows all options] (a):
...

This occurs occasionally for any/all repos, whether the standard distribution
repos (security, update, etc), core DM (e.g. KDE*) additional repos, or the
more 'esoteric' !home OBS-hosted repos (e.g., security:netfilter).

The failure rate for overall update/upgrade process attempts is, very roughly,
~15%.

The error is NON-recoverable. 'Abort' & 'retry' *never* work.

Chats @ IRC re: the issue typically result in the same '(non)responses' :
"wait", "works for me", "prove it", etc.

The ONLY solution(s) that work are:

(1) wait some random amount of time -- typically hours, occasionally
days -- until the system magically heals itself,
(2) visit the download.opensuse.org link for the repo, click 'details'
for a target page, identify a specific working/available repo for the
package(s) of interest, and manually edit baseurl= for the problematic repo.

Neither is tenable for a reliable operating environment. It is simply
unmanageable, whether in a single local environment or a widely-distributed one.

(2) is further confounded by the fact that, at any given time, a
previously-working, manually-selected repo may, itself, fail, requiring -- yet
again -- another manual intervention.
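
For anyone stuck with workaround (2) in the meantime, the manual edit can at least be scripted. The following is only a sketch: it assumes the .repo files under /etc/zypp/repos.d are plain INI text with one section per repo alias, and the function name set_baseurl and the mirror URL in the usage note are hypothetical placeholders, not anything zypper provides.

```python
import configparser
import io


def set_baseurl(repo_text: str, alias: str, new_baseurl: str) -> str:
    """Return the text of a .repo file with baseurl replaced for one alias.

    Assumption: the file is INI-style with a [alias] section per repo.
    """
    # interpolation=None keeps '%' in URLs literal; only '=' separates
    # keys from values, so ':' inside keys or values is left untouched.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # preserve the original key spelling
    parser.read_string(repo_text)
    if not parser.has_section(alias):
        raise KeyError(f"no repo section named {alias!r}")
    parser.set(alias, "baseurl", new_baseurl)
    out = io.StringIO()
    # Write key=value without surrounding spaces, as in the original file.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Usage would be along the lines of reading the repo file, calling set_baseurl(text, "KDE4-Extra-Unstable", "http://mirror.example.org/...") and writing the result back as root. Note that configparser drops comments on rewrite, and this does nothing about the chosen mirror itself failing later.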

Within the scope of our environment, no other distro's package management
system has anywhere near the failure rate demonstrated here. (We've 600+ other
machines running a mix of CentOS, Fedora, Debian & Ubuntu).

This has been occurring for literally years, across multiple openSUSE versions,
and remains unaddressed. I know, without any doubt, that others experience
similar/frequent failures -- it's been a frequent discussion with our partners,
as well as in openSUSE* IRC channels.

This needs a fix. As to what, specifically, that fix can/should be -- I'm
unclear. If a solution already exists, I'm unaware of it.

One idea -- a fallback mechanism *within* a repo's definition would be useful.

For example, allow a given repo's def'n to have multiple, numbered baseurls

baseurl1=http://direct/url/to/specific/site/1/...
baseurl2=http://direct/url/to/specific/site/2/...
baseurl3=http://download.opensuse.org/...
...
baseurlN=http://direct/url/to/specific/site/3/...


and add a function to zypper so that, for each repo, the baseurls would be tried
in order on any given failure.
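
To make the intended behaviour concrete, here is a minimal sketch of the ordered fallback, written outside zypper. Everything in it is an assumption for illustration: the names repomd_reachable and pick_baseurl are hypothetical, and "healthy" is taken to mean only that repodata/repomd.xml answers an HTTP HEAD request.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request


def repomd_reachable(baseurl: str, timeout: float = 10.0) -> bool:
    """Return True if <baseurl>/repodata/repomd.xml answers a HEAD request."""
    url = baseurl.rstrip("/") + "/repodata/repomd.xml"
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # 404s, DNS failures, timeouts and malformed URLs all count as a fail.
        return False


def pick_baseurl(
    baseurls: Iterable[str],
    probe: Callable[[str], bool] = repomd_reachable,
) -> Optional[str]:
    """Try the baseurls in the given order; return the first that passes."""
    for url in baseurls:
        if probe(url):
            return url
    return None  # every candidate failed
```

The probe is passed in as a parameter so the ordering logic can be exercised without network access; a real implementation inside libzypp would presumably fall back per failed file retrieval rather than per repo.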

By adding, e.g., a

failcount2abort=X

to either/both a given repo's def'n, or /etc/zypp(er).conf, the overall process
could be terminated after "X" successive failures, indicating a likely
systemic problem requiring further intervention.
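
A rough sketch of how such a limit might behave, under the assumption that "X" counts failures in a row and that a success resets the count. The function refresh_all and its refresh callback are hypothetical stand-ins; failcount2abort is the proposed setting, not an existing zypper option.

```python
from typing import Callable, Dict, List, Tuple


def refresh_all(
    repos: List[str],
    refresh: Callable[[str], bool],
    failcount2abort: int,
) -> Tuple[Dict[str, bool], bool]:
    """Refresh repos in order; stop early after too many failures in a row.

    Returns (results, aborted): results maps each attempted repo alias to
    True/False, and aborted is True if the failure limit ended the run.
    """
    results: Dict[str, bool] = {}
    consecutive_failures = 0
    for alias in repos:
        ok = refresh(alias)
        results[alias] = ok
        # A success resets the counter; only an unbroken run of failures
        # is treated as a sign of a systemic problem.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failcount2abort:
            return results, True
    return results, False
```

With a limit of 2, a single failing repo between two working ones would not abort the run, while two failures back to back would stop it before the remaining repos are tried.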

I'd appreciate hearing from "those responsible for keeping the redirector &
repos working" re:

* acknowledgement, or refusal thereof, of the failure issue
* clarification as to why it occurs in the first place
* ideas/suggestions as to what can/should be done to fix it

Thanks.

Grant