[opensuse-factory] SL-Factory vs. SL-Factory-debug

Hi,

on popular request, we have split the debuginfo packages out of Factory into a separate repository. With the next sync there will be SL-OSS-Factory and SL-OSS-Factory-debug directories. Users of the opensuse-full or opensuse-full-with-factory modules do not have to change anything.

I hope this is fine with everybody.

bye
adrian

--
Adrian Schroeter
SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
email: adrian@suse.de

---------------------------------------------------------------------
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-factory+help@opensuse.org

On Thu, Aug 24, 2006 at 06:01:14PM +0200, Adrian Schroeter wrote:
I wonder why I didn't catch a single one of these "popular requests" on this mailing list.
Well, actually, in my opinion it is just more inconvenient to have all these repository splits if you want to set up installation sources. I can understand the reason for separating the non-oss stuff, to have an oss-clean distribution. (Actually, I still can't understand why the non-oss stuff for factory must still be hosted on suse.com whereas all other non-oss stuff is now hosted on opensuse.org.)

But what was the _reason_ for the debuginfo split? Just that some people wanted to have it without having a reason? Or didn't they understand how to use --exclude with rsync?

Robert

--
Robert Schiele Tel.: +49-621-181-2214 Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de "Quidquid latine dictum sit, altum sonatur."

Hi, Robert Schiele wrote:
But what was the _reason_ for the debuginfo split? Just that some people wanted to have it without having a reason?
this was my proposal and is therefore my "fault".

https://bugzilla.novell.com/show_bug.cgi?id=197823

So I'd like to defend myself.

There are currently more than 6000 binary packages in Factory-x86_64, 2000 of which are debuginfo packages. There are fewer than 1000 noarch packages. I guess that the situation is nearly the same on all architectures. So there are about 7000 binary packages per architecture, 2000 of which are debuginfo packages.

Someone has to parse all this stuff. I mean, the metadata. It is well known that zypp parses the repository metadata slowly. It has already become faster and it will become even better, but it's still slow.

And it's not just zypp. Yum, with the new(!) C metadata parser written by Tambet Ingo, needs half a minute to parse primary.xml and again half a minute to parse filelists.xml on my laptop. I don't even want to know how slow it would be with the old Python parser. While parsing, it shows me that it parses the metadata for about 22000 packages, probably because it has to parse the metadata for all architectures. Among the 22000 packages there are 6000 debuginfo packages, 2000 per architecture.

My idea was that this can be reduced by between 1/4 and 1/3 "for free" by separating the debuginfo packages. It is of course not entirely "for free" because it is less convenient for those people who want to have the debuginfo packages. But these packages are a rather specialized use case. There are people who need them every day and there are people who never need them; maybe there are even people who are confused by them.

For Factory, the ratio of people who need them is probably larger than for a released version. The proposal was primarily intended for the released versions. OK, now we have it for Factory, too.

The debuginfo packages contribute quite a lot to the repository metadata because they contain the complete source code, every single file unpacked into /usr/src/debug. For a rather small library like zlib, these are 26 files, while the main and devel packages just have 7 files each. And there are larger packages with more source files and still just 7 binary files, where the ratio is even worse.

Andreas Hanke
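The "between 1/4 and 1/3" estimate can be checked with a few lines of Python. This is only a sketch based on the rounded counts quoted in the message; the variable names are made up for illustration and nothing here was measured on a real repository.

```python
from fractions import Fraction

# Rounded per-architecture counts quoted in the message above.
binary_per_arch = 7000      # "about 7000 binary packages per architecture"
debuginfo_per_arch = 2000   # "2000 of which are debuginfo packages"

# Counts yum reported for the whole multi-architecture repository.
total_packages = 22000
debuginfo_total = 6000

# Share of the metadata entries that belong to debuginfo packages.
per_arch_share = Fraction(debuginfo_per_arch, binary_per_arch)
overall_share = Fraction(debuginfo_total, total_packages)

print(f"per architecture: {per_arch_share} = {float(per_arch_share):.1%}")
print(f"whole repository: {overall_share} = {float(overall_share):.1%}")

# Both shares fall between one quarter and one third.
print(Fraction(1, 4) < overall_share < per_arch_share < Fraction(1, 3))
```

Note that this counts packages only. Since debuginfo packages carry long file lists, their share of filelists.xml is presumably larger than their share of the package count.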

Andreas Hanke <andreas.hanke@gmx-topmail.de> writes:
The horror! ;) Has anyone out there tried to store the data in an XML database system (dbxml, idzebra, etc.) and to access it from there? I do not understand why the objects are built again and again.

--
Karl Eichwalder R&D / Documentation SUSE Linux Products GmbH Key fingerprint = B2A3 AF2F CFC8 40B1 67EA 475A 5903 A21B 06EB 882E

On Fri, Aug 25, 2006 at 05:54:18AM +0200, Andreas Hanke wrote:
OK, so you are proposing a *workaround* for a known and very severe problem. Especially with factory, we should *not* concentrate on workarounds but on *fixes*! So as long as factory is a development branch, this *should not* be done.

ciao
Joerg

--
Joerg Mayer <jmayer@loplof.de> We are stuck with technology when what we really want is just stuff that works. Some say that should read Microsoft instead of technology.

Hi, Joerg Mayer wrote:
Joerg, this would be a valid point if it were a workaround, but it is not entirely a workaround.

Do you know any other distro that has such a huge package base as openSUSE and uses rpm-md? I don't.

Before making the proposal, I looked at Fedora's repository layout and found out the following:

- Fedora Core has significantly fewer packages than openSUSE.
- Fedora already splits the debuginfo packages out, into a separate directory that doesn't have any repository metadata at all.
- Fedora also separates the repositories per architecture even though rpm-md supports multiarch repositories very well.
- Fedora splits the source packages into a separate repository from the binary packages.

In numbers.

openSUSE:

   4000 "real" i586 binary packages
   2000 i586 debuginfo packages
   4000 "real" x86_64 binary packages
   2000 x86_64 debuginfo packages
   4000 "real" ppc binary packages
   2000 ppc debuginfo packages
   1000 noarch packages
   3000 source packages
   ----
  22000 total packages in a SINGLE repository

Fedora:

   2200 packages in the repository most people are interested in (i586 binary + noarch, no debuginfo, no source, no other architectures)

22000/2200 is a factor of 10!

This makes me seriously doubt that rpm-md is designed or even suitable for such huge repositories. It's not surprising that parsing this beast is slow, even with a fast parser. It also makes me doubt that improving the parser is the only way of approaching the problem.

So I thought about how to reduce the number of packages:

- Separate repositories per architecture - not possible because SUSE repositories have always been multiarch.
- Separate repositories for source packages - bad idea IMHO.
- Reducing the number of packages - not possible, people want to have more software and not less.

The debuginfo packages sounded like a reasonable candidate to me because their number is always proportional to the number of binary packages. If we get a new architecture like ia64, we get more debuginfo packages as well, which means we can also save proportionally by splitting them from the rest.

Note also that we already have performance workarounds. Using the old susetags metadata instead of rpm-md during initial installation is one of them.

Andreas Hanke
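To make the totals above easy to re-run with other assumptions, here is a small Python sketch. It uses only the rounded counts from the message; the helper `repo_size` and the assumption that a fourth architecture would have the same per-architecture counts are hypothetical.

```python
# Rounded package counts from the message above.
architectures = ["i586", "x86_64", "ppc"]
real_per_arch = 4000        # "real" binary packages per architecture
debuginfo_per_arch = 2000   # debuginfo packages per architecture
noarch = 1000
source = 3000
fedora_default_repo = 2200  # i586 binary + noarch only


def repo_size(n_arch: int, with_debuginfo: bool) -> int:
    """Number of packages whose metadata a client has to parse."""
    per_arch = real_per_arch + (debuginfo_per_arch if with_debuginfo else 0)
    return n_arch * per_arch + noarch + source


before = repo_size(len(architectures), with_debuginfo=True)
after = repo_size(len(architectures), with_debuginfo=False)

print("single repository:", before)
print("without debuginfo:", after)
print("factor vs. Fedora:", before / fedora_default_repo)

# Assumption: a fourth architecture (e.g. ia64) has the same counts.
print("growth with debuginfo:   ", repo_size(4, True) - before)
print("growth without debuginfo:", repo_size(4, False) - after)
```

With these numbers the main repository drops from 22000 to 16000 packages, and each additional architecture would add 4000 packages to it instead of 6000.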

Andreas Hanke <andreas.hanke@gmx-topmail.de> writes:
I agree with this conclusion. XML is nice, but it shows its limits here.
We could - no problem at all. It would increase the space since all source and noarch packages would need to be duplicated. We love to have one ;-)
Andreas -- Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj/ SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126

* Andreas Hanke <andreas.hanke@gmx-topmail.de> [Aug 25. 2006 15:24]: [...]
Yes, this is one of the problems we're facing.
It is not impossible, but it needs extra work. Currently it's also nice to publish only one repo URL without the need to distinguish between different architectures.
- Separate repositories for source packages - bad idea IMHO.
Why do you think this is a bad idea ?
- Reducing the number of packages - not possible, people want to have more software and not less.
Reducing the number of packages _per_ repository is easily possible. If you want more software, add more repositories. The openSUSE build service is supposed to address this in the future.

What we need is some kind of 'meta repository' which points to other repositories. It's planned, but we do not know if it's doable in the openSUSE 10.2 timeframe.

Klaus

Hi, Klaus Kaempf wrote:
And there would have to be a solution for biarch - because x86_64 needs to know about the i586 packages. Fedora seems to solve this by duplicating even the i386 packages in addition to the noarch packages: http://download.fedora.redhat.com/pub/fedora/linux/core/development/x86_64/o... I see a lot of i386 RPMs in that directory. It's not exactly a nicer solution than ours, however. In other words, it's ugly ;-) Furthermore, yum is able to expand the $basearch variable automatically, so there is still a single repository URL using this variable.
- Separate repositories for source packages - bad idea IMHO.
Why do you think this is a bad idea ?
Because they would be harder to find, resulting in fake GPL violation discussions on this list :-(

I thought about proposing that the source packages are handled together with the debuginfo packages, because they are both interesting for developers. But the source packages are a special case because some users tend to get angry if they don't find them easily. And the number of source packages does not increase when adding new architectures, but the number of debuginfo packages does.

Are the metadata of source packages faster or slower or equal to parse? Do they have equally verbose dependency information? At least they don't seem to have equally verbose filelist information. There is filelist information for source packages in filelists.xml, but most source packages don't have that many sources and patches in them.

Andreas Hanke

* Andreas Hanke <andreas.hanke@gmx-topmail.de> [Aug 25. 2006 16:24]:
All of them? Probably not. But it might be too much work to calculate which ones are needed (like the 32-bit libraries).
Are the metadata of source packages faster or slower or equal to parse? Do they have equally verbose dependency information?
They are faster to parse because they usually have no dependency information. Klaus

Andreas Hanke wrote:
Another point to consider: If it becomes possible to mirror the distribution without the source RPMs, someone will do it; and if someone does it, people will come back here and ask why there are just binaries on a mirror and not the sources. Forcing mirrors to carry the source RPMs by tying them together with the binary RPMs is a desirable effect IMHO.

Hi, On Fri, 25 Aug 2006, Andreas Hanke wrote:
Andreas Hanke wrote:
Serving fools just creates a new fool. So this should not be an "originary" aspect.

Cheers -e

--
Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)

forget my mail if it's stupid, but why couldn't you have _one_ repository and _several_ metadata files? Could clients then parse only the metadata file they need?

jdd

--
http://www.dodin.net http://dodin.org/galerie_photo_web/expo/index.html http://lucien.dodin.net http://fr.susewiki.org/index.php?title=Gérer_ses_photos

jdd wrote:
but why couldn't you have _one_ repository and _several_ metadata files?
That's not possible with rpm-md.

On Saturday 26 August 2006 19:25, Andreas Hanke wrote:
Why not? It looks like you can have as many primary, other, filelists etc. entries as you want.

  <?xml version="1.0" encoding="UTF-8"?>
  <repomd xmlns="http://linux.duke.edu/metadata/repo">
    <data type="patches">... </data>
    <data type="primary">..</data>
    <data type="filelists">...</data>
  </repomd>

I already suggested this: split primary by letter (primary-a.xml, primary-b.xml) and/or by arch. It does not save parse time, but it allows smarter caching on slow connections, in big repos like factory.

Duncan
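To illustrate what the suggestion would look like from a client's side, here is a minimal Python sketch that groups the entries of a repomd.xml index by their type. The sample document and the split file names (primary-a.xml.gz, primary-b.xml.gz) are invented for illustration, and the helper `index_files` is hypothetical. Whether real rpm-md clients accept more than one entry of the same type is exactly the open question in this thread; the sketch does not settle it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace taken from the repomd snippet quoted above.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical index: the split "primary" files are an assumption.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary-a.xml.gz"/></data>
  <data type="primary"><location href="repodata/primary-b.xml.gz"/></data>
  <data type="filelists"><location href="repodata/filelists.xml.gz"/></data>
</repomd>
"""


def index_files(repomd_text):
    """Map each metadata type to the list of files declared for it."""
    root = ET.fromstring(repomd_text.encode("utf-8"))
    files = defaultdict(list)
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        if location is not None:
            files[data.get("type")].append(location.get("href"))
    return dict(files)


for kind, hrefs in index_files(SAMPLE).items():
    print(kind, hrefs)
```

A client built this way could refetch only the files whose entries changed, which matches the caching argument above; the total amount of XML to parse stays the same.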

Andreas Hanke <andreas.hanke@gmx-topmail.de> writes:
You have it for factory first ;-) Future repositories will have it, so expect it for 10.2 ... Andreas -- Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj/ SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126

On Friday 25 August 2006 04:14, Robert Schiele wrote:
there were a number of bugzilla reports around this.
Basically two reasons:

1. Mirrors can skip the debuginfo packages without an exclude rule and without "breaking" the repository metadata.
2. The installers have less metadata to handle by default (when you ignore the -debug repo). This means they use less memory, download the metadata faster, and solve faster.

bye
adrian

--
Adrian Schroeter
SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
email: adrian@suse.de

On Fri, Aug 25, 2006 at 10:45:56AM +0200, Adrian Schroeter wrote:
Thanks. I'd appreciate seeing short explanations like this one in the first place when announcements like this are made. Just saying "on popular request" did not sound as if the decision was really based on a reason.

Robert

--
Robert Schiele Tel.: +49-621-181-2214 Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de "Quidquid latine dictum sit, altum sonatur."

participants (10)
- Adrian Schröter
- Andreas Hanke
- Andreas Jaeger
- Duncan Mac-Vicar Prett
- Eberhard Moenkeberg
- jdd
- Joerg Mayer
- Karl Eichwalder
- Klaus Kaempf
- Robert Schiele