[opensuse-buildservice] Extreme waste of space by source RPMs
Hi,

we waste extreme amounts of space on our download server (and on the mirrors) by the way we publish source RPMs.

The source package is published in every repository, so it can be duplicated considerably. For instance, an Apache httpd source package is 5.3 MB, but altogether the copies occupy more than ten times that:

# du -sch repositories/Apache/openSUSE_11.1/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/openSUSE_11.1/src/apache2-2.2.11-10.1.src.rpm
5.3M    total

# du -sch repositories/Apache/*/src/apache2-*.src.rpm
5.3M    repositories/Apache/CentOS_5/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/Fedora_10/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/Fedora_9/src/apache2-2.2.11-10.1.src.rpm
5.2M    repositories/Apache/Mandriva_2008/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/openSUSE_10.3/src/apache2-2.2.11-10.1.src.rpm
5.2M    repositories/Apache/openSUSE_11.0/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/openSUSE_11.1/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/openSUSE_Factory/src/apache2-2.2.11-10.2.src.rpm
5.3M    repositories/Apache/RHEL_5/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/SLE_10_server_database_postgresql/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/SLE_10/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/SLE_11/src/apache2-2.2.11-10.1.src.rpm
5.3M    repositories/Apache/SLES_9/src/apache2-2.2.11-10.1.src.rpm
68M     total

(This is probably a rather harmless example, but an illustrative one.)

To put this in perspective, I estimate that 25-50% of space might be wasted. Let's look at some numbers:

98M     repositories/Apache/*/src
342M    repositories/Apache

9M      repositories/home:/poeml/*/src
18M     repositories/home:/poeml

22G     repositories/games/*/src
50G     repositories/games

Wow, it's even worse than I thought...

What can we do about this?

From my understanding, each of the source RPMs could be used to achieve the same build result. (Or isn't that the case?)

The source RPMs should be published once only, if at all. What do you think?

Peter

-- 
"WARNING: This bug is visible to non-employees. Please be respectful!"
SUSE LINUX Products GmbH
Research & Development
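A rough way to put a number on this for a single project, as a sketch only: group the source RPMs by file name and count everything beyond the largest copy in each group as potentially redundant. The function name `srpm_duplicate_bytes` is made up for illustration, the `<project>/<repo>/src/` layout is taken from the `du` output in the mail, and `find -printf` assumes GNU findutils. Same-named packages need not be byte-identical, so the result is an upper bound on what could be saved, not a measurement.

```shell
# Sketch: estimate the bytes used by repeated copies of same-named source RPMs.
# Assumes the layout <project>/<repo>/src/<name>.src.rpm and GNU find.
# The function name is hypothetical.
srpm_duplicate_bytes() {
    # Emit "<size in bytes><TAB><file name>" for every source RPM below $1.
    find "$1" -type f -name '*.src.rpm' -printf '%s\t%f\n' |
        awk -F'\t' '
            {
                size = $1 + 0
                total[$2] += size                          # bytes over all copies
                if (size > largest[$2]) largest[$2] = size # one copy has to stay
            }
            END {
                dup = 0
                for (name in total) dup += total[name] - largest[name]
                printf "%.0f\n", dup
            }'
}
```

Calling `srpm_duplicate_bytes repositories/Apache` would print a single byte count. For comparison, the figures in the mail put the `src` directories at 98M/342M (about 29%) of Apache, 9M/18M (50%) of home:poeml and 22G/50G (44%) of games.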
Hi

On Fri, 26 Jun 2009 15:39:53 +0200, Peter Poeml <poeml@suse.de> wrote:
> we waste extreme amounts of space on our download server (and on the mirrors) by the way we publish source RPMs.

Yes.

> From my understanding, each of the source rpms could be used to achieve the same build result. (Or isn't that the case?)

Unfortunately, source RPMs do not have to be the same for all architectures (for example, if you have a Patch hidden behind some %ifs). I am not sure whether that is the case for any existing package, though.

Also, different rpm versions on different distributions might compress the source RPM in different ways.

> The source RPMs should be published once only, if at all. What do you think?

If they are all the same, there is surely no need to duplicate them.

-- 
Michal Čihař | http://cihar.com | http://blog.cihar.com
On Fri, 2009-06-26 at 15:44 +0200, Michal Čihař wrote:
> On Fri, 26 Jun 2009 15:39:53 +0200, Peter Poeml <poeml@suse.de> wrote:
> > we waste extreme amounts of space on our download server (and on the mirrors) by the way we publish source RPMs.
>
> Yes.
>
> > From my understanding, each of the source rpms could be used to achieve the same build result. (Or isn't that the case?)
>
> Unfortunately, source RPMs do not have to be the same for all architectures (for example, if you have a Patch hidden behind some %ifs). I am not sure whether that is the case for any existing package, though.
>
> Also, different rpm versions on different distributions might compress the source RPM in different ways.
>
> > The source RPMs should be published once only, if at all. What do you think?
>
> If they are all the same, there is surely no need to duplicate them.

Unfortunately, using an external checksum will always fail, as at least a little distro metadata is included in each package (plus the randomization related to the GPG signature).

It would be much better if the build service could do a first pass to build the source RPM and then rebuild that for all platforms. One can still conditionalize patch application and use distro-specific variables in spec files, as the spec file is re-expanded during the rebuild.

Would this be possible?

Thanks,
Peter

-- 
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-buildservice+help@opensuse.org
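To illustrate the checksum point: a digest of the `.src.rpm` file itself also covers the header and the signature, so it changes from build to build even when the sources do not. One possible workaround, sketched here under assumptions, is to compare the unpacked contents instead. The sketch assumes each package has already been unpacked into its own directory (for example with `rpm2cpio pkg.src.rpm | cpio -idm`) and that GNU coreutils and findutils are available. The function names are hypothetical and not part of the build service.

```shell
# Sketch: compare two unpacked source RPMs by content rather than by the
# checksum of the .src.rpm file. Function names are hypothetical.

# Print one SHA-256 digest covering the relative names and contents of all
# regular files below the directory given as $1.
tree_digest() {
    (
        cd "$1" || exit 1
        find . -type f -print0 |
            LC_ALL=C sort -z |        # fixed order, independent of locale
            xargs -0 -r sha256sum |   # one "<digest>  <path>" line per file
            sha256sum |
            cut -d' ' -f1
    )
}

# Succeed if both directories hold the same file names with the same contents.
# Returns 2 if either directory cannot be entered.
same_sources() {
    digest_a=$(tree_digest "$1") || return 2
    digest_b=$(tree_digest "$2") || return 2
    [ "$digest_a" = "$digest_b" ]
}
```

With two unpacked trees, `same_sources unpacked/openSUSE_11.1 unpacked/Fedora_10 && echo identical` would report whether tarballs, patches and spec file all match. The directory names here are placeholders.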
On Fri, Jun 26, 2009 at 03:44:01PM +0200, Michal Čihař wrote:
> > From my understanding, each of the source rpms could be used to achieve the same build result. (Or isn't that the case?)
>
> Unfortunately, source RPMs do not have to be the same for all architectures (for example, if you have a Patch hidden behind some %ifs). I am not sure whether that is the case for any existing package, though.
>
> Also, different rpm versions on different distributions might compress the source RPM in different ways.

Bummer.

Would nosrc RPMs be an alternative? Or referring to the buildservice source server right away (much more convenient, although the interface is still lacking)?

Peter
On Jun 26, 09 15:52:22 +0200, Peter Poeml wrote:
> On Fri, Jun 26, 2009 at 03:44:01PM +0200, Michal Čihař wrote:
> > > From my understanding, each of the source rpms could be used to achieve the same build result. (Or isn't that the case?)
> >
> > Unfortunately, source RPMs do not have to be the same for all architectures (for example, if you have a Patch hidden behind some %ifs). I am not sure whether that is the case for any existing package, though.
> >
> > Also, different rpm versions on different distributions might compress the source RPM in different ways.
>
> Bummer.
>
> Would nosrc RPMs be an alternative? Or referring to the buildservice source server right away (much more convenient, although the interface is still lacking)?

All your source RPMs should be extremely similar, no? How about providing only one in full and the rest as deltas?

cheers, JW

-- 
Juergen Weigert, jw@suse.de
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
On Friday, 26 June 2009 15:39:53, Peter Poeml wrote:
> From my understanding, each of the source rpms could be used to achieve the same build result. (Or isn't that the case?)
>
> The source RPMs should be published once only, if at all. What do you think?

Actually no, since they differ. OBS adapts the spec files to be usable on each particular platform.

But the plan was to move them under /pub/opensuse/source/repositories/... to be able to exclude them.

Just dropping the source RPMs and pointing people to check out the sources with the osc tool was discarded in earlier discussions, because people wanted to have source RPMs, as they are used to them.

bye
adrian

-- 
Adrian Schroeter
SUSE Linux Products GmbH
email: adrian@suse.de
On Sat, Jun 27, 2009 at 09:36:16AM +0200, Adrian Schröter wrote:
> > The source RPMs should be published once only, if at all. What do you think?
>
> Actually no, since they differ. OBS is adapting spec files to be usable on that particular platform.

So we have:

- differing spec files
  => hm, hard to do something about that.

- different header checksums due to tiny bits differing per build (build time)
  => I'm sure we could ignore these

- different compression
  => We could simply use the same compression for all source RPMs, the least common denominator that works on all platforms

> But the plan was to move them under /pub/opensuse/source/repositories/... to be able to exclude them.

That would also be useful. It'd eventually allow us to keep them in a separate, more manageable tree, on a separate disk, and treat them with lower priority.

> Just dropping the source rpms and point people to checkout sources with osc tool was discarded in earlier discussions, because people wanted to have source rpms as they are used to.

Once we have *direct* access to the sources (without a login), we can make generation of source RPMs optional. That'll help very much, I think. For many people it will just be more valuable to point to expanded sources (that everybody can look at) than to a source RPM (which in many cases is inconvenient).

I added this comment
https://features.opensuse.org/306192#comment_9
to the feature #306192: "Make BuildService accessible for anonymous users".

(So far, I repeatedly find myself copying sources to some private server to make them accessible. Source RPMs are of little help to most people on other platforms.)

Peter
Peter Poeml wrote:
> Once we have *direct* access to the sources (without a login), we can make generation of source RPMs optional. That'll help very much I think. For many people it will just be more valuable to point to expanded sources (that everybody can look at), than at a source RPM (that in many cases is inconvenient).

I will say that on Debian-based systems, things like:
  apt-get build-dep <pkg>
require a decent source repository.

Git and OBS have been discussed in the past. How about a 404 handler that generates them on demand using something like pristine-tar?
http://kitenet.net/~joey/code/pristine-tar/

This would allow lots of minor variations on a tarball to be stored efficiently.

Then delete them a week after they're last downloaded.

David

-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
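The expiry half of this idea ("delete them a week after they're last downloaded") could look roughly like the sketch below. It does not cover the 404 handler or the pristine-tar regeneration. It assumes GNU find, a cache directory that holds the generated source RPMs, and a filesystem that records access times (a `noatime` mount would defeat it). The function name is hypothetical.

```shell
# Sketch of the expiry step only: delete cached source RPMs that have not been
# read for more than the given number of full days (default 7).
# $1 = cache directory, $2 = optional age in days.
# Assumes GNU find and working access times. The function name is hypothetical.
purge_stale_srpms() {
    # -atime +N matches files last accessed more than N full days ago;
    # -print lists each file as it is removed by -delete.
    find "$1" -type f -name '*.src.rpm' -atime +"${2:-7}" -print -delete
}
```

Run from a daily cron job as, say, `purge_stale_srpms /path/to/srpm-cache`, it would print the files it removes. The path is a placeholder.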
On 2009-06-27T09:36:16, Adrian Schröter <adrian@suse.de> wrote:
> > The source RPMs should be published once only, if at all. What do you think?
>
> Actually no, since they differ. OBS is adapting spec files to be usable on that particular platform.

This may seem like a bit of a silly question, but why doesn't that happen as part of the build? I.e., the src.rpm is the same for all platforms/archs, and the modifications get applied as part of building the binary package. Why does the actual src.rpm have to be different _before_ the build?

I assume that is because of adjustments that are only done by the BS, not when people build locally. Can these possibly be done via some ifdefs instead, which would work locally as well?

Regards,
Lars

-- 
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
participants (7)
- Adrian Schröter
- David Greaves
- Juergen Weigert
- Lars Marowsky-Bree
- Michal Čihař
- Peter Bowen
- Peter Poeml