[opensuse-factory] compressing rpms
Hi,

I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip. This is likely because all the metadata is uncompressed and only the payload (the file contents) is compressed, so that rpm -qp $RPMFILE can easily and quickly access all metadata.

If you want to test this yourself, you can use this:

cat ~/bin/xzcurl
#!/bin/sh
url=$1
len=$(curl -s -L -I "$url" | awk 'BEGIN {RS="\r\n"} /^Content-Length: /{print $2}' | tail -n 1)
comprlen=$(curl -s -L "$url" | xz | wc -c)
echo "Len:$len Compr:$comprlen Saved:"$((len-$comprlen))\
" Ratio:$((100*$comprlen/$len))%"

xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/apache-pdfbox-javad...
Len:216328 Compr:65144 Saved:151184 Ratio:30%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/apache-pdfbox-javad...
Len:1379932 Compr:1224584 Saved:155348 Ratio:89%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/kernel-devel-4.12.1...
Len:4766820 Compr:1229368 Saved:3537452 Ratio:26%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/kernel-devel-4.12.1...
Len:14664108 Compr:11124512 Saved:3539596 Ratio:76%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/kernel-docs-html-4....
Len:3115424 Compr:957256 Saved:2158168 Ratio:31%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/noarch/kernel-docs-html-4....
Len:7771320 Compr:5609056 Saved:2162264 Ratio:72%
xzcurl http://download.opensuse.org/update/leap/15.0/oss/x86_64/zutils-1.7-lp150.2....
Len:89184 Compr:79980 Saved:9204 Ratio:90%

As you can see, the number of saved bytes is nearly the same for the .drpm and .rpm files, because they contain the same metadata.

Now I was wondering:
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files. Maybe not all of them need a patch - e.g. if libzypp uncompresses files on the user side before passing them on to further processing, then libsolv and librpm don't need any change.
2) whether it is worth the effort to save 10-70% of bandwidth and mirror storage space.

One simpler, yet efficient way to save bandwidth would be to enable gzip compression during transfer via Apache's mod_deflate (it can be limited to the .drpm extension). E.g. kernel-docs-html-4.12.14-lp150.11.2_lp150.12.16.1.noarch.drpm still shrinks by 64% with gzip => https://en.wikipedia.org/wiki/HTTP_compression

With https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#precompressed or https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#inflate this can even save disk space, by storing only .drpm.gz files on the server but delivering to clients whatever they request. But then again, synchronizing such a mirror with rsync is harder, unless the rsync source has .drpm.gz files. And that only works if all mirrors have mod_deflate configured properly... or if libzypp is smartly patched.

I tested this on-the-fly compression a bit:
curl -v --compressed http://aw.zq1.de/rpm/test.rpm | wc -c
curl -v -H "Accept-Encoding: gzip" http://aw.zq1.de/rpm/test.rpm | wc -c
curl -v --compressed http://aw.zq1.de/rpm/kernel-docs-html.drpm | wc -c
but both
rpm -qpi http://aw.zq1.de/rpm/test.rpm
zypper in -d http://aw.zq1.de/rpm/test.rpm
request the uncompressed version at the moment.

https://build.opensuse.org/package/show/home:bmwiedemann:branches:zypp:Head/... has a tiny PoC compr.patch that lets zypp save bandwidth with servers that support this. But then, it would be nice to find a way to transfer compressed rpms that does not require mirror servers to be reconfigured.

I guess our distribution maintainers would not mind fitting 10-20% extra content on DVD / USB images. That would be possible with .rpm.xz or .rpm.gz files.
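The xzcurl script above downloads every file twice. For a package that is already on disk, a minimal offline sketch can print the same fields and also lets you swap in another compressor (for example `gzip -9` to approximate what mod_deflate would save). The function name `comprstats` is hypothetical; the default compressor is plain `xz`, as in xzcurl.

```shell
#!/bin/sh
# comprstats FILE [COMPRESSOR...]
# Hypothetical helper: report how much FILE shrinks when piped through a
# compressor (default: xz, as in the xzcurl script), in the same format.
comprstats() {
    file=$1
    shift
    [ "$#" -gt 0 ] || set -- xz
    # wc -c on stdin prints only the byte count; tr strips padding spaces
    len=$(wc -c < "$file" | tr -d ' \t')
    if [ "$len" -eq 0 ]; then
        echo "comprstats: $file is empty" >&2
        return 1
    fi
    comprlen=$("$@" < "$file" | wc -c | tr -d ' \t')
    echo "Len:$len Compr:$comprlen Saved:$((len - comprlen)) Ratio:$((100 * comprlen / len))%"
}
```

For example, `comprstats kernel-docs-html.drpm gzip -9` would estimate the transfer saving from on-the-fly gzip, while `comprstats foo.rpm` mirrors the xzcurl numbers without any network access.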
Another place to save space and bandwidth is the repodata. Those files are currently compressed with gzip -9 --rsyncable, but since the xml file names change every time, --rsyncable gives no benefit there and we could just use xz instead. E.g. we get 35% savings there:

wget -Oprimary.xml.gz http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/71f232...
gzip -cd primary.xml.gz | time xz -9 | wc -c
7071384/10883249 = 0.65

Any opinions on which way to go?

Ciao
Bernhard M.
-- 
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
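The primary.xml.gz comparison can be wrapped into a small reusable check. This is a minimal sketch, assuming only that the input is a gzip file; the helper name `recompress_ratio` is hypothetical, and the default `xz -9` matches the command used in the mail.

```shell
#!/bin/sh
# recompress_ratio FILE.gz [COMPRESSOR...]
# Hypothetical helper: decompress a gzip file, recompress it with another
# compressor (default: xz -9) and print new_size/old_size.
recompress_ratio() {
    gz=$1
    shift
    [ "$#" -gt 0 ] || set -- xz -9
    old=$(wc -c < "$gz" | tr -d ' \t')
    new=$(gzip -cd < "$gz" | "$@" | wc -c | tr -d ' \t')
    # sh arithmetic is integer-only, so let awk do the division
    awk -v n="$new" -v o="$old" 'BEGIN { printf "%d/%d = %.2f\n", n, o, n / o }'
}
```

Running `recompress_ratio primary.xml.gz` prints a line in the same `new/old = ratio` form as above; passing e.g. `zstd -19` instead of the default compares another candidate.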
On Wednesday 2018-09-19 10:47, Bernhard M. Wiedemann wrote:
Hi,
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433 The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too): it compresses nearly as well as xz, but at roughly the speed of gzip. Updating %_binary_payload in prjconf or rpm-config-SUSE should do the job, at least for the plain BRPMs; %_source_payload does the same for SRPMs. Not sure about drpms - AFAIU, they count as BRPMs.
Any opinions on which way to go?
On 2018-09-19 10:53, Jan Engelhardt wrote:
On Wednesday 2018-09-19 10:47, Bernhard M. Wiedemann wrote:
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433
The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too), compressing nearly as strong as xz, but in the timespace of gzip.
Updating %_binary_payload in prjconf or rpm-config-SUSE should do the job, at least for the plain BRPMs. %_source_payload for SRPMS. Not sure about drpms - AFAIU, they count as BRPMs.
Nay, that is only about the payload, but most drpms consist of >90% metadata (not payload), and that is why they are so compressible. Even normal rpms carry just as much metadata (only the ratio is different).
On 19.09.18 at 10:59, Bernhard M. Wiedemann wrote:
nay, that is only about payload, but most drpms consist of >90% metadata (so not payload) and that is why they are so compressible. And even normal rpms have as much metadata (just the ratio is different).
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)? Most of the metadata is probably changelog, isn't it? -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
On 19.09.18 at 10:59, Bernhard M. Wiedemann wrote:
nay, that is only about payload, but most drpms consist of >90% metadata (so not payload) and that is why they are so compressible. And even normal rpms have as much metadata (just the ratio is different).
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine. Ciao, Michael.
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
On 19.09.18 at 10:59, Bernhard M. Wiedemann wrote:
nay, that is only about payload, but most drpms consist of >90% metadata (so not payload) and that is why they are so compressible. And even normal rpms have as much metadata (just the ratio is different).
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. The last 20 changelog entries. Whatever. kernel-default (and kernel-source, ...) all carry changelog entries going back to 2009.

This would also help with the ridiculously big RPM database:

seife@strolchi:~> du -sh /usr/lib/sysimage/rpm/
149M /usr/lib/sysimage/rpm/
seife@strolchi:~> du -sb /usr/lib/sysimage/rpm/
155553792 /usr/lib/sysimage/rpm/
seife@strolchi:~> for i in `rpm -qa`;do rpm -q --changes $i;done|wc -c
80379915

More than half of the RPM database is just changelogs.

...and it would improve the ridiculously bloated repository metadata that needs to be downloaded, even though that is at least compressed.

IMNSHO this would be a huge gain with relatively little effort: just enhance the format_spec service to trim & archive %{name}.changes ;-) Bonus: this would make format_spec do at least one useful thing ;-P Alternatively this could be done inside build.rpm, by /usr/lib/build/changelog2spec
-- 
Stefan Seyfried
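To turn the two byte counts above into a percentage, a tiny helper is enough. This is a rough comparison only (changelog text printed by rpm versus the on-disk database size), and the function name `pct` is hypothetical; the collection commands are left commented out so the sketch runs on any system.

```shell
#!/bin/sh
# pct PART WHOLE: print PART as a percentage of WHOLE with one decimal.
pct() {
    awk -v p="$1" -v w="$2" 'BEGIN { printf "%.1f%%\n", 100 * p / w }'
}

# On an rpm-based system the inputs could be collected like this
# (du -sb is a GNU coreutils option):
# changes=$(for i in $(rpm -qa); do rpm -q --changes "$i"; done | wc -c)
# db=$(du -sb /usr/lib/sysimage/rpm/ | cut -f1)
# pct "$changes" "$db"
```

With the numbers from the transcript, `pct 80379915 155553792` prints `51.7%`.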
On 9/19/18 10:12 PM, Stefan Seyfried wrote:
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
On 19.09.18 at 10:59, Bernhard M. Wiedemann wrote:
nay, that is only about payload, but most drpms consist of >90% metadata (so not payload) and that is why they are so compressible. And even normal rpms have as much metadata (just the ratio is different).
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. Last 20 Changelog entries. Whatever.
I'm not against your proposal in general. So let me rephrase my answer: It's probably hard to reach consensus on the "right" N. Ciao, Michael.
On 19.09.18 at 22:25, Michael Ströder wrote:
On 9/19/18 10:12 PM, Stefan Seyfried wrote:
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. Last 20 Changelog entries. Whatever.
I'm not against your proposal in general.
So let me rephrase my answer: It's probably hard to reach consensus on the "right" N.
I'd guess that most people won't care too much. And with my proposal, the full .changes will stay in OBS, so developers (who are most likely the ones interested in old changes) can easily find them. Maybe the last line of %changelog could be "375 additional changelog lines can be found at /usr/share/doc/packages/%name/rpm-full-changelog", so even "normal" users have an easy way of finding them.

I did some more (obviously non-representative) statistics:

seife@strolchi:/dev/shm/stat> rpm -qa | wc
   2758    2758   92345
seife@strolchi:/dev/shm/stat> ls -l
total 151468
-rw-r--r-- 1 seife users  9007410 Sep 19 22:25 10-changes.txt
-rw-r--r-- 1 seife users 16146498 Sep 19 22:24 20-changes.text
-rw-r--r-- 1 seife users 18865190 Sep 19 22:26 250-lines.txt
-rw-r--r-- 1 seife users 30687951 Sep 19 22:21 500-lines.txt
-rw-r--r-- 1 seife users 80379915 Sep 19 22:19 all.txt

"all.txt" is the output of "rpm -q --changelog" for all packages, concatenated. The *-lines.txt files contain only n lines of changelog per package, the *-changes.txt files only n changelog entries per package.

Keeping at most 20 changelog entries per package would save quite some space and (to me at least :-) ) would be quite sufficient.
-- 
Stefan Seyfried
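The proposal (keep the newest N entries, then point to a file with the rest) can be sketched as a filter over `rpm -q --changelog`-style text, where each entry starts with a line beginning `* `. This is a minimal sketch under that assumption; the helper name `trim_changelog` is hypothetical, and the documentation path is whatever the caller passes in.

```shell
#!/bin/sh
# trim_changelog N DOCFILE < changelog
# Keep the first N entries of an "rpm -q --changelog"-style text (entries
# start with "* "), then append one line saying how many lines were dropped
# and where they can be found. DOCFILE is supplied by the caller.
trim_changelog() {
    awk -v max="$1" -v doc="$2" '
        /^\* / { entries++ }
        entries <= max { print; next }
        { dropped++ }
        END {
            if (dropped > 0)
                printf "%d additional changelog lines can be found at %s\n", dropped, doc
        }'
}
```

For instance, `rpm -q --changelog bash | trim_changelog 20 /usr/share/doc/packages/bash/rpm-full-changelog` approximates the "20 changes" variant from the statistics, plus the pointer line.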
On Wed, Sep 19, 2018 at 4:33 PM Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
On 19.09.18 at 22:25, Michael Ströder wrote:
On 9/19/18 10:12 PM, Stefan Seyfried wrote:
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. Last 20 Changelog entries. Whatever.
I'm not against your proposal in general.
So let me rephrase my answer: It's probably hard to reach consensus on the "right" N.
I'd guess that most people won't care too much. And with my proposal, the full .changes will stay in OBS, so developers (who are most likely the ones interested in old changes) can easily find them. Maybe the last line of %changelog could be "375 additional changelog lines can be found at /usr/share/doc/packages/%name/rpm-full-changelog", so even "normal" users have an easy way of finding them.
I did some more (obviously non-representative) statistics:
seife@strolchi:/dev/shm/stat> rpm -qa | wc
   2758    2758   92345
seife@strolchi:/dev/shm/stat> ls -l
total 151468
-rw-r--r-- 1 seife users  9007410 Sep 19 22:25 10-changes.txt
-rw-r--r-- 1 seife users 16146498 Sep 19 22:24 20-changes.text
-rw-r--r-- 1 seife users 18865190 Sep 19 22:26 250-lines.txt
-rw-r--r-- 1 seife users 30687951 Sep 19 22:21 500-lines.txt
-rw-r--r-- 1 seife users 80379915 Sep 19 22:19 all.txt
"all.txt" is the output of "rpm -q --changelog" for all packages, concatenated. The *-lines.txt files contain only n lines of changelog per package, the *-changes.txt files only n changelog entries per package.
Keeping at most 20 changelog entries per package would save quite some space and (to me at least :-) ) would be quite sufficient.
RPM supports automatically truncating changelogs based on their age. In Fedora, we truncate at two years: https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/macros#_2... In Mageia, we truncate at three years: http://gitweb.mageia.org/software/rpm/rpm-setup/tree/macros.in#n21

It might also help if we didn't let people put in ridiculously awful changelog entries (often generated by source services or something) due to the (IMO) odd desire to include lots of upstream changelog details from VCS history. I don't even bother to review some packages' changelogs on my computer anymore, because too many packages have functionally useless changelogs that just copy in the VCS history, which is too much...
-- 
真実はいつも一つ!/ Always, there's only one truth!
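Age-based truncation boils down to comparing each entry's timestamp with "now minus N years". The sketch below is not Fedora's or Mageia's actual macro (those are linked above); it assumes records of the form `EPOCH text`, which is what a query such as `rpm -q --qf '[%{CHANGELOGTIME} %{CHANGELOGNAME}\n]' PKG` prints, and approximates a year as 365 days. The helper name is hypothetical.

```shell
#!/bin/sh
# trim_by_age YEARS NOW < "EPOCH text" records
# Keep only records whose timestamp is at most YEARS years older than NOW
# (seconds since the epoch). A year is approximated as 365 days.
trim_by_age() {
    cutoff=$(( $2 - $1 * 365 * 86400 ))
    awk -v cut="$cutoff" '$1 >= cut'
}

# Example on an rpm-based system:
# rpm -q --qf '[%{CHANGELOGTIME} %{CHANGELOGNAME}\n]' bash | trim_by_age 2 "$(date +%s)"
```

Note that with a purely age-based cut, a package whose newest entry is older than the cutoff ends up with no entries at all.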
On Wednesday, September 19, 2018 5:33:41 PM CDT Neal Gompa wrote:
It might also help if we didn't let people put ridiculously awful changelog entries (often generated by source services or something) due to the (IMO) odd desire to put lots of upstream changelog details from VCS history. I don't even bother to review some packages' changelogs on my computer anymore because too many packages make functionally useless changelogs because they just copy VCS history in, which is too much...
I 100% agree with this, but it is the opposite of current policy: if one writes "Update to version 1.2.3" with a link to the upstream changelog, the submission will be denied. IMO changelogs should contain RPM-specific changes, spec reworking, newly exposed features, build flag changes, etc., but leave upstream to upstream. If there is an exceptionally noteworthy upstream change, sure, but for most packages this really does not apply. Automatic changelogs taken from VCS commits are the result, since that is essentially what one has to do manually anyway, although it does not scale beyond relatively small projects. Use VCS for VCS and leave the changelog to be a curated summary of relevant information. This is all off-topic, of course. -- Jimmy
On 20/09/2018 08:35, Jimmy Berry wrote:
On Wednesday, September 19, 2018 5:33:41 PM CDT Neal Gompa wrote:
It might also help if we didn't let people put ridiculously awful changelog entries (often generated by source services or something) due to the (IMO) odd desire to put lots of upstream changelog details from VCS history. I don't even bother to review some packages' changelogs on my computer anymore because too many packages make functionally useless changelogs because they just copy VCS history in, which is too much...
I 100% agree with this, but it is the opposite of current policy: if one writes "Update to version 1.2.3" with a link to the upstream changelog, the submission will be denied. IMO changelogs should contain RPM-specific changes, spec reworking, newly exposed features, build flag changes, etc., but leave upstream to upstream. If there is an exceptionally noteworthy upstream change, sure, but for most packages this really does not apply.
Automatic changelogs taken from VCS commits are the result, since that is essentially what one has to do manually anyway, although it does not scale beyond relatively small projects.
Use VCS for VCS and leave changelog to be a curated summary of relevant information.
This is all off-topic of course.
It's off the topic of the original post, but still on topic for this list. The best way to interpret this policy is generally somewhere in between nothing but a link and everything. Sure, if it's a bug fix release and there are fewer than 10 commits, including them all is fine. Normally for new releases I take something derived from the upstream release notes, which hopefully include all the major changes and at least a list of the bugs that were fixed, plus any changes I made to the packaging. This is generally far more useful than a whole commit log and, with most upstreams, doesn't take much work to put together. -- Simon Lees (Simotek) http://simotek.net Emergency Update Team keybase.io/simotek SUSE Linux Adelaide Australia, UTC+10:30 GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
On Thursday 2018-09-20 01:05, Jimmy Berry wrote:
On Wednesday, September 19, 2018 5:33:41 PM CDT Neal Gompa wrote:
It might also help if we didn't let people put ridiculously awful changelog entries (often generated by source services or something) due to the (IMO) odd desire to put lots of upstream changelog details from VCS history. I don't even bother to review some packages' changelogs on my computer anymore because too many packages make functionally useless changelogs because they just copy VCS history in, which is too much...
If one says "Update to version 1.2.3" with a link to upstream changelogs that will be denied. IMO changelogs should contain RPM specific changes, spec reworking, new exposed features, build flag changes, etc, but leave upstream to upstream.
Everything you say is - I hope - already codified in the recommendation page[1] (edits are welcome):

"You can use SCM commit messages, if they prove to be useful. If in doubt, don't."
"Be concise. Pick only the topmost interesting points:" (== newly exposed features - or perhaps the antithesis of that, something which seriously broke/changed)

But even figuring out the newly exposed features (or the lack thereof, and saying so with a single statement in the .changes) is something that some people seem to be totally overwhelmed by.

[1] https://en.opensuse.org/openSUSE:Creating_a_changes_file_(RPM)
On Thu, 20 Sep 2018 09:36:46 +0200 (CEST), Jan Engelhardt <jengelh@inai.de> wrote:
But even figuring out the new exposed features (or lack thereof and saying so with a single statement in the .changes) is something that
It can be tough for larger things, when one release gives you almost 200 tarballs.
On 20.09.18 at 00:33, Neal Gompa wrote:
On Wed, Sep 19, 2018 at 4:33 PM Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
keeping at most 20 Changelog entries per package would save quite some space and (to me at least :-) ) would be quite sufficient.
RPM supports automatically truncating based on timeline.
What does this do if the last change was 3 years ago? Will it keep only one entry (assuming a one-year "timeout")?
-- 
Stefan Seyfried
Hello, On Sep 19 22:12 Stefan Seyfried wrote (excerpt):
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote: ...
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. Last 20 Changelog entries. Whatever.
No hardcoded value would work well in practice: either the value is mostly too big, but ensures that all changes since the last one the user knows about are included, or the value is mostly OK, but fails to ensure that all changes since the last one the user knows about are included. Think about a system upgrade from an older version where some packages have very many changelog entries. In particular, think about the changelog entries that mention security fixes (i.e. the CVE numbers) and important bug fixes (e.g. 'bsc#...' and 'boo#...'). We (at least in SUSE - perhaps openSUSE users don't care ;-) will get customer questions when expected CVE or bug numbers do not (or no longer) appear in the RPM changelog (at least some customers check the RPM changelogs for those things). My gut feeling is that this is not the first time people have thought about whether the huge RPM changelogs could be improved, and the fact that nothing changed means it is actually a complicated problem to develop a solution that really works in practice. By the way: I remember the "great mess" caused by the "great idea" of moving common license files out of the RPM packages and providing them in a single 'license texts' package. Unfortunately, cruel legal reality did not nicely match the techies' "easily save space" view of the world ;-) Kind Regards Johannes Meixner -- SUSE LINUX GmbH - GF: Felix Imendoerffer, Jane Smithard, Graham Norton - HRB 21284 (AG Nuernberg)
On 20/09/2018 04.15, Johannes Meixner wrote:
Hello,
On Sep 19 22:12 Stefan Seyfried wrote (excerpt):
On 19.09.18 at 22:01, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote: ...
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
500 lines. Last 20 Changelog entries. Whatever.
No hardcoded value would work well in practice: either the value is mostly too big, but ensures that all changes since the last one the user knows about are included, or the value is mostly OK, but fails to ensure that all changes since the last one the user knows about are included.
Think about a system upgrade from an older version where some packages have very many changelog entries.
In particular, think about the changelog entries that mention security fixes (i.e. the CVE numbers) and important bug fixes (e.g. 'bsc#...' and 'boo#...').
We (at least in SUSE - perhaps openSUSE users don't care ;-) will get customer questions when expected CVE or bug numbers do not (or no longer) appear in the RPM changelog (at least some customers check the RPM changelogs for those things).
My gut feeling is that this is not the first time people have thought about whether the huge RPM changelogs could be improved, and the fact that nothing changed means it is actually a complicated problem to develop a solution that really works in practice.
Forgive me if I sound naive, but why can't the header content be compressed? Then the size of the changelog would not matter. Yes, this would mean a format change and a toolset change, I know. The whole header would not need to be compressed, only some fields known to be large. -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))
Hello, On 9/19/18 9:46 PM, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
Just another offhand "great idea": I think what might work in practice is to always provide all lines of the changelog as a regular file /usr/share/doc/packages/%name/changelog and have only one line of text in the RPM metadata: "See /usr/share/doc/packages/%name/changelog". My basic reasoning behind this is never to cut the RPM changelog into pieces, but to keep it fully intact. Ideally 'rpm -q --changelog' would be enhanced to check whether /usr/share/doc/packages/%name/changelog exists and, if so, use that to display the changelog, and otherwise fall back to displaying the changelog from the metadata. Kind Regards Johannes Meixner
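The proposed lookup order can be prototyped as a wrapper around `rpm -q --changelog`. This is a minimal sketch, not a patch to rpm itself: the function name `show_changelog` and the `CHANGELOG_DOCDIR` override are hypothetical, and the default directory is the path from the mail.

```shell
#!/bin/sh
# show_changelog PKG
# Prefer the full changelog shipped as a regular file in the package;
# otherwise fall back to the changelog stored in the rpm header.
# CHANGELOG_DOCDIR is a hypothetical override (useful for testing).
show_changelog() {
    f="${CHANGELOG_DOCDIR:-/usr/share/doc/packages}/$1/changelog"
    if [ -r "$f" ]; then
        cat "$f"
    else
        rpm -q --changelog "$1"
    fi
}
```

The file-based path only exists once a package is installed; for a downloaded but not yet installed package, `rpm -qp --changelog FILE.rpm` still only sees the header.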
Hi, On Thu, 20 Sep 2018, 11:02:06 +0200, Johannes Meixner wrote:
Hello,
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
just another offhanded "great idea":
I think what might work in practice is to provide always all lines of the changelog as regular file /usr/share/doc/packages/%name/changelog and have in the RPM metadata only one line of text "See /usr/share/doc/packages/%name/changelog"
My basic reasoning behind this is never to cut the RPM changelog into pieces, but to keep it fully intact.
Ideally 'rpm -q --changelog' would be enhanced to check if /usr/share/doc/packages/%name/changelog exists and if yes use that to display the changelog and if not fall back to display the changelog of the metadata.
trouble with that is, you would have to *install* a package first to be able to look at its changelog :(
Kind Regards Johannes Meixner
Cheers. l8er manfred
On 20.09.2018 12:09, Manfred Hollstein wrote:
Hi,
On Thu, 20 Sep 2018, 11:02:06 +0200, Johannes Meixner wrote:
Hello,
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
just another offhanded "great idea":
I think what might work in practice is to provide always all lines of the changelog as regular file /usr/share/doc/packages/%name/changelog and have in the RPM metadata only one line of text "See /usr/share/doc/packages/%name/changelog"
My basic reasoning behind this is never to cut the RPM changelog into pieces, but to keep it fully intact.
Ideally 'rpm -q --changelog' would be enhanced to check if /usr/share/doc/packages/%name/changelog exists and if yes use that to display the changelog and if not fall back to display the changelog of the metadata.
trouble with that is, you would have to *install* a package first to be able to look at its changelog :(
Well, Debian has "apt-get changelog", which downloads the changelog on demand. We could use something similar.
On Wed, 19 Sep 2018, Michael Ströder wrote:
On 9/19/18 9:46 PM, Stefan Seyfried wrote:
On 19.09.18 at 10:59, Bernhard M. Wiedemann wrote:
nay, that is only about payload, but most drpms consist of >90% metadata (so not payload) and that is why they are so compressible. And even normal rpms have as much metadata (just the ratio is different).
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
The "right" N is probably hard to determine.
Well, why does the "d"rpm not just contain the difference? Richard. -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
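For the changelog part alone, the "difference" is easy to describe: new entries are only ever added at the top, so the delta between two builds is everything in the new changelog above the old changelog's newest entry header. A minimal sketch under that assumption (the helper name is hypothetical; a real deltarpm also has to cover the rest of the header and the payload, which this does not attempt):

```shell
#!/bin/sh
# changelog_delta OLD NEW
# Print the part of NEW that precedes OLD's newest entry header ("* ..." line),
# i.e. the entries that were added since OLD was built.
changelog_delta() {
    stop=$(awk '/^\* / { print; exit }' "$1")
    if [ -z "$stop" ]; then
        # OLD has no entries, so everything in NEW is new
        cat "$2"
        return
    fi
    # pass the header via the environment so awk does not interpret backslashes
    stop="$stop" awk '$0 == ENVIRON["stop"] { exit } { print }' "$2"
}
```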
On Wed, Sep 19, 2018 at 09:46:16PM +0200, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
Most of the metadata is probably changelog, isn't it?
We already do that, see /usr/lib/rpm/macros:

# maxnum,cuttime,minnum
# 2009/03/01 (SLES11 GA)
%_binarychangelogtrim 0,1235862000,10

We might want to advance the cutoff time to SLE12 GA, though ;)
(The src rpms always contain the full changelog.)

Cheers, Michael.
-- 
Michael Schroeder mls@suse.de
SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
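The cutoff 1235862000 is 2009-03-01 00:00 CET (2009-02-28 23:00 UTC), matching the comment. The sketch below shows one possible reading of the `maxnum,cuttime,minnum` triple: never keep more than maxnum entries (0 meaning no limit), drop entries older than cuttime, but always keep at least the minnum newest ones. That reading is an assumption based on the macro comment only; the actual logic lives in SUSE's rpm patch and may differ. Input records are assumed to be `EPOCH text`, newest first, and the helper name is hypothetical.

```shell
#!/bin/sh
# trim_entries MAXNUM CUTTIME MINNUM < "EPOCH text" records, newest first
# Hypothetical reading of "%_binarychangelogtrim maxnum,cuttime,minnum":
#  - never keep more than MAXNUM entries (0 = no limit),
#  - drop entries older than CUTTIME (seconds since the epoch),
#  - but always keep at least the MINNUM newest entries.
trim_entries() {
    awk -v max="$1" -v cut="$2" -v min="$3" '
        {
            n++
            if (max > 0 && n > max) exit
            # input is newest first, so once an entry is too old
            # (and the minimum is satisfied) all later ones are too
            if (n <= min || $1 >= cut) print
            else exit
        }'
}
```

Under this reading, a package whose newest entry predates the cutoff still keeps its minnum newest entries (10 with the current setting) instead of ending up with an empty changelog.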
On 20.09.18 at 11:23, Michael Schroeder wrote:
# maxnum,cuttime,minnum
# 2009/03/01 (SLES11 GA)
%_binarychangelogtrim 0,1235862000,10
We might want to advance the cutoff time to SLE12 GA, though ;)
For SLES12, yes. For Factory, the time of $(snapshot - 1) would be even more space-saving ;-)
-- 
Stefan Seyfried
On Thu, Sep 20, 2018 at 7:25 AM Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
On 20.09.18 at 11:23, Michael Schroeder wrote:
# maxnum,cuttime,minnum
# 2009/03/01 (SLES11 GA)
%_binarychangelogtrim 0,1235862000,10
We might want to advance the cutoff time to SLE12 GA, though ;)
For SLES12, yes. For Factory, the time of $(snapshot - 1) would be even more space-saving ;-)
It may make sense to just trim at ~4 years, so that the rolling window stays reasonably relevant w.r.t. shared SLE/Leap and TW package maintenance. That way we can drop the %_binarychangelogtrim patchset and not have to deal with forgetting to roll the changelog trimming forward.
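A rolling window could also be approximated while keeping the existing macro, by regenerating its cutoff when the build configuration is updated. This is a minimal sketch, assuming the current `maxnum,cuttime,minnum` format with maxnum 0 and minnum 10 is kept; the helper name is hypothetical and a year is approximated as 365 days.

```shell
#!/bin/sh
# rolling_trim_macro YEARS [NOW]
# Print a %_binarychangelogtrim line whose cuttime lies YEARS years
# (365 days each) before NOW (default: the current time).
rolling_trim_macro() {
    now=${2:-$(date +%s)}
    echo "%_binarychangelogtrim 0,$(( now - $1 * 365 * 86400 )),10"
}
```

For example, `rolling_trim_macro 4` prints a macro line with a cutoff about four years before the moment it is run.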
On Thu, Sep 20, 2018 at 08:05:18AM -0400, Neal Gompa wrote:
That way we can drop the %_binarychangelogtrim patchset and not have to deal with forgetting to roll the changelog trimming forward.
No, we don't want to trim source rpms. And rolling only makes sense for Factory; we can't do that for SLE. Cheers, Michael. -- Michael Schroeder mls@suse.de
On Thu, Sep 20, 2018 at 8:07 AM Michael Schroeder <mls@suse.de> wrote:
On Thu, Sep 20, 2018 at 08:05:18AM -0400, Neal Gompa wrote:
That way we can drop the %_binarychangelogtrim patchset and not have to deal with forgetting to roll the changelog trimming forward.
No, we don't want to trim source rpms. And rolling only makes sense for factory, we can't do that for sle.
But the full transformed changelog is in the SRPM spec file...? It's not like the binary packages where there would be no way to get the full changelog anyway... And why couldn't we do it for SLE going forward? -- 真実はいつも一つ!/ Always, there's only one truth!
On 2018-09-20T08:12:36, Neal Gompa <ngompa13@gmail.com> wrote: Given how useless most changelog entries are ("updated to version X", as if I can't tell that from the package version string), trimming them seems sensible.
On 9/20/18 2:29 PM, Lars Marowsky-Bree wrote:
Given how useless most changelog entries are ("updated to version X", as if I can't tell that from the package version string), trimming them seems sensible.
I'd never trust an RPM changelog to contain all relevant upstream changes, because maintainers are free to leave out anything they like. So to me the above is one of the most reasonable changelog entries, because it tells me to consult the *original upstream changelog* when having issues possibly caused by an upstream regression. Ciao, Michael.
On 20/09/2018 20:55, Stefan Seyfried wrote:
Am 20.09.18 um 11:23 schrieb Michael Schroeder:
# maxnum,cuttime,minnum
# 2009/03/01 (SLES11 GA)
%_binarychangelogtrim 0,1235862000,10
We might want to advance the cutoff time to SLE12 GA, though ;)
For SLES12, yes. For Factory, the time of $(snapshot - 1) would be even more space-saving ;-)
This is possibly a question better answered by SLE release managers, as it impacts them far more than openSUSE. Given that SLE 11-SP4, which inherits a bunch of packages from older SLE releases, is still supported, we might want to wait until it goes end of life before bumping the cutoff. Although you could make a case that it's OK to do it in Tumbleweed now, because while SLE-11 -> SLE-15 is a valid and supported upgrade path, SLE-11 -> SLE-16 won't be (but SLE-12 -> SLE-16 will be). -- Simon Lees (Simotek) http://simotek.net Emergency Update Team keybase.io/simotek SUSE Linux Adelaide Australia, UTC+10:30 GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
On 20/09/2018 11:23, Michael Schroeder wrote:
On Wed, Sep 19, 2018 at 09:46:16PM +0200, Stefan Seyfried wrote:
How about "let's only store max. N lines of changelog in rpm metadata" (and just put the rest into the package in /usr/share/doc/packages/%name/old-rpm-changelog.txt)
Most of the metadata is probably changelog, isn't it?
We already do that, see /usr/lib/rpm/macros:
# maxnum,cuttime,minnum
# 2009/03/01 (SLES11 GA)
%_binarychangelogtrim 0,1235862000,10
We might want to advance the cutoff time to SLE12 GA, though ;)
(The src rpms always contain the full changelog)
It would make things easier if an SRPM contained the .changes file, but unfortunately it only has the normal uncompressed CHANGES. Dave
Cheers, Michael.
On 2018-09-19 21:46, Stefan Seyfried wrote:
Most of the metadata is probably changelog, isn't it?
Maybe it is not. E.g. looking at http://download.opensuse.org/distribution/leap/15.0/repo/oss/noarch/zramcfg-...
rpm -qp --changelog zramcfg-0.2-lp150.1.5.noarch.rpm |wc
12 54 346
strings zramcfg-0.2-lp150.1.5.noarch.rpm |head -n 186|wc
186 564 5792
So here the changelog is only 346 of roughly 5792 bytes of header strings. The rest is headers (license, description), %post scripts, provides, requires, many "root" strings and sha256sums.
For the next file the ratio is better, 1:2:
rpm -qp --changelog zvbi-lang-0.2.35-lp150.2.5.noarch.rpm|wc
55 240 1665
strings zvbi-lang-0.2.35-lp150.2.5.noarch.rpm|head -n 136|wc
136 303 3478
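Instead of approximating the metadata with `strings | head`, the sizes can be read straight from the file. A minimal shell sketch, assuming the usual rpm v3/v4 layout: a 96-byte lead, a signature header padded to a multiple of 8 bytes, the main header with all the tags, then the compressed payload. Each header is assumed to start with the magic bytes 8e ad e8 01, 4 reserved bytes, and two big-endian 32-bit counts (index entries, data bytes). `rpm_layout`, `hdrsize` and `be32` are hypothetical helper names:

```sh
#!/bin/sh
# Sketch: report how an rpm file splits into lead, signature header,
# main header (uncompressed metadata) and compressed payload.

# Print the big-endian unsigned 32-bit integer at byte offset $2 of file $1.
be32() {
    od -An -tu1 -j "$2" -N 4 "$1" | awk '{ print (($1 * 256 + $2) * 256 + $3) * 256 + $4 }'
}

# Print the total size of the header structure (magic, counts, index, data)
# starting at byte offset $2 of file $1; fail if the magic is missing.
hdrsize() {
    magic=$(od -An -tx1 -j "$2" -N 4 "$1" | tr -d ' \n')
    [ "$magic" = 8eade801 ] || { echo "no rpm header magic at offset $2" >&2; return 1; }
    nindex=$(be32 "$1" $(($2 + 8)))    # number of 16-byte index entries
    dsize=$(be32 "$1" $(($2 + 12)))    # size of the data store in bytes
    echo $((16 + 16 * nindex + dsize))
}

rpm_layout() {
    sig=$(hdrsize "$1" 96) || return 1        # signature header follows the 96-byte lead
    main=$((96 + sig + (8 - sig % 8) % 8))    # signature header is padded to 8 bytes
    hdr=$(hdrsize "$1" "$main") || return 1
    total=$(wc -c < "$1" | tr -d ' ')
    echo "lead:96 signature:$((main - 96)) header:$hdr payload:$((total - main - hdr))"
}
```

Running `rpm_layout zramcfg-0.2-lp150.1.5.noarch.rpm` prints the four sizes on one line. Only the payload part is compressed by rpm; lead, signature and header are what an external xz/gzip pass over the whole file can still shrink.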
On Wed, Sep 19, 2018 at 4:53 AM Jan Engelhardt <jengelh@inai.de> wrote:
On Wednesday 2018-09-19 10:47, Bernhard M. Wiedemann wrote:
Hi,
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433
The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too), compressing nearly as well as xz, but in about the time gzip takes.
Updating %_binary_payload in prjconf or rpm-config-SUSE should do the job, at least for the plain BRPMs. %_source_payload for SRPMS. Not sure about drpms - AFAIU, they count as BRPMs.
So I looked into this in the context of doing this in Fedora. While it is true that it's faster and uses a lot less memory, the on-disk usage goes up by ~5-15% (zstd 9 vs xz 2), depending on the package. So for the moment, I've shelved the idea of zstd RPMs until I can get a better picture on how to get better compression without it taking longer than xz. -- 真実はいつも一つ!/ Always, there's only one truth!
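To check which payload compressor and level a package actually uses before and after changing `%_binary_payload`, rpm's query format can print the PAYLOADCOMPRESSOR and PAYLOADFLAGS header tags (the latter normally carries the compression level). A minimal shell sketch; `payload_info` and `build_with_payload` are hypothetical helper names, and a zstd value such as `w19.zstdio` only works with an rpm built with zstd support:

```sh
#!/bin/sh
# Hypothetical helper: print the payload compressor and flags (normally the
# level) recorded in the header of each given package.
payload_info() {
    for f in "$@"; do
        rpm -qp --qf '%{NAME}: %{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n' "$f" || return 1
    done
}

# Hypothetical helper: build_with_payload SPEC PAYLOAD builds binary rpms from
# SPEC with an overridden %_binary_payload, e.g. w2.xzdio (xz level 2) or
# w9.zstdio (zstd level 9), matching the comparison in this thread.
build_with_payload() {
    rpmbuild -bb --define "_binary_payload $2" "$1"
}
```

Building the same spec once with `w2.xzdio` and once with `w9.zstdio`, then comparing file sizes and `payload_info` output, reproduces the kind of per-package comparison described above.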
On Wed, 19 Sep 2018, Neal Gompa wrote:
On Wed, Sep 19, 2018 at 4:53 AM Jan Engelhardt <jengelh@inai.de> wrote:
On Wednesday 2018-09-19 10:47, Bernhard M. Wiedemann wrote:
Hi,
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433
The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too), compressing nearly as well as xz, but in about the time gzip takes.
Updating %_binary_payload in prjconf or rpm-config-SUSE should do the job, at least for the plain BRPMs. %_source_payload for SRPMS. Not sure about drpms - AFAIU, they count as BRPMs.
So I looked into this in the context of doing this in Fedora. While it is true that it's faster and uses a lot less memory, the on-disk usage goes up by ~5-15% (zstd 9 vs xz 2), depending on the package. So for the moment, I've shelved the idea of zstd RPMs until I can get a better picture on how to get better compression without it taking longer than xz.
It doesn't compress metadata, though. I'm not sure whether rpm can compress things like %changelog data by now. Richard. -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Wed, Sep 19, 2018 at 6:51 AM Richard Biener <rguenther@suse.de> wrote:
On Wed, 19 Sep 2018, Neal Gompa wrote:
On Wed, Sep 19, 2018 at 4:53 AM Jan Engelhardt <jengelh@inai.de> wrote:
On Wednesday 2018-09-19 10:47, Bernhard M. Wiedemann wrote:
Hi,
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433
The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too), compressing nearly as well as xz, but in about the time gzip takes.
Updating %_binary_payload in prjconf or rpm-config-SUSE should do the job, at least for the plain BRPMs. %_source_payload for SRPMS. Not sure about drpms - AFAIU, they count as BRPMs.
So I looked into this in the context of doing this in Fedora. While it is true that it's faster and uses a lot less memory, the on-disk usage goes up by ~5-15% (zstd 9 vs xz 2), depending on the package. So for the moment, I've shelved the idea of zstd RPMs until I can get a better picture on how to get better compression without it taking longer than xz.
It doesn't compress metadata, though. I'm not sure whether rpm can compress things like %changelog data by now.
Nope. Headers still can't be compressed. -- 真実はいつも一つ!/ Always, there's only one truth!
On Wednesday 2018-09-19 12:35, Neal Gompa wrote:
I had noticed that our rpm files and especially .drpm files can get a lot smaller by compressing them with xz or gzip.
Prior report: http://bugzilla.novell.com/show_bug.cgi?id=557433
The current stance is that *zstd* is the new kid on the block (and rpm supports it now, too), compressing nearly as well as xz, but in about the time gzip takes.
[Fedora] the on-disk usage goes up by ~5-15% (zstd 9 vs xz 2)
The graph at https://raw.githubusercontent.com/facebook/zstd/master/doc/images/DCspeed5.p... is based on a linguistic corpus. It would be cool to see it for the set of BRPMs of a distribution. Then again, there is no good trivial command yet (like "gzip -d a.gz; gzip -9 a") to do such a test. :-/
, depending on the package. So for the moment, I've shelved the idea of zstd RPMs until I can get a better picture on how to get better compression without it taking longer than xz.
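For the payload at least, a rough per-package test is possible with existing tools: take the uncompressed cpio archive that rpm2cpio extracts and re-compress it with each candidate. A minimal shell sketch, assuming rpm2cpio, gzip, xz and zstd are installed; `payload_sizes` is a hypothetical helper, the command-line levels only approximate rpm's `w2.xzdio`/`w9.zstdio` payload settings, and headers are left out since rpm does not compress them:

```sh
#!/bin/sh
# Hypothetical helper: print the size of an rpm's uncompressed cpio payload and
# how large it gets with several compressors (levels as compared in the thread).
payload_sizes() {
    cpio=$(mktemp) || return 1
    # rpm2cpio writes the payload to stdout as an uncompressed cpio archive
    if ! rpm2cpio "$1" > "$cpio"; then rm -f "$cpio"; return 1; fi
    printf '%-8s %s\n' raw "$(wc -c < "$cpio" | tr -d ' ')"
    for c in 'gzip -9' 'xz -2' 'zstd -9' 'zstd -19'; do
        # $c is left unquoted on purpose so it splits into command and level
        printf '%-8s %s\n' "$c" "$($c -c < "$cpio" | wc -c | tr -d ' ')"
    done
    rm -f "$cpio"
}
```

Running `payload_sizes some.rpm` prints one line per variant; looping it over all BRPMs of a repository and summing the columns would give the distribution-wide picture, at the cost of decompressing every payload once.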
On Wed, Sep 19, 2018 at 10:47:38AM +0200, Bernhard M. Wiedemann wrote:
Hi,
Now I was wondering: 1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files. Maybe not all of them need a patch - e.g. if libzypp uncompresses files on the user side before passing them on to further processing, then libsolv and librpm don't need any change.
I think it will be difficult to fix all the tools in the infrastructure, SMT, SUSE Manager, and third party tools. Especially the ones not under our control will be unfixable. Ciao, Marcus
On 2018-09-19 10:56, Marcus Meissner wrote:
On Wed, Sep 19, 2018 at 10:47:38AM +0200, Bernhard M. Wiedemann wrote:
Now I was wondering: 1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files. Maybe not all of them need a patch - e.g. if libzypp uncompresses files on the user side before passing them on to further processing, then libsolv and librpm don't need any change.
I think it will be difficult to fix all the tools in the infrastructure, SMT, SUSE Manager, and third party tools. Especially the ones not under our control will be unfixable.
It depends... https://github.com/SUSE/rmt/search?q=rpm&unscoped_q=rpm https://github.com/SUSE/smt/search?q=rpm&unscoped_q=rpm show that RMT is mostly interested in repo metadata, and SMT is legacy-only (SLE-12) anyway. Also, most tools will just use librpm or call rpm directly instead of implementing their own rpm parser.
With https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#precompressed or https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#inflate this can even save disk-space by storing only .drpm.gz files on the server but delivering to clients whatever is requested. But then again, synchronizing such a mirror with rsync is harder
gzip compression during transfer via apache's mod_deflate (can be limited to the .drpm extension)
(1) How about offering two mirror sources? Mirrors can then choose which one to mirror.
(2) How many mirrors cannot easily activate mod_deflate precompressed (or equivalent)? Would it be OK to lose mirrors that don't support it?
(3) If there are too many applications that download the drpms to modify them all, how about running an HTTP proxy server on localhost that transparently edits the download URL and decompresses the payload, and running zypper and YaST with $http_proxy set accordingly? (This doesn't work with https, but when I last checked my proxy logs (~3 years ago), it was all plain http.)
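Question (2) can at least be probed from the client side: request the same file with and without `Accept-Encoding: gzip` and compare the bytes actually transferred. A minimal shell sketch; `transfer_sizes` is a hypothetical helper, and equal sizes simply mean that the mirror does not compress that response on the fly:

```sh
#!/bin/sh
# Hypothetical helper: compare body bytes transferred for URL with and without
# asking the server for gzip content encoding (e.g. via mod_deflate).
transfer_sizes() {
    # -w '%{size_download}' prints the body bytes received; without
    # --compressed, curl does not decode gzip, so this is the size on the wire
    plain=$(curl -s -L -o /dev/null -w '%{size_download}' "$1") || return 1
    gz=$(curl -s -L -o /dev/null -w '%{size_download}' -H 'Accept-Encoding: gzip' "$1") || return 1
    echo "plain:$plain gzip:$gz saved:$((plain - gz))"
}
```

For example, `transfer_sizes http://aw.zq1.de/rpm/kernel-docs-html.drpm` checks the test server used earlier in the thread; pointing it at a .drpm on each mirror would give a rough count of mirrors that already compress on the fly.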
On Wed, Sep 19, 2018 at 10:47:38AM +0200, Bernhard M. Wiedemann wrote:
[...] Now I was wondering: 1) how much effort would it be to patch librpm, libsolv, libzypp, createrepo, OBS and other tools to support .rpm.xz files. Maybe not all of them need a patch - e.g. if libzypp uncompresses files on the user side before passing them on to further processing, then libsolv and librpm don't need any change.
If this is just about making drpms smaller it would be much easier to simply add a new drpm format that uses a compressed header. Cheers, Michael. -- Michael Schroeder mls@suse.de SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
participants (18)
- Andrei Borzenkov
- Bernhard M. Wiedemann
- Carlos E. R.
- Dave Plater
- Jan Engelhardt
- Jimmy Berry
- Joachim Wagner
- Johannes Meixner
- Lars Marowsky-Bree
- Luca Beltrame
- Manfred Hollstein
- Marcus Meissner
- Michael Schroeder
- Michael Ströder
- Neal Gompa
- Richard Biener
- Simon Lees
- Stefan Seyfried