[opensuse-factory] RPM compression in openSUSE for 201906
This is this month's compression shootout.

News:
* Fedora announced its intent to switch RPMs to Zstd compression; the timings reported in the proposal https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression suggest to me that the measurements were not done with due statistical care regarding CPU thermal throttling.
* Earlier this year, Ubuntu switched kernel and initramfs to lz4 https://lists.ubuntu.com/archives/ubuntu-devel/2018-March/040258.html
* In both instances, decompression time is the hot new topic (something the previous openSUSE report, https://lists.opensuse.org/opensuse-factory/2019-05/msg00344.html, did not analyze).

Change in methods:
* I cut the test set to just x86_64 rpms (dropping *.noarch.rpm) to reduce runtime. The proportion of executables versus data should be higher now.
* Ran xz only at -1..-5; we all know its progression characteristics (-6..-9) from older tests.
* Added brotli, all levels.

New results:
* http://paste.opensuse.org/view/97377621 - compression
* http://paste.opensuse.org/view/80494192 - decompression
* http://inai.de/files/openSUSE-compression-201906.ods - details

Interpretation:
* The ups and downs in the decompression times are read as jitter; the runs often complete in less than a minute.
* For a particular algorithm, decompression speed is unaffected by the chosen compression level. This means the choice of level is influenced only by the desired compression-time characteristics.
* zstd decompression is decidedly faster than xz (~4.3x). Fedora is making sensible choices.
* lz4 decompression is 15% slower than zstd. Ubuntu is making suboptimal choices again.
* Brotli: only marginally better than zstd at low-to-medium compression levels, and as weak as lz4 in decompression. That is probably the reason it was not worth supporting in rpm.
* Fedora's plan to replace xz-2 with zstd-18 will at least double their compression times.
* Replacing openSUSE's xz-5 with zstd-18 gives the decompression benefit, no improvement in compression time and a slight space increase.
* xz-5 to zstd-10 would also give a 7x compression-time saving, at the cost of a somewhat larger space increase (23.6->27.9G).

So with that, I propose switching openSUSE to zstd-10, which would give people back the time they spend waiting for the computer (xkcd.com/303).

The problem is that even though we have had a suitable version of rpm since Dec 2017, the maintainer did not enable zstd, so the "compatibility time" accumulated since then was unfortunately wasted.
-- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
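The thermal-throttling remark above concerns measurement method. A minimal sketch of one common mitigation, assuming nothing about the tooling actually used for this report: run every candidate several times, interleaved round-robin rather than one algorithm after another, and report the median. Only standard-library compressors are used (xz via `lzma`, `bz2`, and deflate via `zlib`); zstd, lz4 and brotli would need third-party bindings. The names `COMPRESSORS` and `median_times` are hypothetical.

```python
import bz2
import lzma
import statistics
import time
import zlib
from typing import Callable, Dict, List

# Standard-library stand-ins only; zstd, lz4 and brotli need third-party
# bindings and are deliberately left out of this sketch.
COMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "xz-5": lambda data: lzma.compress(data, preset=5),
    "bzip2-9": lambda data: bz2.compress(data, compresslevel=9),
    "deflate-6": lambda data: zlib.compress(data, 6),
}


def median_times(data: bytes, rounds: int = 5) -> Dict[str, float]:
    """Median wall-clock seconds per compressor over `rounds` interleaved runs.

    Running the candidates round-robin (A, B, C, A, B, C, ...) instead of
    back-to-back (A, A, A, B, B, B, ...) spreads slow clock drift such as
    thermal throttling across all of them; the median then drops outliers.
    """
    samples: Dict[str, List[float]] = {name: [] for name in COMPRESSORS}
    for _ in range(rounds):
        for name, compress in COMPRESSORS.items():
            start = time.perf_counter()
            compress(data)
            samples[name].append(time.perf_counter() - start)
    return {name: statistics.median(times) for name, times in samples.items()}


if __name__ == "__main__":
    corpus = bytes(range(256)) * 4096  # 1 MiB of synthetic, highly repetitive data
    for name, seconds in median_times(corpus).items():
        print(f"{name:10s} {seconds * 1000:8.2f} ms")
```

Interleaving reduces, but does not eliminate, throttling bias; for numbers comparable to the report you would feed it real rpm payloads and use more rounds.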
On Fri, Jun 07, 2019 at 04:58:36PM +0200, Jan Engelhardt wrote:
* Fedora announced intent to switch rpms to Zstd compression; the reported timings on the proposal paper https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression suggest to me that the measurement was not done with due statistical care with regard to CPU thermal throttling characteristics.
That's a proposal; nothing is decided yet. See the mail thread on the fedora-devel list: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
[...] The problem is that even though we have a suitable version of rpm since Dec 2017, the maintainer did not enable zstd so the "compatibility time" accumulated since was unfortunately wasted.
It makes zero sense to enable features and add library dependencies when there is no plan to use them. And zstd isn't even in SLES (but fortunately it is in Leap). Just sayin'... Cheers, Michael. -- Michael Schroeder mls@suse.de SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
On Friday 2019-06-07 17:13, Michael Schroeder wrote:
On Fri, Jun 07, 2019 at 04:58:36PM +0200, Jan Engelhardt wrote:
* Fedora announced intent to switch rpms to Zstd compression; the reported timings on the proposal paper https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression suggest to me that the measurement was not done with due statistical care with regard to CPU thermal throttling characteristics.
That's a proposal, nothing is decided yet. See the mail thread at the fedora-devel list:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
[...] The problem is that even though we have a suitable version of rpm since Dec 2017, the maintainer did not enable zstd so the "compatibility time" accumulated since was unfortunately wasted.
It makes zero sense to enable features and add library dependencies when there is no plan to use them.
Oh, there is absolutely a plan to use them in some capacity; this thread and the two submit requests for Base:System/rpm that have meanwhile gone unanswered should be strong indicators. Fedora 30 can produce Zstd RPMs, and openSUSE cannot read them, which is a... suboptimal situation to boot.
On Sun, Jun 09, 2019 at 01:33:35PM +0200, Jan Engelhardt wrote:
On Friday 2019-06-07 17:13, Michael Schroeder wrote:
It makes zero sense to enable features and add library dependencies when there is no plan to use them.
Oh there is absolutely a plan to use them in some capacity - this thread and the meanwhile two submit requests for Base:System/rpm that get no response should be strong indicators.
You were talking about Dec 2017. Cheers, Michael.
On Tuesday 2019-06-11 11:45, Michael Schroeder wrote:
On Sun, Jun 09, 2019 at 01:33:35PM +0200, Jan Engelhardt wrote:
On Friday 2019-06-07 17:13, Michael Schroeder wrote:
It makes zero sense to enable features and add library dependencies when there is no plan to use them.
Oh there is absolutely a plan to use them in some capacity - this thread and the meanwhile two submit requests for Base:System/rpm that get no response should be strong indicators.
You were talking about Dec 2017.
Where?
706643 State:new By:Pharaoh_Atem When:2019-05-31T10:34:29
708407 State:new By:jengelh When:2019-06-07T15:38:34
On Tue, Jun 11, 2019 at 12:06:32PM +0200, Jan Engelhardt wrote:
On Tuesday 2019-06-11 11:45, Michael Schroeder wrote:
On Sun, Jun 09, 2019 at 01:33:35PM +0200, Jan Engelhardt wrote:
On Friday 2019-06-07 17:13, Michael Schroeder wrote:
It makes zero sense to enable features and add library dependencies when there is no plan to use them.
Oh there is absolutely a plan to use them in some capacity - this thread and the meanwhile two submit requests for Base:System/rpm that get no response should be strong indicators.
You were talking about Dec 2017.
Where?
In your first mail: "The problem is that even though we have a suitable version of rpm since Dec 2017, the maintainer did not enable zstd so the "compatibility time" accumulated since was unfortunately wasted." Cheers, Michael.
[I keep an experimental repo for Cygwin with ZStd compression, with the intent of eventually getting the official repo switched over. My comments only apply to ZStd in comparison to Xz.]
Jan Engelhardt writes:
* The ups and downs in the decompression times are read as jitter; they often complete in less than a minute.
* For a particular algorithm, decompression speed is unaffected by chosen compression level. This means the choice of level is only influenced by the desired compression-time characteristics.
In my experience there is a slight degradation in decompression speed at the highest compression levels when using large files that compress well (it depends a bit on how well the dictionary turns out), but the overall wall time still improves (more so if you count the download time). The whole point of going to ZStd for me was to be able to decompress to local disk as fast as a filer on the LAN delivers the data. As long as the compression ratio is better than 1:5, that works out as intended, and I've been pulling data at around 600 MBit/s peak from the repo on current hardware with NVMe as the local disk (i.e. I can write to the local file system at around 500 MiB/s peak; the average is more like 30 MiB/s, of course, due to the file system overhead for small files).
* zstd decompression is decidedly faster than xz (~4.3x). Fedora is making sensible choices.
It's also faster than any of the alternatives if you target a compression ratio similar to Xz. As said above, the repo size should not be neglected, as most users will have to download over a connection that is slow relative to the local decompression-to-disk speed.
* Replacing openSUSE's xz-5 with zstd-18 gives the decompression benefit, no improvement in compression time and a slight space increase.
I can confirm that this pair is roughly comparable in compression speed across a representative range of processors. I specifically timed an Ivy Bridge Celeron as a sensible low end for that task. I have no data for Atoms, but I don't expect them to fall too far out of line relative to the general performance ratio.
* xz-5 to zstd-10 would also give a 7x compression-time saving, at the cost of a somewhat larger space increase (23.6->27.9G).
Currently I compress with zstd --ultra -22 for about 3% more disk space in the repo vs. the Xz-compressed packages. This comes back to bite me for the really large packages, which then take _much_ longer to compress than with Xz, but these are not common enough for me to implement a heuristic for dialing back to -18 for them. This setting does consume noticeably more memory, which may be a concern in certain setups. Regards, Achim. -- +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+ SD adaptations for KORG EX-800 and Poly-800MkII V0.9: http://Synth.Stromeko.net/Downloads.html#KorgSDada
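The size-based fallback from `--ultra -22` to `-18` mentioned above was never implemented; a minimal sketch of what it could look like follows. The 256 MiB threshold and the function names are hypothetical assumptions, not measured values; the only real detail relied on is that the zstd CLI requires `--ultra` for levels above 19.

```python
import os
from typing import List

# Hypothetical cut-off: payloads above this size get the cheaper level.
# 256 MiB is an illustrative assumption, not a measured optimum.
LARGE_PACKAGE_BYTES = 256 * 1024 * 1024


def zstd_level_args(payload_size: int, threshold: int = LARGE_PACKAGE_BYTES) -> List[str]:
    """Return zstd CLI level arguments for a payload of the given size.

    Small and medium payloads use --ultra -22 (best ratio, high time and
    memory cost); large payloads dial back to -18, which needs no --ultra.
    """
    if payload_size > threshold:
        return ["-18"]
    return ["--ultra", "-22"]


def zstd_command(path: str) -> List[str]:
    """Build (but do not run) a zstd command line for the file at `path`."""
    return ["zstd", *zstd_level_args(os.path.getsize(path)), path]
```

The resulting list can be handed to `subprocess.run`; keeping the decision in one function makes it easy to tune the threshold later.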
On Friday 2019-06-07 21:14, Achim Gratz wrote:
In my experience there is a slight degradation in decompression speed for the highest compression levels
Certainly. As there is only so much CPU cache, one could expect more evictions and reloads. So I'll reword: the algorithms still seem computationally expensive enough that memory timing effects from the increased decompression requirements don't visibly show up yet, at least for me.
The whole point of going to ZStd for me was to be able to decompress to local disk as fast as a filer on the LAN will deliver the data. As long as the compression ratio is better than 1:5 [5.00×?]
xz-9 does 3.5× on openSUSE packages on average; getting 5.0× requires data with more repetitions/patterns, e.g. text files/source code.
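The "decompress as fast as the filer delivers" argument is simple arithmetic: if compressed data arrives at a given link rate and each compressed byte expands to `ratio` bytes, the decompressor and local disk must sustain link rate times ratio. A minimal sketch using the 600 Mbit/s figure and the 5× and 3.5× ratios quoted in this thread; `required_output_rate` is a hypothetical helper that ignores protocol overhead and the small-file filesystem cost mentioned earlier.

```python
def required_output_rate(link_mbit_per_s: float, ratio: float) -> float:
    """Decompressed bytes per second needed to keep up with a network link.

    `link_mbit_per_s` is in decimal megabits; `ratio` is uncompressed size
    divided by compressed size (e.g. 5.0 for 1:5).
    """
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return link_bytes_per_s * ratio


MIB = 1024 * 1024

for ratio in (3.5, 5.0):
    rate = required_output_rate(600, ratio)
    print(f"600 Mbit/s at {ratio}x -> {rate / MIB:.0f} MiB/s to write")
```

With these inputs it works out to roughly 250 MiB/s at 3.5× and 358 MiB/s at 5×, both below the ~500 MiB/s peak local write rate reported above but far above the ~30 MiB/s small-file average.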
the repo size should not be neglected as most users will have to download from a relatively slow connection
Yes, I know that Germany especially is notorious for its unwillingness to procure contemporary Internet speeds, and then some, but without actual statistics, claims such as that seem more like folklore/anecdote. As long as there are Steam games whose updates are at least as big as a first-time openSUSE installation, I feel no regrets :-)
On Fri, 7 Jun 2019 16:58:36 +0200 (CEST), Jan Engelhardt wrote:
This is this month's compression shootout. ... Interpretation:
* The ups and downs in the decompression times are read as jitter; they often complete in less than a minute.
* For a particular algorithm, decompression speed is unaffected by chosen compression level. This means the choice of level is only influenced by the desired compression-time characteristics.
* zstd decompression is decidedly faster than xz (~4.3x). Fedora is making sensible choices.
* lz4 decompression is 15% slower than zstd. Ubuntu is making suboptimal choices again.
* Brotli: only marginally better than zstd at low-to-medium compression levels, and as weak as lz4 in decompression. That is probably the reason it was not worth supporting in rpm.
* Fedora's plan to replace xz-2 with zstd-18 will at least double their compression times.
* Replacing openSUSE's xz-5 with zstd-18 gives the decompression benefit, no improvement in compression time and a slight space increase.
* xz-5 to zstd-10 would also give a 7x compression-time saving, at the cost of a somewhat larger space increase (23.6->27.9G).
Nice summary for speed. How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html Thanks Michal
On Saturday 2019-06-08 21:48, Michal Suchánek wrote:
How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html
I'll let this response speak for me: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
On Sat, 8 Jun 2019 22:08:11 +0200 (CEST), Jan Engelhardt wrote:
On Saturday 2019-06-08 21:48, Michal Suchánek wrote:
How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html
I'll let this response speak for me: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
It says 'most of the critique has been debunked' but without any explanation whatsoever, whereas the article, while bashing one particular format, gives actual technical reasons why it is considered flawed. Thanks Michal
Michal Suchánek writes:
How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html
I haven't gone over the details, but the format is described here: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md This seems to address most of the criticism in the article you cited, although I'd prefer a more neutral checklist that doesn't grind an axe against one particular implementation. Regards, Achim. -- +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+ Factory and User Sound Singles for Waldorf Q+, Q and microQ: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
On Sun, 09 Jun 2019 08:48:43 +0200, Achim Gratz wrote:
Michal Suchánek writes:
How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html
I haven't gone over the details, but the format is described here:
https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
This seems to address most of the criticism the article you cited, although I'd prefer to have a more neutral checklist that doesn't grind an axe against one particular implementation.
The article mainly criticizes:
- the variable size of everything, which prevents you from making sense of a stream with a bitflip in one of the headers that determine the sizes. This makes recovering good data from a slightly corrupted file very difficult.
- nonsensical checksums that don't really add to data integrity.
The situation with zstd is not much better. The optional checksum is attached at the very end of a series of variable-everything blocks. At least it checksums the decompressed data, which gives some integrity check for RLE and raw blocks that don't benefit from the internal checks of the zstd algorithm itself. Unfortunately, you are required to write blocks as raw when you cannot compress them smaller. From the point of view of data integrity and recovery, the zstd format is not particularly awesome AFAICS. Thanks Michal
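To make the "bitflip in a header" concern concrete: per the zstd format document linked in this thread, each frame starts with the 4-byte magic number 0xFD2FB528 (little-endian) followed by a one-byte Frame_Header_Descriptor whose bits decide which optional fields follow, including the Content_Checksum_flag (bit 2) for the trailing checksum (the low 4 bytes of an XXH64 digest of the decompressed data). A minimal sketch that decodes only that byte; the names `FrameDescriptor` and `parse_frame_descriptor` are hypothetical and this is not a full zstd parser.

```python
import struct
from typing import NamedTuple

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian at the start of every frame


class FrameDescriptor(NamedTuple):
    fcs_flag: int           # bits 7-6: size code of the Frame_Content_Size field
    single_segment: bool    # bit 5
    reserved: bool          # bit 3: must be 0, decoders reject the frame otherwise
    content_checksum: bool  # bit 2: a 4-byte checksum follows the last block
    dict_id_flag: int       # bits 1-0: size code of the Dictionary_ID field


def parse_frame_descriptor(frame: bytes) -> FrameDescriptor:
    """Decode the Frame_Header_Descriptor byte of a zstd frame."""
    if len(frame) < 5:
        raise ValueError("too short for magic number + descriptor")
    (magic,) = struct.unpack_from("<I", frame, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError(f"not a zstd frame (magic {magic:#010x})")
    fhd = frame[4]
    return FrameDescriptor(
        fcs_flag=fhd >> 6,
        single_segment=bool(fhd & 0x20),
        reserved=bool(fhd & 0x08),
        content_checksum=bool(fhd & 0x04),
        dict_id_flag=fhd & 0x03,
    )
```

Flipping a single bit of the descriptor (for example 0x24 to 0x20) silently changes how the rest of the frame is laid out, here whether a decoder expects the trailing checksum at all, which is the kind of fragility the article describes.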
On Sun, 9 Jun 2019 15:59:20 +0200, Michal Suchánek wrote:
On Sun, 09 Jun 2019 08:48:43 +0200, Achim Gratz wrote:
Michal Suchánek writes:
How does zstd compare to xz in robustness? https://www.nongnu.org/lzip/xz_inadequate.html
I haven't gone over the details, but the format is described here:
https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
This seems to address most of the criticism the article you cited, although I'd prefer to have a more neutral checklist that doesn't grind an axe against one particular implementation.
The article mainly criticizes
- variable size of everything which prevents you making sense of stream with a bitflip in one of the headers that determine what size everything is. This makes recovering good data from slightly corrupted file very difficult. - nonsensical checksums that don't really add to data integrity. The situation with zstd is not much better. The optional checksum is attached at the very end of a series of variable-everything blocks. At least it checksums the decompressed data which gives some integrity check for RLE and raw blocks that don't benefit from internal checks of zstd algorithm itself. Unfortunately, you are required to write blocks as raw when you cannot compress them smaller.
From the point of view of data integrity and recovery the zstd format is not particularly awesome AFAICS.
That said, this is not particularly critical for use with rpm. The packages are protected by a strong cryptographic hash and signature anyway, so you should never even get a corrupted rpm package in hand. A compression format in general should either provide decent corruption protection or none at all; both formats are flawed in this respect. This might be somewhat more relevant for something like an initrd, which tends to be transferred over networks and stored on media of questionable quality that are accessed by experimental hardware drivers. In that case you might want corruption protection independent of the compression. Thanks Michal
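Corruption protection independent of the compression can be sketched as storing a digest of the uncompressed data next to the compressed payload and verifying it after decompression. This is a minimal illustration assuming Python's standard `lzma` (xz) module, since zstd bindings are not in the standard library; the container layout (a 32-byte SHA-256 prefix) and the `wrap`/`unwrap` names are hypothetical, not an existing initrd or rpm format. It detects corruption but, having no redundancy, cannot repair it.

```python
import hashlib
import lzma

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def wrap(payload: bytes) -> bytes:
    """Return SHA-256(payload) followed by the xz-compressed payload."""
    return hashlib.sha256(payload).digest() + lzma.compress(payload)


def unwrap(blob: bytes) -> bytes:
    """Decompress `blob` and verify it; raise ValueError on any corruption."""
    if len(blob) < DIGEST_LEN:
        raise ValueError("blob too short to contain a digest")
    digest, compressed = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    try:
        payload = lzma.decompress(compressed)
    except lzma.LZMAError as exc:
        # The xz container's own checks caught the damage first.
        raise ValueError(f"corrupt compressed stream: {exc}") from exc
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("digest mismatch after decompression")
    return payload
```

Because the digest covers the decompressed bytes, the check is the same regardless of which compressor sits in the middle, so swapping xz for zstd would not change the integrity guarantee.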
On Monday 2019-06-10 15:43, Michal Suchánek wrote:
From the point of view of data integrity and recovery the zstd format is not particularly awesome AFAICS.
That said, this is not particularly critical for use with rpm. The packages are protected by strong cryptographic hash and signature anyway so you should not even get a corrupted rpm package in hand. [...] This might be somewhat more relevant for something like initrd which tends to be transferred over networks
So just sign the initrd as well, either the kernel or a potent bootloader can check it :-)
On Mon, 10 Jun 2019 15:58:24 +0200 (CEST), Jan Engelhardt wrote:
On Monday 2019-06-10 15:43, Michal Suchánek wrote:
From the point of view of data integrity and recovery the zstd format is not particularly awesome AFAICS.
That said, this is not particularly critical for use with rpm. The packages are protected by strong cryptographic hash and signature anyway so you should not even get a corrupted rpm package in hand. [...] This might be somewhat more relevant for something like initrd which tends to be transferred over networks
So just sign the initrd as well, either the kernel or a potent bootloader can check it :-)
And that's the thing: for Secure Boot, the bootloader verifies it, not the kernel, and that's not supported on all platforms. When the kernel gets to reading the initrd, it has no idea whether the bootloader really verified it or not. Thanks Michal
On 08/06/2019 00:28, Jan Engelhardt wrote:
This is this month's compression shootout.
* Fedora's plan to replace xz-2 with zstd-18 will at least double their compression times.
For those of us actively trying to make things like the kernel build significantly faster, because the current build time is too slow, this kind of build-time penalty wouldn't be acceptable no matter how much space or extraction time it saved. In these larger packages, compression etc. takes up a significant percentage of the build time. -- Simon Lees (Simotek) http://simotek.net Emergency Update Team keybase.io/simotek SUSE Linux Adelaide Australia, UTC+10:30 GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
On Sun, 9 Jun 2019 21:19:06 +0930, Simon Lees wrote:
On 08/06/2019 00:28, Jan Engelhardt wrote:
This is this month's compression shootout.
* Fedora's plan to replace xz-2 with zstd-18 will at least double their compression times.
For those of us actively trying to make things like the kernel build significantly faster because the current build time is too slow this kind of build time penalty wouldn't be acceptable no matter how much space it saved or time it saved extracting. In these larger packages compression etc takes up a significant percentage of the build time.
The question is: is it compression or the "etc." that takes a significant amount of time? Compression takes quite a bit of time, but in my experience it is the "etc." that takes most. Also, during the build the package is compressed twice and decompressed twice (AFAIK), so saving on decompression may still offset a higher compression cost to some extent. And as said in the e-mail you cite, you can save on both compression and decompression at the cost of a little space. Also, the kernel tries (unsuccessfully) to use bzip2 instead of xz to save on compression time. You are supposed to be able to tune the compression method and parameters, but the setting is lost for subpackages. Thanks Michal
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice of bzdio in kernel*.spec was made so that it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
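The `bzdio` above is rpm's name for its bzip2 payload I/O. rpm selects payload compression through the `_binary_payload` macro, whose simple form is `w<level>.<io>`, e.g. `w9.bzdio` or `w19.zstdio`. The hypothetical helper below parses only that simple form; other modifiers that newer rpm versions accept are deliberately out of scope, and the table lists only the common I/O types.

```python
import re
from typing import NamedTuple

# rpm payload I/O type names mapped to the compressor they stand for.
IO_TO_TOOL = {
    "gzdio": "gzip",
    "bzdio": "bzip2",
    "xzdio": "xz",
    "lzdio": "lzma",
    "zstdio": "zstd",
}


class Payload(NamedTuple):
    tool: str
    level: int


_PAYLOAD_RE = re.compile(r"^w(\d+)\.(\w+)$")


def parse_binary_payload(value: str) -> Payload:
    """Parse a simple `_binary_payload` value such as "w9.bzdio"."""
    m = _PAYLOAD_RE.match(value.strip())
    if not m:
        raise ValueError(f"unsupported payload spec: {value!r}")
    level, io = int(m.group(1)), m.group(2)
    if io not in IO_TO_TOOL:
        raise ValueError(f"unknown payload I/O type: {io!r}")
    return Payload(tool=IO_TO_TOOL[io], level=level)
```

A check like this could flag spec files or subpackages whose effective payload differs from the intended one, which is the inheritance problem described earlier in the thread.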
On Sun, Jun 9, 2019 at 10:51 AM Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
I'm pretty sure it's to maintain support for dumb things like direct SLE 11 to SLE 15 upgrades. Which people *should not* do! -- 真実はいつも一つ!/ Always, there's only one truth!
On Sun, 9 Jun 2019 10:58:47 -0400, Neal Gompa wrote:
On Sun, Jun 9, 2019 at 10:51 AM Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
I'm pretty sure it's to maintain support from dumb things like SLE 11 to SLE 15 direct upgrades. Which people *should not* do!
https://build.opensuse.org/package/view_file/Kernel:stable/kernel-default/ke... line 150
On 10/06/2019 01:53, Michal Suchánek wrote:
On Sun, 9 Jun 2019 10:58:47 -0400, Neal Gompa wrote:
On Sun, Jun 9, 2019 at 10:51 AM Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
I'm pretty sure it's to maintain support from dumb things like SLE 11 to SLE 15 direct upgrades. Which people *should not* do!
https://build.opensuse.org/package/view_file/Kernel:stable/kernel-default/ke... line 150
Well, the good news is that in Tumbleweed we now only have to care about upgrades to SLE-12, which means we can look at this again; it is now on my todo list.
On Mon, 10 Jun 2019 06:57:13 +0930, Simon Lees wrote:
On 10/06/2019 01:53, Michal Suchánek wrote:
On Sun, 9 Jun 2019 10:58:47 -0400, Neal Gompa wrote:
On Sun, Jun 9, 2019 at 10:51 AM Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
I'm pretty sure it's to maintain support from dumb things like SLE 11 to SLE 15 direct upgrades. Which people *should not* do!
https://build.opensuse.org/package/view_file/Kernel:stable/kernel-default/ke... line 150
Well the good news is in tumbleweed we now only have to care about upgrades to SLE-12 which means we can now look at this again which is now something on my todo list.
With the bzip2 compression you can install kernel-default (but not the KMPs, because the compression settings are not inherited by subpackages) and kernel-vanilla from Tumbleweed on SLE11 (to test whether a bug is fixed upstream). So it is somewhat useful to have this ability. As pointed out in the RH discussion, it is not too difficult to add new compression methods to an existing rpm (provided the required compression library version is already in the distribution). It is even easier to keep the kernel compressed with a compatible compression method, as it is now. Thanks Michal
On Sun, Jun 09, 2019 at 04:51:15PM +0200, Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
Btw, bzip2 is really good if you run it multithreaded, i.e. lbzip2: https://community.centminmod.com/threads/compression-comparison-benchmarks-z... Cheers, Michael.
On Tuesday 2019-06-11 11:50, Michael Schroeder wrote:
On Sun, Jun 09, 2019 at 04:51:15PM +0200, Jan Engelhardt wrote:
On Sunday 2019-06-09 15:44, Michal Suchánek wrote:
And as said in the e-mail you cite you can save on both compression and decompression at the cost of little space. Also the kernel tries (unsuccessfully) to use bzip2 instead of xz
That is not true (in my opinion). The choice for bzdio on kernel*.spec was made so it can be installed on some class of by-now ancient SUSE systems. bzip2 is a bad choice today: it's neither the fastest, nor the most compressing, nor the most compatible.
Btw, bzip2 is really good if you run it multithreaded, i.e. lbzip2 https://community.centminmod.com/threads/compression-comparison-benchmarks-z...
Maybe. But the SUSE rpm does not use a multithreaded bzip2 compressor AFAICT. Another problem with that benchmark is that it uses only a limited corpus of 211MB (Silesia again). That does not leave a lot of room for by-chunk parallelization when the default window size is already 128MB (e.g. zstd). The benchmarks I ran should be more representative of OBS, since both use multiple instances of single-threaded compressors.
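The chunking argument can be made concrete with a minimal sketch of by-chunk parallel compression: split the input, compress each chunk as an independent bzip2 stream on a worker, and concatenate the streams (Python's `bz2.decompress` accepts concatenated streams). This is an illustration, not how lbzip2 is actually implemented; the chunk size and `parallel_bz2` name are assumptions, and it relies on CPython's `bz2` releasing the GIL while compressing. The usable parallelism is bounded by `len(data) / chunk_size`, and no chunk can reference data in another one, which is why a large window on a small corpus leaves little to parallelize.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK = 900 * 1024  # illustrative: roughly one bzip2 block at level 9


def parallel_bz2(data: bytes, chunk_size: int = CHUNK, workers: int = 4) -> bytes:
    """Compress independent chunks concurrently and concatenate the streams.

    Each chunk becomes a complete bzip2 stream, so matches never cross
    chunk boundaries; at most ceil(len(data) / chunk_size) workers are busy.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(bz2.compress, chunks))
```

For bzip2, whose blocks are at most 900 kB anyway, chunking costs little ratio; for a compressor with a 128MB window, chunks of that size would leave only one or two workers busy on a 211MB corpus.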
On Sun, Jun 09, 2019 at 03:44:54PM +0200, Michal Suchánek wrote:
The question is: is it compression or "etc" that takes significant amount of time? Compression takes quite a bit of time but in my experience it is "etc" that takes most.
You want statistics? I can help:
Factory Ring Statistics:
------------------------
(All data in seconds)
startup: mean: 10.3 deviation: 9.1 median(0 25 50 75 80 90 95 99 100): 4 6 7 10 13 21 29 44 164
vm: mean: 7.0 deviation: 5.3 median(0 25 50 75 80 90 95 99 100): 2 5 5 8 8 11 13 29 116
pkginstall: mean: 21.4 deviation: 19.5 median(0 25 50 75 80 90 95 99 100): 1 7 18 28 31 44 55 89 234
build_start: mean: 1.7 deviation: 1.0 median(0 25 50 75 80 90 95 99 100): 0 1 2 2 2 3 3 5 15
build_prep: mean: 0.8 deviation: 4.6 median(0 25 50 75 80 90 95 99 100): 0 0 0 0 1 1 2 13 107
build_build: mean: 94.7 deviation: 578.1 median(0 25 50 75 80 90 95 99 100): 0 1 11 36 47 111 245 1486 13675
build_install: mean: 15.5 deviation: 328.6 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 4 12 24 96 14701
build_dbgextract: mean: 23.9 deviation: 171.4 median(0 25 50 75 80 90 95 99 100): 0 1 2 4 6 21 60 415 5541
build_collectfiles: mean: 10.0 deviation: 110.0 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 4 11 26 112 5070
build_writerpms: mean: 16.9 deviation: 106.7 median(0 25 50 75 80 90 95 99 100): 0 0 1 4 6 21 50 268 3348
build_clean: mean: 0.1 deviation: 0.3 median(0 25 50 75 80 90 95 99 100): 0 0 0 0 0 0 1 1 7
build_post: mean: 7.6 deviation: 38.8 median(0 25 50 75 80 90 95 99 100): 0 1 2 4 5 9 17 145 1287
rpmlint: mean: 4.2 deviation: 17.6 median(0 25 50 75 80 90 95 99 100): 0 1 1 2 3 6 13 54 644
buildcmp: mean: 10.1 deviation: 65.6 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 5 12 29 172 2102
finish: mean: 4.0 deviation: 2.8 median(0 25 50 75 80 90 95 99 100): 2 3 3 4 4 5 7 13 98
Here's how to interpret the median line:
median(0 25 50 75 80 90 95 99 100): 0 1 2 4 6 21 60 415 5541
means that 50% of the packages took less than 2 seconds, 75% of the packages took less than 4 seconds, and 99% of the packages took less than 415 seconds.
Here's the top 10 for the compile and link step (build_build):
build_build: mean: 94.7 deviation: 578.1 median(0 25 50 75 80 90 95 99 100): 0 1 11 36 47 111 245 1486 13675
rust: 13675 libqt5-qtwebengine: 11411 libreoffice: 9792 llvm7: 7674 ceph: 6996 llvm6: 6833 java-11-openjdk: 6340 gcc7: 6337 libqt5-qtwebkit: 6225 kernel-vanilla: 5940
Here's the top 10 for the compression step (build_writerpms):
build_writerpms: mean: 16.9 deviation: 106.7 median(0 25 50 75 80 90 95 99 100): 0 0 1 4 6 21 50 268 3348
ceph: 3348 llvm6: 1541 llvm7: 1516 MozillaFirefox: 1359 MozillaThunderbird: 1359 gcc8: 1245 gcc7: 971 libqt5-qtbase: 960 kernel-vanilla: 942 mariadb: 919
Cheers, Michael.
--
Michael Schroeder mls@suse.de
SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
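The statistics lines above report a mean, a deviation, and a row of percentiles (the "median(...)" columns). A minimal Python sketch of how such a line could be produced from per-package timings follows. It assumes nearest-rank percentiles and a population standard deviation, because the thread does not say which definitions the Factory tooling actually uses; `summarize` and `nearest_rank` are hypothetical helper names.

```python
import statistics

# Percentile points used in the "median(...)" column of the Factory statistics.
POINTS = (0, 25, 50, 75, 80, 90, 95, 99, 100)


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile (assumed definition): the smallest sample such
    that at least p% of all samples are less than or equal to it."""
    n = len(sorted_values)
    rank = max(1, -(-p * n // 100))  # integer ceil(p * n / 100), at least 1
    return sorted_values[rank - 1]


def summarize(step, seconds):
    """Format one statistics line for a build step from per-package seconds."""
    values = sorted(seconds)
    mean = statistics.mean(values)
    dev = statistics.pstdev(values)  # population standard deviation (assumption)
    header = " ".join(str(p) for p in POINTS)
    row = " ".join(str(nearest_rank(values, p)) for p in POINTS)
    return f"{step}: mean: {mean:.1f} deviation: {dev:.1f} median({header}): {row}"


if __name__ == "__main__":
    # Hypothetical timings; real input would be one value per package.
    print(summarize("build_example", [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]))
```

With nearest-rank percentiles, the value in the 50 column is the smallest timing that at least half of the packages do not exceed, which matches the reading given in the message.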
On Tue, 11 Jun 2019 09:43:26 +0000, Michael Schroeder wrote:
On Sun, Jun 09, 2019 at 03:44:54PM +0200, Michal Suchánek wrote:
The question is: is it compression or "etc" that takes a significant amount of time? Compression takes quite a bit of time but in my experience it is "etc" that takes most.
You want statistics? I can help:
Factory Ring Statistics: ------------------------
(All data in seconds)
startup: mean: 10.3 deviation: 9.1 median(0 25 50 75 80 90 95 99 100): 4 6 7 10 13 21 29 44 164
vm: mean: 7.0 deviation: 5.3 median(0 25 50 75 80 90 95 99 100): 2 5 5 8 8 11 13 29 116
pkginstall: mean: 21.4 deviation: 19.5 median(0 25 50 75 80 90 95 99 100): 1 7 18 28 31 44 55 89 234
build_start: mean: 1.7 deviation: 1.0 median(0 25 50 75 80 90 95 99 100): 0 1 2 2 2 3 3 5 15
build_prep: mean: 0.8 deviation: 4.6 median(0 25 50 75 80 90 95 99 100): 0 0 0 0 1 1 2 13 107
build_build: mean: 94.7 deviation: 578.1 median(0 25 50 75 80 90 95 99 100): 0 1 11 36 47 111 245 1486 13675
build_install: mean: 15.5 deviation: 328.6 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 4 12 24 96 14701
build_dbgextract: mean: 23.9 deviation: 171.4 median(0 25 50 75 80 90 95 99 100): 0 1 2 4 6 21 60 415 5541
build_collectfiles: mean: 10.0 deviation: 110.0 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 4 11 26 112 5070
build_writerpms: mean: 16.9 deviation: 106.7 median(0 25 50 75 80 90 95 99 100): 0 0 1 4 6 21 50 268 3348
build_clean: mean: 0.1 deviation: 0.3 median(0 25 50 75 80 90 95 99 100): 0 0 0 0 0 0 1 1 7
build_post: mean: 7.6 deviation: 38.8 median(0 25 50 75 80 90 95 99 100): 0 1 2 4 5 9 17 145 1287
rpmlint: mean: 4.2 deviation: 17.6 median(0 25 50 75 80 90 95 99 100): 0 1 1 2 3 6 13 54 644
buildcmp: mean: 10.1 deviation: 65.6 median(0 25 50 75 80 90 95 99 100): 0 0 1 3 5 12 29 172 2102
finish: mean: 4.0 deviation: 2.8 median(0 25 50 75 80 90 95 99 100): 2 3 3 4 4 5 7 13 98
Here's how to interpret the median line:
median(0 25 50 75 80 90 95 99 100): 0 1 2 4 6 21 60 415 5541
means that 50% of the packages took less than 2 seconds, 75% of the packages took less than 4 seconds, and 99% of the packages took less than 415 seconds.
Here's the top 10 for the compile and link step (build_build):
build_build: mean: 94.7 deviation: 578.1 median(0 25 50 75 80 90 95 99 100): 0 1 11 36 47 111 245 1486 13675 rust: 13675 libqt5-qtwebengine: 11411 libreoffice: 9792 llvm7: 7674 ceph: 6996 llvm6: 6833 java-11-openjdk: 6340 gcc7: 6337 libqt5-qtwebkit: 6225 kernel-vanilla: 5940
Here's the top 10 for the compression step (build_writerpms):
build_writerpms: mean: 16.9 deviation: 106.7 median(0 25 50 75 80 90 95 99 100): 0 0 1 4 6 21 50 268 3348 ceph: 3348 llvm6: 1541 llvm7: 1516 MozillaFirefox: 1359 MozillaThunderbird: 1359 gcc8: 1245 gcc7: 971 libqt5-qtbase: 960 kernel-vanilla: 942 mariadb: 919
So according to this statistic the kernel build takes much longer than kernel compression. What this does not take into account is that much time is spent in some kind of rpm checkers, some of which are run as part of the build (e.g. file deduplication for kernel-source, which is required because it has so many duplicate files that it would overflow the badness limit). And it does not give statistics for the other parts (e.g. debuginfo extraction and checkers, which are accounted separately).
All in all I am surprised that kernel-vanilla occupies top places when kernel-default does more work and did not even make it to the top list. I suspect some bias in the data.
Thanks
Michal
On Tue, Jun 11, 2019 at 01:23:06PM +0200, Michal Suchánek wrote:
So according to this statistic the kernel build takes much longer than kernel compression. What this does not take into account is that much time is spent in some kind of rpm checkers, some of which are run as part of the build (e.g. file deduplication for kernel-source, which is required because it has so many duplicate files that it would overflow the badness limit).
And it does not give statistics for the other parts (e.g. debuginfo extraction and checkers, which are accounted separately).
Here's all the data for kernel-vanilla and kernel-default:
step                kernel-vanilla  kernel-default
startup                         22              12
vm                               7               7
pkginstall                      32              32
build_start                      6               5
build_prep                      99              50
build_build                   5940            2594
build_install                  160              67
build_dbgextract               700             281
build_collectfiles             869             381
build_writerpms                942             636
build_clean                      7               2
build_post                     716             533
rpmlint                         91              65
buildcmp                         1               1
finish                          23               5
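Since the interesting question is the proportion, the per-step numbers above can be turned into shares directly. A small Python sketch, treating `build_writerpms` as the compression step (as the thread does) and comparing it with `build_build`; the values are copied from the data above and come from a single build each, so the percentages are indicative only. `breakdown` is a hypothetical helper name.

```python
# Per-step seconds from the last successful build, copied from the data above.
STEPS = ("startup", "vm", "pkginstall", "build_start", "build_prep",
         "build_build", "build_install", "build_dbgextract",
         "build_collectfiles", "build_writerpms", "build_clean",
         "build_post", "rpmlint", "buildcmp", "finish")

KERNEL_VANILLA = (22, 7, 32, 6, 99, 5940, 160, 700, 869, 942, 7, 716, 91, 1, 23)
KERNEL_DEFAULT = (12, 7, 32, 5, 50, 2594, 67, 281, 381, 636, 2, 533, 65, 1, 5)


def breakdown(values):
    """Return total seconds, the build_writerpms share, and build_build/build_writerpms."""
    if len(values) != len(STEPS):
        raise ValueError("expected one value per step")
    timings = dict(zip(STEPS, values))
    total = sum(timings.values())
    return {
        "total": total,
        "writerpms_share": timings["build_writerpms"] / total,
        "build_to_writerpms": timings["build_build"] / timings["build_writerpms"],
    }


if __name__ == "__main__":
    for name, values in (("kernel-vanilla", KERNEL_VANILLA),
                         ("kernel-default", KERNEL_DEFAULT)):
        r = breakdown(values)
        print(f"{name}: total {r['total']} s, "
              f"writerpms {r['writerpms_share']:.1%} of total, "
              f"build_build is {r['build_to_writerpms']:.1f}x writerpms")
```

With these inputs the totals come to 9615 s and 4671 s, and build_writerpms accounts for roughly 9.8% and 13.6% of them respectively.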
All in all I am surprised that kernel-vanilla occupies top places when kernel-default does more work and did not even make it to the top list. I suspect some bias in the data.
Sure, it's just the data from the last successful build. To make it really meaningful it would need to be averaged over a couple of builds. Still, it is useful for finding out the proportions, i.e. how the build time compares to the compression time.
Cheers, Michael.
On Jun 11 2019, Michael Schroeder wrote:
Sure, it's just the data from the last successful build. To make it really meaningful it would need to be averaged over a couple of builds.
You also need to take into account that there are big differences between build workers.
Andreas.
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
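Both caveats (a single build, and workers of different speed) can be handled by collecting several builds per package and keeping the worker name with each sample. A minimal sketch, assuming a hypothetical record format of `(package, worker, seconds)` tuples for one build step; it reports the median over all runs and, as a rough measure of worker-induced spread, the ratio between the slowest and the fastest worker's median. `summarize_builds` is a hypothetical name, not part of any existing tool.

```python
import statistics
from collections import defaultdict


def summarize_builds(records):
    """records: iterable of (package, worker, seconds) for one build step.

    Returns {package: (median_seconds, worker_spread)}, where worker_spread is
    slowest worker median / fastest worker median (None if the fastest is 0).
    """
    by_package = defaultdict(lambda: defaultdict(list))
    for package, worker, seconds in records:
        by_package[package][worker].append(seconds)

    result = {}
    for package, per_worker in by_package.items():
        all_runs = [s for runs in per_worker.values() for s in runs]
        worker_medians = [statistics.median(runs) for runs in per_worker.values()]
        fastest = min(worker_medians)
        spread = max(worker_medians) / fastest if fastest else None
        result[package] = (statistics.median(all_runs), spread)
    return result
```

A spread well above 1 suggests that a single-build figure for that package says as much about the worker it landed on as about the package itself.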
On 11/06/2019 21:29, Andreas Schwab wrote:
On Jun 11 2019, Michael Schroeder wrote:
Sure, it's just the data from the last successful build. To make it really meaningful it would need to be averaged over a couple of builds.
You also need to take into account that there are big differences between build workers.
Due to the constraints files, the kernel should only ever build on the faster workers already. What there is, though, is a massive difference between architectures, and generally we care most about the slower architectures, because it's not possible to move forward with a maintenance submission until all architectures are built. Currently on 64-bit Intel we spend 15 minutes compressing packages; other architectures are at least twice as slow, and some of the ones we care about much less now, like Itanium and 32-bit Power, are well beyond twice as slow. Either way, assuming that a kernel build currently spends 30 minutes on compression is not unreasonable, so Fedora's plan, which doubles the compression time, takes it out to an hour, and in the context of some of the deadlines my team works to, those extra 30 minutes can make a big difference.
As a side note, the next version of RPM should start compressing subpackages in parallel, which will help a lot and will give far less incentive to switch to an algorithm that does compression in parallel inside rpm; pretty much every significantly sized package is split into subpackages anyway.
Cheers
--
Simon Lees (Simotek) http://simotek.net
Emergency Update Team keybase.io/simotek
SUSE Linux Adelaide Australia, UTC+10:30
GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
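The idea of compressing subpackages in parallel, rather than parallelizing inside one compressor, can be sketched with the Python standard library: each subpackage payload is compressed independently with xz preset 5, several at a time. This is not rpm's implementation; the payload names are placeholders, and it relies on CPython's `lzma` module releasing the GIL while compressing, so that the threads can actually overlap.

```python
import lzma
import os
from concurrent.futures import ThreadPoolExecutor


def compress_payload(data, preset=5):
    """Single-threaded xz compression of one payload (stand-in for one subpackage)."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


def compress_subpackages(payloads, workers=None):
    """Compress each subpackage payload independently, several at a time.

    payloads: dict mapping subpackage name -> bytes.
    Returns a dict mapping name -> compressed bytes.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(compress_payload, data)
                   for name, data in payloads.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Because every payload is compressed on its own, the output is byte-for-byte the same as compressing them one after another; only the wall time changes, and at best it shrinks to the time of the largest subpackage.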
On Tuesday 2019-06-11 13:50, Michael Schroeder wrote:
On Tue, Jun 11, 2019 at 01:23:06PM +0200, Michal Suchánek wrote:
So according to this statistic the kernel build takes much longer than kernel compression. What this does not take into account is that much time is spent in some kind of rpm checkers, some of which are run as part of the build (e.g. file deduplication for kernel-source, which is required because it has so many duplicate files that it would overflow the badness limit).
And it does not give statistics for the other parts (e.g. debuginfo extraction and checkers, which are accounted separately).
Here's all the data for kernel-vanilla and kernel-default:
kernel-vanilla: startup: 22 vm: 7
How is this stored? Can it be extracted by regular users, and with what command? (I'm keen to have a look at the FlightGear-data build, for example.)
On 6/7/19 4:58 PM, Jan Engelhardt wrote:
This is this month's compression shootout.
News:
* Fedora announced intent to switch rpms to Zstd compression; the reported timings on the proposal paper https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression suggest to me that the measurement was not done with due statistical care with regard to CPU thermal throttling characteristics.
* Earlier this year Ubuntu switched kernel and initramfs to lz4 https://lists.ubuntu.com/archives/ubuntu-devel/2018-March/040258.html
* In both of these instances, decompression time is the hot new topic. (Something the previous openSUSE report, https://lists.opensuse.org/opensuse-factory/2019-05/msg00344.html did not analyze)
Change in methods:
* I cut the testset to just x86_64 rpms (dropping *.noarch.rpm) to cut time. The proportion of executables over data should be higher now.
* Ran xz only from -1..-5; we all know its progression characteristics (-6..-9) from older tests.
* Added brotli, all levels
New results:
* http://paste.opensuse.org/view/97377621 - compression * http://paste.opensuse.org/view/80494192 - decompression * http://inai.de/files/openSUSE-compression-201906.ods - details
Interpretation:
* The ups and downs in the decompression times are read as jitter; they often complete in less than a minute.
* For a particular algorithm, decompression speed is unaffected by chosen compression level. This means the choice of level is only influenced by the desired compression-time characteristics.
* zstd decompression is decidedly faster than xz (~4.3x). Fedora is making sensible choices.
* lz4 decompression is 15% slower than zstd. Ubuntu is making suboptimal choices again.
* Brotli: only marginally better than zstd in low-to-medium-level compression. Equally weak in decompression as lz4. That's probably the reason it was not worth supporting in rpm.
* Fedora's plan to replace xz-2 with zstd-18 will at least double their compression times.
* Replacing openSUSE's xz-5 with zstd-18 gives the decompression benefit, no improvement in compression time and a slight space increase.
* xz-5 to zstd-10 would also give a 7x compression time saving, with a slightly larger space increase (23.6->27.9G).
So with that, I propose changing openSUSE to zstd-10 that will give people time back they're waiting for.. for the computer. xkcd.com/303 .
The problem is that even though we have a suitable version of rpm since Dec 2017, the maintainer did not enable zstd so the "compatibility time" accumulated since was unfortunately wasted.
Hi.
Thank you for working on that; I was suggesting the same here: https://github.com/openSUSE/rpm-config-SUSE/issues/11
Note that one possible blocker is missing support for zstd in deltarpm.
Martin
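The shootout compares compression levels by total time and resulting size, and one of its criticisms of Fedora's numbers is that timings are sensitive to CPU thermal throttling. A minimal Python sketch of a steadier timing loop: it interleaves the levels within each repetition, so slow drift (such as throttling) affects all levels alike, and reports the median per level. It uses the standard library's `lzma` (xz) with presets 1 to 5 as a stand-in, not zstd; the sample data is hypothetical, and this is not the script used for the report.

```python
import lzma
import statistics
import time


def shootout(data, presets=(1, 2, 3, 4, 5), repeats=5):
    """Return {preset: (median_seconds, compressed_size)} for the given xz presets."""
    timings = {p: [] for p in presets}
    sizes = {}
    for _ in range(repeats):
        # Interleave levels so that slow drift (e.g. thermal throttling)
        # is spread over all of them instead of penalizing the last ones.
        for p in presets:
            start = time.perf_counter()
            out = lzma.compress(data, preset=p)
            timings[p].append(time.perf_counter() - start)
            sizes[p] = len(out)
    return {p: (statistics.median(timings[p]), sizes[p]) for p in presets}


if __name__ == "__main__":
    sample = bytes(range(256)) * 4096  # hypothetical stand-in for rpm payload data
    for preset, (secs, size) in shootout(sample).items():
        print(f"xz -{preset}: {secs * 1000:.1f} ms, {size} bytes")
```

A zstd binding could be timed in the same loop by swapping the compression call; the interleaving and the median are what carry the statistical care.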
On Monday 2019-06-10 11:56, Martin Liška wrote:
On 6/7/19 4:58 PM, Jan Engelhardt wrote:
This is this month's compression shootout.[...] So with that, I propose changing openSUSE to zstd-10 that will give people time back they're waiting for.. for the computer. xkcd.com/303 .
Thank you for working on that, I was suggesting the same here: https://github.com/openSUSE/rpm-config-SUSE/issues/11 Note that one possible blocker is missing support for zstd in deltarpm. [ticket 11: "Sure, I'm willing to implement that if that's the only blocker."]
Please do.
Adding zstd to deltarpm seems simple; what do you think?
On Mon, Jun 10, 2019 at 6:35 AM Jan Engelhardt wrote:
On Monday 2019-06-10 11:56, Martin Liška wrote:
On 6/7/19 4:58 PM, Jan Engelhardt wrote:
This is this month's compression shootout.[...] So with that, I propose changing openSUSE to zstd-10 that will give people time back they're waiting for.. for the computer. xkcd.com/303 .
Thank you for working on that, I was suggesting the same here: https://github.com/openSUSE/rpm-config-SUSE/issues/11 Note that one possible blocker is missing support for zstd in deltarpm. [ticket 11: "Sure, I'm willing to implement that if that's the only blocker."]
Please do.
Adding zstd to deltarpm seems simple, what do you think?
Daniel Mach is already looking into this in connection with the Fedora change. I would suggest coordinating with him before you do anything further, since he's working on implementing everything needed to support it now.
--
真実はいつも一つ!/ Always, there's only one truth!
participants (8)
- Achim Gratz
- Andreas Schwab
- Jan Engelhardt
- Martin Liška
- Michael Schroeder
- Michal Suchánek
- Neal Gompa
- Simon Lees