
Re: [opensuse-factory] A look on RPM compression in openSUSE
On Mon, 8 Oct 2018 14:51:51 +0200 (CEST), Jan Engelhardt
<jengelh@xxxxxxx> wrote:

Compression performance plots are often done with something like the Silesia
corpus. Linux distributions have rather different proportions of file types, I
think. They have a lot of machine code, and even more data files, and probably
not so much text and images. Since our data set is also 2.8 orders of magnitude
bigger, rerunning a compression shootout will give more detail. So I did just
that.

http://paste.opensuse.org/15790105
http://inai.de/files/openSUSE-compression.ods
(My measurements included just *.x86_64.rpm + *.noarch.rpm.)

The takeaway from that is:

* xz outperforms zstd in the regions that xz caters to. But overall, xz
forms the far end of the "law of diminishing returns".

* Moving openSUSE from xz-5 to xz-2 saves 50% of time for an investment of
just 3.2 GB of space. Or, moving to zstd-7, saving 85% for ~6.1 GB.


Other observations:

* There are steps in compressor behavior, and that penalizes a lot of levels,
leaving only a few sensible ones: zstd-2-3-7-12, xz-1-2-3-4-5-9
(disregarding memory use, which is another factor).

* Some of our packages are too fat. kicad-packages takes longer to compress
than the entire remaining distro at zstd-19 with 16x-parallelism.
In other words, a sufficiently parallelized system
may have to wait just for that one to complete.

(* "Trend lines" in LibreOffice Calc are quite useless sometimes, as it does
not appear to calculate a constant offset portion for exp/pow fittings.)
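
The kicad-packages observation is an instance of a general bound: however many
build workers are available, a batch of independent jobs cannot finish sooner
than its longest single job. A minimal sketch of that bound follows. The
durations are invented for illustration (they are not measurements from this
thread), and the helper name is hypothetical.

```python
def makespan_lower_bound(durations, workers):
    """Lower bound on wall-clock time for independent jobs on `workers` machines.

    The batch needs at least as long as its longest job, and at least as long
    as the total work divided evenly across all workers.
    """
    return max(max(durations), sum(durations) / workers)


# Hypothetical compression times in seconds: one job dwarfs the others.
jobs = [900.0] + [2.0] * 400

for workers in (1, 4, 16, 64):
    print(workers, makespan_lower_bound(jobs, workers))
```

Once the even-split term drops below the longest job, adding workers no longer
helps; only shrinking or splitting the big package does.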

I value this kind of work!

Here are my findings on compression performance for database dumps. The
first table is sorted by resulting size only; the second is sorted by
efficiency, which weighs the gain against the time taken. The first line
of each table is the uncompressed dump, for comparison.

Overall, we've chosen «lbzip2 -9» as the best option.

The tests were executed on an old 8 CPU openSUSE 13.2
Linux 3.16.7 HP Z440
Xeon(R) CPU E5-1620 v3 @ 3.50GHz/1256(8) x86_64 15972 Mb

Sorted by size

Command      Time           Size rel_sz compr effccy Filename
------------ -------- ---------- ------ ----- ------ -----------
# pg_dumpall 00:01:27 5635943963 100.0%  0.0%        bu.psql
# compress   00:01:21 2672452529  47.4% 52.6%  27.31 bu.psql.Z
# lz4 -9     00:03:03 1199599818  21.3% 78.7%  27.09 bu.psql.lz4
# lzop -9    00:16:02 1126038770  20.0% 80.0%   5.32 bu.psql.lzo
# lha c -o7  00:08:42  842297791  14.9% 85.1%  11.09 bu.lzh
# zip -9     00:07:32  835282929  14.8% 85.2%  12.84 bu.zip
# gzip -9    00:07:26  835282675  14.8% 85.2%  13.01 bu.psql.gz
# pigz -9    00:01:19  833434110  14.8% 85.2%  73.53 bu.psql.gz
# pbzip2 -9  00:01:52  766662771  13.6% 86.4%  53.32 bu.psql.bz2
# bzip2 -9   00:08:17  766019009  13.6% 86.4%  12.02 bu.psql.bz2
# lbzip2 -9  00:00:56  765752138  13.6% 86.4% 106.67 bu.psql.bz2
# rar a -m5  00:04:05  732839985  13.0% 87.0%  24.71 bu.rar
# zstd -19   00:32:24  639443277  11.3% 88.7%   3.23 bu.psql.zst
# lrzip -U   00:10:40  597932768  10.6% 89.4%   9.99 bu.psql.lrz
# 7z a -r    00:22:04  588093913  10.4% 89.6%   4.85 bu.7z
# plzip -9   00:17:32  496047492   8.8% 91.2%   6.32 bu.psql.lz
# lzip -9    01:23:33  476972612   8.5% 91.5%   1.34 bu.psql.lz
# clzip -9   01:22:44  476972612   8.5% 91.5%   1.35 bu.psql.lz
# xz -9      00:58:40  450632908   8.0% 92.0%   1.92 bu.psql.xz

Sorted by efficiency

Command      Time           Size rel_sz compr effccy Filename
------------ -------- ---------- ------ ----- ------ -----------
# pg_dumpall 00:01:27 5635943963 100.0%  0.0%        bu.psql
# lbzip2 -9  00:00:56  765752138  13.6% 86.4% 106.67 bu.psql.bz2
# pigz -9    00:01:19  833434110  14.8% 85.2%  73.53 bu.psql.gz
# pbzip2 -9  00:01:52  766662771  13.6% 86.4%  53.32 bu.psql.bz2
# compress   00:01:21 2672452529  47.4% 52.6%  27.31 bu.psql.Z
# lz4 -9     00:03:03 1199599818  21.3% 78.7%  27.09 bu.psql.lz4
# rar a -m5  00:04:05  732839985  13.0% 87.0%  24.71 bu.rar
# gzip -9    00:07:26  835282675  14.8% 85.2%  13.01 bu.psql.gz
# zip -9     00:07:32  835282929  14.8% 85.2%  12.84 bu.zip
# bzip2 -9   00:08:17  766019009  13.6% 86.4%  12.02 bu.psql.bz2
# lha c -o7  00:08:42  842297791  14.9% 85.1%  11.09 bu.lzh
# lrzip -U   00:10:40  597932768  10.6% 89.4%   9.99 bu.psql.lrz
# plzip -9   00:17:32  496047492   8.8% 91.2%   6.32 bu.psql.lz
# lzop -9    00:16:02 1126038770  20.0% 80.0%   5.32 bu.psql.lzo
# 7z a -r    00:22:04  588093913  10.4% 89.6%   4.85 bu.7z
# zstd -19   00:32:24  639443277  11.3% 88.7%   3.23 bu.psql.zst
# xz -9      00:58:40  450632908   8.0% 92.0%   1.92 bu.psql.xz
# clzip -9   01:22:44  476972612   8.5% 91.5%   1.35 bu.psql.lz
# lzip -9    01:23:33  476972612   8.5% 91.5%   1.34 bu.psql.lz
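
For readers who want to recompute the derived columns: rel_sz is the
compressed size as a percentage of the raw dump, and compr is 100% minus
that. The formula behind effccy is not stated in this post. As an observation
only, the listed values are reproduced to two decimals by
0.8 * compr^2 / seconds, so the sketch below uses that inferred fit. It is an
assumption, not a definition confirmed by the author, and the function names
are hypothetical.

```python
RAW_SIZE = 5635943963  # bytes: the uncompressed pg_dumpall output (bu.psql)


def seconds(hms):
    """Convert an HH:MM:SS string from the Time column to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def rel_sz(size, raw=RAW_SIZE):
    """Compressed size as a percentage of the raw size."""
    return 100.0 * size / raw


def compr(size, raw=RAW_SIZE):
    """Percentage of the raw size that compression removed."""
    return 100.0 - rel_sz(size, raw)


def effccy_fit(size, hms, raw=RAW_SIZE):
    # ASSUMPTION: inferred from the table values, not stated in the post.
    return 0.8 * compr(size, raw) ** 2 / seconds(hms)


# (command, time, size) copied from three rows of the tables above
rows = [
    ("lbzip2 -9", "00:00:56", 765752138),
    ("pigz -9", "00:01:19", 833434110),
    ("xz -9", "00:58:40", 450632908),
]

for cmd, hms, size in rows:
    print(f"{cmd:<10} {rel_sz(size):5.1f}% {compr(size):5.1f}% "
          f"{effccy_fit(size, hms):7.2f}")
```

Because the fit squares the compression percentage and divides by time, a fast
parallel compressor with a middling ratio (lbzip2, pigz) scores far above a
slow one with the best ratio (xz).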


--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.29 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/