On Mon, 8 Oct 2018 14:51:51 +0200 (CEST), Jan Engelhardt wrote:
Compression performance plots are often done with something like the Silesia corpus. Linux distributions have rather different proportions of file types, I think. They have a lot of machine code, and even more data files, and probably not so much text and images. Since our data set is also 2.8 orders of magnitude bigger, rerunning a compression shootout will give more detail. So I did just that.
http://paste.opensuse.org/15790105
http://inai.de/files/openSUSE-compression.ods
(My measurements included just *.x86_64.rpm + *.noarch.rpm.)
The takeaway from that is:
* xz outperforms zstd in the regions that xz caters to. But overall, xz forms the far end of the "law of diminishing returns".
* Moving openSUSE from xz-5 to xz-2 saves 50% of the compression time for an investment of just 3.2 GB of space. Moving to zstd-7 instead saves 85% of the time for ~6.1 GB.
Other observations:
* There are steps in compressor behavior, and that penalizes a lot of levels, leaving only a few sensible ones: zstd-2-3-7-12, xz-1-2-3-4-5-9 (disregarding memory use, which is another factor).
* Some of our packages are too fat. kicad-packages takes longer to compress than the entire remaining distro at zstd-19 with 16x-parallelism. In other words, a sufficiently parallelized system may have to wait just for that one to complete.
(* "Trend lines" in LibreOffice Calc are quite useless sometimes, as it does not appear to calculate a constant offset portion for exp/pow fittings.)
I value this kind of work! Here is a list of my findings on compression performance for database dumps. The first table is sorted by resulting size alone; the second takes the time spent relative to the gain into account. The first line of each table is the uncompressed dump, for comparison. Overall, we've chosen «lbzip2 -9» as the best option.

The tests were executed on an old 8 CPU machine:
openSUSE 13.2 Linux 3.16.7 HP Z440 Xeon(R) CPU E5-1620 v3 @ 3.50GHz/1256(8) x86_64 15972 Mb

Sorted by size

Command      Time     Size       rel_sz compr effccy Filename
------------ -------- ---------- ------ ----- ------ -----------
# pg_dumpall 00:01:27 5635943963 100.0%  0.0%        bu.psql
# compress   00:01:21 2672452529  47.4% 52.6%  27.31 bu.psql.Z
# lz4 -9     00:03:03 1199599818  21.3% 78.7%  27.09 bu.psql.lz4
# lzop -9    00:16:02 1126038770  20.0% 80.0%   5.32 bu.psql.lzo
# lha c -o7  00:08:42  842297791  14.9% 85.1%  11.09 bu.lzh
# zip -9     00:07:32  835282929  14.8% 85.2%  12.84 bu.zip
# gzip -9    00:07:26  835282675  14.8% 85.2%  13.01 bu.psql.gz
# pigz -9    00:01:19  833434110  14.8% 85.2%  73.53 bu.psql.gz
# pbzip2 -9  00:01:52  766662771  13.6% 86.4%  53.32 bu.psql.bz2
# bzip2 -9   00:08:17  766019009  13.6% 86.4%  12.02 bu.psql.bz2
# lbzip2 -9  00:00:56  765752138  13.6% 86.4% 106.67 bu.psql.bz2
# rar a -m5  00:04:05  732839985  13.0% 87.0%  24.71 bu.rar
# zstd -19   00:32:24  639443277  11.3% 88.7%   3.23 bu.psql.zst
# lrzip -U   00:10:40  597932768  10.6% 89.4%   9.99 bu.psql.lrz
# 7z a -r    00:22:04  588093913  10.4% 89.6%   4.85 bu.7z
# plzip -9   00:17:32  496047492   8.8% 91.2%   6.32 bu.psql.lz
# lzip -9    01:23:33  476972612   8.5% 91.5%   1.34 bu.psql.lz
# clzip -9   01:22:44  476972612   8.5% 91.5%   1.35 bu.psql.lz
# xz -9      00:58:40  450632908   8.0% 92.0%   1.92 bu.psql.xz

Sorted by efficiency

Command      Time     Size       rel_sz compr effccy Filename
------------ -------- ---------- ------ ----- ------ -----------
# pg_dumpall 00:01:27 5635943963 100.0%  0.0%        bu.psql
# lbzip2 -9  00:00:56  765752138  13.6% 86.4% 106.67 bu.psql.bz2
# pigz -9    00:01:19  833434110  14.8% 85.2%  73.53 bu.psql.gz
# pbzip2 -9  00:01:52  766662771  13.6% 86.4%  53.32 bu.psql.bz2
# compress   00:01:21 2672452529  47.4% 52.6%  27.31 bu.psql.Z
# lz4 -9     00:03:03 1199599818  21.3% 78.7%  27.09 bu.psql.lz4
# rar a -m5  00:04:05  732839985  13.0% 87.0%  24.71 bu.rar
# gzip -9    00:07:26  835282675  14.8% 85.2%  13.01 bu.psql.gz
# zip -9     00:07:32  835282929  14.8% 85.2%  12.84 bu.zip
# bzip2 -9   00:08:17  766019009  13.6% 86.4%  12.02 bu.psql.bz2
# lha c -o7  00:08:42  842297791  14.9% 85.1%  11.09 bu.lzh
# lrzip -U   00:10:40  597932768  10.6% 89.4%   9.99 bu.psql.lrz
# plzip -9   00:17:32  496047492   8.8% 91.2%   6.32 bu.psql.lz
# lzop -9    00:16:02 1126038770  20.0% 80.0%   5.32 bu.psql.lzo
# 7z a -r    00:22:04  588093913  10.4% 89.6%   4.85 bu.7z
# zstd -19   00:32:24  639443277  11.3% 88.7%   3.23 bu.psql.zst
# xz -9      00:58:40  450632908   8.0% 92.0%   1.92 bu.psql.xz
# clzip -9   01:22:44  476972612   8.5% 91.5%   1.35 bu.psql.lz
# lzip -9    01:23:33  476972612   8.5% 91.5%   1.34 bu.psql.lz

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.29   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
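
For anyone who wants to recompute the derived columns in the tables above, here is a minimal Python sketch. rel_sz and compr follow directly from the byte counts. The message does not define the effccy column; the formula used below (0.8 * compr^2 / seconds) is only a reverse-engineered guess that happens to reproduce the listed values for the rows checked, so treat it as an assumption, not as the tool's actual definition. The function names are made up for this illustration.

```python
ORIGINAL_SIZE = 5635943963  # bytes, the uncompressed pg_dumpall output (bu.psql)


def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string from the Time column into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def metrics(time_hms: str, size: int, original: int = ORIGINAL_SIZE):
    """Return (rel_sz, compr, effccy) for one table row.

    rel_sz and compr are percentages derived from the byte counts.
    effccy uses an ASSUMED formula, 0.8 * compr**2 / seconds, fitted
    to the published numbers; the source does not state it.
    """
    rel_sz = 100.0 * size / original
    compr = 100.0 - rel_sz
    effccy = 0.8 * compr ** 2 / hms_to_seconds(time_hms)
    return rel_sz, compr, effccy


# A few rows copied from the tables: (command, time, compressed size in bytes)
ROWS = [
    ("lbzip2 -9", "00:00:56", 765752138),
    ("pigz -9", "00:01:19", 833434110),
    ("xz -9", "00:58:40", 450632908),
]

if __name__ == "__main__":
    for command, time_hms, size in ROWS:
        rel_sz, compr, effccy = metrics(time_hms, size)
        print(f"{command:<12} {time_hms} {size:>10} "
              f"{rel_sz:5.1f}% {compr:5.1f}% {effccy:7.2f}")
```

Because the assumed score squares the compression percentage and divides by wall-clock time, the parallel tools (lbzip2, pigz, pbzip2) rise to the top of the second table even though xz produces the smallest file.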