On Wednesday 2021-01-13 16:51, Takashi Iwai wrote:
defaults (xz -0 --check=crc32 --memlimit-compress=50)   9.7s
compress="zstd -T0"                                      9.4s
compress="xz -6 -T0"                                    18.5s
Intel 4700U, Tumbleweed du jour, dracut 051
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0)       10.4s
compress=cat                                             7.5s ... 38.0MB
compress="zstd -T0"                                      7.7s
compress="xz -6 -T0"                                    17.9s
Compression makes up 27% of the runtime.
... here is much faster. What makes it so different? Or is it the different xz options?
The usual - options. Enabling multithreading splits the input into blocks that are compressed individually, which throws away the benefit of compressing one huge block. Reducing the dict size has a similar effect.

(total mkinitrd runtime on Leap - not just compression)

cat:                                 29875 KB (8.4/8.9/9.8s)
xz -0 -T1 (--lzma2=dict=256KiB):     12582 KB (10s)
xz -0 -T8 (--lzma2=dict=256KiB):     12684 KB (8.9s)
xz -6 -T1 (--lzma2=dict=8MiB):       10990 KB (21.2s)
xz -6 -T8 (--lzma2=dict=8MiB):       11074 KB (19.3s)
xz -6 -T1 --lzma2=dict=1MiB:         11496 KB (18.2s)
xz -6 -T8 --lzma2=dict=1MiB:         11587 KB (11.5s)
zstd -3 -T1:                         13799 KB (8.9s)
zstd -3 -T8:                         13799 KB (8.9/9.1s)

There is fluctuation... probably the CPU can momentarily boost longer due to TDP budgets. This is not a scientific measurement - the runs are way too short for that anyway. It was just done in an attempt to disprove your original point that compression is insignificant - and it would seem this is highly dependent upon the dracut generation, possibly Meltdown mitigations, and, of course, the compression itself.

In a sense, dracut chose options that suitably reduce the time pain of xz and bring it fairly close to zstd. Switching to zstd will trade a few more bytes for a bit less time. I am still in favor of using zstd - because the main use case is initramfs decompression (which is not measured here), and that is the thing that probably happens more often - on every boot.
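The block-splitting effect can be reproduced in a few lines. What follows is only a minimal sketch, not what xz -T does internally: it uses Python's standard lzma module to compress a synthetic buffer once as a whole and once as independently compressed chunks, then compares the sizes. The helper name compressed_sizes, the 64 KiB chunk size and the synthetic data are assumptions made for illustration. Each chunk here is also a separate .xz stream, so the container overhead is somewhat higher than for real xz blocks.

```python
import lzma
import random


def compressed_sizes(data, block_size, preset=0):
    """Return (single_stream_size, sum_of_independent_block_sizes) in bytes."""
    # One stream: the compressor can reference matches anywhere in its dictionary.
    single = len(lzma.compress(data, preset=preset))
    # Independent blocks: every chunk starts with an empty dictionary,
    # so redundancy between chunks cannot be exploited.
    blocks = sum(
        len(lzma.compress(data[i:i + block_size], preset=preset))
        for i in range(0, len(data), block_size)
    )
    return single, blocks


# Synthetic input (hypothetical): a 64 KiB pseudo-random pattern repeated
# 16 times, i.e. data whose only redundancy lies across chunk boundaries.
rng = random.Random(0)
pattern = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = pattern * 16

single, blocks = compressed_sizes(data, block_size=64 * 1024)
print(f"single stream: {single} bytes, independent blocks: {blocks} bytes")
```

This input is deliberately a worst case. Real xz uses much larger blocks, so the penalty on an initrd is far smaller, in line with the roughly 100 KB difference between the -T1 and -T8 rows in the table.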
If switching to zstd makes things better, it should be a nice low-hanging fruit; the current kernel already supports zstd initrd.
AMD 5700X, Tumbleweed du jour
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0)        5.2s
compress=cat                                             3.6s ... 26.7MB
compress="xz -6 -T0"                                    10.3s
Compression makes up 30%.
I'd be interested in the zstd numbers on AMD, too :)
compress="zstd -T0" 3.7s
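For reference, the "27%" and "30%" figures quoted earlier can presumably be reproduced by treating the compress=cat run as the no-compression baseline and attributing the rest of the runtime to compression. That reading is an assumption on my part, since the thread does not spell out the formula, and the function name compression_share is made up for this sketch. With the timings from this thread it gives about 27.9% and 30.8% for the defaults, which matches the quoted figures if they were rounded down.

```python
def compression_share(total_s, baseline_s):
    """Fraction of the total runtime not explained by the compress=cat run."""
    return (total_s - baseline_s) / total_s


# Timings copied from the thread: (total runtime, compress=cat runtime) in seconds.
runs = {
    "Intel 4700U, defaults": (10.4, 7.5),
    "Intel 4700U, zstd -T0": (7.7, 7.5),
    "AMD 5700X, defaults": (5.2, 3.6),
    "AMD 5700X, zstd -T0": (3.7, 3.6),
}

for name, (total_s, baseline_s) in runs.items():
    print(f"{name}: {compression_share(total_s, baseline_s):.1%}")
```

Given the run-to-run fluctuation mentioned above, the zstd shares in particular are within the noise and should not be read too precisely.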