openSUSE vs. Fedora - package installation speed
Maybe it's a known issue, but package installation seems much slower on openSUSE (it takes 2x longer). The following example installs the packages needed for rpmlint and should be a 1:1 comparison between the two distros: https://github.com/marxin/opensuse-vs-fedora https://github.com/marxin/opensuse-vs-fedora/actions/runs/480371322 Thanks, Martin
On Tue, 12 Jan 2021 at 3:41 PM, Martin Liška <mliska@suse.cz> wrote:
Maybe it's a known issue, but package installation seems much slower on openSUSE (it takes 2x longer). The following example installs the packages needed for rpmlint and should be a 1:1 comparison between the two distros:
https://github.com/marxin/opensuse-vs-fedora https://github.com/marxin/opensuse-vs-fedora/actions/runs/480371322
I would recommend seeing if using dnf on openSUSE changes the result. I suspect this is due to the way libzypp calls RPM a few times per package LCP [Stasiek] https://lcp.world
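A minimal way to test that on one host might look like this (the package is a stand-in for the rpmlint set; both commands assume root):

    # refresh metadata first so neither tool pays that cost in the timed step
    zypper --non-interactive refresh
    time zypper --non-interactive install rpmlint
    # ... and with dnf on the same distribution:
    dnf -y makecache
    time dnf -y install rpmlint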
On 1/12/21 3:45 PM, Stasiek Michalski wrote:
I would recommend seeing if using dnf on openSUSE changes the result. I suspect this is due to the way libzypp calls RPM a few times per package
Can you please make a pull request to the GitHub repo that installs the same packages using dnf? Thanks, Martin
On Tue, 12 Jan 2021 at 3:47 PM, Martin Liška <mliska@suse.cz> wrote:
On 1/12/21 3:45 PM, Stasiek Michalski wrote:
I would recommend seeing if using dnf on openSUSE changes the result. I suspect this is due to the way libzypp calls RPM a few times per package
Can you please make a pull request to the GitHub repo that installs the same packages using dnf?
https://github.com/marxin/opensuse-vs-fedora/pull/1 It's faster, but keep in mind dnf only has to install 166 packages, whereas zypper installed 191 (though zypper still took 3x as long). The entire workflow would probably also be faster if I used Neal's new Tumbleweed dnf images instead of the Docker ones. LCP [Stasiek] https://lcp.world
On Tue, Jan 12, 2021 at 10:01 AM Stasiek Michalski <hellcp@opensuse.org> wrote:
On Tue, 12 Jan 2021 at 3:47 PM, Martin Liška <mliska@suse.cz> wrote:
On 1/12/21 3:45 PM, Stasiek Michalski wrote:
I would recommend seeing if using dnf on openSUSE changes the result. I suspect this is due to the way libzypp calls RPM a few times per package
Can you please make a pull request to the GitHub repo that installs the same packages using dnf?
https://github.com/marxin/opensuse-vs-fedora/pull/1 It's faster, but keep in mind dnf only has to install 166 packages, whereas zypper installed 191 (though zypper still took 3x as long). The entire workflow would probably also be faster if I used Neal's new Tumbleweed dnf images instead of the Docker ones.
The new opensuse/tumbleweed-dnf container is ready for use, so I took the liberty of extending the pipeline you have to add it to your test matrix: https://github.com/marxin/opensuse-vs-fedora/pull/2 It looks like the tumbleweed-dnf environment takes half the time of the tumbleweed (with zypper) + dnf environment, and is 3x faster than the zypper environment. I'm hoping that for Leap 15.3, we can add leap-dnf images too... -- 真実はいつも一つ!/ Always, there's only one truth!
On Tuesday 2021-01-12 15:41, Martin Liška wrote:
Maybe it's a known issue, but package installation seems much slower on openSUSE (it takes 2x longer). The following example installs the packages needed for rpmlint and should be a 1:1 comparison between the two distros:
https://github.com/marxin/opensuse-vs-fedora https://github.com/marxin/opensuse-vs-fedora/actions/runs/480371322
Curious whether openSUSE has more scriptlets like %post...
On Tue, 12 Jan 2021 at 4:07 PM, Jan Engelhardt <jengelh@inai.de> wrote:
Curious whether openSUSE has more scriptlets like %post...
We probably do, but considering that in testing, installing dnf with zypper and then running dnf is faster than using zypper alone, we have another problem on our hands ;) LCP [Stasiek] https://lcp.world
On 1/12/21 3:41 PM, Martin Liška wrote:
Maybe it's a known issue, but package installation seems much slower on openSUSE (it takes 2x longer). The following example installs the packages needed for rpmlint and should be a 1:1 comparison between the two distros:
https://github.com/marxin/opensuse-vs-fedora https://github.com/marxin/opensuse-vs-fedora/actions/runs/480371322
Thanks, Martin
It'd be nice to split network vs. local performance:
Step 1 - repo refresh
Step 2 - download packages to cache
Step 3 - install from cache
At least it'd make sure the issue is not due to a slow mirror. Nicolas
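With zypper, those three steps could be separated like this (standard flags; the package is a placeholder):

    # step 1: refresh repo metadata only
    zypper --non-interactive refresh
    # step 2: fetch the packages into the local cache without installing
    zypper --non-interactive install --download-only rpmlint
    # step 3: install purely from the cache, so mirror speed plays no part
    time zypper --non-interactive install rpmlint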
On 1/12/21 4:11 PM, Nicolas Morey-Chaisemartin wrote:
On 1/12/21 3:41 PM, Martin Liška wrote:
Maybe it's a known issue, but package installation seems much slower on openSUSE (it takes 2x longer). The following example installs the packages needed for rpmlint and should be a 1:1 comparison between the two distros:
https://github.com/marxin/opensuse-vs-fedora https://github.com/marxin/opensuse-vs-fedora/actions/runs/480371322
Thanks, Martin
Thank you, @Stasiek, for the pull request. I extended it to build openSUSE with dnf and zypper in parallel: https://github.com/marxin/opensuse-vs-fedora/runs/1688929845?check_suite_foc... I pre-install dnf in both scenarios, and the difference in the install step is now 58s vs. 2m 45s.
It'd be nice to split network vs. local performance:
Step 1 - repo refresh
Step 2 - download packages to cache
Step 3 - install from cache
At least it'd make sure the issue is not due to a slow mirror.
Feel free to make a pull request to the git project. Thanks, Martin
Nicolas
On Tue, Jan 12, 2021 at 11:42 AM Martin Liška <mliska@suse.cz> wrote:
Maybe it's a known issue,
Yes, well known. By design it does not install packages in a single transaction but one by one, and this is why it is so slow. It has improved enormously since it came about, but this particular issue has never been addressed. This was not a big deal back then, because 15 years ago there were no off-the-shelf, cheap, and extremely fast SSD/NVMe drives. It was going to be slow and have an I/O bottleneck anyway. Now we have Zen CPUs and mind-numbingly fast drives, and software needs to keep up.
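Conceptually, the difference looks like this (package names are placeholders; zypper actually drives librpm through libzypp rather than the rpm CLI, so this is only an illustration):

    # per-package mode: N separate rpm transactions, each paying the
    # transaction setup, database write, and fsync cost again
    for p in a.rpm b.rpm c.rpm; do
        rpm -U "$p"
    done
    # single-transaction mode, as dnf uses: those costs are paid once
    rpm -U a.rpm b.rpm c.rpm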
On Wednesday 13 January 2021 13:16:08 CET, Cristian Rodríguez wrote:
On Tue, Jan 12, 2021 at 11:42 AM Martin Liška wrote:
Maybe it's a known issue,
Yes, well known. By design it does not install packages in a single transaction but one by one, and this is why it is so slow. It has improved enormously since it came about, but this particular issue has never been addressed. This was not a big deal back then, because 15 years ago there were no off-the-shelf, cheap, and extremely fast SSD/NVMe drives. It was going to be slow and have an I/O bottleneck anyway.
Maybe it wouldn't be so difficult to parallelize these individual installs...? -- Vojtěch Zeisek https://trapa.cz/ Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux https://www.opensuse.org/
On Wed, 13 Jan 2021 at 1:18 PM, Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Maybe it wouldn't be so difficult to parallelize these individual installs...?
We would need to drop _all_ scriptlets to be able to support parallel installs. Fedora is working in that direction. Zypper pretty much prevents us from even having file triggers, so not having scriptlets with zypper would be an impossible task, to say the least. LCP [Stasiek] https://lcp.world
On Wed, Jan 13, 2021 at 01:21:38PM +0100, Stasiek Michalski wrote:
On Wed, 13 Jan 2021 at 1:18 PM, Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Maybe it wouldn't be so difficult to parallelize these individual installs...?
We would need to drop _all_ scriptlets to be able to support parallel installs. Fedora is working in that direction. Zypper pretty much prevents us from even having file triggers, so not having scriptlets with zypper would be an impossible task, to say the least.
Zypper already supports normal file triggers; transaction file triggers are what currently doesn't work, but that's being worked on. Just wait another couple of weeks. Cheers, Michael. -- Michael Schroeder SUSE Software Solutions Germany GmbH mls@suse.de GF: Felix Imendoerffer HRB 36809, AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
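For reference, the two trigger kinds differ mainly in granularity - a hypothetical spec-file sketch (path and command are illustrative):

    # per-package file trigger: fires for each installed package that
    # drops files under the watched path
    %filetriggerin -- /usr/share/glib-2.0/schemas
    /usr/bin/glib-compile-schemas /usr/share/glib-2.0/schemas || :

    # transaction file trigger: fires once at the end of the whole
    # transaction, no matter how many packages matched
    %transfiletriggerin -- /usr/share/glib-2.0/schemas
    /usr/bin/glib-compile-schemas /usr/share/glib-2.0/schemas || :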
On Wednesday 2021-01-13 13:21, Stasiek Michalski wrote:
On Wed, 13 Jan 2021 at 1:18 PM, Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Maybe it wouldn't be so difficult to parallelize these individual installs...?
We would need to drop _all_ scriptlets to be able to support parallel installs,
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
On Wed, Jan 13, 2021 at 8:06 AM Jan Engelhardt <jengelh@inai.de> wrote:
On Wednesday 2021-01-13 13:21, Stasiek Michalski wrote:
On Wed, 13 Jan 2021 at 1:18 PM, Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Maybe it wouldn't be so difficult to parallelize these individual installs...?
We would need to drop _all_ scriptlets to be able to support parallel installs,
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts. -- 真実はいつも一つ!/ Always, there's only one truth!
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :) I must say I can follow Jan's argumentation more than yours. Greetings, Stephan -- Lighten up, just enjoy life, smile more, laugh more, and don't get so worked up about things. Kenneth Branagh
On Wed, Jan 13, 2021 at 8:26 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :)
I must say I can follow Jan's argumentation more than yours.
You don't know what the script is doing, which means you don't know if you're creating a race condition between two package installs. If two independent packages are modifying the same file at the same time, what is the result? This is just one example of the problems that parallel arbitrary script execution can cause. -- 真実はいつも一つ!/ Always, there's only one truth!
On 13.01.21 at 14:31, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:26 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :)
I must say I can follow Jan's argumentation more than yours.
You don't know what the script is doing, which means you don't know if you're creating a race condition between two package installs. If two independent packages are modifying the same file at the same time, what is the result? This is just one example of the problems that parallel arbitrary script execution can cause.
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation. Greetings, Stephan -- Lighten up, just enjoy life, smile more, laugh more, and don't get so worked up about things. Kenneth Branagh
On Wed, Jan 13, 2021 at 8:43 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:31, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:26 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.
We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :)
I must say I can follow Jan's argumentation more than yours.
You don't know what the script is doing, which means you don't know if you're creating a race condition between two package installs. If two independent packages are modifying the same file at the same time, what is the result? This is just one example of the problems that parallel arbitrary script execution can cause.
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
If we didn't have %pretrans and %pre scriptlets, I would agree with you. :( -- 真実はいつも一つ!/ Always, there's only one truth!
On 2021-01-13 15:46, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:43 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:31, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:26 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.

We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :)
I must say I can follow Jan's argumentation more than yours.
You don't know what the script is doing, which means you don't know if you're creating a race condition between two package installs. If two independent packages are modifying the same file at the same time, what is the result? This is just one example of the problems that parallel arbitrary script execution can cause.
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
If we didn't have %pretrans and %pre scriptlets, I would agree with you. :(
Most packages don't use scriptlets – these should be quite safe for parallelization (or for installing several packages at once). The remaining packages (with scriptlets) could be serialized. -- Regards, Mindaugas
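That share is easy to estimate on any installed system - a rough count using standard rpm queries:

    # count installed packages that carry at least one install-time scriptlet
    rpm -qa --queryformat '%{NAME}\n' | while read -r p; do
        rpm -q --scripts "$p" | grep -q . && echo "$p"
    done | wc -l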
On Wed, Jan 13, 2021 at 11:02 AM opensuse.lietuviu.kalba <opensuse.lietuviu.kalba@gmail.com> wrote:
On 2021-01-13 15:46, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:43 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:31, Neal Gompa wrote:
On Wed, Jan 13, 2021 at 8:26 AM Stephan Kulow <coolo@suse.de> wrote:
On 13.01.21 at 14:08, Neal Gompa wrote:
I do not think so. Running scripts is just another node to execute in the "to-do DAG". A %post just needs to run (sometime) after installation of the package, and, if another pkg B requires A, said %post may need to be ordered before B's installation. But that should be all.

We do because we don't know what a script *does*. Anyone can do *anything* in a script, so it is unsafe to try parallelization with arbitrary scripts.
Installing packages is a pretty unsafe operation to begin with :)
I must say I can follow Jan's argumentation more than yours.
You don't know what the script is doing, which means you don't know if you're creating a race condition between two package installs. If two independent packages are modifying the same file at the same time, what is the result? This is just one example of the problems that parallel arbitrary script execution can cause.
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
If we didn't have %pretrans and %pre scriptlets, I would agree with you. :(
Most packages don't use scriptlets – these should be quite safe for parallelization (or for installing several packages at once). The remaining packages (with scriptlets) could be serialized.
That is not true. Because of the dependency web among libraries, applications, and services, scriptlets are in the hot path pretty much all the time. openSUSE also aggressively uses things like alternatives, which adds even more scriptlet heavy interaction. In some cases, openSUSE does as much script magic as Debian (which is usually my bar for "too much scripting to work around package management"). -- 真実はいつも一つ!/ Always, there's only one truth!
On Wed, 13 Jan 2021 at 11:07 AM, Neal Gompa <ngompa13@gmail.com> wrote:
That is not true. Because of the dependency web among libraries, applications, and services, scriptlets are in the hot path pretty much all the time.
openSUSE also aggressively uses things like alternatives, which adds even more scriptlet heavy interaction. In some cases, openSUSE does as much script magic as Debian (which is usually my bar for "too much scripting to work around package management").
I'm kind of hoping for https://github.com/rpm-software-management/rpm/issues/993 LCP [Stasiek] https://lcp.world
On Wed, Jan 13, 2021 at 02:43:00PM +0100, Stephan Kulow wrote:
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
Maybe, but somebody should first take a look at where all the time is spent in the rpm installation code. Maybe uncompression is just 10% and filesystem operations are 70%. Or the other way round. Or maybe it's the fsync() on the database? Cheers, Michael. -- Michael Schroeder SUSE Software Solutions Germany GmbH mls@suse.de GF: Felix Imendoerffer HRB 36809, AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
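One hedged way to get that breakdown (package and target root are placeholders; assumes strace overhead is acceptable for a rough split):

    # summarize time spent in sync-related syscalls during one install
    strace -c -f -e trace=fsync,fdatasync rpm -Uvh --root=/tmp/testroot foo.rpm
    # coarse CPU-vs-wall-clock split for the same install
    /usr/bin/time -v rpm -Uvh --root=/tmp/testroot foo.rpm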
On Wed, Jan 13, 2021 at 01:53:36PM +0000, Michael Schroeder wrote:
On Wed, Jan 13, 2021 at 02:43:00PM +0100, Stephan Kulow wrote:
Well, you can still serialze the scriplet part and parallelize the payload uncompression and file installation
Maybe, but somebody should first take a look at where all the time is spent in the rpm installation code. Maybe uncompression is just 10% and filesystem operations are 70%. Or the other way round. Or maybe it's the fsync() on the database?
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :). Ciao, Marcus
On Wed, 13 Jan 2021 14:57:06 +0100, Marcus Meissner wrote:
On Wed, Jan 13, 2021 at 01:53:36PM +0000, Michael Schroeder wrote:
On Wed, Jan 13, 2021 at 02:43:00PM +0100, Stephan Kulow wrote:
Well, you can still serialze the scriplet part and parallelize the payload uncompression and file installation
Maybe, but somebody should first take a look at where all the time is spent in the rpm installation code. Maybe uncompression is just 10% and filesystem operations are 70%. Or the other way round. Or maybe it's the fsync() on the database?
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess. thanks, Takashi
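For anyone wanting to reproduce this, the regeneration is easy to time directly (paths follow the usual layout):

    # regenerate and time the initrd for the currently running kernel
    time dracut --force /boot/initrd-$(uname -r) $(uname -r)
    # dracut's --profile option prints per-step timing to narrow down the slow part
    dracut --force --profile /boot/initrd-$(uname -r) $(uname -r)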
On 13.01.21 at 15:41, Takashi Iwai wrote:
On Wed, 13 Jan 2021 14:57:06 +0100, Marcus Meissner wrote:
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).

That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.

It would help if we finally had a transaction hook that does not need to run these regenerations multiple times in one update, if more than one package needs them.
On Wed, 13 Jan 2021 16:06:26 +0100, Ben Greiner wrote:
On 13.01.21 at 15:41, Takashi Iwai wrote:
On Wed, 13 Jan 2021 14:57:06 +0100, Marcus Meissner wrote:
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).

That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.

It would help if we finally had a transaction hook that does not need to run these regenerations multiple times in one update, if more than one package needs them.
The current post trigger in the kernel package is intended as a safety measure. But maybe it makes little sense nowadays, when multiple kernels are installed. Takashi
On 13/01/2021 15.41, Takashi Iwai wrote:
On Wed, 13 Jan 2021 14:57:06 +0100, Marcus Meissner wrote:
On Wed, Jan 13, 2021 at 01:53:36PM +0000, Michael Schroeder wrote:
On Wed, Jan 13, 2021 at 02:43:00PM +0100, Stephan Kulow wrote:
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
Maybe, but somebody should first take a look at where all the time is spent in the rpm installation code. Maybe uncompression is just 10% and filesystem operations are 70%. Or the other way round. Or maybe it's the fsync() on the database?
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.
On 15.2 I prefixed some commands with /usr/bin/time -p to get:
for cpio: real 4.06, user 0.05, sys 0.13
for xz (the line with | $compress): real 4.18, user 4.14, sys 0.02
total run time for 2 kernels: real 0m20.917s, user 0m17.552s, sys 0m3.835s
so the compression alone takes 40% of the total time.
Using date '+%s.%N' I found that another 28% (2.95s per kernel) is spent on the section "Including module".
On Wed, 13 Jan 2021 16:16:01 +0100, Bernhard M. Wiedemann wrote:
On 13/01/2021 15.41, Takashi Iwai wrote:
On Wed, 13 Jan 2021 14:57:06 +0100, Marcus Meissner wrote:
On Wed, Jan 13, 2021 at 01:53:36PM +0000, Michael Schroeder wrote:
On Wed, Jan 13, 2021 at 02:43:00PM +0100, Stephan Kulow wrote:
Well, you can still serialize the scriptlet part and parallelize the payload uncompression and file installation.
Maybe, but somebody should first take a look at where all the time is spent in the rpm installation code. Maybe uncompression is just 10% and filesystem operations are 70%. Or the other way round. Or maybe it's the fsync() on the database?
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.
On 15.2 I prefixed some commands with /usr/bin/time -p to get
for cpio: real 4.06, user 0.05, sys 0.13
for xz (the line with | $compress): real 4.18, user 4.14, sys 0.02
total run time for 2 kernels: real 0m20.917s, user 0m17.552s, sys 0m3.835s
so the compression alone takes 40% of the total time.
Thanks, that's an interesting number. An easier option would be to parallelize the compression. How does the number change if you add -P4 or whatever to the compress=... line in /usr/lib/dracut.conf.d/01-dist.conf? (Or override it in /etc/dracut.conf.d/...) And cpio also takes so long... It's not CPU-bound, so is it waiting for the inputs?
Using date '+%s.%N' I found that another 28% (2.95s per kernel) is spent on the section "Including module"
Hm, is that the part loading dracut modules? Takashi
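Overriding the compressor for such a test is a one-line drop-in (filename arbitrary; dracut conf snippets are plain shell fragments):

    # /etc/dracut.conf.d/99-test-compress.conf
    compress="xz --check=crc32 --lzma2=dict=1MiB -T4"
    # or, to try zstd instead:
    # compress="zstd -T0"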
Takashi Iwai <tiwai@suse.de> wrote:
And cpio also takes so long... It's not CPU-bound, so is it waiting for the inputs?
GNU cpio is slow. star is almost two times faster than the cpio implementation typically used on Linux. You may either use "scpio", "star cli=cpio ..." or "star". There are two main features of star that make it faster:
- star forks into two processes: one for archive handling and one as the filesystem interface. Both are coupled via shared memory used as a FIFO, which decouples reading/writing the archive from writing/reading to/from the filesystem.
- As a result of the FIFO, star uses larger I/O sizes to reduce the filesystem overhead in the kernel.
If you would like to compare results, it makes sense to check the option fs= for a larger FIFO size and -no-fsync to make star as unreliable as GNU tar or GNU cpio, which is important for a comparison on Linux with its ineffective filesystem cache that slows down with fsync() calls. If star is run in cpio CLI compatibility mode, it enables -no-fsync and -install. The option -install implements AT&T cpio compatibility that allows "overwriting" existing binaries without causing an old running copy to dump core. Since the latter AT&T feature is not documented, it is not part of the gcpio implementation. Do you know how Linux package managers handle this situation? Jörg -- EMail:joerg@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
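A hedged sketch of such a comparison, using only the options named above (input tree and output paths are placeholders):

    # GNU cpio, reading the file list from stdin
    time sh -c 'find /usr/lib/dracut | cpio -o --quiet > /tmp/test.cpio'
    # star with a larger FIFO and fsync disabled, per the options above
    time star -c -no-fsync fs=32m f=/tmp/test.star /usr/lib/dracut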
On Wednesday 2021-01-13 15:41, Takashi Iwai wrote:
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.
Quite the contrary.

Intel 8250U, Leap, dracut 049, initramfs slightly fatter due to dm_crypt etc.
defaults (xz -0 --check=crc32 --memlimit-compress=50) 9.7s
compress=cat 8.6s ... 29.91MB
compress="zstd -T0" 9.4s
compress="xz -6 -T0" 18.5s
Compression makes up 11%.

Intel 4700U, Tumbleweed du jour, dracut 051
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 10.4s
compress=cat 7.5s ... 38.0MB
compress="zstd -T0" 7.7s
compress="xz -6 -T0" 17.9s
Compression is 27%.

AMD 5700X, Tumbleweed du jour
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 5.2s
compress=cat 3.6s ... 26.7MB
compress="xz -6 -T0" 10.3s
Compression makes up 30%.
On Wed, 13 Jan 2021 16:34:43 +0100, Jan Engelhardt wrote:
On Wednesday 2021-01-13 15:41, Takashi Iwai wrote:
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
That's a long-standing pain, yeah. Did anyone profile the dracut operation to point out what takes so long? I know xz takes quite some time, but it's not dominant, I guess.
Quite the contrary.
Intel 8250U, Leap, dracut 049, initramfs slightly fatter due to dm_crypt etc.
defaults (xz -0 --check=crc32 --memlimit-compress=50) 9.7s
compress=cat 8.6s ... 29.91MB
compress="zstd -T0" 9.4s
compress="xz -6 -T0" 18.5s
Compression makes up 11%.
Hm, zstd is almost comparable with xz in this case, but ...
Intel 4700U, Tumbleweed du jour, dracut 051
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 10.4s
compress=cat 7.5s ... 38.0MB
compress="zstd -T0" 7.7s
compress="xz -6 -T0" 17.9s
Compression is 27%.
... here is much faster. What makes it so different? Or is it about the different xz options? If switching to zstd makes things better, it should be a nice low-hanging fruit; the current kernel already supports a zstd initrd.
AMD 5700X, Tumbleweed du jour
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 5.2s
compress=cat 3.6s ... 26.7MB
compress="xz -6 -T0" 10.3s
Compression makes up 30%.
Interested in the zstd number on AMD, too :) thanks, Takashi
On Wednesday 2021-01-13 16:51, Takashi Iwai wrote:
defaults (xz -0 --check=crc32 --memlimit-compress=50) 9.7s
compress="zstd -T0" 9.4s
compress="xz -6 -T0" 18.5s
Intel 4700U, Tumbleweed du jour, dracut 051
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 10.4s
compress=cat 7.5s ... 38.0MB
compress="zstd -T0" 7.7s
compress="xz -6 -T0" 17.9s
Compression is 27%.
... here is much faster. What makes it so different? Or is it about the different xz options?
The usual - options.

Enabling multithreading splits up the input into blocks which are individually compressed, throwing away the benefits of compressing one huge block. Reducing the dict size is a similar thing.

(total mkinitrd Leap runtime - not just compression)
cat: 29875 KB (8.4/8.9/9.8s)
xz -0 -T1 (--lzma2=dict=256KiB): 12582 KB (10s)
xz -0 -T8 (--lzma2=dict=256KiB): 12684 KB (8.9s)
xz -6 -T1 (--lzma2=dict=8MiB): 10990 KB (21.2s)
xz -6 -T8 (--lzma2=dict=8MiB): 11074 KB (19.3s)
xz -6 -T1 --lzma2=dict=1MiB: 11496 KB (18.2s)
xz -6 -T8 --lzma2=dict=1MiB: 11587 KB (11.5s)
zstd -3 -T1: 13799 KB (8.9s)
zstd -3 -T8: 13799 KB (8.9/9.1s)

There is fluctuation... probably the CPU can momentarily boost longer due to TDP budgets. This is not a scientific measurement - it runs way too short anyway. It was just done in an attempt to disprove your original point that compression is insignificant - and it would seem this is highly dependent upon the dracut generation, possibly Meltdown mitigations, and, of course, compression itself.

In a sense, dracut chose options that suitably reduce the time pain of xz and dial in somewhat close to zstd. Switching to zstd will trade a few more bytes for a bit of time. I am still in favor of using zstd - because the main use case is initramfs decompression (which is not measured here), which is the thing that probably happens more often - every boot.
If switching to zstd makes things better, it should be a nice low-hanging fruit; the current kernel already supports zstd initrd.
AMD 5700X, Tumbleweed du jour
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 5.2s
compress=cat 3.6s ... 26.7MB
compress="xz -6 -T0" 10.3s
Compression makes up 30%.
Interested in the zstd number on AMD, too :)
compress="zstd -T0" 3.7s
On Wed, 13 Jan 2021 17:39:32 +0100, Jan Engelhardt wrote:
On Wednesday 2021-01-13 16:51, Takashi Iwai wrote:
defaults (xz -0 --check=crc32 --memlimit-compress=50) 9.7s
compress="zstd -T0" 9.4s
compress="xz -6 -T0" 18.5s
Intel 4700U, Tumbleweed du jour, dracut 051
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 10.4s
compress=cat 7.5s ... 38.0MB
compress="zstd -T0" 7.7s
compress="xz -6 -T0" 17.9s
Compression is 27%.
... here is much faster. What makes so different? Or is it about the different xz options?
The usual - options.
Enabling multithreading splits up the input into blocks which are individually compressed, throwing away the benefits of compressing one huge block. Reducing the dict size is a similar thing.
(total mkinitrd Leap runtime - not just compression)
cat: 29875 KB (8.4/8.9/9.8s)
xz -0 -T1 (--lzma2=dict=256KiB): 12582 KB (10s)
xz -0 -T8 (--lzma2=dict=256KiB): 12684 KB (8.9s)
xz -6 -T1 (--lzma2=dict=8MiB): 10990 KB (21.2s)
xz -6 -T8 (--lzma2=dict=8MiB): 11074 KB (19.3s)
xz -6 -T1 --lzma2=dict=1MiB: 11496 KB (18.2s)
xz -6 -T8 --lzma2=dict=1MiB: 11587 KB (11.5s)
zstd -3 -T1: 13799 KB (8.9s)
zstd -3 -T8: 13799 KB (8.9/9.1s)
There is fluctuation... probably the CPU can momentarily boost longer due to TDP budgets. This is not a scientific measurement - it runs way too short anyway.
Thanks for the detailed analysis.
It was just done in an attempt to disprove your original point that compression is insignificant - and it would seem this is highly dependent upon the dracut generation, possibly Meltdown mitigations, and, of course, compression itself.
Oh, I didn't mean that the compression is insignificant at all. I meant it doesn't look "dominant". But, admittedly, the time cost for compression is higher than I thought; it could be 1/3 of the total time.
In a sense, dracut chose options that suitably reduce the time pain of xz and dial in somewhat close to zstd. Switching to zstd will trade a few more bytes for a bit of time. I am still in favor of using zstd - because the main use case is initramfs decompression (which is not measured here), which is the thing that probably happens more often - every boot.
If the decompression gives a good result, it would be good to move to zstd for TW. It's only the initrd, and I believe it's fairly safe. Also, some automatic adjustment of -T would be nice...
If switching to zstd makes things better, it should be a nice low-hanging fruit; the current kernel already supports zstd initrd.
AMD 5700X, Tumbleweed du jour
defaults (xz --check=crc32 --lzma2=dict=1MiB -T0) 5.2s
compress=cat 3.6s ... 26.7MB
compress="xz -6 -T0" 10.3s
Compression makes up 30%.
Interested in the zstd number on AMD, too :)
compress="zstd -T0" 3.7s
Oh that's fast. thanks, Takashi
On 13.01.21 17:51, Takashi Iwai wrote:
Also, some automatic adjustment of -T would be nice...
-T 0 is "use as many threads as there are CPU cores" or something like that, so it is automatic. -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
On Wednesday 13 January 2021 20:38:43 CET, Stefan Seyfried wrote:
On 13.01.21 17:51, Takashi Iwai wrote:
Also, some automatic adjustment of -T would be nice...
-T 0 is "use as many threads as there are CPU cores" or something like that, so it is automatic.
I'd rather suggest something like "number of CPUs minus one" to prevent the system from slowing down or freezing, as GNOME/KDE, web browsers, and similar user apps take some resources... -- Vojtěch Zeisek https://trapa.cz/ Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux https://www.opensuse.org/
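That could be wired up with a drop-in like this (untested sketch; dracut conf files are shell fragments, so the expansion happens at build time):

    # /etc/dracut.conf.d/99-compress.conf - leave one core for the desktop
    compress="xz --check=crc32 --lzma2=dict=1MiB -T$(( $(nproc) - 1 ))"
    # caveat: on a single-CPU machine this yields -T0, which xz treats as
    # "use all cores" - the corner case raised just below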
On Wednesday 2021-01-13 21:20, Vojtěch Zeisek wrote:
On Wednesday 13 January 2021 20:38:43 CET, Stefan Seyfried wrote:
On 13.01.21 17:51, Takashi Iwai wrote:
Also, some automatic adjustment of -T would be nice...
-T 0 is "use as many threads as there are CPU cores" or something like that, so it is automatic.
I'd rather suggest something like "number of CPUs minus one" to prevent the system from slowing down or freezing, as GNOME/KDE, web browsers, and similar user apps take some resources...
So, zero on uniprocessor? ;-)
On Wednesday, 13 January 2021 22:47:47 CET, Larry Len Rainey wrote:
On 1/13/21 3:39 PM, Jan Engelhardt wrote:
So, zero on uniprocessor? ;-)
Are there any uniprocessors that are 64-bit? Everything seems to be dual-core or better.
Intel Core 2 Solo, so yes. -- Stefan Brüns / Bergstraße 21 / 52062 Aachen home: +49 241 53809034 mobile: +49 151 50412019
On Wednesday 2021-01-13 22:47, Larry Len Rainey wrote:
On 1/13/21 3:39 PM, Jan Engelhardt wrote:
So, zero on uniprocessor? ;-)
Are there any uniprocessors that are 64-bit? Everything seems to be dual-core or better.
Atom N450 when you disable HT - or when you simply boot any other system with nosmp.
On Thu, 14 Jan 2021 00:52:41 +0100, Jan Engelhardt wrote:
On Wednesday 2021-01-13 22:47, Larry Len Rainey wrote:
On 1/13/21 3:39 PM, Jan Engelhardt wrote:
So, zero on uniprocessor? ;-)
Are there any uniprocessors that are 64-bit? Everything seems to be dual-core or better.
Atom N450 when you disable HT - or when you simply boot any other system with nosmp.
... and moreover, many installations in VMs. Takashi
On 13.01.21 14:57, Marcus Meissner wrote:
It is the dracut initrd regeneration for me actually, even on SSD it takes *minutes* :).
and seems to run once for every single kmp, no matter if it ends up in the initrd or not. -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
On Wed, Jan 13, 2021 at 09:16:08AM -0300, Cristian Rodríguez wrote:
Yes, well known. By design it does not install packages in a single transaction but one by one, and this is why it is so slow.
I don't see why that makes so much of a difference. All the filesystem and database operations are the same if you use one transaction or multiple transactions. The file fingerprinting could be an issue, but it shouldn't take that much time. Cheers, Michael. -- Michael Schroeder SUSE Software Solutions Germany GmbH mls@suse.de GF: Felix Imendoerffer HRB 36809, AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
On Wed, Jan 13, 2021 at 12:31:00PM +0000, Michael Schroeder wrote:
On Wed, Jan 13, 2021 at 09:16:08AM -0300, Cristian Rodríguez wrote:
Yes, well known. By design it does not install packages in a single transaction but one by one, and this is why it is so slow.
I don't see why that makes so much of a difference. All the filesystem and database operations are the same if you use one transaction or multiple transactions. The file fingerprinting could be an issue, but it shouldn't take that much time.
Btw, what exactly did you measure? Rpm turns off database fsyncing if it created a new rpm database, so installing a new system will be faster when using a big transaction. (We could do the same in libzypp, of course.) Cheers, Michael. -- Michael Schroeder SUSE Software Solutions Germany GmbH mls@suse.de GF: Felix Imendoerffer HRB 36809, AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
participants (18)
- Ben Greiner
- Bernhard M. Wiedemann
- Cristian Rodríguez
- Jan Engelhardt
- Larry Len Rainey
- Marcus Meissner
- Martin Liška
- Michael Schroeder
- Neal Gompa
- Nicolas Morey-Chaisemartin
- opensuse.lietuviu.kalba
- schily@schily.net
- Stasiek Michalski
- Stefan Brüns
- Stefan Seyfried
- Stephan Kulow
- Takashi Iwai
- Vojtěch Zeisek