[Bug 1096072] New: systemd hangs on initrd-switch-root.service for 1.5 minutes when booting
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072 Bug ID: 1096072 Summary: systemd hangs on initrd-switch-root.service for 1.5 minutes when booting Classification: openSUSE Product: openSUSE Distribution Version: Leap 15.0 Hardware: Other OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Basesystem Assignee: bnc-team-screening@forge.provo.novell.com Reporter: alarrosa@suse.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- Created attachment 772558 --> http://bugzilla.opensuse.org/attachment.cgi?id=772558&action=edit journalctl --system --since "1 hour ago" After upgrading my Leap 42.3 system to Leap 15.0 it takes very long to boot up. Of that time, around 1.5 minutes (1:34) is spent on initrd-switch-root.service while it seems the system is doing nothing (at least the hard disk light stops blinking for all that time). The system stops just after: Jun 05 18:32:41 geeko systemd[1]: Switching root. Jun 05 18:32:41 geeko systemd-journald[154]: Journal stopped And then continues with: Jun 05 18:34:16 geeko systemd-journald[154]: Received SIGTERM from PID 1 (systemd). Jun 05 18:34:16 geeko kernel: systemd: 20 output lines suppressed due to ratelimiting Jun 05 18:34:16 geeko kernel: EXT4-fs (sda6): re-mounted. Opts: acl,user_xattr Jun 05 18:34:16 geeko systemd-journald[414]: Journal started Jun 05 18:34:16 geeko systemd-journald[414]: Runtime journal (/run/log/journal/f7cb102c5d6c079fd36e51615215ef87) is 8.0M, max 383.4M, 375.4M free. Jun 05 18:32:43 geeko systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid) Jun 05 18:32:43 geeko systemd[1]: Detected architecture x86-64. Jun 05 18:32:43 geeko systemd[1]: Set hostname to <geeko>. Jun 05 18:34:13 geeko systemd[1]: system-generators terminated by signal ALRM. Jun 05 18:34:16 geeko apparmor.systemd[420]: Restarting AppArmor Jun 05 18:34:13 geeko systemd[1]: Execution failed: Protocol error Note the systemd log from 18:32:43 being written after other logs from 18:34:16 and the "Execution failed: Protocol error" message which might be related. I tried enabling debug-shell.service, but that is started after systemd unfreezes. I attach the full output of journalctl --system --since "1 hour ago". Btw, for some reason, in this system, journalctl only shows the journal logs from the current boot, never showing logs from previous boots. Also, just in case it rings a bell, I also have this problem on the same system: https://bugzilla.opensuse.org/show_bug.cgi?id=1096050 -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c1
--- Comment #1 from Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c3
Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c4
--- Comment #4 from Franck Bui
systemd[414]: Spawned /usr/lib/systemd/system-generators/ibft-rule-generator as 415. systemd[414]: Spawned /usr/lib/systemd/system-generators/lvm2-activation-generator as 416. systemd[414]: Spawned /usr/lib/systemd/system-generators/nfs-server-generator as 417. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-cryptsetup-generator as 418. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-debug-generator as 419. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-fstab-generator as 420. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-getty-generator as 421. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-gpt-auto-generator as 422. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-hibernate-resume-generator as 423. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-insserv-generator as 424. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-openqa-generator as 425. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-rc-local-generator as 426. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-system-update-generator as 427. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-sysv-generator as 428. systemd[414]: Spawned /usr/lib/systemd/system-generators/systemd-veritysetup-generator as 429.
It would be interesting to figure out which one fails. You can do so by masking one generator by one and run "systemctl daemon-reload" which should run all generators again assuming that the issue can be reproduced while reloading systemd configuration. To mask one generator you can do for example: $ ln -s /dev/null /etc/systemd/system-generators/ibft-rule-generator -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c5
--- Comment #5 from Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c8
--- Comment #8 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c9
--- Comment #9 from Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c10
--- Comment #10 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c11
--- Comment #11 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c12
--- Comment #12 from Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c13
Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c14
Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c15
--- Comment #15 from Dr. Werner Fink
hum, what's this thing:
#8 0x00007ffff64dd751 in _gcry_random_selftest (report=report@entry=0x7ffff642b2a0 <reporter>) at random.c:581 #9 0x00007ffff642c1ea in run_random_selftests () at fips.c:589 #10 _gcry_fips_run_selftests (extended=extended@entry=0) at fips.c:736
Werner any idea ?
Ahh .. yes, the FIPS tests do drain the current entropy of the system. This is a known problem of FIPS I was not aware that FIPS had been enabled on Leap 15. Also I've never understood why those FIPS tests done at first usage of FIPS do drain the entropy of a system without restoring or refilling the entropy as not existin entropy make the system unusable ... no program can use real random numbers anymore upto the point where the entropy gets filled again. Now on switch root every program wich had not been rebased to the new root (by using chroot(2)) and protected with a leading `@' byte in its argv[0] will be killed by systemd and/or do removed name space. To get this final solved the kernel has to collect entropy from any source it can use in initrd, e.g. using the jitter on the CPUs https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c16
Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c17
--- Comment #17 from Antonio Larrosa
Since this started happening after upgrading 42.3 to 15.0, I guess this file was created during the upgrade process.
Yes, the problems (both this and the sudo issue) started happening after upgrading to 15.0, but note that it seems the file wasn't created during the upgrade process. At least, it has an mtime of 2017-10-16 at 13:42:00 and the upgrade was done on 2018-06-03. I checked old zypper logs and it seems on 2017-10-16 I ran "zypper in libgcrypt20-hmac" at 13:41:42 for some reason I can't remember. So it matches with probably the %post script of the package creating it, can it be? At least, the package description mentions FIPS. Just in case it matters, libgcrypt20-hmac-1.6.1-39.1 was the package installed at that time, from the openSUSE 42.3 Update repository, while I currently have libgcrypt20-hmac-1.8.2-lp150.4.2 . -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c18
--- Comment #18 from Dr. Werner Fink
Talked to Antonio privately and it appears that gcrypt run in fips mode for some *unknown* reasons. This has the side effect to run some self tests on each program linked to libgcrypt (I think) and those self tests exhaust /dev/urandom somehow.
The gcrypt fips mode is activated by the presence of /etc/gcrypt/fips_enabled file. And removing this file solves this issue as well as bug #1096050.
So I guess the question is now how was this file created at all.
I don't have a /etc/gcrypt/fips_enabled nor /etc/gcrypt/ around here ... to which package does those belogn to? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c19
--- Comment #19 from Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c21
--- Comment #21 from Dr. Werner Fink
it actually is on sle15 and leap 15.
Hmm ... then the question rises: why do the fips test not restore the entropy level after the tests? It should be possible to save a copy of the current entropy and write it back after the tests, the kernel has an API exactly for this, see ioctl RNDADDENTROPY in man:random(4). Drain the entropy makes the system unusable. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c22
--- Comment #22 from Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c23
--- Comment #23 from Franck Bui
I don't have a /etc/gcrypt/fips_enabled nor /etc/gcrypt/ around here ... to which package does those belogn to?
Perhaps /proc/sys/crypto/fips_enabled exists and contains a value different from 0 ? Otherwise your issue is probably different. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c24
--- Comment #24 from Dr. Werner Fink
(In reply to Dr. Werner Fink from comment #18).
I don't have a /etc/gcrypt/fips_enabled nor /etc/gcrypt/ around here ... to which package does those belogn to?
Perhaps /proc/sys/crypto/fips_enabled exists and contains a value different from 0 ?
Otherwise your issue is probably different.
noether:~ # cat /proc/sys/crypto/fips_enabled cat: /proc/sys/crypto/fips_enabled: No such file or directory noether:~ # cat /proc/sys/crypto/ cat: /proc/sys/crypto/: No such file or directory -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c25
Ludwig Nussel
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c26
Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c27
--- Comment #27 from Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c28
--- Comment #28 from Dr. Werner Fink
fips is not expected to work on leap 15. Clearly a SLE feature. How could this activate itself?
IMHO this may lead to similar problems as I guess that fips test might also drain the entropy on a SLES 15 -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c29
--- Comment #29 from Franck Bui
IMHO this may lead to similar problems as I guess that fips test might also drain the entropy on a SLES 15
Agreed although in practice the HWs targeted by SLES might have their entropy pool initialized before it is needed. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c32
Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
Dr. Werner Fink
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c33
Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c34
--- Comment #34 from Dr. Werner Fink
I enabled fips again and rebooted to be sure the problem was reproduced again. Then I installed the haveged package from your home, ran mkinitrd and rebooted. With your package, I can confirm the boot delay doesn't happen anymore.
Thanks a lot ... that is now it makes sense to change the hack (currently using a signal handler instead a local socket to trigger the chroot/execv). -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c35
--- Comment #35 from Dr. Werner Fink
Btw, the problem with sudo (bsc#1096050) still continues happening as I guess that's related to fips being enabled and not the the entropy pool being empty.
Then bsc#1096050 is not duplicate, isn't it? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072#c37
--- Comment #37 from Antonio Larrosa
http://bugzilla.opensuse.org/show_bug.cgi?id=1096072
Dr. Werner Fink
participants (2)
-
bugzilla_noreply@novell.com
-
bugzilla_noreply@suse.com