[Bug 1231325] New: dracut-pcr-signature races with udev for ESP symlinks
https://bugzilla.suse.com/show_bug.cgi?id=1231325

Bug ID: 1231325
Summary: dracut-pcr-signature races with udev for ESP symlinks
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Bootloader
Assignee: screening-team-bugs@suse.de
Reporter: arvidjaar@gmail.com
QA Contact: qa-bugs@suse.de
CC: aplanas@suse.com
Target Milestone: ---
Found By: ---
Blocker: ---

My FDE VM systematically fails to unlock root automatically the first time after the host has been restarted. I compared the PCR values and the TPM log between the bad and good cases and found no changes: the values are identical.

Then I remembered that not so long ago the same VM failed to boot, with the log in https://bugzilla.suse.com/attachment.cgi?id=873748, which shows

[ 23.780502] localhost pcr-signature.sh[417]: mount: /tmp/pcr-signature: wrong fs type, bad option, bad superblock on /dev/disk/by-partuuid/4d904b6d-6495-4891-8481-fe0fbdacec21, missing codepage or helper program, or other error.
...
[ 24.814025] localhost systemd[1]: systemd-cryptsetup@cr_root.service: Control process exited, code=exited, status=1/FAILURE

Then I looked into the dracut-pcr-oracle changelog, which says

commit ae07fc611de1884207614125533482477e9e2f8b
Author: Alberto Planas <aplanas@suse.com>
Date: Tue Apr 16 19:19:21 2024 +0200

Do not hard fail when error in mount

This explains why my VM stopped failing to boot but still required manual unlocking.

pcr-signature.sh is using

DEV="/dev/disk/by-partuuid/${ESP_UUID}"

where ${ESP_UUID} is taken from EFI NVRAM, but it never checks that this link exists, nor does it wait for the link to appear. That pretty much explains my failures.

pcr-signature.sh really should be split into two parts:

1. A systemd generator that determines the device and adds Requires and After for this device to the (root device) systemd-cryptsetup@.service.
2. The script that mounts this device.

Actually, if we go the generator route, we could just as well create a proper mount unit for the LoaderDevice and simply add the proper dependencies. Currently there is an obvious race condition between multiple encrypted devices, each attempting to mount and then unmount the LoaderDevice. Having a proper mount unit will solve this as well.

Hmm ... I expected that it would race with the /boot/efi mount on Tumbleweed, but I was surprised to find that with systemd-boot the ESP is not mounted in the initrd. I am not sure why, but this may be a potential source of problems with the grub2 BLS module, because we cannot mount the same FAT filesystem twice.

--
You are receiving this mail because:
You are on the CC list for the bug.
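
A minimal sketch of the generator idea discussed above, assuming the ESP partition UUID has already been read from EFI NVRAM. It is not the proof-of-concept generator attached to the bug: the function names and the drop-in file name are hypothetical, and the unit-name escaping is a simplified stand-in for `systemd-escape -p --suffix=device` that only handles the characters found in a PARTUUID. It uses Wants rather than Requires so that a missing ESP does not block the boot.

```shell
#!/bin/sh
# Hypothetical sketch, not the generator attached to the bug.

# Print the systemd device unit name for /dev/disk/by-partuuid/<uuid>.
# Simplified escaping: "-" becomes "\x2d" and "/" becomes "-".
# `systemd-escape -p --suffix=device PATH` is the general-purpose tool.
esp_device_unit() {
    path="dev/disk/by-partuuid/$1"
    escaped=$(printf '%s' "$path" | sed -e 's/-/\\x2d/g' -e 's|/|-|g')
    printf '%s.device\n' "$escaped"
}

# Write a drop-in for every systemd-cryptsetup@ instance so that it is
# ordered after, and pulls in, the ESP device unit.
#   $1: generator output directory
#   $2: ESP partition UUID (lowercase, as in /dev/disk/by-partuuid)
write_esp_dropin() {
    dropin_dir="$1/systemd-cryptsetup@.service.d"
    unit=$(esp_device_unit "$2")
    mkdir -p "$dropin_dir"
    # File name is an assumption made for this example.
    cat > "$dropin_dir/50-esp-device.conf" <<EOF
[Unit]
Wants=$unit
After=$unit
EOF
}
```

A real generator would call something like `write_esp_dropin "$1" "$ESP_UUID"`, where `$1` is the first directory argument systemd passes to generators.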

https://bugzilla.suse.com/show_bug.cgi?id=1231325

Alberto Planas Dominguez <aplanas@suse.com> changed:

What     |Removed                     |Added
----------------------------------------------------------------------------
Assignee |screening-team-bugs@suse.de |aplanas@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1231325#c3

--- Comment #3 from Alberto Planas Dominguez <aplanas@suse.com> ---
(In reply to Andrei Borzenkov from comment #2)
> Anyway, at the end is proof-of-concept generator that adds dependency on the ESP device.

Great analysis, and thanks for the generator. I will use it.

> It intentionally does Wants, not Requires, because we want to boot even if something fails here.

The referenced PR in dracut-pcr-oracle uses the same logic: instead of failing hard, it continues with the boot.

https://bugzilla.suse.com/show_bug.cgi?id=1231325#c4

--- Comment #4 from Andrei Borzenkov <arvidjaar@gmail.com> ---
There is still a race condition when multiple encrypted devices are present in the initrd (the most obvious case being root and swap). Each of them will independently copy the file(s) from the ESP, overwriting the previous copy and racing with the systemd-cryptsetup instance that is using them. This may result in systemd-cryptsetup getting an incomplete or otherwise corrupted file. Checking for the existence of the copy target reduces the race window but does not eliminate it completely.

Copying should be done in a separate unit that has RemainAfterExit=true. This guarantees that copying will happen just once, before all systemd-cryptsetup services.
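
A rough sketch of such a one-shot copy unit, written out by a shell helper. The unit name, description, and helper name are assumptions made for illustration and are not taken from dracut-pcr-signature. The sketch relies on two pieces of documented systemd behaviour: cryptsetup-pre.target is the target that services order themselves before when they must run ahead of any encrypted device setup, and a service with default dependencies is ordered after basic.target, which is itself reached only after cryptsetup.target.

```shell
#!/bin/sh
# Illustrative sketch only: the unit name and description are
# assumptions, not taken from dracut-pcr-signature.

# Write a one-shot unit that runs the copy command exactly once.
#   $1: directory to write the unit into
#   $2: absolute path of the command that copies the files from the ESP
write_copy_unit() {
    unit_dir=$1
    copy_cmd=$2
    mkdir -p "$unit_dir"
    cat > "$unit_dir/pcr-signature-copy.service" <<EOF
[Unit]
Description=Copy PCR signature files from the ESP (illustrative)
# Avoid the implicit After=basic.target ordering of normal services.
DefaultDependencies=no
# Run before cryptsetup-pre.target, and pull that target in.
Wants=cryptsetup-pre.target
Before=cryptsetup-pre.target

[Service]
Type=oneshot
# Stay active after the command exits so the copy is not run again.
RemainAfterExit=true
ExecStart=$copy_cmd
EOF
}
```

The unit still has to be pulled into the boot transaction by something, for example a Wants= from the systemd-cryptsetup@ instances; which unit does that is left open here.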

https://bugzilla.suse.com/show_bug.cgi?id=1231325#c5

--- Comment #5 from Alberto Planas Dominguez <aplanas@suse.com> ---
@Andrei, based on your idea of the generator I am preparing something like this:

https://github.com/aplanas/dracut-pcr-signature/pull/4

It is still a draft, as I am doing something wrong when making the service wanted by some target. For example, if I add it to cryptsetup.target.wants/, the resulting cycle looks like this:

[ 2.579258][ T1] systemd[1]: No hostname configured, using default hostname.
[ 2.580325][ T1] systemd[1]: Hostname set to <localhost>.
[ 2.762644][ T1] systemd[1]: sysinit.target: Found ordering cycle on cryptsetup.target/start
[ 2.763712][ T1] systemd[1]: sysinit.target: Found dependency on pcr-signature.service/start
[ 2.764745][ T1] systemd[1]: sysinit.target: Found dependency on basic.target/start
[ 2.765713][ T1] systemd[1]: sysinit.target: Found dependency on sysinit.target/start
[ 2.766668][ T1] systemd[1]: sysinit.target: Job cryptsetup.target/start deleted to break ordering cycle starting with sysinit.target/start
[ SKIP ] Ordering cycle found, skipping Local Encrypted Volumes

And when I add it to initrd.target.wants, the cycle gets extended with a "Found dependency on cryptsetup-pre.target/start".

I would love a review of the code from your side, but I think that if I figure out the cycle issue, this could resolve the races in the code.

https://bugzilla.suse.com/show_bug.cgi?id=1231325#c6

--- Comment #6 from Alberto Planas Dominguez <aplanas@suse.com> ---
After some more research, I think both race conditions should be fixed here:

https://github.com/aplanas/dracut-pcr-signature/pull/4

https://bugzilla.suse.com/show_bug.cgi?id=1231325#c7

Alberto Planas Dominguez <aplanas@suse.com> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Resolution |---     |FIXED
Status     |NEW     |RESOLVED

--- Comment #7 from Alberto Planas Dominguez <aplanas@suse.com> ---
Closing as PR#4 has been merged and released. Thanks a lot for the help here.