[Bug 1233752] New: [Build E.84.3] openQA test fails in trup_smoke

https://bugzilla.suse.com/show_bug.cgi?id=1233752 Bug ID: 1233752 Summary: [Build E.84.3] openQA test fails in trup_smoke Classification: openSUSE Product: openSUSE Distribution Version: Leap 16.0 Hardware: Other URL: https://openqa.opensuse.org/tests/4657304/modules/trup _smoke/steps/26 OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: jslaby@suse.com QA Contact: qa-bugs@suse.de Target Milestone: --- Found By: openQA Blocker: Yes ## Observation openQA test in scenario microos-Staging:E-Staging-MicroOS-Image-sdboot-x86_64-microos-wizard-tpm@uefi-staging fails in [trup_smoke](https://openqa.opensuse.org/tests/4657304/modules/trup_smoke/steps/26) ## Test suite description Like MicroOS, but use neither combustion nor ignition for the intial configuration, so jeos-firstboot runs. ## Reproducible Fails since (at least) Build [E.84.3](https://openqa.opensuse.org/tests/4657304) (current job) ## Expected result Last good: [E.83.16](https://openqa.opensuse.org/tests/4648526) (or more recent) ## Further details Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=microos&flavor=Staging-MicroOS-Image-sdboot&machine=uefi-staging&test=microos-wizard-tpm&version=Staging%3AE) -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|[Build E.84.3] openQA test |[Build E.84.3] openQA test |fails in trup_smoke |fails in trup_smoke -- TPM | |failure in 6.12 Version|Leap 16.0 |Current Product|openSUSE Distribution |openSUSE Tumbleweed Target Milestone|--- |Current Component|Kernel |Kernel -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c2 --- Comment #2 from Jiri Slaby <jslaby@suse.com> --- I can reproduce with the qcow image provided on: https://openqa.opensuse.org/tests/4657304#downloads I use qemu with this cmdline added: -chardev socket,id=chrtpm,path=/tmp/tpm/mytpm0/swtpm-sock -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0 I use swtpm with this cmdline: swtpm socket --tpmstate dir=/tmp/tpm/mytpm0 --ctrl type=unixio,path=/tmp/tpm/mytpm0/swtpm-sock --log level=20 --tpm2 However, I failed to reproduce with 6.11.8 and 6.12.1 from: ibs://home:jirislaby:tpm ibs://home:jirislaby:tpm611 -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c3 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dleuenberger@suse.com --- Comment #3 from Jiri Slaby <jslaby@suse.com> --- (In reply to Jiri Slaby from comment #2)
I can reproduce with the qcow image provided on: https://openqa.opensuse.org/tests/4657304#downloads
But I cannot reproduce with (still failing) re-submitted: https://openqa.opensuse.org/tests/4665202#downloads I suspect this is a swtpm or qemu bug triggered by the kernel update. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c5 --- Comment #5 from Alberto Planas Dominguez <aplanas@suse.com> --- @jlee maybe you can help here. One issue that I see is that during the first boot, sometimes, not always, the kernel measures the initrd properly, extending PCR9 with the correct SHA256. We can see it in the event log: the hash from the initrd file match the one in the event log. But during the second boot, the registered hash is not the one from the initrd, but something else. In my tests the hash is stable (always the same), but does not comes from the initrd file. Checking the kernel, seems that those early (libstub) extensions comes from the UEFI protocol, not from the TPM2 kernel driver: https://github.com/torvalds/linux/blob/master/drivers/firmware/efi/libstub/e... The API used from EFI will do the hash, the extension and entry in the event log, so I guess that maybe one candidate is OVMF. To reproduce the issue, you can get any qcow2 MicroOS sdboot image (for example: https://build.suse.de/project/monitor/home:jirislaby:tpm611 from Jiry Slaby, but should also work with the one in Factory: https://build.opensuse.org/package/show/openSUSE:Factory/openSUSE-MicroOS -- Flavor openSUSE-MicroOS:kvm-and-xen-sdboot) # sha256sum /boot/efi/opensuse-microos/**/initrd* 4e7df07a218c41b3d941b4e82c4b57236c4be2d9427c464984ad8115e3d37762 /boot/efi/opensuse-microos/6.11.9-2-default/initrd-62348a39d2c47d5e54d8d622a7a516c8ad5f111b # tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements | grep 4e7df07a218c (no match) The extension that correspond to the initrd is: - EventNum: 38 PCRIndex: 9 EventType: EV_EVENT_TAG DigestCount: 1 Digests: - AlgorithmId: sha256 Digest: "09a0b37829f6c5d1630f18ece6f421bee2bfc3ab0f8296971ba0c01e22bae062" EventSize: 21 Event: "ec223b8f0d0000004c696e757820696e6974726400" Running in the host: ovmf-202408-2.1.x86_64 qemu-ovmf-x86_64-202408-2.1.noarch -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c6 --- Comment #6 from Jiri Slaby <jslaby@suse.com> --- FWIW 6.11.10 passes openQA too. So it effectively confirms 6.11..6.12 TPM changes are out of question (all were backported to 6.11.10 already). Once openQA finishes testing this 6.11.10, I plan to submit 6.12 minus: commit 77d48d39e99170b528e4f2e9fc5d1d64cdedd386 Author: Ard Biesheuvel <ardb@kernel.org> Date: Thu Sep 12 17:45:49 2024 +0200 efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption It already caused bug 1231465. As a _side-effect_ it corrupted the memattr memory. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c8 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(jlee@suse.com) --- Comment #8 from Jiri Slaby <jslaby@suse.com> --- (In reply to Jiri Slaby from comment #6)
Once openQA finishes testing this 6.11.10, I plan to submit 6.12 minus: commit 77d48d39e99170b528e4f2e9fc5d1d64cdedd386 efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption
FTR, no change. If you have some ideas what to try to revert, feel free to drop a message. I am out of ideas. (In reply to Alberto Planas Dominguez from comment #5)
@jlee maybe you can help here.
Please see comment #5. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c9 --- Comment #9 from Jiri Slaby <jslaby@suse.com> --- FTR, I dropped a message to upstream too: https://lore.kernel.org/all/9c893c52-e960-4f30-98ce-ba7d873145bb@kernel.org/ -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c11 Fabian Vogt <fvogt@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(jlee@suse.com) | Status|NEW |CONFIRMED CC| |fvogt@suse.com --- Comment #11 from Fabian Vogt <fvogt@suse.com> --- I was able to reproduce the issue in a local openQA instance and then in local QEMU as well, with the same (wrong) hash value for the initrd. Instead of debugging from the kernel side I decided to add some debug output to ovmf and I see: FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd-66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd-66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open '\opensuse-microos\6.12.1-1-default\linux-09b6ad7be161aebd75dd2f8aa5f775668763c54b' Success ... Called TcgDxeHashLogExtendEvent 0 58683000 1B1E78C 5FE63C00 5E3492AA Data 28 B5 2F FD ... E1 29 FE 0 i.e. it measured 0x1B1E78C bytes of (presumably) initrd-66cefa944713471a097f3b09c647448959ae286f, which start with 28 B5 2F FD and end with E1 29 FE 0. Looking at the mounted EFI partition this doesn't quite match: 00000000 28 b5 2f fd 04 58 4c e1 01 da a7 4a a4 3e 10 10 |(./..XL....J.>..| ... 01b1e780 20 1a 17 dd 4a 1e c0 c9 e1 29 fe | ...J....).| The beginning is correct, but there's a spurious 0 byte at the end because the size is one byte too much. Question is where this is coming from - some attempt at alignment leaking through until measurement? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c12 --- Comment #12 from Jiri Slaby <jslaby@suse.com> --- (In reply to Fabian Vogt from comment #11)
I was able to reproduce the issue in a local openQA instance and then in local QEMU as well, with the same (wrong) hash value for the initrd.
Instead of debugging from the kernel side I decided to add some debug output to ovmf and I see:
FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd- 66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd- 66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open
I assume, this is coming from: https://github.com/torvalds/linux/blob/v6.12/drivers/firmware/efi/libstub/ef... First, obtain size, then load the file.
'\opensuse-microos\6.12.1-1-default\linux- 09b6ad7be161aebd75dd2f8aa5f775668763c54b' Success ... Called TcgDxeHashLogExtendEvent 0 58683000 1B1E78C 5FE63C00 5E3492AA Data 28 B5 2F FD ... E1 29 FE 0
And this is: https://github.com/torvalds/linux/blob/v6.12/drivers/firmware/efi/libstub/ef... with size obtained above on line 569.
i.e. it measured 0x1B1E78C bytes of (presumably) initrd-66cefa944713471a097f3b09c647448959ae286f, which start with 28 B5 2F FD and end with E1 29 FE 0.
Looking at the mounted EFI partition this doesn't quite match:
00000000 28 b5 2f fd 04 58 4c e1 01 da a7 4a a4 3e 10 10 |(./..XL....J.>..| ... 01b1e780 20 1a 17 dd 4a 1e c0 c9 e1 29 fe | ...J....).|
The beginning is correct, but there's a spurious 0 byte at the end because the size is one byte too much. Question is where this is coming from - some attempt at alignment leaking through until measurement?
Looks like from ovmf's load_file then? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c13 --- Comment #13 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to Fabian Vogt from comment #11)
i.e. it measured 0x1B1E78C bytes of (presumably) initrd-66cefa944713471a097f3b09c647448959ae286f
Does this size match the one from initrd, or is one byte longer the buffer? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c14 --- Comment #14 from Fabian Vogt <fvogt@suse.com> --- (In reply to Alberto Planas Dominguez from comment #13)
(In reply to Fabian Vogt from comment #11)
i.e. it measured 0x1B1E78C bytes of (presumably) initrd-66cefa944713471a097f3b09c647448959ae286f
Does this size match the one from initrd, or is one byte longer the buffer?
The initrd file size is 0x1B1E78B, i.e. one byte shorter. By adding the 0 byte I can replicate the measured digest. (In reply to Jiri Slaby from comment #12)
(In reply to Fabian Vogt from comment #11)
I was able to reproduce the issue in a local openQA instance and then in local QEMU as well, with the same (wrong) hash value for the initrd.
Instead of debugging from the kernel side I decided to add some debug output to ovmf and I see:
FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd- 66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open '\opensuse-microos\6.12.1-1-default\initrd- 66cefa944713471a097f3b09c647448959ae286f' Success FSOpen: Open
I assume, this is coming from: https://github.com/torvalds/linux/blob/v6.12/drivers/firmware/efi/libstub/ efi-stub-helper.c#L569
First, obtain size, then load the file.
'\opensuse-microos\6.12.1-1-default\linux- 09b6ad7be161aebd75dd2f8aa5f775668763c54b' Success ... Called TcgDxeHashLogExtendEvent 0 58683000 1B1E78C 5FE63C00 5E3492AA Data 28 B5 2F FD ... E1 29 FE 0
And this is: https://github.com/torvalds/linux/blob/v6.12/drivers/firmware/efi/libstub/ efi-stub-helper.c#L313
with size obtained above on line 569.
i.e. it measured 0x1B1E78C bytes of (presumably) initrd-66cefa944713471a097f3b09c647448959ae286f, which start with 28 B5 2F FD and end with E1 29 FE 0.
Looking at the mounted EFI partition this doesn't quite match:
00000000 28 b5 2f fd 04 58 4c e1 01 da a7 4a a4 3e 10 10 |(./..XL....J.>..| ... 01b1e780 20 1a 17 dd 4a 1e c0 c9 e1 29 fe | ...J....).|
The beginning is correct, but there's a spurious 0 byte at the end because the size is one byte too much. Question is where this is coming from - some attempt at alignment leaking through until measurement?
Looks like from ovmf's load_file then?
Maybe, I have to check how to trace that... -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c15 --- Comment #15 from Fabian Vogt <fvogt@suse.com> --- Using the EFI shell I see the correct size and with the "initrd" command and then executing the linux kernel directly it's measured as expected. I found that sd-boot aligns the initrd size if there is more than one initrd component: https://github.com/systemd/systemd/blob/c3dc460b6c3f062af540e4233c65ac12c010... Maybe for some reason it thinks there is another empty initrd component? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c16 --- Comment #16 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to Fabian Vogt from comment #15)
Using the EFI shell I see the correct size and with the "initrd" command and then executing the linux kernel directly it's measured as expected.
I found that sd-boot aligns the initrd size if there is more than one initrd component: https://github.com/systemd/systemd/blob/ c3dc460b6c3f062af540e4233c65ac12c01077fa/src/boot/stub.c#L108
Maybe for some reason it thinks there is another empty initrd component?
Uhm ... this is the libstub from systemd (that understand UKIs, sections, extensions, ...), but in this case we use the libstub from the kernel. The systemd-boot is this one (run entry point): https://github.com/systemd/systemd/blob/c3dc460b6c3f062af540e4233c65ac12c010... I think that in our case is the kernel (via its own libstub) the one that loads the initrd with LOAD_FILE protocol. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c17 James Bottomley <James.Bottomley@HansenPartnership.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |James.Bottomley@HansenPartn | |ership.com --- Comment #17 from James Bottomley <James.Bottomley@HansenPartnership.com> --- (In reply to Alberto Planas Dominguez from comment #16)
Uhm ... this is the libstub from systemd (that understand UKIs, sections, extensions, ...), but in this case we use the libstub from the kernel.
The systemd-boot is this one (run entry point): https://github.com/systemd/systemd/blob/ c3dc460b6c3f062af540e4233c65ac12c01077fa/src/boot/boot.c#L2863
I think that in our case is the kernel (via its own libstub) the one that loads the initrd with LOAD_FILE protocol.
You're missing the fact that systemd-boot installs its own Load File2 protocol which the kernel then uses: https://github.com/systemd/systemd/blob/3f3b4959e2cb9bca1e1ed527a0692c9a8b6a... This protocol currently zero pads the binary to a multiple of 4: https://github.com/systemd/systemd/blob/3f3b4959e2cb9bca1e1ed527a0692c9a8b6a... This was actually a measurement breaking change introduced into v257-rc1 by f8fa4222c9 ("boot: Make initrd_prepare() semantically equivalent to combine_initrds()"). -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c18 --- Comment #18 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to James Bottomley from comment #17)
(In reply to Alberto Planas Dominguez from comment #16)
Uhm ... this is the libstub from systemd (that understand UKIs, sections, extensions, ...), but in this case we use the libstub from the kernel.
The systemd-boot is this one (run entry point): https://github.com/systemd/systemd/blob/ c3dc460b6c3f062af540e4233c65ac12c01077fa/src/boot/boot.c#L2863
I think that in our case is the kernel (via its own libstub) the one that loads the initrd with LOAD_FILE protocol.
You're missing the fact that systemd-boot installs its own Load File2 protocol which the kernel then uses:
Oh! I was aware the one from the shim, but not this.
This protocol currently zero pads the binary to a multiple of 4:
Do you know if this regression is reported upstream? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c19 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?, | |needinfo?(fbui@suse.com) CC| |fbui@suse.com --- Comment #19 from Jiri Slaby <jslaby@suse.com> --- (In reply to Alberto Planas Dominguez from comment #18)
This protocol currently zero pads the binary to a multiple of 4:
Do you know if this regression is reported upstream?
I am not aware of any reports, so I created one: https://github.com/systemd/systemd/issues/35439 (In reply to James Bottomley from comment #17)
This was actually a measurement breaking change introduced into v257-rc1 by f8fa4222c9 ("boot: Make initrd_prepare() semantically equivalent to combine_initrds()").
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo? | -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c20 --- Comment #20 from Jiri Slaby <jslaby@suse.com> ---
(In reply to James Bottomley from comment #17)
This was actually a measurement breaking change introduced into v257-rc1 by f8fa4222c9 ("boot: Make initrd_prepare() semantically equivalent to combine_initrds()").
And backported to 256.7 as: commit d64193a2a652b15db9cb9ed10c6b77a17ca46cd2 Author: Daan De Meyer <daan.j.demeyer@gmail.com> Date: Fri Sep 13 21:37:20 2024 +0200 boot: Make initrd_prepare() semantically equivalent to combine_initrds()
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c21 Franck Bui <fbui@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(fbui@suse.com) | --- Comment #21 from Franck Bui <fbui@suse.com> --- Hi Jiri, (In reply to Jiri Slaby from comment #19)
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes?
I'm not a fan of committing and submitting changes in Base:System solely for testing purposes. However, we are seeing the same regression while testing my development project for v257: https://openqa.opensuse.org/tests/4674128# I can give it a try in this playground instead. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c22 --- Comment #22 from Fabian Vogt <fvogt@suse.com> --- (In reply to Franck Bui from comment #21)
Hi Jiri,
(In reply to Jiri Slaby from comment #19)
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes?
I'm not a fan of committing and submitting changes in Base:System solely for testing purposes.
IMO it's not just for testing. If it fixes the issue we need it in TW.
However, we are seeing the same regression while testing my development project for v257: https://openqa.opensuse.org/tests/4674128#
I can give it a try in this playground instead. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c23 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo? --- Comment #23 from Jiri Slaby <jslaby@suse.com> --- (In reply to Fabian Vogt from comment #22)
(In reply to Franck Bui from comment #21)
Hi Jiri,
(In reply to Jiri Slaby from comment #19)
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes?
I'm not a fan of committing and submitting changes in Base:System solely for testing purposes.
IMO it's not just for testing. If it fixes the issue we need it in TW.
Yes, I meant "test" if this is the issue and we can let it go through, if it performs well (fixes this bug). -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo? | -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c25 --- Comment #25 from Jiri Slaby <jslaby@suse.com> --- Heh, with systemd 256.6, openQA still fails with:
Failed to unseal secret using TPM2: State not recoverable -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c26 --- Comment #26 from Franck Bui <fbui@suse.com> --- Strange. With my devel project + commit d64193a2a652b15db9cb9ed10c6b77a17ca46cd2 reverted the failing test seems to be fixed now: https://build.opensuse.org/staging_workflows/openSUSE:Factory/staging_projec... -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c27 --- Comment #27 from Jiri Slaby <jslaby@suse.com> --- (In reply to Franck Bui from comment #26)
Strange.
With my devel project + commit d64193a2a652b15db9cb9ed10c6b77a17ca46cd2 reverted the failing test seems to be fixed now: https://build.opensuse.org/staging_workflows/openSUSE:Factory/ staging_projects/openSUSE:Factory:Staging:M
I suspect there are two bugs. First, a mismatch in initrd handling which leads to the root password prompt (to unlock / (root)) instead of verification against the root password embedded in TPM. This one is fixed (I believe) by the above revert (or going back to systemd 256.6). Second, the one which passes (unlocks) the encrypted / (root) partition, but logs the TPM failure:
Failed to unseal secret using TPM2: State not recoverable -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c28 --- Comment #28 from Jiri Slaby <jslaby@suse.com> --- (In reply to Jiri Slaby from comment #27)
(In reply to Franck Bui from comment #26)
Strange.
With my devel project + commit d64193a2a652b15db9cb9ed10c6b77a17ca46cd2 reverted the failing test seems to be fixed now: https://build.opensuse.org/staging_workflows/openSUSE:Factory/ staging_projects/openSUSE:Factory:Staging:M
I suspect there are two bugs.
First, a mismatch in initrd handling which leads to the root password prompt (to unlock / (root)) instead of verification against the root password embedded in TPM. This one is fixed (I believe) by the above revert (or going back to systemd 256.6).
In the original (6.12) submission, this was: https://openqa.opensuse.org/tests/4657304#step/trup_smoke/26 This one is reported in this bug (comment #0).
Second, the one which passes (unlocks) the encrypted / (root) partition, but logs the TPM failure:
Failed to unseal secret using TPM2: State not recoverable
And this one back then too: https://openqa.opensuse.org/tests/4657285#step/journal_check/22 I originally thought, they are the same, but apparently not. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c29 --- Comment #29 from Jiri Slaby <jslaby@suse.com> --- Created attachment 878842 --> https://bugzilla.suse.com/attachment.cgi?id=878842&action=edit screenshot of console (In reply to Jiri Slaby from comment #28)
Second, the one which passes (unlocks) the encrypted / (root) partition, but logs the TPM failure:
Failed to unseal secret using TPM2: State not recoverable
And this one back then too: https://openqa.opensuse.org/tests/4657285#step/journal_check/22
I originally thought, they are the same, but apparently not.
OK, again, I cannot reproduce. FWIW, the console says twice during encryption:
Warning: keyslot operation could fail as it requires more than available memory.
I have no idea if it is related, but still, journal does NOT contain the error here. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c30 --- Comment #30 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to Jiri Slaby from comment #29)
OK, again, I cannot reproduce. FWIW, the console says twice during encryption:
Warning: keyslot operation could fail as it requires more than available memory.
I have no idea if it is related, but still, journal does NOT contain the error here.
I saw it too and I do not think they are related. This is about the key derivation function. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c31 --- Comment #31 from Franck Bui <fbui@suse.com> --- (In reply to Jiri Slaby from comment #20)
Franck, would it be possible to revert the above commit and submit (to Staging:E) for testing purposes?
Given that upstream acknowledged the problem and they will likely fix it quickly (as it's a blocker for the v257 release) I reverted the offending commit in Base:System and will submit to Factory shortly (as soon as OBS is able to build the package). -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c34 --- Comment #34 from Jiri Slaby <jslaby@suse.com> --- The other failure logs this as found in journal_check-full_journal.log in openQA:
systemd-cryptsetup[588]: WARNING:esys:src/tss2-esys/api/Esys_PolicyPCR.c:288:Esys_PolicyPCR_Finish() Received TPM Error systemd-cryptsetup[588]: ERROR:esys:src/tss2-esys/api/Esys_PolicyPCR.c:100:Esys_PolicyPCR() Esys Finish ErrorCode (0x00000128) systemd-cryptsetup[588]: Failed to unseal secret using TPM2: State not recoverable systemd-cryptsetup[588]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/7a3a2159-a270-4e1f-ba3a-6571268c249f. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c35 --- Comment #35 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to Jiri Slaby from comment #34)
The other failure logs this as found in journal_check-full_journal.log in openQA:
systemd-cryptsetup[588]: WARNING:esys:src/tss2-esys/api/Esys_PolicyPCR.c:288:Esys_PolicyPCR_Finish() Received TPM Error systemd-cryptsetup[588]: ERROR:esys:src/tss2-esys/api/Esys_PolicyPCR.c:100:Esys_PolicyPCR() Esys Finish ErrorCode (0x00000128) systemd-cryptsetup[588]: Failed to unseal secret using TPM2: State not recoverable systemd-cryptsetup[588]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/7a3a2159-a270-4e1f-ba3a-6571268c249f.
This is the normal error when the prediction fail and the policy in the TPM2 does not match the expected values for the tracked PCRs. This is always followed by a request of the recovery password to unlock the LUKS2 device. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c36 Jiri Slaby <jslaby@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(aplanas@suse.com) --- Comment #36 from Jiri Slaby <jslaby@suse.com> --- (In reply to Alberto Planas Dominguez from comment #35)
(In reply to Jiri Slaby from comment #34)
The other failure logs this as found in journal_check-full_journal.log in openQA:
systemd-cryptsetup[588]: WARNING:esys:src/tss2-esys/api/Esys_PolicyPCR.c:288:Esys_PolicyPCR_Finish() Received TPM Error systemd-cryptsetup[588]: ERROR:esys:src/tss2-esys/api/Esys_PolicyPCR.c:100:Esys_PolicyPCR() Esys Finish ErrorCode (0x00000128) systemd-cryptsetup[588]: Failed to unseal secret using TPM2: State not recoverable systemd-cryptsetup[588]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/7a3a2159-a270-4e1f-ba3a-6571268c249f.
This is the normal error when the prediction fail and the policy in the TPM2 does not match the expected values for the tracked PCRs. This is always followed by a request of the recovery password to unlock the LUKS2 device.
In this case, it is not. So what's wrong now, actually? I spent quite some time to find out the constant in libtpms: src/tpm2/TpmTypes.h:#define TPM_RC_PCR_CHANGED (TPM_RC)(RC_VER1+0x028) which might come from: https://github.com/stefanberger/libtpms/blob/f22745c72933b17a743e5e8648ca691... or https://github.com/stefanberger/libtpms/blob/f22745c72933b17a743e5e8648ca691... Might be worth trying to update libtpms+swtpm to 0.10s? -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c37 Alberto Planas Dominguez <aplanas@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(aplanas@suse.com) | --- Comment #37 from Alberto Planas Dominguez <aplanas@suse.com> --- Sure, I updated them here: https://build.opensuse.org/requests/1228299 https://build.opensuse.org/requests/1228302 -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c39 --- Comment #39 from Fabian Vogt <fvogt@suse.com> --- (In reply to Jiri Slaby from comment #36)
(In reply to Alberto Planas Dominguez from comment #35)
(In reply to Jiri Slaby from comment #34)
The other failure logs this as found in journal_check-full_journal.log in openQA:
systemd-cryptsetup[588]: WARNING:esys:src/tss2-esys/api/Esys_PolicyPCR.c:288:Esys_PolicyPCR_Finish() Received TPM Error systemd-cryptsetup[588]: ERROR:esys:src/tss2-esys/api/Esys_PolicyPCR.c:100:Esys_PolicyPCR() Esys Finish ErrorCode (0x00000128) systemd-cryptsetup[588]: Failed to unseal secret using TPM2: State not recoverable systemd-cryptsetup[588]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/7a3a2159-a270-4e1f-ba3a-6571268c249f.
This is the normal error when the prediction fail
Sure? FWICT this error happens if a measurement happens during unsealing resp. preparations.
and the policy in the TPM2 does not match the expected values for the tracked PCRs. This is always followed by a request of the recovery password to unlock the LUKS2 device.
In this case, it is not. So what's wrong now, actually?
If my analysis is correct (and I'm not that sure) it's a race condition between systemd-cryptsetup building the policy before unsealing and some other process (maybe kernel?) triggering a PCR value change. Currently systemd only handles this for Unseal, but not PolicyPCR. I filed https://github.com/systemd/systemd/issues/35490. I still don't know why only kernel 6.12 triggers this.
I spent quite some time to find out the constant in libtpms: src/tpm2/TpmTypes.h:#define TPM_RC_PCR_CHANGED (TPM_RC)(RC_VER1+0x028)
There's a tool for that :P
tpm2_rc_decode 0x00000128 tpm:error(2.0): PCR have changed since checked
I'll try to debug swtpm and trace what exactly triggers the error. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c40 --- Comment #40 from Alberto Planas Dominguez <aplanas@suse.com> --- (In reply to Fabian Vogt from comment #39)
If my analysis is correct (and I'm not that sure) it's a race condition between systemd-cryptsetup building the policy before unsealing and some other process (maybe kernel?) triggering a PCR value change.
I can think about IMA (PCR10), and we used to have this race condition. In systemd the solution was re-try it several times: https://github.com/aafeijoo-suse/systemd/commit/f6a077f3e780cf5fd572b92648cc... -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 https://bugzilla.suse.com/show_bug.cgi?id=1233752#c41 --- Comment #41 from Fabian Vogt <fvogt@suse.com> --- (In reply to Alberto Planas Dominguez from comment #40)
(In reply to Fabian Vogt from comment #39)
If my analysis is correct (and I'm not that sure) it's a race condition between systemd-cryptsetup building the policy before unsealing and some other process (maybe kernel?) triggering a PCR value change.
I can think about IMA (PCR10), and we used to have this race condition. In systemd the solution was re-try it several times:
https://github.com/aafeijoo-suse/systemd/commit/ f6a077f3e780cf5fd572b92648ccead51b311c26
Yes, the report I filed upstream is about doing the same for PolicyPCR. I see that in openQA, secure boot is enabled which sets the default IMA policy to measure all loaded modules. This can definitely cause races during boot. My system here does not have SB enabled, so that explains why I don't have problems here. -- You are receiving this mail because: You are the assignee for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1233752 Fabian Vogt <fvogt@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|kernel-bugs@opensuse.org |systemd-maintainers@suse.de -- You are receiving this mail because: You are the assignee for the bug.
participants (1)
-
bugzilla_noreply@suse.com