[Bug 1205261] New: dracut/hooks/emergency...ESP's FAT serial number in initrd halts boot in dracut emergency shell after rsync migration to new GPT system disk
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261

Bug ID: 1205261
Summary: dracut/hooks/emergency...ESP's FAT serial number in initrd halts boot in dracut emergency shell after rsync migration to new GPT system disk
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: screening-team-bugs@suse.de
Reporter: mrmazda@earthlink.net
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 862770
  --> http://bugzilla.opensuse.org/attachment.cgi?id=862770&action=edit
rdsosreport.txt from TW boot attempt

Original Summary:
dracut/hooks/emergency...ESP's FAT serial number in initrd halts boot in dracut emergency shell after rsync migration to new GPT system disk

Initial state:
1-configured booting & mounting are via LABELs (making UUIDs administratively unimportant, and grub.cfg's own auto-generated stanzas uncommonly necessary or desired), with /etc/grub.d/06_custom causing /boot/grub2/custom.cfg's vmlinuz & initrd symlink stanzas to precede auto-generated entries at boot time. Example:
<https://forums.opensuse.org/showthread.php/533087-How-to-have-a-custom-UEFI-grub-menu-for-a-multiboot-system?p=2880389#post2880389>
2-in /etc/default/grub:
GRUB_DISTRIBUTOR="opensusetw" # TW20221008 last zypper dup
3-multiboot of TW with Leap 15.1, 15.2, 15.3, 15.4, Debian 11 & 12, Ubuntu 20.04 & 22.04 on single NVME
4-only TW installation is configured to touch NVRAM or ESP for writing (e.g., other OSes' fstabs don't mount ESP, and/or they have no bootloader installed)

To reproduce:
1-GPT partition new NVME with ESP, swap, and / targets for TW, and / for at least one additional distro
2-format new NVME's matching targets ESP FAT32, swap swap, and / EXT4 for TW
3-"rsync -rlptgoDHAX --exclude 'lost+found'" from old NVME's ESP and / to new ESP and / filesystems for TW
4-appropriately edit volume LABELs on new NVME /boot/grub2/grub.cfg, /boot/grub2/custom.cfg, /etc/fstab
5-repeat create/format/rsync/edit for additional distro(s)
6-remove original NVME
7-try to boot from new NVME

Actual behavior:
1-all other distro(s) boot normally (via TW's custom.cfg entries) as if nothing had been changed
2-TW boot halts in dracut emergency shell because the original ESP's FAT serial number cannot be found (see attached rdsosreport.txt)

Expected behavior:
1-all distros boot normally (via custom.cfg entries) as if nothing had been changed

Notes:
1-Boot is normal since rebuilding of initrds post-migration.
2-I looked for ESP FAT serial references in several non-TW initrds and found none.
3-from lsinitrd of original initrd-5.19.13-default:
-rw-r--r-- 1 root root 92 Oct 9 23:10 usr/lib/dracut/hooks/emergency/80-\x2fdev\x2fdisk\x2fby-uuid\x2f20A0-1003.sh
4-# lsblk -f | grep vfat # (current state, not initial state)
nvme1n1p1 vfat FAT16 PI3P01ESP 20A0-1003 (*original* ESP on 120G NVME)
nvme0n1p1 vfat FAT32 PNY5P01ESP 4C58-8D7E 294.1M 8% /boot/efi (*new* ESP on 500G NVME)
5-This issue complicates restoring from backups.
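The hook file name in note 3 appears to be a device path with each "/" written as \x2f. The following is a minimal sketch, assuming that encoding, for turning such a name back into the path the initrd is waiting for. The function name hook_to_devpath is hypothetical and is not part of dracut.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): decode an emergency-hook file
# name, as listed by lsinitrd, into the device path it refers to.
# Assumption: "/" in the device path was stored as the literal text \x2f.
hook_to_devpath() {
    name=${1##*/}              # drop the usr/lib/dracut/hooks/emergency/ prefix
    name=${name#[0-9][0-9]-}   # drop the two-digit ordering prefix, e.g. "80-"
    name=${name%.sh}           # drop the .sh suffix
    # Turn every literal \x2f back into a slash.
    printf '%s\n' "$name" | sed 's|\\x2f|/|g'
}

# Example, using the file name from note 3:
#   hook_to_devpath 'usr/lib/dracut/hooks/emergency/80-\x2fdev\x2fdisk\x2fby-uuid\x2f20A0-1003.sh'
# prints /dev/disk/by-uuid/20A0-1003
```

Comparing the decoded path with the UUID column of "lsblk -f" shows whether an initrd still points at a filesystem that no longer exists, in this case the original ESP 20A0-1003 rather than the new 4C58-8D7E.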
-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c1

--- Comment #1 from Felix Miata <mrmazda@earthlink.net> ---
The same host as in comment #0, gb250, normally has a HDD RAID, which I disconnected to perform the NVME migration. Since the migration appeared to be complete, I removed the original NVME, and reconnected the RAID, which has its own TW installation that had been bootable either from the native /boot directory on the RAID, or an extra EXT2 filesystem on the NVME that contains an rsync of the RAID's native /boot directory.

I copied the RAID's initrd-5.19.13-default over the freshly built and working NVME TW's 5.19.13 and tried booting the NVME TW using it. The result is similar to comment #0: halted in the dracut emergency shell, not because the ESP FAT serial number is missing, but because what's missing is the initrd's dracut/hooks/emergency UUID for the old NVME's EXT2 /boot rsync'd filesystem, which is represented in the RAID TW's /etc/fstab mounted noatime,noacl to /boot.

Trying to boot the RAID TW using the NVME EXT2 or its native /boot also fails because what's missing is the initrd's dracut/hooks/emergency UUID for the old NVME's EXT2 /boot rsync'd filesystem.

The fresh 5.19.13 initrd from NVME TW copied to the EXT2 and to the RAID /boot also produces failure: for the same reason when attempted from the EXT2, or by failing to find the RAID / when attempted from the RAID /boot (/etc/mdadm.conf absent from initrd).

ToDo: rebuild the RAID TW's initrds and retest.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c3

--- Comment #3 from Felix Miata <mrmazda@earthlink.net> ---
(In reply to Antonio Feijoo from comment #2)
> I recommend you to create a new conf snippet with a higher number, because this will be overwritten after a dracut update.
I haven't touched it. Wouldn't changing it from "by-uuid" to "by-label" just change the usr/lib/dracut/hooks/emergency failure from a wrong ID to a wrong label? Is this recommendation just an aside for future use? I've now created 13-persistent-local.conf containing persistent_policy="by-label", but I have at least 40 more TW installations on other multiboot PCs subject to disk upgrades and restoring from backup.

For / filesystem there is Grub linu line option root= to override initrd, but what is there for whatever this usr/lib/dracut/hooks/emergency is there for? Once a Grub menu selection has been made, there's no need for anything to read the ESP again before init completes (absent any encrypted filesystems), is there?

What man page covers usr/lib/dracut/hooks/emergency? man /etc/dracut.conf.d/10-persistent_policy.conf is unhelpful.
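For reference, the local snippet described in this comment could be created along the following lines. This is only a sketch: write_policy_snippet is a hypothetical helper, and the directory argument exists so the function can be tried against a scratch directory before touching /etc/dracut.conf.d.

```shell
#!/bin/sh
# Hypothetical helper: write a local dracut snippet whose name sorts after
# the packaged 10-persistent_policy.conf, so that it is read later.
# $1 = target directory (default /etc/dracut.conf.d)
# $2 = policy value (default by-label)
write_policy_snippet() {
    dir=${1:-/etc/dracut.conf.d}
    policy=${2:-by-label}
    printf 'persistent_policy="%s"\n' "$policy" > "$dir/13-persistent-local.conf"
}

# As root, on the installed system:
#   write_policy_snippet
#   dracut --force      # rebuild the initrd for the running kernel
```

The setting only selects which /dev/disk/by-* name is recorded in a newly built initrd. An initrd built before the change, or built before a label was renamed, still carries the old identifier, which matches the concern raised in this comment.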
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c4

--- Comment #4 from Felix Miata <mrmazda@earthlink.net> ---
(In reply to Felix Miata from comment #3)
> What man page covers usr/lib/dracut/hooks/emergency? man /etc/dracut.conf.d/10-persistent_policy.conf is unhelpful.
Many many hours' work documenting and reporting here, and I just now found man dracut.cmdline. With rd.hostonly=0 the original initrds are usable. :p
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c7

--- Comment #7 from Felix Miata <mrmazda@earthlink.net> ---
I still do not understand why failure to find a filesystem that won't be needed before switchroot is complete at the earliest can hang a boot in a dracut emergency shell. The non-intuitive extra complication restoring from backups or migrating disks is why I reported this. Yes, the workaround works, but why is it needed for boot to continue?

Same question re root filesystem, since by default and tradition every bootloader stanza contains root= that overrides root= in initrd. Logic tells me these two dracut elements should be complementary fallbacks, not mandatory. It seems like otherwise rd.hostonly=0 should be included on every Grub linu line that contains root=.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c9

Felix Miata <mrmazda@earthlink.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
URL  |        |https://github.com/dracutdevs/dracut/issues/2044

--- Comment #9 from Felix Miata <mrmazda@earthlink.net> ---
Summary of issue 2044:
/boot/efi added as a hard dependency in hostonly mode #2044

So far, I'm its only watcher. :(

(In reply to Antonio Feijoo from comment #8)
> If you add this option in your first boot after restoring the backup and then regenerate the initrd from the new running system, you should not need it anymore.
As you should not need root= anymore, right? :)
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c11

--- Comment #11 from Felix Miata <mrmazda@earthlink.net> ---
(In reply to Felix Miata from comment #1)
> ToDo: rebuild the RAID TW's initrds and retest.
I tried rebuilding initrds for the RAID installation several times and couldn't get the / filesystem to be found until I chrooted into it and dup'd from 20221008 to 20221205, which rebuilt the latest kernel's initrd at least 4 times. First and subsequent boots worked with the last one built, whether starting from the ESP on NVME or from the original MBR/EXT2 /boot/ filesystem on SATA.

Only the first boot failure landed me in a dracut shell. Every boot attempt after I began rebuilding initrds simply hung, waiting with no timeout for / on /dev/md3. The cause turned out to be that dracut was excluding 2/3 of the RAID lines that a working initrd includes.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c12

--- Comment #12 from Felix Miata <mrmazda@earthlink.net> ---
When time came to go beyond 5.19.13 I found the RAID TW stopping in emergency shell unable to find the / filesystem on RAID. The key rdsosreport.txt message Andrei B pointed out was: "dracut-pre-trigger[373]: rd.md=0: removing MD RAID activation". After 6 unhelpful rebuilds of 6.0.12 with various modifications in /etc/dracut.conf.d/ I learned that use of rd.hostonly=0 must be coupled with rd.auto or rd.auto=1 to avoid halting in the emergency shell.
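The pairing described in this comment can be checked against a kernel command line with a few lines of shell. This is a sketch only: has_karg and rescue_args_ok are hypothetical helper names, and the rule they encode is the one reported here for this RAID setup, not a documented dracut requirement.

```shell
#!/bin/sh
# has_karg CMDLINE ARG: succeed if ARG appears as a whole word in CMDLINE.
has_karg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# rescue_args_ok CMDLINE: succeed only if rd.hostonly=0 is accompanied by
# rd.auto or rd.auto=1, the combination reported in comment #12.
rescue_args_ok() {
    has_karg "$1" rd.hostonly=0 || return 1
    has_karg "$1" rd.auto || has_karg "$1" rd.auto=1
}

# On a booted system:
#   rescue_args_ok "$(cat /proc/cmdline)" && echo "both options present"
```

The same check can be pointed at a linu line copied out of custom.cfg before rebooting, which is cheaper than discovering the omission in the emergency shell.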
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c13

--- Comment #13 from Felix Miata <mrmazda@earthlink.net> ---
Created attachment 865661
  --> http://bugzilla.opensuse.org/attachment.cgi?id=865661&action=edit
/boot/grub2/custom.cfg on TW / filesystem from which all installations are booted through TW's Grub; and rdsosreport.txt

This is nuts. Similar situation to comment #0 on different PC. I used ddrescue to clone from 120G NVME to 512G NVME. GPT partitions 7-16 each have a Linux OS, with TW on #7, and in control of UEFI boot. After the cloning I shut down and removed the source. I then booted a live media in order to re-unique all UUIDs and volume labels before attempting to boot from the new NVME. 11 of the 12 installations boot as if nothing ever was changed. The oddball failing is TW.

With the original initrd for its default 6.2.1 kernel, whether I have neither or either or both rd.hostonly=0 and rd.auto or rd.auto=1, boot halts in dracut shell after "reached target initrd root device". Last line prior to announcing generation of /run/initramfs/rdsosreport.txt is a warning that /dev/disk/by-label/ZM2P01ESP does not exist. That label is the obsolete LABEL from the old NVME. /etc/fstab has been completely adapted to the new LABELs and UUIDs, which means ZM2P01ESP is not coming from /etc/fstab.

I tried twice to chroot from TW rescue CD 20230318 into TW to rebuild the default initrd. Using either result, boot fails much sooner; the last line before the emergency shell is "Starting dracut initqueue hook". Both ESP and / filesystems are reported as "does not exist". In this shell I'm unable to find any way to get any kind of filesystem mounted so that rdsosreport.txt can be captured for analysis, and there is nothing I can find in the shell from which to view any but its tail. grep, head and less are absent commands. /dev/nvme* and /dev/sd* do not exist. Even if I mknod /dev/sda and /dev/sda1 to try to mount a USB stick, no joy.
All older kernels with original initrds, e.g. 6.1.12 or 6.0.12 or 5.19.13, stop @ "Reached target Basic System.", whether I have neither or either or both rd.hostonly=0 and rd.auto or rd.auto=1.

Chrooting from 15.5 on nvme0n1p11 to TW on nvme0n1p7 I was able to install kernel-default-6.2.6, with the result that it boots normally. Rebuilding older initrds with mkinitrd while booted to 6.2.6 only succeeded for booting the 6.2.1 kernel. For 6.1.12, boot stops at "Starting dracut initqueue hook..."

Apparently the rd.* workarounds no longer are functional.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c14

--- Comment #14 from Felix Miata <mrmazda@earthlink.net> ---
It looks like I've yet to find comprehensible instructions for rebuilding an initrd with either mkinitrd or dracut from within chroot. Using lsinitrd grepping initrds for the obsolete ESP ID I also grepped for nvme, and found mismatches: e.g. 6.0.12 module for 6.1.12 initrd, 6.1.12 module for 6.2.1 initrd. In too many cases, man pages to me are worthless due to absent or inadequate examples, dracut and mkinitrd notably included.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261
http://bugzilla.opensuse.org/show_bug.cgi?id=1205261#c15

Felix Miata <mrmazda@earthlink.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |malcolmlewis@linuxmail.org, nwr10cst-oslnx@yahoo.com

--- Comment #15 from Felix Miata <mrmazda@earthlink.net> ---
arvidjaar in forums[1] told someone (without any real detail) he needs to bind-mount /sys/firmware/efi/efivars to be able to chroot successfully for the purpose of initrd regeneration. I've never seen that direction before. Coming from him, I have to think it to be true. Could it explain why chroot-built initrds haven't worked for me?

[1] <https://forums.opensuse.org/t/reinstalling-grub-after-losing-dual-boot/165169/6>
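A chroot rebuild along the lines discussed in comments #14 and #15 might look like the sketch below. It is not a verified fix for this bug. The function chroot_rebuild_cmds is hypothetical, it prints the commands for review instead of running them, and it assumes the target / is already mounted at the given mount point with /boot and /boot/efi mounted beneath it. Whether the efivars bind mount matters for dracut is taken from the forum advice, not confirmed here.

```shell
#!/bin/sh
# chroot_rebuild_cmds MOUNTPOINT KVER
# Print (do not run) the commands for rebuilding one initrd inside a chroot.
# KVER is assumed to be a directory name under MOUNTPOINT/lib/modules.
chroot_rebuild_cmds() {
    mnt=$1
    kver=$2
    for fs in dev proc sys run; do
        echo "mount --bind /$fs $mnt/$fs"
    done
    # A plain --bind does not carry submounts, so efivars is bound separately.
    echo "mount --bind /sys/firmware/efi/efivars $mnt/sys/firmware/efi/efivars"
    # Naming the kernel version explicitly: without it dracut uses the
    # version of the running kernel, i.e. the rescue system's kernel.
    echo "chroot $mnt dracut --force /boot/initrd-$kver $kver"
}

# Review first, then run as root (the version string is only an example):
#   chroot_rebuild_cmds /mnt 6.2.6-1-default
#   chroot_rebuild_cmds /mnt 6.2.6-1-default | sh -e
```

Passing the kernel version explicitly may also be relevant to the module mismatches noted in comment #14, since a rebuild that silently falls back to the rescue system's kernel version would pull modules for the wrong kernel.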