[Bug 1189776] New: system fails to boot because mkinitrd fails to include nvme driver
https://bugzilla.suse.com/show_bug.cgi?id=1189776

            Bug ID: 1189776
           Summary: system fails to boot because mkinitrd fails to include nvme driver
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: Linux
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Basesystem
          Assignee: screening-team-bugs@suse.de
          Reporter: ohering@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

orthos host virt158.
Installing SLE15SP3 works fine, nvme.ko is included.
Installing Tumbleweed 20210823 fails, nvme.ko is not included in initrd.
No obvious error from "mkinitrd", it does not report the to-be-included drivers anyway.

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1189776
Chenzi Cao <chcao@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|screening-team-bugs@suse.de |dracut-maintainers@suse.de
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c1
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CONFIRMED
                 CC|                            |antonio.feijoo@suse.com
--- Comment #1 from Antonio Feijoo <antonio.feijoo@suse.com> ---
Confirmed bug in dracut 055 and upstream, regardless of kernel version (5.3, 5.14): it fails if the system has only nvme block devices.
In version 049, dracut always installs a bunch of block device drivers, but in version 055 it performs several checks to avoid always installing all of them.
Specifically, it is failing the check_block_and_slaves_all function in dracut-functions.sh
Working to fix it.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c2
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ohering@suse.com
              Flags|                            |needinfo?(ohering@suse.com)
--- Comment #2 from Antonio Feijoo <antonio.feijoo@suse.com> ---
> Specifically, it is failing the check_block_and_slaves_all function in dracut-functions.sh
I was wrong about this guess. Since a certain upstream commit [1], the function to detect a block device changed from:
-->
[ -e /sys/dev/block/$1 ] && return 0
--<
to
-->
for _mod in $(get_dev_module /dev/block/$1); do
--<
and get_dev_module relies on udevadm, which fails to detect the nvme driver in SLE-15 SP3:
virt158:~ # ls -l /dev/block
total 0
lrwxrwxrwx 1 root root 10 Oct 5 05:53 259:1 -> ../nvme0n1
lrwxrwxrwx 1 root root 12 Oct 5 05:53 259:2 -> ../nvme0n1p1
lrwxrwxrwx 1 root root 12 Oct 5 05:53 259:3 -> ../nvme0n1p2
lrwxrwxrwx 1 root root 12 Oct 5 05:53 259:4 -> ../nvme0n1p3
lrwxrwxrwx 1 root root 12 Oct 5 05:53 259:5 -> ../nvme0n1p4
lrwxrwxrwx 1 root root 12 Oct 5 05:53 259:6 -> ../nvme0n1p5
virt158:~ # udevadm info -a "/dev/block/259:5" | sed -n 's/\s*DRIVERS=="\(\S\+\)"/\1/p'
virt158:~ # udevadm info -a "/dev/block/259:2" | sed -n 's/\s*DRIVERS=="\(\S\+\)"/\1/p'
virt158:~ # udevadm info -a "/dev/block/259:3" | sed -n 's/\s*DRIVERS=="\(\S\+\)"/\1/p'
virt158:~ # udevadm --version
246
But for me the check works in Tumbleweed and the initrd includes the nvme driver:
dev@localhost:~/src/dracut/test> ls -l /dev/block
lrwxrwxrwx 1 root root 10 Oct 6 09:21 259:0 -> ../nvme0n1
lrwxrwxrwx 1 root root 12 Oct 6 09:21 259:1 -> ../nvme0n1p1
dev@localhost:~/src/dracut/test> udevadm info -a "/dev/block/259:1" | sed -n 's/\s*DRIVERS=="\(\S\+\)"/\1/p'
nvme
dev@localhost:~/src/dracut/test> udevadm --version
249
Could you please check if this still fails with the latest snapshot of Tumbleweed?
[1] https://github.com/dracutdevs/dracut/commit/6375d5d504c5eac1cc5e7d7e26a8643b...
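The lookup described in comment #2 boils down to filtering the DRIVERS== lines out of the udevadm attribute walk. A minimal sketch of that filter follows, assuming the walk is supplied on stdin; the function name drivers_from_attr_walk is hypothetical and is not part of dracut.

```shell
# Hypothetical helper (not dracut code): print every non-empty
# DRIVERS=="..." value found in "udevadm info -a" output read from stdin.
# Empty DRIVERS=="" entries, as printed for driverless devices, are skipped.
drivers_from_attr_walk() {
    sed -n 's/^[[:space:]]*DRIVERS=="\([^"]\{1,\}\)".*/\1/p'
}
```

It could be used as `udevadm info -a /dev/block/259:1 | drivers_from_attr_walk`. If this prints nothing, no device in the walk has a driver bound, which is the situation shown for virt158 above.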
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c3
--- Comment #3 from Olaf Hering <ohering@suse.com> ---
Currently not possible to install Tumbleweed:
error: ../../grub-core/loader/i386/efi/linux.c:120:can't allocate initrd.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c4
Olaf Hering <ohering@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(ohering@suse.com) |
--- Comment #4 from Olaf Hering <ohering@suse.com> ---
bug#1191378 to track the malloc error.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c5
--- Comment #5 from Antonio Feijoo <antonio.feijoo@suse.com> ---
An initrd of 133 MB is strange. While I was testing on this machine, the size of the generated initrds was around 14 MB. Are you avoiding host-only mode?
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c6
--- Comment #6 from Olaf Hering <ohering@suse.com> ---
This is the install initrd. It grew from a few dozen MB a decade ago to 133 MB.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c7
--- Comment #7 from Olaf Hering <ohering@suse.com> ---
I did a dup in chroot, no nvme is included. Hopefully dracut makes no assumptions about the running kernel and the target kernel.
In the old days mkinitrd reliably detected the required storage devices via
  cd -P /sys/block/nvme0c0n1/device/
  cat device/modalias
  case "$PWD" in $stuff esac
The chroot is still active, see for yourself.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c8
--- Comment #8 from Antonio Feijoo <antonio.feijoo@suse.com> ---
Created attachment 852966
  --> https://bugzilla.suse.com/attachment.cgi?id=852966&action=edit
virt158 udevadm output
You forgot to bind /dev/block. Anyway, after doing it, dracut does not include the nvme driver because udevadm does not find it (see attached file).
I don't know whether something else needs to be bound to simulate the system, but in my local environment udevadm v249 is able to detect the nvme driver (see comment #2).
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c9
--- Comment #9 from Olaf Hering <ohering@suse.com> ---
Weird. Please force the required drivers into the initrd in the Tumbleweed partition, then we can actually boot into this partition and see how a Tumbleweed system behaves.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c10
--- Comment #10 from Antonio Feijoo <antonio.feijoo@suse.com> ---
> Weird. Please force the required drivers into the initrd in the Tumbleweed partition, then we can actually boot into this partition and see how a Tumbleweed system behaves.
Done.
virt158:/root # lsinitrd /boot/initrd-5.14.6-2-default | grep ko | grep nvme
-rw-r--r-- 1 root root 69248 Oct 4 02:57 usr/lib/modules/5.14.6-2-default/kernel/drivers/nvme/host/nvme-core.ko.xz
-rw-r--r-- 1 root root 26364 Oct 4 02:57 usr/lib/modules/5.14.6-2-default/kernel/drivers/nvme/host/nvme.ko.xz
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c11
--- Comment #11 from Olaf Hering <ohering@suse.com> ---
It now runs Tumbleweed, but boots only with 'add_drivers+=" nvme "'.
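For reference, the workaround from comment #11 would typically live in a dracut configuration drop-in. The fragment below is a sketch under that assumption: the file name 10-nvme.conf is an arbitrary, hypothetical choice, while the add_drivers line is the one quoted in the comment.

```shell
# /etc/dracut.conf.d/10-nvme.conf  (hypothetical file name)
# Force the nvme driver into the initrd regardless of what host-only
# detection finds. The surrounding spaces inside the quotes are intentional.
add_drivers+=" nvme "
```

After adding the file, the initrd has to be regenerated (for example with `dracut --force`), and the result can be checked the same way as in comment #10, with `lsinitrd` piped through `grep nvme`.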
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c12
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |systemd-maintainers@suse.de
              Flags|                            |needinfo?(systemd-maintainers@suse.de)
--- Comment #12 from Antonio Feijoo <antonio.feijoo@suse.com> ---
Franck, does this issue sound familiar to you? See comments #2 and #8.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c13
--- Comment #13 from Antonio Feijoo <antonio.feijoo@suse.com> ---
It seems this bug is duplicated. See bug#1180494
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c14
Franck Bui <fbui@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fbui@suse.com
              Flags|needinfo?(systemd-maintainers@suse.de) |
--- Comment #14 from Franck Bui <fbui@suse.com> ---
No, sorry, it doesn't ring a bell.
Can I have access to the TW system which is affected?
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c15
--- Comment #15 from Olaf Hering <ohering@suse.com> ---
(In reply to Franck Bui from comment #14)
> Can I have access to the TW system which is affected?
See comment#0 about which system is affected.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c16
--- Comment #16 from Franck Bui <fbui@suse.com> ---
(In reply to Olaf Hering from comment #15)
> See comment#0 about which system is affected.
The credentials are missing and I tried the usual "default" passwords for the orthos machines but with no luck.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c17
--- Comment #17 from Olaf Hering <ohering@suse.com> ---
(In reply to Franck Bui from comment #16)
> The credentials are missing and I tried the usual "default" passwords for the orthos machines but with no luck.
Indeed, it is root/suse
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c18
--- Comment #18 from Franck Bui <fbui@suse.com> ---
(In reply to Olaf Hering from comment #17)
> Indeed, it is root/suse
It still doesn't work when I try to log in via ssh:
$ ssh root@virt158.devlab.prv.suse.com
Can you please make access through ssh work?
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c19
--- Comment #19 from Olaf Hering <ohering@suse.com> ---
The culprit is "PermitRootLogin prohibit-password"; I changed it to "yes".
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c20
Franck Bui <fbui@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(ohering@suse.com)
--- Comment #20 from Franck Bui <fbui@suse.com> ---
(In reply to Olaf Hering from comment #0)
> Installing SLE15SP3 works fine, nvme.ko is included.
Could you show the output of 'ls -l /sys/dev/block/' for SP3?
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c21
Olaf Hering <ohering@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(ohering@suse.com) |
--- Comment #21 from Olaf Hering <ohering@suse.com> ---
virt158:~ # ls /sys/block/
nvme0c0n1 nvme0n1
virt158:~ # ls /sys/dev/block/
259:1 259:2 259:3 259:4 259:5 259:6
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c22
--- Comment #22 from Olaf Hering <ohering@suse.com> ---
259:1 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
259:2 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p1
259:3 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p2
259:4 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p3
259:5 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p4
259:6 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p5
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c23
Franck Bui <fbui@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jdelvare@suse.com
              Flags|                            |needinfo?(jdelvare@suse.com)
--- Comment #23 from Franck Bui <fbui@suse.com> ---
These symlinks in /sys/dev/block, which udevadm uses to find the actual device and which point to devices in /sys/devices/virtual/nvme-subsystem, are actually the problem, because none of the paths below nvme-subsystem has the "driver" attribute.
My VM running TW with an emulated nvme device has the symlinks in /sys/dev/block pointing to the "real" device instead:
# ls -l /sys/dev/block/
total 0
lrwxrwxrwx 1 root root 0 Oct 13 14:59 259:0 -> ../../devices/pci0000:00/0000:00:06.0/nvme/nvme0/nvme0n1
lrwxrwxrwx 1 root root 0 Oct 13 14:59 259:1 -> ../../devices/pci0000:00/0000:00:06.0/nvme/nvme0/nvme0n1/nvme0n1p1
and paths below /sys/devices/pci0000:00/0000:00:06.0 do have the "driver" attribute.
Now I don't know why the kernel sometimes chooses the paths below /sys/devices/virtual/nvme-subsystem and sometimes prefers /sys/devices/pci*, but the interface exposed to userspace doesn't seem consistent and reliable.
Jean, could you shed some light, please?
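The difference described in comment #23 can be checked by walking up from the sysfs directory that a /sys/dev/block symlink resolves to and looking for a "driver" symlink on the way. The following is a minimal sketch of that check, not code from dracut or udev; the function name find_bound_driver is hypothetical.

```shell
# Hypothetical helper: starting at a sysfs device directory, walk up the
# parent chain and print the name of the first bound driver found.
# Returns 1 if neither the device nor any ancestor has a "driver" symlink.
find_bound_driver() {
    # Resolve symlinks first so that the walk follows the physical path.
    dir=$(cd -P "$1" 2>/dev/null && pwd -P) || return 1
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        if [ -L "$dir/driver" ]; then
            basename "$(readlink "$dir/driver")"
            return 0
        fi
        dir=${dir%/*}
    done
    return 1
}
```

Called as `find_bound_driver /sys/dev/block/259:1`, this would be expected to find a driver for a path below /sys/devices/pci* and to fail for a path below /sys/devices/virtual/nvme-subsystem, matching the two layouts shown above.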
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c24
--- Comment #24 from Jean Delvare <jdelvare@suse.com> ---
I asked upstream for an explanation of why NVMe device sysfs paths are different from one system to the next. You can read the discussion here:
https://lore.kernel.org/linux-nvme/b98f6062f59c1c1cfc4a200de83e4e244efbffbf....
I'm not going to claim I understand all the technical details, but my understanding is that the difference is here to stay and user-space will have to deal with it. Martin Wilck's answer actually includes a suggestion of how this could be done; hopefully that will help.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c25
--- Comment #25 from Antonio Feijoo <antonio.feijoo@suse.com> ---
Thanks for your feedback. Actually we've submitted a dracut PR which would solve this issue by following symlinks.
https://github.com/dracutdevs/dracut/pull/1626
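The idea of following symlinks can be illustrated with a short sketch. This is not the code from the PR. It assumes that the nvme-subsysN directory holding the namespace also contains symlinks to the controller devices (apart from the usual "subsystem" link), and the function name drivers_via_symlinks is hypothetical.

```shell
# Hypothetical sketch: for a device directory under /sys/devices/virtual,
# follow the symlinks found in its parent directory (except "subsystem")
# and print the driver bound to each link target or one of its ancestors.
drivers_via_symlinks() {
    parent=$(cd -P "$1/.." 2>/dev/null && pwd -P) || return 1
    for link in "$parent"/*; do
        [ -L "$link" ] || continue
        [ "${link##*/}" = "subsystem" ] && continue
        # Resolve the link to the physical directory it points to.
        target=$(cd -P "$link" 2>/dev/null && pwd -P) || continue
        while [ -n "$target" ] && [ "$target" != "/" ]; do
            if [ -L "$target/driver" ]; then
                basename "$(readlink "$target/driver")"
                break
            fi
            target=${target%/*}
        done
    done
}
```

Under the stated assumption, `drivers_via_symlinks /sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1` would reach the PCI device behind the controller and report its driver even though the namespace path itself has none.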
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c26
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |IN_PROGRESS
              Flags|needinfo?(jdelvare@suse.com)|
--- Comment #26 from Antonio Feijoo <antonio.feijoo@suse.com> ---
(In reply to Antonio Feijoo from comment #25)
> Thanks for your feedback. Actually we've submitted a dracut PR which would solve this issue by following symlinks.
The patch was accepted. The fix is in progress.
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c27
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |elantea@runbox.com
--- Comment #27 from Antonio Feijoo <antonio.feijoo@suse.com> ---
*** Bug 1180494 has been marked as a duplicate of this bug. ***
https://bugzilla.suse.com/show_bug.cgi?id=1189776
Frank Krüger <fkrueger@mailbox.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fkrueger@mailbox.org
https://bugzilla.suse.com/show_bug.cgi?id=1189776
Martin Jambor <mjambor@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mjambor@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1189776#c29
Antonio Feijoo <antonio.feijoo@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |FIXED
--- Comment #29 from Antonio Feijoo <antonio.feijoo@suse.com> ---
The fix for this bug has been included since dracut-055+suse.152.g9d554c37 (TW and SLE-15-SP4), so we can close it.
https://bugzilla.suse.com/show_bug.cgi?id=1189776
Jeffrey Cheung <jcheung@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jcheung@suse.com