[Bug 962694] New: Upgrade to kernel 4.4 will break the system if it depends on mtp2sas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694

Bug ID: 962694
Summary: Upgrade to kernel 4.4 will break the system if it depends on mtp2sas
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: 2015*
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: ronisbr@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Hi guys!

Today I upgraded my HP Workstation to the latest Tumbleweed snapshot. This time, the kernel was upgraded from 4.3.3 to 4.4.0. When I booted, I saw the following messages:

    [ TIME ] Timed out waiting for device dev-disk-by\x2...
    [DEPEND] Dependency failed for Resume from hibernation using device /dev/disk/...

and I could not boot into my system. Everything works if I boot into the old 4.3.3 kernel. The mentioned partition is my swap. If I remove the swap from fstab and disable the "resume", then the boot process just hangs at the very beginning without any output message.

After digging, I found a very tricky problem. My workstation depends on the module mpt2sas, which was merged into mpt3sas in kernel 4.4 [1]. Thus, the openSUSE 4.4.0 kernel has the mpt3sas module but does not have the mpt2sas module. When the new kernel is installed and mkinitrd creates the initramfs, I think it looks at the loaded modules to include the necessary ones in the image, but mpt2sas does not exist in 4.4.0. Hence, neither mpt2sas nor mpt3sas is added to the image.

Using lsinitrd, I could confirm that the initramfs of kernel 4.3.3 contains the modules mpt2sas and raid_class whereas the initramfs of kernel 4.4.0 does not. This is what was causing the boot problem.

I managed to fix it by adding the following line to /etc/sysconfig/kernel:

    INITRD_MODULES="mpt3sas raid_class"

and rebuilding the initramfs using mkinitrd. After that, the system became bootable again.
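The lsinitrd check described above can be scripted. The following is only a sketch: the helper name missing_modules is made up for illustration, and it assumes that each line of the listing ends with the path of the file, as lsinitrd file listings normally do.

```shell
#!/bin/bash
# missing_modules MOD...
# Hypothetical helper: reads a file listing (for example the output of
# lsinitrd) on stdin and prints every MOD that has no matching .ko file.
# Returns 1 if at least one module is missing, 0 otherwise.
missing_modules() {
    local listing mod rc=0
    listing=$(cat)
    for mod in "$@"; do
        # Match ".../MOD.ko" at the end of a line, optionally followed by a
        # compression suffix such as ".xz".
        if ! grep -Eq "/${mod}\.ko(\.[[:alnum:]]+)?\$" <<< "$listing"; then
            echo "$mod"
            rc=1
        fi
    done
    return $rc
}
```

A possible invocation would be "lsinitrd /boot/initrd-4.4.0-1-default | missing_modules mpt3sas raid_class"; the initrd path is an assumption and has to be adjusted to the installed kernel.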
This bug should affect all systems that rely on mpt2sas and get a kernel upgrade from a version prior to 4.4 to the current Tumbleweed kernel. Thus, I presume that the upgrade Leap -> Tumbleweed will yield an unbootable system on such computers.

----------
[1] https://groups.google.com/forum/#!topic/linux.kernel/EdYQpd_ozw4

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c1
Jan Engelhardt
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c2
--- Comment #2 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c3
Takashi Iwai
> mpt3sas does have the PCI IDs of the former mpt2sas. Secondly, mpt3sas also has "mpt2sas" as an alias. That's two ways dracut ought to have found the replacement with /lib/modules/4.4* even while running under a 4.3 environment, but since it did not, I am going to point the finger at dracut ;-)
Yeah, it looks so. Below is the code snippet:

    module_is_host_only() {
        .....
        [[ "$kernel_current" ]] || export kernel_current=$(uname -r)

        if [[ "$kernel_current" != "$kernel" ]]; then
            # check if module is loadable on the current kernel
            # this covers the case, where a new module is introduced
            # or a module was renamed
            # or a module changed from builtin to a module
    -->     if [[ -d /lib/modules/$kernel_current ]]; then
    -->         # if the modinfo can be parsed, but the module
    -->         # is not loaded, then we can safely return 1
    -->         modinfo -F filename "$_mod" &>/dev/null && return 1
    -->     fi

            # just install the module, better safe than sorry
            return 0
        fi

The part marked is the problem. In this case, the mpt3sas module does exist in the 4.3 kernel, but it isn't loaded because the 4.3 driver doesn't yet support mpt2sas devices. dracut still believes it's safe to ignore. Oops.

That said, dracut can't handle the case where a module is folded into another.
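To make the failure mode concrete, here is a toy model of the decision described above. It is not dracut code: the two tables and the function toy_is_host_only are made up for illustration, and hypothetical_new_driver is an invented module name.

```shell
#!/bin/bash
# Illustrative state of the running 4.3 system (not read from a real host).
declare -A loaded_now=( [mpt2sas]=1 [raid_class]=1 )
declare -A exists_on_running_kernel=( [mpt2sas]=1 [mpt3sas]=1 [raid_class]=1 )

# Toy version of the check: return 0 to install a module, 1 to skip it.
toy_is_host_only() {
    local mod=$1
    # A module that is loaded right now is always kept.
    [[ ${loaded_now[$mod]} ]] && return 0
    # The marked branch: the module exists for the running kernel but is
    # not loaded, so it is assumed to be unneeded.
    [[ ${exists_on_running_kernel[$mod]} ]] && return 1
    # Unknown on the running kernel: install it, better safe than sorry.
    return 0
}

for mod in mpt3sas hypothetical_new_driver; do
    if toy_is_host_only "$mod"; then
        echo "$mod: install"
    else
        echo "$mod: skip"
    fi
done
```

In this model mpt3sas is skipped because it exists on the running kernel without being loaded, while a module that is entirely new in the target kernel is installed.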
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c4
Fabian Vogt
> The part marked is the problem. In this case, the mpt3sas module does exist in the 4.3 kernel, but it isn't loaded because the 4.3 driver doesn't yet support mpt2sas devices. dracut still believes it's safe to ignore. Oops.
>
> That said, dracut can't handle the case where a module is folded into another.
Yup, the module handling logic (the "hostonly" part) is totally broken and flawed for kernel migrations. https://github.com/haraldh/dracut/commit/07a081f352497258862ae164d11d9e6dc2c... solved some of the issues, but most still remain...

The source of this issue is that

    # check if module is loaded
    [[ ${host_modules["$_modenc"]} ]] && return 0

does not trigger for mpt3sas, as host_modules does not contain aliases. I'd say that it should be replaced by

    # check if module could be loaded.
    # If not, it's either new or already loaded -> install it!
    [[ $(/usr/sbin/modprobe -nv "$_mod" 2>/dev/null | wc -l) -eq 0 ]] && return 0

Adding aliases to $host_modules would probably work as well, but that would require parsing the output of modinfo for each module. This way is IMO more obvious and easier.

Ronan: Can you test the above? Simply replace the lines in /usr/lib/dracut/dracut-init.sh:1029 and run mkinitrd from the 4.3.3 kernel.
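The proposed dry-run check can be tried outside of dracut with a small wrapper. This is a sketch under assumptions: the function name should_install and the MODPROBE override variable are hypothetical and exist only so that the check can be exercised with a stub instead of the real modprobe.

```shell
#!/bin/bash
# MODPROBE is a hypothetical override hook for testing; by default the
# real binary from the proposal is used.
MODPROBE=${MODPROBE:-/usr/sbin/modprobe}

# should_install MOD
# Returns 0 ("install it") when a dry run prints nothing on stdout, which
# the proposal takes to mean that the module is either unknown to the
# running kernel or already loaded. Returns 1 when the dry run lists at
# least one action, i.e. the module is available but currently unused.
should_install() {
    local mod=$1 lines
    lines=$("$MODPROBE" -nv "$mod" 2>/dev/null | wc -l)
    [[ $lines -eq 0 ]]
}
```

For example, "should_install mpt3sas && echo install" would be run on the 4.3.3 system before rebuilding the initramfs.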
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c5
--- Comment #5 from Ronan Chagas
> # check if module could be loaded.
> # If not, it's either new or already loaded -> install it!
> [[ $(/usr/sbin/modprobe -nv "$_mod" 2>/dev/null | wc -l) -eq 0 ]] && return 0
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c6
--- Comment #6 from Fabian Vogt
    # also add aliases of loaded modules
    for mod in "${!host_modules[@]}"; do
        for alias in $(modinfo -k "$kernel" -F alias "$mod"); do
            host_modules["$alias"]=1
        done
    done

in /usr/bin/dracut:1223.
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c7
--- Comment #7 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c8
--- Comment #8 from Fabian Vogt
    modinfo mpt2sas
    filename: /lib/modules/4.4.0-1-default/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
    alias: mpt2sas

it does not add mpt3sas as an alias, so that has to be recognized as well... Next try:

    # also add aliases of loaded modules
    for mod in "${!host_modules[@]}"; do
        for alias in $(modinfo -k "$kernel" -F alias "$mod"); do
            host_modules["$alias"]=1
        done
        # mod might be an alias itself, find the real module
        host_modules["$(basename -s .ko "$(modinfo "$mod" -F filename)")"]=1
    done
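The step that turns a module file path into a module name can be looked at in isolation. The sketch below is a pure-bash alternative to the basename call; the function name module_name_from_path is made up for illustration. Handling a compression suffix such as ".ko.xz" is an extra assumption for other setups, since the path shown in this thread ends in plain ".ko".

```shell
#!/bin/bash
# module_name_from_path PATH
# Hypothetical helper: print the module name for a module file path.
module_name_from_path() {
    local name=${1##*/}   # drop the directory part
    name=${name%%.ko*}    # drop ".ko" and anything after it (e.g. ".xz")
    printf '%s\n' "$name"
}

module_name_from_path /lib/modules/4.4.0-1-default/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
```

For the path above this prints mpt3sas, the real module behind the mpt2sas alias. Note that "basename -s .ko" only strips an exact ".ko" suffix, so a compressed module file would keep its full file name.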
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c9
--- Comment #9 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c10
--- Comment #10 from Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c11
--- Comment #11 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c12
--- Comment #12 from Fabian Vogt
> Yes, if I explicitly set this, then it works.

Good, so something is not quite right with my patch.

> One question: since I am in a 4.3.3 env, then modinfo will not show that mpt3sas has the mpt2sas alias, right?

Yup, that's what "-k $kernel" is for. Can you provide the output of

    modinfo -k 4.4.0-1-default mpt2sas

and

    modinfo -k 4.4.0-1-default mpt3sas

? The first few lines are enough.
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c13
--- Comment #13 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c14
--- Comment #14 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c15
--- Comment #15 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c16
--- Comment #16 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c17
--- Comment #17 from Fabian Vogt
    # also add aliases of loaded modules
    for mod in "${!host_modules[@]}"; do
        aliases=$(modinfo -k "$kernel" -F alias "$mod" 2>&1)
        [ $? -ne 0 ] && continue
        for alias in $aliases; do
            host_modules["$alias"]=1
        done
        # mod might be an alias itself, find the real module
        host_modules["$(basename -s .ko "$(modinfo "$mod" -F filename)")"]=1
    done
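The alias expansion above can be exercised without touching dracut. The following standalone sketch is a variation, not the patch itself: the function name add_aliases_and_real_names and the MODINFO override variable are hypothetical, the filename lookup also passes -k "$kernel" (an assumption, the snippet above queries the running kernel there), and the aliases are read line by line so that the "*" characters in PCI aliases are never glob-expanded.

```shell
#!/bin/bash
# MODINFO is a hypothetical override hook so the logic can run with a stub.
MODINFO=${MODINFO:-modinfo}
# Target kernel version; defaults to the running kernel in this sketch.
kernel=${kernel:-$(uname -r)}

# Stand-in for dracut's table of loaded modules.
declare -A host_modules

add_aliases_and_real_names() {
    local mod alias aliases filename
    # Snapshot the keys first, because entries are added inside the loop.
    local -a loaded=("${!host_modules[@]}")
    for mod in "${loaded[@]}"; do
        # Skip modules that the target kernel does not know at all.
        aliases=$("$MODINFO" -k "$kernel" -F alias "$mod" 2>/dev/null) || continue
        while read -r alias; do
            [[ $alias ]] && host_modules["$alias"]=1
        done <<< "$aliases"
        # mod might be an alias itself, so record the real module name too.
        filename=$("$MODINFO" -k "$kernel" -F filename "$mod" 2>/dev/null) || continue
        filename=${filename##*/}
        host_modules["${filename%%.ko*}"]=1
    done
}
```

With host_modules[mpt2sas]=1 and kernel=4.4.0-1-default, the intent is that mpt3sas ends up in the table as well, provided modinfo resolves the mpt2sas alias for that kernel as shown earlier in the thread.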
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c18
--- Comment #18 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c19
--- Comment #19 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c20
--- Comment #20 from Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c21
--- Comment #21 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c22
--- Comment #22 from Ronan Chagas
http://bugzilla.opensuse.org/show_bug.cgi?id=962694
http://bugzilla.opensuse.org/show_bug.cgi?id=962694#c23
Fabian Vogt