[Bug 1143331] New: kernel-firmware-20190618-lp151.2.6.1 fails to boot
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331

            Bug ID: 1143331
           Summary: kernel-firmware-20190618-lp151.2.6.1 fails to boot
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.1
          Hardware: x86-64
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: hansper@t-online.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 812019
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812019&action=edit
Screenshot of the system hanging (camera picture)

Hi,

after upgrading to the latest firmware on openSUSE 15.1
(kernel-firmware-20190618-lp151.2.6.1) my system is no longer able to boot.
No hard disks are found anymore. Attached is a screenshot of the system
hanging during boot.

As soon as I revert the firmware to kernel-firmware-20190312-lp151.2.3.1 I'm
able to boot the latest (and previous) kernel(s) again. Older kernels fail to
boot too if the latest version of the firmware is installed.

MD5sum of the faulty firmware is: 8093153dc5b29f3e1f8c0633ee9d24a2

Mainboard:
    Manufacturer: Gigabyte Technology Co., Ltd.
    Product Name: X99-UD4-CF
RAM: 128GB RAM installed
CPU: Intel(R) Core(TM) i7-6850K CPU
Hard-Disks: 8 Disks installed, 4 SSDs, 4 HDDs
Boot: Booting from /dev/md0, Raid1 device, 2 members

This seems to be a rather nasty bug, as all BIOS settings are reset to their
defaults after the system hangs. After a hard reset (the only thing that seems
to work) the mainboard BIOS says it experienced a boot failure and loads
factory defaults (and deletes all saved profiles...).

Yours,
Jochen

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P1 - Urgent
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c1

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hansper@t-online.de,
                   |                            |tiwai@suse.com
              Flags|                            |needinfo?(hansper@t-online.de)

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
Could you get the hwinfo output from the running system and upload it to
Bugzilla? The boot snapshot doesn't tell us much other than that the root
device wasn't found... Possibly the UUID has changed for some reason?

For further testing, please create a backup of the initrd manually. You can
choose it by editing the GRUB setup at boot time for recovery.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c2

--- Comment #2 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812134
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812134&action=edit
hwinfo
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c3

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #3 from Jochen Hansper <hansper@t-online.de> ---
I've uploaded hwinfo.

The UUID of /dev/md0 (boot device) did not change. The screenshot shows that
it is looking for UUID=3ed8a4ea..., which is what is set in /etc/fstab:
---
/dev/disk/by-uuid/3ed8a4ea-...  /  ext4  noatime,nodiratime,acl,user_xattr  1 1
---

Also, it looks like it can't find any partitions or disks; at least none are
listed in the screenshot.

I've created a backup of the working initrd. Booting using the vanilla kernel
works, too.

Thanks for getting back to me!
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c4

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
Thanks. Unfortunately, I couldn't find anything obvious in the hwinfo that
might trigger the issue.

Could you compare the contents of the good and the non-working initrds?
To expand an initrd, you can do something like:

  cd /somewhere
  /usr/lib/dracut/skipcpio /boot/initrd-xxx | xz -cd | cpio -i --make-directories
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c5

--- Comment #5 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812143
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812143&action=edit
Listing of working (good) initrd
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c6

--- Comment #6 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812144
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812144&action=edit
Listing of failing (bad) initrd
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c7

--- Comment #7 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812145
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812145&action=edit
diff of good and bad initrd

diff --brief --recursive good/ bad/
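The comparison from comment #7 can be wrapped in a small helper. This is only a sketch: it assumes both initrds have already been unpacked into separate directories (called `good/` and `bad/` in the attachment), and the function name `compare_initrd_trees` is made up for illustration. It relies on nothing but `diff -qr`, the short form of `diff --brief --recursive`.

```shell
#!/bin/sh
# Sketch: list files that differ, or that exist in only one of two
# unpacked initrd trees. compare_initrd_trees is a hypothetical name.
compare_initrd_trees() {
    good_dir=$1
    bad_dir=$2
    if [ ! -d "$good_dir" ] || [ ! -d "$bad_dir" ]; then
        echo "usage: compare_initrd_trees GOOD_DIR BAD_DIR" >&2
        return 2
    fi
    # diff exits with 1 when the trees differ. That is the expected case
    # here, so only a status above 1 (a real error) counts as failure.
    status=0
    LC_ALL=C diff -qr "$good_dir" "$bad_dir" || status=$?
    [ "$status" -le 1 ]
}
```

Called as `compare_initrd_trees good bad > initrd.diff`, it writes a listing of the same kind as the attachment; `LC_ALL=C` keeps the messages in English so they can be filtered later.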
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c8

--- Comment #8 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Jochen Hansper from comment #7)
> Created attachment 812145 [details]
> diff of good and bad initrd
>
> diff --brief --recursive good/ bad/

OK, the interesting bits here are:

Files good/lib/firmware/amdgpu/polaris10_k2_smc.bin and bad/lib/firmware/amdgpu/polaris10_k2_smc.bin differ
Files good/lib/firmware/amdgpu/polaris10_k_mc.bin and bad/lib/firmware/amdgpu/polaris10_k_mc.bin differ
....

These are AMDGPU firmware updates.

Only in good/lib/modules/4.12.14-lp151.28.10-default/kernel/drivers: nvme
Only in good/lib/modules/4.12.14-lp151.28.10-default/kernel/drivers/scsi: ufs

... and these are missing modules. I don't know how the latter can happen with
just a kernel-firmware update. Does your machine have NVMe? I thought it
doesn't.

If it's about AMDGPU, you can reproduce it easily by replacing the
/lib/firmware/amdgpu/* files and rebuilding the initrd.
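The two groups of lines singled out in comment #8 (changed firmware blobs and entries present in only one tree) can be pulled out of a saved listing with plain `grep`. The sketch below assumes the listing was produced by GNU `diff --brief --recursive` in an English locale; both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: filter a `diff --brief --recursive` listing read from stdin.
# Both helper names are made up for illustration.

# Firmware blobs whose content changed ("Files X and Y differ").
changed_firmware() {
    grep '^Files .*/lib/firmware/.* differ$'
}

# Entries present in only one tree ("Only in DIR: NAME").
only_in_one_tree() {
    grep '^Only in '
}
```

For example, `diff --brief --recursive good/ bad/ | only_in_one_tree` would show just the missing-module lines. Both filters exit non-zero when nothing matches, which is ordinary `grep` behaviour.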
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c9

--- Comment #9 from Jochen Hansper <hansper@t-online.de> ---
I'll try rebuilding the new initrd with the old AMDGPU firmware files, as you
suggested. I can't reboot the machine right now; I'll try it later today or
tomorrow and report back.

There is no NVMe installed in my machine. The hardware hasn't been changed for
at least two months. Prior to the kernel and firmware updates, no
configuration was significantly changed. I've just installed said updates, and
it failed.

I've installed the updates on other machines without issues.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c10

--- Comment #10 from Jochen Hansper <hansper@t-online.de> ---
I did as you suggested, with the following results:

1) Installing the latest firmware, replacing all files in /lib/firmware/amdgpu
with those found in version 20190312 and regenerating the initrd lets me boot
my machine without hiccups. So I guess something is wrong with the amdgpu
firmware :-)

2) Feeling lucky, I replaced all files in amdgpu with those found in version
20190712 from the Tumbleweed repo. This time around, the machine does not
completely crash and it seems to continue booting up to a certain point.
Unfortunately, the screens stay completely blank and ctrl+alt+del does not
work, but pressing Num-Lock works (which didn't work with 20190618). Also, I
couldn't find a log file for this failed boot attempt.

I've now installed version 20190618 again, and replaced the amdgpu files with
those from 20190312.

GPU Details:
8GB PowerColor Radeon RX Vega 56 Red Dragon Aktiv PCIe 3.0 x16 2xDisplayPort /
2xHDMI (Retail)

I've checked the PowerColor homepage, but there seem to be no BIOS updates
available for that card (I don't know whether that could have helped).

If you need further details, I'll be glad to provide them.

Many thanks for your help so far!
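Test 1) in comment #10 amounts to swapping one firmware directory and then rebuilding the initrd. A hedged sketch of the copy step follows. The function name, the `.bak` backup suffix and the example paths in the comments are assumptions, and the thread does not say which command was used to regenerate the initrd (on a dracut-based system, `dracut --force` run as root is one option).

```shell
#!/bin/sh
# Sketch: replace the files in a firmware directory with a known-good set,
# keeping a backup of the previous contents. replace_firmware_dir is a
# hypothetical helper, not part of any package.
replace_firmware_dir() {
    known_good=${1%/}   # e.g. amdgpu/ extracted from the 20190312 package
    target=${2%/}       # e.g. /lib/firmware/amdgpu
    if [ ! -d "$known_good" ] || [ ! -d "$target" ]; then
        echo "usage: replace_firmware_dir KNOWN_GOOD_DIR TARGET_DIR" >&2
        return 2
    fi
    backup="$target.bak"
    if [ -e "$backup" ]; then
        echo "backup $backup already exists, refusing to overwrite" >&2
        return 1
    fi
    # Keep the current files so the swap can be undone.
    cp -a "$target" "$backup" || return 1
    # Drop the current files, then copy in the known-good ones.
    rm -rf "$target" || return 1
    mkdir -p "$target" || return 1
    cp -a "$known_good/." "$target/" || return 1
    # The initrd still has to be regenerated afterwards (not done here).
}
```

The target directory is emptied first, so files that exist only in the newer firmware set are removed as well. That matches "replacing all files" only loosely; copying over the top without deleting would be the gentler variant.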
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c12

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #12 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812635
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812635&action=edit
dmesg
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c13

--- Comment #13 from Jochen Hansper <hansper@t-online.de> ---
Added dmesg output taken directly after booting kernel
4.12.14-lp151.28.10-default with amdgpu firmware version 20190312 and the
additional command line parameter firmware_class.dyndbg=+p
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c14

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(hansper@t-online.de)

--- Comment #14 from Takashi Iwai <tiwai@suse.com> ---
Thanks! Then the problem is likely a duplicate of the upstream report, and the
problematic firmware is vega10_sos.bin:
  https://bugs.freedesktop.org/show_bug.cgi?id=110733

I'm trying to build a test kernel package containing the patch suggested in
the bug entry above. It's being built in OBS home:tiwai:bsc1143331.
Could you give it a try later with the old and new f/w files?

It'll take some time until the build finishes, maybe an hour or so.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c15

--- Comment #15 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Takashi Iwai from comment #14)
> I'm trying to build a test kernel package containing the patch suggested in
> the bug entry above. It's being built in OBS home:tiwai:bsc1143331.

The package will be available later at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1143331/standard/
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c16

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #16 from Jochen Hansper <hansper@t-online.de> ---
Thanks, your help is much appreciated!

I've tried kernel-default-4.12.14-lp151.1.1.g8560b19.x86_64.rpm from your
repository, with both the old and new firmware versions.

1) Booting with the old firmware version works fine. I'll upload the dmesg
output (+ firmware_class.dyndbg=+p).

2) Booting with the new firmware version fails, unfortunately. It's a bit
different from the standard kernel though: the screens remain completely blank
(with the standard kernel, some output is shown). The system hangs almost
immediately. I could not find any log files for the failed attempt.

If you want to try something else, I don't mind giving it a go.

Cheers,
Jochen
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c17

--- Comment #17 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 812655
  --> http://bugzilla.opensuse.org/attachment.cgi?id=812655&action=edit
dmesg kernel bsc1143331, old firmware files
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c21

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #21 from Jochen Hansper <hansper@t-online.de> ---
I've tried kernel-firmware-20190618-lp151.2.9.1 from your repository at
http://download.opensuse.org/repositories/home:/tiwai:/branches:/openSUSE:/L...
and it works flawlessly.

I've compared /amdgpu from 2.9.1 with that from 2.6.1 and only vega10_sos.bin
has changed (back to 2.3.1), as you said.

Should you need someone to give future amdgpu firmware versions a test run,
I'll gladly do that.

Thanks again for your help and fast support!
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c22

--- Comment #22 from Jochen Hansper <hansper@t-online.de> ---
Created attachment 813542
  --> http://bugzilla.opensuse.org/attachment.cgi?id=813542&action=edit
dmesg output from fw 2.9.1, firmware_class.dyndbg=+p
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c23

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #23 from Takashi Iwai <tiwai@suse.com> ---
Great, thanks for the quick testing.

I've submitted the fixed kernel-firmware package now. It'll be processed and
released as an update package some time later. Please keep my test package
until then.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c37

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #37 from Jochen Hansper <hansper@t-online.de> ---
Hi,

after the recent update to kernel-firmware-20191118 this bug seems to be back
again. The symptoms are the same as in my initial bug report, but this time no
display output is shown at all.

Reverting to a previous kernel-firmware fixes it.

Yours,
Jochen
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c38

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(hansper@t-online.de)

--- Comment #38 from Takashi Iwai <tiwai@suse.com> ---
It's a pity that amdgpu is still broken.

OK, I'll work on reverting the corresponding firmware. To be sure: exactly
which kernel package works on your machine?
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c39

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #39 from Jochen Hansper <hansper@t-online.de> ---
I could be wrong here, but I believe this was never actually fixed for
openSUSE 15.1. The update repository offers the following kernel-firmware
files:

27. Jun 2019   kernel-firmware-20190312-lp151.2.3.1.noarch.rpm
21. Jul 2019   kernel-firmware-20190618-lp151.2.6.1.noarch.rpm
14. Jan 00:14  kernel-firmware-20191118-lp151.2.9.1.noarch.rpm

I've currently reinstalled kernel-firmware-20190312, which works.

Your kernel-firmware-20190618-lp151.2.9.1 (20190618 2.9.1 !) is not available
in the update repositories. I was running with your
kernel-firmware-20190618-lp151.2.9.1 until I installed
kernel-firmware-20191118, which does not work for me.

I could try replacing vega10_sos.bin in 20191118 with that from 20190312 and
see whether that helps as it did before. I'll probably be able to do that over
the weekend, if you think that might be useful.

Thanks!
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c40

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(hansper@t-online.de)

--- Comment #40 from Takashi Iwai <tiwai@suse.com> ---
Thanks, I'm working on the revert.

Meanwhile, could you test the latest kernel-firmware package in the OBS
Kernel:HEAD repo? It contains yet newer amdgpu firmware, and I'd like to
confirm that it's still broken.

Also, I'm building a kernel-firmware package that reverts vega10_sos.bin
again. It's being built in the OBS
home:tiwai:branches:openSUSE:Leap:15.1:Update/kernel-firmware repo. Please
give it a try if the package in OBS Kernel:HEAD still doesn't work.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c41

--- Comment #41 from Takashi Iwai <tiwai@suse.com> ---
Jochen, could you check the new test package, please?
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c42

Jochen Hansper <hansper@t-online.de> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
              Flags|needinfo?(hansper@t-online.de)  |

--- Comment #42 from Jochen Hansper <hansper@t-online.de> ---
Hi Takashi,

sorry for the late reply, but I was feeling a bit under the weather last week.

I've now tried both kernel-firmware versions:

OBS:Head kernel-firmware-20200122-299.1.noarch.rpm still does not work for me.
No output is displayed at all; my machine freezes instantly.

Your package kernel-firmware-20200107-lp151.2.12.1.noarch.rpm from
OBS:home:tiwai:branches:openSUSE:Leap:15.1:Update works perfectly. In your
build, vega10_sos.bin is back to the same version as it was in
kernel-firmware-20190312-lp151.2.3.1.

I've compared the firmware in amdgpu from your build with the files from
OBS:head. A couple of files in OBS:home:tiwai differ from those in OBS:head:
4 files for navi10 and vega10_sos.bin (as mentioned above). The file
navi10_ta.bin is only in OBS:head but not in your build.

Thanks for your help, I'll keep the kernel-firmware from OBS:home:tiwai
installed for now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331
http://bugzilla.opensuse.org/show_bug.cgi?id=1143331#c43

--- Comment #43 from Takashi Iwai <tiwai@suse.com> ---
Thank you for the confirmation! I'm going to push the fix now.