[Bug 1227301] New: Kernel boot crashes on Thinkpad P14s Gen 3 AMD
https://bugzilla.suse.com/show_bug.cgi?id=1227301

Bug ID: 1227301
Summary: Kernel boot crashes on Thinkpad P14s Gen 3 AMD
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.6
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Xen
Assignee: xen-bugs@suse.de
Reporter: tiwai@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 875832
  --> https://bugzilla.suse.com/attachment.cgi?id=875832&action=edit
dmesg with crash of Leap 15.6 kernel

When I boot a recent kernel (openSUSE Leap 15.6 or TW 6.9.x kernel) with Xen (Dom0) on the company's standard laptop (Thinkpad P14s Gen 3 AMD), it crashes with a kernel oops and cannot continue booting.

After searching the net, I found that it crashes while loading the ucsi_acpi driver, and blacklisting that driver does indeed let the boot get further. (As a result, the touchpad and some USB functionality are missing, though.)

Below is the dmesg output after manually loading the ucsi_acpi module.

I checked with the 6.9.7 TW backport kernel, and it hits the same problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c1

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875833
  --> https://bugzilla.suse.com/attachment.cgi?id=875833&action=edit
dmesg from TW kernel
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c2

--- Comment #2 from Takashi Iwai <tiwai@suse.com> ---
Related report on the net:
https://forum.qubes-os.org/t/kernel-panic-during-installation-on-lenovo-thin...
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c3

Jürgen Groß <jgross@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|xen-bugs@suse.de            |jgross@suse.com
                 CC|                            |jgross@suse.com

--- Comment #3 from Jürgen Groß <jgross@suse.com> ---
Created attachment 875846
  --> https://bugzilla.suse.com/attachment.cgi?id=875846&action=edit
Debug patch

Could you try booting with the patch applied to your kernel? You'll need to add "xen_mc_debug" to the kernel command line. The kernel log should then contain more data to narrow down the root cause.
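Before reading the new log, it can help to confirm that the parameter actually reached the dom0 kernel. Below is a minimal Python sketch. It assumes /proc/cmdline on the booted dom0 reflects the dom0 kernel's own arguments (not the hypervisor's). `has_param` is a hypothetical helper and is not part of the debug patch.

```python
# Sketch: check that a boot parameter (here "xen_mc_debug", which the debug
# patch expects) reached the running kernel.
# Assumption: /proc/cmdline holds the dom0 kernel's arguments, not Xen's.
# has_param is a hypothetical helper name.
import shlex


def has_param(cmdline: str, name: str) -> bool:
    """True if `name` appears as a bare flag or as name=value."""
    return any(tok == name or tok.startswith(name + "=")
               for tok in shlex.split(cmdline))


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            line = f.read()
    except OSError:
        line = ""  # not on Linux, or /proc unavailable
    print("xen_mc_debug set:", has_param(line, "xen_mc_debug"))
```

Matching whole tokens rather than substrings avoids false positives from longer parameter names that merely share the prefix.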
https://bugzilla.suse.com/show_bug.cgi?id=1227301

Santiago Zarate <santiago.zarate@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |santiago.zarate@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c4

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875854
  --> https://bugzilla.suse.com/attachment.cgi?id=875854&action=edit
dmesg from the patched 6.9.7 kernel
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c5

--- Comment #5 from Takashi Iwai <tiwai@suse.com> ---
The above is the log from the patched kernel. This time it was booted with nomodeset, but that shouldn't matter. The bug happens right after modprobe of the ucsi_acpi module.

As far as I understand, the second Oops ("BUG: unable to handle page fault for address: ffffc90040715100") happened while reading a byte value via ACPI_GET8(logical_addr_ptr) in acpi_ex_system_memory_space_handler().
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c6

--- Comment #6 from Jürgen Groß <jgross@suse.com> ---
(In reply to Takashi Iwai from comment #5)
> The above is the log from the patched kernel. This time it was booted with nomodeset, but that shouldn't matter. The bug happens right after modprobe of the ucsi_acpi module.
>
> As far as I understand, the second Oops ("BUG: unable to handle page fault for address: ffffc90040715100") happened while reading a byte value via ACPI_GET8(logical_addr_ptr) in acpi_ex_system_memory_space_handler().

This is to be expected, as establishing the mapping failed: the hypervisor returned a negative value when the kernel tried to update a PTE.
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c7

Jürgen Groß <jgross@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #875846|0                           |1
        is obsolete|                            |

--- Comment #7 from Jürgen Groß <jgross@suse.com> ---
Created attachment 875905
  --> https://bugzilla.suse.com/attachment.cgi?id=875905&action=edit
Debug patch V2

Second try, with more data printed in the error case. Can you please replace the first debug patch with this one?
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c8

--- Comment #8 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875916
  --> https://bugzilla.suse.com/attachment.cgi?id=875916&action=edit
dmesg from the v2 patched 6.9.7 kernel
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c9

--- Comment #9 from Jürgen Groß <jgross@suse.com> ---
(In reply to Takashi Iwai from comment #8)
> Created attachment 875916 [details]
> dmesg from the v2 patched 6.9.7 kernel

Thanks, this makes things much clearer. It seems the kernel is trying to map part of the MSI space (physical address range 0xfee00000 - 0xfeeff000). When running as dom0 this should not happen, because the hypervisor owns this region and will deny mapping it. It seems the ucsi driver needs to be made Xen-aware.
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c10

--- Comment #10 from Jürgen Groß <jgross@suse.com> ---
Are you able to tell which I/O resources are at physical address feec2000-feec2fff? You should be able to find out when booting without Xen, via "cat /proc/iomem" and/or "lspci -v". I'm pretty sure the region fee01000-feefffff should only be used as MSI space.
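The /proc/iomem lookup can be sketched in Python. This is a minimal sketch under three assumptions: the usual `start-end : name` line format, two spaces of indentation per nesting level, and running as root on a boot without Xen (unprivileged readers see zeroed addresses). `parse_iomem`, `find_resources` and `main` are hypothetical names.

```python
# Sketch: list /proc/iomem entries that overlap a physical address range.
# Assumptions: "start-end : name" lines, two-space indent per nesting level,
# run as root (otherwise addresses read as zero). Names are hypothetical.
import sys
from typing import List, Tuple

Entry = Tuple[int, int, int, str]  # (depth, start, end, name)


def parse_iomem(text: str) -> List[Entry]:
    """Parse /proc/iomem-style text into (depth, start, end, name) tuples."""
    entries: List[Entry] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // 2
        span, _, name = line.strip().partition(" : ")
        if "-" not in span:
            continue  # skip anything that is not a range line
        lo, hi = span.split("-", 1)
        entries.append((depth, int(lo, 16), int(hi, 16), name))
    return entries


def find_resources(text: str, start: int, end: int) -> List[Entry]:
    """Return entries whose inclusive range intersects [start, end]."""
    return [e for e in parse_iomem(text) if e[1] <= end and start <= e[2]]


def main(path: str = "/proc/iomem") -> None:
    try:
        with open(path) as f:
            text = f.read()
    except OSError as err:
        print(f"cannot read {path}: {err}", file=sys.stderr)
        return
    hits = find_resources(text, 0xFEEC2000, 0xFEEC2FFF)
    for depth, lo, hi, name in hits:
        print(f"{'  ' * depth}{lo:08x}-{hi:08x} : {name}")
    if not hits:
        print("no resource claims feec2000-feec2fff")


if __name__ == "__main__":
    main()
```

Keeping the nesting depth makes it easy to see whether a hit is a top-level claim (for example a "Local APIC" entry) or a child reservation inside one.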
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c11

--- Comment #11 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875936
  --> https://bugzilla.suse.com/attachment.cgi?id=875936&action=edit
logs from xen and normal boots
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c12

--- Comment #12 from Jürgen Groß <jgross@suse.com> ---
There seems to be no BAR located in the area the kernel is trying to map. Could you please provide an acpidump?
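The BAR check from this comment can be approximated from sysfs. This is a minimal Python sketch under several assumptions. It expects the Linux `/sys/bus/pci/devices/*/resource` format: one `start end flags` hex triple per line, all zeros for unused entries, and the first six lines being the standard BARs. It also assumes the kernel's IORESOURCE_MEM flag value of 0x200. `parse_resource_file`, `bars_covering` and `read_sysfs` are hypothetical names.

```python
# Sketch: does any PCI memory resource cover a physical address range?
# Assumptions: sysfs "resource" files with one "start end flags" hex triple
# per line (zeros when unused) and IORESOURCE_MEM == 0x200. Names are
# hypothetical helpers, not an existing tool.
import glob
import os
from typing import Iterable, Iterator, List, Tuple

IORESOURCE_MEM = 0x00000200


def parse_resource_file(text: str) -> List[Tuple[int, int, int]]:
    """Return (index, start, end) for each populated memory resource line."""
    out = []
    for idx, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(x, 16) for x in fields)
        if end and flags & IORESOURCE_MEM:
            out.append((idx, start, end))
    return out


def bars_covering(devices: Iterable[Tuple[str, str]], lo: int, hi: int
                  ) -> Iterator[Tuple[str, int, int, int]]:
    """devices: (pci_address, resource_text) pairs; yield overlapping ranges."""
    for bdf, text in devices:
        for idx, start, end in parse_resource_file(text):
            if start <= hi and lo <= end:
                yield bdf, idx, start, end


def read_sysfs(root: str = "/sys/bus/pci/devices") -> Iterator[Tuple[str, str]]:
    for path in sorted(glob.glob(os.path.join(root, "*", "resource"))):
        try:
            with open(path) as f:
                yield os.path.basename(os.path.dirname(path)), f.read()
        except OSError:
            continue  # unreadable entry; skip it


if __name__ == "__main__":
    hits = list(bars_covering(read_sysfs(), 0xFEEC2000, 0xFEEC2FFF))
    for bdf, idx, start, end in hits:
        print(f"{bdf} resource[{idx}]: {start:#x}-{end:#x}")
    if not hits:
        print("no PCI memory resource overlaps feec2000-feec2fff")
```

An empty result here would agree with the observation above that no BAR sits in the range. That points toward firmware (ACPI) describing the region, which is what an acpidump can confirm.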
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c13

--- Comment #13 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875954
  --> https://bugzilla.suse.com/attachment.cgi?id=875954&action=edit
acpidump output
https://bugzilla.suse.com/show_bug.cgi?id=1227301#c14

--- Comment #14 from Takashi Iwai <tiwai@suse.com> ---
Created attachment 875955
  --> https://bugzilla.suse.com/attachment.cgi?id=875955&action=edit
hwinfo output