[Bug 1190434] New: crash is unable to find debuginfo of usrmerged kernel
https://bugzilla.suse.com/show_bug.cgi?id=1190434

Bug ID: 1190434
Summary: crash is unable to find debuginfo of usrmerged kernel
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: fvogt@suse.com
QA Contact: qa-bugs@suse.de
CC: dmair@suse.com, ptesarik@suse.com
Found By: ---
Blocker: ---

With a usrmerged kernel, vmlinux(.xz) is in the modules directory for the kernel:
/usr/lib/modules/5.14.1-1-default/vmlinux.xz

This means the debug info ends up here:
/usr/lib/debug/usr/lib/modules/5.14.1-1-default/vmlinux.debug

Crash only looks at the old location though, and fails:

please wait... (uncompressing /usr/lib/modules/5.14.1-1-default/vmlinux.xz)
readmem: read_diskdump()
NOTE: gnu_debuglink file: vmlinux.debug crc32: f73ed3b0
/usr/lib/modules/5.14.1-1-default//vmlinux.debug: not readable/found
/usr/lib/modules/5.14.1-1-default//.debug/vmlinux.debug: not readable/found
/usr/lib/debug/boot/vmlinux.debug: not readable/found
crash: /var/tmp/vmlinux.xz_DZPxr5: no debugging data available
crash: vmlinux.debug: debuginfo file not found
crash: either install the appropriate kernel debuginfo package, or copy vmlinux.debug to this machine

--
You are receiving this mail because:
You are on the CC list for the bug.
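The lookup described in this report can be sketched roughly as follows. This is not crash's actual code. It is a minimal Python illustration that rebuilds the three locations shown in the log and adds the usrmerge location named in the report. The function name `candidate_debuginfo_paths` and its parameters are hypothetical.

```python
import posixpath


def candidate_debuginfo_paths(kernel_path, debuglink="vmlinux.debug",
                              debug_root="/usr/lib/debug"):
    """Return debuginfo locations to try, in order (illustrative only)."""
    kernel_dir = posixpath.dirname(posixpath.normpath(kernel_path))
    return [
        # The three locations that appear in the log from the report.
        posixpath.join(kernel_dir, debuglink),
        posixpath.join(kernel_dir, ".debug", debuglink),
        posixpath.join(debug_root, "boot", debuglink),
        # Assumed addition: debug_root followed by the kernel's own directory,
        # which is where the report says the usrmerged debuginfo is installed.
        posixpath.join(debug_root, kernel_dir.lstrip("/"), debuglink),
    ]


if __name__ == "__main__":
    for path in candidate_debuginfo_paths(
            "/usr/lib/modules/5.14.1-1-default/vmlinux.xz"):
        print(path)
```

Only the last entry is new relative to the log. The other three mirror the paths crash reported as "not readable/found".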
https://bugzilla.suse.com/show_bug.cgi?id=1190434

Fabian Vogt <fvogt@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC | |lnussel@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1190434

Ludwig Nussel <lnussel@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Blocks | |1029961
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c2

--- Comment #2 from Fabian Vogt <fvogt@suse.com> ---
Ping - any news here? The suggested SR was declined.

https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c3

--- Comment #3 from Fabian Vogt <fvogt@suse.com> ---
Ping again.

https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c4

--- Comment #4 from Fabian Vogt <fvogt@suse.com> ---
Ping.

https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c5

--- Comment #5 from Ludwig Nussel <lnussel@suse.com> ---
looks like I rebased the patch and made it conditional but looks like I didn't submit it again. Not sure what happened. Anyway
https://build.opensuse.org/request/show/966765
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c6

--- Comment #6 from openQA Review <openqa-review@suse.de> ---
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-extra@64bit_virtio-2G
https://openqa.opensuse.org/tests/2303420#step/kdump_and_crash/1

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c7

--- Comment #7 from openQA Review <openqa-review@suse.de> ---
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-extra@64bit_virtio-2G
https://openqa.opensuse.org/tests/2398376#step/kdump_and_crash/1

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
https://bugzilla.suse.com/show_bug.cgi?id=1190434

David Mair <dmair@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status |NEW |IN_PROGRESS
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c17

--- Comment #17 from David Mair <dmair@suse.com> ---
I reproduced the following with Tumbleweed's current crash version 7.3.1 and kernel 6.0.8-1:

Dwarf error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)

I'm working on a crash 8.0.2 upgrade and as I expected, that error is resolved with crash 8.0.2 and kernel 6.0.8-1 vmcore. However, I do have a subsequent problem where crash can't read the linux banner from core memory leading to a claimed mismatched vmcore and vmlinux even though they have filenames with the same version number. Still working on it...
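For context on the "wrong version in compilation unit header" message: every DWARF compilation unit starts with a length field followed by a 2-byte version, and a reader that only knows versions 2 to 4 will reject a unit whose version field is 5. The following is a minimal Python sketch of that check, not the code used by crash or its embedded gdb. The function names are hypothetical.

```python
import struct


def cu_header_version(data, little_endian=True):
    """Return the DWARF version stored in a compilation unit header."""
    endian = "<" if little_endian else ">"
    (unit_length,) = struct.unpack_from(endian + "I", data, 0)
    # An initial length of 0xffffffff marks the 64-bit DWARF format,
    # in which the real length follows as an 8-byte field.
    offset = 12 if unit_length == 0xFFFFFFFF else 4
    (version,) = struct.unpack_from(endian + "H", data, offset)
    return version


def is_supported(version, supported=(2, 3, 4)):
    """Mimic a reader that only understands the listed DWARF versions."""
    return version in supported


if __name__ == "__main__":
    # Hand-built 32-bit header prefix: unit_length=0x40, version=5.
    header = struct.pack("<IH", 0x40, 5)
    version = cu_header_version(header)
    print(version, is_supported(version))
```

The sample header is constructed by hand for illustration and is not taken from a real vmlinux.debug file.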
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c18

--- Comment #18 from David Mair <dmair@suse.com> ---
(In reply to David Mair from comment #17)
> I reproduced the following with Tumbleweed's current crash version 7.3.1 and kernel 6.0.8-1:
>
> Dwarf error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)
>
> I'm working on a crash 8.0.2 upgrade and as I expected, that error is resolved with crash 8.0.2 and kernel 6.0.8-1 vmcore. However, I do have a subsequent problem where crash can't read the linux banner from core memory leading to a claimed mismatched vmcore and vmlinux even though they have filenames with the same version number.

The good news is that new problem of mine is known to upstream crash and has an accepted patch. I'm trying it today.
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c19

David Mair <dmair@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Flags | |needinfo?

--- Comment #19 from David Mair <dmair@suse.com> ---
(In reply to David Mair from comment #18)
> (In reply to David Mair from comment #17)
> > I reproduced the following with Tumbleweed's current crash version 7.3.1 and kernel 6.0.8-1:
> >
> > Dwarf error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)
> >
> > I'm working on a crash 8.0.2 upgrade and as I expected, that error is resolved with crash 8.0.2 and kernel 6.0.8-1 vmcore. However, I do have a subsequent problem where crash can't read the linux banner from core memory leading to a claimed mismatched vmcore and vmlinux even though they have filenames with the same version number.
>
> The good news is that new problem of mine is known to upstream crash and has an accepted patch. I'm trying it today.

Well, I'm afraid not... I have built a version of crash, 8.0.2, that appears to support DWARF 5. However, it fails to verify that the kernel and the coredump are the same kernel (6.0.x) version. The upstream fix for the reported error message ("linux_banner is an invalid address") is both:

1) Already present in the source of crash I'm building; and
2) A change to support all representations of a symbol being present in the initialized data section of the kernel binary (the patch added 'd' location type support to the existing 'D' location type only).

BUT, when I dump the symbols from the kernel binary, linux_banner is in the 'D' initialized data section anyway, so the patch wasn't needed for the Tumbleweed kernel (the difference depends on the compiler used).

It should be noted that when creating a coredump on Tumbleweed (kernel 6.0.8 and 6.0.10), makedumpfile reports that "this kernel isn't supported by makedumpfile". My end result in crash is that linux_banner, a valid symbol in the kernel binary, has a kernel address in the coredump (accessed via KVADDR() in crash) that is considered invalid, which prevents verifying that the Linux version in the binary and the coredump are the same. I therefore offer the opinion that makedumpfile appears not to be up to date for kernel 6, as observed when using a crash version from a home project that is up to date for kernel 6.

Unless someone corrects my explanation of the error I currently observe in crash 8.0.2, I believe at least makedumpfile needs to be fixed as well as crash to wholly resolve the reported problem. Input welcomed...
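The 'D' versus 'd' distinction in this comment is the one-letter symbol type printed by `nm`: both letters denote the initialized data section, uppercase for a global symbol and lowercase for a local one. One way to check a symbol's type, sketched in Python over captured `nm` output, is shown below. The helper names are hypothetical and the sample addresses are made up for illustration, not taken from a real Tumbleweed kernel.

```python
def symbol_type(nm_output, name):
    """Return the one-letter nm type for `name`, or None if it is absent."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols are printed as: <address> <type> <name>
        if len(fields) == 3 and fields[2] == name:
            return fields[1]
    return None


def in_initialized_data(sym_type):
    """'D' is a global and 'd' a local symbol in the initialized data section."""
    return sym_type in ("D", "d")


# Hypothetical sample output; the addresses are invented for illustration.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff82000100 D linux_banner
"""

if __name__ == "__main__":
    kind = symbol_type(SAMPLE, "linux_banner")
    print(kind, in_initialized_data(kind))
```

On a real system the input would come from running `nm` on the uncompressed vmlinux. The sample text stands in for that output here.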
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c20

--- Comment #20 from David Mair <dmair@suse.com> ---
I have a home project version of crash that appears to decode the DWARF 5 debug file format. However, I don't believe it is enough to resolve the problem using coredumps on Tumbleweed. Using kernel 6.0.10-1-default, when I trigger a coredump, the following is displayed twice after the kdump kernel begins to start:

The kernel version is unsupported
The makedumpfile operation may be incomplete

Then, when I attempt to use the created coredump file, the memory for the symbol linux_banner appears not to be present in the coredump. The result is that crash will not start because it can't verify that the kernel binary and the dumped kernel are the same version. As a matter of care I repeated the test multiple times, using all the kernel dump format options that create a file. In every case crash reports that the coredump has missing memory and fails to start.

Given that creating the coredump reported that the kernel version is "unsupported" and that the makedumpfile operation may be "incomplete", and that crash appears to find missing memory in the coredump, I think we need to consider why the attempt to create the coredump behaves the way it does before further attempts to resolve crash.
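To illustrate why missing memory in the dump blocks the version check: the comparison needs the "Linux version ..." banner text from both the kernel binary and the dumped memory. The sketch below is a simplification that scans raw bytes for the banner. crash itself reads linux_banner through the symbol's address instead, and the function names here are hypothetical.

```python
import re

# The kernel banner begins with "Linux version <release>".
BANNER_RE = re.compile(rb"Linux version (\S+)")


def banner_release(blob):
    """Return the release string from the first banner found, or None."""
    match = BANNER_RE.search(blob)
    return match.group(1).decode("ascii", "replace") if match else None


def same_kernel(vmlinux_blob, dump_blob):
    """True only if both blobs contain a banner and the releases match."""
    binary_release = banner_release(vmlinux_blob)
    dump_release = banner_release(dump_blob)
    return binary_release is not None and binary_release == dump_release


if __name__ == "__main__":
    # Made-up byte strings standing in for the binary and the dump.
    vmlinux = b"\x00\x00Linux version 6.0.10-1-default (example build)\x00"
    incomplete_dump = b"\x00" * 64  # banner page absent from the dump
    print(same_kernel(vmlinux, vmlinux), same_kernel(vmlinux, incomplete_dump))
```

If the page holding the banner was not written to the dump, there is nothing to compare against, and the check fails regardless of whether the versions really match.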
https://bugzilla.suse.com/show_bug.cgi?id=1190434
https://bugzilla.suse.com/show_bug.cgi?id=1190434#c21

--- Comment #21 from David Mair <dmair@suse.com> ---
On my test system, kexec is the latest upstream release and makedumpfile is one release behind the latest upstream release (each as present in a fully patched Tumbleweed). The test system is a KVM/QEMU x86_64 VM, though I doubt that would cause coredump creation to report that the kernel version itself is unsupported. Instruction is welcome.
https://bugzilla.suse.com/show_bug.cgi?id=1190434

Petr Vorel <petr.vorel@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC | |petr.vorel@suse.com