[Bug 1220501] New: Kernel panics on EEVDF
https://bugzilla.suse.com/show_bug.cgi?id=1220501

Bug ID: 1220501
Summary: Kernel panics on EEVDF
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: l4rryc0n5014@gmail.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 873070
  --> https://bugzilla.suse.com/attachment.cgi?id=873070&action=edit
Compressed copy of /var/log/journal, accessed from Windows with WinBTRFS

My laptop has been kernel panicking quite frequently as of late. At first I thought it was because of the boot options `pcie_aspm=force acpi_backlight=native`, but it wasn't. I dropped into recovery mode to check `journalctl` and found that the recent panics were due to:

    EEVDF scheduling fail, picking leftmost
    BUG: kernel NULL pointer dereference, address: 00000000000000a0

I saw a lot of `EEVDF scheduling fail` entries before the log cuts off for the next boot session, since I shut the laptop down via ACPI each time it ground to a halt. Sometimes I saw two additional entries in the journal:

    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page

A quick search on the first two entries yielded <https://lkml.org/lkml/2024/2/19/678>. I'm not sure how to read it, considering it's Linux kernel stuff.

I also thought it was due to data corruption or something similar on `/` and `/home`, but `btrfs check --force --check-data-csum` said both were healthy, and I had reinstalled openSUSE on Feb 19 after zeroing the two partitions for good measure (due to mishaps that compromised data integrity).
The only kernel-related things I installed were VBox, Docker, KVM and QEMU, and the NVIDIA driver, with `nouveau` blacklisted following https://en.opensuse.org/SDB:NVIDIA_the_hard_way; the rest came from https://en.opensuse.org/SDB:NVIDIA_drivers for G06, with a bunch more G06 stuff installed.

Included is a compressed copy of /var/log/journal, which should include my laptop model. To be safe, though: I am using an Acer Nitro 5 (AN515-46-R6QR) running an AMD Ryzen 7 6800H with an NVIDIA 3060 Mobile, upgraded to 32GB of RAM from the stock configuration of 16GB, with an additional 2TB drive added to the spare M.2 slot.

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c1

Chiyu Miyuki <l4rryc0n5014@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #873070|0                           |1
        is obsolete|                            |

--- Comment #1 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
Created attachment 873072
  --> https://bugzilla.suse.com/attachment.cgi?id=873072&action=edit
Compressed copy of /var/log/journal, including the latest panic with proper dump

The latest crash can be found in the boot session prior to 96b349deff0c4e2caaf2620e0901e70b. I was browsing Discord when the panic happened. I let the laptop carry on for around 2 minutes before shutting it down via ACPI, then dropped into recovery mode to verify that the log existed.
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c4

--- Comment #4 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
(In reply to Takashi Iwai from comment #2)
> Could you rather extract only the relevant message instead of the whole journals?
> Also, have you tried the patch mentioned in the LKML thread?

See attachment `EEVDF_Panic.log`
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c5

--- Comment #5 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
Created attachment 873125
  --> https://bugzilla.suse.com/attachment.cgi?id=873125&action=edit
EEVDF panic's log extracted from journal
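For readers who want to do a similar extraction (trimming a full journal down to the relevant messages, as requested in comment #2), here is a minimal sketch in Python. It is not how the attached log was actually produced, which the thread does not say. It assumes the kernel messages of the crashed boot have already been exported to plain text, for example with `journalctl -k -b -1 > boot.txt`. The function name `extract_relevant` and its `context` parameter are hypothetical, and the patterns are simply the strings quoted in this report.

```python
import re
from typing import Iterable, List

# Patterns taken from the messages quoted in this bug report.
# Extend the list if the panic on your machine logs something different.
PANIC_PATTERNS = [
    re.compile(r"EEVDF scheduling fail"),
    re.compile(r"BUG: kernel NULL pointer dereference"),
    re.compile(r"#PF: "),
]


def extract_relevant(lines: Iterable[str], context: int = 2) -> List[str]:
    """Return every matching line plus `context` lines on either side.

    Lines are returned in their original order and never duplicated,
    even when the context windows of two matches overlap.
    """
    buffered = list(lines)
    keep = set()
    for i, line in enumerate(buffered):
        if any(p.search(line) for p in PANIC_PATTERNS):
            first = max(0, i - context)
            last = min(len(buffered), i + context + 1)
            keep.update(range(first, last))
    return [buffered[i] for i in sorted(keep)]
```

To use it, open the exported text file, pass the file object to `extract_relevant`, and write the returned lines to a new file for attaching. A larger `context` keeps more of the surrounding call trace, which is usually what a kernel developer wants to see.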
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c6

--- Comment #6 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
(In reply to Takashi Iwai from comment #3)
> (In reply to Takashi Iwai from comment #2)
> > Also, have you tried the patch mentioned in the LKML thread?
>
> The patch seems applicable only to 6.8-rc, though, and you're running TW kernels, so likely no-go with it.
>
> There have already been many changes in EEVDF code between 6.7 and 6.8-rc. If you want more stability and need the downstream stuff that can't be easily built with the latest *-rc kernel, better to stick with another, traditional scheduler.
>
> OTOH, if you'd like to help with debugging and development, you'd need to switch to the latest kernel (6.8-rc), and you'll have to deal with other downstream stuff by yourself. If you want it, let us know. I can provide a patched kernel package, too.

I have not tried the patch, since I prefer *not* touching the kernel if I can help it. And what do you mean by "the downstream stuff"?

Also, I tried updating to snapshot 20240226. I still saw a panic. It's frustrating that I cannot reliably make the thing panic.
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c8

--- Comment #8 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
I...think I'll wait for 6.8
https://bugzilla.suse.com/show_bug.cgi?id=1220501#c9

Chiyu Miyuki <l4rryc0n5014@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Chiyu Miyuki <l4rryc0n5014@gmail.com> ---
Issue appears resolved. I'm just a dumbo who forgot to update the bug report.