[Bug 1236380] New: BUG: Unable to handle kernel data access on read at 0x17f9eb0000

https://bugzilla.suse.com/show_bug.cgi?id=1236380

Bug ID: 1236380
Summary: BUG: Unable to handle kernel data access on read at 0x17f9eb0000
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.6
Hardware: PowerPC-64
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: gaurav.pathak@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 879937
  --> https://bugzilla.suse.com/attachment.cgi?id=879937&action=edit
hwinfo

The issue occurs on a POWER8E machine (PowerNV 8247-22L) running kernel 6.4.0-150600.23.33-default and leaves the machine unreachable.

--
You are receiving this mail because:
You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c1

--- Comment #1 from Gaurav Pathak <gaurav.pathak@suse.com> ---
Created attachment 879938
  --> https://bugzilla.suse.com/attachment.cgi?id=879938&action=edit
error log

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c4

--- Comment #4 from Gaurav Pathak <gaurav.pathak@suse.com> ---
The machine crashed again. Let me know if you want IPMI access to investigate or collect logs.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c6

--- Comment #6 from Gaurav Pathak <gaurav.pathak@suse.com> ---
Created attachment 880152
  --> https://bugzilla.suse.com/attachment.cgi?id=880152&action=edit
crash-log

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c7

--- Comment #7 from Gaurav Pathak <gaurav.pathak@suse.com> ---
This time it is a `watchdog: BUG: soft lockup`, which looks similar and may be related to https://bugzilla.opensuse.org/show_bug.cgi?id=1236379. Please refer to the attached log file.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c8

Gaurav Pathak <gaurav.pathak@suse.com> changed:

What      |Removed |Added
----------------------------------------------------------------------------
Whiteboard|        |https://progress.opensuse.org/issues/169939

--- Comment #8 from Gaurav Pathak <gaurav.pathak@suse.com> ---
This bug is related to:
- https://bugzilla.suse.com/show_bug.cgi?id=1227616
- https://bugzilla.suse.com/show_bug.cgi?id=1235450

Related whiteboard: https://progress.opensuse.org/issues/162296
Original whiteboard: https://progress.opensuse.org/issues/169939

I filed one more bug because it happened on the same machine; I think it is also related: https://bugzilla.opensuse.org/show_bug.cgi?id=1236379

https://bugzilla.suse.com/show_bug.cgi?id=1236380

Oliver Kurz <okurz@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |okurz@suse.com

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c10

--- Comment #10 from Gaurav Pathak <gaurav.pathak@suse.com> ---
(In reply to Takashi Iwai from comment #9)
> (In reply to Gaurav Pathak from comment #8)
> > This Bug is related to
> > - https://bugzilla.suse.com/show_bug.cgi?id=1227616
> > - https://bugzilla.suse.com/show_bug.cgi?id=1235450
>
> Hmm, how do you conclude that...? Is the system with firewalld? And if you turn it off, it goes away?
Since @Dominik Heidler already encountered this issue on x86, I tried his solution mentioned in https://progress.opensuse.org/issues/162296#note-33: I disabled firewalld and added a custom nftables systemd service. The Power8 machine still crashes with this solution, though not immediately (within 20 minutes of booting up) but after roughly 12-18 hours or more.
> This pattern of crash is often rather a side-effect of memory corruption or such, too.
>
> As Michal pointed in another entry, there have been a few powerpc issues. Could you verify with the latest SLE15-SP6 KOTD?
Right now, the machine is running kernel 6.4.0-150600.23.33-default. I have restored firewalld (2.0.1-150600.3.5.1) and disabled the custom nftables systemd service to verify whether the Power8 machine still runs into a similar kernel crash. If it does, I will follow your suggestion and try the SLE15-SP6 KOTD.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c12

--- Comment #12 from Gaurav Pathak <gaurav.pathak@suse.com> ---
Created attachment 880383
  --> https://bugzilla.suse.com/attachment.cgi?id=880383&action=edit
kernel-6.4.0-150600.23.38-default-log

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c13

--- Comment #13 from Gaurav Pathak <gaurav.pathak@suse.com> ---
The qa-power8 machine crashed again. Because a continuous update service keeps running on it, the kernel was upgraded to 6.4.0-150600.23.38-default.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c17

--- Comment #17 from Gaurav Pathak <gaurav.pathak@suse.com> ---
Created attachment 880546
  --> https://bugzilla.suse.com/attachment.cgi?id=880546&action=edit
Kernel Crash - vmcore and dmesg

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c18

--- Comment #18 from Gaurav Pathak <gaurav.pathak@suse.com> ---
(In reply to Gaurav Pathak from comment #17)
> Created attachment 880546 [details]
> Kernel Crash - vmcore and dmesg

The vmcore file is about 188MB, so I uploaded it to Google Drive.

https://bugzilla.suse.com/show_bug.cgi?id=1236380#c19

--- Comment #19 from Gaurav Pathak <gaurav.pathak@suse.com> ---
(In reply to Gaurav Pathak from comment #18)
> (In reply to Gaurav Pathak from comment #17)
> > Created attachment 880546 [details]
> > Kernel Crash - vmcore and dmesg
>
> The vmcore file is about 188MB, so, I uploaded to google drive.

Correction: it's around 280MB.