[Bug 1201644] New: Kernel crashes on Dell PowerEdge R340 and R440 after latest kernel upgrade
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644

Bug ID: 1201644
Summary: Kernel crashes on Dell PowerEdge R340 and R440 after latest kernel upgrade
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.3
Hardware: x86-64
OS: openSUSE Leap 15.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: r.ronneburger@fio.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

On several Dell PowerEdge servers that ran without problems on 15.3 for a year we now have crashes on boot after upgrading to 5.3.18-150300.59.81.1. Sometimes it's barely possible to log in before the kernel crashes. Booting with the previous kernel (.76) works flawlessly. Firmware and BIOS upgrades did not help to solve this.

The message before the crash in /var/log/messages on one machine is:

2022-07-19T13:39:27.440906+02:00 hostname kernel: [ 50.541883] BUG: kernel NULL pointer dereference, address: 0000000000000000
2022-07-19T13:39:27.440917+02:00 hostname kernel: [ 50.541887] #PF: supervisor instruction fetch in kernel mode
2022-07-19T13:39:27.440918+02:00 hostname kernel: [ 50.541888] #PF: error_code(0x0010) - not-present page
2022-07-19T13:39:27.440919+02:00 hostname kernel: [ 50.541890] PGD 0 P4D 0
2022-07-19T13:39:27.440919+02:00 hostname kernel: [ 50.541891] Oops: 0010 [#2] SMP PTI
2022-07-19T13:39:27.440920+02:00 hostname kernel: [ 50.541893] CPU: 0 PID: 6879 Comm: iptables Tainted: G D X N 5.3.18-150300.59.81-default #1 SLE15-SP3
2022-07-19T13:39:27.440920+02:00 hostname kernel: [ 50.541895] Hardware name: Dell Inc. PowerEdge R340/045M96, BIOS 2.9.1 03/23/2022
2022-07-19T13:39:27.440920+02:00 hostname kernel: [ 50.541898] RIP: 0010:0x0
2022-07-19T13:39:27.440922+02:00 hostname kernel: [ 50.541900] Code: Bad RIP value.
2022-07-19T13:39:27.440922+02:00 hostname kernel: [ 50.541901] RSP: 0000:fffffe0000009ee0 EFLAGS: 00010046
2022-07-19T13:39:27.440922+02:00 hostname kernel: [ 50.541903] RAX: 0000000000000001 RBX: ffff99f41e800000 RCX: 0000000000000048
2022-07-19T13:39:27.440923+02:00 hostname kernel: [ 50.541904] RDX: 0000000000000000 RSI: ffffffffb9c018af RDI: 00005586ead591a0
2022-07-19T13:39:27.440923+02:00 hostname kernel: [ 50.541906] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
2022-07-19T13:39:27.440924+02:00 hostname kernel: [ 50.541907] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
2022-07-19T13:39:27.440924+02:00 hostname kernel: [ 50.541909] R13: 0000000000000000 R14: 0000000858600001 R15: 0000000000000001
2022-07-19T13:39:27.440924+02:00 hostname kernel: [ 50.541911] FS: 00007ffa5ea22440(0000) GS:ffff99f41e800000(0000) knlGS:0000000000000000
2022-07-19T13:39:27.440925+02:00 hostname kernel: [ 50.541912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-07-19T13:39:27.440925+02:00 hostname kernel: [ 50.541914] CR2: ffffffffffffffd6 CR3: 0000000858600001 CR4: 00000000003706f0
2022-07-19T13:39:27.440925+02:00 hostname kernel: [ 50.541915] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-07-19T13:39:27.440926+02:00 hostname kernel: [ 50.541917] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-07-19T13:39:27.440926+02:00 hostname kernel: [ 50.541918] Call Trace:
2022-07-19T13:39:27.440926+02:00 hostname kernel: [ 50.541920] <NMI>
2022-07-19T13:39:27.440927+02:00 hostname kernel: [ 50.541924] ? end_repeat_nmi+0x7/0x6d
2022-07-19T13:39:27.440927+02:00 hostname kernel: [ 50.541926] ? page_fault+0x8/0x50
2022-07-19T13:39:27.440927+02:00 hostname kernel: [ 50.541928] </NMI>
2022-07-19T13:39:27.440928+02:00 hostname kernel: [ 50.541929] <ENTRY_TRAMPOLINE>
2022-07-19T13:39:27.440928+02:00 hostname kernel: [ 50.541931] ? error_entry+0x8b/0x150
2022-07-19T13:39:27.440928+02:00 hostname kernel: [ 50.541933] ? page_fault+0x8/0x50
2022-07-19T13:39:27.440929+02:00 hostname kernel: [ 50.541935] </ENTRY_TRAMPOLINE>
2022-07-19T13:39:27.440929+02:00 hostname kernel: [ 50.541936] Modules linked in: xt_policy xt_recent nf_log_ipv6 xt_MASQUERADE xt_limit xt_nat nf_log_ipv4 nf_log_common iptable_nat xt_LOG mpt3sas raid_class scfkill xt_pkttype xt_tcpudp ip6t_REJECT ipt_REJECT iptable_filter bpfilter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp ip_tables xt_conntrack nf_conntrack nf_(X) dmi_sysfs msr intel_rapl_msr intel_rapl_common intel_pmc_core_pltdrv(N) intel_pmc_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm dcdbas(X) mgag20_kms_helper idma64 aesni_intel tg3 cec rc_core cdc_ether syscopyarea usbnet crypto_simd mei_me cryptd sysfillrect sysimgblt mii intel_lpss_pci i40e pcspkr fb_sys_fops libphy i2c_i801 ipmi_si glue_helper
2022-07-19T13:39:27.440930+02:00 hostname kernel: [ 50.541953] intel_lpss mei ipmi_devintf ipmi_msghandler intel_pch_thermal ie31200_edac ac button fuse drm configfs btrfs libcrc32c xor raid6_pq sd_mod t10_pi video pinctrl_cannonlake sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod
2022-07-19T13:39:27.440930+02:00 hostname kernel: [ 50.541975] Supported: No, Unsupported modules are loaded
2022-07-19T13:39:27.440930+02:00 hostname kernel: [ 50.541977] CR2: 0000000000000000
2022-07-19T13:39:27.440931+02:00 hostname kernel: [ 50.541978] ---[ end trace 929840a74a61d4f6 ]---
2022-07-19T13:39:27.440931+02:00 hostname kernel: [ 50.544609] RIP: 0010:0x0
2022-07-19T13:39:27.440931+02:00 hostname kernel: [ 50.544612] Code: Bad RIP value.
2022-07-19T13:39:27.440932+02:00 hostname kernel: [ 50.544613] RSP: 0018:fffffe000018aee0 EFLAGS: 00010046
2022-07-19T13:39:27.440932+02:00 hostname kernel: [ 50.544615] RAX: 0000000000000001 RBX: ffff99f41e9c0000 RCX: 0000000000000048
2022-07-19T13:39:27.440932+02:00 hostname kernel: [ 50.544617] RDX: 0000000000000000 RSI: ffffffffb9c018af RDI: ffff99ecc7152800
2022-07-19T13:39:27.440933+02:00 hostname kernel: [ 50.544618] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
2022-07-19T13:39:27.440933+02:00 hostname kernel: [ 50.544620] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
2022-07-19T13:39:27.440933+02:00 hostname kernel: [ 50.544621] R13: 0000000000000000 R14: 0000000847cd8005 R15: 0000000000000001
2022-07-19T13:39:27.440934+02:00 hostname kernel: [ 50.544623] FS: 00007ffa5ea22440(0000) GS:ffff99f41e800000(0000) knlGS:0000000000000000
2022-07-19T13:39:27.440934+02:00 hostname kernel: [ 50.544625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-07-19T13:39:27.440934+02:00 hostname kernel: [ 50.544626] CR2: ffffffffffffffd6 CR3: 0000000858600001 CR4: 00000000003706f0
2022-07-19T13:39:27.440934+02:00 hostname kernel: [ 50.544642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-07-19T13:39:27.440935+02:00 hostname kernel: [ 50.544643] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

--
You are receiving this mail because:
You are on the CC list for the bug.
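For readers comparing reports from several machines, the key fields of an oops like the one above (the BUG message, the faulting process, and the kernel release) can be pulled out of syslog lines with a few lines of Python. This is only a sketch: the function name `summarize_oops` and the regular expression are illustrative assumptions, not part of any kernel or SUSE tooling, and the pattern is tailored to the line format shown in this report.

```python
import re

# Matches the "CPU: ... PID: ... Comm: ... <kernel release>" line of an oops.
# Assumption: the release looks like "5.3.18-150300.59.81-default".
OOPS_CPU_LINE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)"
    r".*?(?P<release>\d+\.\d+\.\d+-\S+)"
)


def summarize_oops(lines):
    """Collect the first BUG message and the first CPU/PID/Comm line found."""
    summary = {}
    for line in lines:
        if "BUG:" in line and "bug" not in summary:
            summary["bug"] = line.split("BUG:", 1)[1].strip()
        match = OOPS_CPU_LINE.search(line)
        if match and "comm" not in summary:
            summary.update(match.groupdict())
    return summary


if __name__ == "__main__":
    sample = [
        "2022-07-19T13:39:27.440906+02:00 hostname kernel: [ 50.541883] "
        "BUG: kernel NULL pointer dereference, address: 0000000000000000",
        "2022-07-19T13:39:27.440920+02:00 hostname kernel: [ 50.541893] "
        "CPU: 0 PID: 6879 Comm: iptables Tainted: G D X N "
        "5.3.18-150300.59.81-default #1 SLE15-SP3",
    ]
    for key, value in summarize_oops(sample).items():
        print(f"{key}: {value}")
```

On a real system the same function could be fed the lines of /var/log/messages; that file usually requires root to read.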
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c2

Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |admin@fph.physik.uni-karlsruhe.de

--- Comment #2 from Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de> ---
Same here on a simple desktop (Fujitsu P958) with kernel 5.3.18-150300.59.81: crashes after some minutes. Reverting to previous kernel (5.3.18-150300.59.76): ok.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644

Stanley Miller <suse@stanmiller.info> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |suse@stanmiller.info
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c11

--- Comment #11 from Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de> ---
May I just ask: is kernel-*-5.3.18-150300.59.81.1 *still* in the repos and offered to openSUSE 15.3 systems? To me it looks that way.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c12

Volker Koehne <koehnevolker@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |koehnevolker@gmail.com

--- Comment #12 from Volker Koehne <koehnevolker@gmail.com> ---
Same problem here: Dell XPS 8960, Intel i7-8700, openSUSE Leap 15.3 (all mandatory patches installed). The kernel hangs after ~5 minutes. Older kernel -76 is stable.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644

Andreas Vetter <vetter@physik.uni-wuerzburg.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |vetter@physik.uni-wuerzburg.de
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c14

Henk van Velden <henk.vanvelden@xs4all.nl> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |henk.vanvelden@xs4all.nl

--- Comment #14 from Henk van Velden <henk.vanvelden@xs4all.nl> ---
Thanks for the solution. BTW, it is not restricted to Dell systems. I see the same kind of problem on an HP Pavilion.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c15

--- Comment #15 from Borislav Petkov <bpetkov@suse.com> ---
Ok, new kernel here:

https://download.opensuse.org/repositories/home:/bpetkov:/15sp3/pool/

Pls test. Thx.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c16

--- Comment #16 from Frank Steiner <steiner-reg@bio.ifi.lmu.de> ---
I tested one HP AIO 800 G3, which shows no kernel oops and is still running after 45 minutes, and one Dell R540 that previously always crashed during boot; it now reaches multi-user.target and is still running after some minutes, also without any oops in dmesg.

So, for this very small test set, your kernel seems to work fine!
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c17

--- Comment #17 from Borislav Petkov <bpetkov@suse.com> ---
Thanks for testing! Lemme know should you see any hiccups in the coming days.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c18

Moritz Duge <duge@pre-sense.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |duge@pre-sense.de

--- Comment #18 from Moritz Duge <duge@pre-sense.de> ---
Probably related: bug 1201681, bug 1201632, bug 1201665, bug 1201664

May it be that spectre_v2=retpoline is the better choice regarding security?
(spectre_v2=retpoline and spectre_v2=off both work for me)

See:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/spectre.html
https://www.phoronix.com/scan.php?page=article&item=amd-retpoline-2022&num=1
https://unix.stackexchange.com/questions/617276/spectre-v2-retpoline-and-per...
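When trying the spectre_v2= boot options mentioned in this comment, it helps to confirm what the running kernel actually uses. The following is a minimal sketch, assuming a Linux system that exposes /proc/cmdline and /sys/devices/system/cpu/vulnerabilities/spectre_v2; the helper names `spectre_v2_option` and `read_text` are made up for this illustration.

```python
from pathlib import Path


def spectre_v2_option(cmdline):
    """Return the value of the first spectre_v2= token on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("spectre_v2="):
            return token.split("=", 1)[1]
    return None


def read_text(path):
    """Read a small text file; return None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline") or ""
    status = read_text("/sys/devices/system/cpu/vulnerabilities/spectre_v2")
    print("boot option:", spectre_v2_option(cmdline) or "(not set, kernel default)")
    print("kernel reports:", status or "(not available)")
```

The sysfs file holds the mitigation the kernel selected, which may differ from what was requested on the command line if the CPU or kernel build does not support it.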
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c19

--- Comment #19 from Borislav Petkov <bpetkov@suse.com> ---
(In reply to Moritz Duge from comment #18)
> May it be that spectre_v2=retpoline is the better choice regarding security?

Regarding protection against which one of the CPU bugs and what machine?
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644

Borislav Petkov <bpetkov@suse.com> changed:

What     |Removed                  |Added
----------------------------------------------------------------------------
Assignee |kernel-bugs@opensuse.org |bpetkov@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644#c21

Alexander Kruppa <akruppa@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |akruppa@gmail.com

--- Comment #21 from Alexander Kruppa <akruppa@gmail.com> ---
Same problem.

CPU: Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz (Signature: Type 0, Family 6, Model 94, Stepping 3)
Mainboard: HP MS-7957 (taken from a HP ProDesk 400 G3)
RAM: 2x8GB GSkill F4-2133C15-8GNT

Problem occurred with various BIOS versions. Downgrading kernel to .76 fixed it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1201644

Moritz Duge <duge@pre-sense.de> changed:

What |Removed           |Added
----------------------------------------------------------------------------
CC   |duge@pre-sense.de |