[Bug 1189469] New: Linux Kernel 5.13.8 Crashes
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469

Bug ID: 1189469
Summary: Linux Kernel 5.13.8 Crashes
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: mmanno@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I'm running k3d (4.4.6) on Tumbleweed's 5.13.8-1 kernel to install Kubernetes in Docker 20.10.6_ce-1.2. After installing Kubernetes with `k3d cluster create`, I try to install Epinio for testing. After some time, the kernel crashes and the machine reboots when the timeout in `/proc/sys/kernel/panic` is reached.

Sometimes the crash occurs during the first installation of Epinio and its components, sometimes the crash occurs later, during testing. Even an incomplete installation will eventually crash the kernel.

Other team members report a similar problem with the latest kernel. Going back to kernel 5.3.12-1 is the only workaround.

I tried to create a kdump for the crash. Not sure if I should attach that, as it's 184mb?

Here is the backtrace from dmesg:

[ 1433.295401] general protection fault, probably for non-canonical address 0xb00fcd7a229657cd: 0000 [#1] SMP NOPTI
[ 1433.295414] CPU: 0 PID: 26175 Comm: runc Kdump: loaded Tainted: G W OE 5.13.8-1-debug #1 openSUSE Tumbleweed
[ 1433.295422] Hardware name: Dell Inc. Precision 5820 Tower X-Series/0X75JG, BIOS 2.4.0 07/06/2020
[ 1433.295426] RIP: 0010:kmem_cache_alloc_node_trace+0x79/0x2d0
[ 1433.295438] Code: 89 c6 48 85 c0 0f 84 b4 00 00 00 0f 1f 44 00 00 48 c7 44 24 10 00 00 00 00 e9 2a 01 00 00 0f 1f 44 00 00 41 8b 56 28 48 01 c2 <4c> 8b 02 48 89 d1 4d 33 86 b8 00 00 00 48 0f c9 49 31 c8 48 8d 4b
[ 1433.295444] RSP: 0018:ffffab13052e3c88 EFLAGS: 00010282
[ 1433.295450] RAX: b00fcd7a229656cd RBX: 0000000000010a86 RCX: 0000000000000400
[ 1433.295454] RDX: b00fcd7a229657cd RSI: 0000000000000dc0 RDI: ffff8a7f00042a00
[ 1433.295458] RBP: 0000000000000dc0 R08: ffff8a8e52e34140 R09: ffff8a7f0809f800
[ 1433.295462] R10: 0000000000000011 R11: 0000000001320122 R12: ffff8a7f00042a00
[ 1433.295466] R13: 0000000000000000 R14: ffff8a7f00042a00 R15: ffffffffa34e6905
[ 1433.295470] FS: 00007f12e5e31f20(0000) GS:ffff8a8e52e00000(0000) knlGS:0000000000000000
[ 1433.295475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1433.295479] CR2: 00007f4f471adf08 CR3: 000000030dcfe006 CR4: 00000000003706f0
[ 1433.295483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1433.295486] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1433.295490] Call Trace:
[ 1433.295498] alloc_fair_sched_group+0xf5/0x1d0
[ 1433.295511] sched_create_group+0x2f/0x80
[ 1433.295522] cpu_cgroup_css_alloc+0xf/0x30
[ 1433.295529] cgroup_apply_control_enable+0x14e/0x330
[ 1433.295541] cgroup_mkdir+0x21f/0x470
[ 1433.295549] kernfs_iop_mkdir+0x54/0x80
[ 1433.295557] vfs_mkdir+0x12c/0x1e0
[ 1433.295565] do_mkdirat+0x127/0x150
[ 1433.295573] do_syscall_64+0x5e/0xb0
[ 1433.295585] ? syscall_exit_to_user_mode+0x18/0x40
[ 1433.295591] ? do_syscall_64+0x6e/0xb0
[ 1433.295599] ? syscall_exit_to_user_mode+0x18/0x40
[ 1433.295603] ? do_syscall_64+0x6e/0xb0
[ 1433.295610] ? syscall_exit_to_user_mode+0x18/0x40
[ 1433.295615] ? do_syscall_64+0x6e/0xb0
[ 1433.295622] ? do_syscall_64+0x6e/0xb0
[ 1433.295628] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1433.295637] RIP: 0033:0x4becdb
[ 1433.295642] Code: fa ff eb bd e8 26 be fa ff e9 61 ff ff ff cc e8 3b 8d fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[ 1433.295647] RSP: 002b:000000c0000e8670 EFLAGS: 00000202 ORIG_RAX: 0000000000000102
[ 1433.295652] RAX: ffffffffffffffda RBX: 000000c00002e000 RCX: 00000000004becdb
[ 1433.295656] RDX: 00000000000001ed RSI: 000000c0001401b0 RDI: ffffffffffffff9c
[ 1433.295659] RBP: 000000c0000e86c8 R08: 0000000000000001 R09: 0000000000000001
[ 1433.295662] R10: 000000c0001401b0 R11: 0000000000000202 R12: ffffffffffffffff
[ 1433.295665] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000038
[ 1433.295674] Modules linked in: xt_owner xt_REDIRECT ipt_REJECT vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_statistic xt_mark vxlan ip6_udp_tunnel udp_tunnel xt_multiport xt_comment overlay xt_nat xt_tcpudp veth af_packet xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib br_netfilter bridge stp llc nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat dm_thin_pool nf_conntrack dm_persistent_data dm_bio_prison dm_bufio nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter rfkill ip6_tables iptable_filter ip_tables x_tables bpfilter intel_rapl_msr intel_rapl_common isst_if_common dmi_sysfs squashfs pktcdvd nfit libnvdimm snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt
[ 1433.295771] intel_pmc_bxt x86_pkg_temp_thermal intel_powerclamp ledtrig_audio snd_hda_codec_hdmi coretemp iTCO_vendor_support dell_smm_hwmon kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec dell_wmi snd_hda_core snd_hwdep dell_smbios kvm dcdbas snd_pcm mei_me snd_timer sparse_keymap xfs irqbypass video e1000e snd dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof efi_pstore mei i2c_i801 ioatdma soundcore i2c_smbus dca nls_iso8859_1 nls_cp437 tiny_power_button vfat fat libcrc32c acpi_tad button loop fuse configfs uas usb_storage amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel drm_ttm_helper ttm ghash_clmulni_intel iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops xhci_pci_renesas cec xhci_hcd rc_core drm nvme usbcore nvme_core aesni_intel sr_mod crypto_simd cryptd cdrom serio_raw wmi vmd sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs

--
You are receiving this mail because:
You are on the CC list for the bug.
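The report mentions that the machine reboots once the timeout in `/proc/sys/kernel/panic` is reached. As a minimal sketch of how that value can be read and interpreted (the helper names `describe_panic_timeout` and `read_panic_timeout` are made up for illustration and are not part of the bug report), assuming the usual `kernel.panic` semantics of zero, negative, and positive values:

```python
from pathlib import Path


def describe_panic_timeout(raw: str) -> str:
    """Interpret the contents of /proc/sys/kernel/panic (kernel.panic)."""
    seconds = int(raw.strip())
    if seconds == 0:
        # 0: the kernel does not reboot on its own after a panic.
        return "no automatic reboot after a panic"
    if seconds < 0:
        # Negative: the kernel reboots immediately.
        return "reboot immediately after a panic"
    # Positive: the kernel waits this many seconds, then reboots.
    return f"reboot {seconds} s after a panic"


def read_panic_timeout(path: str = "/proc/sys/kernel/panic") -> str:
    """Read the sysctl file and describe the configured behaviour."""
    return describe_panic_timeout(Path(path).read_text())


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly where the file is absent.
    if Path("/proc/sys/kernel/panic").exists():
        print(read_panic_timeout())
```

A positive value here would match the behaviour described in the report, where the reboot follows the crash after a delay.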
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c2
--- Comment #2 from Mario Manno <mmanno@suse.com> ---
Created attachment 851836
  --> http://bugzilla.opensuse.org/attachment.cgi?id=851836&action=edit
Dmesg 08-16
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c3
--- Comment #3 from Mario Manno <mmanno@suse.com> ---
Created attachment 851837
  --> http://bugzilla.opensuse.org/attachment.cgi?id=851837&action=edit
Dmesg 08-17
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c4
--- Comment #4 from Mario Manno <mmanno@suse.com> ---
The crashes are a bit different; I'm not sure whether I set up kdump correctly. I took a photo when this first happened in July; there, the trace starts with "memcg_alloc_page_obj_cgroups". Do you prefer any internal storage for the kdump data?
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c5
--- Comment #5 from Takashi Iwai <tiwai@suse.com> ---
Hrm, both cases seem to hit in the slab allocation. I just noticed that you're running kernel-debug. Is that intentional? And since TW is a fast-moving target, it's better to test with the latest kernel available in the OBS Kernel:stable repo, which is already at 5.13.11.
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c6
--- Comment #6 from Mario Manno <mmanno@suse.com> ---
I'm running the debug kernel, because I tried to use 'crash' to look at the kdump files and it complained about SMP and CRC mismatches. I'll update to the latest kernel and switch to default.
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c7
--- Comment #7 from Mario Manno <mmanno@suse.com> ---
Created attachment 851896
  --> http://bugzilla.opensuse.org/attachment.cgi?id=851896&action=edit
Dmesg 08-17
Linux version 5.13.11-1.g8c13a2d-default
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c8
--- Comment #8 from Mario Manno <mmanno@suse.com> ---
Created attachment 851900
  --> http://bugzilla.opensuse.org/attachment.cgi?id=851900&action=edit
Dmesg 08-18 (2)
Linux version 5.13.11-1.g8c13a2d-default
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c10
Tim Hardeck <thardeck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thardeck@suse.com

--- Comment #10 from Tim Hardeck <thardeck@suse.com> ---
I am experiencing the same issue regularly on my Dell Precision 5530 with the latest Tumbleweed kernels. Often it even happens without actively using docker or k3d when the system is started, just when the docker service is enabled. The workaround for me was to disable docker on boot but of course as soon as I start the service my system might crash.
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c11
--- Comment #11 from Vlastimil Babka <vbabka@suse.com> ---
Please try booting with kernel parameter "slub_debug". Note there's some performance and memory overhead.
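Before collecting a new oops, it can help to confirm that the "slub_debug" parameter actually reached the running kernel. The following is a minimal sketch, not an official tool: the helper `has_kernel_param` is a hypothetical name, and it assumes the parameter appears on the command line either bare (`slub_debug`) or with options (`slub_debug=...`).

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` is present on a kernel command line,
    either bare or in the form name=value."""
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel (Linux only).
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        enabled = has_kernel_param(proc_cmdline.read_text(), "slub_debug")
        print("slub_debug on command line:", enabled)
```

Matching on the whole key rather than on a substring avoids a false positive from an unrelated parameter that merely contains the same text.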
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c12
Mario Manno <mmanno@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?

--- Comment #12 from Mario Manno <mmanno@suse.com> ---
I'm not getting kdump in /var/crash anymore, but I can see this while booting

Aug 24 15:49:42 ws kernel: ------------[ cut here ]------------
Aug 24 15:49:42 ws kernel: rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
Aug 24 15:49:42 ws kernel: WARNING: CPU: 0 PID: 0 at kernel/sched/fair.c:401 enqueue_task_fair+0x530/0x5a0
Aug 24 15:49:42 ws kernel: Modules linked in: xt_owner xt_REDIRECT vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_mark ipt_REJECT xt_multiport xt_comment overlay veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE af_packet nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security dm_thin_pool dm_persistent_data dm_bio_prison ip_set dm_bufio nfnetlink ebtable_filter ebtables ip6table_filter vboxnetadp(OE) vboxnetflt(OE) rfkill ip6_tables iptable_filter ip_tables x_tables bpfilter vboxdrv(OE) dmi_sysfs intel_rapl_msr intel_rapl_common isst_if_common snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt intel_pmc_bxt ledtrig_audio squashfs iTCO_vendor_support
Aug 24 15:49:42 ws kernel: pktcdvd dell_smm_hwmon snd_hda_codec_hdmi nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp dell_wmi coretemp kvm_intel snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi dell_smbios snd_hda_codec dcdbas snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass snd mei_me i2c_i801 ioatdma sparse_keymap video wmi_bmof dell_wmi_descriptor intel_wmi_thunderbolt efi_pstore e1000e soundcore i2c_smbus mei dca tiny_power_button acpi_tad button xfs nls_iso8859_1 nls_cp437 libcrc32c vfat fat loop fuse configfs ext4 mbcache jbd2 uas usb_storage amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm aesni_intel xhci_pci crypto_simd xhci_pci_renesas nvme cryptd xhci_hcd nvme_core sr_mod cdrom usbcore wmi vmd sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
Aug 24 15:49:42 ws kernel: CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G OE 5.13.12-5.g33df9c6-default #1 openSUSE Tumbleweed (unreleased)
Aug 24 15:49:42 ws kernel: Hardware name: Dell Inc. Precision 5820 Tower X-Series/0X75JG, BIOS 2.4.0 07/06/2020
Aug 24 15:49:42 ws kernel: RIP: 0010:enqueue_task_fair+0x530/0x5a0
Aug 24 15:49:42 ws kernel: Code: 0f 1f 44 00 00 e9 15 fc ff ff 80 3d 45 b9 b5 01 00 0f 85 23 fc ff ff 48 c7 c7 38 69 34 a6 c6 05 31 b9 b5 01 01 e8 ba 95 8d 00 <0f> 0b e9 09 fc ff ff 80 3d 21 b9 b5 01 00 0f 85 34 fe ff ff 48 c7
Aug 24 15:49:42 ws kernel: RSP: 0018:ffffb87940003f48 EFLAGS: 00010086
Aug 24 15:49:42 ws kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
Aug 24 15:49:42 ws kernel: RDX: ffff91b052e1a788 RSI: 0000000000000001 RDI: ffff91b052e1a780
Aug 24 15:49:42 ws kernel: RBP: ffff91b052e2ed40 R08: ffff91b09fecb328 R09: 00000000ffff7fff
Aug 24 15:49:42 ws kernel: R10: ffff91b09fb4b340 R11: ffff91b09fb4b340 R12: ffff91b052e2ecc0
Aug 24 15:49:42 ws kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff91b052e2ecc0
Aug 24 15:49:42 ws kernel: FS: 0000000000000000(0000) GS:ffff91b052e00000(0000) knlGS:0000000000000000
Aug 24 15:49:42 ws kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 24 15:49:42 ws kernel: CR2: 00000000099a81f0 CR3: 000000014c332005 CR4: 00000000003706f0
Aug 24 15:49:42 ws kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 24 15:49:42 ws kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 24 15:49:42 ws kernel: Call Trace:
Aug 24 15:49:42 ws kernel: <IRQ>
Aug 24 15:49:42 ws kernel: ttwu_do_activate+0x72/0x180
Aug 24 15:49:42 ws kernel: sched_ttwu_pending+0xd2/0x160
Aug 24 15:49:42 ws kernel: __sysvec_call_function_single+0x2c/0x90
Aug 24 15:49:42 ws kernel: sysvec_call_function_single+0x6d/0x90
Aug 24 15:49:42 ws kernel: </IRQ>
Aug 24 15:49:42 ws kernel: asm_sysvec_call_function_single+0x12/0x20
Aug 24 15:49:42 ws kernel: RIP: 0010:nohz_run_idle_balance+0x2c/0x60
Aug 24 15:49:42 ws kernel: Code: 44 00 00 49 c7 c0 c0 ec 02 00 48 63 ff 4c 89 c2 48 03 14 fd 00 49 41 a6 8b 42 64 48 83 c2 64 89 c1 89 c6 83 e1 fb f0 0f b1 0a <75> f3 83 fe 04 74 01 c3 65 48 8b 04 25 c0 9b 01 00 48 8b 00 a8 08
Aug 24 15:49:42 ws kernel: RSP: 0018:ffffffffa6a03ea8 EFLAGS: 00000246
Aug 24 15:49:42 ws kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Aug 24 15:49:42 ws kernel: RDX: ffff91b052e2ed24 RSI: 0000000000000000 RDI: 0000000000000000
Aug 24 15:49:42 ws kernel: RBP: ffffffffa6a1a940 R08: 000000000002ecc0 R09: 00000000000001e9
Aug 24 15:49:42 ws kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
Aug 24 15:49:42 ws kernel: R13: ffffffffa71722e0 R14: 00000000000000cd R15: 00000000000000cd
Aug 24 15:49:42 ws kernel: do_idle+0x38/0x2a0
Aug 24 15:49:42 ws kernel: cpu_startup_entry+0x19/0x20
Aug 24 15:49:42 ws kernel: start_kernel+0x7c5/0x7ec
Aug 24 15:49:42 ws kernel: secondary_startup_64_no_verify+0xc2/0xcb
Aug 24 15:49:42 ws kernel: ---[ end trace 67d9e2d1c01b2797 ]---
Tim Hardeck <thardeck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(vbabka@suse.com)
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c14
--- Comment #14 from Vlastimil Babka <vbabka@suse.com> ---
(In reply to Tim Hardeck from comment #13)
> Created attachment 852043 [details]
> Dmesg 08-25 with slub_debug

Unfortunately this triggered a bug before the slub debugging could spot an issue. Could you give it more tries, maybe it will eventually find something, or we'll see some common pattern in the oopses.
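One rough way to look for a common pattern across several oopses is to pull the first kernel-mode `RIP:` symbol out of each saved dmesg log and count how often each function appears. This is only a sketch under stated assumptions: the function names below are hypothetical, the regular expression assumes the `RIP: 0010:symbol+0xoff/0xsize` layout seen in the traces in this thread (with `0010` taken to be the kernel code segment on x86-64), and it is not the method the kernel developers used.

```python
import re
from collections import Counter

# Matches e.g. "RIP: 0010:kmem_cache_alloc_node_trace+0x79/0x2d0".
# User-space entries such as "RIP: 0033:0x4becdb" carry no symbol and are skipped.
RIP_RE = re.compile(r"RIP: 0010:([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def first_kernel_rip(dmesg_text):
    """Return the function name of the first kernel-mode RIP line, or None."""
    match = RIP_RE.search(dmesg_text)
    return match.group(1) if match else None


def count_faulting_functions(logs):
    """Count the first kernel RIP symbol across an iterable of dmesg texts."""
    counts = Counter()
    for text in logs:
        func = first_kernel_rip(text)
        if func is not None:
            counts[func] += 1
    return counts
```

Feeding it the text of each attached dmesg file would give a quick tally of where the reports fault, for example whether most of them land in slab allocation paths.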
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c15
--- Comment #15 from Mario Manno <mmanno@suse.com> ---
Created attachment 852258
  --> http://bugzilla.opensuse.org/attachment.cgi?id=852258&action=edit
5.14.0-2.gfdea7b9-debug
Tim Hardeck <thardeck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(vbabka@suse.com)  |
Tim Hardeck <thardeck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Linux Kernel 5.13.8 Crashes |Linux Kernel 5.14.6 Crashes
                   |                            |during docker container
                   |                            |usage
http://bugzilla.opensuse.org/show_bug.cgi?id=1189469#c17
Tim Hardeck <thardeck@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(vbabka@suse.com)

--- Comment #17 from Tim Hardeck <thardeck@suse.com> ---
Created attachment 852986
  --> http://bugzilla.opensuse.org/attachment.cgi?id=852986&action=edit
Dmesg extract of Crash with 5.14.6 and slub_debug