http://bugzilla.novell.com/show_bug.cgi?id=1039737

Bug ID: 1039737
Summary: Kernel BUG at ../mm/huge_memory.c / split_huge_page
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.2
Hardware: x86-64
OS: openSUSE 42.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: hansper@t-online.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

There seems to be a kernel bug in openSUSE 42.2, kernel 4.4.62-18.6-default.
It is also present in kernel 4.4.57-18.3.1-default.
Affected processes hang and cannot be killed, the system becomes unstable, and a reboot fails (it hangs forever; the reset switch has to be pressed).

The system uses 128GB of non-ECC RAM.
The kernel command line is:
BOOT_IMAGE=/boot/vmlinuz-4.4.62-18.6-default root=UUID=xxx quiet splash showopts kvm-intel.nested=1 pci=noaer

The bug has occurred multiple times now (usually after 100-150h of uptime).
It seems to be independent of how much system RAM is in use: it occurs with just 30GB used as well as with 90+GB used.

The kernel is tainted because the official nvidia driver (from the openSUSE repositories) is present.

Here is the most recent backtrace (this time a Java program was affected; other times it was Evolution or other programs):

May 18 06:27:00 mframe kernel: ------------[ cut here ]------------
May 18 06:27:00 mframe kernel: kernel BUG at ../mm/huge_memory.c:1983!
May 18 06:27:00 mframe kernel: invalid opcode: 0000 [#1] SMP
May 18 06:27:00 mframe kernel: Modules linked in: isofs xt_nat xt_tcpudp veth nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ipt_MASQ
May 18 06:27:00 mframe kernel: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ir_lirc_codec lirc_dev ir_xmp_dec
May 18 06:27:00 mframe kernel: ehci_pci drm_kms_helper xhci_hcd syscopyarea ehci_hcd ahci sysfillrect sysimgblt libahci fb_sys_fops usbcor
May 18 06:27:00 mframe kernel: CPU: 4 PID: 8713 Comm: java Tainted: P W O 4.4.62-18.6-default #1
May 18 06:27:00 mframe kernel: Hardware name: Gigabyte Technology Co., Ltd. Default string/X99-UD4-CF, BIOS F22 06/13/2016
May 18 06:27:00 mframe kernel: task: ffff880035ed1300 ti: ffff8817f536c000 task.ti: ffff8817f536c000
May 18 06:27:00 mframe kernel: RIP: 0010:[<ffffffff811f1177>] [<ffffffff811f1177>] __split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel: RSP: 0018:ffff8817f536fb80 EFLAGS: 00010286
May 18 06:27:00 mframe kernel: RAX: 0000001f77ce8067 RBX: 00000000f9b00000 RCX: 8000000cccd00867
May 18 06:27:00 mframe kernel: RDX: ffff881f77ce8800 RSI: 00003ffffffff000 RDI: 0000001f77ce8000
May 18 06:27:00 mframe kernel: RBP: ffffea0033334000 R08: 0000000000000800 R09: ffffea0000000000
May 18 06:27:00 mframe kernel: R10: 0000000000000000 R11: 0000000000000e68 R12: ffff8817f5a16590
May 18 06:27:00 mframe kernel: R13: ffff8817db87ee68 R14: ffff880000000000 R15: 0000160000000000
May 18 06:27:00 mframe kernel: FS: 00007ff685e07700(0000) GS:ffff881faf500000(0000) knlGS:0000000000000000
May 18 06:27:00 mframe kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 18 06:27:00 mframe kernel: CR2: 00007ff6add3c768 CR3: 0000001f62a51000 CR4: 00000000003426e0
May 18 06:27:00 mframe kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 18 06:27:00 mframe kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 18 06:27:00 mframe kernel: Stack:
May 18 06:27:00 mframe kernel: ffff881fac5ff800 00000000f9c00000 00000000000f9a00 0000000001f77ce8
May 18 06:27:00 mframe kernel: 00000000f9a00000 ffffea0033330000 ffff881f7c292e80 0000001f77ce8067
May 18 06:27:00 mframe kernel: 0000000100000000 ffffea005f6e1fb0 0000001f77ce8067 ffffea0033330000
May 18 06:27:00 mframe kernel: Call Trace:
May 18 06:27:00 mframe kernel: [<ffffffff811f1292>] split_huge_page_to_list+0x62/0xc0
May 18 06:27:00 mframe kernel: [<ffffffff811f1d2a>] __split_huge_page_pmd+0x1ca/0x4d0
May 18 06:27:00 mframe kernel: [<ffffffff811f2d93>] vma_adjust_trans_huge+0x93/0xe0
May 18 06:27:00 mframe kernel: [<ffffffff811c15c8>] vma_adjust+0x148/0x700
May 18 06:27:00 mframe kernel: [<ffffffff811c1c96>] __split_vma.isra.32+0x116/0x1d0
May 18 06:27:00 mframe kernel: [<ffffffff811c2b5d>] do_munmap+0xfd/0x390
May 18 06:27:00 mframe kernel: [<ffffffff811c351c>] mmap_region+0x1dc/0x620
May 18 06:27:00 mframe kernel: [<ffffffff811c3c5c>] do_mmap+0x2fc/0x3d0
May 18 06:27:00 mframe kernel: [<ffffffff811a96bf>] vm_mmap_pgoff+0x8f/0xc0
May 18 06:27:00 mframe kernel: [<ffffffff811c21fa>] SyS_mmap_pgoff+0xfa/0x240
May 18 06:27:00 mframe kernel: [<ffffffff8160e072>] entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel: DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel:
May 18 06:27:00 mframe kernel: Leftover inexact backtrace:
May 18 06:27:00 mframe kernel: Code: 5c 41 5d 41 5e 41 5f c3 31 c0 eb ba 65 ff 0d b9 ab e1 7e f3 90 49 8b 06 a9 00 00 80 00 75 f4 65 ff 05
May 18 06:27:00 mframe kernel: RIP [<ffffffff811f1177>] __split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel: RSP <ffff8817f536fb80>
May 18 06:27:00 mframe kernel: ---[ end trace 86bf182adf5f11e5 ]---
May 18 06:27:00 mframe kernel: BUG: sleeping function called from invalid context at ../include/linux/sched.h:2872
May 18 06:27:00 mframe kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 8713, name: java
May 18 06:27:00 mframe kernel: CPU: 4 PID: 8713 Comm: java Tainted: P D W O 4.4.62-18.6-default #1
May 18 06:27:00 mframe kernel: Hardware name: Gigabyte Technology Co., Ltd. Default string/X99-UD4-CF, BIOS F22 06/13/2016
May 18 06:27:00 mframe kernel: 0000000000000000 ffffffff81328a87 ffff880035ed1300 0000000000002209
May 18 06:27:00 mframe kernel: ffffffff8108ddb1 0000000000000246 000000000000000b 0000000000002209
May 18 06:27:00 mframe kernel: ffffffff81081351 ffffffff00000000 0000000000000010 ffff8817f536fa00
May 18 06:27:00 mframe kernel: Call Trace:
May 18 06:27:00 mframe kernel: [<ffffffff81019ea9>] dump_trace+0x59/0x320
May 18 06:27:00 mframe kernel: [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
May 18 06:27:00 mframe kernel: [<ffffffff8101b011>] show_stack+0x21/0x40
May 18 06:27:00 mframe kernel: [<ffffffff81328a87>] dump_stack+0x5c/0x85
May 18 06:27:00 mframe kernel: [<ffffffff8108ddb1>] exit_signals+0x21/0x130
May 18 06:27:00 mframe kernel: [<ffffffff81081351>] do_exit+0xb1/0xb60
May 18 06:27:00 mframe kernel: [<ffffffff8101a92c>] oops_end+0x9c/0xd0
May 18 06:27:00 mframe kernel: [<ffffffff81018470>] do_error_trap+0x70/0xd0
May 18 06:27:00 mframe kernel: [<ffffffff8160fdbe>] invalid_op+0x1e/0x30
May 18 06:27:00 mframe kernel: DWARF2 unwinder stuck at invalid_op+0x1e/0x30
May 18 06:27:00 mframe kernel:
May 18 06:27:00 mframe kernel: Leftover inexact backtrace:
May 18 06:27:00 mframe kernel: [<ffffffff811f1177>] ? __split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel: [<ffffffff811f1292>] ? split_huge_page_to_list+0x62/0xc0
May 18 06:27:00 mframe kernel: [<ffffffff811f1d2a>] ? __split_huge_page_pmd+0x1ca/0x4d0
May 18 06:27:00 mframe kernel: [<ffffffff8160989b>] ? thread_return+0x38/0x6bd
May 18 06:27:00 mframe kernel: [<ffffffff811f2d93>] ? vma_adjust_trans_huge+0x93/0xe0
May 18 06:27:00 mframe kernel: [<ffffffff811c15c8>] ? vma_adjust+0x148/0x700
May 18 06:27:00 mframe kernel: [<ffffffff811c1c96>] ? __split_vma.isra.32+0x116/0x1d0
May 18 06:27:00 mframe kernel: [<ffffffff811c2b5d>] ? do_munmap+0xfd/0x390
May 18 06:27:00 mframe kernel: [<ffffffff811008d9>] ? get_futex_key+0x199/0x390
May 18 06:27:00 mframe kernel: [<ffffffff811c351c>] ? mmap_region+0x1dc/0x620
May 18 06:27:00 mframe kernel: [<ffffffff81100f54>] ? futex_wake+0x84/0x150
May 18 06:27:00 mframe kernel: [<ffffffff811c3c5c>] ? do_mmap+0x2fc/0x3d0
May 18 06:27:00 mframe kernel: [<ffffffff811a96bf>] ? vm_mmap_pgoff+0x8f/0xc0
May 18 06:27:00 mframe kernel: [<ffffffff811c21fa>] ? SyS_mmap_pgoff+0xfa/0x240
May 18 06:27:00 mframe kernel: [<ffffffff81079329>] ? syscall_slow_exit_work+0x39/0xc6
May 18 06:27:00 mframe kernel: [<ffffffff8160e072>] ? entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel: note: java[8713] exited with preempt_count 1
-------

Any help is greatly appreciated

Jochen