Bug ID: 1039737
Summary: Kernel BUG at ../mm/huge_memory.c / split_huge_page
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.2
Hardware: x86-64
OS: openSUSE 42.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: hansper@t-online.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

There seems to be a kernel bug in openSUSE Leap 42.2 with kernel 4.4.62-18.6-default.

It is also present in kernel 4.4.57-18.3.1-default.


It leaves processes hanging that cannot be killed, and the system becomes
unstable. Rebooting also fails: it hangs forever and the reset switch has to
be pressed.

The system has 128 GB of non-ECC RAM. The kernel command line is:

BOOT_IMAGE=/boot/vmlinuz-4.4.62-18.6-default root=UUID=xxx quiet splash
showopts kvm-intel.nested=1 pci=noaer

The bug has occurred multiple times now, usually after 100-150 hours of
uptime. It seems to be independent of how much system RAM is in use: it occurs
with just 30 GB used as well as with 90+ GB used.

The kernel is tainted because the official NVIDIA driver (from the openSUSE
repositories) is loaded.
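For readers unfamiliar with the "Tainted:" field in the trace, here is a
minimal Python sketch that decodes the flag letters. The table covers only a
subset of the kernel's taint flags as I understand them, and the names
TAINT_FLAGS and decode_taint are made up for this illustration.

```python
# Illustrative subset of Linux kernel taint flag letters.
# TAINT_FLAGS and decode_taint are hypothetical names, not a kernel API.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page state encountered",
    "U": "taint requested by userspace",
    "D": "kernel has already oopsed or hit a BUG",
    "A": "ACPI table overridden",
    "W": "a kernel warning was issued earlier",
    "C": "staging driver loaded",
    "I": "firmware bug workaround applied",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live-patched",
}


def decode_taint(field):
    """Return (letter, meaning) pairs for each non-blank flag in the field."""
    return [
        (ch, TAINT_FLAGS.get(ch, "unknown flag"))
        for ch in field
        if not ch.isspace()
    ]


# The two taint fields that appear in this report's log.
for field in ("P        W O", "P      D W O"):
    for letter, meaning in decode_taint(field):
        print(letter, "-", meaning)
    print()
```

Read this way, the first oops header shows P, W and O, and the second one
additionally shows D, which is set once the kernel has hit the BUG.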

Here is the most recent backtrace, inexact part included (this time a Java
program was affected; at other times it was Evolution or other programs):

May 18 06:27:00 mframe kernel: ------------[ cut here ]------------
May 18 06:27:00 mframe kernel: kernel BUG at ../mm/huge_memory.c:1983!
May 18 06:27:00 mframe kernel: invalid opcode: 0000 [#1] SMP 
May 18 06:27:00 mframe kernel: Modules linked in: isofs xt_nat xt_tcpudp veth
nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ipt_MASQ
May 18 06:27:00 mframe kernel:  kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel ir_lirc_codec lirc_dev ir_xmp_dec
May 18 06:27:00 mframe kernel:  ehci_pci drm_kms_helper xhci_hcd syscopyarea
ehci_hcd ahci sysfillrect sysimgblt libahci fb_sys_fops usbcor
May 18 06:27:00 mframe kernel: CPU: 4 PID: 8713 Comm: java Tainted: P        W 
O     4.4.62-18.6-default #1
May 18 06:27:00 mframe kernel: Hardware name: Gigabyte Technology Co., Ltd.
Default string/X99-UD4-CF, BIOS F22 06/13/2016
May 18 06:27:00 mframe kernel: task: ffff880035ed1300 ti: ffff8817f536c000
task.ti: ffff8817f536c000
May 18 06:27:00 mframe kernel: RIP: 0010:[<ffffffff811f1177>] 
[<ffffffff811f1177>] __split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel: RSP: 0018:ffff8817f536fb80  EFLAGS: 00010286
May 18 06:27:00 mframe kernel: RAX: 0000001f77ce8067 RBX: 00000000f9b00000 RCX:
8000000cccd00867
May 18 06:27:00 mframe kernel: RDX: ffff881f77ce8800 RSI: 00003ffffffff000 RDI:
0000001f77ce8000
May 18 06:27:00 mframe kernel: RBP: ffffea0033334000 R08: 0000000000000800 R09:
ffffea0000000000
May 18 06:27:00 mframe kernel: R10: 0000000000000000 R11: 0000000000000e68 R12:
ffff8817f5a16590
May 18 06:27:00 mframe kernel: R13: ffff8817db87ee68 R14: ffff880000000000 R15:
0000160000000000
May 18 06:27:00 mframe kernel: FS:  00007ff685e07700(0000)
GS:ffff881faf500000(0000) knlGS:0000000000000000
May 18 06:27:00 mframe kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 18 06:27:00 mframe kernel: CR2: 00007ff6add3c768 CR3: 0000001f62a51000 CR4:
00000000003426e0
May 18 06:27:00 mframe kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
May 18 06:27:00 mframe kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
May 18 06:27:00 mframe kernel: Stack:
May 18 06:27:00 mframe kernel:  ffff881fac5ff800 00000000f9c00000
00000000000f9a00 0000000001f77ce8
May 18 06:27:00 mframe kernel:  00000000f9a00000 ffffea0033330000
ffff881f7c292e80 0000001f77ce8067
May 18 06:27:00 mframe kernel:  0000000100000000 ffffea005f6e1fb0
0000001f77ce8067 ffffea0033330000
May 18 06:27:00 mframe kernel: Call Trace:
May 18 06:27:00 mframe kernel:  [<ffffffff811f1292>]
split_huge_page_to_list+0x62/0xc0
May 18 06:27:00 mframe kernel:  [<ffffffff811f1d2a>]
__split_huge_page_pmd+0x1ca/0x4d0
May 18 06:27:00 mframe kernel:  [<ffffffff811f2d93>]
vma_adjust_trans_huge+0x93/0xe0
May 18 06:27:00 mframe kernel:  [<ffffffff811c15c8>] vma_adjust+0x148/0x700
May 18 06:27:00 mframe kernel:  [<ffffffff811c1c96>]
__split_vma.isra.32+0x116/0x1d0
May 18 06:27:00 mframe kernel:  [<ffffffff811c2b5d>] do_munmap+0xfd/0x390
May 18 06:27:00 mframe kernel:  [<ffffffff811c351c>] mmap_region+0x1dc/0x620
May 18 06:27:00 mframe kernel:  [<ffffffff811c3c5c>] do_mmap+0x2fc/0x3d0
May 18 06:27:00 mframe kernel:  [<ffffffff811a96bf>] vm_mmap_pgoff+0x8f/0xc0
May 18 06:27:00 mframe kernel:  [<ffffffff811c21fa>] SyS_mmap_pgoff+0xfa/0x240
May 18 06:27:00 mframe kernel:  [<ffffffff8160e072>]
entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel: DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel: 
May 18 06:27:00 mframe kernel: Leftover inexact backtrace:
May 18 06:27:00 mframe kernel: Code: 5c 41 5d 41 5e 41 5f c3 31 c0 eb ba 65 ff
0d b9 ab e1 7e f3 90 49 8b 06 a9 00 00 80 00 75 f4 65 ff 05 
May 18 06:27:00 mframe kernel: RIP  [<ffffffff811f1177>]
__split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel:  RSP <ffff8817f536fb80>
May 18 06:27:00 mframe kernel: ---[ end trace 86bf182adf5f11e5 ]---
May 18 06:27:00 mframe kernel: BUG: sleeping function called from invalid
context at ../include/linux/sched.h:2872
May 18 06:27:00 mframe kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 8713,
name: java
May 18 06:27:00 mframe kernel: CPU: 4 PID: 8713 Comm: java Tainted: P      D W 
O     4.4.62-18.6-default #1
May 18 06:27:00 mframe kernel: Hardware name: Gigabyte Technology Co., Ltd.
Default string/X99-UD4-CF, BIOS F22 06/13/2016
May 18 06:27:00 mframe kernel:  0000000000000000 ffffffff81328a87
ffff880035ed1300 0000000000002209
May 18 06:27:00 mframe kernel:  ffffffff8108ddb1 0000000000000246
000000000000000b 0000000000002209
May 18 06:27:00 mframe kernel:  ffffffff81081351 ffffffff00000000
0000000000000010 ffff8817f536fa00
May 18 06:27:00 mframe kernel: Call Trace:
May 18 06:27:00 mframe kernel:  [<ffffffff81019ea9>] dump_trace+0x59/0x320
May 18 06:27:00 mframe kernel:  [<ffffffff8101a26a>]
show_stack_log_lvl+0xfa/0x180
May 18 06:27:00 mframe kernel:  [<ffffffff8101b011>] show_stack+0x21/0x40
May 18 06:27:00 mframe kernel:  [<ffffffff81328a87>] dump_stack+0x5c/0x85
May 18 06:27:00 mframe kernel:  [<ffffffff8108ddb1>] exit_signals+0x21/0x130
May 18 06:27:00 mframe kernel:  [<ffffffff81081351>] do_exit+0xb1/0xb60
May 18 06:27:00 mframe kernel:  [<ffffffff8101a92c>] oops_end+0x9c/0xd0
May 18 06:27:00 mframe kernel:  [<ffffffff81018470>] do_error_trap+0x70/0xd0
May 18 06:27:00 mframe kernel:  [<ffffffff8160fdbe>] invalid_op+0x1e/0x30
May 18 06:27:00 mframe kernel: DWARF2 unwinder stuck at invalid_op+0x1e/0x30
May 18 06:27:00 mframe kernel: 
May 18 06:27:00 mframe kernel: Leftover inexact backtrace:
May 18 06:27:00 mframe kernel:  [<ffffffff811f1177>] ?
__split_huge_page+0x657/0x710
May 18 06:27:00 mframe kernel:  [<ffffffff811f1292>] ?
split_huge_page_to_list+0x62/0xc0
May 18 06:27:00 mframe kernel:  [<ffffffff811f1d2a>] ?
__split_huge_page_pmd+0x1ca/0x4d0
May 18 06:27:00 mframe kernel:  [<ffffffff8160989b>] ? thread_return+0x38/0x6bd
May 18 06:27:00 mframe kernel:  [<ffffffff811f2d93>] ?
vma_adjust_trans_huge+0x93/0xe0
May 18 06:27:00 mframe kernel:  [<ffffffff811c15c8>] ? vma_adjust+0x148/0x700
May 18 06:27:00 mframe kernel:  [<ffffffff811c1c96>] ?
__split_vma.isra.32+0x116/0x1d0
May 18 06:27:00 mframe kernel:  [<ffffffff811c2b5d>] ? do_munmap+0xfd/0x390
May 18 06:27:00 mframe kernel:  [<ffffffff811008d9>] ?
get_futex_key+0x199/0x390
May 18 06:27:00 mframe kernel:  [<ffffffff811c351c>] ? mmap_region+0x1dc/0x620
May 18 06:27:00 mframe kernel:  [<ffffffff81100f54>] ? futex_wake+0x84/0x150
May 18 06:27:00 mframe kernel:  [<ffffffff811c3c5c>] ? do_mmap+0x2fc/0x3d0
May 18 06:27:00 mframe kernel:  [<ffffffff811a96bf>] ? vm_mmap_pgoff+0x8f/0xc0
May 18 06:27:00 mframe kernel:  [<ffffffff811c21fa>] ?
SyS_mmap_pgoff+0xfa/0x240
May 18 06:27:00 mframe kernel:  [<ffffffff81079329>] ?
syscall_slow_exit_work+0x39/0xc6
May 18 06:27:00 mframe kernel:  [<ffffffff8160e072>] ?
entry_SYSCALL_64_fastpath+0x16/0x71
May 18 06:27:00 mframe kernel: note: java[8713] exited with preempt_count 1
-------
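Because the mail client wrapped the log lines, the call path is a bit hard to
read. The following is a rough Python sketch for pulling the stack frames out
of such a pasted trace. The regular expression and the name parse_frames are
my own assumptions about the frame format seen above, not an existing tool.

```python
import re

# Matches frames such as "[<ffffffff811c15c8>] vma_adjust+0x148/0x700",
# optionally marked inexact with "?", and tolerates a line break between
# the address and the symbol (as introduced by mail wrapping).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<inexact>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack frame found in the pasted log text."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "func": m.group("func"),
            "offset": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "exact": m.group("inexact") is None,
        })
    return frames


# A few lines copied from the log above, including one wrapped frame.
SAMPLE = """\
May 18 06:27:00 mframe kernel:  [<ffffffff811f1292>]
split_huge_page_to_list+0x62/0xc0
May 18 06:27:00 mframe kernel:  [<ffffffff811c15c8>] vma_adjust+0x148/0x700
May 18 06:27:00 mframe kernel:  [<ffffffff8160989b>] ? thread_return+0x38/0x6bd
"""

for frame in parse_frames(SAMPLE):
    kind = "exact" if frame["exact"] else "inexact"
    print(frame["func"], hex(frame["offset"]), kind)
```

Run over the whole log, this lists the exact path from SyS_mmap_pgoff through
do_munmap and vma_adjust_trans_huge down to __split_huge_page, separately from
the entries marked with "?".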

Any help is greatly appreciated.

Jochen

