Bug ID: 917060
Summary: Build system lock-ups with kernel 3.18.5 and 3.18.6
Classification: openSUSE
Product: openSUSE Factory
Version: 201502*
Hardware: Other
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: dimstar@opensuse.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Since the check-in of kernel 3.18.5 (previous was 3.18.3) into openSUSE:Factory,
we see a lot of stuck build jobs, especially when building python3-base and
ruby2.1.

In a fortunate case, I got a kernel stack trace:

[ 1640.851086] ------------[ cut here ]------------
[ 1680s] [ 1640.851110] kernel BUG at ../arch/x86/mm/fault-xen.c:922!
[ 1680s] [ 1640.851114] invalid opcode: 0000 [#2] SMP 
[ 1680s] [ 1640.851120] Modules linked in: xenblk cdrom nls_iso8859_1 nls_cp437
vfat fat nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack reiserfs squashfs fuse
dm_snapshot dm_bufio dm_mod binfmt_misc loop
[ 1680s] [ 1640.851146] CPU: 0 PID: 4317 Comm: python Tainted: G      D W     
3.18.5-3-xen #1
[ 1680s] [ 1640.851150] Hardware name: Xen 4.1.2_17-5.2.3 PV guest
[ 1680s] [ 1640.851154] task: ffff8800025e4150 ti: ffff88002b764000 task.ti:
ffff88002b764000
[ 1680s] [ 1640.851158] RIP: e030:[<ffffffff8059f60f>]  [<ffffffff8059f60f>]
mm_fault_error+0x13c/0x16a
[ 1680s] [ 1640.851170] RSP: e02b:ffff88002b767e18  EFLAGS: 00010246
[ 1680s] [ 1640.851173] RAX: ffff8800025e4150 RBX: 0000000000000006 RCX:
0000000000000040
[ 1680s] [ 1640.851176] RDX: 00007fff6cebbe80 RSI: 0000000000000000 RDI:
ffff88002b767f58
[ 1680s] [ 1640.851180] RBP: 0000000000000040 R08: 0000000000000000 R09:
0000000000000000
[ 1680s] [ 1640.851183] R10: ffff88002bb70b38 R11: ffff8800000005d8 R12:
00007fff6cebbe80
[ 1680s] [ 1640.851186] R13: ffff88002b767f58 R14: ffff8800025ecbc0 R15:
ffff8800025e4150
[ 1680s] [ 1640.851194] FS:  00007ff1203a7700(0000) GS:ffff88002de00000(0000)
knlGS:0000000000000000
[ 1680s] [ 1640.851197] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1680s] [ 1640.851201] CR2: 00007fff6cebbe80 CR3: 000000002b780000 CR4:
0000000000000660
[ 1680s] [ 1640.851204] Stack:
[ 1680s] [ 1640.851207]  00000000000000a9 0000000000000006 00007fff6cebbe80
ffff88002b767f58
[ 1680s] [ 1640.851213]  ffff8800025ecbc0 ffffffff8002944f ffffffff800c304e
ffff8800025ecc28
[ 1680s] [ 1640.851220]  0000000000000002 00000000000d3862 0000000000000002
0000018f000041ed
[ 1680s] [ 1640.851226] Call Trace:
[ 1680s] [ 1640.851249]  [<ffffffff8002944f>] __do_page_fault+0x46f/0x530
[ 1680s] [ 1640.851260]  [<ffffffff805abd68>] page_fault+0x28/0x30
[ 1680s] [ 1640.851268]  [<00007ff11fdf9ee7>] 0x7ff11fdf9ee7
[ 1680s] [ 1640.851272] Code: c7 c7 b8 1b 79 80 e8 ba 07 00 00 be 04 00 03 00
5b 41 89 e8 4c 89 e2 4c 89 f1 5d 41 5c 41 5d 41 5e bf 07 00 00 00 e9 a9 f2 ff
ff <0f> 0b f6 c3 04 0f 85 f5 fe ff ff 48 8b b8 b8 03 00 00 48 83 c7 
[ 1680s] [ 1640.851332] RIP  [<ffffffff8059f60f>] mm_fault_error+0x13c/0x16a
[ 1680s] [ 1640.851337]  RSP <ffff88002b767e18>
[ 1680s] [ 1640.851358] ---[ end trace 729f7244c440f301 ]---

In most other cases I 'only' see:
[ 1387s] [381/382] test_multiprocessing_fork
[ 1943s] [ 1920.184633] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 1944s] [ 1920.185429]       Not tainted 3.18.5-3-default #1
[ 1944s] [ 1920.185935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1944s] [ 1920.187064] INFO: task python:393 blocked for more than 480
seconds.
[ 1944s] [ 1920.191965]       Not tainted 3.18.5-3-default #1
[ 1944s] [ 1920.192526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2423s] [ 2400.195910] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 2423s] [ 2400.196728]       Not tainted 3.18.5-3-default #1
[ 2423s] [ 2400.197422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2423s] [ 2400.198542] INFO: task python:393 blocked for more than 480
seconds.
[ 2423s] [ 2400.199242]       Not tainted 3.18.5-3-default #1
[ 2423s] [ 2400.199836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2903s] [ 2880.200432] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 2903s] [ 2880.201397]       Not tainted 3.18.5-3-default #1
[ 2903s] [ 2880.207970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2903s] [ 2880.209339] INFO: task python:393 blocked for more than 480
seconds.
[ 2903s] [ 2880.210274]       Not tainted 3.18.5-3-default #1
[ 2903s] [ 2880.210962] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 3383s] [ 3360.212469] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 3383s] [ 3360.213351]       Not tainted 3.18.5-3-default #1
[ 3383s] [ 3360.213935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 3383s] [ 3360.215026] INFO: task python:393 blocked for more than 480
seconds.
[ 3383s] [ 3360.216342]       Not tainted 3.18.5-3-default #1
[ 3383s] [ 3360.216928] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 3863s] [ 3840.216462] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 3863s] [ 3840.217344]       Not tainted 3.18.5-3-default #1
[ 3863s] [ 3840.217928] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 3863s] [ 3840.219243] INFO: task python:393 blocked for more than 480
seconds.
[ 3863s] [ 3840.220141]       Not tainted 3.18.5-3-default #1
[ 3863s] [ 3840.220730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[32670s] qemu: terminating on signal 15 from pid 7056
[32670s] ### WATCHDOG MARKER END ###
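
For triaging other stuck jobs, the hung-task warnings in a build log can be tallied per task. The following is only a rough sketch, not an existing tool: the function name count_hung_tasks and the sample text are made up for illustration, and the pattern assumes the message format shown in the log above, with a task name that contains no spaces.

```python
import re
from collections import Counter

# Matches hung-task warnings such as:
#   INFO: task khugepaged:26 blocked for more than 480 seconds.
# The pattern stops at the number because mail wrapping can push
# "seconds." onto the next line.
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+)")


def count_hung_tasks(log_text):
    """Return a Counter mapping (task name, pid) to the number of warnings."""
    return Counter(
        (m.group(1), int(m.group(2))) for m in HUNG_TASK.finditer(log_text)
    )


# Hypothetical excerpt in the same shape as the build log above.
sample = """\
[ 1943s] [ 1920.184633] INFO: task khugepaged:26 blocked for more than 480
seconds.
[ 1944s] [ 1920.187064] INFO: task python:393 blocked for more than 480
seconds.
[ 2423s] [ 2400.195910] INFO: task khugepaged:26 blocked for more than 480
seconds.
"""

for (name, pid), hits in sorted(count_hung_tasks(sample).items()):
    print(f"{name} (pid {pid}): {hits} hung-task warning(s)")
```

A task that is reported again at every timeout interval, as khugepaged and python are above, points to a permanent hang rather than a slow test.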

This kind of issue was not observed with the previous kernel, 3.18.3 (3.18.4
was skipped).

To reproduce:
- Create a new project containing kernel 3.18.5 and let it build (especially
kernel-obs-build), then branch python3-base into the same project.
home:dimstar:python is such a setup, except that it uses kernel 3.18.3 (which
works fine).

