Bug ID: 1190482
Summary: kernel BUG at mm/slub.c creates stale workers
Classification: openSUSE
Product: openSUSE.org
Version: unspecified
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: BuildService
Assignee: screening-team-bugs@suse.de
Reporter: code@bnavigator.de
QA Contact: adrian.schroeter@suse.com
Found By: ---
Blocker: ---

Starting last week, many (but not all) builds of spyder from
openSUSE:Factory/spyder, devel:languages:python:numeric/spyder and branched
projects became stale for days because of kernel crashes.

The following code is in spyder.spec:

%check
...
function testspyder() {
   xvfb-run --server-args "-screen 0 1920x1080x24" python3 runtests.py -m "not no_xvfb" --timeout 1800 -ra -k "not (${donttest:4})" $@
   # wait a bit until we can start the next xvfb
   sleep 5
}
testspyder
testspyder --run-slow
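
A note on the `${donttest:4}` expansion in the function above: it is bash substring expansion, which drops the first four characters of the variable. The sketch below assumes (the elided part of the spec is not shown here) that `donttest` is built by appending ` or <name>` fragments, so the leading ` or ` has to be stripped before the value is passed to `-k`. The test names are hypothetical.

```bash
#!/bin/bash
# Hypothetical deselection list; the real entries are elided in the excerpt.
donttest=""
donttest="$donttest or test_alpha"
donttest="$donttest or test_beta"

# ${donttest:4} skips the first 4 characters, i.e. the leading " or ".
filter="not (${donttest:4})"
echo "$filter"
```

This only illustrates how the `-k` expression is assembled; it is not expected to be related to the kernel crash.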


Relevant buildlogs:

x86_64 and i586:

[  928s] = 921 passed, 91 skipped, 176 deselected, 2 xfailed, 26 warnings in
862.01s (0:14:22) =
[  931s] + sleep 5
[  936s] [  927.261691][ T3247] kernel BUG at mm/slub.c:321!
[  936s] [  927.263137][ T3247] invalid opcode: 0000 [#1] SMP NOPTI
[  936s] [  927.264717][ T3247] CPU: 2 PID: 3247 Comm: sh Not tainted
5.14.1-1-default #1 openSUSE Tumbleweed
77bbc82e23666d88b5be1f7477a6fc9946523f12
[  936s] [  927.265564][ T3247] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[  936s] [  927.265564][ T3247] RIP: 0010:__slab_free+0x22d/0x420
[  936s] [  927.265564][ T3247] Code: 00 44 8b 44 24 14 44 0f b6 54 24 26 48 8b
54 24 18 8b 74 24 20 48 89 44 24 08 44 0f b6 4c 24 27 48 8b 7c 24 28 e9 8d fe
ff ff <0f> 0b 49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
[  936s] [  927.265564][ T3247] RSP: 0018:ffffab1244adbb80 EFLAGS: 00010046
[  936s] [  927.265564][ T3247] RAX: ffff9b84c3528c60 RBX: ffff9b84c3528c00
RCX: ffff9b84c3528c00
[  936s] [  927.265564][ T3247] RDX: 0000000080150005 RSI: fffff7fd840d4a00
RDI: ffff9b84c0042800
[  936s] [  927.265564][ T3247] RBP: ffffab1244adbc30 R08: 0000000000000001
R09: ffffffff8cac69d5
[  936s] [  927.265564][ T3247] R10: ffffab1244adbcd7 R11: 0000000000000000
R12: fffff7fd840d4a00
[  936s] [  927.265564][ T3247] R13: ffff9b84c3528c00 R14: ffff9b84c0042800
R15: ffff9b84c3528c00
[  936s] [  927.265564][ T3247] FS:  00007fe60dabbc00(0000)
GS:ffff9b85f7c80000(0000) knlGS:0000000000000000
[  936s] [  927.265564][ T3247] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[  936s] [  927.265564][ T3247] CR2: 000055c3d786eb0c CR3: 00000001035b2000
CR4: 00000000003506e0
[  936s] [  927.265564][ T3247] Call Trace:
[  936s] [  927.265564][ T3247]  ? put_ucounts+0x75/0x90
[  936s] [  927.265564][ T3247]  kfree+0x352/0x3c0
[  936s] [  927.265564][ T3247]  put_ucounts+0x75/0x90
[  936s] [  927.265564][ T3247]  __sigqueue_free.part.0+0x3e/0x60
[  936s] [  927.265564][ T3247]  dequeue_signal+0x12a/0x1f0
[  936s] [  927.265564][ T3247]  get_signal+0x206/0x8b0
[  936s] [  927.265564][ T3247]  ? kmem_cache_free+0x1d0/0x3e0
[  936s] [  927.265564][ T3247]  ? call_rcu+0xdd/0x7d0
[  936s] [  927.265564][ T3247]  arch_do_signal_or_restart+0xfd/0x730
[  936s] [  927.265564][ T3247]  ? do_sigaction+0x116/0x280
[  936s] [  927.265564][ T3247]  ? queued_spin_unlock+0x5/0x10
[  936s] [  927.265564][ T3247]  ? wp_page_reuse+0x61/0x70
[  936s] [  927.265564][ T3247]  ? __handle_mm_fault+0xd66/0x1520
[  936s] [  927.265564][ T3247]  exit_to_user_mode_prepare+0x12c/0x230
[  936s] [  927.265564][ T3247]  syscall_exit_to_user_mode+0x18/0x40
[  936s] [  927.265564][ T3247]  do_syscall_64+0x69/0x80
[  936s] [  927.265564][ T3247]  ? handle_mm_fault+0xcf/0x2a0
[  936s] [  927.265564][ T3247]  ? do_user_addr_fault+0x1d5/0x670
[  936s] [  927.265564][ T3247]  ? do_syscall_64+0x69/0x80
[  936s] [  927.265564][ T3247]  ? exc_page_fault+0x68/0x130
[  936s] [  927.265564][ T3247]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  936s] [  927.265564][ T3247] RIP: 0033:0x7fe60db73d8b
[  936s] [  927.265564][ T3247] Code: 48 85 f6 74 15 48 b9 00 00 00 80 01 00 00
00 48 8b 06 48 85 c8 75 48 49 89 f0 41 ba 08 00 00 00 4c 89 c6 b8 0e 00 00 00
0f 05 <89> c2 f7 da 3d 00 f0 ff ff b8 00 00 00 00 0f 47 c2 48 8b 94 24 88
[  936s] [  927.265564][ T3247] RSP: 002b:00007ffe6e8c36d0 EFLAGS: 00000246
ORIG_RAX: 000000000000000e
[  936s] [  927.265564][ T3247] RAX: 0000000000000000 RBX: 0000000000000000
RCX: 00007fe60db73d8b
[  936s] [  927.265564][ T3247] RDX: 0000000000000000 RSI: 00007ffe6e8c37c0
RDI: 0000000000000002
[  936s] [  927.265564][ T3247] RBP: 000055c3d787a810 R08: 00007ffe6e8c37c0
R09: 000055c3d7873480
[  936s] [  927.265564][ T3247] R10: 0000000000000008 R11: 0000000000000246
R12: 000055c3d787a810
[  936s] [  927.265564][ T3247] R13: 000055c3d787a810 R14: 0000000000000000
R15: 00007ffe6e8c37c0
[  936s] [  927.265564][ T3247] Modules linked in: ata_generic crc32_pclmul
ata_piix qemu_fw_cfg overlay e1000 nls_iso8859_1 nls_cp437 vfat fat virtio_blk
virtio_mmio xfs btrfs blake2b_generic xor raid6_pq libcrc32c reiserfs ext4
crc32c_intel mbcache jbd2 squashfs fuse dm_snapshot dm_bufio dm_crypt essiv
authenc trusted asn1_encoder tee dm_mod binfmt_misc loop sg virtio_rng
[  936s] [  927.265564][ T3247] ---[ end trace 62125b68cc9ddb14 ]---
[  936s] [  927.265564][ T3247] RIP: 0010:__slab_free+0x22d/0x420
[  936s] [  927.265564][ T3247] Code: 00 44 8b 44 24 14 44 0f b6 54 24 26 48 8b
54 24 18 8b 74 24 20 48 89 44 24 08 44 0f b6 4c 24 27 48 8b 7c 24 28 e9 8d fe
ff ff <0f> 0b 49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
[  936s] [  927.265564][ T3247] RSP: 0018:ffffab1244adbb80 EFLAGS: 00010046
[  936s] [  927.265564][ T3247] RAX: ffff9b84c3528c60 RBX: ffff9b84c3528c00
RCX: ffff9b84c3528c00
[  936s] [  927.265564][ T3247] RDX: 0000000080150005 RSI: fffff7fd840d4a00
RDI: ffff9b84c0042800
[  936s] [  927.265564][ T3247] RBP: ffffab1244adbc30 R08: 0000000000000001
R09: ffffffff8cac69d5
[  936s] [  927.265564][ T3247] R10: ffffab1244adbcd7 R11: 0000000000000000
R12: fffff7fd840d4a00
[  936s] [  927.265564][ T3247] R13: ffff9b84c3528c00 R14: ffff9b84c0042800
R15: ffff9b84c3528c00
[  936s] [  927.265564][ T3247] FS:  00007fe60dabbc00(0000)
GS:ffff9b85f7c80000(0000) knlGS:0000000000000000
[  936s] [  927.265564][ T3247] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[  936s] [  927.265564][ T3247] CR2: 000055c3d786eb0c CR3: 00000001035b2000
CR4: 00000000003506e0
[  996s] [  987.270511][    C5] rcu: INFO: rcu_sched detected stalls on
CPUs/tasks:
[  996s] [  987.272683][    C5] rcu:     2-...0: (2 GPs behind)
idle=7be/1/0x4000000000000000 softirq=120217/120217 fqs=7500 
[  996s] [  987.274486][    C5]     (detected by 5, t=15003 jiffies, g=290705,
q=559)
[  996s] [  987.274486][    C5] Sending NMI from CPU 5 to CPUs 2:
[  996s] [  987.274486][    C5] NMI backtrace for cpu 2 skipped: idling at
native_halt+0xa/0x10


armv7l:
...
[ 1059s] [ 1015.375123][ T3213] kernel BUG at mm/slub.c:321!
[ 1059s] [ 1015.375420][ T3213] Internal error: Oops - BUG: 0 [#1] SMP
..


The workers then chew on the crash for days.

I cannot reproduce the behavior on a local osc build.

Whatever is wrong with the specfile, Spyder or Xvfb, it should not be able to
crash the kernel and block an obs worker for days.

