[Bug 1173819] New: 'BUG: workqueue lockup' on ThunderX2 machines with kernel 4.12.14 in OBS

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819

            Bug ID: 1173819
           Summary: 'BUG: workqueue lockup' on ThunderX2 machines with kernel 4.12.14 in OBS
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.1
          Hardware: aarch64
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: guillaume.gardet@arm.com
        QA Contact: qa-bugs@suse.de
                CC: adrian.schroeter@suse.com, afaerber@suse.com, dmueller@suse.com
          Found By: ---
           Blocker: ---

obs-arm-7, -8 and -9 are often down in OBS these days due to kernel issues.

One issue is a workqueue lockup:

BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 92265s!

-- 
You are receiving this mail because:
You are on the CC list for the bug.
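For readers unfamiliar with the message: as I understand the kernel's workqueue watchdog (CONFIG_WQ_WATCHDOG), each worker pool remembers when it last made forward progress, and a periodic check reports any pool that has work queued but has not progressed for longer than workqueue.watchdog_thresh (30 s by default). The sketch below is a simplified Python model of that check, not kernel code; the Pool class, its fields, and check_pools are hypothetical names, and flags is fixed at 0x0 for simplicity.

```python
from dataclasses import dataclass
from typing import List

WATCHDOG_THRESH_S = 30  # assumed default of the workqueue.watchdog_thresh parameter


@dataclass
class Pool:
    """Hypothetical stand-in for a kernel worker pool (simplified)."""
    cpus: int
    node: int
    nice: int
    pending_work: int       # work items queued but not yet started
    last_progress_s: float  # time the pool last started a work item


def check_pools(pools: List[Pool], now_s: float,
                thresh_s: int = WATCHDOG_THRESH_S) -> List[str]:
    """Return one lockup message per pool that has queued work but has not
    made progress within thresh_s seconds (simplified model)."""
    reports = []
    for pool in pools:
        if pool.pending_work == 0:
            continue  # a pool with nothing queued is never reported as stuck
        stalled = now_s - pool.last_progress_s
        if stalled > thresh_s:
            # flags is hard-coded to 0x0 in this model
            reports.append(
                f"BUG: workqueue lockup - pool cpus={pool.cpus} "
                f"node={pool.node} flags=0x0 nice={pool.nice} "
                f"stuck for {int(stalled)}s!"
            )
    return reports


if __name__ == "__main__":
    # Illustrative values, chosen to reproduce the "stuck for 47s!" report.
    pools = [
        Pool(cpus=59, node=0, nice=0, pending_work=1, last_progress_s=1937.0),
        Pool(cpus=86, node=0, nice=0, pending_work=0, last_progress_s=0.0),
    ]
    for line in check_pools(pools, now_s=1984.0):
        print(line)
```

Read this way, "stuck for 92265s" means the pool for CPU 59 had pending work that had not been started for roughly 25.6 hours.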

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819

Ismail Dönmez <idonmez@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |idonmez@suse.com

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c1

--- Comment #1 from Ismail Dönmez <idonmez@suse.com> ---
Log from yesterday:

2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.526080] INFO: rcu_sched self-detected stall on CPU
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.531221]     205-...: (5976 ticks this GP) idle=902/140000000000001/0 softirq=33742/33742 fqs=2851
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.540252]      (t=6001 jiffies g=28000 c=27999 q=176626)
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.545486] Task dump for CPU 205:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.546079] INFO: rcu_sched detected stalls on CPUs/tasks:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.548877] qemu-system-aar R
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.554351] running task 0 19576 6280 0x00000006
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.562872] Call trace:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.565315] dump_backtrace+0x0/0x188
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.568968] show_stack+0x24/0x30
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.572277] sched_show_task+0xec/0x138
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.576105] dump_cpu_task+0x48/0x58
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.579680] rcu_dump_cpu_stacks+0xa0/0xe8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.583770] rcu_check_callbacks+0x6e4/0x938
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.588037] update_process_times+0x34/0x60
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.592216] tick_sched_handle.isra.6+0x38/0x70
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.596735] tick_sched_timer+0x4c/0x98
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.600561] __hrtimer_run_queues+0xc4/0x278
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.604819] hrtimer_interrupt+0xa8/0x228
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.608828] arch_timer_handler_phys+0x38/0x58
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.613269] handle_percpu_devid_irq+0x90/0x248
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.617789] generic_handle_irq+0x34/0x50
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.621786] __handle_domain_irq+0x68/0xc0
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.625873] gic_handle_irq+0x80/0x18c
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.629612] el1_irq+0xb0/0x140
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.632744] osq_lock+0x108/0x1b8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.636048] rwsem_optimistic_spin+0x70/0x130
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.640398] rwsem_down_write_failed+0x48/0x200
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.644915] down_write+0x58/0x70
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.648228] ext4_file_write_iter+0x74/0x388
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.652492] __vfs_write+0xd0/0x148
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.655970] vfs_write+0xac/0x1b8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.659274] SyS_pwrite64+0x8c/0xa8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.662752] el0_svc_naked+0x44/0x48
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.666328]     205-...: (5976 ticks this GP) idle=902/140000000000001/0 softirq=33742/33742 fqs=2852
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.675368]     (detected by 17, t=6014 jiffies, g=28000, c=27999, q=177177)
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.675390] Task dump for CPU 205:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.685554] qemu-system-aar R running task 0 19576 6280 0x00000006
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692597] Call trace:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692608] __switch_to+0xe4/0x150
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692616] 0xffff89bcbd10
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.166037] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 47s!
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.166090] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 59s!
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.182238] Showing busy workqueues and worker pools:
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.182249] workqueue events: flags=0x0
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.191156] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.191167] pending: cache_reap
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.201884] pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.208960] in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.213560] pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.220609] pending: cache_reap
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.224370] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.228730] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.235865] pending: vmstat_update
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.250910] workqueue kblockd: flags=0x18
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.254959] pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.262179] pending: blk_mq_run_work_fn
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.267886] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 22586 528
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.645557] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 205-... } 6205 jiffies s: 6281 root: 0x1000/.
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.656101] blocking rcu_node structures: l=1:192-207:0x2000/.
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.661946] Task dump for CPU 205:
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.665342] qemu-system-aar R running task 0 19576 6280 0x00000006
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.672427] Call trace:
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.674880] __switch_to+0xe4/0x150
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.674884] 0xffff89bcbd10
2020-07-06T10:48:26+00:00 obs-arm-9 systemd-udevd[22482]: seq 12812 '/devices/virtual/block/loop2' is taking a long time
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.882768] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 78s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.890815] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=-20 stuck for 59s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.899089] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 90s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.907227] Showing busy workqueues and worker pools:
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.907238] workqueue events: flags=0x0
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916130] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916143] pending: cache_reap
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916185] pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.926769] in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.938369] pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.945423] pending: cache_reap
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.945624] workqueue events_freezable_power_: flags=0x84
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.954332] pwq 356: cpus=178 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.961469] in-flight: 1612:disk_events_workfn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.966298] pwq 322: cpus=161 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.973443] in-flight: 1595:disk_events_workfn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.978308] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.982663] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.989798] pending: vmstat_update
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.993654] pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.000700] pending: vmstat_update
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.004854] workqueue kblockd: flags=0x18
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.008903] pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.016124] pending: blk_mq_run_work_fn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.021841] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 22586 528
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.029928] pool 322: cpus=161 node=1 flags=0x0 nice=0 hung=0s workers=3 idle: 199260 123733
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.038410] pool 356: cpus=178 node=1 flags=0x0 nice=0 hung=0s workers=3 idle: 199074 122038
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.609482] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 108s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.617578] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=-20 stuck for 90s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.625795] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 121s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.633988] Showing busy workqueues and worker pools:
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.633994] workqueue events: flags=0x0
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634001] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634006] pending: cache_reap
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634048] pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.653482] in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.653503] pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.665051] pending: cache_reap
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.665276] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.679941] pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.687070] pending: vmstat_update
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.687133] pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.697891] pending: vmstat_update
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.698298] workqueue kblockd: flags=0x18
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.705690] pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.705695] pending: blk_mq_run_work_fn
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.707145] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=1s workers=3 idle: 22586 528
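To see which pools stay stuck across successive watchdog reports, a small triage helper can pull the "stuck for" values out of a log like the one in comment #1. This is a hypothetical sketch (LOCKUP_RE and lockup_timeline are names made up for illustration); the regex assumes exactly the message format seen in this log, including the node= field. Because it scans with re.finditer instead of splitting on newlines, it also works on a log whose line breaks were lost in transit.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed format, taken from the lockup messages in this log.
LOCKUP_RE = re.compile(
    r"BUG: workqueue lockup - pool cpus=(?P<cpus>\S+) node=(?P<node>\d+) "
    r"flags=0x[0-9a-f]+ nice=(?P<nice>-?\d+) stuck for (?P<secs>\d+)s!"
)


def lockup_timeline(log_text: str) -> Dict[Tuple[str, int], List[int]]:
    """Map (cpus, nice) of each reported pool to its 'stuck for' values,
    in the order they appear in the log."""
    timeline: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for m in LOCKUP_RE.finditer(log_text):
        key = (m.group("cpus"), int(m.group("nice")))
        timeline[key].append(int(m.group("secs")))
    return dict(timeline)


if __name__ == "__main__":
    sample = """\
[ 1984.166037] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 47s!
[ 1984.166090] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 59s!
[ 2014.882768] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 78s!
[ 2014.890815] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=-20 stuck for 59s!
[ 2014.899089] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 90s!
"""
    for (cpus, nice), secs in lockup_timeline(sample).items():
        print(f"cpus={cpus} nice={nice}: {secs}")
```

On the full excerpt, the pools for CPU 59 (47, 78, 108 s) and CPU 205 (59, 90, 121 s) grow by roughly 30 s per report, suggesting they made no progress in between; CPU 205 is the CPU the RCU stall trace shows running qemu-system-aar inside osq_lock.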

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c2

--- Comment #2 from Ismail Dönmez <idonmez@suse.com> ---
Created attachment 839442
  --> http://bugzilla.opensuse.org/attachment.cgi?id=839442&action=edit
Log from yesterday

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819

Andreas Färber <afaerber@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jcheung@suse.com,
                   |                            |yousaf.kaukab@suse.com
           See Also|                            |https://bugzilla.suse.com/show_bug.cgi?id=1165467

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c3

--- Comment #3 from Adrian Schröter <adrian.schroeter@suse.com> ---
Both systems hung again in a similar way. The kernel reports internal errors early on, while handling ext4 jobs. Will attach full dmesg files.

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c4

--- Comment #4 from Adrian Schröter <adrian.schroeter@suse.com> ---
Created attachment 839757
  --> http://bugzilla.opensuse.org/attachment.cgi?id=839757&action=edit
dmesg of obs-arm-8

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c5

--- Comment #5 from Adrian Schröter <adrian.schroeter@suse.com> ---
Created attachment 839758
  --> http://bugzilla.opensuse.org/attachment.cgi?id=839758&action=edit
dmesg of obs-arm-9

http://bugzilla.opensuse.org/show_bug.cgi?id=1173819
http://bugzilla.opensuse.org/show_bug.cgi?id=1173819#c8

Guillaume GARDET <guillaume.gardet@arm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #8 from Guillaume GARDET <guillaume.gardet@arm.com> ---
Leap 15.1 is EOL, and I think we have not encountered this problem for a while now (likely because the hosts have been upgraded).