On 04/06/2018 08:07 PM, Stefan Priebe - Profihost AG wrote:
Hello,
under memory pressure on a hypervisor running ksmd i had two deadlocks today where the machines rebootet due to lockups.
I see some of them wait in lru_add_drain_all(), I recall some recent issues with that upstream, Michal might know more, adding to CC.
Kernel was build from commit: 4fe1fb26557d69a4d0397113eb89d2dd7d6021b4
What i could grab from netconsole is:
Let me paste after running via tac to fix the reversed order of lines: 18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds. 18:17:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:17:45 ksmd D ffff883f7fbb7c08 0 409 2 0x00080000 18:17:45 ffff883f7fbb7c08 ffff887f7dad81c0 ffff883f00000031 ffff883f7fbfa580 18:17:45 ffff883f7fbb8000 ffff883f7fbb7d58 ffff883f7fbb7d50 ffff883f7fbfa580 18:17:45 0000000000000038 ffff883f7fbb7c20 ffffffff816e15f5 7fffffffffffffff 18:17:45 Call Trace: 18:17:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:17:45 [<ffffffff816e4a4a>] schedule_timeout+0x23a/0x2d0 18:17:45 [<ffffffff816e2998>] wait_for_completion+0xa8/0x110 18:17:45 [<ffffffff8109ecae>] flush_work+0xfe/0x170 18:17:45 [<ffffffff8116b18d>] lru_add_drain_all+0x13d/0x190 18:17:45 [<ffffffff811b2adb>] ksm_do_scan+0x69b/0xe00 18:17:45 [<ffffffff811b32c5>] ksm_scan_thread+0x85/0x1b0 18:17:45 [<ffffffff810a4add>] kthread+0xed/0x110 18:17:45 [<ffffffff816e60c5>] ret_from_fork+0x55/0x80 18:17:45 DWARF2 unwinder stuck at ret_from_fork+0x55/0x80 18:17:45 18:17:45 Leftover inexact backtrace: 18:17:45 [<ffffffff810a49f0>] ? kthread_park+0x60/0x60 18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds. 18:17:45 Not tainted 4.4.126+557-ph #1 18:17:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:17:45 ksmtuned D ffff883e8d0afd08 0 2259 1 0x00080000 18:17:45 ffff883e8d0afd08 ffffffff811d3ca2 ffff883e0000001e ffff887f7a228000 18:17:45 ffff883e8d0b0000 ffffffff81e8d584 ffff887f7a228000 00000000ffffffff 18:17:45 ffffffff81e8d588 ffff883e8d0afd20 ffffffff816e15f5 ffffffff81e8d580 18:17:45 Call Trace: 18:17:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:17:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:17:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:17:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:17:45 [<ffffffff811b21c8>] run_store+0x48/0x2c0 18:17:45 [<ffffffff813d44d2>] kobj_attr_store+0x12/0x20 18:17:45 [<ffffffff81250d5c>] sysfs_kf_write+0x3c/0x50 18:17:45 [<ffffffff81250903>] kernfs_fop_write+0x143/0x180 18:17:45 [<ffffffff811d548b>] __vfs_write+0x2b/0x130 18:17:45 [<ffffffff811d60a9>] vfs_write+0xa9/0x190 18:17:45 [<ffffffff811d71c6>] SyS_write+0x46/0xa0 18:17:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:17:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:17:45 18:17:45 Leftover inexact backtrace: 18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds. 18:15:45 Not tainted 4.4.126+557-ph #1 18:15:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:15:45 ksmd D ffff883f7fbb7c08 0 409 2 0x00080000 18:15:45 ffff883f7fbb7c08 ffff887f7dad81c0 ffff883f00000031 ffff883f7fbfa580 18:15:45 ffff883f7fbb8000 ffff883f7fbb7d58 ffff883f7fbb7d50 ffff883f7fbfa580 18:15:45 0000000000000038 ffff883f7fbb7c20 ffffffff816e15f5 7fffffffffffffff 18:15:45 Call Trace: 18:15:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:15:45 [<ffffffff816e4a4a>] schedule_timeout+0x23a/0x2d0 18:15:45 [<ffffffff816e2998>] wait_for_completion+0xa8/0x110 18:15:45 [<ffffffff8109ecae>] flush_work+0xfe/0x170 18:15:45 [<ffffffff8116b18d>] lru_add_drain_all+0x13d/0x190 18:15:45 [<ffffffff811b2adb>] ksm_do_scan+0x69b/0xe00 18:15:45 [<ffffffff811b32c5>] ksm_scan_thread+0x85/0x1b0 18:15:45 [<ffffffff810a4add>] kthread+0xed/0x110 18:15:45 [<ffffffff816e60c5>] ret_from_fork+0x55/0x80 18:15:45 DWARF2 unwinder stuck at ret_from_fork+0x55/0x80 18:15:45 18:15:45 Leftover inexact backtrace: 18:15:45 [<ffffffff810a49f0>] ? kthread_park+0x60/0x60 18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds. 18:15:45 Not tainted 4.4.126+557-ph #1 18:15:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:15:45 ksmtuned D ffff883e8d0afd08 0 2259 1 0x00080000 18:15:45 ffff883e8d0afd08 ffffffff811d3ca2 ffff883e0000001e ffff887f7a228000 18:15:45 ffff883e8d0b0000 ffffffff81e8d584 ffff887f7a228000 00000000ffffffff 18:15:45 ffffffff81e8d588 ffff883e8d0afd20 ffffffff816e15f5 ffffffff81e8d580 18:15:45 Call Trace: 18:15:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:15:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:15:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:15:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:15:45 [<ffffffff811b21c8>] run_store+0x48/0x2c0 18:15:45 [<ffffffff813d44d2>] kobj_attr_store+0x12/0x20 18:15:45 [<ffffffff81250d5c>] sysfs_kf_write+0x3c/0x50 18:15:45 [<ffffffff81250903>] kernfs_fop_write+0x143/0x180 18:15:45 [<ffffffff811d548b>] __vfs_write+0x2b/0x130 18:15:45 [<ffffffff811d60a9>] vfs_write+0xa9/0x190 18:15:45 [<ffffffff811d71c6>] SyS_write+0x46/0xa0 18:15:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:15:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:15:45 18:15:45 Leftover inexact backtrace: 18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:15:45 Not tainted 4.4.126+557-ph #1 18:15:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:15:45 pve-firewall D ffff887f69793c98 0 2914 1 0x00080000 18:15:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:15:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:15:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:15:45 Call Trace: 18:15:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:15:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:15:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:15:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:15:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:15:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:15:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:15:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:15:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:15:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:15:45 18:15:45 Leftover inexact backtrace: 18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:13:45 Not tainted 4.4.126+557-ph #1 18:13:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:13:45 pve-firewall D ffff887f69793c98 0 2914 1 0x00080000 18:13:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:13:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:13:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:13:45 Call Trace: 18:13:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:13:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:13:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:13:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:13:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:13:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:13:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:13:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:13:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:13:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:13:45 18:13:45 Leftover inexact backtrace: 18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:11:45 Not tainted 4.4.126+557-ph #1 18:11:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:11:45 pve-firewall D ffff887f69793c98 0 2914 1 0x00080000 18:11:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:11:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:11:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:11:45 Call Trace: 18:11:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:11:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:11:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:11:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:11:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:11:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:11:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:11:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:11:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:11:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:11:45 18:11:45 Leftover inexact backtrace: 18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:09:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:09:45 pve-firewall D ffff887f69793c98 0 2914 1 0x00080000 18:09:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:09:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:09:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:09:45 Call Trace: 18:09:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:09:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:09:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:09:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:09:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:09:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:09:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:09:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:09:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:09:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:09:45 18:09:45 Leftover inexact backtrace: 18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:07:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:07:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:07:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:07:45 Call Trace: 18:07:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:07:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:07:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:07:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:07:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:07:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:07:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:07:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:07:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:07:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:07:45 18:07:45 Leftover inexact backtrace: 18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds. 18:05:45 Not tainted 4.4.126+557-ph #1 18:05:45 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 18:05:45 pve-firewall D ffff887f69793c98 0 2914 1 0x00080000 18:05:45 ffff887f69793c98 ffff887f69793c68 ffffffff00000035 ffff887f735a0000 18:05:45 ffff887f69794000 ffff887f69748364 ffff887f735a0000 00000000ffffffff 18:05:45 ffff887f69748368 ffff887f69793cb0 ffffffff816e15f5 ffff887f69748360 18:05:45 Call Trace: 18:05:45 [<ffffffff816e15f5>] schedule+0x35/0x80 18:05:45 [<ffffffff816e18fe>] schedule_preempt_disabled+0xe/0x10 18:05:45 [<ffffffff816e3325>] __mutex_lock_slowpath+0x95/0x110 18:05:45 [<ffffffff816e33b7>] mutex_lock+0x17/0x30 18:05:45 [<ffffffff811e2ef4>] path_openat+0x3b4/0x1200 18:05:45 [<ffffffff811e591e>] do_filp_open+0x7e/0xe0 18:05:45 [<ffffffff811d504c>] do_sys_open+0x12c/0x210 18:05:45 [<ffffffff811d514e>] SyS_open+0x1e/0x20 18:05:45 [<ffffffff816e5c85>] entry_SYSCALL_64_fastpath+0x1e/0xc9 18:05:45 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xc9 18:05:45 18:05:45 Leftover inexact backtrace: -- To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org