On 10.04.2018 at 17:01, Petr Mladek wrote:
On Tue 2018-04-10 15:44:09, Stefan Priebe - Profihost AG wrote:
On 10.04.2018 at 09:37, Petr Mladek wrote:
On Mon 2018-04-09 16:05:09, Vlastimil Babka wrote:
On 04/06/2018 08:07 PM, Stefan Priebe - Profihost AG wrote:
Hello,
Under memory pressure on a hypervisor running ksmd, I had two deadlocks today where the machines rebooted due to lockups.
I just wonder if the system rebooted on its own or if some human rebooted it.
Let me paste the log after running it through tac to fix the reversed order of lines:
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
It seems that pve-firewall was unblocked after 10 minutes. It is possible that ksmd and ksmtuned would have been unblocked after several more minutes as well if the system had not been rebooted.
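The 10-minute estimate can be reproduced from the pasted lines. The following is only a minimal sketch, assuming all timestamps belong to the same day and that every report matches the "blocked for more than N seconds" format quoted in this thread; the function name `summarize_hung_tasks` and the `SAMPLE` constant are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
#   18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) INFO: task ([^:\s]+):(\d+) "
    r"blocked for more than \d+ seconds"
)

def summarize_hung_tasks(log_text):
    """Return {(task, pid): (report_count, seconds_between_first_and_last_report)}.

    Assumption: all timestamps are from the same day (the log carries no date).
    """
    seen = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        seen[(match.group(2), int(match.group(3)))].append(stamp)
    return {
        key: (len(stamps), int((max(stamps) - min(stamps)).total_seconds()))
        for key, stamps in seen.items()
    }

# The hung-task reports quoted in this thread.
SAMPLE = """\
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for (task, pid), (count, span) in sorted(summarize_hung_tasks(SAMPLE).items()):
        print(f"{task}:{pid} reported {count} times over {span} seconds")
```

The span only covers the time between the first and the last report; the task was already blocked for at least 120 seconds before the first one, so the real stall is somewhat longer.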
Good question. I cannot guarantee that, but we've set:
kernel.hung_task_panic = 0
The last message in the log is from the hung_task daemon. Therefore, it should not have caused the reboot.
kernel.softlockup_panic = 0
kernel.hardlockup_panic = 1
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
kernel.panic = 20
This should give some time to see panic() messages on the console. They were not in the log. It means that either panic() was not called or it was not able to show them on the net console.
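To double-check which of these settings are actually in effect on a running machine, the values can be read back from /proc/sys. This is only a sketch: the helper name `read_sysctls` and the `root` parameter are made up for illustration, and it assumes the usual mapping of sysctl names to paths (dots become slashes), which holds for the kernel.* names listed above. A knob that the running kernel does not provide is reported as None.

```python
from pathlib import Path

# The panic-related settings mentioned in this thread.
PANIC_SYSCTLS = [
    "kernel.hung_task_panic",
    "kernel.softlockup_panic",
    "kernel.hardlockup_panic",
    "kernel.panic_on_oops",
    "kernel.unknown_nmi_panic",
    "kernel.panic",
]

def read_sysctls(names, root="/proc/sys"):
    """Return {name: value as string, or None if the file cannot be read}.

    `root` is a parameter only so the function can be tried against a
    scratch directory; on a real system keep the default.
    """
    values = {}
    for name in names:
        path = Path(root, *name.split("."))
        try:
            values[name] = path.read_text().strip()
        except OSError:
            # Missing knob (e.g. feature not built in) or no permission.
            values[name] = None
    return values

if __name__ == "__main__":
    for name, value in read_sysctls(PANIC_SYSCTLS).items():
        print(f"{name} = {value}")
```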
I am not sure how reliable netconsole is during panic(). Anyway, consoles are never 100% reliable. The kernel on SLE12 even defers console handling to a kthread to avoid softlockups, which might cause an even longer delay. It tries harder to flush the messages during panic(), but that is still not guaranteed.
An alternative way to see the entire log is a kernel crash dump. I wonder if you have one by chance.
No, I don't. I appreciate all your help and comments. I think I just cannot provide enough information in this case. I'm sorry about this. I'll report back with more information if I see this again.
Greets,
Stefan
Best Regards, Petr
PS: We should probably move to bugzilla.