On 10.04.2018 at 09:37, Petr Mladek wrote:
On Mon 2018-04-09 16:05:09, Vlastimil Babka wrote:
On 04/06/2018 08:07 PM, Stefan Priebe - Profihost AG wrote:
Hello,
Under memory pressure on a hypervisor running ksmd, I had two deadlocks today where the machines rebooted due to lockups.
I just wonder if the system rebooted on its own or if some human rebooted it.
Let me paste the log after running it through tac to fix the reversed order of the lines:
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
It seems that pve-firewall was unblocked after 10 minutes. It is possible that ksmd and ksmtuned would have been unblocked after several more minutes as well if the system had not been rebooted.
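As a rough cross-check of the "10 minutes" estimate, here is a minimal Python sketch that parses the hung-task messages quoted above and computes, per task, the window between its first and last report. The helper names (`blocked_spans`, `report_window`) are made up for illustration, and the parser assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime, timedelta

# The hung-task messages quoted in this thread, one per line.
LOG = """\
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
"""

PATTERN = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) INFO: task (\S+):(\d+) "
    r"blocked for more than (\d+) seconds\.$"
)


def blocked_spans(log_text):
    """Return {task name: (first report, last report)} for all matching lines."""
    spans = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip anything that is not a hung-task report
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        task = match.group(2)
        first, last = spans.get(task, (stamp, stamp))
        spans[task] = (min(first, stamp), max(last, stamp))
    return spans


def report_window(spans, task):
    """Time between the first and the last report for the given task."""
    first, last = spans[task]
    return last - first


if __name__ == "__main__":
    spans = blocked_spans(LOG)
    for task in sorted(spans):
        print(task, report_window(spans, task))
```

For the lines above this gives a 10-minute window for pve-firewall and a 2-minute window for ksmd and ksmtuned. Two caveats: each message already implies at least 120 seconds of blocking before it was printed, and the paste contains exactly ten messages. As far as I know, kernel.hung_task_warnings defaults to 10 in mainline kernels, so a missing later report does not necessarily prove that a task was unblocked.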
Good question. I cannot guarantee that, but we've set:
kernel.hung_task_panic = 0
kernel.softlockup_panic = 0
kernel.hardlockup_panic = 1
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
kernel.panic = 20
Maybe it's related to this?
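To make the effect of these settings easier to see, here is a small Python sketch that reads the same knobs from /proc/sys/kernel and lists which events would lead to an automatic reboot. The function names and the `QUOTED` dictionary are illustrative only. The logic assumes the usual semantics: a non-zero `*_panic` or `panic_on_oops` value turns the event into a panic, and a non-zero kernel.panic makes the kernel reboot after a panic instead of halting.

```python
from pathlib import Path

# Panic-related knobs mentioned in this thread (all under kernel.*).
TRIGGERS = [
    "hung_task_panic",
    "softlockup_panic",
    "hardlockup_panic",
    "panic_on_oops",
    "unknown_nmi_panic",
]

# Values quoted in the mail, usable for an offline check.
QUOTED = {
    "hung_task_panic": 0,
    "softlockup_panic": 0,
    "hardlockup_panic": 1,
    "panic_on_oops": 1,
    "unknown_nmi_panic": 1,
    "panic": 20,
}


def read_kernel_sysctls(names, base="/proc/sys/kernel"):
    """Read integer sysctl values; a knob that cannot be read maps to None."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def auto_reboot_triggers(values):
    """List the enabled panic triggers that would end in an automatic reboot."""
    if not values.get("panic"):
        return []  # kernel.panic == 0 (or unknown): no automatic reboot
    return [name for name in TRIGGERS if values.get(name)]


if __name__ == "__main__":
    live = read_kernel_sysctls(TRIGGERS + ["panic"])
    print("live system:", auto_reboot_triggers(live))
    print("quoted settings:", auto_reboot_triggers(QUOTED))
```

With the quoted values, a hung-task report or a soft lockup alone would not panic the machine, while a hard lockup, an oops, or an unknown NMI would, followed by a reboot after 20 seconds.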
I had this trace on 3 different servers in a row while they were all under memory pressure due to a lot of virtual machine migrations. All of them are production servers, so I'm not really willing to reproduce this...
It might be similar to the traffic jam after a sports match or a concert finishes. It might cause unusually long delays that disappear once the unusually large crowd has moved out of the small area.
I am not saying that there is no bug somewhere or that the kernel could not do better. I just wonder if it might be reasonable to also limit the number of migrations that happen at the same time.
Best Regards, Petr