On Mon 2018-04-09 16:05:09, Vlastimil Babka wrote:
On 04/06/2018 08:07 PM, Stefan Priebe - Profihost AG wrote:
Hello,
Under memory pressure on a hypervisor running ksmd, I had two deadlocks today where the machines rebooted due to lockups.
I just wonder if the system rebooted on its own or if some human rebooted it.
Let me paste after running via tac to fix the reversed order of lines:
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
It seems that pve-firewall was unblocked after 10 minutes. It is possible that ksmd and ksmtuned would have been unblocked after several more minutes as well if the system had not been rebooted.
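To make the reading of the log above easier to check, here is a minimal sketch that counts the hung-task reports per task and measures the time between the first and the last one. It assumes the line format of the excerpt above and that all timestamps fall on the same day; the names LOG, REPORT and report_spans are made up for this illustration.

```python
import re
from datetime import datetime

# Log excerpt copied from the message above, one report per line.
LOG = """\
18:17:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45 INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45 INFO: task pve-firewall:2914 blocked for more than 120 seconds.
"""

# Assumed format: "HH:MM:SS INFO: task <name>:<pid> blocked for more than N seconds"
REPORT = re.compile(
    r"(\d\d:\d\d:\d\d) INFO: task (\S+:\d+) blocked for more than \d+ seconds"
)


def report_spans(log_text):
    """Map "name:pid" to (number of reports, seconds from first to last report)."""
    stamps = {}
    for time_str, task in REPORT.findall(log_text):
        # Only the time of day is logged, so a wrap past midnight is not handled.
        stamps.setdefault(task, []).append(datetime.strptime(time_str, "%H:%M:%S"))
    return {
        task: (len(ts), int((max(ts) - min(ts)).total_seconds()))
        for task, ts in stamps.items()
    }


if __name__ == "__main__":
    for task, (count, seconds) in sorted(report_spans(LOG).items()):
        print(f"{task}: {count} reports spanning {seconds // 60} min")
```

For the excerpt this gives six reports over 10 minutes for pve-firewall and two reports over 2 minutes each for ksmd and ksmtuned. The span only covers the time between reports; it says nothing about what happened after the last line was logged.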
I had this trace on 3 different servers in a row while they were all under memory pressure due to a lot of virtual machine migrations. All of them are production servers, so I'm not really willing to reproduce this...
It might be similar to the traffic jam once a sports match or a concert finishes. It might cause unusually long delays that disappear once the unusually large crowd has moved out of the small area.

I do not say that there is not a bug somewhere or that the kernel could not do better. I just wonder if it might be reasonable to also somehow limit the number of migrations that happen at the same time.

Best Regards,
Petr