Re: [opensuse-kernel] SLES12-SP3 or opensuse 42.3 deadlocks under memory pressure?

10 Apr 2018

On 10.04.2018 at 17:01, Petr Mladek wrote:
...
On Tue 2018-04-10 15:44:09, Stefan Priebe - Profihost AG wrote:
...
On 10.04.2018 at 09:37, Petr Mladek wrote:
...
On Mon 2018-04-09 16:05:09, Vlastimil Babka wrote:
...
On 04/06/2018 08:07 PM, Stefan Priebe - Profihost AG wrote:
...
Hello,
under memory pressure on a hypervisor running ksmd, I had two deadlocks
today where the machines rebooted due to lockups.
I just wonder if the system rebooted on its own or if some human
rebooted it.
...
Let me paste the log after piping it through tac to fix the reversed order of the lines:
18:17:45     INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45     INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45     INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45     INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
It seems that pve-firewall was unblocked after 10 minutes. It is
possible that ksmd and ksmtuned would have been unblocked after several
more minutes as well, had the system not been rebooted.
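Reading each task's blocked period off the hung_task reports can be done mechanically. This is a minimal sketch, assuming the log lines have exactly the format pasted above (a time of day, then the hung_task message); the names `LOG`, `LINE_RE` and `summarize` are hypothetical. It only shows how long each task kept being reported; it does not prove when a task actually got unblocked.

```python
import re
from datetime import datetime, timedelta

# The hung_task reports pasted above (newest first; order does not matter).
LOG = """\
18:17:45     INFO: task ksmd:409 blocked for more than 120 seconds.
18:17:45     INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45     INFO: task ksmd:409 blocked for more than 120 seconds.
18:15:45     INFO: task ksmtuned:2259 blocked for more than 120 seconds.
18:15:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:13:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:11:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:09:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:07:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
18:05:45     INFO: task pve-firewall:2914 blocked for more than 120 seconds.
"""

# Groups: time of day, task name (comm), PID, hung_task timeout in seconds.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+INFO: task (\S+):(\d+) "
    r"blocked for more than (\d+) seconds\.$"
)


def summarize(log):
    """Map "comm:pid" to (first report, last report, timeout, report count)."""
    result = {}
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a hung_task report
        ts = datetime.strptime(m.group(1), "%H:%M:%S")
        key = f"{m.group(2)}:{m.group(3)}"
        timeout = int(m.group(4))
        first, last, _, count = result.get(key, (ts, ts, timeout, 0))
        result[key] = (min(first, ts), max(last, ts), timeout, count + 1)
    return result


if __name__ == "__main__":
    for task, (first, last, timeout, count) in summarize(LOG).items():
        # Each report says the task was blocked for more than `timeout` seconds,
        # so it was stuck no later than `timeout` seconds before its first report.
        stuck_by = first - timedelta(seconds=timeout)
        print(f"{task}: {count} reports from {first:%H:%M:%S} to {last:%H:%M:%S}, "
              f"stuck by ~{stuck_by:%H:%M:%S} at the latest")
```

For the log above, the script reports six warnings for pve-firewall:2914 between 18:05:45 and 18:15:45, which is the 10 minutes mentioned above, and two warnings each for ksmd and ksmtuned.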
Good question. I cannot guarantee that, but we've set
kernel.hung_task_panic = 0
The last message in the log is from the hung_task daemon. Therefore the
hung_task check should not have caused the reboot.
...
kernel.softlockup_panic = 0
kernel.hardlockup_panic = 1
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
kernel.panic = 20
This should leave 20 seconds to see the panic() messages on the console before the reboot.
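Which of the settings above can turn a lockup into panic(), and how long the kernel then waits before rebooting, can be checked mechanically. A minimal sketch, assuming input in the `key = value` format printed by sysctl(8); the helper names `parse_sysctl` and `describe` are hypothetical. The interpretation of kernel.panic follows the kernel's sysctl documentation (positive: reboot after that many seconds, 0: no reboot, negative: reboot immediately).

```python
# Values quoted earlier in this thread.
SETTINGS = """\
kernel.hung_task_panic = 0
kernel.softlockup_panic = 0
kernel.hardlockup_panic = 1
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
kernel.panic = 20
"""

# sysctl that makes the kernel call panic() on the given event when non-zero
PANIC_TRIGGERS = {
    "kernel.hung_task_panic": "hung task",
    "kernel.softlockup_panic": "soft lockup",
    "kernel.hardlockup_panic": "hard lockup",
    "kernel.panic_on_oops": "oops",
    "kernel.unknown_nmi_panic": "unknown NMI",
}


def parse_sysctl(text):
    """Parse `key = value` lines as printed by sysctl(8)."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def describe(values):
    """Say what each event does and what happens after panic()."""
    lines = []
    for key, event in PANIC_TRIGGERS.items():
        if key in values:
            action = "panic()" if int(values[key]) != 0 else "log only"
            lines.append(f"{event}: {action}")
    # kernel.panic: >0 reboot after N seconds, 0 never reboot, <0 reboot at once
    timeout = int(values.get("kernel.panic", "0"))
    if timeout > 0:
        lines.append(f"reboot {timeout} s after panic()")
    elif timeout < 0:
        lines.append("reboot immediately after panic()")
    else:
        lines.append("no automatic reboot")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(parse_sysctl(SETTINGS))))
```

With the settings quoted in this thread, only hard lockups, oopses and unknown NMIs lead to panic(), followed by a reboot 20 s later; hung tasks and soft lockups are only logged. On a live machine the input text can come from running `sysctl` with the same keys.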
They were not in the log. This means that either panic() was not called
or it was not able to get the messages out over netconsole.
I am not sure how reliable netconsole is during panic(). Anyway,
consoles are never 100% reliable. The kernel on SLE12 even defers
console handling to a kthread to avoid softlockups, which might
cause an even longer delay. It tries harder to flush the messages during
panic(), but it is still not guaranteed.
An alternative way to see the entire log is a kernel crash dump.
Do you happen to have one?
No, I don't. I appreciate all your help and comments. I think I just
cannot provide enough information in this case. I'm sorry about this.

I'll report back with more information if I see this again.

Greets,
Stefan
...
Best Regards,
Petr
PS: We should probably move this to Bugzilla.