[Bug 1042933] kernel panic due to nmi caused by systemd-watchdog test
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Thu, 22 Jun 2017 06:07:28 +0000
  • Message-id: <bug-1042933-21960-1Is28hN4MT@http.bugzilla.suse.com/>
http://bugzilla.suse.com/show_bug.cgi?id=1042933
http://bugzilla.suse.com/show_bug.cgi?id=1042933#c14

--- Comment #14 from Thomas Blume <thomas.blume@xxxxxxxx> ---
(In reply to Borislav Petkov from comment #13)

> Do you see anything in dmesg from the watchdog while the test runs, some
> failure messages or so?

Unfortunately dmesg doesn't show any hint about the error.

> If not, you could simply go and add pr_err() calls to
> drivers/watchdog/hpwdt.c, more specifically hpwdt_ping() and dump
> the reload variable there, hpwdt_change_timer() and a couple more
> interesting.

I did the following tweak:

-->
@@ -452,18 +452,19 @@
 static void hpwdt_ping(void)
 {
 	iowrite16(reload, hpwdt_timer_reg);
+	pr_err("tblume: reload variable is: %d", reload);
 }

 static int hpwdt_change_timer(int new_margin)
 {
 	if (new_margin < 1 || new_margin > HPWDT_MAX_TIMER) {
-		pr_warn("New value passed in is invalid: %d seconds\n",
+		pr_err("tblume: New value passed in is invalid: %d seconds\n",
 			new_margin);
 		return -EINVAL;
 	}

 	soft_margin = new_margin;
-	pr_debug("New timer passed in is %d seconds\n", new_margin);
+	pr_err("tblume: New timer passed in is %d seconds\n", new_margin);
 	reload = SECS_TO_TICKS(soft_margin);

 	return 0;
@@ -495,6 +496,9 @@
 	if (allow_kdump)
 		hpwdt_stop();

+	//tblume: suppress NMI
+	return NMI_HANDLED;
+
 	if (!is_icru && !is_uefi) {
 		if (cmn_regs.u1.ral == 0) {
 			nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
--<

and got this result:

-->
# strace -r -f -o /tmp/strace-test-watchdog ./test-watchdog
Hardware watchdog 'HPE iLO2+ HW Watchdog Timer', version 0
[58494.591725] hpwdt: tblume: reload variable is: 234
[58494.592304] hpwdt: tblume: New timer passed in is 10 seconds
Set hardware watchdog to 10s.
[58494.594422] hpwdt: tblume: reload variable is: 78
[58494.594741] hpwdt: tblume: reload variable is: 78
Pinging...
[58494.595855] hpwdt: tblume: reload variable is: 78
Pinging...
[58494.597223] hpwdt: tblume: reload variable is: 78
Pinging...
[58499.598933] hpwdt: tblume: reload variable is: 78
Pinging...
[58504.600657] hpwdt: tblume: reload variable is: 78
[58509.602239] hpwdt: tblume: reload variable is: 78
[58509.604433] systemd-journald[376]: Sent WATCHDOG=1 notification.
Pinging...
[58514.603684] hpwdt: tblume: reload variable is: 78
teviot:/systemd-testsuite/run # [58519.604118] hpwdt: tblume: reload variable is: 78
[58624.200498] systemd-journald[376]: Sent WATCHDOG=1 notification.
--<
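
For reference, the two reload values in the log can be checked against the requested timeout. This is a minimal sketch that assumes SECS_TO_TICKS() in hpwdt.c is defined as (secs) * 1000 / 128, i.e. one timer tick is 128 ms. The helper names secs_to_ticks() and ticks_to_ms() are made up for illustration and are not part of the driver.

```c
#include <stdio.h>

/* Assumption: mirrors SECS_TO_TICKS() in drivers/watchdog/hpwdt.c,
 * where one hardware timer tick is taken to be 128 ms. */
static inline int secs_to_ticks(int secs)
{
	return secs * 1000 / 128;	/* integer division, rounds down */
}

/* Hypothetical helper: effective timeout in ms for a reload value. */
static inline int ticks_to_ms(int ticks)
{
	return ticks * 128;
}

/* Print the reload value and effective timeout for a margin in seconds. */
static inline void print_reload(int secs)
{
	int ticks = secs_to_ticks(secs);

	printf("%d s -> reload %d (%d ms)\n", secs, ticks, ticks_to_ms(ticks));
}
```

Under that assumption, print_reload(10) gives a reload value of 78 (9984 ms), which matches the value logged after "Set hardware watchdog to 10s.", and print_reload(30) gives 234 (29952 ms), which matches the first ping and would correspond to a 30 second margin before the test changed it.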

Can you make any sense out of this or do I need to add more debugging?

--
You are receiving this mail because:
You are on the CC list for the bug.