Mailinglist Archive: opensuse-bugs (4655 mails)

[Bug 1042933] kernel panic due to nmi caused by systemd-watchdog test
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Thu, 22 Jun 2017 17:56:31 +0000
  • Message-id: <bug-1042933-21960-3KZ9098jTy@http.bugzilla.suse.com/>
http://bugzilla.suse.com/show_bug.cgi?id=1042933
http://bugzilla.suse.com/show_bug.cgi?id=1042933#c17

--- Comment #17 from Borislav Petkov <bpetkov@xxxxxxxx> ---
(In reply to Thomas Blume from comment #16)
> teviot:~ # /systemd-testsuite/run/test-watchdog
> Hardware watchdo[ 356.497530] hpwdt: tblume: reload variable is: 234
> [ 356.497597] hpwdt: tblume: New timer passed in is 10 seconds
> g 'HPE iLO2+ HW [ 356.499686] hpwdt: tblume: reload variable is: 78
> Watchdog Timer',[ 356.499717] hpwdt: tblume: reload variable is: 78
> version 0
> Set [ 356.500778] hpwdt: tblume: reload variable is: 78
> hardware watchdog to 10s.
> Pinging...
> [ 356.501780] hpwdt: tblume: reload variable is: 78
> [ 357.745055] hpwdt: hpwdt_pretimeout: NMI raised

WTF?

That's only about a second after you start the watchdog, right? At least that is
what the kernel timestamps are saying:

357.745055 - 356.497530 =~ 1.25s
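The same subtraction can be done mechanically over the quoted log lines. A minimal C sketch, assuming each line starts with the standard `[seconds.microseconds]` printk prefix; printk_seconds() and printk_delta() are hypothetical helpers, not kernel or util-linux APIs. Lines where userspace output got interleaved ahead of the timestamp (like the "Hardware watchdo[ ..." one) are not guessed at: printk_seconds() returns -1 for them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the seconds value of a printk timestamp at
 * the start of a log line, e.g. "[ 357.745055] hpwdt: ...".
 * Returns -1.0 if the line does not start with such a prefix. */
static double printk_seconds(const char *line)
{
    double secs;

    /* " [" matches optional leading whitespace and the bracket; %lf then
     * skips the padding inside the brackets and parses the number. */
    if (sscanf(line, " [%lf]", &secs) != 1)
        return -1.0;
    return secs;
}

/* Seconds elapsed between two timestamped log lines. */
static double printk_delta(const char *earlier, const char *later)
{
    double t0 = printk_seconds(earlier);
    double t1 = printk_seconds(later);

    assert(t0 >= 0.0 && t1 >= 0.0);
    return t1 - t0;
}
```

Passing the first reload line (356.497530) and the NMI line (357.745055) gives about 1.2475 s, well under the 10 s timeout that was just programmed.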

Btw, looking at that box (teviot), the kernel does start the HW NMI watchdog:

[ 0.128080] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

Can you disable the NMI watchdog before you run the test? As root:

# echo 0 > /proc/sys/kernel/nmi_watchdog

Then see whether the NMI still gets raised.
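If the reproduction is scripted, a wrapper can confirm the setting actually took effect before starting test-watchdog. A minimal C sketch: it only reads the sysctl (the write stays the echo command), and read_sysctl_int() and nmi_watchdog_enabled() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Path of the sysctl that the echo command writes to. */
#define NMI_WATCHDOG_SYSCTL "/proc/sys/kernel/nmi_watchdog"

/* Hypothetical helper: read a single integer from a sysctl file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static int read_sysctl_int(const char *path)
{
    FILE *f;
    int value;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 if the NMI watchdog is enabled, 0 if it is disabled,
 * and -1 if the state could not be determined. */
static int nmi_watchdog_enabled(void)
{
    int value = read_sysctl_int(NMI_WATCHDOG_SYSCTL);

    if (value < 0)
        return -1;
    return value != 0;
}
```

A wrapper would only go ahead when nmi_watchdog_enabled() returns 0; -1 means the file is missing or unreadable (for example inside a container).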

If it does, instrument hpwdt_ping() like this:

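/* hpwdt_time_left() may be defined later in hpwdt.c than hpwdt_ping(),
 * so declare it up front. */
static int hpwdt_time_left(void);

static void hpwdt_ping(void)
{
	/* Reload the countdown with the computed tick count. */
	iowrite16(reload, hpwdt_timer_reg);

	/* Debug: log the reload value and what the timer reports back. */
	pr_err("%s: reload: %d, time left: %d\n", __func__, reload,
	       hpwdt_time_left());
}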

so that we can see what *actually* gets written into the timer each time.
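To read that debug output it helps to know the units. The sketch below assumes, as the hpwdt driver's SECS_TO_TICKS()/TICKS_TO_SECS() macros do in kernels of this era, that one timer tick is 128 ms. Under that assumption the quoted log values line up: a 10 s timeout gives a reload of 78, and 234 ticks corresponds to 30 s. This is a userspace mock, not driver code: fake_timer_reg and mock_ping() are hypothetical stand-ins for the iLO timer register and the instrumented hpwdt_ping().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: one hpwdt timer tick is 128 ms, mirroring the driver's
 * SECS_TO_TICKS()/TICKS_TO_SECS() macros. Integer division truncates. */
static unsigned int secs_to_ticks(unsigned int secs)
{
    return secs * 1000 / 128;
}

static unsigned int ticks_to_secs(unsigned int ticks)
{
    return ticks * 128 / 1000;
}

/* Hypothetical stand-in for the 16-bit memory-mapped timer register. */
static uint16_t fake_timer_reg;

/* Userspace mock of the instrumented hpwdt_ping(): store the reload value
 * and print it together with the time left read back from the register. */
static void mock_ping(unsigned int reload)
{
    assert(reload <= UINT16_MAX);
    fake_timer_reg = (uint16_t)reload;
    printf("hpwdt_ping: reload: %u, time left: %u\n",
           reload, ticks_to_secs(fake_timer_reg));
}
```

With a 10 s timeout, mock_ping(secs_to_ticks(10)) stores 78 and prints a time left of 9 s, because the conversions truncate in both directions. On the real box, a time left far below that right after a ping would suggest the problem is in the timer rather than in what the driver writes.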

Also, the third thing to try is to reproduce this on another HP box.
Maybe this one's hpwdt BIOS crap is busted (wouldn't be a stretch).

Thanks.
