Comment # 17 on bug 1042933 from
(In reply to Thomas Blume from comment #16)
> teviot:~ # /systemd-testsuite/run/test-watchdog 
> Hardware watchdo[  356.497530] hpwdt: tblume: reload variable is: 234
> [  356.497597] hpwdt: tblume: New timer passed in is 10 seconds
> g 'HPE iLO2+ HW [  356.499686] hpwdt: tblume: reload variable is: 78
> Watchdog Timer',[  356.499717] hpwdt: tblume: reload variable is: 78
>  version 0
> Set [  356.500778] hpwdt: tblume: reload variable is: 78
> hardware watchdog to 10s.
> Pinging...
> [  356.501780] hpwdt: tblume: reload variable is: 78
> [  357.745055] hpwdt: hpwdt_pretimeout: NMI raised

WTF?

That's a second after you start the watchdog, right? At least this is
what the kernel timestamps are saying:

357.745055 - 356.497530 =~ 1.24
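
For what it's worth, here is a quick sanity check of those numbers as a
userspace sketch. It assumes one hpwdt timer tick is 128 ms, which is my
assumption and should be checked against the driver source; the helper names
are made up for illustration. Under that assumption, reload 78 is about 10
seconds and reload 234 about 30, so the value being written looks sane and
the NMI after ~1.25s is not explained by a bad reload value.

```c
#include <stdio.h>

/* ASSUMPTION: one hpwdt timer tick is 128 ms; verify in the driver source. */
#define ASSUMED_TICK_MS 128

/* Hypothetical helper: convert a reload value in ticks to milliseconds. */
static unsigned int ticks_to_ms(unsigned int ticks)
{
        return ticks * ASSUMED_TICK_MS;
}

/*
 * Hypothetical helper: difference between two dmesg timestamps, both given
 * in microseconds so that no floating point is needed.
 */
static long long delta_us(long long start_us, long long end_us)
{
        return end_us - start_us;
}

/* Print the figures taken from the quoted log. */
static void print_summary(void)
{
        printf("reload 234 -> %u ms\n", ticks_to_ms(234));
        printf("reload 78  -> %u ms\n", ticks_to_ms(78));
        printf("first reload print to NMI: %lld us\n",
               delta_us(356497530LL, 357745055LL));
}
```

Calling print_summary() from a small main() prints the three figures; the
timestamps are the first "reload variable" line and the "NMI raised" line
from the log above.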

Btw, from looking at that box (teviot) it does start the HW NMI watchdog:

[    0.128080] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

Can you disable the NMI watchdog before you run the test:

# echo 0 > /proc/sys/kernel/nmi_watchdog

as root.

See if the NMI gets raised still.

If it does, add a debug print to hpwdt_ping():

static void hpwdt_ping(void)
{
        iowrite16(reload, hpwdt_timer_reg);

        /* debug only: log what was just written and what the timer reports */
        pr_err("%s: reload: %d, time left: %d\n", __func__, reload,
               hpwdt_time_left());
}

so that we can see what *actually* gets written into the timer each time.

Also, a third thing to try: reproduce this on another HP box.
Maybe this one's hpwdt BIOS crap is busted (wouldn't be a stretch).

Thanks.

