Comment # 24 on bug 1042933 from
The timeout programmed into hpwdt is the time in seconds before Automatic
Server Recovery (ASR) resets the system.  You also need to consider the
pre-timeout.

Quoting Documentation/watchdog/watchdog-api.txt:

"Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system.  This can be done with an NMI,
interrupt, or other mechanism.  This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets."

hpwdt implements the pretimeout feature.  For ProLiant systems the length of
time for the pre-timeout is 9 seconds.  That is, 9 seconds before the ASR would
reset the system, an NMI is sent to it.  Receipt of the NMI allows hpwdt to
initiate a crash dump.

In your example, with the timeout set to 10 seconds and the timer pinged only
every 5 seconds, the NMI is due 10 - 9 = 1 second after each ping.  So the
system should get an NMI 1 second after your app pings the timer, well before
the next ping.  (The driver implicitly pings once when starting the timer.)
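
To make the arithmetic concrete, here is a minimal sketch of the timing check.
The function names are hypothetical and are not part of hpwdt.  The clamp to 0
when the pre-timeout is not shorter than the timeout is an assumption of this
sketch and does not model what the driver actually does in that case.

```c
#include <stdbool.h>

/*
 * Seconds after a ping at which the pre-timeout NMI is due.
 * Assumption of this sketch: clamp to 0 when pretimeout_s >= timeout_s.
 */
int nmi_delay_after_ping(int timeout_s, int pretimeout_s)
{
    int delay = timeout_s - pretimeout_s;

    return delay > 0 ? delay : 0;
}

/*
 * True if pinging every ping_interval_s seconds is too slow to keep the
 * NMI from being delivered.  An interval equal to the delay is treated
 * as too slow, since the ping and the NMI would race.
 */
bool nmi_fires_between_pings(int timeout_s, int pretimeout_s,
                             int ping_interval_s)
{
    return ping_interval_s >= nmi_delay_after_ping(timeout_s, pretimeout_s);
}
```

With timeout 10, pre-timeout 9 and a 5 second ping interval,
nmi_delay_after_ping() returns 1 and nmi_fires_between_pings() returns true.
With a timeout of 30 and the same ping interval it returns false.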

> And in order to avoid the panicking, change hpwdt_pretimeout() to do:
>
>         if (allow_kdump)
>                 hpwdt_stop();
>
>       return NMI_HANDLED;
>

Minor nit:  return NMI_DONE;

Returning NMI_DONE allows io_check_error() to re-enable the IOCK.  That in turn
allows repeated testing, as otherwise subsequent NMIs would be blocked from
delivery.

You also might want to consider:
    tools/testing/selftests/watchdog/watchdog-test.c

This tool allows tweaking the timeout and ping rate as parameters.
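
If you would rather drive the timer from your own test app, the sketch below
shows the ioctls from watchdog-api.txt that matter here.  The helper names
(wdt_start, wdt_ping, wdt_stop) are hypothetical, and the device path is left
to the caller.  Note that opening a real watchdog device arms it, so only run
this on a machine you are prepared to have reset.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Disarm via the "magic close" character, then close.  Drivers loaded
 * with nowayout ignore the magic character and keep running. */
void wdt_stop(int fd)
{
    if (write(fd, "V", 1) != 1) {
        /* best effort: nothing more we can do here */
    }
    close(fd);
}

/* Open the watchdog at path, program timeout_s, and read back the
 * pre-timeout.  Returns the fd, or -1 on failure. */
int wdt_start(const char *path, int timeout_s, int *pretimeout_s)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;

    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0) {
        wdt_stop(fd);
        return -1;
    }

    /* Not every driver reports a pre-timeout; report 0 in that case. */
    if (ioctl(fd, WDIOC_GETPRETIMEOUT, pretimeout_s) < 0)
        *pretimeout_s = 0;

    return fd;
}

/* Ping the timer.  Returns 0 on success, -1 on failure. */
int wdt_ping(int fd)
{
    int dummy = 0;

    return ioctl(fd, WDIOC_KEEPALIVE, &dummy);
}
```

A test loop would call wdt_start() once, then wdt_ping() at the chosen
interval, and wdt_stop() on exit.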

