Comment # 7 on bug 1042933 from Thomas Blume

(In reply to Borislav Petkov from comment #6)
> Well, someone had the brilliant idea to panic the system unconditionally
> in the hpwdt driver:
> 
> hpwdt_pretimeout
> |-> nmi_panic
> 
>         nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
>                 "for the NMI is logged in any one of the following "
>                 "resources:\n"
>                 "1. Integrated Management Log (IML)\n"
>                 "2. OA Syslog\n"
>                 "3. OA Forward Progress Log\n"
>                 "4. iLO Event Log");
> 
> Apparently, that driver goes down into the BIOS to ask what the NMI
> reason was but since the test is causing the NMI and doesn't do any
> special dancing to tell the BIOS that it is a test running and that when
> asked, the BIOS should reply something so that the watchdog doesn't
> panic the system, this happens.
> 
> But from looking at your testcase again, you only want to ping it once:
> 
> 	ioctl(watchdog_fd, WDIOC_KEEPALIVE, 0);
> 
> and then close it. Right?

No, sorry, I left out the part of the code that comes after the call that
causes the crash.
The full code is:

-->
        /* watchdog_set_timeout() and watchdog_ping() return negative errno values */
        r = watchdog_set_timeout(&t);
        if (r < 0)
                printf("Failed to open watchdog: %s\n", strerror(-r));

        for (i = 0; i < 5; i++) {
                printf("Pinging...\n");
                r = watchdog_ping();
                if (r < 0)
                        printf("Failed to ping watchdog: %s\n", strerror(-r));

                usleep(t/2);
        }

        watchdog_close(true);
        return 0;
--<
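For anyone who wants to reproduce this without building systemd, here is a minimal sketch of the same loop using only the kernel watchdog ioctls from <linux/watchdog.h>. It assumes the watchdog device (e.g. /dev/watchdog) is already open and that the caller disarms it afterwards; ping_interval_usec() and watchdog_ping_loop() are hypothetical helpers, not systemd functions. Like the test, it pings at half the timeout.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: ping at half the timeout, like usleep(t/2) in the test. */
static uint64_t ping_interval_usec(int timeout_sec)
{
        assert(timeout_sec > 0);
        return (uint64_t) timeout_sec * 1000000u / 2;
}

/* Hypothetical helper: set the timeout on an already opened watchdog fd and
 * ping it `count` times. Returns 0 or a negative errno value. It does NOT
 * disarm the timer; the caller must do that before close(). */
static int watchdog_ping_loop(int fd, int timeout_sec, int count)
{
        int t = timeout_sec;

        /* The driver writes the timeout it actually applied back into t. */
        if (ioctl(fd, WDIOC_SETTIMEOUT, &t) < 0)
                return -errno;

        for (int i = 0; i < count; i++) {
                uint64_t us;
                struct timespec ts;

                if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0) {
                        int err = errno;   /* save before fprintf can change it */

                        fprintf(stderr, "Failed to ping watchdog: %s\n", strerror(err));
                        return -err;
                }

                /* nanosleep instead of usleep: POSIX lets usleep reject values >= 1 s. */
                us = ping_interval_usec(t);
                ts.tv_sec = (time_t) (us / 1000000u);
                ts.tv_nsec = (long) (us % 1000000u) * 1000;
                while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
                        ;   /* resume the remaining sleep after a signal */
        }

        return 0;
}
```

On a fd that is not a watchdog (a pipe, say) the first ioctl simply fails, which makes the helper safe to exercise without real hardware.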

> If so, from looking at the code, it apparently expects a magical 'V'
> written to /dev/watchdog so that when you close /dev/watchdog (or your
> test exists) it will stop the timer and won't trigger an NMI.
> 
> So, IINM, if you write a 'V' at the end of open_watchdog(), it should
> work.
> Alternatively, you can simply send it WDIOS_DISABLECARD flag with
> WDIOC_SETOPTIONS and it'll stop the watchdog timer too.

Thanks for the hints, this seems to be included in the watchdog_close function:

-->
void watchdog_close(bool disarm) {
[...]
        if (disarm) {
                int flags;

                /* Explicitly disarm it */
                flags = WDIOS_DISABLECARD;
                r = ioctl(watchdog_fd, WDIOC_SETOPTIONS, &flags);
                if (r < 0)
                        log_warning_errno(errno, "Failed to disable hardware watchdog: %m");

                /* To be sure, use magic close logic, too */
                for (;;) {
                        static const char v = 'V';

                        if (write(watchdog_fd, &v, 1) > 0)
                                break;

                        if (errno != EINTR) {
                                log_error_errno(errno, "Failed to disarm watchdog timer: %m");
                                break;
                        }
                }
        }
[...]
}
--<
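If someone wants to check the disarm sequence outside of systemd, the same two steps (WDIOS_DISABLECARD via WDIOC_SETOPTIONS, then the magic 'V') can be sketched against the plain kernel API. watchdog_disarm_and_close() is a hypothetical helper, not the systemd function; both steps only help if they actually run before the fd gets closed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: disarm an open watchdog fd, then close it.
 * Returns 0 if the magic 'V' was written, a negative errno value otherwise.
 * The fd is closed in every case. */
static int watchdog_disarm_and_close(int fd)
{
        static const char v = 'V';
        int flags = WDIOS_DISABLECARD;
        int r = 0;

        assert(fd >= 0);

        /* Step 1: ask the driver to stop the timer. Not every driver
         * implements WDIOC_SETOPTIONS, so a failure is only reported. */
        if (ioctl(fd, WDIOC_SETOPTIONS, &flags) < 0)
                fprintf(stderr, "Failed to disable hardware watchdog: %s\n",
                        strerror(errno));

        /* Step 2: magic close, so that close() below stops the timer
         * (unless the driver runs with nowayout). */
        for (;;) {
                ssize_t n = write(fd, &v, 1);

                if (n == 1)
                        break;
                if (n < 0 && errno == EINTR)
                        continue;       /* interrupted by a signal: retry */
                r = n < 0 ? -errno : -EIO;
                break;
        }

        close(fd);
        return r;
}
```

Step 1 is only reported on failure rather than treated as fatal, so the magic 'V' is still attempted on drivers without WDIOC_SETOPTIONS support.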

Still, the kernel panics in watchdog_set_timeout, i.e. before watchdog_close
is reached.
I also saw a message like this on the console:

"unexpected close, not stopping watchdog"

So I guess this is why the watchdog timer expires and triggers the panic.
However, this seems to be a bug in the systemd test code, not a kernel issue.
Thanks for the background; I'm taking this bug back.
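The console message fits the "magic close" convention described in the kernel watchdog API documentation: a driver only stops the timer on release if it saw a 'V' written first (and nowayout is off); otherwise it warns and keeps the timer running. Below is a simplified user-space model of that state machine, not the hpwdt source; struct wdt_model and its functions are hypothetical names, and the exact message text differs between drivers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a driver's magic-close state. */
struct wdt_model {
        bool nowayout;      /* if set, a 'V' is ignored */
        bool expect_close;  /* a 'V' was seen in the last write */
        bool running;       /* is the (simulated) hardware timer armed? */
};

/* Models the write() handler: a 'V' anywhere in the buffer arms the
 * magic close. A real driver would also ping the hardware here. */
static void wdt_model_write(struct wdt_model *w, const char *buf, size_t len)
{
        assert(w != NULL);
        if (len == 0)
                return;
        if (!w->nowayout) {
                w->expect_close = false;
                for (size_t i = 0; i < len; i++)
                        if (buf[i] == 'V')
                                w->expect_close = true;
        }
}

/* Models release(): stop the timer only if the magic 'V' was seen. */
static void wdt_model_release(struct wdt_model *w)
{
        assert(w != NULL);
        if (w->expect_close)
                w->running = false;
        else
                fprintf(stderr, "Unexpected close, not stopping watchdog!\n");
        w->expect_close = false;
}
```

With nowayout set the 'V' is ignored, so even a correct disarm sequence in watchdog_close() would leave the timer running.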

> I still fail to see what this is then even testing but whatever...

The test was introduced with this commit:

-->
commit e96d6be763014be75d480fde503d0b77f41194a0
Author: Lennart Poettering <lennart@poettering.net>
Date:   Thu Apr 5 22:08:10 2012 +0200

    systemd: add hardware watchdog support

    This adds minimal hardware watchdog support to PID 1. The idea is that
    PID 1 supervises and watchdogs system services, while the hardware
    watchdog is used to supervise PID 1.

    This adds two hardware watchdog configuration options, for the runtime
    watchdog and for a shutdown watchdog. The former is active during normal
    operation, the latter only at reboots to ensure that if a clean reboot
    times out we reboot nonetheless.

    If the runtime watchdog is enabled PID 1 will automatically wake up at
    half the configured interval and write to the watchdog daemon.
[...]
--<