http://bugzilla.suse.com/show_bug.cgi?id=1042933
http://bugzilla.suse.com/show_bug.cgi?id=1042933#c7
--- Comment #7 from Thomas Blume
Well, someone had the brilliant idea to panic the system unconditionally in the hpwdt driver:
hpwdt_pretimeout |-> nmi_panic
nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
                "for the NMI is logged in any one of the following "
                "resources:\n"
                "1. Integrated Management Log (IML)\n"
                "2. OA Syslog\n"
                "3. OA Forward Progress Log\n"
                "4. iLO Event Log");
Apparently, that driver goes down into the BIOS to ask what the reason for the NMI was. But the test itself is causing the NMI, and it doesn't do any special dance to tell the BIOS that a test is running and that, when asked, the BIOS should reply with something that keeps the watchdog from panicking the system. So the panic happens.
But from looking at your testcase again, you only want to ping it once:
ioctl(watchdog_fd, WDIOC_KEEPALIVE, 0);
and then close it. Right?
No sorry, I left out the parts of the code that come after the call that causes the crash. The full code is:
-->
r = watchdog_set_timeout(&t);
if (r < 0)
        printf("Failed to open watchdog");

for (i = 0; i < 5; i++) {
        printf("Pinging...");
        r = watchdog_ping();
        if (r < 0)
                printf("Failed to ping watchdog: %m", r);
        usleep(t/2);
}

watchdog_close(true);
return 0;
--<
If so, from looking at the code, it apparently expects a magical 'V' written to /dev/watchdog so that when you close /dev/watchdog (or your test exits) it will stop the timer and won't trigger an NMI.
So, IINM, if you write a 'V' at the end of open_watchdog(), it should work. Alternatively, you can simply send it the WDIOS_DISABLECARD flag via WDIOC_SETOPTIONS and that will stop the watchdog timer too.
Thanks for the hints, this seems to be included in the watchdog_close function:
-->
void watchdog_close(bool disarm) {
[...]
        if (disarm) {
                int flags;

                /* Explicitly disarm it */
                flags = WDIOS_DISABLECARD;
                r = ioctl(watchdog_fd, WDIOC_SETOPTIONS, &flags);
                if (r < 0)
                        log_warning_errno(errno, "Failed to disable hardware watchdog: %m");

                /* To be sure, use magic close logic, too */
                for (;;) {
                        static const char v = 'V';

                        if (write(watchdog_fd, &v, 1) > 0)
                                break;

                        if (errno != EINTR) {
                                log_error_errno(errno, "Failed to disarm watchdog timer: %m");
                                break;
                        }
                }
--<
Still, the kernel panics in watchdog_set_timeout, i.e. before watchdog_close is reached.
I saw another message on the console like:
"unexpected close, not stopping watchdog"
So, I guess this is the reason why the watchdog timer expires, triggering the panic.
However, this seems to be a bug in the systemd test code, not a kernel issue.
Thanks for the background, I am taking back this bug.
I still fail to see what this is then even testing but whatever...
The test was introduced with this commit:
-->
commit e96d6be763014be75d480fde503d0b77f41194a0
Author: Lennart Poettering