(In reply to Borislav Petkov from comment #6)
> Well, someone had the brilliant idea to panic the system unconditionally
> in the hpwdt driver:
>
> hpwdt_pretimeout
>  |-> nmi_panic
>
>         nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
>                 "for the NMI is logged in any one of the following "
>                 "resources:\n"
>                 "1. Integrated Management Log (IML)\n"
>                 "2. OA Syslog\n"
>                 "3. OA Forward Progress Log\n"
>                 "4. iLO Event Log");
>
> Apparently, that driver goes down into the BIOS to ask what the NMI
> reason was but since the test is causing the NMI and doesn't do any
> special dancing to tell the BIOS that it is a test running and that when
> asked, the BIOS should reply something so that the watchdog doesn't
> panic the system, this happens.
>
> But from looking at your testcase again, you only want to ping it once:
>
>         ioctl(watchdog_fd, WDIOC_KEEPALIVE, 0);
>
> and then close it. Right?

No, sorry, I left out the part of the code that comes after the call
that causes the crash. The full code is:

-->
        r = watchdog_set_timeout(&t);
        if (r < 0)
                printf("Failed to open watchdog");

        for (i = 0; i < 5; i++) {
                printf("Pinging...");
                r = watchdog_ping();
                if (r < 0)
                        printf("Failed to ping watchdog: %m", r);

                usleep(t/2);
        }

        watchdog_close(true);
        return 0;
--<

> If so, from looking at the code, it apparently expects a magical 'V'
> written to /dev/watchdog so that when you close /dev/watchdog (or your
> test exists) it will stop the timer and won't trigger an NMI.
>
> So, IINM, if you write a 'V' at the end of open_watchdog(), it should
> work.
> Alternatively, you can simply send it WDIOS_DISABLECARD flag with
> WDIOC_SETOPTIONS and it'll stop the watchdog timer too.

Thanks for the hints. Both seem to be included in the watchdog_close
function already:

-->
void watchdog_close(bool disarm) {
        [...]
        if (disarm) {
                int flags;

                /* Explicitly disarm it */
                flags = WDIOS_DISABLECARD;
                r = ioctl(watchdog_fd, WDIOC_SETOPTIONS, &flags);
                if (r < 0)
                        log_warning_errno(errno, "Failed to disable hardware watchdog: %m");

                /* To be sure, use magic close logic, too */
                for (;;) {
                        static const char v = 'V';

                        if (write(watchdog_fd, &v, 1) > 0)
                                break;

                        if (errno != EINTR) {
                                log_error_errno(errno, "Failed to disarm watchdog timer: %m");
                                break;
                        }
                }
--<

Still, the kernel panics in watchdog_set_timeout, i.e. before
watchdog_close is reached.

I saw another message on the console:

"unexpected close, not stopping watchdog"

So I guess this is the reason why the watchdog timer expires, triggering
the panic. However, this looks like a bug in the systemd test code, not
a kernel issue. Thanks for the background, I'm taking back this bug.

> I still fail to see what this is then even testing but whatever...

The test was introduced with this commit:

-->
commit e96d6be763014be75d480fde503d0b77f41194a0
Author: Lennart Poettering <lennart@poettering.net>
Date:   Thu Apr 5 22:08:10 2012 +0200

    systemd: add hardware watchdog support

    This adds minimal hardware watchdog support to PID 1. The idea is
    that PID 1 supervises and watchdogs system services, while the
    hardware watchdog is used to supervise PID 1.

    This adds two hardware watchdog configuration options, for the
    runtime watchdog and for a shutdown watchdog. The former is active
    during normal operation, the latter only at reboots to ensure that
    if a clean reboot times out we reboot nonetheless.

    If the runtime watchdog is enabled PID 1 will automatically wake up
    at half the configured interval and write to the watchdog daemon.
    [...]
--<
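
For reference, the disarm sequence discussed above (WDIOS_DISABLECARD via WDIOC_SETOPTIONS, followed by the magic 'V' before close) could be sketched in C roughly as follows. This is only a minimal sketch and not the systemd code: the function names wd_magic_close_write and wd_disarm_and_close are hypothetical, and the sketch assumes the caller already holds an open file descriptor for /dev/watchdog.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: write the magic 'V' so that the driver treats the
 * following close() as intentional. Retries on EINTR.
 * Returns 0 on success, -1 on error (errno is set). */
int wd_magic_close_write(int fd)
{
        static const char v = 'V';

        assert(fd >= 0);

        for (;;) {
                ssize_t n = write(fd, &v, 1);

                if (n > 0)
                        return 0;
                if (n < 0 && errno == EINTR)
                        continue;       /* interrupted, try again */
                if (n == 0)
                        errno = EIO;    /* nothing written, report an error */
                return -1;
        }
}

/* Hypothetical helper: disarm the watchdog, then close the descriptor.
 * The WDIOS_DISABLECARD request is best effort here, because not every
 * driver supports it; the magic write is what the result is based on.
 * Returns 0 if the magic character was written, -1 otherwise. */
int wd_disarm_and_close(int fd)
{
        int flags = WDIOS_DISABLECARD;
        int r;

        assert(fd >= 0);

        /* Ask the driver to stop the timer; a failure is ignored. */
        (void) ioctl(fd, WDIOC_SETOPTIONS, &flags);

        /* To be sure, use the magic close logic, too. */
        r = wd_magic_close_write(fd);

        close(fd);
        return r;
}
```

Whether the magic character actually stops the timer depends on the driver and its configuration, so a test that closes the device on an early error path without going through something like wd_disarm_and_close would presumably still run into the "unexpected close" case.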