What | Removed | Added |
---|---|---|
CC | | bruno.premont@restena.lu |
Got it here too, on a server (virtual, under VMware) that runs a rather large number of nrpe checks and therefore does a lot of batched forking. The kernel log reports:

    systemd[1]: segfault at 1010514 ip 000000000047912e sp 00007fff9a1c5670 error 4 in systemd[400000+ed000]

This started happening after the update from systemd-208-23.3.x86_64 to systemd-208-28.1.x86_64.

Looking up the IP with addr2line (with the debuginfo and debugsource packages installed), I get:

    addr2line -e /usr/lib/systemd/systemd 0x47912e
    /usr/src/debug/systemd-208/src/core/unit.c:1682

/usr/src/debug/systemd-208/src/core/unit.c:

    1677:
    1678: void unit_unwatch_pid(Unit *u, pid_t pid) {
    1679:         assert(u);
    1680:         assert(pid >= 1);
    1681:
    1682:         hashmap_remove_value(u->manager->watch_pids, LONG_TO_PTR(pid), u);
    1683:         set_remove(u->pids, LONG_TO_PTR(pid));
    1684: }
    1685:

This seems to match the report in comment #8 with a NULL u->manager.

Once systemd has crashed, nrpe zombies start piling up until the kernel refuses to create more processes (clone() returns -1 with errno=EAGAIN) because of the per-user process-count rlimit (RLIMIT_NPROC).

One possible reason nrpe triggers this bug more than anything else is that it forks a few levels deep for each check, and some of its children seem to get reparented to init. nrpe runs as a daemon, not as an xinetd service.
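The kernel segfault line quoted above can be read field by field. Assuming the usual x86 page-fault error-code layout, `error 4` has only the user-mode bit set, which means a user-mode read of a page that is not present. `systemd[400000+ed000]` gives the start address and length of the mapping that contains `ip`. The sketch below decodes both. All names in it (`decode_pf_error`, `ip_offset_in_mapping`, the `PF_*` macros) are hypothetical and do not come from systemd or the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, printed by the kernel as "error N".
 * Hypothetical macro names; the bit positions follow the x86 layout. */
#define PF_PROT  (1UL << 0) /* set: protection violation; clear: page not present */
#define PF_WRITE (1UL << 1) /* set: write access; clear: read access */
#define PF_USER  (1UL << 2) /* set: fault happened in user mode */
#define PF_RSVD  (1UL << 3) /* set: reserved bit found in a page-table entry */
#define PF_INSTR (1UL << 4) /* set: fault was an instruction fetch */

struct pf_error {
    bool not_present;
    bool write;
    bool user;
    bool rsvd;
    bool instr;
};

/* Split the numeric error code into its individual flags. */
static struct pf_error decode_pf_error(unsigned long err)
{
    struct pf_error e = {
        .not_present = (err & PF_PROT) == 0,
        .write = (err & PF_WRITE) != 0,
        .user = (err & PF_USER) != 0,
        .rsvd = (err & PF_RSVD) != 0,
        .instr = (err & PF_INSTR) != 0,
    };
    return e;
}

/* "in systemd[400000+ed000]" gives the start and length (hex) of the mapping
 * that contains ip. Stores ip - start in *offset if ip lies inside it. */
static bool ip_offset_in_mapping(uint64_t ip, uint64_t start, uint64_t len,
                                 uint64_t *offset)
{
    if (ip < start || ip - start >= len)
        return false;
    *offset = ip - start;
    return true;
}
```

With the values from the log, `decode_pf_error(4)` reports a user-mode read of a not-present page. `ip 0x47912e` falls inside `[0x400000, 0x4ed000)` at offset `0x7912e`, while the faulting address `0x1010514` lies outside the systemd text mapping. Because the binary is mapped at the conventional non-PIE base `0x400000`, the raw ip can be passed to addr2line directly, as was done here. A position-independent binary would usually need the offset instead.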
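The following shows why zombies pile up once PID 1 stops reaping. A process that has exited stays in the process table as a zombie (state `Z` in `/proc/<pid>/stat`) until its parent calls `waitpid()`. As I understand it, the zombie still counts toward the owner's RLIMIT_NPROC until that happens. This is a minimal, Linux-only sketch: it reads `/proc`, and the helper names `proc_state` and `observe_zombie` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Linux-specific: one-letter state from /proc/<pid>/stat ('Z' = zombie), '?' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The command name is in parentheses and may itself contain ')' or spaces,
     * so the state is the field right after the LAST ')'. */
    const char *p = strrchr(buf, ')');
    return (p != NULL && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

/* Hypothetical demo: fork a child that exits at once and leave it unreaped.
 * Returns the state observed before reaping; *reaped is set to 1 if the
 * final waitpid() collected the child. */
static char observe_zombie(int *reaped)
{
    *reaped = 0;
    pid_t pid = fork();
    if (pid < 0)
        return '?';          /* e.g. EAGAIN once the process limit is hit */
    if (pid == 0)
        _exit(0);            /* child: exit immediately */

    const struct timespec one_ms = { 0, 1000000L };
    char state = '?';
    for (int i = 0; i < 1000; i++) {   /* poll for up to about one second */
        state = proc_state(pid);
        if (state == 'Z')
            break;                     /* exited but not yet reaped */
        nanosleep(&one_ms, NULL);
    }
    /* Only the parent's wait removes the zombie from the process table. */
    *reaped = (waitpid(pid, NULL, 0) == pid);
    return state;
}
```

If nothing ever calls the final `waitpid()`, the child stays in state `Z` indefinitely. Repeated for every orphaned nrpe helper, this eventually makes `fork()`/`clone()` fail with `EAGAIN`.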
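Here is a sketch of the "forks a few levels deep, children reparented to init" pattern. It assumes a two-level fork in which the intermediate process exits without waiting for its own child. It is not nrpe's actual code, and `fork_two_levels` and `struct orphan_report` are hypothetical names. When the middle process exits, the grandchild is reparented to init, or to the nearest child subreaper if one exists. From then on, only that process can reap it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct orphan_report {
    pid_t middle;      /* pid of the intermediate process that exits early */
    pid_t new_parent;  /* getppid() seen by the grandchild afterwards, -1 on error */
};

static struct orphan_report fork_two_levels(void)
{
    struct orphan_report r = { -1, -1 };
    int fds[2];
    if (pipe(fds) < 0)
        return r;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return r;
    }
    if (middle == 0) {                 /* intermediate process */
        close(fds[0]);
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {
            /* Wait (up to ~5 s) until the middle process has exited
             * and we have been reparented away from it. */
            const struct timespec one_ms = { 0, 1000000L };
            for (int i = 0; i < 5000 && getppid() == me; i++)
                nanosleep(&one_ms, NULL);
            pid_t pp = getppid();
            ssize_t w = write(fds[1], &pp, sizeof pp);
            _exit(w == (ssize_t)sizeof pp ? 0 : 1);
        }
        _exit(0);                      /* exit without waiting for the grandchild */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);          /* we reap the middle process ourselves */
    r.middle = middle;
    pid_t pp;
    if (read(fds[0], &pp, sizeof pp) == (ssize_t)sizeof pp)
        r.new_parent = pp;             /* EOF (fork failure) leaves -1 */
    close(fds[0]);
    return r;
}
```

On a typical host `new_parent` is 1 (init). Inside a container, or under a process that called `prctl(PR_SET_CHILD_SUBREAPER)`, it can be another PID. Either way, the grandchild's eventual exit status must be collected by that new parent, which is exactly the job a crashed PID 1 no longer does.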