[Bug 958346] New: systemd hangs/dies randomly after weeks of runtime
http://bugzilla.opensuse.org/show_bug.cgi?id=958346

Bug ID: 958346
Summary: systemd hangs/dies randomly after weeks of runtime
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.1
Hardware: x86-64
OS: openSUSE 42.1
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: robin.roth@kit.edu
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Symptom:
Of our roughly 50 machines running 42.1 (identical setup, different hardware), about 10 are affected within 2 weeks of the last reboot. At some point all calls to systemd fail with messages like

dbus[882]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out

There are no other related log messages beforehand that hint at a problem in systemd.
This seems to happen independently of how the machine is used. After that, systemd does not respond to anything: all systemctl calls fail, ''kill 1'' and ''kill -9 1'' have no effect, and reboot/shutdown do not work either. Rebooting the machine fixes the problem temporarily.

Do you have suggestions on how to debug this? So far we have not found a way to trigger the issue, and waiting weeks while many machines may fail is not a good option.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c1
--- Comment #1 from Robin Roth
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c2
Bernhard Wiedemann
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c3
--- Comment #3 from Robin Roth
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c4
--- Comment #4 from Robin Roth
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c5
Howard Guo
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c6
--- Comment #6 from Robin Roth
Hello Robin.
I have seen similar failures, more consistently, on a different hardware platform. I have several servers running on KVM; the ones with severely capped IO throughput can easily reproduce the issue:
1. Cap the IO throughput at about 5 MB/s.
2. Enable a swap file (to increase demand for IO throughput).
3. Create heavy IO congestion by launching an IO- and memory-intensive operation. It must be small enough not to trigger OOM but large enough to evict almost all of the file cache. The system load climbs to 20 on a single-CPU system.
4. While the above operation is in progress, issue a systemctl command such as stopping a unit, and observe a timeout due to the heavy system load.
5. Stop the IO congestion, wait several seconds, then reissue the systemctl command. There is a good chance that it times out again, and after that all further systemctl commands time out.
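Steps 4 and 5 come down to checking whether a systemctl call returns successfully within a deadline. A minimal sketch of such a probe in Python 3 follows; the helper name `responds_within`, the probe command and the deadline are illustrative assumptions and were not part of the original reproduction.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    Hypothetical helper for illustration; not taken from the bug report.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return False
    # A non-zero exit status (for example systemctl giving up on its own
    # D-Bus call) also counts as "did not respond".
    return done.returncode == 0
```

On a test machine one might call `responds_within(["systemctl", "--no-pager", "list-units"], 30)` during the congestion (step 4) and again a few seconds after it ends (step 5); a `False` result in step 5 would match the behaviour described above. The 30-second deadline is an arbitrary choice.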
I do not know enough about systemd to understand what went wrong, but I could work around it by running the operation from a systemd unit file with very low IO and CPU scheduling priority.
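A unit of that kind might look roughly like the text generated below. This is a sketch, not the unit actually used: the description, the command path `/usr/local/bin/heavy-job` and the chosen values are placeholders. `Nice=` and `IOSchedulingClass=` are directives documented in systemd.exec(5). The unit text is wrapped in a small Python helper (hypothetical name `low_priority_unit`) only so that it can be generated and checked.

```python
def low_priority_unit(description, exec_start):
    """Render a oneshot service that runs exec_start with low CPU and IO priority.

    Illustrative only; the values are assumptions, not taken from the report.
    """
    lines = [
        "[Unit]",
        f"Description={description}",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={exec_start}",
        # Lowest conventional CPU priority (highest nice value).
        "Nice=19",
        # Ask the IO scheduler to serve this job only when the disk is idle.
        "IOSchedulingClass=idle",
    ]
    return "\n".join(lines) + "\n"


# Placeholder description and command path.
print(low_priority_unit("IO-heavy job (placeholder)", "/usr/local/bin/heavy-job"))
```

Whether the IO class has any effect depends on the block-layer IO scheduler in use, so the result should be verified on the affected systems.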
I'm curious to know: what sort of workload do the machines run?
http://bugzilla.opensuse.org/show_bug.cgi?id=958346#c7
Robin Roth