Comment # 5 on bug 958346 from Howard Guo

Hello Robin.

I see similar failures more consistently on a different hardware platform.
I have several servers running on KVM, and the ones with severely capped IO
throughput reproduce the issue easily:

1. Cap the IO throughput to about 5MB/s.
2. Enable a swap file (this increases demand for IO throughput).
3. Create heavy IO congestion by launching an IO and memory intensive
operation. It must be small enough not to trigger OOM but large enough to evict
almost all file cache. The system load climbs to 20 on a single CPU system.
4. Issue a systemctl command, such as stopping a unit, while the above operation
is in progress, and observe a timeout due to heavy system load.
5. Stop the IO congestion, wait several seconds, then reissue the systemctl
command. There is a good chance that it times out again, and from then on all
further systemctl commands time out.

I do not know enough about systemd to understand what went wrong, but I
could work around it by running the operation in a systemd unit with very
low IO and CPU scheduling priority.
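
For illustration, a unit of that kind could look roughly like the sketch below.
The unit name and the ExecStart path are hypothetical placeholders, and the
priority values are assumptions chosen to mean "lowest priority", not
necessarily the exact settings used on the affected servers. Whether the idle
IO class has any effect also depends on the block IO scheduler in use.

```ini
# /etc/systemd/system/heavy-job.service (hypothetical unit name)
[Unit]
Description=IO and memory intensive job at lowest scheduling priority

[Service]
Type=oneshot
# Hypothetical path; substitute the real operation.
ExecStart=/usr/local/bin/heavy-job
# Assumed values: lowest CPU priority.
Nice=19
CPUSchedulingPolicy=idle
# Assumed value: use the disk only when no other process needs it.
IOSchedulingClass=idle
```

After a "systemctl daemon-reload", the job would be started with
"systemctl start heavy-job.service" instead of being launched from a shell.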

I'm curious to know what sort of workload the machines run.