On Wednesday 2021-10-20 22:00, Larry Len Rainey wrote:
> Here is the output of the 1st strace [...]
> [pid 26970] 14:51:03 poll([{fd=7, events=POLLIN}], 1, 25000) = 0 (Timeout) <25.025100>
I have seen that crap before. We had a program that issued a volley of `systemctl status someunit`-like calls via dbus-1-python, and every now and then it would stall for 25 s. The moment we switched from making the dbus calls ourselves to simply running /bin/systemctl (as one or more subprocesses), the problem went away. From that, the most plausible hypothesis is that dbus has a silly rate limiter configured somewhere and that it applies per PID.

I debugged it a little, but it was really frustrating. The sequence looked like this:

- the client (as an endpoint) requests the unit status
- dbus-daemon receives the message
- dbus-daemon forwards the message to the systemd endpoint
- the systemd endpoint responds to dbus-daemon
- ??
- the client is still waiting for dbus-daemon's answer
- dbus-daemon (upon gdb attach) is waiting for socket input from clients/endpoints

So either dbus-daemon never noticed that systemd had responded, or dbus-daemon forgot to forward the reply back to the client, or the client library did not notice the reply. Needless to say, the debugging never reached a satisfactory conclusion, because at some point we also have real work to do.
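For reference, the workaround (shelling out to /bin/systemctl instead of issuing the dbus calls in-process) can be sketched in Python roughly as follows. This is a minimal sketch, not the program we actually ran. The helper names `unit_active_state` and `unit_states` are hypothetical. The sketch reads `ActiveState` via `systemctl show --property=ActiveState --value` rather than parsing `systemctl status` output, which assumes a systemd recent enough to support `--value`. The 30 s timeout is an arbitrary choice. Incidentally, the 25000 ms poll in the quoted strace matches libdbus's default method-call reply timeout of 25 s. That is consistent with the client simply giving up on a reply that never arrived.

```python
import subprocess
from typing import Callable, Dict, Iterable

# Hypothetical helper: ask a separate systemctl process for a unit's ActiveState,
# so this process never holds a bus connection of its own.
def unit_active_state(
    unit: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> str:
    result = run(
        ["/bin/systemctl", "show", "--property=ActiveState", "--value", unit],
        capture_output=True,
        text=True,
        timeout=timeout,  # arbitrary upper bound so a stuck call cannot hang us forever
        check=False,      # inspect the return code ourselves below
    )
    if result.returncode != 0:
        raise RuntimeError(f"systemctl failed for {unit}: {result.stderr.strip()}")
    return result.stdout.strip()


# Hypothetical helper: one short-lived systemctl process per unit.
def unit_states(
    units: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, str]:
    return {u: unit_active_state(u, run=run) for u in units}
```

Calling `unit_states(["dbus.service"])` on a systemd host returns a dict that maps each unit name to its ActiveState string (e.g. "active"). The `run` parameter exists only so the subprocess call can be swapped out, for example in tests. Each query runs in its own process with its own bus connection. That fits the observation that the stalls disappeared, but it does not by itself confirm the per-PID rate-limiter hypothesis.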