On Monday, 14 September 2020 9:15:30 ACST Simon Lees wrote:
On 9/13/20 5:45 PM, Rodney Baker wrote:
Hi all,
For some time I've been seeing shutdown delays of up to 2 min 30 sec on Tumbleweed. The system reports "A stop job is running for /run/user/<user_id>" followed by a min:sec timer counting down, and the wait can be anywhere up to 2-3 minutes.
Once this timer runs down, shutdown proceeds as normal. If I log out of the desktop session first, then log into a VTY session as root and enter "poweroff", shutdown takes less than 10 seconds.
My guess is that there is a race condition, with some process still holding a file open in /run/user/<user_id> (which is a tmpfs file system) when the system is attempting to destroy it, but so far I have been completely unsuccessful in identifying what that could be. I've even tried enabling a debug terminal session on VTY9, but that's given no clue either.
How should I go about debugging this? So far, I don't even have enough information to raise a useful bug report (if one is even justified, given that it could be a local config issue rather than something systemic with Tumbleweed).
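One way to start is to ask the journal which unit was stopping when the pause happened. The sketch below is only an illustration and assumes a persistent journal (otherwise `journalctl -b -1` has no previous boot to show) and that the last 400 entries of that boot cover the shutdown; the 400 and the helper names `largest_gaps` and `previous_boot_tail` are hypothetical choices. It reads `journalctl -o short-monotonic` output and reports the biggest pauses between consecutive log lines, since a two-minute stall should show up as one large gap.

```python
import re
import subprocess

# Matches the leading "[ seconds.micros]" stamp of `journalctl -o short-monotonic`
MONO_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gaps(lines, top=5):
    """Return the `top` biggest pauses between consecutive journal lines.

    Each result is (gap_seconds, line_before_gap, line_after_gap).
    Lines without a monotonic timestamp (e.g. "-- Boot ... --") are skipped.
    """
    entries = []
    for line in lines:
        m = MONO_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


def previous_boot_tail(entries=400):
    """Return the last `entries` journal lines of the previous boot."""
    # Assumes a persistent journal and enough privileges (e.g. root)
    # to read the system journal.
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(entries),
         "-o", "short-monotonic", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

After a slow shutdown, something like `for gap, before, after in largest_gaps(previous_boot_tail()): print(f"{gap:6.1f}s  {before}  ->  {after}")` run as root should point at the message logged just before the stall, which is often the "Stopping ..." line of the unit that held things up.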
It may not be a bug. openSUSE is configured so that when you log out graphically, any background tasks and programs you have running will remain running. This is useful for people who use screen etc. to keep background processes running. However, if you want to make sure that any remaining processes are killed when someone logs out, you can set KillUserProcesses=yes in /etc/systemd/logind.conf. This may be a quick and easy way of resolving the issue.
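If the setting seems to make no difference, it is worth confirming which value logind actually reads, because drop-in files in /etc/systemd/logind.conf.d/ are read after the main file and override it. The sketch below is a rough check under stated assumptions: it only looks at /etc (the /run/systemd/logind.conf.d/ and /usr/lib/systemd/logind.conf.d/ drop-in directories are ignored), the helper names are hypothetical, and Python's `configparser` boolean parsing is close to, but not identical to, systemd's.

```python
import configparser
import glob


def kill_user_processes(config_texts):
    """Return the last KillUserProcesses= value found, or None if unset.

    `config_texts` must be in the order logind reads them: the main file
    first, then drop-ins in lexical order (later files win).
    """
    parser = configparser.ConfigParser(interpolation=None)
    for text in config_texts:
        # Reading several sources merges them; later values override earlier ones.
        parser.read_string(text)
    if not parser.has_option("Login", "KillUserProcesses"):
        return None  # not set anywhere: the compiled-in default applies
    return parser.getboolean("Login", "KillUserProcesses")


def read_logind_configs():
    """Hypothetical helper: read the /etc config files in logind's order."""
    paths = ["/etc/systemd/logind.conf"]
    paths += sorted(glob.glob("/etc/systemd/logind.conf.d/*.conf"))
    texts = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                texts.append(f.read())
        except FileNotFoundError:
            pass
    return texts
```

Calling `kill_user_processes(read_logind_configs())` returns `True` only if the last file that sets the option says yes; `None` means every occurrence is still commented out.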
Thanks, Simon, but that didn't fix it. (See the correction I posted further up the thread regarding the error message: it's actually "A stop job is running for User Manager for UID <user-id>".) I've set KillUserProcesses=yes in /etc/systemd/logind.conf and rebooted, then tested again, but with the same result.

Something similar happened in an earlier version, in the early days of the 'wicked' network daemon. The issue then was a race condition where the network was shut down before all network shares were closed, so the processes using them timed out trying to close files. It was supposedly related mainly to NFS shares, although I've never had NFS shares mounted on this machine, and whatever was done to fix that did fix the problem for me back then.

That's what makes me think this might be a similar situation: a race condition in which a process times out because something it depends on is terminated first. I have no clue what that dependency is, though. It might be a simple matter of editing a systemd unit file or two to make sure that processes shut down in the correct order. I'd love to help debug it, if only I knew what to do.

The last time I shut down, I ran "lsof | grep '/run/user'" while that stop job was running and noted that PulseAudio and a couple of other processes still had files open in /run/user/<user-id>, but I don't know whether that's significant. Running lsof with no parameters showed a whole stack of files still open, but I don't really know which ones matter at that stage of the shutdown process. I guess I could check all of those owned by my user, but would that be useful?

--
==============================================================
Rodney Baker VK5ZTV
rodney.baker@iinet.net.au
CCNA #CSCO12880208
==============================================================
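To turn that "whole stack of files" into a short per-process list, lsof's field output mode (`-F`) is easier to filter than its default columns: each line holds one field, prefixed with a letter (`p` PID, `c` command, `u` UID, `n` file name). The sketch below groups files under /run/user/<uid> by process. It assumes lsof is installed and is run as root from the debug shell while the stop job is counting down; the function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def open_files_by_process(lsof_output, prefix):
    """Group open file names under `prefix` by (pid, command, uid).

    `lsof_output` is text from `lsof -Fpcun`: one field per line, each
    starting with a tag letter (p=PID, c=command, u=UID, n=file name).
    """
    base = prefix.rstrip("/")
    groups = defaultdict(set)
    pid = command = uid = None
    for line in lsof_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":  # a 'p' line starts a new process set
            pid, command, uid = int(value), None, None
        elif tag == "c":
            command = value
        elif tag == "u":
            uid = int(value)
        elif tag == "n" and (value == base or value.startswith(base + "/")):
            # Match the directory itself or anything below it, but not
            # e.g. /run/user/10001 when looking for /run/user/1000.
            groups[(pid, command, uid)].add(value)
    return {key: sorted(names) for key, names in groups.items()}


def run_lsof():
    """Run lsof in field mode; call this as root during the stop job."""
    # lsof can exit non-zero even after printing useful output, so the
    # return code is deliberately not checked here.
    result = subprocess.run(["lsof", "-Fpcun"], capture_output=True, text=True)
    return result.stdout
```

For example, `open_files_by_process(run_lsof(), "/run/user/1000")` (substituting the desktop user's UID from `id -u`) gives one entry per process, which makes it easier to see whether PulseAudio or something else is still holding files when the user manager is being stopped.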