NetworkManager dispatcher script does not finish when going to sleep

Hi *, starting with the last two NetworkManager versions, my dispatcher script does not finish when going to sleep. It finishes on resume, though, causing some strange effects. It's runtime is 1 or 2 seconds only, so this is not a problem of hitting the dispatcher timeout. Anyway in this case it wouldn't continue after resume, I believe. Any idea, what is happening here, and how to avoid it? TIA. Bye. Michael.

On 13.07.2024 17:11, mh@mike.franken.de wrote:
Hi *,
starting with the last two NetworkManager versions, my dispatcher script
On which distribution?
Why do you believe it? How dispatcher script is different from any other running program that will continue to run across suspend/resume? Besides, quoting manual page: Dispatcher scripts are run one at a time ... once a script is queued, it will always be run, even if a later event renders it obsolete. Your script may not even begin execution before resume because some other scripts have been running.

On Samstag, 13. Juli 2024 16:25:16 MESZ Andrei Borzenkov wrote:
openSUSE Tumbleweed 20240711, sry
If it would have been killed before, because of running into the dispatcher timeout, I'd suspect it would be running afterwards!?!? I already had that situation, but the actual behavior is different.
It is the only dispatcher script. And the same happens, if it is linked into no-wait.d.

On Samstag, 13. Juli 2024 17:44:54 MESZ Andrei Borzenkov wrote:
Sry, I am not a native speaker, so sometimes it is not easy to find the correct wording. But: No, the system takes another about 50 seconds to finally enter sleep state. Even systemd scripts in /usr/lib/systemd/system-sleep/ are started and finished afterwards. And whats more: It worked for at least a few years without any problems. As I also wrote, the problems started with the before last NM version. My script didn't change in the last months. My conclusion: Something must have changed in NM.

On Samstag, 13. Juli 2024 19:06:31 MESZ Andrei Borzenkov wrote:
Is NetworkManager-dispatcher really a user process?
Yes, in my log I call date at start an end: 01-mh_dispatch-enp62s0-pre-down.log: + /bin/date Fri Jul 12 00:33:09 CEST 2024 ... ... + /bin/date Fri Jul 12 10:27:42 CEST 2024

On Samstag, 13. Juli 2024 22:29:32 MESZ Carlos E. R. wrote:
Ok. So is it possible, that the problem is neither my dispatcher script nor NetworkManager itself, but the fact, that NetworkManager also gets frozen too early? The follwing journal excerpt looks a bit strange to me, because it seems, that not only my dispatcher script, but also NM itself continues with disconnecting eth0 on resume: Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.2284] manager: sleep: sleep requested (sleeping: no enabled: yes) Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.4159] manager: NetworkManager state is now ASLEEP Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.4163] device (enp62s0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed') Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.4163] dhcp4 (enp62s0): canceled DHCP transaction Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.4163] dhcp4 (enp62s0): activation: beginning transaction (timeout in 45 seconds) Jul 13 22:24:41 client NetworkManager[2539]: <info> [1720902281.4164] dhcp4 (enp62s0): state changed no lease Jul 13 22:27:21 client NetworkManager[2539]: <info> [1720902441.2287] manager: sleep: wake requested (sleeping: yes enabled: yes) Jul 13 22:27:21 client NetworkManager[2539]: <info> [1720902441.2320] manager: NetworkManager state is now DISCONNECTING Jul 13 22:27:23 client NetworkManager[2539]: <info> [1720902443.0419] device (enp62s0): carrier: link connected Jul 13 22:28:41 client NetworkManager[2539]: <info> [1720902521.2619] device (enp62s0): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed') Jul 13 22:28:41 client NetworkManager[2539]: <info> [1720902521.3118] manager: NetworkManager state is now DISCONNECTED Jul 13 22:28:41 client NetworkManager[2539]: <info> [1720902521.3127] policy: auto-activating connection 'eth0_(LAN)' (25b6d1be-8a6b-4762-88da-cd27eb0a0c89) Jul 13 22:28:41 client NetworkManager[2539]: <info> [1720902521.3131] device (enp62s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed') Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0586] device (enp62s0): carrier: link connected Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0591] device (enp62s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0623] policy: auto-activating connection 'eth0_(LAN)' (25b6d1be-8a6b-4762-88da-cd27eb0a0c89) Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0630] device (enp62s0): Activation: starting connection 'eth0_(LAN)' (25b6d1be-8a6b-4762-88da-cd27eb0a0c89) Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0631] device (enp62s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.0634] manager: NetworkManager state is now CONNECTING Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.1110] device (enp62s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.1434] device (enp62s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:45 client NetworkManager[2539]: <info> [1720902525.1452] dhcp4 (enp62s0): activation: beginning transaction (timeout in 45 seconds) Jul 13 22:28:48 client NetworkManager[2539]: <info> [1720902528.8286] device (enp62s0): carrier: link connected Jul 13 22:28:49 client NetworkManager[2539]: <info> [1720902529.8293] dhcp4 (enp62s0): state changed new lease, address=192.168.1.42, acd pending Jul 13 22:28:49 client NetworkManager[2539]: <info> [1720902529.9641] dhcp4 (enp62s0): state changed new lease, address=192.168.1.42 Jul 13 22:28:49 client NetworkManager[2539]: <info> [1720902529.9649] policy: set 'eth0_(LAN)' (enp62s0) as default for IPv4 routing and DNS Jul 13 22:28:50 client NetworkManager[2539]: <info> [1720902530.2659] device (enp62s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:51 client NetworkManager[2539]: <info> [1720902531.3682] device (enp62s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:51 client NetworkManager[2539]: <info> [1720902531.3684] device (enp62s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Jul 13 22:28:51 client NetworkManager[2539]: <info> [1720902531.3688] manager: NetworkManager state is now CONNECTED_SITE Jul 13 22:28:51 client NetworkManager[2539]: <info> [1720902531.3692] device (enp62s0): Activation: successful, device activated. Jul 13 22:28:51 client NetworkManager[2539]: <info> [1720902531.7302] manager: NetworkManager state is now CONNECTED_GLOBAL

On Samstag, 13. Juli 2024 22:44:23 MESZ mh@mike.franken.de wrote:
It seems, that NM indeed gets frozen, before it could deactivate the network card, and continues after resume. Without my dispatcher script, NM can finish the deactivation before sleep state has been reached. Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2150] manager: sleep: sleep requested (sleeping: no enabled: yes) Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2166] manager: NetworkManager state is now ASLEEP Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2168] device (enp62s0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed') Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2169] dhcp4 (enp62s0): canceled DHCP transaction Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2169] dhcp4 (enp62s0): activation: beginning transaction (timeout in 45 seconds) Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.2169] dhcp4 (enp62s0): state changed no lease Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.4575] device (enp62s0): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed') Jul 15 12:35:16 client NetworkManager[2542]: <info> [1721039716.5085] device (enp62s0): state change: disconnected -> unmanaged (reason 'unmanaged-sleeping', sys-iface-state: 'managed') Jul 15 12:37:10 client NetworkManager[2542]: <info> [1721039830.2101] manager: sleep: wake requested (sleeping: yes enabled: yes) Jul 15 12:37:10 client NetworkManager[2542]: <info> [1721039830.2106] device (enp62s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Jul 15 12:37:10 client NetworkManager[2542]: <info> [1721039830.2426] manager: NetworkManager state is now DISCONNECTEDJul 15 12:37:13 client NetworkManager[2542]: <info> [1721039833.3308] device (enp62s0): carrier: link connected Jul 15 12:37:13 client NetworkManager[2542]: <info> [1721039833.3313] device (enp62s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') The question is why? My dispatcher script takes only 1-2 seconds - can this really be too long?

On Montag, 15. Juli 2024 12:45:54 MESZ mh@mike.franken.de wrote:
It has something to do with nfs, as it seems. Commenting out the part, where nfs shares should be umounted, NM finishes as expected before entering sleep state - even if I add an additional "wait 3" in the dispatcher script.

On Tuesday, 16 July 2024 02:10:43 ACST mh@mike.franken.de wrote:
Have you tried doing on-demand nfs mounts with systemd-mountd instead? I have that working here and it is seamless. It mounts the nfs shares when file/ folder access is requested by a user, and then unmounts them after a configurable period of time when no files are open (or when the system is shutting down). The same can be done with remote samba shares (I have both types of shares mounted that way). If users' home directories are mounted via nfs, it should still work because it will (or at least should) auto-mount at the appropriate time. It needs the appropriate .mount and .automount files created in /etc/systemd/ system, but the syntax of those is straightforward and it seems to "Just Work". It might be worth trying. -- ========================================================================================================== Rodney Baker rodney.baker@outlook.com.au ==========================================================================================================

On Dienstag, 16. Juli 2024 02:14:24 MESZ Rodney Baker wrote:
Yep, that is, what I am testing at the moment - but not with self created automount files. Instead I modified the entries in fstab and let systemd create the appropriate mount files. Found the solution in ArchWiki. Nevertheless I wonder, what caused my setup to fail, though it had worked for a long time, i.e. what changed to cause the problem.
My home dir isn't on nfs, because this is a notebook, which is not always connected to the network, where my nfs server lives.
See above. Thx and bye. Michael.

On Mittwoch, 17. Juli 2024 15:11:28 MESZ mh@mike.franken.de wrote:
doesn't work for me - the system doesn't go to sleep (suspend.target: Job suspend.target/start failed with result 'dependency'), if any file is open on one of the mounted filesystems, because the share can't be unmounted in this case. And one of the reasons I send the system to sleep is, that I can seamlessly continue working after resume.
Thx and bye. Michael.

On 2024-07-18 11:23, mh@mike.franken.de wrote:
Try "umount --lazy" perhaps. It returns fast. -l, --lazy Lazy unmount. Detach the filesystem from the file hierarchy now, and clean up all references to this filesystem as soon as it is not busy anymore. -- Cheers / Saludos, Carlos E. R. (from Elessar, using openSUSE Leap 15.5)

On 2024-07-19 11:40, mh@mike.franken.de wrote:
On Donnerstag, 18. Juli 2024 22:39:35 MESZ Carlos E. R. wrote:
On 2024-07-18 11:23, mh@mike.franken.de wrote:
I have a small old laptop that now I use to play videos that reside on the server, connected directly to HDMI port via 2 metre cable. Well, the laptop automounts the server directory via systemd automount in fstab. I went to another city, and took the laptop to play videos, just that the videos were locally stored on disk. One day I noticed messages about nfs not finding the server (different LAN). It kept bitching all week, never gave up. But the machine hibernated fine every day. -- Cheers / Saludos, Carlos E. R. (from 15.5 x86_64 at Telcontar)

On Samstag, 13. Juli 2024 16:11:41 MESZ mh@mike.franken.de wrote:
It seems, that NM takes down the dhcp lease, even before the dispatcher service starts. So there is no chance to umount nfs shares due to a lack of connectivity. Jun 30 01:21:12 transformer systemd-logind[2036]: Lid closed. Jun 30 01:21:12 transformer systemd-logind[2036]: The system will suspend now! Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.8559] manager: sleep: sleep requested (sleeping: n o enabled: yes) Jun 30 01:21:12 transformer ModemManager[2236]: <info> [sleep-monitor-systemd] system is about to suspend Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.8561] device (wlp0s20f3): state change: unavailable -> unmanaged (reason 'unmanaged-sleeping', sys-iface-state: 'managed') Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.8566] device (wlp0s20f3): set-hw-addr: reset MAC address to 20:1E:88:D2:9F:5B (unmanage) Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.9772] manager: NetworkManager state is now ASLEEP Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.9774] device (enp62s0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed') Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.9775] dhcp4 (enp62s0): canceled DHCP transaction Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.9775] dhcp4 (enp62s0): activation: beginning transaction (timeout in 45 seconds) Jun 30 01:21:12 transformer NetworkManager[2543]: <info> [1719703272.9775] dhcp4 (enp62s0): state changed no lease Jun 30 01:21:12 transformer dbus-daemon[1994]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.13' (uid=0 pid=2543 comm="/usr/sbin/NetworkManager --no-daemon") Jun 30 01:21:12 transformer netconfig[12337]: Executing 'modify --service NetworkManager' for pid 2543 Jun 30 01:21:12 transformer NetworkManager[12337]: <13>Jun 30 01:21:12 netconfig: Executing 'modify --service NetworkManager' for pid 2543 Jun 30 01:21:12 transformer NetworkManager[12335]: debug: lockfile created (/run/netconfig.pid) for PID 12335 Jun 30 01:21:12 transformer NetworkManager[12335]: debug: lockfile created Jun 30 01:21:13 transformer NetworkManager[12335]: debug: write new STATE file /run/netconfig//NetworkManager.netconfig Jun 30 01:21:13 transformer NetworkManager[12335]: debug: Module order: dns-resolver dns-bind dns-dnsmasq nis ntp-runtime Jun 30 01:21:13 transformer systemd[1]: Starting Network Manager Script Dispatcher Service... Or do I misunderstand the journal entries?

On Fri, Jul 19, 2024 at 3:57 PM <mh@mike.franken.de> wrote:
Assuming you are talking about the standard nfs script which is part of openSUSE NetworkManager package - the very idea of this script is to unmount when the NFS server is no longer accessible. And yes, in the past my experience with NFS was - if the server does not answer, attempt to unmount hangs. One hack was to configure the server IP on a local interface, then "umount -l -f" would succeed. Quoting man page. --><-- -f, --force Force an unmount (in case of an unreachable NFS system). Note that this option does not guarantee that umount command does not hang. It’s strongly recommended to use absolute paths without symlinks to avoid unwanted readlink(2) and stat(2) system calls on unreachable NFS in umount. --><--

On 19.07.2024 16:28, mh@mike.franken.de wrote:
Without seeing this script everything will be just a wild guess.
And as already said - it worked for years until NM 1.46.
Did you verify that downgrading NetworkManager (and *only* NetworkManager) restores the previous behavior?

On 29.07.2024 19:48, mh@mike.franken.de wrote:
And you also followed the documentation?
For one, you can rollback to a previous snapshot and then update only NetworkManager. You could also look on https://download.opensuse.org/history/. Otherwise the option is to rebuild earlier revision of NetworkManager.

On Montag, 29. Juli 2024 19:21:14 MESZ Andrei Borzenkov wrote:
Yes, otherwise it wouldn't get started at all. As I wrote, the log shows, that it stops between two nfs umounts before sleep and continues after resume.
This system doesn't know about snapshots. As for the history packages: Trying to install NM from there, would lead to a downgrade of about 50 other packages. So I would have to rebuild it myself - I'll try that in the next days.

On Samstag, 13. Juli 2024 16:11:41 MESZ mh@mike.franken.de wrote:
seems to be a bug: https://bugzilla.suse.com/show_bug.cgi?id=1228154#c5
participants (4)
-
Andrei Borzenkov
-
Carlos E. R.
-
mh@mike.franken.de
-
Rodney Baker