http://bugzilla.opensuse.org/show_bug.cgi?id=1061339

Bug ID: 1061339
Summary: NetworkManager freezes at shutdown or when changing SSID
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Factory
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Network
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: jimc@math.ucla.edu
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 742798
  --> http://bugzilla.opensuse.org/attachment.cgi?id=742798&action=edit
Syslog (debug) of shutdown showing segfault, near the end.

On Tumbleweed dated 2017-10-01, package versions:

NetworkManager-1.8.4-1.1.x86_64
NetworkManager-applet-1.8.2-1.1.x86_64
dbus-1-1.10.20-2.2.x86_64
systemd-233-1.1.x86_64
libsystemd0-234-4.1.x86_64 (the version mismatch with systemd is what Tumbleweed ships; I didn't cause it)

The following symptoms started with NetworkManager-1.8.2 and continue with 1.8.4.

1. When I run "systemctl poweroff", systemd stops some services promptly but waits out a timeout of 1 to 2 minutes on many others, so shutdown takes 20 to 30 minutes. "systemctl --force poweroff" speeds it up to only about 5 minutes. NetworkManager is one of the services that take extra long to shut down (or time out). Syslog (debug level) shows no suspicious messages. The syslog messages all carry timestamps within a 14-second interval, even though the messages appear on the console a few at a time over a much longer period.

2. I click on nm-applet, the icon in the "system tray", and select a different access point (SSID), which has a credential preconfigured and would normally let me on. The icon reappears as if it were still connected to the previous SSID, but "ping" shows that it's disconnected. If I'm impatient and click on the icon once more, nm-applet maps a window for the list of available connection targets (including, for Wi-Fi, the result of an AP scan), but puts nothing in it for at least 5 minutes.
   I know this because I have a ping app that shows a strip chart of connectivity: the part of the chart outside the mapped window keeps moving normally, while the part the window covers stays frozen. Eventually nm-applet shows its "connect-to" menu, but although five or six APs should be within range, it lists none of them. NetworkManager is permanently hosed, and the only way to get back on the net is to reboot; see symptom #1.

3. If I run "systemctl stop NetworkManager", the systemctl process never completes (or perhaps completes after an outrageously long timeout). It cannot be put into the background with Ctrl-Z. If I start it in the background, e.g. "systemctl stop NetworkManager &", then "systemctl status NetworkManager" works and reports the status as terminating-timeout (this is from memory). If I send SIGKILL to the undead NM process, it remains undead; I haven't seen that behavior before. I didn't try SIGNUKE. If I then run "systemctl [--force] poweroff", it behaves as in symptom #1, and the shutdown takes about as long as when systemd stops NetworkManager itself as part of the shutdown.

4. This may or may not be related. Because I had trouble populating the chroot jail, I don't use the LSB script /etc/init.d/named to start DNS; instead I extracted the relevant functions into my own startup script and wrote a systemd unit to call it. This unit has After=NetworkManager, so named.service starts immediately once NetworkManager reports that it has started. Before NetworkManager-1.8.2 it started reliably. Now it fails every time with a fatal parse error in /etc/named.d/forwarders.conf (in forwarders {...};). I added "ExecStartPre=/usr/bin/sleep 2" to named.service, and this workaround succeeded. This suggests a race condition, and that version 1.8.2 is being slowed down by something. (Also, forwarders.conf should be swapped in atomically by writing a new file and renaming it over the old one, not by slowly rewriting the live file from the beginning.)
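The atomic replacement suggested at the end of item 4 can be sketched as follows. This is a minimal Python illustration assuming a POSIX system; the function name `atomic_write` is hypothetical, and this is not how NetworkManager (or whichever tool actually generates /etc/named.d/forwarders.conf) writes the file. The new contents go into a temporary file in the same directory, are flushed to disk, and are then renamed over the target with `os.replace()`, so a reader such as named sees either the complete old file or the complete new one, never a half-written one.

```python
import os
import stat
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace the file at `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the existing file's permission bits; default to 0644 for a new file.
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        mode = 0o644
    # The temporary file must live in the same directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new bytes to disk before renaming
        os.chmod(tmp_path, mode)
        # os.replace() renames over the target in one step on POSIX:
        # readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

This closes the window in which named can read a half-written forwarders.conf, but it does not settle the ordering question of whether named starts before or after the update. The sketch copies only the permission bits of an existing file, not its owner or group; a real tool would need to preserve those as well.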
Could NetworkManager be getting into a death spiral with dbus? Suppose NetworkManager blocks waiting for a message from dbus in such a way that it can only be killed after the socket read finishes, while dbus, being single-threaded, cannot serve other clients until that socket read from NetworkManager finishes. That would explain why some services shut down at normal speed, while others, which need dbus while shutting down, get stuck.

It is very hard to get any evidence on this bug, because its main manifestation is during shutdown, and the machine is hosed as soon as the bug is triggered. I'm hoping that the developers will be able to reproduce it in-house.

Update: I rebooted my AP, and symptom #2 ensued when wpa_supplicant reconnected to it (but ping packets got no replies; I don't know whether they were actually going out). I stopped NetworkManager and rebooted (--force). libdbus-1.so.3.14.12 from libdbus-1-3-1.10.20-2.2.x86_64 segfaulted. The shutdown made no further progress for about 3.5 minutes, after which the machine rebooted preemptively. See the attached syslog.
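The dbus hypothesis above, and the NetworkManager process that survives SIGKILL in item 3, could be checked with a little evidence gathering the next time the hang happens. On Linux, a process blocked in an ordinary socket read sleeps interruptibly (state S) and SIGKILL normally ends it; a process that survives SIGKILL is usually stuck in uninterruptible sleep (state D) inside the kernel, or is a zombie (Z) that has not been reaped yet. The following is a minimal Python sketch, not part of the original report, that reads the state letter from /proc/<pid>/stat and the kernel wait channel from /proc/<pid>/wchan. The function names are hypothetical, and some kernels hide the wchan contents from unprivileged readers.

```python
import sys

# One-letter task states as documented in proc(5) (subset).
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "T": "stopped",
    "t": "tracing stop",
    "I": "idle",
}


def parse_stat_state(stat_text: str) -> str:
    """Return the state letter (third field) from the contents of /proc/<pid>/stat.

    The second field is the command name in parentheses and may itself contain
    spaces or ')', so split after the *last* ')' rather than on whitespace.
    """
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def process_state(pid: int) -> str:
    """Read /proc/<pid>/stat and return the task's state letter."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def kernel_wait_channel(pid: int) -> str:
    """Kernel symbol the task is sleeping in, per /proc/<pid>/wchan.

    May read "0" if the task is running or the kernel hides the value.
    """
    with open(f"/proc/{pid}/wchan") as f:
        return f.read().strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    state = process_state(pid)
    print(f"pid {pid}: state {state} ({STATE_NAMES.get(state, 'other')}), "
          f"wchan {kernel_wait_channel(pid) or '?'}")
```

For example, saving it under a hypothetical name such as procstate.py and running `python3 procstate.py $(pidof NetworkManager)` as root while "systemctl stop NetworkManager" is hanging would show whether NetworkManager is in state D and, if so, which kernel symbol it is waiting in.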