Bug ID:         1061339
Summary:        NetworkManager freezes at shutdown or when changing SSID
Classification: openSUSE
Product:        openSUSE Tumbleweed
Version:        Current
Hardware:       x86-64
OS:             openSUSE Factory
Status:         NEW
Severity:       Normal
Priority:       P5 - None
Component:      Network
Assignee:       bnc-team-screening@forge.provo.novell.com
Reporter:       jimc@math.ucla.edu
QA Contact:     qa-bugs@suse.de
Found By:       ---
Blocker:        ---

Created attachment 742798
Syslog (debug) of shutdown showing segfault, near the end.

On Tumbleweed dated 2017-10-01, package versions:
NetworkManager-1.8.4-1.1.x86_64
NetworkManager-applet-1.8.2-1.1.x86_64
dbus-1-1.10.20-2.2.x86_64
systemd-233-1.1.x86_64
libsystemd0-234-4.1.x86_64 (I didn't do it, that's what's on Tumbleweed)

The following symptoms started with NetworkManager-1.8.2 and continue
with 1.8.4.  

1.  When you do "systemctl poweroff", systemd stops some services
promptly but has a timeout of 1 to 2 minutes on many of them, and takes
20 to 30 minutes to shut down.  "systemctl --force poweroff" speeds it
up, taking only about 5 minutes.  NetworkManager is one of the services
that takes extra-long to shut down (or time out).  Syslog (debug level)
does not show suspicious messages.  Syslog messages are grouped with
timestamps in a 14 sec interval, though messages on the console appear
a few at a time throughout a much longer interval.  

2.  I click on nm-applet, the icon in the "system tray", and select a
different access point (SSID) (which has a credential preconfigured 
and would normally let me on).  The icon reappears (as if it were 
still connected to the previous SSID, but "ping" shows that it's 
disconnected).  If I'm impatient and click on the icon again once, 
nm-applet maps a window for the list of available connection targets 
(including, for Wi-fi, the result of an AP scan), but puts
nothing in it for at least 5 minutes.  I know this because I have a
ping app that shows a stripchart of connectivity, and the section 
outside the mapped area moves normally, while the covered section is
stationary.  Eventually nm-applet will show its "connect-to" menu, 
but where there should be five or six APs within range, it shows none
of them.  NetworkManager is permanently hosed, and the only way to
get back on the net is to reboot, see symptom #1.  

3.  If I do "systemctl stop NetworkManager" the systemctl process 
never completes (or maybe, completes after an outrageously long 
timeout).  The process cannot be put into the background with ctrl-Z.
If I start it in the background, e.g. "systemctl stop NetworkManager &",
I can do "systemctl status NetworkManager" successfully, and its 
status is terminating-timeout (this from memory).  If I send SIGKILL
to the undead NM process, it remains undead -- I haven't seen that
behavior before.  I didn't try SIGNUKE.  If I do "systemctl [--force]
poweroff", it behaves as in symptom #1.  The time to shut down was
about the same as when systemd itself tries to stop NetworkManager as
part of the shutdown.
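
For anyone trying to reproduce symptom #3: a process that survives
SIGKILL is typically in uninterruptible sleep (state D) or is already a
zombie (state Z), and the state letter in /proc/<pid>/stat tells which.
A minimal sketch, assuming Linux /proc; the function names are
hypothetical and not part of any existing tool:

```python
def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' instead of on
    # whitespace.  The state letter is the first field after it.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process (Linux only, needs /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read())
```

Calling process_state() with the PID of the stuck NetworkManager would
be expected to return "D" if it is blocked inside the kernel; I have not
verified this on the affected machine.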

4.  This may or may not be related:  Having issues populating the chroot
jail, rather than using the LSB script /etc/init.d/named to start DNS, I
extracted the relevant functions into my own startup script and made a
systemd unit to call it.  This unit is After=NetworkManager, so when
NetworkManager claims to be started up, named.service starts
immediately.  Before NetworkManager-1.8.2 it started reliably.  Now it
complains of a parse error (fatal) in /etc/named.d/forwarders.conf (in
forwarders {...};) every time.  In named.service I put
"ExecStartPre=/usr/bin/sleep 2" and the workaround was successful.  This
suggests there is a race condition and that version 1.8.2 is being slowed
down by something.  (And forwarders.conf should be swapped in atomically
by doing a rename, not by slowly rewriting the production file from the
beginning.)
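
To illustrate the atomic swap I mean: write the new contents to a
temporary file in the same directory, then rename it over the real one,
so a reader such as named sees either the old file or the new one, never
a half-written one.  This is only a sketch of the general technique, not
what any component writing forwarders.conf actually does;
write_atomically is a hypothetical name:

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace `path` with `text` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the swap
        # mkstemp creates the file with mode 0600; a real implementation
        # would probably need to chmod/chown it so the reader can open it.
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```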


Could it be that NetworkManager gets into a death spiral with dbus?
Suppose NetworkManager is blocked waiting for a message from dbus, in
such a way that it can only be killed after the socket read finishes,
but dbus is single-threaded and cannot serve other clients until the
socket read from NetworkManager finishes.  That would explain why some
services shut down at normal speed, while others need to use dbus when
shutting down and get stuck.

It is very hard to get any evidence on this bug because its major 
manifestation is during shutdown, and the machine is hosed as soon as
you set off the bug.  I'm hoping that the developers will be able to
reproduce it in-house.  

Update: I rebooted my AP, and symptom #2 ensued when wpa_supplicant got
it reconnected (but ping packets were not replied to; I don't know if 
they were actually going out).  I stopped NetworkManager and rebooted
(--force).  libdbus-1.so.3.14.12 from libdbus-1-3-1.10.20-2.2.x86_64 got
a segfault.  No further progress was made on the shutdown for about 3.5
minutes, then it rebooted preemptively.  See the attached syslog.

