[Bug 1075878] New: Apocalyptic loss of proc/tmpfs/sysfs mounts after WIFI disconnect through NetworkManager
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 Bug ID: 1075878 Summary: Apocalyptic loss of proc/tmpfs/sysfs mounts after WIFI disconnect through NetworkManager Classification: openSUSE Product: openSUSE Tumbleweed Version: Current Hardware: x86-64 OS: openSUSE Factory Status: NEW Severity: Critical Priority: P5 - None Component: Network Assignee: bnc-team-screening@forge.provo.novell.com Reporter: niels@nielsvm.org QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 Build Identifier: I reinstalled Tumbleweed last week and since then, a major bug grinds my entire system to a halt. So far I figured out that disconnecting from a WIFI network or connecting to a new network, triggers this rather apocalyptic symptom. As soon as the call to NM happens, procfs disappears rendering the entire system unusable. Commands like sudo/mount/reboot/halt etc no more work, sockets disappear and the only way to reboot is through Alt+SysRq. Kernel and processes keep running, but spawning processes and any other primitive operations fail. Reproducible: Always Steps to Reproduce: All these things trigger the behavior: - Disconnecting the auto-connected WIFI network through Plasma's network applet. - Connect to different WIFI network through Plasma's network applet. - nmcli connection down wlp3s0 <ESSID> - Logged in as a different user with less privilege groups, same thing happens. - Removed connections from /etc/NetworkManager/system-connections/ and readded, still happens. Actual Results: Mounts immediately disappear, logging no longer works, system basically melts down. Mouse cursor keeps moving and windows are still drag-gable, tabs in Konsole can still be closed/moved, but no new ones added for instance. Higher level services like D-BUS still seem active, anything low-level is gone. Expected Results: Basic WIFI operation. $ nmcli wlp3s0: connected to RNC "Intel Wireless 7260 (Dual Band Wireless-AC 7260)" wifi (iwlwifi), E8:B1:FC:F7:6A:A2, hw, mtu 1500 ip4 default inet4 10.0.1.22/24 inet6 fe80::6dd8:d933:d607:d8e1/64 $ dmesg 00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b) 00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b) 00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b) 00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04) 00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04) 00:16.3 Serial controller: Intel Corporation 8 Series HECI KT (rev 04) 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM (rev 04) 00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04) 00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 6 (rev e4) 00:1c.1 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4) 00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04) 00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04) 00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04) 00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04) 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01) 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83) $ dmesg|grep "wlp3s0\|iwlwifi\|error\|firmware" [ 3.162392] psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3 [ 5.372976] tpm tpm0: A TPM error (6) occurred attempting to read a pcr value [ 5.530898] iwlwifi 0000:03:00.0: loaded firmware version 17.608620.0 op_mode iwlmvm [ 5.550854] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144 [ 5.570969] iwlwifi 0000:03:00.0: base HW address: e8:b1:fc:f7:6a:a2 [ 5.813741] iwlwifi 0000:03:00.0 wlp3s0: renamed from wlan0 [ 6.815055] psmouse serio3: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3 [ 7.723688] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready [ 7.974563] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready [ 8.259831] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready [ 8.350551] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready [ 11.603406] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready [ 14.554637] wlp3s0: authenticate with c0:a0:bb:ee:42:b4 [ 14.559795] wlp3s0: send auth to c0:a0:bb:ee:42:b4 (try 1/3) [ 14.563528] wlp3s0: authenticated [ 14.563979] wlp3s0: associate with c0:a0:bb:ee:42:b4 (try 1/3) [ 14.569912] wlp3s0: RX AssocResp from c0:a0:bb:ee:42:b4 (capab=0x431 status=0 aid=2) [ 14.571276] wlp3s0: associated [ 14.836520] IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready $ find /etc/NetworkManager/ /etc/NetworkManager/ /etc/NetworkManager/dispatcher.d /etc/NetworkManager/dispatcher.d/nfs /etc/NetworkManager/dispatcher.d/99tlp-rdw-nm /etc/NetworkManager/dispatcher.d/ntp /etc/NetworkManager/dispatcher.d/autofs /etc/NetworkManager/VPN /etc/NetworkManager/VPN/nm-vpnc-service.name /etc/NetworkManager/VPN/nm-openvpn-service.name /etc/NetworkManager/NetworkManager.conf /etc/NetworkManager/system-connections /etc/NetworkManager/system-connections/... (stripped for privacy reasons) $ cat /etc/NetworkManager/NetworkManager.conf [main] plugins=ifcfg-suse,keyfile [connectivity] uri=http://conncheck.opensuse.org -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 Niels van Mourik <niels@nielsvm.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocker|--- |Yes Found By|--- |Customer -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 Niels van Mourik <niels@nielsvm.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |niels@nielsvm.org -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c1 --- Comment #1 from Niels van Mourik <niels@nielsvm.org> --- Created attachment 755965 --> http://bugzilla.opensuse.org/attachment.cgi?id=755965&action=edit Terminal screenshot with mount output before (right pane;), and after WIFI disconnect (left). -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c2 --- Comment #2 from Niels van Mourik <niels@nielsvm.org> --- $ uname -a Linux laptop 4.14.12-1-default #1 SMP PREEMPT Fri Jan 5 18:15:55 UTC 2018 (3cf399e) x86_64 x86_64 x86_64 GNU/Linux $ ls -l /var/crash total 0 -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c3 --- Comment #3 from Niels van Mourik <niels@nielsvm.org> --- $ tail -n 30 /var/log/wpa_supplicant.log rfkill: WLAN soft blocked nl80211: deinit ifname=p2p-dev-wlp3s0 disabled_11b_rates=0 nl80211: deinit ifname=wlp3s0 disabled_11b_rates=0 wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=5220 MHz) wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=5220 MHz) wlp3s0: Associated with <REDACTED MAC> wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP] wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=] wlp3s0: CTRL-EVENT-DISCONNECTED bssid=<REDACTED MAC> reason=3 locally_generated=1 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD Successfully initialized wpa_supplicant wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz) wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz) wlp3s0: Associated with <REDACTED MAC> wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP] wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=] wlp3s0: CTRL-EVENT-DISCONNECTED bssid=<REDACTED MAC> reason=3 locally_generated=1 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD Successfully initialized wpa_supplicant wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz) wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz) wlp3s0: Associated with <REDACTED MAC> wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP] wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=] PLEASE NOTE: Regulatory domain COUNTRY_IE is incorrect as this device is located in NL, however, this is what my locale has been deliberately set to. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c4 --- Comment #4 from Niels van Mourik <niels@nielsvm.org> --- JOURNALCTL entries around the incident time: Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.3671] device (wlp3s0): state change: activated -> deactivating (reason 'user-requested', internal state 'managed') Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.3672] manager: NetworkManager state is now DISCONNECTING Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.3700] audit: op="connection-deactivate" uuid="bbaf9de8-dfaa-4637-86ff-d555a0d9b82d" name="<REDACTED ESSID>" pid=6133 uid=1000 result="success" Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=1162 comm="/usr/sbin/NetworkManager --no-daemon ") Jan 13 13:52:06 laptop systemd[1]: Starting Network Manager Script Dispatcher Service... Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.3736] device (wlp3s0): state change: deactivating -> disconnected (reason 'user-requested', internal state 'managed') Jan 13 13:52:06 laptop avahi-daemon[974]: Withdrawing address record for <REDACTED IP-ADDRESS> on wlp3s0. Jan 13 13:52:06 laptop avahi-daemon[974]: Leaving mDNS multicast group on interface wlp3s0.IPv6 with address <REDACTED IP-ADDRESS>. Jan 13 13:52:06 laptop avahi-daemon[974]: Interface wlp3s0.IPv6 no longer relevant for mDNS. Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' Jan 13 13:52:06 laptop systemd[1]: Started Network Manager Script Dispatcher Service. Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:1 'connectivity-change': new request (4 scripts) Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:1 'connectivity-change': start running ordered scripts... Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.4117] dhcp4 (wlp3s0): canceled DHCP transaction, DHCP client pid 1842 Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.4117] dhcp4 (wlp3s0): state changed bound -> done Jan 13 13:52:06 laptop kernel: wlp3s0: deauthenticating from <REDACTED MAC> by local choice (Reason: 3=DEAUTH_LEAVING) Jan 13 13:52:06 laptop avahi-daemon[974]: Withdrawing address record for <REDACTED IP-ADDRESS> on wlp3s0. Jan 13 13:52:06 laptop kernel: wlp3s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22) Jan 13 13:52:06 laptop avahi-daemon[974]: Leaving mDNS multicast group on interface wlp3s0.IPv4 with address <REDACTED IP-ADDRESS>. Jan 13 13:52:06 laptop avahi-daemon[974]: Interface wlp3s0.IPv4 no longer relevant for mDNS. Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.4382] device (wlp3s0): set-hw-addr: set MAC address to <REDACTED MAC> (scanning) Jan 13 13:52:06 laptop kernel: IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready Jan 13 13:52:06 laptop nscd[967]: 967 monitored file `/etc/resolv.conf` was written to Jan 13 13:52:06 laptop NetworkManager[1162]: nisdomainname: you must be root to change the domain name Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.5938] manager: NetworkManager state is now DISCONNECTED Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.5941] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device) Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.6' (uid=0 pid=1162 comm="/usr/sbin/NetworkManager --no-daemon ") Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:2 'down' [wlp3s0]: new request (4 scripts) Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:2 'down' [wlp3s0]: start running ordered scripts... Jan 13 13:52:06 laptop systemd[1]: Starting Hostname Service... Jan 13 13:52:06 laptop NetworkManager[1162]: <warn> [1515847926.6091] sup-iface[0x<REDACTED>,wlp3s0]: connection disconnected (reason -3) Jan 13 13:52:06 laptop NetworkManager[1162]: <info> [1515847926.6092] device (wlp3s0): supplicant interface state: completed -> disconnected Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Successfully activated service 'org.freedesktop.hostname1' Jan 13 13:52:06 laptop systemd[1]: Started Hostname Service. Jan 13 13:52:06 localhost.localdomain systemd-hostnamed[6277]: Changed host name to 'localhost.localdomain' Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: req:3 'hostname': new request (4 scripts) Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: ping: /dev/disk/by-path/pci-0000: Name or service not known Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: ping: /dev/disk/by-path/pci-0000: Name or service not known Jan 13 13:52:06 localhost.localdomain systemd[2044]: Failed to parse /proc/self/mountinfo: No such file or directory -- Reboot -- Jan 13 13:52:43 localhost kernel: microcode: microcode updated early to revision 0x20, date = 2017-01-27 Jan 13 13:52:43 localhost kernel: Linux version 4.14.12-1-default (geeko@buildhost) (gcc version 7.2.1 20171020 [gcc-7-branch revision 253932] (SUSE Linux)) #1 SMP PREEMPT Fri Jan 5 18:15:55 UTC 2018 (3cf399e) -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c5 --- Comment #5 from Niels van Mourik <niels@nielsvm.org> --- I managed to isolate the problem down to NetworkManager. When I switch over to Wickd in Yast, none of the triggers still cause the system to lose its proc mount. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c8 --- Comment #8 from Johannes Pöhlmann <johannes@lst.de> --- I had "Apocalyptic loss of proc/tmpfs/sysfs mounts" also on openSUSE 15.1. By stopping open VPN or by pulling a usb network adapter. in fact every taking down of a network adapter. I found a possible trigger: I had managed somehow to have network manager AND wickedd active. After disabling the network manager packages and switching to wicked in Yast, the Problem has vanished. I am fine now. The remaining root questions are: * Why was i able to activate network manager AND wicked (Reasons on Level 1-7 ;-) * Why can the combination of network manager and wickedd do such damage ? * Should the kernel not defend itself better against such misconfiguration ? uname -a: Linux xxx 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020 (18849d1) x86_64 x86_64 x86_64 GNU/Linux -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1075878 http://bugzilla.opensuse.org/show_bug.cgi?id=1075878#c9 Fabian Vogt <fvogt@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |fvogt@suse.com Flags| |needinfo?(johannes@lst.de) --- Comment #9 from Fabian Vogt <fvogt@suse.com> --- (In reply to Johannes Pöhlmann from comment #8)
I had "Apocalyptic loss of proc/tmpfs/sysfs mounts" also on openSUSE 15.1. By stopping open VPN or by pulling a usb network adapter. in fact every taking down of a network adapter.
I found a possible trigger: I had managed somehow to have network manager AND wickedd active. After disabling the network manager packages and switching to wicked in Yast, the Problem has vanished.
I am fine now. The remaining root questions are:
* Why was i able to activate network manager AND wicked (Reasons on Level 1-7 ;-)
Probably by not using network.service, but also wicked.service or NetworkManager.service. Or maybe dbus activation.
* Why can the combination of network manager and wickedd do such damage ?
Does /etc/NetworkManager/dispatcher.d/nfs tun0 down trigger the issue? You might be able to run it as non-root, in which case the errors should indicate whether it would've broken the mounts. If so, please provide the output of: bash -x /etc/NetworkManager/dispatcher.d/nfs tun0 down
* Should the kernel not defend itself better against such misconfiguration ?
The kernel is most likely innocent here and doesn't care about /proc and such at all. If userspace decides to break itself, it won't mind.
uname -a: Linux xxx 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020 (18849d1) x86_64 x86_64 x86_64 GNU/Linux
-- You are receiving this mail because: You are on the CC list for the bug.
participants (2)
-
bugzilla_noreply@novell.com
-
bugzilla_noreply@suse.com