[Bug 1201845] New: mt7921e 0000:03:00.0: driver own failed
https://bugzilla.suse.com/show_bug.cgi?id=1201845

Bug ID: 1201845
Summary: mt7921e 0000:03:00.0: driver own failed
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: kostas.peletidis@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Background:
I was changing the firewall zone in the settings of my WLAN connection using NetworkManager. Suddenly I noticed many web sites were becoming unreachable and commands such as "ss -na4" would hang indefinitely (Ctrl-C wouldn't work). The processes of commands that involved the network could not be killed, not even with SIGKILL.

It might also be worth mentioning that a VirtualBox VM was running at the time, again running Tumbleweed and NetworkManager. The VM had a network interface bridged to the wlan interface, wlp3s0. It was not possible to shut down or even kill the VM when the issue was noticed. Shutting down the actual machine wasn't possible either; a hard poweroff had to be done.

The output of dmesg follows below, with a few lines before the "cut here" mark for context:

[11249.676616] r8169 0000:02:00.0 enp2s0f0: Link is Down
[11453.812782] mt7921e 0000:03:00.0: driver own failed
[11454.986117] mt7921e 0000:03:00.0: driver own failed
[11454.986134] mt7921e 0000:03:00.0: chip reset
[11456.170894] mt7921e 0000:03:00.0: driver own failed
[11456.278532] pcieport 0000:00:02.3: pciehp: Slot(0): Link Down
[11456.278536] pcieport 0000:00:02.3: pciehp: Slot(0): Card not present
[11456.313973] wlp3s0: deauthenticating from f8:5b:3b:0f:2b:9f by local choice (Reason: 3=DEAUTH_LEAVING)
[11457.286206] mt7921e 0000:03:00.0: Timeout for driver own
[11458.400420] mt7921e 0000:03:00.0: driver own failed
[11458.400442] ------------[ cut here ]------------
[11458.400443] WARNING: CPU: 2 PID: 8597 at kernel/kthread.c:659 kthread_park+0x81/0x90
[11458.400454] Modules linked in: mptcp_diag tcp_diag udp_diag raw_diag inet_diag tun rfcomm nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast ccm cmac algif_hash ecb algif_skcipher af_alg af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct bridge stp llc nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw vboxnetadp(O) vboxnetflt(O) iptable_security nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter bnep btusb btrtl btbcm btintel btmtk bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common vboxdrv(O) videodev ecdh_generic mc qrtr dmi_sysfs msr nls_iso8859_1 nls_cp437 vfat fat mt7921e mt7921_common snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic mt76_connac_lib snd_sof_amd_renoir snd_ctl_led
[11458.400527] snd_sof_amd_acp mt76 snd_sof_pci snd_sof snd_hda_codec_realtek snd_sof_utils mac80211 snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core snd_hda_intel snd_compress libarc4 snd_intel_dspcfg snd_pcm_dmaengine snd_intel_sdw_acpi snd_acp_pci snd_hda_codec pcspkr r8169 snd_pci_acp6x snd_hda_core cfg80211 efi_pstore snd_pci_acp5x joydev snd_hwdep think_lmi realtek snd_rn_pci_acp3x mdio_devres wmi_bmof thinkpad_acpi firmware_attributes_class snd_acp_config snd_pcm snd_soc_acpi ledtrig_audio libphy snd_pci_acp3x platform_profile snd_timer rfkill i2c_piix4 k10temp thermal tiny_power_button snd soundcore ac acpi_cpufreq button fuse configfs ip_tables x_tables usbhid amdgpu drm_ttm_helper ttm xhci_pci iommu_v2 xhci_pci_renesas xhci_hcd ucsi_acpi hid_multitouch nvme gpu_sched typec_ucsi hid_generic usbcore ccp drm_dp_helper serio_raw roles nvme_core sp5100_tco typec wmi battery video i2c_hid_acpi i2c_hid btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg
[11458.400598] dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[11458.400605] CPU: 2 PID: 8597 Comm: kworker/u32:11 Tainted: G O 5.18.11-1-default #1 openSUSE Tumbleweed dfca0849ead84314c7d214c3a61e8eeef78902fa
[11458.400611] Hardware name: LENOVO 20XGS0V508/20XGS0V508, BIOS R1NET47W (1.17) 12/21/2021
[11458.400613] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[11458.400627] RIP: 0010:kthread_park+0x81/0x90
[11458.400633] Code: 00 48 85 c0 74 2d 31 c0 5b 5d c3 cc cc cc cc 0f 0b 48 8b ab a8 0a 00 00 a8 04 74 ac 0f 0b b8 da ff ff ff 5b 5d c3 cc cc cc cc <0f> 0b b8 f0 ff ff ff eb d5 0f 0b eb cf 66 90 0f 1f 44 00 00 41 55
[11458.400636] RSP: 0018:ffffb026c964fe00 EFLAGS: 00010202
[11458.400640] RAX: 0000000000000004 RBX: ffff9ae989282900 RCX: 0000000000000000
[11458.400642] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff9ae989282900
[11458.400643] RBP: ffff9ae9944d8580 R08: 0000000000000000 R09: 00000000fffffff0
[11458.400645] R10: 0000000000000003 R11: ffff9aec6e2c38a8 R12: ffff9ae9871308e0
[11458.400646] R13: ffff9ae9871320e0 R14: ffff9ae987138610 R15: ffff9ae987132430
[11458.400649] FS: 0000000000000000(0000) GS:ffff9aec5ee80000(0000) knlGS:0000000000000000
[11458.400651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11458.400653] CR2: 0000561d5a80d1b0 CR3: 00000002a73ce000 CR4: 0000000000750ee0
[11458.400655] PKRU: 55555554
[11458.400657] Call Trace:
[11458.400659] <TASK>
[11458.400664] mt7921e_mac_reset+0x9e/0x2d0 [mt7921e 3c18704b3a7506787a289600f5bf8b01993445b3]
[11458.400672] mt7921_mac_reset_work+0x9f/0x14a [mt7921_common 3c26fb2d3d9614b27e4189eb21b64c97b9b38876]
[11458.400681] process_one_work+0x20f/0x3d0
[11458.400689] worker_thread+0x4a/0x3b0
[11458.400692] ? process_one_work+0x3d0/0x3d0
[11458.400695] kthread+0xda/0x100
[11458.400699] ? kthread_complete_and_exit+0x20/0x20
[11458.400704] ret_from_fork+0x22/0x30
[11458.400711] </TASK>
[11458.400712] ---[ end trace 0000000000000000 ]---
[11658.821384] vboxnetflt: 134306 out of 1469774 packets were not sent (directed to host)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c1

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kostas.peletidis@suse.com,
                   |                            |tiwai@suse.com
              Flags|                            |needinfo?(kostas.peletidis@
                   |                            |suse.com)

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
Is the bug reproducible? The kernel WARNING appears to be a non-crucial case that is triggered by the repeated "driver own failed" error.

Also, is it the first appearance of the error? Better to attach the full dmesg output.

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c2

--- Comment #2 from Kostas Peletidis <kostas.peletidis@suse.com> ---
Created attachment 860426
  --> https://bugzilla.suse.com/attachment.cgi?id=860426&action=edit
Output of dmesg for first occurrence of kernel warning

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c3

--- Comment #3 from Kostas Peletidis <kostas.peletidis@suse.com> ---
Created attachment 860427
  --> https://bugzilla.suse.com/attachment.cgi?id=860427&action=edit
Output of dmesg for last occurrence of kernel warning

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c4

--- Comment #4 from Kostas Peletidis <kostas.peletidis@suse.com> ---
It is not easy to reproduce this issue because I wasn't running any test. I was just using the machine (it is my work laptop).

From what I see here it happened once more before. I have attached the dmesg output for both cases (one not tainted, the other tainted probably because of vbox and friends).

It looks like there is a correlation between chip reset attempts that timed out and the kernel warning. For example:

Chip reset OK, no warning
-------------------------
Jul 04 13:06:33 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: WM Firmware Version: ____010000, Build Time: 20220311230931
Jul 04 13:06:36 savra kernel: wlp3s0: Driver requested disconnection from AP f8:5b:3b:0f:2b:9f

Chip reset timeout, warning
---------------------------
Jul 08 08:47:17 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 08 08:47:18 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 08 08:47:18 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 08 08:47:19 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 08 08:47:19 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Link Down
Jul 08 08:47:19 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Card not present
Jul 08 08:47:19 savra kernel: wlp3s0: deauthenticating from e6:5b:3b:0f:2b:9e by local choice (Reason: 3=DEAUTH_LEAVING)
Jul 08 08:47:20 savra kernel: mt7921e 0000:03:00.0: Timeout for driver own
Jul 08 08:47:21 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 08 08:47:21 savra kernel: ------------[ cut here ]------------
Jul 08 08:47:21 savra kernel: WARNING: CPU: 7 PID: 113 at kernel/kthread.c:659 kthread_park+0x7b/0x90
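
The correlation described in comment #4 could be checked across a longer journal with something like the following. This is only a minimal sketch, not a tool used in this report: the function name classify_resets is hypothetical, and the rule it applies (firmware banner after a reset means success; "Timeout for driver own" or a kernel WARNING means failure) is an assumption drawn from the two log excerpts quoted in that comment.

```python
from typing import Iterable, List


def classify_resets(lines: Iterable[str]) -> List[str]:
    """Label each mt7921e "chip reset" episode as "ok", "failed" or "unknown".

    Assumption (taken from the two excerpts in comment #4, not from the
    driver source): a reset succeeded if the "HW/SW Version" banner shows
    up afterwards, and failed if "Timeout for driver own" or a kernel
    "WARNING:" line shows up first.
    """
    outcomes: List[str] = []
    in_reset = False
    for line in lines:
        if "mt7921e" in line and "chip reset" in line:
            if in_reset:
                # A new reset started before the previous one was resolved.
                outcomes.append("unknown")
            in_reset = True
        elif in_reset and "HW/SW Version" in line:
            outcomes.append("ok")
            in_reset = False
        elif in_reset and ("Timeout for driver own" in line or "WARNING:" in line):
            outcomes.append("failed")
            in_reset = False
    if in_reset:
        # Log ended before the last reset was resolved.
        outcomes.append("unknown")
    return outcomes


# Lines copied from the excerpts in comment #4 (shortened to the relevant ones).
SAMPLE = """\
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a
Jul 08 08:47:18 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 08 08:47:19 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 08 08:47:20 savra kernel: mt7921e 0000:03:00.0: Timeout for driver own
""".splitlines()

print(classify_resets(SAMPLE))  # ['ok', 'failed']
```

Feeding it the full attached dmesg files instead of SAMPLE would show whether every kernel warning follows a reset that timed out.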

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c5

--- Comment #5 from Kostas Peletidis <kostas.peletidis@suse.com> ---
Created attachment 860580
  --> https://bugzilla.suse.com/attachment.cgi?id=860580&action=edit
dmesg output of bug occurrence with kernel 5.18.12-1-default

Saw this bug again just now, with kernel 5.18.12-1-default. I am attaching the dmesg output.

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c6

--- Comment #6 from Takashi Iwai <tiwai@suse.com> ---
Does this happen after some operations, or does it appear at boot?

There have been issues with the power management, and one workaround was to do a cold boot from a complete power-off (so that the BIOS resets the chip state). I wonder whether this is a similar case.

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c7

--- Comment #7 from Kostas Peletidis <kostas.peletidis@suse.com> ---
(In reply to Takashi Iwai from comment #6)
> Does this happen after some operations, or does it appear at boot?
>
> There have been issues with the power management, and one workaround was to do a cold boot from a complete power-off (so that the BIOS resets the chip state). I wonder whether this is a similar case.

The bug happens during "normal use". As I mentioned earlier, I can see the chip reset in older logs:

----
Jul 04 13:06:33 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a
----

But in some cases, like today's, the chip reset doesn't seem to have the desired effect (or maybe it is being probed too soon after the reset?) so we end up with:

[11910.884024] mt7921e 0000:03:00.0: driver own failed
[11912.046411] mt7921e 0000:03:00.0: driver own failed
[11912.046429] mt7921e 0000:03:00.0: chip reset
[11913.218988] mt7921e 0000:03:00.0: driver own failed
[11913.327042] pcieport 0000:00:02.3: pciehp: Slot(0): Link Down
[11913.327054] pcieport 0000:00:02.3: pciehp: Slot(0): Card not present
[11913.362898] wlp3s0: deauthenticating from f8:5b:3b:0f:2b:9f by local choice (Reason: 3=DEAUTH_LEAVING)
[11914.341792] mt7921e 0000:03:00.0: Timeout for driver own
[11915.546880] mt7921e 0000:03:00.0: driver own failed
[11915.546903] ------------[ cut here ]------------
[11915.546905] WARNING: CPU: 6 PID: 26340 at kernel/kthread.c:659 kthread_park+0x81/0x90
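
For the "probed too soon after the reset?" question, the interval between the "chip reset" message and the next "driver own failed" message can be read from the dmesg timestamps. The snippet below is a rough sketch under stated assumptions: the helper name reset_to_failure_gaps is hypothetical, it assumes the default dmesg "[seconds.microseconds]" prefix, and it measures only the gap between log messages, which is not necessarily the delay the driver actually waits before touching the chip.

```python
import re
from decimal import Decimal
from typing import Iterable, List

# Default dmesg prefix: "[seconds.microseconds] message"
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def reset_to_failure_gaps(lines: Iterable[str]) -> List[Decimal]:
    """Return the seconds between each "chip reset" line and the next
    "driver own failed" line (hypothetical helper, for illustration only)."""
    gaps: List[Decimal] = []
    reset_at = None
    for line in lines:
        match = TS.match(line.strip())
        if not match:
            continue
        stamp, msg = Decimal(match.group(1)), match.group(2)
        if "chip reset" in msg:
            reset_at = stamp
        elif reset_at is not None and "driver own failed" in msg:
            gaps.append(stamp - reset_at)
            reset_at = None
    return gaps


# Three lines copied from the dmesg excerpt in comment #7.
SAMPLE = [
    "[11912.046411] mt7921e 0000:03:00.0: driver own failed",
    "[11912.046429] mt7921e 0000:03:00.0: chip reset",
    "[11913.218988] mt7921e 0000:03:00.0: driver own failed",
]

print([str(gap) for gap in reset_to_failure_gaps(SAMPLE)])  # ['1.172559']
```

Decimal is used instead of float so that the microsecond timestamps subtract exactly.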

https://bugzilla.suse.com/show_bug.cgi?id=1201845#c8

--- Comment #8 from Takashi Iwai <tiwai@suse.com> ---
Thanks. The warning seems to be triggered when the driver starts resetting and the driver-own failure happens during the reset.

The kernel warning itself might be a red herring. So we rather need to ask the upstream devs why this "driver own failed" error happens repeatedly at all.

It could be reported to bugzilla.kernel.org, but I suspect it is poorly monitored by the upstream devs. It may be better to report to the linux-wireless ML instead.

Care to post to the linux-wireless ML (and put me on Cc)?