[Bug 1186194] New: Many processes and up in D state after system resume (5.12.3-1-default, git 25d4ec7)
https://bugzilla.suse.com/show_bug.cgi?id=1186194

            Bug ID: 1186194
           Summary: Many processes and up in D state after system resume
                    (5.12.3-1-default, git 25d4ec7)
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: lpechacek@suse.com
        QA Contact: qa-bugs@suse.de
                CC: oneukum@suse.com
          Found By: ---
           Blocker: ---

Created attachment 849440
  --> https://bugzilla.suse.com/attachment.cgi?id=849440&action=edit
Output of SysRq-T

I'm using a Dell laptop (Inspiron 7373) with a USB ethernet adapter (17ef:7205 Lenovo Thinkpad LAN, driver r8152). Sometimes, after resume, network-related utilities (e.g. ip, ping) freeze in D state. No communication over IP networks is possible.

Attached is SysRq-T from the malfunctioning state. From my POV, the following stand out:

May 18 11:00:34 fmn kernel: task:teams state:D stack: 0 pid: 316 ppid: 3234 flags:0x00000000
May 18 11:00:34 fmn kernel: Call Trace:
May 18 11:00:34 fmn kernel: __schedule+0x2ee/0x950
May 18 11:00:34 fmn kernel: schedule+0x46/0xb0
May 18 11:00:34 fmn kernel: rpm_resume+0x19c/0x7b0
May 18 11:00:34 fmn kernel: ? wait_woken+0x80/0x80
May 18 11:00:34 fmn kernel: rpm_resume+0x2e7/0x7b0
May 18 11:00:34 fmn kernel: ? kernfs_fop_open+0x2a8/0x3a0
May 18 11:00:34 fmn kernel: __pm_runtime_resume+0x4a/0x80
May 18 11:00:34 fmn kernel: usb_autopm_get_interface+0x18/0x50 [usbcore]
May 18 11:00:34 fmn kernel: rtl8152_get_link_ksettings+0x27/0x80 [r8152]
May 18 11:00:34 fmn kernel: speed_show+0x6e/0xb0
May 18 11:00:34 fmn kernel: dev_attr_show+0x19/0x40
May 18 11:00:34 fmn kernel: sysfs_kf_seq_show+0xa6/0xe0
May 18 11:00:34 fmn kernel: seq_read_iter+0x1e1/0x510
May 18 11:00:34 fmn kernel: new_sync_read+0x115/0x1a0
May 18 11:00:34 fmn kernel: vfs_read+0x14b/0x1a0
May 18 11:00:34 fmn kernel: ksys_read+0x5f/0xe0
May 18 11:00:34 fmn kernel: do_syscall_64+0x33/0x80
May 18 11:00:34 fmn kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae

AND

May 18 11:00:34 fmn kernel: task:kworker/3:2 state:D stack: 0 pid:23489 ppid: 2 flags:0x00004000
May 18 11:00:34 fmn kernel: Workqueue: pm pm_runtime_work
May 18 11:00:34 fmn kernel: Call Trace:
May 18 11:00:34 fmn kernel: __schedule+0x2ee/0x950
May 18 11:00:34 fmn kernel: schedule+0x46/0xb0
May 18 11:00:34 fmn kernel: schedule_timeout+0x8b/0x140
May 18 11:00:34 fmn kernel: ? __next_timer_interrupt+0x100/0x100
May 18 11:00:34 fmn kernel: msleep+0x2a/0x40
May 18 11:00:34 fmn kernel: napi_disable+0x2b/0x70
May 18 11:00:34 fmn kernel: rtl8152_suspend+0x2ad/0x340 [r8152]
May 18 11:00:34 fmn kernel: usb_suspend_both+0x9d/0x230 [usbcore]
May 18 11:00:34 fmn kernel: usb_runtime_suspend+0x2b/0x70 [usbcore]
May 18 11:00:34 fmn kernel: ? usb_autoresume_device+0x50/0x50 [usbcore]
May 18 11:00:34 fmn kernel: __rpm_callback+0x81/0x140
May 18 11:00:34 fmn kernel: rpm_callback+0x4f/0x70
May 18 11:00:34 fmn kernel: ? usb_autoresume_device+0x50/0x50 [usbcore]
May 18 11:00:34 fmn kernel: rpm_suspend+0x147/0x6f0
May 18 11:00:34 fmn kernel: ? __switch_to+0x26f/0x450
May 18 11:00:34 fmn kernel: pm_runtime_work+0x8e/0x90
May 18 11:00:34 fmn kernel: process_one_work+0x1df/0x370
May 18 11:00:34 fmn kernel: worker_thread+0x50/0x400
May 18 11:00:34 fmn kernel: ? process_one_work+0x370/0x370
May 18 11:00:34 fmn kernel: kthread+0x11b/0x140
May 18 11:00:34 fmn kernel: ? __kthread_bind_mask+0x60/0x60
May 18 11:00:34 fmn kernel: ret_from_fork+0x22/0x30

-- 
You are receiving this mail because:
You are on the CC list for the bug.
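For readers who want to pull the blocked tasks out of a full SysRq-T dump like the one attached to this report, the following is a minimal Python sketch. It is not part of the original report. The function name `blocked_tasks` and the two regular expressions are assumptions based only on the log lines quoted above; other kernel versions may print the task header differently.

```python
import re

# Per-task header as quoted in the report, e.g.
# "task:teams state:D stack: 0 pid: 316 ppid: 3234 flags:0x00000000"
TASK_RE = re.compile(
    r"task:(?P<name>.+?)\s+state:(?P<state>\S)\s.*?pid:\s*(?P<pid>\d+)"
)
# One stack frame, e.g. "rpm_resume+0x19c/0x7b0"; a leading "? " marks an
# unreliable frame found on the stack.
FRAME_RE = re.compile(
    r"(?:^|\s)(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def blocked_tasks(log_text):
    """Return tasks in D state with their reliable stack frames, in log order."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = None
            if header.group("state") == "D":
                current = {
                    "name": header.group("name"),
                    "pid": int(header.group("pid")),
                    "frames": [],
                }
                tasks.append(current)
            continue
        if current is None:
            continue  # frames of a task we are not interested in
        frame = FRAME_RE.search(line)
        if frame and not frame.group("guess"):
            current["frames"].append(frame.group("sym"))
    return tasks
```

Fed the two traces above, this would report `teams` (waiting in `rpm_resume`) and `kworker/3:2` (sleeping in `napi_disable` under `rtl8152_suspend`), which are the two tasks singled out in the report.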
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c1
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c2
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c3
--- Comment #3 from Libor Pechacek
> Is the problem reproducible reliably?

Unfortunately not. The freeze happens spontaneously about twice a week. I'll have a look if I can replicate the issue though. Thanks!
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c4
Libor Pechacek
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c5
--- Comment #5 from Oliver Neukum
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c6
--- Comment #6 from Libor Pechacek
> Just a hunch, try this please

I patched only the driver of the Tumbleweed 5.12.10-1-default kernel so that I didn't have to build the whole package. No success: the network stack froze even with this patch. The only remedy so far is disabling autosuspend.
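The autosuspend workaround mentioned above can be applied per device through the standard USB sysfs attribute `power/control`, where `on` keeps the device awake and `auto` allows runtime suspend. The sketch below is only an illustration, not what the reporter actually ran. The function name `disable_autosuspend` and its `sysfs_root` parameter are assumptions made for this example; the vendor and product IDs in the usage note come from the original report.

```python
from pathlib import Path


def disable_autosuspend(vendor, product, sysfs_root="/sys/bus/usb/devices"):
    """Write "on" to power/control for every USB device matching vendor:product.

    Returns the sysfs names of the devices that were changed.
    Needs root privileges when run against the real sysfs tree.
    """
    changed = []
    for dev in sorted(Path(sysfs_root).iterdir()):
        vid_file = dev / "idVendor"
        pid_file = dev / "idProduct"
        if not (vid_file.is_file() and pid_file.is_file()):
            continue  # interface nodes such as "1-2:1.0" carry no IDs
        if (vid_file.read_text().strip() == vendor
                and pid_file.read_text().strip() == product):
            # "on" forbids runtime suspend; "auto" would allow it again
            (dev / "power" / "control").write_text("on\n")
            changed.append(dev.name)
    return changed
```

For the adapter in this report the call would be `disable_autosuspend("17ef", "7205")`. The setting does not survive a reboot, so it would have to be reapplied, for example from a boot script.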
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c7
--- Comment #7 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c8
--- Comment #8 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c9
--- Comment #9 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c10
Libor Pechacek
> Could you try to apply both Oliver's and my patches?

Yes, I'm testing. BTW, the frequency of deadlocks with the new docking station (or rather a port replicator - https://i-tec.pro/en/produkt/c31dualdockpd-2/) is high enough to call the scenario a reliable reproducer.
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c11
--- Comment #11 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c12
Libor Pechacek
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c13
--- Comment #13 from Libor Pechacek
> I'm going to test Takashi's kernel from https://download.opensuse.org/repositories/home:/tiwai:/bsc1186194/standard/x86_64/ next.

This one (5.12.12-1.g2fdf821-default) has been running smoothly in the past few days. From the user's point of view, I would call the issue solved. Thanks!
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c14
Libor Pechacek
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c15
--- Comment #15 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c16
--- Comment #16 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c17
--- Comment #17 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c25
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c27
--- Comment #27 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c28
--- Comment #28 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c29
--- Comment #29 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c30
--- Comment #30 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c31
--- Comment #31 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c32
--- Comment #32 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c33
--- Comment #33 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c34
--- Comment #34 from Swamp Workflow Management
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c35
--- Comment #35 from Libor Pechacek
> Libor, just to be sure, could you check the current status with the latest kernel in OBS Kernel:stable? It contains my two patches, but this might still hit the deadlock around napi_disable(). I'd like to verify before looking further.

I confirm that I saw the deadlock with recent TW kernel packages. It's very infrequent in comparison to the previous state, though. Thanks for the fixes!

I've tried to locate kernel logs from one of the recent freezes but I cannot find them at the moment. I'll monitor the situation and post updates here if I hit the bug again. Thanks!
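One way to catch a recurrence while the logs are still available is to poll `/proc` for processes in uninterruptible sleep. The following Python sketch is an assumption about how such monitoring could be done, not something described in this thread. The helper names `parse_stat` and `d_state_processes` are hypothetical. It only looks at each process's main thread, and a task briefly in D state during normal I/O will also show up, so a hit is a hint rather than proof of the deadlock.

```python
from pathlib import Path


def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes currently in D state."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Running `d_state_processes()` periodically and saving the kernel log when the same PIDs stay listed for a long time would preserve the traces needed for a follow-up comment.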
https://bugzilla.suse.com/show_bug.cgi?id=1186194#c36
Libor Pechacek