Bug ID 1087635
Summary r8152 livelocks during pm_runtime_suspend
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-maintainers@forge.provo.novell.com
Reporter jslaby@suse.com
QA Contact qa-bugs@suse.de
CC tbogendoerfer@suse.com
Found By ---
Blocker ---

Created attachment 765580 [details]
full dmesg after sysrq-t

I reported it to upstream too:
https://marc.info/?l=linux-pm&m=152241584411483&w=2

The kernel is 4.15.14-15.gdef7e44-default.

I have seen the r8152 NIC in my docking station kill my box several
times over the last few days. The notebook is new, so I don't know
whether this is a regression.

The NIC is connected all the time, and when I return to the notebook
after a while, networking is dead. Looking at the stack traces, it is
clear that r8152 was being autosuspended and is stuck in napi_disable,
waiting for the NAPI_STATE_SCHED bit to be cleared:
[22001.018437] kworker/2:0     D    0 16267      2 0x80000000
[22001.018441] Workqueue: pm pm_runtime_work
[22001.018443] Call Trace:
[22001.018453]  schedule+0x2f/0x90
[22001.018455]  schedule_timeout+0x1ce/0x540
[22001.018474]  msleep+0x29/0x30
[22001.018477]  napi_disable+0x25/0x60
[22001.018483]  rtl8152_suspend+0x20a/0x2d0 [r8152]
[22001.018493]  usb_suspend_both+0x8d/0x200 [usbcore]
[22001.018510]  usb_runtime_suspend+0x2a/0x70 [usbcore]
[22001.018514]  __rpm_callback+0xbc/0x1f0
[22001.018519]  rpm_callback+0x4f/0x70
[22001.018526]  rpm_suspend+0x11d/0x6d0
[22001.018532]  pm_runtime_work+0x73/0xb0
[22001.018535]  process_one_work+0x269/0x6c0
[22001.018541]  worker_thread+0x2b/0x3d0
[22001.018547]  kthread+0x113/0x130
[22001.018556]  ret_from_fork+0x24/0x50

The assembly:
> ffffffff81716730 <napi_disable>:
> ffffffff81716730:       e8 eb b7 2e 00          callq  ffffffff81a01f20 <__fentry__>
> ffffffff81716735:       55                      push   %rbp
> ffffffff81716736:       48 89 fd                mov    %rdi,%rbp
> ffffffff81716739:       53                      push   %rbx
> ffffffff8171673a:       48 8d 5f 10             lea    0x10(%rdi),%rbx
> ffffffff8171673e:       f0 80 4f 10 04          lock orb $0x4,0x10(%rdi)
> ffffffff81716743:       f0 0f ba 6f 10 00       lock btsl $0x0,0x10(%rdi)
> ffffffff81716749:       73 11                   jae    ffffffff8171675c <napi_disable+0x2c>
> ffffffff8171674b:       bf 01 00 00 00          mov    $0x1,%edi
> ffffffff81716750:       e8 ab ac a0 ff          callq  ffffffff81121400 <msleep>
> ffffffff81716755:       f0 0f ba 2b 00          lock btsl $0x0,(%rbx)
> ffffffff8171675a:       72 ef                   jb     ffffffff8171674b <napi_disable+0x1b>
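
So it is spinning in the msleep(1) loop there. For reference,
napi_disable() roughly paraphrased from net/core/dev.c of that era
(trimmed, not verbatim):

void napi_disable(struct napi_struct *n)
{
        might_sleep();
        /* the "lock orb $0x4": NAPI_STATE_DISABLE is bit 2 */
        set_bit(NAPI_STATE_DISABLE, &n->state);

        /* the "lock btsl $0x0" + msleep loop: spins until whoever owns
         * NAPI_STATE_SCHED finishes its poll and clears the bit */
        while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
                msleep(1);
        ...
}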

There are of course other tasks in D state, like these, waiting for the
device to runtime-resume:
[22001.018749] kworker/3:1     D    0 16798      2 0x80000000
[22001.018753] Workqueue: events rtl_work_func_t [r8152]
[22001.018755] Call Trace:
[22001.018767]  schedule+0x2f/0x90
[22001.018769]  rpm_resume+0xf9/0x860
[22001.018777]  rpm_resume+0x592/0x860
[22001.018783]  __pm_runtime_resume+0x3a/0x50
[22001.018789]  usb_autopm_get_interface+0x1d/0x50 [usbcore]
[22001.018793]  rtl_work_func_t+0x3e/0x405 [r8152]
[22001.018801]  process_one_work+0x269/0x6c0
[22001.018807]  worker_thread+0x2b/0x3d0
[22001.018813]  kthread+0x113/0x130
[22001.018822]  ret_from_fork+0x24/0x50
[22001.019713] tcpdump         D    0 17119   4265 0x00000004
[22001.019716] Call Trace:
[22001.019728]  schedule+0x2f/0x90
[22001.019730]  rpm_resume+0xf9/0x860
[22001.019738]  rpm_resume+0x592/0x860
[22001.019744]  __pm_runtime_resume+0x3a/0x50
[22001.019750]  usb_autopm_get_interface+0x1d/0x50 [usbcore]
[22001.019754]  rtl8152_ioctl+0x30/0x140 [r8152]
[22001.019758]  dev_ifsioc+0x115/0x3f0
[22001.019763]  dev_ioctl+0x14b/0x680
[22001.019775]  sock_do_ioctl+0x41/0x50
[22001.019778]  sock_ioctl+0x1c2/0x2f0
[22001.019781]  do_vfs_ioctl+0x91/0x680
[22001.019789]  SyS_ioctl+0x74/0x80
[22001.019794]  do_syscall_64+0x76/0x1c0

...

> Showing all locks held in the system: 
> 1 lock held by in:imklog/1371:
>  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000a0b38807>] __fdget_pos+0x3f/0x50
> 1 lock held by Qt bearer threa/3003:
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000021e0bca0>] __netlink_dump_start+0x4c/0x1b0
> 1 lock held by Qt bearer threa/2825:
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000021e0bca0>] __netlink_dump_start+0x4c/0x1b0
> 1 lock held by DNS Res~ver #40/17041:
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000021e0bca0>] __netlink_dump_start+0x4c/0x1b0
> 1 lock held by Qt bearer threa/3110:
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000021e0bca0>] __netlink_dump_start+0x4c/0x1b0
> 1 lock held by DNS Res~ver #16/17044:
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000021e0bca0>] __netlink_dump_start+0x4c/0x1b0
> 2 locks held by bash/4561:
>  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e3d76e61>] tty_ldisc_ref_wait+0x24/0x50
>  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<0000000091462d05>] n_tty_read+0xc3/0x850
> 3 locks held by kworker/2:0/16267:
>  #0:  ((wq_completion)"pm"){+.+.}, at: [<00000000b9dc0832>] process_one_work+0x1e3/0x6c0
>  #1:  ((work_completion)(&dev->power.work)){+.+.}, at: [<00000000b9dc0832>] process_one_work+0x1e3/0x6c0
>  #2:  (&tp->control){+.+.}, at: [<00000000ca575b90>] rtl8152_suspend+0x2b/0x2d0 [r8152]
> 2 locks held by kworker/3:1/16798:
>  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000b9dc0832>] process_one_work+0x1e3/0x6c0
>  #1:  ((work_completion)(&(&tp->schedule)->work)){+.+.}, at: [<00000000b9dc0832>] process_one_work+0x1e3/0x6c0
> 1 lock held by tcpdump/17119: 
>  #0:  (rtnl_mutex){+.+.}, at: [<0000000023a6461d>] dev_ioctl+0x13d/0x680
> 2 locks held by less/17187: 
>  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e3d76e61>] tty_ldisc_ref_wait+0x24/0x50
>  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<0000000091462d05>] n_tty_read+0xc3/0x850
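
So kworker/2:0 (the pm workqueue) holds &tp->control and spins in
napi_disable, while everything else sits in rpm_resume waiting for the
suspend to finish -- which it never does. That wait is, roughly
paraphrased from drivers/base/power/runtime.c (trimmed, not verbatim):

/* rpm_resume(): wait for a concurrent suspend/resume to finish */
if (dev->power.runtime_status == RPM_RESUMING
    || dev->power.runtime_status == RPM_SUSPENDING) {
        DEFINE_WAIT(wait);

        for (;;) {
                prepare_to_wait(&dev->power.wait_queue, &wait,
                                TASK_UNINTERRUPTIBLE); /* hence D state */
                if (dev->power.runtime_status != RPM_RESUMING
                    && dev->power.runtime_status != RPM_SUSPENDING)
                        break;
                spin_unlock_irq(&dev->power.lock);
                schedule();
                spin_lock_irq(&dev->power.lock);
        }
        finish_wait(&dev->power.wait_queue, &wait);
        goto repeat;
}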


For now, I have disabled runtime PM on the device with:
echo on > /sys/bus/usb/devices/4-1.2/power/control
("on" keeps the device resumed, so the runtime-suspend path is never
entered.)

Any ideas what's wrong? napi_disable called from runtime suspend? A
double napi_disable on that path? A missing pm_runtime_get_sync
somewhere?
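
To spell out the cycle I suspect (hypothetical and simplified, not the
actual r8152 code paths):

/* pm workqueue (kworker/2:0), holding &tp->control: */
rtl8152_suspend()
        -> napi_disable()        /* loops: NAPI_STATE_SCHED stays set */

/* whoever would poll and clear NAPI_STATE_SCHED needs the device
 * resumed first, e.g. rtl_work_func_t: */
usb_autopm_get_interface()
        -> rpm_resume()          /* sleeps: runtime_status stays
                                  * RPM_SUSPENDING, because the suspend
                                  * above never finishes */

/* suspend waits on the poll, the poll waits on resume, resume waits on
 * suspend -> livelock */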
