[kernel-bugs] [Bug 1177018] New: Rare kernel soft lockup causes PC to freeze
http://bugzilla.opensuse.org/show_bug.cgi?id=1177018

            Bug ID: 1177018
           Summary: Rare kernel soft lockup causes PC to freeze
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: pujos.michael@gmail.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 841985
  --> http://bugzilla.opensuse.org/attachment.cgi?id=841985&action=edit
output of journalctl

Using Kernel 5.4.10 and current TW as of the date of this report.

Today I was working normally in Xorg and suddenly my laptop froze: no keyboard input, the mouse cursor still moved but clicks had no effect, and ssh'ing in from another PC was impossible. Interestingly, I had audio playing and it continued to play normally. The laptop's fans spun up to full speed, indicating high CPU usage.

I rebooted the laptop with ALT+SysRq+b and looked at the journal, which contains a lot of "watchdog: BUG: soft lockup - CPU#1 stuck for 22s!" entries with stack traces that seem to refer to USB. I have attached the full journal log; look at the end for the "BUG: soft lockup" entries.

At that point, the laptop had an uptime of about 3 days with a few suspends in between.

As far as USB is concerned, I have a Thunderbolt 3 dock connected and audio was playing through it (via USB audio) when it happened. I also had an Android device connected directly to the laptop. There is also a Logitech Unifying receiver connected to the laptop and a keyboard connected to the dock. I can include the output of hwinfo if necessary.

This freeze is rare in the grand scheme of things, but it also happened once 2 weeks ago with a previous 5.4.x kernel. At first I blamed the NVIDIA driver and did not investigate further, but now I'm not so sure, given that the stack trace refers to USB.

-- 
You are receiving this mail because:
You are the assignee for the bug.
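For anyone going through a long journal like the attached one, the "BUG: soft lockup" entries can be tallied per CPU and task to see whether one worker is always involved. Below is a minimal Python sketch, assuming the journal has been saved as plain text and that the watchdog lines follow the format quoted in this report. The function name `summarize_lockups` and the sample text are hypothetical, made up for illustration only.

```python
import re
from collections import Counter

# Matches watchdog lines of the form quoted in this report, e.g.
# "watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:2:18895]"
# The bracket holds "<task name>:<pid>"; the task name may itself contain ':'.
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]"
)


def summarize_lockups(journal_text):
    """Count soft-lockup reports per (cpu, task name) pair."""
    counts = Counter()
    for match in LOCKUP_RE.finditer(journal_text):
        cpu, _seconds, task, _pid = match.groups()
        counts[(int(cpu), task)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample only, not taken from the attachment.
    sample = (
        "kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:2:18895]\n"
        "kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:2:18895]\n"
        "kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd:1]\n"
    )
    for (cpu, task), n in sorted(summarize_lockups(sample).items()):
        print(f"CPU#{cpu} {task}: {n} report(s)")
```

The saved journal text could be fed to the function in the same way, for instance after exporting it with `journalctl -k > journal.txt`. A count dominated by a single task hints at where to look first, but it does not by itself identify the root cause.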
http://bugzilla.opensuse.org/show_bug.cgi?id=1177018#c1

--- Comment #1 from Michael Pujos <pujos.michael@gmail.com> ---
More on this. It happened again today, though not as a complete lockup this time. I was working as usual when adb (the command-line tool for communicating with an Android device over USB) suddenly stopped responding; at the same time, audio playing via USB through my Thunderbolt dock cut out intermittently for several seconds. The adb process could not be killed with 'kill -9'. 'top' showed the culprit to be the "kworker/7:2+usb_hub_wq" process, taking 100% CPU continuously, with the traces below appearing regularly in the journal.

At that stage the machine was really unstable (Ethernet networking through the TB3 dock gone, temporary lockups), and the only way out was to force a poweroff with the power button (as /sbin/poweroff remained stuck).

So on my system, USB goes berserk at some point...

Sep 28 17:28:24 p72 kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:2:18895]
Sep 28 17:28:24 p72 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq st sr_mod cdrom lp parport_pc ppdev parport rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat md4 iptable_mangle>
Sep 28 17:28:24 p72 kernel: mei_wdt iTCO_vendor_support intel_rapl_msr fuse fat mac80211 snd_hda_codec_generic snd_soc_core kvm snd_compress snd_usb_audio snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg libarc4 irqbypass efi_pstore snd_hda_codec btusb >
Sep 28 17:28:24 p72 kernel: xhci_pci_renesas fb_sys_fops cec xhci_hcd rc_core aesni_intel drm glue_helper usbcore crypto_simd cryptd nvme nvme_core rtsx_pci serio_raw wmi battery pinctrl_cannonlake video pinctrl_intel button btrfs blake2b_generic libcrc>
Sep 28 17:28:24 p72 kernel: CPU: 7 PID: 18895 Comm: kworker/7:2 Kdump: loaded Tainted: P U W OEL 5.8.10-1-default #1 openSUSE Tumbleweed
Sep 28 17:28:24 p72 kernel: Hardware name: LENOVO 20MBCTO1WW/20MBCTO1WW, BIOS N2CET50W (1.33 ) 01/15/2020
Sep 28 17:28:24 p72 kernel: Workqueue: usb_hub_wq hub_event [usbcore]
Sep 28 17:28:24 p72 kernel: RIP: 0010:try_to_grab_pending+0xa0/0x170
Sep 28 17:28:24 p72 kernel: Code: e7 e8 c4 b5 94 00 48 8b 03 a8 04 74 0d 48 25 00 ff ff ff 74 05 4c 39 20 74 64 4c 89 e7 c6 07 00 0f 1f 40 00 48 8b 7d 00 57 9d <0f> 1f 44 00 00 48 8b 13 b8 fe ff ff ff 83 e2 14 48 83 fa 10 74 85
Sep 28 17:28:24 p72 kernel: RSP: 0018:ffffb1ebc642fac0 EFLAGS: 00000286
Sep 28 17:28:24 p72 kernel: RAX: 00000000000001c1 RBX: ffff95d78718f790 RCX: 0000000000000000
Sep 28 17:28:24 p72 kernel: RDX: 0000000000000001 RSI: ffff95d787802518 RDI: 0000000000000286
Sep 28 17:28:24 p72 kernel: RBP: ffffb1ebc642fae8 R08: ffff95db1d3ee000 R09: ffffffff82e5c6d8
Sep 28 17:28:24 p72 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff95db1d3ee000
Sep 28 17:28:24 p72 kernel: R13: ffff95da7f248000 R14: ffff95d78718f020 R15: ffff95d78718f440
Sep 28 17:28:24 p72 kernel: FS: 0000000000000000(0000) GS:ffff95db1d3c0000(0000) knlGS:0000000000000000
Sep 28 17:28:24 p72 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 28 17:28:24 p72 kernel: CR2: 00007f2eec84f300 CR3: 000000019d60a005 CR4: 00000000003606e0
Sep 28 17:28:24 p72 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 28 17:28:24 p72 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 28 17:28:24 p72 kernel: Call Trace:
Sep 28 17:28:24 p72 kernel: __cancel_work_timer+0x3c/0x190
Sep 28 17:28:24 p72 kernel: ? _cond_resched+0x16/0x40
Sep 28 17:28:24 p72 kernel: ? usb_kill_urb.part.0+0x30/0xa0 [usbcore]
Sep 28 17:28:24 p72 kernel: acm_disconnect+0x13f/0x280 [cdc_acm]
Sep 28 17:28:24 p72 kernel: usb_unbind_interface+0x8a/0x270 [usbcore]
Sep 28 17:28:24 p72 kernel: __device_release_driver+0x15c/0x210
Sep 28 17:28:24 p72 kernel: device_release_driver+0x24/0x30
Sep 28 17:28:24 p72 kernel: bus_remove_device+0xdb/0x140
Sep 28 17:28:24 p72 kernel: device_del+0x16f/0x2d0
Sep 28 17:28:24 p72 kernel: ? kobject_cleanup+0x4f/0x140
Sep 28 17:28:24 p72 kernel: usb_disable_device+0xc6/0x1f0 [usbcore]
Sep 28 17:28:24 p72 kernel: usb_disconnect.cold+0x7e/0x20a [usbcore]
Sep 28 17:28:24 p72 kernel: hub_port_connect+0x8a/0x820 [usbcore]
Sep 28 17:28:24 p72 kernel: hub_port_connect_change+0xae/0x350 [usbcore]
Sep 28 17:28:24 p72 kernel: port_event+0x321/0x500 [usbcore]
Sep 28 17:28:24 p72 kernel: hub_event+0x1db/0x440 [usbcore]
Sep 28 17:28:24 p72 kernel: process_one_work+0x1e3/0x3b0
Sep 28 17:28:24 p72 kernel: worker_thread+0x46/0x340
Sep 28 17:28:24 p72 kernel: ? process_one_work+0x3b0/0x3b0
Sep 28 17:28:24 p72 kernel: kthread+0x11b/0x140
Sep 28 17:28:24 p72 kernel: ? __kthread_bind_mask+0x60/0x60
Sep 28 17:28:24 p72 kernel: ret_from_fork+0x1f/0x30
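To see at a glance which modules a trace like the one in comment #1 passes through, the frames after "Call Trace:" can be parsed mechanically. The following is a rough Python sketch, assuming frames have the `symbol+0xoff/0xsize [module]` shape seen in this report and that a leading "? " marks a frame the unwinder could not verify. The helper names `parse_call_trace` and `first_module_frame` are hypothetical, and the sample is an excerpt of the trace quoted in this thread.

```python
import re

# One stack frame: optional "? " marker, symbol name, "+0x<offset>/0x<size>",
# and an optional " [module]" suffix for code that lives in a loadable module.
FRAME_RE = re.compile(
    r"(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_call_trace(report):
    """Return (symbol, module_or_None, reliable) tuples for frames after 'Call Trace:'."""
    _, _, trace = report.partition("Call Trace:")
    frames = []
    for match in FRAME_RE.finditer(trace):
        unverified, symbol, module = match.groups()
        frames.append((symbol, module, unverified is None))
    return frames


def first_module_frame(frames):
    """Return the innermost verified frame that belongs to a loadable module, or None."""
    for symbol, module, reliable in frames:
        if reliable and module is not None:
            return symbol, module
    return None


# Excerpt of the trace from comment #1, timestamps stripped for brevity.
SAMPLE = """\
Call Trace:
 __cancel_work_timer+0x3c/0x190
 ? _cond_resched+0x16/0x40
 ? usb_kill_urb.part.0+0x30/0xa0 [usbcore]
 acm_disconnect+0x13f/0x280 [cdc_acm]
 usb_unbind_interface+0x8a/0x270 [usbcore]
"""

if __name__ == "__main__":
    print(first_module_frame(parse_call_trace(SAMPLE)))
```

On this excerpt the sketch picks out `acm_disconnect` in `cdc_acm`, the driver being unbound while the hub worker handles a disconnect. That only shows where the CPU was spinning when the watchdog fired; it is a starting point for investigation, not a diagnosis.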
http://bugzilla.opensuse.org/show_bug.cgi?id=1177018#c3

--- Comment #3 from Michael Pujos <pujos.michael@gmail.com> ---
Still happening from time to time when unplugging/plugging my Samsung Galaxy S9 while adb is running. I need to hard power off the machine when this happens (the poweroff command remains stuck).

Here with kernel 5.9.1:

Nov 11 12:41:24 p72 kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:4:31218]
Nov 11 12:41:24 p72 kernel: Modules linked in: cdc_acm vhost_net vhost tap vhost_iotlb tun snd_seq_dummy snd_hrtimer snd_seq rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip>
Nov 11 12:41:24 p72 kernel: x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp nls_iso8859_1 snd_intel_dspcfg nls_cp437 kvm_intel snd_hda_codec snd_usb_audio mac80211 vfat fuse kvm fat libarc4 irqbypass btusb snd_usbmidi_lib jo>
Nov 11 12:41:24 p72 kernel: xhci_pci_renesas drm xhci_hcd aesni_intel nvme glue_helper crypto_simd cryptd usbcore nvme_core serio_raw rtsx_pci wmi battery video pinctrl_cannonlake pinctrl_intel button btrfs blake2b_generic libcrc32c crc3>
Nov 11 12:41:24 p72 kernel: CPU: 7 PID: 31218 Comm: kworker/7:4 Tainted: P S U W OEL 5.9.1-1-default #1 openSUSE Tumbleweed
Nov 11 12:41:24 p72 kernel: Hardware name: LENOVO 20MBCTO1WW/20MBCTO1WW, BIOS N2CET54W (1.37 ) 06/20/2020
Nov 11 12:41:24 p72 kernel: Workqueue: usb_hub_wq hub_event [usbcore]
Nov 11 12:41:24 p72 kernel: RIP: 0010:try_to_grab_pending+0xb8/0x170
Nov 11 12:41:24 p72 kernel: Code: 74 64 4c 89 e7 c6 07 00 0f 1f 40 00 48 8b 7d 00 57 9d 0f 1f 44 00 00 48 8b 13 b8 fe ff ff ff 83 e2 14 48 83 fa 10 74 85 f3 90 <48> 83 c4 08 b8 f5 ff ff ff 5b 5d 41 5c c3 48 8d 7f 20 e8 31 92 07
Nov 11 12:41:24 p72 kernel: RSP: 0018:ffffb447053cbac0 EFLAGS: 00000287
Nov 11 12:41:24 p72 kernel: RAX: 00000000fffffffe RBX: ffff8b5d67785790 RCX: 0000000000000000
Nov 11 12:41:24 p72 kernel: RDX: 0000000000000000 RSI: ffff8b5d478029a8 RDI: 0000000000000286
Nov 11 12:41:24 p72 kernel: RBP: ffffb447053cbae8 R08: ffff8b60dd3ee000 R09: ffffffff9c661c98
Nov 11 12:41:24 p72 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b60dd3ee000
Nov 11 12:41:24 p72 kernel: R13: ffff8b60a7210000 R14: ffff8b5d67785020 R15: ffff8b5d67785440
Nov 11 12:41:24 p72 kernel: FS: 0000000000000000(0000) GS:ffff8b60dd3c0000(0000) knlGS:0000000000000000
Nov 11 12:41:24 p72 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 11 12:41:24 p72 kernel: CR2: 000055cbf00f11fc CR3: 0000000434a0e002 CR4: 00000000003726e0
Nov 11 12:41:24 p72 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 11 12:41:24 p72 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 11 12:41:24 p72 kernel: Call Trace:
Nov 11 12:41:24 p72 kernel: __cancel_work_timer+0x3c/0x190
Nov 11 12:41:24 p72 kernel: ? _cond_resched+0x16/0x40
Nov 11 12:41:24 p72 kernel: ? usb_kill_urb.part.0+0x30/0xa0 [usbcore]
Nov 11 12:41:24 p72 kernel: acm_disconnect+0x13f/0x280 [cdc_acm]
Nov 11 12:41:24 p72 kernel: usb_unbind_interface+0x8a/0x270 [usbcore]
Nov 11 12:41:24 p72 kernel: ? kernfs_find_ns+0x35/0xd0
Nov 11 12:41:24 p72 kernel: __device_release_driver+0x16b/0x220
Nov 11 12:41:24 p72 kernel: device_release_driver+0x24/0x30
Nov 11 12:41:24 p72 kernel: bus_remove_device+0xdb/0x140
Nov 11 12:41:24 p72 kernel: device_del+0x16f/0x3f0
Nov 11 12:41:24 p72 kernel: ? kobject_cleanup+0x4f/0x140
Nov 11 12:41:24 p72 kernel: usb_disable_device+0xc6/0x1f0 [usbcore]
Nov 11 12:41:24 p72 kernel: usb_disconnect.cold+0x7e/0x20a [usbcore]
Nov 11 12:41:24 p72 kernel: hub_port_connect+0x8a/0x820 [usbcore]
Nov 11 12:41:24 p72 kernel: hub_port_connect_change+0xae/0x350 [usbcore]
Nov 11 12:41:24 p72 kernel: port_event+0x321/0x500 [usbcore]
Nov 11 12:41:24 p72 kernel: hub_event+0x1db/0x440 [usbcore]
Nov 11 12:41:24 p72 kernel: process_one_work+0x1e3/0x3b0
Nov 11 12:41:24 p72 kernel: worker_thread+0x46/0x340
Nov 11 12:41:24 p72 kernel: ? process_one_work+0x3b0/0x3b0
Nov 11 12:41:24 p72 kernel: kthread+0x11b/0x140
Nov 11 12:41:24 p72 kernel: ? __kthread_bind_mask+0x60/0x60
Nov 11 12:41:24 p72 kernel: ret_from_fork+0x1f/0x30
http://bugzilla.opensuse.org/show_bug.cgi?id=1177018#c4

Philippe Condé <conde.philippe@skynet.be> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |conde.philippe@skynet.be

--- Comment #4 from Philippe Condé <conde.philippe@skynet.be> ---
Hello,

I'm on Tumbleweed and my system is updated on each snapshot via "zypper dup". I see the same problem from time to time: sometimes it occurs when I try to unlock the system, sometimes when I scroll in Firefox on a page that is still loading.

I found this error:

"Nov 19 08:36:57 hpprol2 systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Nov 19 08:36:57 hpprol2 systemd[1]: systemd-udevd.service: Killing process 618 (systemd-udevd) with signal SIGABRT."

This is followed by this watchdog error, repeated more than 25 times:

"Nov 19 08:37:18 hpprol2 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd:1]
Nov 19 08:37:18 hpprol2 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse pppoe pppox af_packet ppp_generic slhc 8021q garp mrp xt_TCPMSS xt_state nf_nat_tftp n>
Nov 19 08:37:18 hpprol2 kernel: snd_rawmidi snd_seq_device snd_pcm acpi_ipmi snd_timer snd soundcore ipmi_si thermal ipmi_devintf ipmi_msghandler tiny_power_button nfsd auth_r>
Nov 19 08:37:18 hpprol2 kernel: CPU: 1 PID: 1 Comm: systemd Tainted: G S 5.9.1-2-default #1 openSUSE Tumbleweed
Nov 19 08:37:18 hpprol2 kernel: Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013
Nov 19 08:37:18 hpprol2 kernel: RIP: e030:smp_call_function_many_cond+0x299/0x2e0
Nov 19 08:37:18 hpprol2 kernel: Code: 89 fe e8 ba 67 43 00 3b 05 88 c1 83 01 89 c7 0f 83 f9 fd ff ff 48 63 c7 49 8b 16 48 03 14 c5 00 c9 3c 82 8b 42 08 a8 01 74 09 <f3> 90 8b 4>
Nov 19 08:37:18 hpprol2 kernel: RSP: e02b:ffffc9004002fb48 EFLAGS: 00000202
Nov 19 08:37:18 hpprol2 kernel: RAX: 0000000000000011 RBX: ffff88838126f5c8 RCX: 0000000000000009
Nov 19 08:37:18 hpprol2 kernel: RDX: ffff8883814753a0 RSI: 0000000000000000 RDI: 0000000000000009
Nov 19 08:37:18 hpprol2 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000009
Nov 19 08:37:18 hpprol2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Nov 19 08:37:18 hpprol2 kernel: R13: 0000000000000200 R14: ffff88838126f580 R15: ffff88838126f588
Nov 19 08:37:18 hpprol2 kernel: FS: 00007f02caf64940(0000) GS:ffff888381240000(0000) knlGS:0000000000000000
Nov 19 08:37:18 hpprol2 kernel: CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 19 08:37:18 hpprol2 kernel: CR2: 000055a28e658098 CR3: 00000003773ae000 CR4: 0000000000040660
Nov 19 08:37:18 hpprol2 kernel: Call Trace:
Nov 19 08:37:18 hpprol2 kernel: ? __flush_tlb_all+0x30/0x30
Nov 19 08:37:18 hpprol2 kernel: ? __flush_tlb_all+0x30/0x30
Nov 19 08:37:18 hpprol2 kernel: on_each_cpu+0x2b/0x60
Nov 19 08:37:18 hpprol2 kernel: __purge_vmap_area_lazy+0x5d/0x670
Nov 19 08:37:18 hpprol2 kernel: ? do_jit+0xbe6/0x1ca0
Nov 19 08:37:18 hpprol2 kernel: _vm_unmap_aliases.part.0+0x104/0x140
Nov 19 08:37:18 hpprol2 kernel: change_page_attr_set_clr+0xb9/0x1c0
Nov 19 08:37:18 hpprol2 kernel: set_memory_ro+0x26/0x30
Nov 19 08:37:18 hpprol2 kernel: bpf_int_jit_compile+0x329/0x38f
Nov 19 08:37:18 hpprol2 kernel: bpf_prog_select_runtime+0x101/0x1a0
Nov 19 08:37:18 hpprol2 kernel: bpf_prog_load+0x47b/0x8b0
Nov 19 08:37:18 hpprol2 kernel: ? _cond_resched+0x16/0x40
Nov 19 08:37:18 hpprol2 kernel: ? slab_pre_alloc_hook.constprop.0+0xd0/0x110
Nov 19 08:37:18 hpprol2 kernel: ? _kstrtoull+0x35/0xd0
Nov 19 08:37:18 hpprol2 kernel: __do_sys_bpf+0x405/0x750
Nov 19 08:37:18 hpprol2 kernel: do_syscall_64+0x33/0x80
Nov 19 08:37:18 hpprol2 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 19 08:37:18 hpprol2 kernel: RIP: 0033:0x7f02cba6357d
Nov 19 08:37:18 hpprol2 kernel: Code: d1 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f>
Nov 19 08:37:18 hpprol2 kernel: RSP: 002b:00007fffdccf74a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
Nov 19 08:37:18 hpprol2 kernel: RAX: ffffffffffffffda RBX: 000055a28e648990 RCX: 00007f02cba6357d
Nov 19 08:37:18 hpprol2 kernel: RDX: 0000000000000070 RSI: 00007fffdccf74b0 RDI: 0000000000000005
Nov 19 08:37:18 hpprol2 kernel: RBP: 0000000000000000 R08: 000055a28e3b6010 R09: 0000000800000008
Nov 19 08:37:18 hpprol2 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055a28e5d5490
Nov 19 08:37:18 hpprol2 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 000055a28e641f30
Nov 19 08:37:46 hpprol2 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd:1]
...."

The system is then locked (Num Lock doesn't respond and switching to a VT is not possible), so I need to do a hard restart.

Regards
Philippe