Bug ID | 1203630 |
---|---|
Summary | Multiple occurrences of "BUG: workqueue leaked lock or atomic" causing OOM conditions after kernel update 5.14.21-150400.24.18-default -> 5.14.21-150400.24.21-default |
Classification | openSUSE |
Product | openSUSE Distribution |
Version | Leap 15.4 |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Major |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-bugs@opensuse.org |
Reporter | okurz@suse.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |

## Observation

See the detailed report in https://progress.opensuse.org/issues/116722

We observed multiple occurrences of "BUG: workqueue leaked lock or atomic" causing OOM conditions after the kernel update 5.14.21-150400.24.18-default -> 5.14.21-150400.24.21-default. Detailed log contents:

```
1202771 Sep 18 06:41:21 openqa kernel: BUG: workqueue leaked lock or atomic: kworker/u21:2/0x00000001/19034
1202772 last function: xs_error_handle [sunrpc]
1202773 Sep 18 06:41:21 openqa kernel: CPU: 2 PID: 19034 Comm: kworker/u21:2 Not tainted 5.14.21-150400.24.21-default #1 SLE15-SP4 7550826c4c7e8c258239e300508e0c8b2a69bad2
1202774 Sep 18 06:41:21 openqa kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
1202775 Sep 18 06:41:21 openqa kernel: Workqueue: xprtiod xs_error_handle [sunrpc]
1202776 Sep 18 06:41:21 openqa kernel: Call Trace:
1202777 Sep 18 06:41:21 openqa kernel: <TASK>
1202778 Sep 18 06:41:21 openqa kernel: dump_stack_lvl+0x45/0x5b
1202779 Sep 18 06:41:21 openqa kernel: process_one_work+0x390/0x440
1202780 Sep 18 06:41:21 openqa kernel: worker_thread+0x2d/0x3d0
1202781 Sep 18 06:41:21 openqa kernel: ? process_one_work+0x440/0x440
1202782 Sep 18 06:41:21 openqa kernel: kthread+0x156/0x180
1202783 Sep 18 06:41:21 openqa kernel: ? set_kthread_struct+0x50/0x50
1202784 Sep 18 06:41:21 openqa kernel: ret_from_fork+0x22/0x30
1202785 Sep 18 06:41:21 openqa kernel: </TASK>
1202786 Sep 18 06:41:21 openqa kernel: BUG: scheduling while atomic: kworker/u21:2/19034/0x00000002
1202787 Sep 18 06:41:21 openqa kernel: Modules linked in: dm_mod iscsi_ibft iscsi_boot_sysfs rfkill loop xfs libcrc32c kvm_amd ccp kvm virtio_net net_failover virtio_balloon failover irqbypass joydev i2c_piix4 pcspkr button nfsd auth_rpcgss nfs_acl lockd grace fuse sunrpc configfs ip_tables x_tables ext4 crc16 mbcache jbd2 drm_kms_helper ata_generic syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops cec ata_piix rc_core ahci uhci_hcd libahci ehci_hcd drm libata usbcore virtio_blk floppy qemu_fw_cfg sg scsi_mod
1202788 Sep 18 06:41:21 openqa kernel: Supported: Yes
1202789 Sep 18 06:41:21 openqa kernel: CPU: 2 PID: 19034 Comm: kworker/u21:2 Not tainted 5.14.21-150400.24.21-default #1 SLE15-SP4 7550826c4c7e8c258239e300508e0c8b2a69bad2
1202790 Sep 18 06:41:21 openqa kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
1202791 Sep 18 06:41:21 openqa kernel: Workqueue: 0x0 (xprtiod)
1202792 Sep 18 06:41:21 openqa kernel: Call Trace:
1202793 Sep 18 06:41:21 openqa kernel: <TASK>
1202794 Sep 18 06:41:21 openqa kernel: dump_stack_lvl+0x45/0x5b
1202795 Sep 18 06:41:21 openqa kernel: __schedule_bug+0x52/0x70
1202796 Sep 18 06:41:21 openqa kernel: __schedule+0xdc4/0x1140
1202797 Sep 18 06:41:21 openqa kernel: ? arch_local_irq_enable+0x7/0xc
1202798 Sep 18 06:41:21 openqa kernel: schedule+0x64/0xe0
1202799 Sep 18 06:41:21 openqa kernel: worker_thread+0xab/0x3d0
1202800 Sep 18 06:41:21 openqa kernel: ? process_one_work+0x440/0x440
1202801 Sep 18 06:41:21 openqa kernel: kthread+0x156/0x180
1202802 Sep 18 06:41:21 openqa kernel: ? set_kthread_struct+0x50/0x50
1202803 Sep 18 06:41:21 openqa kernel: ret_from_fork+0x22/0x30
1202804 Sep 18 06:41:21 openqa kernel: </TASK>
1202805 Sep 18 06:41:46 openqa kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/2:2:24024]
```

After that, the worker stays stuck for a longer time until processes such as postgres run out of memory, eventually crashing the whole machine.

I looked through 10 days of logs from before the upgrade 5.14.21-150400.24.18-default -> 5.14.21-150400.24.21-default and found no mention of the above condition.
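
For anyone who wants to repeat the check over older logs, here is a minimal sketch of how it could be scripted. It is not the method used for this report. It assumes the journal has been exported to plain text, one message per line; the names `SIGNATURES` and `count_signatures` are hypothetical and chosen only for illustration. The three signature strings are copied from the log excerpt in this report.

```python
from collections import Counter

# Message fragments copied from the log excerpt in this report.
SIGNATURES = (
    "BUG: workqueue leaked lock or atomic",
    "BUG: scheduling while atomic",
    "watchdog: BUG: soft lockup",
)


def count_signatures(lines):
    """Return a Counter mapping each signature to the number of lines containing it.

    `lines` is any iterable of strings, for example an open text file
    holding an exported journal (hypothetical input, one message per line).
    """
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

Calling `count_signatures(open(path))` on an export covering the days before the update and on one covering the days after it gives two sets of counts to compare. A signature that never appears is simply absent from the result and reads as 0.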