Bug ID 1203630
Summary Multiple occurrences of "BUG: workqueue leaked lock or atomic" causing OOM conditions after kernel update 5.14.21-150400.24.18-default -> 5.14.21-150400.24.21-default
Classification openSUSE
Product openSUSE Distribution
Version Leap 15.4
Hardware Other
OS Other
Status NEW
Severity Major
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter okurz@suse.com
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

## Observation
See the detailed report in https://progress.opensuse.org/issues/116722
We observed multiple occurrences of "BUG: workqueue leaked lock or atomic"
causing OOM conditions after the kernel update 5.14.21-150400.24.18-default ->
5.14.21-150400.24.21-default.

Detailed log contents:

```
1202771 Sep 18 06:41:21 openqa kernel: BUG: workqueue leaked lock or atomic:
kworker/u21:2/0x00000001/19034
1202772                                     last function: xs_error_handle
[sunrpc]
1202773 Sep 18 06:41:21 openqa kernel: CPU: 2 PID: 19034 Comm: kworker/u21:2
Not tainted 5.14.21-150400.24.21-default #1 SLE15-SP4
7550826c4c7e8c258239e300508e0c8b2a69bad2
1202774 Sep 18 06:41:21 openqa kernel: Hardware name: QEMU Standard PC (i440FX
+ PIIX, 1996), BIOS Bochs 01/01/2011
1202775 Sep 18 06:41:21 openqa kernel: Workqueue: xprtiod xs_error_handle
[sunrpc]
1202776 Sep 18 06:41:21 openqa kernel: Call Trace:
1202777 Sep 18 06:41:21 openqa kernel:  <TASK>
1202778 Sep 18 06:41:21 openqa kernel:  dump_stack_lvl+0x45/0x5b
1202779 Sep 18 06:41:21 openqa kernel:  process_one_work+0x390/0x440
1202780 Sep 18 06:41:21 openqa kernel:  worker_thread+0x2d/0x3d0
1202781 Sep 18 06:41:21 openqa kernel:  ? process_one_work+0x440/0x440
1202782 Sep 18 06:41:21 openqa kernel:  kthread+0x156/0x180
1202783 Sep 18 06:41:21 openqa kernel:  ? set_kthread_struct+0x50/0x50
1202784 Sep 18 06:41:21 openqa kernel:  ret_from_fork+0x22/0x30
1202785 Sep 18 06:41:21 openqa kernel:  </TASK>
1202786 Sep 18 06:41:21 openqa kernel: BUG: scheduling while atomic:
kworker/u21:2/19034/0x00000002
1202787 Sep 18 06:41:21 openqa kernel: Modules linked in: dm_mod iscsi_ibft
iscsi_boot_sysfs rfkill loop xfs libcrc32c kvm_amd ccp kvm virtio_net
net_failover virtio_balloon failover irqbypass joydev i2c_piix4 pcspkr button
nfsd auth_rpcgss nfs_acl lockd grace fuse sunrpc configfs ip_tables
x_tables ext4 crc16 mbcache jbd2 drm_kms_helper ata_generic syscopyarea
sysfillrect sysimgblt serio_raw fb_sys_fops cec ata_piix rc_core ahci uhci_hcd
libahci ehci_hcd drm libata usbcore virtio_blk floppy qemu_fw_cfg sg scsi_mod
1202788 Sep 18 06:41:21 openqa kernel: Supported: Yes
1202789 Sep 18 06:41:21 openqa kernel: CPU: 2 PID: 19034 Comm: kworker/u21:2
Not tainted 5.14.21-150400.24.21-default #1 SLE15-SP4
7550826c4c7e8c258239e300508e0c8b2a69bad2
1202790 Sep 18 06:41:21 openqa kernel: Hardware name: QEMU Standard PC (i440FX
+ PIIX, 1996), BIOS Bochs 01/01/2011
1202791 Sep 18 06:41:21 openqa kernel: Workqueue:  0x0 (xprtiod)
1202792 Sep 18 06:41:21 openqa kernel: Call Trace:
1202793 Sep 18 06:41:21 openqa kernel:  <TASK>
1202794 Sep 18 06:41:21 openqa kernel:  dump_stack_lvl+0x45/0x5b
1202795 Sep 18 06:41:21 openqa kernel:  __schedule_bug+0x52/0x70
1202796 Sep 18 06:41:21 openqa kernel:  __schedule+0xdc4/0x1140
1202797 Sep 18 06:41:21 openqa kernel:  ? arch_local_irq_enable+0x7/0xc
1202798 Sep 18 06:41:21 openqa kernel:  schedule+0x64/0xe0
1202799 Sep 18 06:41:21 openqa kernel:  worker_thread+0xab/0x3d0
1202800 Sep 18 06:41:21 openqa kernel:  ? process_one_work+0x440/0x440
1202801 Sep 18 06:41:21 openqa kernel:  kthread+0x156/0x180
1202802 Sep 18 06:41:21 openqa kernel:  ? set_kthread_struct+0x50/0x50
1202803 Sep 18 06:41:21 openqa kernel:  ret_from_fork+0x22/0x30
1202804 Sep 18 06:41:21 openqa kernel:  </TASK>
1202805 Sep 18 06:41:46 openqa kernel: watchdog: BUG: soft lockup - CPU#2 stuck
for 26s! [kworker/2:2:24024]
```

After that, the process stays stuck for an extended period until processes such
as postgres run into out-of-memory conditions, eventually crashing the whole
machine.

I looked into 10 days of logs before the upgrade 5.14.21-150400.24.18-default
-> 5.14.21-150400.24.21-default and found no mentions of the above condition.
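A log check of this kind can be repeated with a small script. The following is only a minimal sketch, not the method actually used for this report: it assumes syslog-style lines with a "Mon DD HH:MM:SS" timestamp as in the excerpt above, and the names `PATTERN`, `DATE_RE` and `count_by_day` are made up for illustration.

```python
import re
from collections import Counter

# Message text copied from the log excerpt in this report.
PATTERN = "BUG: workqueue leaked lock or atomic"

# Assumption: each line carries a syslog-style "Mon DD HH:MM:SS" timestamp,
# optionally preceded by other fields such as a line number.
DATE_RE = re.compile(r"\b([A-Z][a-z]{2}) +(\d{1,2}) \d{2}:\d{2}:\d{2}\b")


def count_by_day(lines):
    """Return a Counter mapping 'Mon DD' to the number of lines containing PATTERN."""
    counts = Counter()
    for line in lines:
        if PATTERN not in line:
            continue
        match = DATE_RE.search(line)
        # Lines without a recognizable timestamp are grouped under "unknown".
        day = f"{match.group(1)} {int(match.group(2)):02d}" if match else "unknown"
        counts[day] += 1
    return counts


# Example with two lines taken from the excerpt; only the first one matches.
sample = [
    "1202771 Sep 18 06:41:21 openqa kernel: BUG: workqueue leaked lock or atomic:",
    "1202786 Sep 18 06:41:21 openqa kernel: BUG: scheduling while atomic:",
]
per_day = count_by_day(sample)
```

Passing an open log file instead of `sample` gives a per-day count, so an empty result for the days before the kernel update would match the observation above.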

