[Bug 1210793] New: Processes stuck waiting jbd2_fc_begin_commit prevent zombies from being killed
https://bugzilla.suse.com/show_bug.cgi?id=1210793

Bug ID: 1210793
Summary: Processes stuck waiting jbd2_fc_begin_commit prevent zombies from being killed
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.4
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: moio@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 866543
--> https://bugzilla.suse.com/attachment.cgi?id=866543&action=edit
dmesg output

Context: a stress test of Rancher running in k3d, using the latest Docker on an up-to-date openSUSE Leap 15.4, causes the k3s process to die (expected), but its children are not reaped (not expected). The main system partition is ext4 on devicemapper.

According to Aleksa, who has already taken a first look:

"the issue causing the k3s zombie to not be reaped appears to actually be that many processes on the machine (including still-alive threads of the zombie group leader of k3s that isn't being reaped) are stuck waiting in jbd2_fc_begin_commit+0xef/0x120 (waiting for journal->j_fc_wait ) and kjournald2 also appears to be stuck in jbd2_journal_commit_transaction+0x16b/0x1a90 (which is also waiting for journal->j_fc_wait )"

`echo w >/proc/sysrq-trigger` was executed after a fresh reboot, immediately after reproducing the issue. The full dmesg output is attached.

A kernel dump was also collected via `echo c >/proc/sysrq-trigger` immediately afterwards, and the result is attached. (Switching to the rescue target and proceeding from there, as recommended in https://documentation.suse.com/sles/15-SP4/single-html/SLES-tuning/#cha-tuni..., was attempted but did not work: starting kdump would hang indefinitely.)

The dump is available at: https://mysuse-my.sharepoint.com/:f:/g/personal/moio_suse_com/EpT94_BHipJAsK...

For reference, the main zombie process in the files above has PID 7276:

  root 7276 19.9 0.0 0 0 ? Zl 18:45 1:36 [k3s] <defunct>

I remain available for any other information, and I am happy to offer a tmate session if that helps.
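For anyone trying to check for the same symptom on another machine without triggering a full sysrq dump, the following is a minimal sketch (not part of the original report). It walks /proc and lists threads in uninterruptible sleep (state D) together with their wait channel. The helper names `parse_stat` and `blocked_tasks` are hypothetical. It assumes /proc/<pid>/task/<tid>/wchan is readable, and the symbol reported there may differ from the frames quoted above or show as 0, depending on kernel configuration and privileges.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, tid, comm, wchan) for every thread currently in state D."""
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            try:
                with open(os.path.join(base, "stat")) as f:
                    _, comm, state = parse_stat(f.read())
                if state != "D":
                    continue
                with open(os.path.join(base, "wchan")) as f:
                    wchan = f.read().strip()
            except OSError:
                continue  # thread exited or file not readable
            yield int(pid), int(tid), comm, wchan


if __name__ == "__main__":
    for pid, tid, comm, wchan in blocked_tasks():
        print(f"{pid}\t{tid}\t{comm}\t{wchan}")
```

Run as root, many rows sharing the same wait channel would be consistent with the pile-up described above, but the sysrq `w` output and the kernel dump remain the authoritative data for this bug.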
Silvio Moioli