[Bug 1180344] New: Kernel 5.10.1 fails to boot with workqueue lockup
https://bugzilla.suse.com/show_bug.cgi?id=1180344

Bug ID: 1180344
Summary: Kernel 5.10.1 fails to boot with workqueue lockup
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: adam@valkor.net
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 844693
--> https://bugzilla.suse.com/attachment.cgi?id=844693&action=edit
journal from failed boot

Linux 5.10.1 on my Thinkpad X1 Carbon Gen 7 fails to boot with workqueue lockup messages and a failure to mount /home. I've given it 10 minutes, but it will not complete booting. Previous kernels, including the fallback 5.9.14 do not exhibit this behavior.

Dec 24 18:00:05 think kernel: BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 50s!
Dec 24 18:00:05 think kernel: Showing busy workqueues and worker pools:
Dec 24 18:00:05 think kernel: workqueue events: flags=0x0
Dec 24 18:00:05 think kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=11/256 refcnt=12
Dec 24 18:00:05 think kernel: in-flight: 840:request_firmware_work_func
Dec 24 18:00:05 think kernel: pending: delayed_fput, drm_fb_helper_dirty_work [drm_kms_helper], free_work, kfree_rcu_monitor, kernfs_notify_workfn, rfkill_global_led_trigger_worker [rfkill], smp_call_on_c>
Dec 24 18:00:05 think kernel: pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 24 18:00:05 think kernel: pending: vmstat_shepherd
Dec 24 18:00:05 think kernel: workqueue events_long: flags=0x0
Dec 24 18:00:05 think kernel: pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 24 18:00:05 think kernel: in-flight: 143:ucsi_init_work [typec_ucsi]
Dec 24 18:00:05 think kernel: workqueue events_unbound: flags=0x2
Dec 24 18:00:05 think kernel: pwq 16: cpus=0-7 flags=0x4 nice=0 active=4/512 refcnt=6
Dec 24 18:00:05 think kernel: in-flight: 58:fsnotify_connector_destroy_workfn fsnotify_connector_destroy_workfn, 63:fsnotify_mark_destroy_workfn fsnotify_mark_destroy_workfn
Dec 24 18:00:05 think kernel: workqueue rcu_gp: flags=0x8
Dec 24 18:00:05 think kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 24 18:00:05 think kernel: pending: process_srcu
Dec 24 18:00:05 think kernel: workqueue mm_percpu_wq: flags=0x8
Dec 24 18:00:05 think kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
Dec 24 18:00:05 think kernel: pending: vmstat_update, lru_add_drain_per_cpu BAR(907)
Dec 24 18:00:05 think kernel: workqueue kec_query: flags=0x0
Dec 24 18:00:05 think kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/16 refcnt=2
Dec 24 18:00:05 think kernel: pending: acpi_ec_event_processor
Dec 24 18:00:05 think kernel: workqueue usb_hub_wq: flags=0x4
Dec 24 18:00:05 think kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 24 18:00:05 think kernel: pending: hub_event [usbcore]
Dec 24 18:00:05 think kernel: pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 5 770 7
Dec 24 18:00:05 think kernel: pool 10: cpus=5 node=0 flags=0x0 nice=0 hung=50s workers=4 idle: 764 43 133
Dec 24 18:00:05 think kernel: pool 16: cpus=0-7 flags=0x4 nice=0 hung=0s workers=8 idle: 8 60 59 62 61 64

rpm -qa | grep kernel-default
kernel-default-5.10.1-1.1.x86_64
kernel-default-5.9.14-1.2.x86_64

--
You are receiving this mail because:
You are the assignee for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c1
--- Comment #1 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c2
--- Comment #2 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c3
--- Comment #3 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c4
--- Comment #4 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c5
--- Comment #5 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c12
--- Comment #12 from Adam Stephens
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c15
Bit Juggler
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c21
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c22
--- Comment #22 from Frank Krüger
I haven't had time to take a deeper look (and the bug doesn't hit on my machine), but as a blind shot, I built a kernel with the patch with a revert of the commit in the relevant code path. Can anyone test a kernel package in OBS home:tiwai:bsc1180344? Just to be sure.
Tried your kernel, but failed with "bad shim signature".
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c23
--- Comment #23 from Takashi Iwai
(In reply to Takashi Iwai from comment #21)
I haven't had time to take a deeper look (and the bug doesn't hit on my machine), but as a blind shot, I built a kernel with the patch with a revert of the commit in the relevant code path. Can anyone test a kernel package in OBS home:tiwai:bsc1180344? Just to be sure.
Tried your kernel, but failed with "bad shim signature".
It's an unofficial package, so please test with Secure Boot disabled.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c24
--- Comment #24 from Frank Krüger
I haven't had time to take a deeper look (and the bug doesn't hit on my machine), but as a blind shot, I built a kernel with the patch with a revert of the commit in the relevant code path. Can anyone test a kernel package in OBS home:tiwai:bsc1180344? Just to be sure.
Disabling secure boot and booting kernel-default-5.10.2-1.1.gcb6a1b3.x86_64 from home:tiwai:bsc1180344 results in a black screen.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c25
--- Comment #25 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c26
--- Comment #26 from Frank Krüger
Then you're hitting a problem that is something else than this bug. I guess you get the same result from the kernel in OBS Kernel:stable, too?
Yes, but your kernel with the "iwlwifi dbg revert" boots fine with 'options iwlwifi enable_ini=0'.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c27
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c28
--- Comment #28 from Frank Krüger
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c29
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c30
--- Comment #30 from Frank Krüger
Thanks, that shows that the problem still persists.
I refreshed the same repo with another shot, now a new kernel is being built. Please check it later again. It'll be based on 5.10.3.
5.10.3-1.g0a5cd07-default x86_64 from your repo works fine (without the workaround)!
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c31
--- Comment #31 from Frank Krüger
https://bugzilla.suse.com/show_bug.cgi?id=1180344
Frank Krüger
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c32
--- Comment #32 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c33
--- Comment #33 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c34
--- Comment #34 from Arjen de Korte
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c35
Arjen de Korte
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c36
Klein Kravis
I get the same problem on a Thinkpad T495, but my desktop boots fine with 5.10
Same issue on my ThinkBook 13s IWL (See bug: 1180376) Would blacklisting iwlwifi prevent WiFi from working?
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c37
--- Comment #37 from Arjen de Korte
Same issue on my ThinkBook 13s IWL (See bug: 1180376) Would blacklisting iwlwifi prevent WiFi from working?
You don't need to blacklist iwlwifi (which would indeed prevent WiFi from working). In https://bugzilla.suse.com/show_bug.cgi?id=1180344#c16 a workaround is described which will temporarily fix this problem. It disables the debug logging, which is where the problem seems to be. Unless you're actively involved in the iwlwifi development, it is highly unlikely that you need the debuglog anyway.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c38
--- Comment #38 from Takashi Iwai
Looking at drivers/net/wireless/intel/iwlwifi/iwl-debug.h, IWL_DEBUG_FW accepts a va_arg list, so why not replace the offending code with
IWL_DEBUG_FW(trans, "WRT: parsing region: %.*s\n", IWL_FW_INI_MAX_NAME, reg->name);
This will have the same effect, but without the terrible habit of modifying a buffer/string one doesn't own.
That would work, yes. But then add a comment why this form is used (i.e. it may be a non-terminated string and may be read-only) before the call, otherwise the reader at a later point would overlook the need of the modifier again.
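For readers unfamiliar with the `%.*s` form suggested above, here is a minimal userspace sketch of the idea. It is not the iwlwifi code: `MAX_NAME`, `struct region` and `format_region` are hypothetical stand-ins (the real driver uses `IWL_FW_INI_MAX_NAME` and `IWL_DEBUG_FW`), and the value 32 is an assumption chosen for illustration. The point is only that a precision argument bounds how many bytes are read from the name field, so the buffer need not be NUL-terminated and is never written to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for IWL_FW_INI_MAX_NAME; the value is an assumption. */
#define MAX_NAME 32

/* Hypothetical, simplified record: name is a fixed-size field that may
 * fill all MAX_NAME bytes and therefore may lack a terminating NUL. */
struct region {
    char name[MAX_NAME];
};

/*
 * Format the region name into out. The precision passed for "%.*s" limits
 * the read to at most MAX_NAME bytes (or fewer if a NUL appears earlier),
 * and reg is const, so the possibly read-only source is never modified.
 * Returns the value returned by snprintf.
 */
static int format_region(char *out, size_t out_size, const struct region *reg)
{
    assert(out != NULL && reg != NULL);
    return snprintf(out, out_size, "WRT: parsing region: %.*s\n",
                    MAX_NAME, reg->name);
}
```

Calling `format_region(buf, sizeof(buf), &reg)` with a name field that is completely filled prints exactly `MAX_NAME` characters of the name; with a shorter, NUL-terminated name the output stops at the NUL.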
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c39
Thomas Rother
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c40
--- Comment #40 from Arjen de Korte
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c42
Huy Phung
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c43
--- Comment #43 from Michal Kubeček
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c49
--- Comment #49 from Adam Stephens
JFYI: kernel-default-5.10.3-2.1.g73f6c2f.x86_64 from Kernel:stable boots fine without the above-mentioned workaround. Thx.
I can confirm that I'm able to boot into 5.10.3 without any workarounds.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c50
Thomas Rother
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c51
Michal Kubeček
Also confirmed, kernel 5.10.3 works without workarounds. Marking resolved fixed
That would be premature. First, 5.10.3 certainly does not work without workaround, only our KotD snapshot does, thanks to Takashi's patch. Even current mainline (or 5.11-rc2) does not have the issue addressed, AFAICS. Thus I would prefer to keep the bug open until we have some response from upstream and until a fix (Takashi's, Arjen's or some other) is on its way to mainline.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c52
--- Comment #52 from Axel Braun
(In reply to Thomas Rother from comment #50)
Also confirmed, kernel 5.10.3 works without workarounds. Marking resolved fixed
That would be premature. First, 5.10.3 certainly does not work without workaround, only our KotD snapshot does, thanks to Takashi's patch. Even current mainline (or 5.11-rc2) does not have the issue addressed, AFAICS.
I have upgraded to Kernel 5.10.3 without workaround and iwlwifi works....
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c53
--- Comment #53 from Arjen de Korte
I have upgraded to Kernel 5.10.3 without workaround and iwlwifi works....
That is because Takashi patched out the offending code in Kernel 5.10.3 before it was released in Tumbleweed. But upstream Kernel 5.10.3 (and 5.10.4 as well) is still broken. That's why it is probably a good idea to leave this open, as the upstream fix is not available. FWIW, upstream might choose a completely different approach to fix this.
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c54
--- Comment #54 from Jiri Slaby
Created attachment 844714 [details]
This is the second time iwlwifi tries to write to the FW data:
commit ea0cca61d628662e4a1b26c77c7646f9a0257069
Author: Jiri Slaby
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c55
--- Comment #55 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1180344
https://bugzilla.suse.com/show_bug.cgi?id=1180344#c58
Takashi Iwai