[Bug 1231583] New: kernel panic especially shortly after wakeup from STR on linux-default-6.4.0 "kernel BUG at ../lib/dynamic_queue_limits.c:27!"
https://bugzilla.suse.com/show_bug.cgi?id=1231583

Bug ID: 1231583
Summary: kernel panic especially shortly after wakeup from STR on linux-default-6.4.0 "kernel BUG at ../lib/dynamic_queue_limits.c:27!"
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.6
Hardware: x86-64
OS: SUSE Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: okurz@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 877944
  --> https://bugzilla.suse.com/attachment.cgi?id=877944&action=edit
dmesg of kernel panic especially shortly after wakeup from STR on linux-default-6.4.0 showing "kernel BUG at ../lib/dynamic_queue_limits.c:27!"

## Observation

Since about 2024-06-19 I have observed multiple kernel panics on my Dell Inc. Latitude 5300 2-in-1/0J9C2F, BIOS 1.19.0 12/14/2021, mostly either immediately or some seconds after wakeup from STR. After enabling kdump I could verify that the problem reported is every time "kernel BUG at ../lib/dynamic_queue_limits.c:27! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI". Currently running 6.4.0-150600.23.25-default.

Stack trace of the latest recorded kernel panic:

```
[27011.816948] kernel BUG at ../lib/dynamic_queue_limits.c:27!
[27011.816961] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[27011.816969] CPU: 0 PID: 10821 Comm: StreamT~ns #477 Kdump: loaded Tainted: P OE n 6.4.0-150600.23.25-default #1 SLE15-SP6 7129efd26ed51feb3306c16abaa78c465d48fc44
[27011.816981] Hardware name: Dell Inc. Latitude 5300 2-in-1/0J9C2F, BIOS 1.19.0 12/14/2021
[27011.816987] RIP: 0010:dql_completed+0x13b/0x150
[27011.817001] Code: ef 74 01 48 89 57 58 e9 47 ff ff ff 85 ed 40 0f 95 c5 41 39 db 41 0f 95 c3 44 84 dd 74 04 85 d2 78 0a 44 89 c1 e9 29 ff ff ff <0f> 0b 01 f6 44 89 c2 29 f2 0f 48 d1 eb 8a cc cc cc cc cc cc cc 90
[27011.817013] RSP: 0000:ffff993541cf3de8 EFLAGS: 00010293
[27011.817021] RAX: 00000000000000aa RBX: ffff8d9a81e09b78 RCX: 00000022dd8dc000
[27011.817029] RDX: 00000022dd8da000 RSI: 000000000000012e RDI: ffff8d9a81d03100
[27011.817036] RBP: ffff8d9a81e099c0 R08: 0000000000000001 R09: 0000000000000000
[27011.817043] R10: 00000000000000aa R11: 0000000000000000 R12: ffff8d9a81d03000
[27011.817050] R13: ffff8d9a81e09b98 R14: ffff8d9a81e09000 R15: 0000000000000006
[27011.817056] FS: 00007f8df98036c0(0000) GS:ffff8da1c7e00000(0000) knlGS:0000000000000000
[27011.817064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27011.817071] CR2: 00007f8ded069000 CR3: 00000001dcd82001 CR4: 00000000003706f0
[27011.817078] Call Trace:
[27011.817084] <TASK>
[27011.817091] ? __die_body+0x1a/0x60
[27011.817103] ? die+0x38/0x60
[27011.817111] ? do_trap+0x10a/0x120
[27011.817120] ? dql_completed+0x13b/0x150
[27011.817129] ? do_error_trap+0x64/0xa0
[27011.817137] ? dql_completed+0x13b/0x150
[27011.817146] ? exc_invalid_op+0x53/0x60
[27011.817154] ? dql_completed+0x13b/0x150
[27011.817161] ? asm_exc_invalid_op+0x16/0x20
[27011.817172] ? dql_completed+0x13b/0x150
[27011.817181] write_bulk_sg_callback+0xc7/0x1e0 [r8152 5053a9ece6fed4b24993dc907e4c2f28089d5aff]
[27011.817210] __usb_hcd_giveback_urb+0x84/0x120 [usbcore e5761cade106b17f06019a7c390c1f126777ec30]
[27011.817286] usb_giveback_urb_bh+0x94/0x120 [usbcore e5761cade106b17f06019a7c390c1f126777ec30]
[27011.817357] tasklet_action_common.isra.23+0xc0/0x240
[27011.817369] __do_softirq+0xbf/0x2b3
[27011.817379] irq_exit_rcu+0xa3/0xc0
[27011.817387] common_interrupt+0x8b/0xa0
[27011.817395] asm_common_interrupt+0x22/0x40
[27011.817403] RIP: 0033:0x7f8e13180ceb
[27011.817468] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 c4 41 01 ef ff 89 f8 09 f0 c1 e0 14 3d 00 00 00 f8 0f 87 29 03 00 00 c5 fe 6f 07 <c5> fd 74 0e c5 85 74 d0 c5 ed df c9 c5 fd d7 c9 ff c1 74 61 90 f3
[27011.817480] RSP: 002b:00007f8df9802098 EFLAGS: 00000287
[27011.817489] RAX: 00000000e9400000 RBX: 00007f8de5ed8e80 RCX: 000000000000000c
[27011.817496] RDX: 00007f8e0c3b37a0 RSI: 00007f8e09877a14 RDI: 00007f8de5ed8e80
[27011.817504] RBP: 0000000000000014 R08: 0000000000000053 R09: 00000000ffffffff
[27011.817511] R10: 0000000000000000 R11: 00007f8e131a8680 R12: 00007f8ded069330
[27011.817519] R13: 00007f8e0c3b3840 R14: 0000000000000000 R15: 000000000000000a
[27011.817530] </TASK>
[27011.817535] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq tun nf_conntrack_netbios_ns nf_conntrack_broadcast ccm tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet af_packet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter qrtr(n) cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat snd_sof_pci_intel_cnl ext4 snd_sof_intel_hda_common soundwire_intel hid_multitouch wacom snd_sof_intel_hda_mlink mbcache binfmt_misc jbd2 soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils
[27011.817631] soundwire_generic_allocation iTCO_wdt soundwire_bus spi_nor intel_pmc_bxt i2c_designware_platform ee1004(n) iTCO_vendor_support mei_wdt mei_hdcp(n) i2c_designware_core mtd mei_pxp dell_rbtn(n) snd_soc_skl(n) ccp snd_soc_hdac_hda snd_hda_ext_core dell_laptop intel_rapl_msr snd_ctl_led snd_soc_sst_ipc snd_hda_codec_realtek snd_soc_sst_dsp dell_smm_hwmon(n) snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_generic intel_tcc_cooling(n) snd_soc_core x86_pkg_temp_thermal dell_wmi snd_compress snd_pcm_dmaengine intel_powerclamp coretemp iwlmvm snd_hda_intel kvm_intel dell_smbios snd_intel_dspcfg dcdbas(X) snd_intel_sdw_acpi kvm irqbypass pcspkr uvcvideo snd_usb_audio btusb snd_hda_codec videobuf2_vmalloc dell_wmi_sysman(n) snd_usbmidi_lib btrtl dell_wmi_ddv(n) ledtrig_audio dell_wmi_descriptor efi_pstore(n) firmware_attributes_class(n) thunderbolt wmi_bmof mac80211 intel_wmi_thunderbolt(n) snd_hda_core spi_intel_pci(n) snd_ump uvc btintel i2c_i801 spi_intel(n) btbcm videobuf2_memops btmtk libarc4 snd_hwdep
[27011.817760] i2c_smbus squashfs bluetooth snd_rawmidi snd_seq_device videobuf2_v4l2 mei_me mei snd_pcm loop videodev snd_timer hid_sensor_incl_3d ecdh_generic intel_lpss_pci hid_sensor_magn_3d hid_sensor_rotation snd hid_sensor_gyro_3d hid_sensor_accel_3d videobuf2_common intel_lpss iwlwifi crc16 mc idma64 soundcore hid_sensor_trigger hid_sensor_iio_common processor_thermal_device_pci_legacy(n) industrialio_triggered_buffer kfifo_buf processor_thermal_device processor_thermal_rfim processor_thermal_mbox industrialio tiny_power_button(n) processor_thermal_rapl intel_rapl_common intel_pch_thermal intel_soc_dts_iosf(n) thermal button joydev int3403_thermal soc_button_array(n) int340x_thermal_zone int3400_thermal intel_hid(n) intel_pmc_core acpi_thermal_rel sparse_keymap acpi_pad ac nvme_fabrics fuse nvme_keyring configfs dmi_sysfs ip_tables x_tables dm_crypt essiv authenc typec_displayport r8152(OEX) hid_sensor_custom(n) hid_sensor_hub intel_ishtp_hid wl(POEn) crc32_pclmul polyval_clmulni(n) polyval_generic(n) gf128mul
[27011.817891] ghash_clmulni_intel hid_generic sha512_ssse3 rtsx_pci_sdmmc sha256_ssse3 usbhid i915 nvme sha1_ssse3 mmc_core i2c_algo_bit cfg80211 drm_buddy nvme_core ttm drm_display_helper nvme_auth rtsx_pci t10_pi aesni_intel crc64_rocksoft_generic ucsi_acpi cec xhci_pci typec_ucsi video intel_ish_ipc xhci_pci_renesas crypto_simd crc64_rocksoft xhci_hcd cryptd usbcore roles crc64 mfd_core rc_core intel_ishtp i2c_hid_acpi rfkill typec i2c_hid battery wmi pinctrl_cannonlake serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq dm_mod uhid br_netfilter bridge stp llc msr efivarfs
[27011.817994] Unloaded tainted modules: intel_pmc_core_pltdrv(n):1
[27011.818026] Supported: No, Proprietary modules are loaded
```

In three stack traces I found

```
write_bulk_sg_callback+0xc7/0x1e0 [r8152 5053a9ece6fed4b24993dc907e4c2f28089d5aff]
```

so the source of the problem is potentially the network driver r8152, but maybe that's only a symptom.

## Reproducible

Unclear. So far only observed on that device after some wakeups from STR, not all. Not fully reproducible yet.

## Expected result

Last known good is possibly kernel-default-5.14.21-150500.55.52.1 from Leap 15.5

--
You are receiving this mail because:
You are on the CC list for the bug.
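The observation that several recorded panics share the r8152 frame can be re-checked across saved logs with a small helper. This is only a minimal sketch: the function name `count_frame` and the example location under `/var/crash` are assumptions for illustration and are not taken from the report.

```shell
#!/bin/sh
# Count how many saved panic logs contain a given stack frame.
# Usage: count_frame FRAME FILE...
count_frame() {
    frame=$1
    shift
    # grep -l prints the name of each matching file once; wc -l counts them.
    # -F treats the frame as a fixed string, not as a regular expression.
    grep -l -F -- "$frame" "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Example (hypothetical kdump layout, adjust to the local crash directory):
#   count_frame 'write_bulk_sg_callback' /var/crash/*/dmesg.txt
```

A count equal to the number of logs would support the suspicion about the driver, although it still cannot tell a cause from a symptom.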
https://bugzilla.suse.com/show_bug.cgi?id=1231583

Oliver Kurz <okurz@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|kernel panic especially     |kernel panic especially
                   |shorttly after wakeup from  |shortly after wakeup from
                   |STR on linux-default-6.4.0  |STR on linux-default-6.4.0
                   |"kernel BUG at              |"kernel BUG at
                   |../lib/dynamic_queue_limits |../lib/dynamic_queue_limits
                   |.c:27!"                     |.c:27!"
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c1

--- Comment #1 from Oliver Kurz <okurz@suse.com> ---
A vmcore is available (SUSE-internal) at
https://w3.nue.suse.com/~okurz/testing/boo1231583_kernel_panic_shortly_after...

The only other remotely related information I found is
https://lkml.org/lkml/2019/2/9/44 about "Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!"
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c2

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |okurz@suse.com,
                   |                            |tiwai@suse.com
              Flags|                            |needinfo?(okurz@suse.com)

--- Comment #2 from Takashi Iwai <tiwai@suse.com> ---
Is this a regression by the recent update kernel?

Also, check with the latest SLE15-SP6 kernel in OBS Kernel:SLE15-SP6 repo, just to be sure, too:
  http://download.opensuse.org/repositories/Kernel:/SLE15-SP6/pool/
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c3

--- Comment #3 from Oliver Kurz <okurz@suse.com> ---
(In reply to Takashi Iwai from comment #2)
> Is this a regression by the recent update kernel?

I don't think so. As stated in the description, the last known good is possibly kernel-default-5.14.21-150500.55.52.1 from Leap 15.5. The problem only hits every couple of days, so it is not easy to reproduce and not easy to verify a fix, but I will try to come up with a better reproducer.

> Also, check with the latest SLE15-SP6 kernel in OBS Kernel:SLE15-SP6 repo, just to be sure, too:
> http://download.opensuse.org/repositories/Kernel:/SLE15-SP6/pool/

Thanks, I will consider testing that as well as Kernel:HEAD
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c4

--- Comment #4 from Oliver Kurz <okurz@suse.com> ---
Tried STR and wakeup 10x, no problem. On 2024-10-16 it crashed again, but only after a night in STR. Maybe the problem is some hybrid switch to suspend that writes data to persistent storage, or it is related to the clock when the machine is off for too long?

Now trying SLE15-SP6 development kernel 6.4.0-150600.328.g07e1f67-default
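A repeated STR and wakeup test like the one in comment #4 could be automated roughly as follows. This is a sketch under assumptions: `str_cycle`, its three parameters and the `DRY_RUN` switch are hypothetical names, and it presumes that `rtcwake` from util-linux is installed and that the function is run as root. Since the crash in comment #4 only appeared after a night in STR, short cycles may well not trigger it, so the sleep duration is a parameter.

```shell
#!/bin/sh
# Suspend to RAM and wake up again via the RTC alarm, COUNT times.
# Usage: str_cycle COUNT SLEEP_SECONDS PAUSE_SECONDS
# Set DRY_RUN=1 to print the commands instead of running them.
str_cycle() {
    count=$1
    sleep_s=$2
    pause_s=$3
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rtcwake -m mem -s $sleep_s"
        else
            # -m mem requests suspend-to-RAM, -s sets the wake-up alarm.
            rtcwake -m mem -s "$sleep_s" || return 1
            # Give the system time to settle (and possibly crash) after wakeup.
            sleep "$pause_s"
        fi
        i=$((i + 1))
    done
}

# Example (needs root): ten cycles of 60 s in STR with 30 s awake in between
#   str_cycle 10 60 30
```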
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c5

--- Comment #5 from Takashi Iwai <tiwai@suse.com> ---
Thanks. The problem is indeed likely related with r8152 driver, as it seems.
BTW, is r8152 module the built-in one?

I'm building a test kernel to replace BUG_ON() with WARN_ON() and debug print. It's built in OBS home:tiwai:bsc1231583. Please check it later.
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c6

Oliver Kurz <okurz@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CONFIRMED

--- Comment #6 from Oliver Kurz <okurz@suse.com> ---
(In reply to Takashi Iwai from comment #5)
> Thanks. The problem is indeed likely related with r8152 driver, as it seems.
> BTW, is r8152 module the built-in one?

`grep r8152 /lib/modules/$(uname -r)/modules.builtin` says no.

> I'm building a test kernel to replace BUG_ON() with WARN_ON() and debug print. It's built in OBS home:tiwai:bsc1231583. Please check it later.

Got it. I will look into that depending on the results from the current 6.4.0-150600.328.g07e1f67-default
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c7

--- Comment #7 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Oliver Kurz from comment #6)
> (In reply to Takashi Iwai from comment #5)
> > Thanks. The problem is indeed likely related with r8152 driver, as it seems.
> > BTW, is r8152 module the built-in one?
>
> `grep r8152 /lib/modules/$(uname -r)/modules.builtin` says no.

Ah, sorry, I meant differently, whether it's an official kernel module included in kernel-default or it's a 3rd party module, if any. The dmesg showed "r8152(OEX)", which indicates a KMP.
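The per-module flags that comment #7 refers to can be pulled out of a saved dmesg mechanically. The following is a minimal sketch, with `oot_modules` as a hypothetical helper name. It relies on the upstream meaning of the taint letters P (proprietary), O (out-of-tree) and E (unsigned); X and n are taken here to be SUSE-specific support-status markers, which is an assumption.

```shell
#!/bin/sh
# Print every "module(FLAGS)" token found in dmesg text on stdin whose
# flag set contains O (out-of-tree), one per line, without duplicates.
oot_modules() {
    # First grep: isolate tokens such as "r8152(OEX)" or "qrtr(n)".
    # Second grep: keep only those whose flags include the letter O.
    grep -oE '[A-Za-z0-9_]+\([A-Za-z]+\)' |
        grep -E '\([A-Za-z]*O[A-Za-z]*\)' |
        sort -u
}

# Example on a live system or a saved log:
#   dmesg | oot_modules
#   oot_modules < dmesg.txt
```

Applied to the "Modules linked in" list from the original report, this should leave only `r8152(OEX)` and `wl(POEn)`.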
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c8

--- Comment #8 from Oliver Kurz <okurz@suse.com> ---
(In reply to Takashi Iwai from comment #7)
> (In reply to Oliver Kurz from comment #6)
> > (In reply to Takashi Iwai from comment #5)
> > > Thanks. The problem is indeed likely related with r8152 driver, as it seems.
> > > BTW, is r8152 module the built-in one?
> >
> > `grep r8152 /lib/modules/$(uname -r)/modules.builtin` says no.
>
> Ah, sorry, I meant differently, whether it's an official kernel module included in kernel-default or it's a 3rd party module, if any. The dmesg showed "r8152(OEX)", which indicates a KMP.

true, true. I see

```
# dmesg | grep -i r8152
[ 4.418045] r8152: loading out-of-tree module taints kernel.
[ 4.418084] r8152: module verification failed: signature and/or required key missing - tainting kernel
[ 4.423780] usbcore: registered new device driver r8152-cfgselector
[ 4.513433] r8152-cfgselector 4-1.4: reset SuperSpeed USB device number 4 using xhci_hcd
[ 4.541288] r8152 4-1.4:1.0 (unnamed net_device) (uninitialized): Using pass-thru MAC addr 70:b5:e8:a0:87:09
[ 4.593264] r8152 4-1.4:1.0 eth0: v2.18.1 (2024/05/20)
[ 4.593270] r8152 4-1.4:1.0 eth0: This product is covered by one or more of the following patents:
[ 4.593323] usbcore: registered new interface driver r8152
[ 21.434535] r8152 4-1.4:1.0 eth1: renamed from eth0
[ 25.403136] r8152 4-1.4:1.0 eth1: carrier on
```

What's the consequence of that?
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c9

--- Comment #9 from Takashi Iwai <tiwai@suse.com> ---
It might be a bug of that out-of-tree driver.
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c10

--- Comment #10 from Takashi Iwai <tiwai@suse.com> ---
That is, try to remove that KMP for r8152 and use the standard r8152 driver included in the kernel. If the problem persists with the standard driver, we can debug further.
https://bugzilla.suse.com/show_bug.cgi?id=1231583
https://bugzilla.suse.com/show_bug.cgi?id=1231583#c11

--- Comment #11 from Oliver Kurz <okurz@suse.com> ---
In the meantime I had multiple halted systems without logs. Finally, on 2024-11-09, there was another crash with kdump effective. dmesg shows the same, kernel BUG at ../lib/dynamic_queue_limits.c:27. There is no direct mention of rtl8152, but there is "RIP: 0010:dql_completed+0x13b/0x150" and the last message before "cut here" is "r8152 4-1.4:1.0 eth1: carrier on", so this could still be related. I was still running 6.4.0-150600.328.g07e1f67-default.

I have now removed all installed KMP modules with

```
sudo zypper rm -u broadcom-wl-kmp-default crash-kmp-default r8152-kmp-default r8168-kmp-default
```

I assume those were initially pulled in by the OS installer; I don't recall explicitly selecting any KMPs for installation. I will also switch back to the standard Leap kernel and try that again.
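After removing the KMPs it may be worth confirming which r8152 file would actually be loaded. The sketch below classifies a module path by its install directory. `module_origin` is a hypothetical helper, and the directory convention is an assumption: in-tree modules under `kernel/`, add-on modules such as KMPs under `updates/`, `weak-updates/` or `extra/`.

```shell
#!/bin/sh
# Classify a module file path by the directory it was installed into.
# Assumption: in-tree modules live under .../kernel/, while KMPs and other
# add-on modules live under .../updates/, .../weak-updates/ or .../extra/.
module_origin() {
    case $1 in
        */updates/*|*/weak-updates/*|*/extra/*) echo "add-on" ;;
        */kernel/*) echo "in-tree" ;;
        *) echo "unknown" ;;
    esac
}

# Example on a live system:
#   path=$(modinfo -n r8152)   # file that modprobe would load
#   module_origin "$path"
#   rpm -qf "$path"            # package that owns that file
```

If the result is still "add-on" after the `zypper rm`, the out-of-tree driver is presumably still in place and later crashes would not yet say anything about the standard driver.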