[Bug 1183839] New: kernel 5.11.{4,6} RIP: 0010:kobject_put+0x19/0x1d0 on Dell Precision M7510
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839

Bug ID: 1183839
Summary: kernel 5.11.{4,6} RIP: 0010:kobject_put+0x19/0x1d0 on Dell Precision M7510
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: bruno@ioda-net.ch
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

As asked in https://bugzilla.opensuse.org/show_bug.cgi?id=1182377, I am opening a new bug since a new RIP appears.

See the attachments for more information about the hardware. (On this system the BIOS is up to date with the latest revision Dell has offered.)
As the BIOS allows it, only the nvidia GPU is used; the Intel GPU is disabled.
All mass storage is fully encrypted and opened during grub2 operation.

This crash was first seen on 5.11.4.

Extract of the captured crash:

Mar 20 16:15:19 qt-kt kernel: CPU: 6 PID: 739 Comm: systemd-udevd Not tainted 5.11.6-1-default #1 openSUSE Tumbleweed
Mar 20 16:15:19 qt-kt kernel: Hardware name: Dell Inc. Precision 7510/0YH43H, BIOS 1.21.3 08/04/2020
Mar 20 16:15:19 qt-kt kernel: RIP: 0010:kobject_put+0x19/0x1d0
Mar 20 16:15:19 qt-kt kernel: iwlwifi 0000:02:00.0: loaded firmware version 36.ad812ee0.0 8000C-36.ucode op_mode iwlmvm
Mar 20 16:15:19 qt-kt kernel: pstore: Efi pstore disabled, enforce via pstore.backend=efi
Mar 20 16:15:19 qt-kt kernel: pstore: On a broken BIOS, this can severely harm your system
Mar 20 16:15:19 qt-kt kernel: pstore: Only enable efi based pstore when you know what you are doing
Mar 20 16:15:19 qt-kt kernel: Bluetooth: HCI device and connection manager initialized
Mar 20 16:15:19 qt-kt kernel: Code: 48 c7 c7 08 34 86 92 e8 35 b6 fe ff eb d6 0f 1f 00 48 85 ff 0f 84 b4 00 00 00 41 56 41 55 41 54 55 48 89 fd 53 bb ff ff ff ff <f6> 45 3c 01 0f 84 80 00 00 00 48 8d 7d 38 89 d8 f0 0f c1 45 38 83
Mar 20 16:15:19 qt-kt kernel: usb 2-2: Qualcomm USB modem converter now attached to ttyUSB1
Mar 20 16:15:19 qt-kt kernel: Bluetooth: HCI socket layer initialized
Mar 20 16:15:19 qt-kt kernel: RSP: 0018:ffffacd6c07fbd98 EFLAGS: 00010202
Mar 20 16:15:19 qt-kt kernel: RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
Mar 20 16:15:19 qt-kt kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 1cc78e3ae6c3a8ae
Mar 20 16:15:19 qt-kt kernel: RBP: 1cc78e3ae6c3a8ae R08: ffff8f77f8fe8900 R09: ffff8f74c01f1e58
Mar 20 16:15:19 qt-kt kernel: R10: 000000000000001f R11: ffff8f74c4a89488 R12: 1cc78e3ae6c3a8ae
Mar 20 16:15:19 qt-kt kernel: R13: 0000000000007ea3 R14: ffffacd6c07fbe80 R15: 0000000000010000
Mar 20 16:15:19 qt-kt kernel: FS: 00007f984fb7b940(0000) GS:ffff8f8404580000(0000) knlGS:0000000000000000
Mar 20 16:15:19 qt-kt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 16:15:19 qt-kt kernel: CR2: 000055e078f036a8 CR3: 00000001037d0004 CR4: 00000000003706e0
Mar 20 16:15:19 qt-kt kernel: Bluetooth: L2CAP socket layer initialized
Mar 20 16:15:19 qt-kt kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 16:15:19 qt-kt kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 16:15:19 qt-kt kernel: Call Trace:
Mar 20 16:15:19 qt-kt kernel: kset_unregister+0x25/0x40
Mar 20 16:15:19 qt-kt kernel: ? 0xffffffffc0ea2000
Mar 20 16:15:19 qt-kt kernel: qcserial 2-2:1.3: Qualcomm USB modem converter detected
Mar 20 16:15:19 qt-kt kernel: sysman_init+0x20a/0x1000 [dell_wmi_sysman]
Mar 20 16:15:19 qt-kt kernel: do_one_initcall+0x44/0x1d0
Mar 20 16:15:19 qt-kt kernel: ? kmem_cache_alloc_trace+0xfe/0x250
Mar 20 16:15:19 qt-kt kernel: do_init_module+0x5c/0x270
Mar 20 16:15:19 qt-kt kernel: __do_sys_init_module+0x13b/0x1c0
Mar 20 16:15:19 qt-kt kernel: do_syscall_64+0x33/0x80
Mar 20 16:15:19 qt-kt kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 20 16:15:19 qt-kt kernel: RIP: 0033:0x7f98507ee15e
Mar 20 16:15:19 qt-kt kernel: Code: 48 8b 0d 15 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e2 1c 0c 00 f7 d8 64 89 01 48
Mar 20 16:15:19 qt-kt kernel: RSP: 002b:00007ffede5cf4c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
Mar 20 16:15:19 qt-kt kernel: RAX: ffffffffffffffda RBX: 000055e078640d00 RCX: 00007f98507ee15e
Mar 20 16:15:19 qt-kt kernel: RDX: 00007f985090e3a3 RSI: 0000000000017ea3 RDI: 000055e078651b70
Mar 20 16:15:19 qt-kt kernel: RBP: 000055e078651b70 R08: 000055e078640140 R09: 0000000000000003
Mar 20 16:15:19 qt-kt kernel: R10: 000055e5264348e0 R11: 0000000000000246 R12: 00007f985090e3a3
Mar 20 16:15:19 qt-kt kernel: R13: 000055e07844d070 R14: 0000000000000000 R15: 000055e078645d60
Mar 20 16:15:19 qt-kt kernel: Modules linked in: dell_wmi_sysman(+) sparse_keymap efi_pstore dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof mxm_wmi cdc_ether qcserial(+) bluetooth(+) iwlwifi snd_usbmidi_lib usb_wwan usbnet fjes(-) videodev snd_rawmidi mii usbserial snd_seq_device i2c_i801 joydev mc cfg80211 ecdh_generic mei_me e1000e(+) i2c_smbus ecc processor_thermal_device mei intel_pch_thermal thermal processor_thermal_rfim parport_pc processor_thermal_mbox int3403_thermal processor_thermal_rapl tiny_power_button parport intel_rapl_common dell_smo8800 int3402_thermal intel_soc_dts_iosf int340x_thermal_zone int3400_thermal dell_rbtn button acpi_pad rfkill acpi_thermal_rel ie31200_edac ac nls_iso8859_1 nls_cp437 vfat fat drm fuse configfs hid_logitech_hidpp hid_logitech_dj dm_crypt uas hid_generic usb_storage usbhid rtsx_pci_sdmmc mmc_core nvme crct10dif_pclmul xhci_pci crc32_pclmul xhci_pci_renesas ghash_clmulni_intel xhci_hcd aesni_intel glue_helper crypto_simd cryptd nvme_core serio_raw usbcore
Mar 20 16:15:19 qt-kt kernel: rtsx_pci battery video wmi btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation snd_soc_core snd_compress snd_pcm_dmaengine soundwire_cadence soundwire_bus snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua br_netfilter bridge stp llc msr efivarfs
Mar 20 16:15:19 qt-kt kernel: ---[ end trace c28276df22413bc9 ]---

--
You are receiving this mail because:
You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c1

--- Comment #1 from Bruno Friedmann <bruno@ioda-net.ch> ---
Created attachment 847490
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847490&action=edit
RIP: 0010:kobject_put+0x19/0x1d0 kernel with 5.11.6

http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c2

--- Comment #2 from Bruno Friedmann <bruno@ioda-net.ch> ---
Created attachment 847491
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847491&action=edit
RIP: 0010:kobject_put+0x19/0x1d0 kernel with 5.11.6 (without nvidia)

This time I tried not to taint the kernel by completely removing the nvidia drivers, but the crash still occurs.

http://bugzilla.opensuse.org/show_bug.cgi?id=1183839

Bruno Friedmann <bruno@ioda-net.ch> changed:

           What              |Removed       |Added
----------------------------------------------------------------------------
Attachment #847491 mime type |text/x-vhdl   |text/plain

http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c3

--- Comment #3 from Bruno Friedmann <bruno@ioda-net.ch> ---
Created attachment 847492
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847492&action=edit
Dell Precision M7510 lshw Hardware description
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c4

--- Comment #4 from Takashi Iwai <tiwai@suse.com> ---
As far as I can see in the previous bug report, the first Oops is always the same, in this code path, across various 5.11.x kernels, right?
And you tried to blacklist the dell-wmi-sysman module, but then you got a workqueue stall (in https://bugzilla.opensuse.org/show_bug.cgi?id=1182377#c29)?

The Oops itself looks like some memory corruption or such, which leads to the subsequent crashes. And if the problem is triggered even without dell-wmi-sysman, it means the corruption had already occurred before that point.
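One way to see why the register dump reads as corruption is to check whether the pointer handed to kobject_put() is a usable kernel address at all. The sketch below is an illustration only: it assumes 48-bit virtual addresses (4-level paging) on this machine, and the helper names `is_canonical` and `classify` are made up for this note. The register values are copied from the kernel-side dump in the initial report.

```python
# Sketch: classify register values from the Oops as canonical or not.
# Assumption: 48-bit virtual addresses (4-level paging) on this x86-64 machine.


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of addr are all zeros or all ones."""
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Values copied from the kernel-side register dump in the report.
REGS = {
    "RDI": 0x1CC78E3AE6C3A8AE,  # first argument: the kobject pointer
    "RBP": 0x1CC78E3AE6C3A8AE,
    "R12": 0x1CC78E3AE6C3A8AE,
    "R08": 0xFFFF8F77F8FE8900,
    "RSP": 0xFFFFACD6C07FBD98,
}


def classify(regs):
    """Map each register name to True (canonical) or False (non-canonical)."""
    return {name: is_canonical(value) for name, value in regs.items()}


if __name__ == "__main__":
    for name, ok in classify(REGS).items():
        print(f"{name}: {'canonical' if ok else 'NOT canonical'}")
```

RDI, RBP and R12 all hold 0x1cc78e3ae6c3a8ae, which is not a canonical address, while R08 and RSP look like ordinary kernel pointers. The bytes at the fault marker (`<f6> 45 3c 01`) appear to decode to a one-byte test at offset 0x3c from RBP, so the crash would be the first dereference of that pointer. This fits the reading that kset_unregister() passed down a garbage pointer rather than a NULL one, but it does not say which code overwrote it.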
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c5

--- Comment #5 from Bruno Friedmann <bruno@ioda-net.ch> ---
(In reply to Takashi Iwai from comment #4)
> As far as I can see in the previous bug report, the first Oops is always the same, in this code path, across various 5.11.x kernels, right?
> And you tried to blacklist the dell-wmi-sysman module, but then you got a workqueue stall (in https://bugzilla.opensuse.org/show_bug.cgi?id=1182377#c29)?
> The Oops itself looks like some memory corruption or such, which leads to the subsequent crashes. And if the problem is triggered even without dell-wmi-sysman, it means the corruption had already occurred before that point.

Hmm, this sounds particularly bad. I believe I will have to open the laptop, check all connections, and run the very long full Dell hardware check (with 64GB it always takes time).

I will report the result later. I first have to move all the VMs and containers running on this datacenter :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c6

--- Comment #6 from Bruno Friedmann <bruno@ioda-net.ch> ---
But I still hesitate to say with 100% certainty that it is hardware related. I can boot 5.10.6 in single mode, simply remove snd-hda-intel and nvidia, then reload them in the right order (nvidia first, snd-hda-intel afterwards), and everything runs smoothly, with no trace of a RIP or anything else.
A failing memory module would fail in the same way there too, no?
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c7

--- Comment #7 from Takashi Iwai <tiwai@suse.com> ---
I meant memory corruption caused by some bad kernel code, not a hardware defect.

You can try installing kernel-debug and see whether it catches anything.

Also, the systemd.confirm_spawn=1 boot option will let you step through the boot sequence. I guess everything happens at the udev-trigger stage or thereabouts, but this might narrow it down a bit. It is better to disable plymouth if you do that, though (via the plymouth.enable=0 option).
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c8

--- Comment #8 from Bruno Friedmann <bruno@ioda-net.ch> ---
Created attachment 847523
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847523&action=edit
Picture of the Rip

I was only able to take these pictures (not complete), sorry. The crash seems to occur before the / partition is opened, so the journal is not synced.
I will try again later; sometimes the crash appears later.

http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c9

--- Comment #9 from Bruno Friedmann <bruno@ioda-net.ch> ---
Created attachment 847524
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847524&action=edit
Picture of the Rip 2

If you have another boot option that I can try, let me know. (It seems I did not start the -debug kernel in my last attempt, so I will redo that shortly.)

Options used:
nosplash silent plymouth.enabled=0 noresume crashkernel=256M-:128M rd.vconsole.font=ter-v32b.psfu rd.vconsole.keymap=ch-fr rd.locale.LANG=en_US.UTF-8 audit=0 apparmor=0 mitigations=auto nvme_core.default_ps_max_latency_us=5500 scsi_mod.use_blk_mq=1 usb_storage.quirks=0x0bc2:0xab38:u systemd.confirm_spwan=1 single
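The option list above spells two names differently from the documented forms `systemd.confirm_spawn` and `plymouth.enable`. Unrecognized option names are generally not reported as errors, so this may be why the step-by-step boot did not behave as expected. The sketch below shows one way to catch such near-misses before rebooting. It is a rough illustration: `KNOWN_OPTIONS` is a hypothetical, deliberately tiny list holding only the two names discussed in this thread, and the 0.85 similarity cutoff is an arbitrary choice.

```python
# Sketch: flag kernel command-line options that look like misspellings.
import difflib

# Assumption: a hand-written list with only the two names relevant to this bug.
KNOWN_OPTIONS = ("systemd.confirm_spawn", "plymouth.enable")


def suspicious_options(cmdline: str, known=KNOWN_OPTIONS, cutoff: float = 0.85):
    """Map each near-miss option name to the known spelling it resembles."""
    hits = {}
    for token in cmdline.split():
        key = token.split("=", 1)[0]  # drop the "=value" part, if any
        if key in known:
            continue  # spelled exactly as documented
        close = difflib.get_close_matches(key, list(known), n=1, cutoff=cutoff)
        if close:
            hits[key] = close[0]
    return hits


# Command line copied from comment #9.
CMDLINE = (
    "nosplash silent plymouth.enabled=0 noresume crashkernel=256M-:128M "
    "rd.vconsole.font=ter-v32b.psfu rd.vconsole.keymap=ch-fr "
    "rd.locale.LANG=en_US.UTF-8 audit=0 apparmor=0 mitigations=auto "
    "nvme_core.default_ps_max_latency_us=5500 scsi_mod.use_blk_mq=1 "
    "usb_storage.quirks=0x0bc2:0xab38:u systemd.confirm_spwan=1 single"
)

if __name__ == "__main__":
    for found, expected in suspicious_options(CMDLINE).items():
        print(f"{found!r} looks like a misspelling of {expected!r}")
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the copied string.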
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c25

--- Comment #25 from Takashi Iwai <tiwai@suse.com> ---
Do you still face the problem with the latest upstream? I.e. is the sound module still included in the initrd?
If yes, we need to track down the cause.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c26

--- Comment #26 from Bruno Friedmann <bruno@ioda-net.ch> ---
(In reply to Takashi Iwai from comment #25)
> Do you still face the problem with the latest upstream? I.e. is the sound module still included in the initrd?
> If yes, we need to track down the cause.

Hi Takashi, I've just done a new test this morning with kernel 5.12.10-1-default, and I still have some snd modules added to the initrd:

lsinitrd | grep snd
-rw-r--r-- 1 root root 7908 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd-hwdep.ko.xz
-rw-r--r-- 1 root root 66888 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd-pcm.ko.xz
-rw-r--r-- 1 root root 18416 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd-rawmidi.ko.xz
-rw-r--r-- 1 root root 5228 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd-seq-device.ko.xz
-rw-r--r-- 1 root root 19364 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd-timer.ko.xz
-rw-r--r-- 1 root root 42836 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/core/snd.ko.xz
-rw-r--r-- 1 root root 42912 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/hda/snd-hda-core.ko.xz
-rw-r--r-- 1 root root 5680 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/hda/snd-intel-dspcfg.ko.xz
-rw-r--r-- 1 root root 3336 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/hda/snd-intel-sdw-acpi.ko.xz
-rw-r--r-- 1 root root 72856 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/pci/hda/snd-hda-codec.ko.xz
-rw-r--r-- 1 root root 27516 Jun 15 18:32 usr/lib/modules/5.12.10-1-default/kernel/sound/pci/hda/snd-hda-intel.ko.xz

In practice this does not bother me too much, as I normally use a dracut configuration which forces it to omit them:

/etc/dracut.conf.d/80-no-sound-modules.conf
omit_drivers+=" snd_hda_intel snd_seq_device snd_hwdep snd_hda_intel snd_usb_audio snd_usbmidi_lib snd_hda_codec snd_timer snd_compress snd_soc_core snd_pcm snd_rawmidi snd_intel_dspcfg soundwire_intel soundwire_intel snd_compress snd_soc_core snd_hda_core snd_pcm_dmaengine snd_soc_core snd_compress snd_seq_device snd_hwdep snd_timer snd_pcm snd_timer snd soundcore_bus soundcore soundwire_generic_allocation soundwire_bus soundwire_cadence "

As extra modules I only have nvidia and virtualbox (all coming from official openSUSE RPMs).
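The omit_drivers value above repeats several module names, which is harmless but makes the file hard to review. As a small aid, the sketch below removes the duplicates while keeping the original order and prints a line in the same `omit_drivers+=" ... "` form. The function names are made up for this note, and the module list is copied verbatim from the comment; nothing here checks whether each name is a real kernel module.

```python
# Sketch: de-duplicate the omit_drivers list from comment #26, keeping order.

# Copied verbatim from /etc/dracut.conf.d/80-no-sound-modules.conf as quoted above.
RAW_OMIT = (
    "snd_hda_intel snd_seq_device snd_hwdep snd_hda_intel snd_usb_audio "
    "snd_usbmidi_lib snd_hda_codec snd_timer snd_compress snd_soc_core snd_pcm "
    "snd_rawmidi snd_intel_dspcfg soundwire_intel soundwire_intel snd_compress "
    "snd_soc_core snd_hda_core snd_pcm_dmaengine snd_soc_core snd_compress "
    "snd_seq_device snd_hwdep snd_timer snd_pcm snd_timer snd soundcore_bus "
    "soundcore soundwire_generic_allocation soundwire_bus soundwire_cadence"
)


def unique_in_order(raw: str):
    """Return the whitespace-separated names with later duplicates dropped."""
    return list(dict.fromkeys(raw.split()))


def dracut_line(modules):
    """Format names as a dracut.conf line; the padding spaces are intentional."""
    return 'omit_drivers+=" ' + " ".join(modules) + ' "'


if __name__ == "__main__":
    modules = unique_in_order(RAW_OMIT)
    print(f"{len(RAW_OMIT.split())} entries, {len(modules)} unique")
    print(dracut_line(modules))
```

The leading and trailing spaces inside the quotes are kept because dracut.conf values appended with `+=` are concatenated as plain strings.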
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c27

--- Comment #27 from Takashi Iwai <tiwai@suse.com> ---
OK, then could you try uninstalling the Nvidia package once, rebuild the initrd, and check whether the initrd still contains the sound module? If it's still there, uninstall vbox once and check again.

If both drivers are uninstalled but the sound module still gets dragged in, some other configuration must be the cause. It may be better to ask the dracut people to take a look. A typical cause in that case is some leftover setup for the kernel.
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c28

--- Comment #28 from Bruno Friedmann <bruno@ioda-net.ch> ---
(In reply to Takashi Iwai from comment #27)
> OK, then could you try uninstalling the Nvidia package once, rebuild the initrd, and check whether the initrd still contains the sound module? If it's still there, uninstall vbox once and check again.
> If both drivers are uninstalled but the sound module still gets dragged in, some other configuration must be the cause. It may be better to ask the dracut people to take a look. A typical cause in that case is some leftover setup for the kernel.

Hello Takashi, sorry, I was off for a few days.

I tried uninstalling nvidia and virtualbox, and rebuilt with my dracut exclusion configuration file. The snd modules are present in the initrd.

I was not yet able to disconnect everything (take the laptop out of the dock, without any additional peripherals like screens, webcam, etc.). Perhaps the snd modules now come from one of those, but the laptop serves here as a central datacenter with a lot of VMs, so it is not easy to take it out of service for tests (sorry).

I'm pretty sure it is neither nvidia nor virtualbox; on another computer with both installed, snd doesn't appear in the initrd. :-(

Fortunately the trick of using omit_drivers with dracut works, and allows me to use the computer again. So in a sense it now works for me, we just don't know why it failed ....
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c29

--- Comment #29 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Bruno Friedmann from comment #28)
> (In reply to Takashi Iwai from comment #27)
> > OK, then could you try uninstalling the Nvidia package once, rebuild the initrd, and check whether the initrd still contains the sound module? If it's still there, uninstall vbox once and check again.
> > If both drivers are uninstalled but the sound module still gets dragged in, some other configuration must be the cause. It may be better to ask the dracut people to take a look. A typical cause in that case is some leftover setup for the kernel.
>
> Hello Takashi, sorry, I was off for a few days.
>
> I tried uninstalling nvidia and virtualbox, and rebuilt with my dracut exclusion configuration file. The snd modules are present in the initrd.
>
> I was not yet able to disconnect everything (take the laptop out of the dock, without any additional peripherals like screens, webcam, etc.). Perhaps the snd modules now come from one of those, but the laptop serves here as a central datacenter with a lot of VMs, so it is not easy to take it out of service for tests (sorry).
>
> I'm pretty sure it is neither nvidia nor virtualbox; on another computer with both installed, snd doesn't appear in the initrd. :-(
>
> Fortunately the trick of using omit_drivers with dracut works, and allows me to use the computer again. So in a sense it now works for me, we just don't know why it failed ....

I guess we have to check the dracut behavior step by step. My wild guess is that some leftover configuration or setup file matters; e.g. /usr/lib/modules-load.d/* or /etc/modules-load.d/*, also some /etc/sysconfig/*, whatever...
http://bugzilla.opensuse.org/show_bug.cgi?id=1183839#c30

Bruno Friedmann <bruno@ioda-net.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #30 from Bruno Friedmann <bruno@ioda-net.ch> ---
(In reply to Takashi Iwai from comment #29)
> I guess we have to check the dracut behavior step by step. My wild guess is that some leftover configuration or setup file matters; e.g. /usr/lib/modules-load.d/* or /etc/modules-load.d/*, also some /etc/sysconfig/*, whatever...

OMG, I found a culprit in /etc/modules-load.d/

-rw-r--r-- 1 root root 14 Jan 24 2018 yast.conf

cat /etc/modules-load.d/yast.conf
snd-hda-intel

Bingo! Removing this file kills the problem: no more snd modules in the initrd.

You already know that, but you're an ace!!!
I'm closing this report now.

Many thanks
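For anyone chasing a similar stray module, the search that found yast.conf can be scripted. The sketch below scans the modules-load.d directories for a given module name, treating `-` and `_` as equivalent and skipping blank lines and lines starting with `#` or `;`. It is a minimal illustration: the function names are made up for this note, and the directory list is an assumption based on the paths mentioned in this thread plus /run. It only reports where a name is listed; why such an entry causes the module to be pulled into the initrd is not established here.

```python
# Sketch: find which modules-load.d file lists a given kernel module.
from pathlib import Path

# Assumption: the usual modules-load.d locations; adjust for the system at hand.
DEFAULT_DIRS = (
    "/etc/modules-load.d",
    "/run/modules-load.d",
    "/usr/lib/modules-load.d",
)


def normalize(name: str) -> str:
    """Module names treat '-' and '_' as the same character."""
    return name.strip().replace("-", "_")


def find_module_entries(module: str, dirs=DEFAULT_DIRS):
    """Return (file path, line number) pairs where the module is listed."""
    wanted = normalize(module)
    hits = []
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue
        for conf in sorted(directory.glob("*.conf")):
            lines = conf.read_text(errors="replace").splitlines()
            for lineno, line in enumerate(lines, start=1):
                entry = line.strip()
                if not entry or entry[0] in "#;":
                    continue  # blank line or comment
                if normalize(entry) == wanted:
                    hits.append((str(conf), lineno))
    return hits


if __name__ == "__main__":
    for path, lineno in find_module_entries("snd_hda_intel"):
        print(f"{path}:{lineno}")
```

On the reporter's machine, before the file was removed, this would have been expected to print the yast.conf path with line 1.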