[Bug 1208783] New: Kernel hard lockup when initializing FW sound device with jackdbus
https://bugzilla.suse.com/show_bug.cgi?id=1208783 Bug ID: 1208783 Summary: Kernel hard lockup when initializing FW sound device with jackdbus Classification: openSUSE Product: openSUSE Distribution Version: Leap 15.4 Hardware: Other OS: Other Status: NEW Severity: Major Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: sbahling@suse.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- Ever since upgrading from leap 15.3 to 15.4 I sometimes get hard lock up when initializing my FireWire audio device via jackdbus. I enabled kdump and was able to get a core dump. Here is the back trace: crash> bt PID: 14532 TASK: ffff8d1ffab08000 CPU: 15 COMMAND: "jackdbus" #0 [fffffe0000382a10] machine_kexec at ffffffffa4a7efc3 #1 [fffffe0000382a68] __crash_kexec at ffffffffa4b80f1a #2 [fffffe0000382b28] panic at ffffffffa5430205 #3 [fffffe0000382bb0] nmi_panic at ffffffffa4aab2e8 #4 [fffffe0000382bb8] watchdog_overflow_callback at ffffffffa4bbad98 #5 [fffffe0000382bc8] __perf_event_overflow at ffffffffa4c5a07f #6 [fffffe0000382bf8] handle_pmi_common at ffffffffa4a106c0 #7 [fffffe0000382de8] intel_pmu_handle_irq at ffffffffa4a10885 #8 [fffffe0000382e40] perf_event_nmi_handler at ffffffffa4a065d4 #9 [fffffe0000382e58] nmi_handle at ffffffffa4a3eb85 #10 [fffffe0000382e98] default_do_nmi at ffffffffa5436d89 #11 [fffffe0000382eb8] exc_nmi at ffffffffa5436fb7 #12 [fffffe0000382ef0] end_repeat_nmi at ffffffffa5601658 [exception RIP: native_queued_spin_lock_slowpath+102] RIP: ffffffffa4b18676 RSP: ffffb869467ef9d8 RFLAGS: 00000002 RAX: 0000000000000101 RBX: ffff8d1ec68b4600 RCX: ffff8d1f09f649e8 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1ec68b46f8 RBP: ffff8d1f09f649c0 R8: 0000000007494103 R9: 00160d7108b72c9b R10: 0000000000000000 R11: 000000010011275a R12: ffff8d1ec68b4c00 R13: ffffffffc0c11180 R14: 0000000000000001 R15: 0000000000000004 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffffb869467ef9d8] native_queued_spin_lock_slowpath at ffffffffa4b18676 #14 [ffffb869467ef9d8] _raw_spin_lock at ffffffffa544a185 #15 [ffffb869467ef9e0] snd_pcm_action_group at ffffffffc0c00c98 [snd_pcm] #16 [ffffb869467efa20] snd_pcm_action at ffffffffc0c02c57 [snd_pcm] #17 [ffffb869467efa48] __snd_pcm_xrun at ffffffffc0c0b10d [snd_pcm] #18 [ffffb869467efa88] snd_pcm_update_state at ffffffffc0c0b326 [snd_pcm] #19 [ffffb869467efa98] snd_pcm_update_hw_ptr0 at ffffffffc0c0b541 [snd_pcm] #20 [ffffb869467efb38] snd_pcm_period_elapsed_under_stream_lock at ffffffffc0c0bc07 [snd_pcm] #21 [ffffb869467efb50] process_rx_packets at ffffffffc09a2ca5 [snd_firewire_lib] #22 [ffffb869467efbc8] irq_target_callback at ffffffffc09a30c2 [snd_firewire_lib] #23 [ffffb869467efbd8] ohci_flush_iso_completions at ffffffffc04b4708 [firewire_ohci] #24 [ffffb869467efbf0] amdtp_domain_stream_pcm_pointer at ffffffffc09a1d94 [snd_firewire_lib] #25 [ffffb869467efc00] snd_pcm_update_hw_ptr0 at ffffffffc0c0b37c [snd_pcm] #26 [ffffb869467efca0] snd_pcm_status64 at ffffffffc0c064c4 [snd_pcm] #27 [ffffb869467efcd8] snd_pcm_status_user64 at ffffffffc0c06685 [snd_pcm] #28 [ffffb869467efd90] snd_pcm_common_ioctl at ffffffffc0c072f4 [snd_pcm] #29 [ffffb869467efe78] snd_pcm_ioctl at ffffffffc0c078fa [snd_pcm] #30 [ffffb869467efe80] __x64_sys_ioctl at ffffffffa4d6060f #31 [ffffb869467efeb8] do_syscall_64 at ffffffffa5435408 #32 [ffffb869467efed8] switch_fpu_return at ffffffffa4a48829 #33 [ffffb869467efef8] exit_to_user_mode_prepare at ffffffffa4b5431c #34 [ffffb869467eff10] syscall_exit_to_user_mode at ffffffffa5439698 #35 [ffffb869467eff20] do_syscall_64 at ffffffffa5435417 #36 [ffffb869467eff50] entry_SYSCALL_64_after_hwframe at ffffffffa5600099 RIP: 00007f9710e2ec27 RSP: 00007f970e3e6bb8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000561f9c64dd50 RCX: 00007f9710e2ec27 RDX: 00007f970e3e6c10 RSI: 00000000c0984124 RDI: 0000000000000009 RBP: 00007f970e3e6d10 R8: 0000000000000001 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f970e3e6c10 R13: 0000561f9c64bf88 R14: 0000561f9c64cf90 R15: 00000000ffffffff ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b crash> sys KERNEL: ./vmlinux-5.14.21-150400.24.46-default.gz DEBUGINFO: /usr/lib/debug/boot/vmlinux-5.14.21-150400.24.46-default.debug DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 28 DATE: Mon Feb 27 11:43:40 CET 2023 UPTIME: 01:20:10 LOAD AVERAGE: 0.99, 0.80, 0.74 TASKS: 1303 NODENAME: cagafuego RELEASE: 5.14.21-150400.24.46-default VERSION: #1 SMP PREEMPT_DYNAMIC Thu Feb 9 08:38:18 UTC 2023 (2d95137) MACHINE: x86_64 (2000 Mhz) MEMORY: 31.7 GB PANIC: "Kernel panic - not syncing: Hard LOCKUP" The audio device is a Tascam FW-1884 using the snd_firewire_tascam driver. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c1 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tiwai@suse.com --- Comment #1 from Takashi Iwai <tiwai@suse.com> --- Do you have other threads or processes accessing the sound stuff? From the bt, it looks like a spinlock that is blocked by others by some reason. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|kernel-bugs@opensuse.org |tiwai@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c2 --- Comment #2 from Scott Bahling <sbahling@suse.com> --- When I start up my X session, pulseaudio is usually running and jackdbus, but neither connected to the hardware. I usually see the issue when I run "jack_control start" to start jack (which then bridges to pulse audio). Or if I stop the jack server or kill jackdbus (which sometimes hangs). What I just noticed is that pipewire was also running in the background. I was not even aware that I had it running on this system and strange because usually pipewire and pulse do not play well together. I have disabled the pipewire service now and will see if the issue seems to go away. But if you need me to reproduce let me know. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c3 --- Comment #3 from Takashi Iwai <tiwai@suse.com> --- I meant whether you can see other tasks or threads in kdump that are accessing the sound device. JACK is running with the linked PCM streams, and both playback and capture PCM streams were being accessed when the bug happened, I suppose. If my assumption is right, the other task or thread should be also spinning to take some lock while keeping the spinlock held. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c4 --- Comment #4 from Scott Bahling <sbahling@suse.com> --- I'm not that familiar with analyzing kdumps. Crash does show two active jackdbus tasks running in parallel.
14517 6347 4 ffff8d228fb3c000 RU 0.1 308228 18904 jackdbus 14532 6347 15 ffff8d1ffab08000 RU 0.1 308228 18904 jackdbus
crash> files 14517 PID: 14517 TASK: ffff8d228fb3c000 CPU: 4 COMMAND: "jackdbus" ROOT: /@/.snapshots/1/snapshot CWD: /@/.snapshots/1/snapshot/home/@/home/sbahling FD FILE DENTRY INODE TYPE PATH 0 ffff8d2260d48300 ffff8d224040f200 ffff8d2243b0a178 CHR /@/.snapshots/1/snapshot/dev/null 1 ffff8d20c70fc700 ffff8d20096e8540 ffff8d1ec0448080 SOCK UNIX 2 ffff8d20c70fc600 ffff8d1eff9838c0 ffff8d1ec044b480 SOCK UNIX 3 ffff8d1edf47df00 ffff8d1fdc92db00 ffff8d22aa82ef78 REG /@/.snapshots/1/snapshot/home/@/home/sbahling/.log/jack/jackdbus.log 4 ffff8d1edf47c400 ffff8d2092c56a80 ffff8d229666aac0 SOCK UNIX 5 ffff8d1ffdaf0d00 ffff8d1fdc908300 ffff8d2050860678 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_system 6 ffff8d1ffdaede00 ffff8d1ec6c40000 ffff8d1ed5975478 CHR /@/.snapshots/1/snapshot/dev/snd/controlC3 7 ffff8d1ffdaec800 ffff8d1fdc909500 ffff8d1fdc9e1740 SOCK UNIX 8 ffff8d1ffdaec700 ffff8d1ec6c415c0 ffff8d1ed5973378 CHR /@/.snapshots/1/snapshot/dev/snd/pcmC3D0p 9 ffff8d1ffdaecc00 ffff8d1ec6c403c0 ffff8d1ed5976378 CHR /@/.snapshots/1/snapshot/dev/snd/pcmC3D0c 10 ffff8d1ffdaec300 ffff8d1fdc908840 ffff8d1fdc9e3480 SOCK UNIX 11 ffff8d1ffdaedb00 ffff8d1fdc9098c0 ffff8d2050863378 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_freewheel 12 ffff8d1ffdaec400 ffff8d1fdc908180 ffff8d2050863978 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_dbusapi 13 ffff8d228dea2700 ffff8d2257bbfd40 ffff8d224685ebc0 SOCK UNIX crash> files 14532 PID: 14532 TASK: ffff8d1ffab08000 CPU: 15 COMMAND: "jackdbus" ROOT: /@/.snapshots/1/snapshot CWD: /@/.snapshots/1/snapshot/home/@/home/sbahling FD FILE DENTRY INODE TYPE PATH 0 ffff8d2260d48300 ffff8d224040f200 ffff8d2243b0a178 CHR /@/.snapshots/1/snapshot/dev/null 1 ffff8d20c70fc700 ffff8d20096e8540 ffff8d1ec0448080 SOCK UNIX 2 ffff8d20c70fc600 ffff8d1eff9838c0 ffff8d1ec044b480 SOCK UNIX 3 ffff8d1edf47df00 ffff8d1fdc92db00 ffff8d22aa82ef78 REG /@/.snapshots/1/snapshot/home/@/home/sbahling/.log/jack/jackdbus.log 4 ffff8d1edf47c400 ffff8d2092c56a80 ffff8d229666aac0 SOCK UNIX 5 ffff8d1ffdaf0d00 ffff8d1fdc908300 ffff8d2050860678 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_system 6 ffff8d1ffdaede00 ffff8d1ec6c40000 ffff8d1ed5975478 CHR /@/.snapshots/1/snapshot/dev/snd/controlC3 7 ffff8d1ffdaec800 ffff8d1fdc909500 ffff8d1fdc9e1740 SOCK UNIX 8 ffff8d1ffdaec700 ffff8d1ec6c415c0 ffff8d1ed5973378 CHR /@/.snapshots/1/snapshot/dev/snd/pcmC3D0p 9 ffff8d1ffdaecc00 ffff8d1ec6c403c0 ffff8d1ed5976378 CHR /@/.snapshots/1/snapshot/dev/snd/pcmC3D0c 10 ffff8d1ffdaec300 ffff8d1fdc908840 ffff8d1fdc9e3480 SOCK UNIX 11 ffff8d1ffdaedb00 ffff8d1fdc9098c0 ffff8d2050863378 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_freewheel 12 ffff8d1ffdaec400 ffff8d1fdc908180 ffff8d2050863978 REG /@/.snapshots/1/snapshot/dev/shm/jack_sem.1000_default_dbusapi 13 ffff8d228dea2700 ffff8d2257bbfd40 ffff8d224685ebc0 SOCK UNIX I guess that is not normal. On my currently running system there is only one jackdbus task running. I see the parallel jackdbus tasks in another crash dump I have. I do have problems with jackdbus hanging from time to time and I kill the current task - the service restarts a new jackdbus task. Might be that the task that has been "killed" but is really still alive when the next task starts? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c5 --- Comment #5 from Takashi Iwai <tiwai@suse.com> --- Possibly yes, but I can't say surely, either. Could you check the backtrace of another jackdbus process, too? Does it point to the similar spinlock? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1208783 https://bugzilla.suse.com/show_bug.cgi?id=1208783#c6 --- Comment #6 from Scott Bahling <sbahling@suse.com> --- I don't see a spinlock: crash> bt 14517 PID: 14517 TASK: ffff8d228fb3c000 CPU: 4 COMMAND: "jackdbus" #0 [fffffe00000f9e50] crash_nmi_callback at ffffffffa4a71853 #1 [fffffe00000f9e58] nmi_handle at ffffffffa4a3eb85 #2 [fffffe00000f9e98] default_do_nmi at ffffffffa5436d89 #3 [fffffe00000f9eb8] exc_nmi at ffffffffa5436fb7 #4 [fffffe00000f9ef0] end_repeat_nmi at ffffffffa5601658 [exception RIP: smp_call_function_many_cond+291] RIP: ffffffffa4b75833 RSP: ffffb8694d853b88 RFLAGS: 00000202 RAX: 0000000000000011 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8d263fa7af40 RBP: 000000000000000f R8: 0000000000000000 R9: 0000000000000000 R10: 000000000000000f R11: 0000000000000000 R12: ffff8d222ff34a40 R13: ffff8d222ff34a40 R14: 0000000000000000 R15: 0000000000000014 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffffb8694d853b88] smp_call_function_many_cond at ffffffffa4b75833 #6 [ffffb8694d853c00] on_each_cpu_cond_mask at ffffffffa4b75d05 #7 [ffffb8694d853c08] flush_tlb_mm_range at ffffffffa4a9563e #8 [ffffb8694d853c28] unmap_page_range at ffffffffa4cb9096 #9 [ffffb8694d853d08] unmap_vmas at ffffffffa4cb9664 #10 [ffffb8694d853d70] unmap_region at ffffffffa4cc2fe8 #11 [ffffb8694d853e30] __do_munmap at ffffffffa4cc5284 #12 [ffffb8694d853e88] __vm_munmap at ffffffffa4cc55fb #13 [ffffb8694d853ed8] __x64_sys_munmap at ffffffffa4cc56e7 #14 [ffffb8694d853ef0] do_syscall_64 at ffffffffa5435408 #15 [ffffb8694d853f08] syscall_exit_to_user_mode at ffffffffa5439698 #16 [ffffb8694d853f18] do_syscall_64 at ffffffffa5435417 #17 [ffffb8694d853f50] entry_SYSCALL_64_after_hwframe at ffffffffa5600099 RIP: 00007f9710e33147 RSP: 00007ffce7708968 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 00007f970ed10508 RCX: 00007f9710e33147 RDX: 0000000000000001 RSI: 000000000000000c RDI: 00007f9712287000 RBP: 0000561f9c63f520 R8: 0000000000000000 R9: 00000000ffffffff R10: 00007f9710d29468 R11: 0000000000000202 R12: 0000000000000002 R13: 00007ffce77091b0 R14: 00007ffce77089b0 R15: 0000561f9c64fd40 ORIG_RAX: 000000000000000b CS: 0033 SS: 002b -- You are receiving this mail because: You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@suse.com