[Bug 1179632] New: Oops in handle_cmd_completion+0x7e9/0x1120
http://bugzilla.opensuse.org/show_bug.cgi?id=1179632 Bug ID: 1179632 Summary: Oops in handle_cmd_completion+0x7e9/0x1120 Classification: openSUSE Product: openSUSE Distribution Version: Leap 15.2 Hardware: Other OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: jeffm@suse.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- Created attachment 844147 --> http://bugzilla.opensuse.org/attachment.cgi?id=844147&action=edit dmesg.txt I have a UAS device that disconnects occasionally, causing a mess of I/O failures, SCSI disconnects, and USB resets. Eventually, it oopses: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD db5b067 P4D db5b067 PUD db59067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 7 PID: 16121 Comm: duplicacy Kdump: loaded Not tainted 5.3.18-lp152.50-default #1 openSUSE Leap 15.2 Hardware name: System manufacturer System Product Name/PRIME Z370-A II, BIOS 1202 08/15/2019 RIP: 0010:handle_cmd_completion+0x7e9/0x1120 [xhci_hcd] Code: 0f 85 ec 06 00 00 45 0f b7 6c 24 0e 8b 04 24 48 89 ef 41 83 e5 1f 41 8d 55 ff 4c 8d 6c c5 00 49 8b 85 98 01 00 00 89 54 24 10 <48> 8b 70 08 e8 de 82 ff ff 48 89 c1 0f 1f 44 00 00 89 d9 48 c7 c2 RSP: 0000:ffffb20280270e28 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 000000000000000b RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000fffffffe0 RDI: ffff90ac0764e270 RBP: ffff90ac0764e270 R08: ffff90ab3f842a70 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff90ab3f840810 R13: ffff90ac0764e2c8 R14: ffff90ae07eee920 R15: ffff90ac0764e320 FS: 00007f2145068700(0000) GS:ffff90b34edc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000004c9e006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? try_to_wake_up+0x460/0x4d0 xhci_irq+0x2c7/0x440 [xhci_hcd] ? run_timer_softirq+0x75/0x440 __handle_irq_event_percpu+0x46/0x1a0 handle_irq_event_percpu+0x30/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x9b/0x1e0 handle_irq+0x1f/0x30 do_IRQ+0x49/0xd0 common_interrupt+0xf/0xf </IRQ> The RIP resolves to drivers/usb/host/xhci-ring.c:1157 1148 static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id, 1149 union xhci_trb *trb, u32 cmd_comp_code) 1150 { 1151 struct xhci_virt_device *vdev; 1152 struct xhci_ep_ctx *ep_ctx; 1153 unsigned int ep_index; 1154 1155 ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3])); 1156 vdev = xhci->devs[slot_id]; 1157 ep_ctx = xhci_get_ep_ctx(xhci, vdev->out_ctx, ep_index); 1158 trace_xhci_handle_cmd_reset_ep(ep_ctx); 1159 1160 /* This command will only fail if the endpoint wasn't halted, 1161 * but we don't care. 1162 */ 1163 xhci_dbg_trace(xhci, trace_xhci_dbg_reset_ep, 1164 "Ignoring reset ep completion code of %u", cmd_comp_code); 1165 I expect that xhci->devs[slot_id] is NULL and is getting cleared via xhci_free_virt_device when the device disappears. -- You are receiving this mail because: You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1179632 http://bugzilla.opensuse.org/show_bug.cgi?id=1179632#c1 --- Comment #1 from Jeff Mahoney <jeffm@suse.com> --- Unfortunately, the kdump environment OOMed while capturing the vmcore so all I have is the dmesg. -- You are receiving this mail because: You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1179632 http://bugzilla.opensuse.org/show_bug.cgi?id=1179632#c2 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jeffm@suse.com, | |tiwai@suse.com Flags| |needinfo?(jeffm@suse.com) --- Comment #2 from Takashi Iwai <tiwai@suse.com> --- As a mandatory question: is it a regression from the earlier releases? If yes, which was the last working kernel? Through a quick glance, dmesg showed quite lots of other errors beforehand, e.g. [28884.413438] sd 2:0:0:0: [sdz] tag#25 data cmplt err -32 uas-tag 22 inflight: CMD [28884.413443] sd 2:0:0:0: [sdz] tag#25 CDB: Read(16) 88 00 00 00 00 00 6b 8e 0b 00 00 00 01 00 00 00 but those don't seem always leading to the Oops. Do such errors happen occasionally in the past, too? -- You are receiving this mail because: You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1179632 http://bugzilla.opensuse.org/show_bug.cgi?id=1179632#c3 Jeff Mahoney <jeffm@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(jeffm@suse.com) | --- Comment #3 from Jeff Mahoney <jeffm@suse.com> --- I don���t know if it���s a regression. I don���t use these devices often and I started using them this week to migrate an array to larger disks so they���re seeing heavy I/O activity. I don���t expect the SCSI errors are necessarily part of it. I think it���s the usb device disconnect in combination with the usb-level errors that are causing a use-after-free. This is also not the most recent kernel. I���ve installed the latest update, though nothing stood out in the changelog that looked like a fix. I���ve also adjusted the memory reserved for kdump to (hopefully) capture a dump next time. -- You are receiving this mail because: You are the assignee for the bug.
participants (1)
-
bugzilla_noreply@suse.com