[Bug 1167793] New: kernel crash when watching go to meeting L&L
http://bugzilla.suse.com/show_bug.cgi?id=1167793

            Bug ID: 1167793
           Summary: kernel crash when watching go to meeting L&L
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: jreidinger@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

On Thursday, 26.03.2020 at 13:06 +0100, Jiri Slaby wrote:
> On 26. 03. 20, 12:56, josef Reidinger wrote:
>> OK, reproduced. Now I have vmcore, which I can upload somewhere and also more detailed info in crash dmesg:
>>
>> [ 94.326703] show_signal_msg: 34 callbacks suppressed
>> [ 94.326706] chrome[4992]: segfault at 39 ip 000055b2354eca04 sp 00007ffd8aab1af0 error 6 in chrome[55b2351f1000+7287000]
>> [ 94.326712] Code: cc cc cc cc cc cc 55 48 89 e5 bf 08 00 00 00 e8 62 ed 73 02 48 8d 0d 3b bb 03 07 48 89 08 5d c3 cc cc cc cc cc cc 55 48 89 e5 <c6> 04 25 39 00 00 00 21 5d c3 cc cc 55 48 89 e5 b8 01 00 00 00 48
>> [ 94.453075] BUG: unable to handle page fault for address: ffff8ca8947d8000
>> [ 94.453081] #PF: supervisor write access in kernel mode
>> [ 94.453084] #PF: error_code(0x0002) - not-present page
>> [ 94.453086] PGD 59e01067 P4D 59e01067 PUD 3b438c063 PMD 3947d3063 PTE 0
>> [ 94.453095] Oops: 0002 [#1] SMP NOPTI
>> [ 94.453101] CPU: 6 PID: 0 Comm: swapper/6 Kdump: loaded Tainted: P C OE 5.5.9-1-default #1 openSUSE Tumbleweed (unreleased)
>> [ 94.453103] Hardware name: ASUSTeK COMPUTER INC. TUF Gaming FX705DT_FX705DT/FX705DT, BIOS FX705DT.308 09/19/2019
>> [ 94.453111] RIP: 0010:__memset+0x24/0x30
>> [ 94.453115] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
>> [ 94.453117] RSP: 0018:ffffa208802f0898 EFLAGS: 00010216
>> [ 94.453120] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 000000001ff1a800
>> [ 94.453122] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff8ca8947d7ffa
>> [ 94.453124] RBP: ffff8ca916f36800 R08: ffff8ca8d612e800 R09: ffff8ca8940ac002
>> [ 94.453126] R10: ffff8ca8940a8000 R11: 0000000000000568 R12: ffff8ca917039ec0
>> [ 94.453128] R13: 000000000000000c R14: 00000000304d434e R15: ffff8ca90835a800
>> [ 94.453131] FS: 0000000000000000(0000) GS:ffff8ca936d80000(0000) knlGS:0000000000000000
>> [ 94.453133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 94.453135] CR2: ffff8ca8947d8000 CR3: 00000003aca6a000 CR4: 00000000003406e0
>> [ 94.453137] Call Trace:
>> [ 94.453140] <IRQ>
>> [ 94.453150] cdc_ncm_fill_tx_frame+0x597/0x700 [cdc_ncm]
>
> It is a long-standing issue. There is a guy reporting this on 5.2: https://lists.zx2c4.com/pipermail/wireguard/2019-August/004386.html
>
> I would try reporting it upstream. And upstream is ... Oliver Neukum. CCed.
>
> You should perhaps create a bug too.
Please make a bugzilla entry, enable dynamic debugging for module cdc_ncm, reload it and please reproduce.

Regards
Oliver

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c1

Josef Reidinger <jreidinger@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oneukum@suse.com
              Flags|                            |needinfo?(oneukum@suse.com)

--- Comment #1 from Josef Reidinger <jreidinger@suse.com> ---
Oliver:
1. how to enable dynamic debugging for that module?
2. I reproduced it when using GTM L&L. Usual GTM calls do not cause a segfault, so I am not sure when I can reproduce it. But I can leave it running this way, and if it segfaults I will update this bug.
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c2

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jreidinger@suse.com
              Flags|needinfo?(oneukum@suse.com) |needinfo?(jreidinger@suse.com)

--- Comment #2 from Oliver Neukum <oneukum@suse.com> ---
(In reply to Josef Reidinger from comment #1)
> Oliver:
> 1. how to enable dynamic debugging for that module?

echo "module acm_ncm +fmp" > /sys/kernel/debug/dynamic_debug/control

and ramp up the logging level to maximum. The result is in dmesg.

> 2. I reproduced it when using GTM L&L. Usual GTM calls do not cause a segfault, so I am not sure when I can reproduce it. But I can leave it running this way, and if it segfaults I will update this bug.

Then give the full dmesg without a crash. I need the setup for your device and will assume it was the same for the crash.
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c3

--- Comment #3 from Oliver Neukum <oneukum@suse.com> ---
(In reply to Oliver Neukum from comment #2)
> (In reply to Josef Reidinger from comment #1)
> > Oliver:
> > 1. how to enable dynamic debugging for that module?
>
> echo "module acm_ncm +fmp" > /sys/kernel/debug/dynamic_debug/control

Sorry, cdc_ncm
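For readers who prefer to script this step, the corrected command can be sketched in Python. This is only an illustration under assumptions: the helper names `dyndbg_query` and `enable_dyndbg` are hypothetical, debugfs is assumed to be mounted at /sys/kernel/debug as in the command above, and writing to the real control file needs root. The flags mean: `p` enables the message, `f` adds the function name, `m` adds the module name.

```python
from pathlib import Path

# Control file used in the command above; assumes debugfs is mounted
# at /sys/kernel/debug.
DYNDBG_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(module: str, flags: str = "+fmp") -> str:
    """Build a dynamic debug query for one module (hypothetical helper).

    Flags: p enables the message, f prefixes the function name,
    m prefixes the module name.
    """
    if not module or any(ch.isspace() for ch in module):
        raise ValueError("module name must be a single word")
    return f"module {module} {flags}"


def enable_dyndbg(module: str, flags: str = "+fmp",
                  control: Path = DYNDBG_CONTROL) -> None:
    """Write the query to the control file (needs root on a real system)."""
    Path(control).write_text(dyndbg_query(module, flags) + "\n")


# Usage, as root:
#   enable_dyndbg("cdc_ncm")
# then reload the module, reproduce, and collect dmesg.
```

Passing `"-fmp"` as flags would switch the messages off again.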
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c4

Michal Hocko <mhocko@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mhocko@suse.com

--- Comment #4 from Michal Hocko <mhocko@suse.com> ---
FWIW and so that it doesn't get lost in the mailing list:

All code
========
   0:   cc                      int3
   1:   cc                      int3
   2:   cc                      int3
   3:   cc                      int3
   4:   cc                      int3
   5:   cc                      int3
   6:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   b:   49 89 f9                mov    %rdi,%r9
   e:   48 89 d1                mov    %rdx,%rcx
  11:   83 e2 07                and    $0x7,%edx
  14:   48 c1 e9 03             shr    $0x3,%rcx
  18:   40 0f b6 f6             movzbl %sil,%esi
  1c:   48 b8 01 01 01 01 01    movabs $0x101010101010101,%rax
  23:   01 01 01
  26:   48 0f af c6             imul   %rsi,%rax
  2a:   f3 48 ab                rep stos %rax,%es:*(%rdi)    <-- trapping instruction
  2d:   89 d1                   mov    %edx,%ecx
  2f:   f3 aa                   rep stos %al,%es:(%rdi)
  31:   4c 89 c8                mov    %r9,%rax
  34:   c3                      retq
  35:   90                      nop
  36:   49 89 f9                mov    %rdi,%r9
  39:   40 88 f0                mov    %sil,%al
  3c:   48 89 d1                mov    %rdx,%rcx
  3f:   f3                      repz

[ 94.453117] RSP: 0018:ffffa208802f0898 EFLAGS: 00010216
[ 94.453120] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 000000001ff1a800
[ 94.453122] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff8ca8947d7ffa
[ 94.453124] RBP: ffff8ca916f36800 R08: ffff8ca8d612e800 R09: ffff8ca8940ac002

Interesting. The pointer to the memory to initialize was in rdi, with a copy in r9. The number of bytes was initially in rdx, but a copy is in rcx (shifted right by 3, so it counts 8-byte words), and here we can see the problem already. I doubt that the buffer was really ~512M 8-byte words large. For completeness, the crash happens while trying to store 0 to the address in rdi, which is really far away from the given address.

I didn't get to look at the call path, but it is likely that a bogus buffer length is either provided by the HW and not checked properly, or it has been miscalculated on the way.
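The register values allow the original memset length to be reconstructed. The sketch below is plain arithmetic on the numbers from the oops; it assumes, per the disassembly, that r9 still holds the original destination, that rcx holds the 8-byte words not yet written by `rep stos`, and that rdx holds the byte remainder. The function name is hypothetical.

```python
# Register values copied from the oops above.
RDI = 0xffff8ca8947d7ffa  # store address of the faulting iteration
R9 = 0xffff8ca8940ac002   # copy of the original destination pointer
RCX = 0x1ff1a800          # 8-byte words still to be written
RDX = 0x7                 # length & 7, the trailing byte count
CR2 = 0xffff8ca8947d8000  # faulting address reported by the CPU


def reconstruct_memset_len(rdi: int, r9: int, rcx: int, rdx: int) -> int:
    """Recover the byte count passed to memset from the oops registers."""
    written = rdi - r9            # bytes already zeroed before the fault
    assert written % 8 == 0       # rep stosq advances in 8-byte steps
    total_words = written // 8 + rcx
    return total_words * 8 + rdx


length = reconstruct_memset_len(RDI, R9, RCX, RDX)

# The faulting 8-byte store starts at RDI and crosses into the page at CR2.
assert RDI <= CR2 < RDI + 8

print(hex(RDI - R9))  # bytes written before the fault
print(hex(length))    # reconstructed length argument
```

Under these assumptions the length comes out as 0xffffffff, which also matches the value in RBX. That would be consistent with a 32-bit length of -1 (an underflow) reaching memset rather than with a plausible buffer size, though the oops alone does not show where the value came from.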
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c5

Josef Reidinger <jreidinger@suse.com> changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
              Flags|needinfo?(jreidinger@suse.com) |

--- Comment #5 from Josef Reidinger <jreidinger@suse.com> ---
Created attachment 833972
  --> http://bugzilla.suse.com/attachment.cgi?id=833972&action=edit
dmesg

I am not sure if I did it correctly, as it is the first time I have debugged a kernel oops. I ran that echo and then grabbed the output from dmesg with `dmesg --level debug,info,notice,warn,err,crit,alert,emerg`; the output is attached here. I also tried to reproduce it, but failed to do so. It is my own machine, so it is no problem to provide more output.
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c6

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(jreidinger@suse.com)

--- Comment #6 from Oliver Neukum <oneukum@suse.com> ---
This looks like the device may provide a bogus wNtbOutMaxDatagrams. Unfortunately this is queried by a request from the device and not part of the interface descriptors, so it will have to be yielded in a debug output. Something is eating debug output. Can you please recheck?

It is possible that your device is one of the Huawei devices that do not reliably work in 16 bit mode. In that case there is an upstream patch for support of 32 bit mode.
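To make the wNtbOutMaxDatagrams remark concrete, here is a rough sketch of how the 28-byte NTB parameter block returned by the device could be unpacked. The field order is the one I recall from the NCM specification (mirrored by the kernel's `struct usb_cdc_ncm_ntb_parameters`) and should be checked against the spec. The sample bytes are invented for illustration and are not from the reporter's device.

```python
import struct

# Little-endian, packed: 28 bytes in total (assumed layout, see lead-in).
NTB_PARAMETERS = struct.Struct("<HHIHHHHIHHHH")
NTB_FIELDS = (
    "wLength", "bmNtbFormatsSupported", "dwNtbInMaxSize",
    "wNdpInDivisor", "wNdpInPayloadRemainder", "wNdpInAlignment",
    "wPadding1", "dwNtbOutMaxSize", "wNdpOutDivisor",
    "wNdpOutPayloadRemainder", "wNdpOutAlignment", "wNtbOutMaxDatagrams",
)


def parse_ntb_parameters(raw: bytes) -> dict:
    """Unpack an NTB parameter block into a field-name -> value dict."""
    if len(raw) < NTB_PARAMETERS.size:
        raise ValueError("short NTB parameter block")
    return dict(zip(NTB_FIELDS, NTB_PARAMETERS.unpack_from(raw)))


# Hypothetical response, NOT captured from the device in this bug.
sample = NTB_PARAMETERS.pack(28, 0x0001, 16384, 4, 0, 4, 0, 16384, 4, 0, 4, 0)
params = parse_ntb_parameters(sample)
print(params["dwNtbOutMaxSize"], params["wNtbOutMaxDatagrams"])
```

With real data, the two values printed at the end are the ones worth comparing against the frame sizes the driver actually builds.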
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c7

Josef Reidinger <jreidinger@suse.com> changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
              Flags|needinfo?(jreidinger@suse.com) |needinfo?(oneukum@suse.com)

--- Comment #7 from Josef Reidinger <jreidinger@suse.com> ---
(In reply to Oliver Neukum from comment #6)
> This looks like the device may provide a bogus wNtbOutMaxDatagrams. Unfortunately this is queried by a request from the device and not part of the interface descriptors, so it will have to be yielded in a debug output. Something is eating debug output. Can you please recheck?
>
> It is possible that your device is one of the Huawei devices that do not reliably work in 16 bit mode. In that case there is an upstream patch for support of 32 bit mode.
Hi Oliver,
I am afraid I need somewhat more detailed instructions on how to do it. A link is enough. I tried to google it but failed. Here is the detailed lsusb output for this Huawei device:

lsusb -D /dev/bus/usb/001/006
Device: ID 12d1:1506 Huawei Technologies Co., Ltd. Modem/Networkcard
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x12d1 Huawei Technologies Co., Ltd.
  idProduct          0x1506 Modem/Networkcard
  bcdDevice            1.02
  iManufacturer           3 HUAWEI
  iProduct                2 HUAWEI Mobile
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x00ce
    bNumInterfaces          6
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              500mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      2
      bInterfaceProtocol      1
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               5
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      2
      bInterfaceProtocol     22
      iInterface              0
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  06 24 1a 00 01 1f
      ** UNRECOGNIZED:  0d 24 0f 01 05 00 00 00 ea 05 03 00 01
      ** UNRECOGNIZED:  05 24 06 01 01
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               5
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       1
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      2
      bInterfaceProtocol     22
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               5
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      2
      bInterfaceProtocol      3
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        3
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      2
      bInterfaceProtocol      2
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x86  EP 6 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x04  EP 4 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval              32
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        4
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x05  EP 5 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x87  EP 7 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        5
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x06  EP 6 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x88  EP 8 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
can't get debug descriptor: Resource temporarily unavailable
Device Status:     0x0000
  (Bus Powered)
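The four descriptors that lsusb prints as UNRECOGNIZED on interface 1 look like CDC class-specific functional descriptors (bDescriptorType 0x24). lsusb presumably skips them because the interface class is vendor specific. A minimal decoding sketch follows; the subtype numbers and field layouts are the ones from the CDC and NCM specifications as I recall them, and the function name is hypothetical.

```python
import struct

CS_INTERFACE = 0x24  # class-specific interface descriptor type

# subtype -> (name, little-endian layout of the bytes after the 3-byte
# header, field names). The Union layout assumes exactly one slave
# interface, which matches the 5-byte descriptor in this dump.
LAYOUTS = {
    0x00: ("Header", "<H", ("bcdCDC",)),
    0x06: ("Union", "<BB", ("bMasterInterface", "bSlaveInterface0")),
    0x0F: ("Ethernet Networking", "<BIHHB",
           ("iMACAddress", "bmEthernetStatistics", "wMaxSegmentSize",
            "wNumberMCFilters", "bNumberPowerFilters")),
    0x1A: ("NCM", "<HB", ("bcdNcmVersion", "bmNetworkCapabilities")),
}


def decode_functional_descriptor(hex_bytes: str):
    """Decode one descriptor given as space-separated hex bytes."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 3 or raw[0] != len(raw) or raw[1] != CS_INTERFACE:
        raise ValueError("not a well-formed CS_INTERFACE descriptor")
    name, layout, fields = LAYOUTS[raw[2]]
    return name, dict(zip(fields, struct.unpack(layout, raw[3:])))


for line in ("05 24 00 10 01",
             "06 24 1a 00 01 1f",
             "0d 24 0f 01 05 00 00 00 ea 05 03 00 01",
             "05 24 06 01 01"):
    print(decode_functional_descriptor(line))
```

Read this way, the device advertises an NCM function with wMaxSegmentSize 1514. As noted in comment #6, wNtbOutMaxDatagrams is not part of these descriptors.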
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c8

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(oneukum@suse.com) |

--- Comment #8 from Oliver Neukum <oneukum@suse.com> ---
Sorry for being unclear. A potential fix is in upstream commit 0fa81b304a7973a499f844176ca031109487dd31

Could you test it?
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c9

--- Comment #9 from Josef Reidinger <jreidinger@suse.com> ---
(In reply to Oliver Neukum from comment #8)
> Sorry for being unclear. A potential fix is in upstream 0fa81b304a7973a499f844176ca031109487dd31
>
> Could you test it?

Yes, I will try. Sadly I do not see that crash outside of that L&L session, so I will try it when such a large-scale L&L happens again. Thanks
http://bugzilla.suse.com/show_bug.cgi?id=1167793

Jiri Slaby <jslaby@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jslaby@suse.com
              Flags|                            |needinfo?(jreidinger@suse.com)
http://bugzilla.suse.com/show_bug.cgi?id=1167793#c10

Josef Reidinger <jreidinger@suse.com> changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |RESOLVED
         Resolution|---                            |WORKSFORME
              Flags|needinfo?(jreidinger@suse.com) |

--- Comment #10 from Josef Reidinger <jreidinger@suse.com> ---
Well, I have not seen the crash again, and I need to return that HW device, so I cannot add more observations. Let's close it.