[Bug 1028966] New: Kernel oops when connecting ethernet cable to USB-C ethernet card
http://bugzilla.suse.com/show_bug.cgi?id=1028966

            Bug ID: 1028966
           Summary: Kernel oops when connecting ethernet cable to USB-C
                    ethernet card
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: oneukum@suse.com
          Reporter: rbrown@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 717094
  --> http://bugzilla.suse.com/attachment.cgi?id=717094&action=edit
picture of the oops

System: Dell Precision Workstation 5510 (SUSE Corporate Laptop)
Operating System: Tumbleweed 20170308
Kernel: 4.10.1

This laptop does not have a wired Ethernet port on the chassis; instead it is supplied with a USB-C Ethernet card. This card appears to use the r8152 module.

This Ethernet card works perfectly fine as long as it is present during system boot. Detaching and reattaching the card has no negative effect on the system. The Ethernet cable can be connected and disconnected from the card with no negative effect on the system.

The problem occurs ONLY when the system is booted WITHOUT the USB-C device connected at boot. In this case, the card can be connected and disconnected from the system without problems, but as soon as an Ethernet cable is also connected to the card, the kernel panics.

A photo of the panic is attached. There is no kdump, sadly, as kdump seems to be broken on Tumbleweed.

In the follow-up comments I will attach two logs from dmesg on the system.

dmesg.log shows the full output of the system, from boot to panic.

At timecode 369.0 onwards I connected the USB-C Ethernet card. You can see an ACPI error followed by what seem to me to be the expected PCI, USB, and r8152 messages related to the device being connected.

At timecode 406.0 onwards I then connected an Ethernet cable to the USB-C Ethernet card. You can see what looks to me like an error disconnecting usb 4-1 (the r8152 device), followed immediately by what looks like a successful attempt to disconnect usb 3-1 (the hub which seems to be internal to the USB-C dongle), releasing its PCI buses just as the kernel explodes.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c1

--- Comment #1 from Richard Brown <rbrown@suse.com> ---
Created attachment 717095
  --> http://bugzilla.suse.com/attachment.cgi?id=717095&action=edit
dmesg log

http://bugzilla.suse.com/show_bug.cgi?id=1028966#c2

--- Comment #2 from Richard Brown <rbrown@suse.com> ---
Created attachment 717096
  --> http://bugzilla.suse.com/attachment.cgi?id=717096&action=edit
dmesg log grep ACPI

This is a log of "dmesg | grep ACPI", just to highlight all the ACPI errors that happen as the system boots up. This makes me think the ACPI errors that occur when the USB-C device is connected might not be related to the oops.
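One way to check whether the ACPI errors cluster at boot or only appear after the hotplug is to split the ACPI lines of a dmesg log at the hotplug time. The following is a minimal sketch, assuming the default absolute `[ seconds.micros]` timestamp prefix; the function name `split_acpi_lines` and the sample lines are hypothetical and not taken from the attached logs, and 369.0 is the connect time quoted in the original report.

```python
import re

# Matches the absolute "[  123.456789] message" prefix of a dmesg line.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s(.*)$")


def split_acpi_lines(lines, hotplug_time):
    """Return (before, after): ACPI messages logged before hotplug_time,
    and those logged at or after it."""
    before, after = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or "ACPI" not in m.group(2):
            continue  # skip non-dmesg lines and non-ACPI messages
        stamp = float(m.group(1))
        (before if stamp < hotplug_time else after).append(m.group(2))
    return before, after


# Hypothetical sample lines, NOT copied from the attachments.
sample = [
    "[    0.120000] ACPI Error: example error during boot",
    "[    1.500000] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "[  369.100000] ACPI Error: example error after hotplug",
]
before, after = split_acpi_lines(sample, hotplug_time=369.0)
print(len(before), "ACPI lines before hotplug,", len(after), "after")
```

If the same messages already appear in the `before` list, that would support the guess that the ACPI errors at connect time are unrelated to the oops.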
http://bugzilla.suse.com/show_bug.cgi?id=1028966

Richard Brown <rbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sebastian.chlad@novell.com

http://bugzilla.suse.com/show_bug.cgi?id=1028966

Martin Pluskal <mpluskal@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpluskal@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c3

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rbrown@suse.com
              Flags|                            |needinfo?(rbrown@suse.com)

--- Comment #3 from Oliver Neukum <oneukum@suse.com> ---
This is a Type-C plug. Plugging anything into it triggers the full hotplug cycle: USB, Thunderbolt and DisplayPort. It is unclear which of these is broken on your device.

Please provide the log of the working case:
1. boot with device
2. unplug the device
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c4

Richard Brown <rbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(rbrown@suse.com)  |

--- Comment #4 from Richard Brown <rbrown@suse.com> ---
Created attachment 717267
  --> http://bugzilla.suse.com/attachment.cgi?id=717267&action=edit
dmesg of working system, boot and disconnect

Log included, showing a full boot and the device being unplugged.

I will also include another log showing it being connected again (as originally reported, it doesn't crash in this case).

http://bugzilla.suse.com/show_bug.cgi?id=1028966#c5

--- Comment #5 from Richard Brown <rbrown@suse.com> ---
Created attachment 717268
  --> http://bugzilla.suse.com/attachment.cgi?id=717268&action=edit
dmesg of working system, boot and disconnect and reconnect

http://bugzilla.suse.com/show_bug.cgi?id=1028966#c6

Oliver Kurz <okurz@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |okurz@suse.com

--- Comment #6 from Oliver Kurz <okurz@suse.com> ---
https://bugzilla.suse.com/show_bug.cgi?id=1029634 might be related
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c7

Richard Brown <rbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oneukum@suse.com
              Flags|                            |needinfo?(oneukum@suse.com)

--- Comment #7 from Richard Brown <rbrown@suse.com> ---
Oliver, the plot thickens.

I just had the identical behaviour happen with a different USB 3.0 *A-Type* ethernet adapter.

It's also an r8152 chipset, but this means we can rule out anything USB-C specific, I guess.

Wanna borrow the adapter? :)
http://bugzilla.suse.com/show_bug.cgi?id=1028966

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CONFIRMED
              Flags|needinfo?(oneukum@suse.com) |
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c9

Richard Palethorpe <richard.palethorpe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richard.palethorpe@suse.com

--- Comment #9 from Richard Palethorpe <richard.palethorpe@suse.com> ---
I am now on kernel 4.10.5 with the same laptop and port replicator. My problem is different from Mr Brown's: if I have the USB-C (Thunderbolt?) port replicator plugged in at boot, the laptop freezes once it reaches the login screen. The Caps Lock key starts slowly pulsating, then after a while the laptop restarts. There is no oops message in the kernel log.

If I leave the replicator unplugged until the laptop has completed booting and then plug it in, there is no crash; however, it seems the laptop's keyboard and trackpad then stop working. Occasionally the Ethernet device stops working and disappears.

I will upload the log of a boot which froze and one which did not.
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c10

--- Comment #10 from Richard Palethorpe <richard.palethorpe@suse.com> ---
Created attachment 719729
  --> http://bugzilla.suse.com/attachment.cgi?id=719729&action=edit
dmesg freeze after boot

http://bugzilla.suse.com/show_bug.cgi?id=1028966#c11

--- Comment #11 from Richard Palethorpe <richard.palethorpe@suse.com> ---
Created attachment 719730
  --> http://bugzilla.suse.com/attachment.cgi?id=719730&action=edit
Boot without freeze (but no laptop keyboard/trackpad)

http://bugzilla.suse.com/show_bug.cgi?id=1028966#c12

--- Comment #12 from Richard Palethorpe <richard.palethorpe@suse.com> ---
Created attachment 719784
  --> http://bugzilla.suse.com/attachment.cgi?id=719784&action=edit
ACPI AML/ASL Disassembly

Seeing as there are so many ACPI errors, I have included a dump of the disassembled hardware description tables.
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c13

--- Comment #13 from Richard Palethorpe <richard.palethorpe@suse.com> ---
The following errors are seen when plugging the port replicator in. Firstly:

[ 105.425932] xhci_hcd 0000:3e:00.0: xHCI Host Controller
[ 105.425934] xhci_hcd 0000:3e:00.0: new USB bus registered, assigned bus number 4
[ 105.425988] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[ 105.425989] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 105.425990] usb usb4: Product: xHCI Host Controller
[ 105.425991] usb usb4: Manufacturer: Linux 4.10.5-1-default xhci-hcd
[ 105.425992] usb usb4: SerialNumber: 0000:3e:00.0
[ 105.426158] hub 4-0:1.0: USB hub found
[ 105.426174] hub 4-0:1.0: 2 ports detected
[ 105.783147] usb 4-1: Device not responding to setup address.
[ 110.448469] xhci_hcd 0000:3e:00.0: remove, state 1
[ 110.448476] usb usb4: USB disconnect, device number 1
[ 111.081323] xhci_hcd 0000:3e:00.0: Stopped the command ring failed, maybe the host is dead
[ 111.081335] xhci_hcd 0000:3e:00.0: Host halt failed, -19
[ 111.081336] xhci_hcd 0000:3e:00.0: Abort command ring failed
[ 111.081340] xhci_hcd 0000:3e:00.0: Timeout while waiting for setup device command
[ 111.081343] xhci_hcd 0000:3e:00.0: HC died; cleaning up
[ 111.288351] usb 4-1: device not accepting address 2, error -62
[ 111.288370] usb usb4-port1: couldn't allocate usb_device
[ 111.288562] xhci_hcd 0000:3e:00.0: Host halt failed, -19
[ 111.288571] xhci_hcd 0000:3e:00.0: Host not accessible, reset failed.
[ 111.288573] xhci_hcd 0000:3e:00.0: USB bus 4 deregistered
[ 111.288579] xhci_hcd 0000:3e:00.0: remove, state 4

This says to me that the host controller is taking too long to respond. I would guess this is due to a stall rather than it being put in the wrong state. So maybe increasing the timeout might fix it?

I am guessing from the number of ports and the USB bus number it gets that this is the controller the Ethernet adapter is connected to.

To see if this error always happens, I reconnected the port replicator again:

[ +0.000065] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[ +0.000004] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ +0.000003] usb usb4: Product: xHCI Host Controller
[ +0.000009] usb usb4: Manufacturer: Linux 4.10.5-1-default xhci-hcd
[ +0.000003] usb usb4: SerialNumber: 0000:3e:00.0
[ +0.000253] hub 4-0:1.0: USB hub found
[ +0.000022] hub 4-0:1.0: 2 ports detected
[ +0.328565] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[ +0.040609] usb 4-1: device descriptor read/8, error -71
[ +0.107134] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[ +4.536137] xhci_hcd 0000:3e:00.0: remove, state 1
[ +0.000008] usb usb4: USB disconnect, device number 1
[ +0.727842] usb 4-1: device descriptor read/8, error -110
[ +5.119974] xhci_hcd 0000:3e:00.0: Error while assigning device slot ID
[ +0.000015] xhci_hcd 0000:3e:00.0: Max number of devices this xHCI host supports is 64.
[ +0.000008] usb usb4-port1: couldn't allocate usb_device
[ +0.001545] xhci_hcd 0000:3e:00.0: USB bus 4 deregistered
[ +0.000023] xhci_hcd 0000:3e:00.0: remove, state 4

This is a different error, but looking at the code and guessing a bit, I think it could also be caused by a stall. It seems that the system retries and the device begins working.

There don't seem to be any similar-looking errors in rbrown's logs. Possibly we have different firmware installed on the laptop and port replicator, because rbrown's is at least half a year newer. There are firmware updates available, but currently there is only an update utility which runs on Windows. Allegedly Dell are working on a Linux version.

The port is Thunderbolt 3, which uses USB-C.
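The negative numbers in the messages above (-19, -62, -71, -110) are Linux errno values. As a reading aid, here is a minimal sketch that maps them to their symbolic names, assuming the generic Linux numbering from the kernel's asm-generic errno headers. The table and the `decode` helper are illustrative only; the USB stack may attach a more specific meaning to some of these codes than the generic description given here.

```python
# Linux errno numbers (asm-generic errno-base.h / errno.h).
# Only the codes that appear in the two logs above are listed.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    62: ("ETIME", "Timer expired"),
    71: ("EPROTO", "Protocol error"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode(code):
    """Format a (negative) kernel return code with its errno name."""
    name, text = LINUX_ERRNO.get(abs(code), ("unknown", "not in this table"))
    return f"{code}: {name} ({text})"


for code in (-19, -62, -71, -110):
    print(decode(code))
```

Two of the four codes are timeouts, which is at least consistent with the guess that the controller is stalling rather than rejecting the request outright.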
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c14

--- Comment #14 from Oliver Neukum <oneukum@suse.com> ---
Testing with v4.11-rc2 -> adapter works
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c15

--- Comment #15 from Richard Brown <rbrown@suse.com> ---
(In reply to Oliver Neukum from comment #14)
> Testing with v4.11-rc2 -> adapter works

I can also confirm that the original reproduction steps no longer trigger the panic on kernel 4.10.8 (latest TW).

HOWEVER, this bug is not fixed, just less painful.

If I now boot up without any r8152 USB devices connected, then connect TWO r8152 devices, plugging the Ethernet cable into EITHER of them causes a kernel panic as before.

So while something has clearly improved in recent kernels, there's still something buggy lurking in there.
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c17

Oliver Neukum <oneukum@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |FIXED

--- Comment #17 from Oliver Neukum <oneukum@suse.com> ---
The fix has gone in through stable. Please reopen if it does not fix the issue.
http://bugzilla.suse.com/show_bug.cgi?id=1028966#c18

Richard Palethorpe <richard.palethorpe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #18 from Richard Palethorpe <richard.palethorpe@suse.com> ---
I am still suffering random crashes in 4.10.12-1 Tumbleweed. I am not sure whether you are saying it is only fixed in 4.11.

DMESG log from most recent crash:

[May10 11:03] ------------[ cut here ]------------
[ +0.000027] WARNING: CPU: 3 PID: 0 at ../net/sched/sch_generic.c:316 dev_watchdog+0x229/0x230
[ +0.000005] NETDEV WATCHDOG: enp62s0u1u2 (r8152): transmit queue 0 timed out
[ +0.000004] Modules linked in: binfmt_misc dm_mod iscsi_ibft iscsi_boot_sysfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables fuse af_packet cdc_ether usbnet snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device r8152 mii msr uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev btusb btrtl hid_multitouch nls_iso8859_1 nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi xfs dell_rbtn libcrc32c dell_laptop kvm iTCO_wdt dell_led i2c_designware_platform dell_wmi arc4 irqbypass
[ +0.000067] mei_wdt iTCO_vendor_support i2c_designware_core dell_smbios crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek pcbc dcdbas snd_hda_codec_generic dell_smm_hwmon iwlmvm intel_hid sparse_keymap joydev mac80211 snd_hda_intel snd_hda_codec snd_hda_core aesni_intel snd_hwdep idma64 snd_pcm rtsx_pci_ms aes_x86_64 crypto_simd iwlwifi glue_helper snd_timer pcspkr virt_dma i2c_i801 cryptd snd soundcore memstick cfg80211 mei_me hci_uart mei btbcm thermal btqca intel_pch_thermal intel_lpss_pci btintel fan processor_thermal_device intel_soc_dts_iosf bluetooth rfkill int3403_thermal dell_smo8800 acpi_als pinctrl_sunrisepoint kfifo_buf pinctrl_intel industrialio tpm_tis tpm_tis_core int3402_thermal intel_lpss_acpi intel_lpss tpm int340x_thermal_zone shpchp fjes int3400_thermal
[ +0.000065] acpi_pad battery acpi_thermal_rel ac btrfs xor hid_generic usbhid raid6_pq rtsx_pci_sdmmc mmc_core nouveau mxm_wmi ttm i915 i2c_algo_bit crc32c_intel drm_kms_helper syscopyarea xhci_pci sysfillrect sysimgblt serio_raw fb_sys_fops rtsx_pci xhci_hcd mfd_core usbcore drm i2c_hid wmi video button sg efivarfs
[ +0.000036] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.10.12-1-default #1
[ +0.000003] Hardware name: Dell Inc. Precision 5510/0N8J4R, BIOS 01.02.00 04/07/2016
[ +0.000002] Call Trace:
[ +0.000021] <IRQ>
[ +0.000012] dump_stack+0x5c/0x7a
[ +0.000008] __warn+0xbe/0xe0
[ +0.000010] warn_slowpath_fmt+0x4f/0x60
[ +0.000011] ? enqueue_task_fair+0x84/0x680
[ +0.000006] dev_watchdog+0x229/0x230
[ +0.000008] ? qdisc_rcu_free+0x40/0x40
[ +0.000004] ? qdisc_rcu_free+0x40/0x40
[ +0.000007] call_timer_fn+0x2e/0x160
[ +0.000006] ? qdisc_rcu_free+0x40/0x40
[ +0.000006] run_timer_softirq+0x223/0x4c0
[ +0.000009] ? tick_sched_handle.isra.15+0x20/0x50
[ +0.000004] ? tick_sched_timer+0x38/0x70
[ +0.000005] __do_softirq+0x105/0x2e5
[ +0.000008] irq_exit+0xae/0xb0
[ +0.000006] smp_apic_timer_interrupt+0x39/0x50
[ +0.000006] apic_timer_interrupt+0x82/0x90
[ +0.000003] </IRQ>
[ +0.000006] ? cpuidle_enter_state+0x11a/0x2e0
[ +0.000004] ? cpuidle_enter_state+0x107/0x2e0
[ +0.000005] ? do_idle+0x17a/0x1f0
[ +0.000004] ? cpu_startup_entry+0x5d/0x60
[ +0.000006] ? start_secondary+0x144/0x170
[ +0.000004] ? start_cpu+0x14/0x14
[ +0.000005] ---[ end trace 932bf4f661e7ad08 ]---
[ +0.000008] r8152 4-1.2:1.0 enp62s0u1u2: Tx timeout
[ +1.035551] irq 16: nobody cared (try booting with the "irqpoll" option)
[ +0.000003] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 4.10.12-1-default #1
[ +0.000001] Hardware name: Dell Inc. Precision 5510/0N8J4R, BIOS 01.02.00 04/07/2016
[ +0.000001] Call Trace:
[ +0.000006] <IRQ>
[ +0.000004] dump_stack+0x5c/0x7a
[ +0.000002] __report_bad_irq+0x30/0xc0
[ +0.000002] note_interrupt+0x23b/0x280
[ +0.000003] handle_irq_event_percpu+0x41/0x50
[ +0.000002] handle_irq_event+0x37/0x60
[ +0.000002] handle_fasteoi_irq+0x9d/0x170
[ +0.000005] handle_irq+0x19/0x30
[ +0.000002] do_IRQ+0x41/0xc0
[ +0.000002] common_interrupt+0x82/0x82
[ +0.000001] </IRQ>
[ +0.000002] ? cpuidle_enter_state+0x11a/0x2e0
[ +0.000001] ? cpuidle_enter_state+0x107/0x2e0
[ +0.000001] ? do_idle+0x17a/0x1f0
[ +0.000001] ? cpu_startup_entry+0x5d/0x60
[ +0.000002] ? start_secondary+0x144/0x170
[ +0.000001] ? start_cpu+0x14/0x14
[ +0.000001] handlers:
[ +0.000003] [<ffffffffc0b9a910>] idma64_irq [idma64]
[ +0.000001] [<ffffffffc0b43b20>] i801_isr [i2c_i801]
[ +0.000002] [<ffffffffc083f140>] i2c_dw_isr [i2c_designware_core]
[ +0.000001] Disabling IRQ #16
[ +4.083989] xhci_hcd 0000:3e:00.0: xHCI host not responding to stop endpoint command.
[ +0.000009] xhci_hcd 0000:3e:00.0: Assuming host is dying, halting host.
[ +0.000093] r8152 4-1.2:1.0 enp62s0u1u2: Tx status -108
[ +0.000008] r8152 4-1.2:1.0 enp62s0u1u2: Tx status -108
[ +0.000005] r8152 4-1.2:1.0 enp62s0u1u2: Tx status -108
[ +0.000006] r8152 4-1.2:1.0 enp62s0u1u2: Tx status -108
[ +0.000044] xhci_hcd 0000:3e:00.0: HC died; cleaning up
[ +0.000022] r8152 4-1.2:1.0 enp62s0u1u2: Tx timeout
[ +0.000062] usb 3-1: USB disconnect, device number 2
[ +0.000006] usb 3-1.5: USB disconnect, device number 3
[ +0.000318] usb 4-1: USB disconnect, device number 2
[ +0.000008] usb 4-1.2: USB disconnect, device number 3
[ +0.000982] usb 3-1.6: USB disconnect, device number 4
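Because this log uses relative timestamps (`[ +seconds]` since the previous line), the gap between two events has to be summed by hand. The following is a minimal sketch of that arithmetic, assuming the `[ +S.UUUUUU] message` prefix seen above; `elapsed_between` is a hypothetical helper, and the sample is abridged from the log, so the total it prints ignores the skipped lines.

```python
import re

# Matches the relative "[ +1.035551] message" prefix used in the log above.
DELTA_RE = re.compile(r"^\[\s*\+(\d+\.\d+)\]\s(.*)$")


def elapsed_between(lines, start_marker, end_marker):
    """Sum the deltas from the first line containing start_marker up to the
    first later line containing end_marker; None if either is missing."""
    total = None
    for line in lines:
        m = DELTA_RE.match(line)
        if not m:
            continue  # e.g. the absolute "[May10 11:03]" line
        delta, msg = float(m.group(1)), m.group(2)
        if total is None:
            if start_marker in msg:
                total = 0.0  # start counting from this line
            continue
        total += delta
        if end_marker in msg:
            return total
    return None


# Abridged from the log above: intermediate lines are omitted.
sample = [
    "[ +0.000008] r8152 4-1.2:1.0 enp62s0u1u2: Tx timeout",
    '[ +1.035551] irq 16: nobody cared (try booting with the "irqpoll" option)',
    "[ +4.083989] xhci_hcd 0000:3e:00.0: xHCI host not responding to stop endpoint command.",
    "[ +0.000044] xhci_hcd 0000:3e:00.0: HC died; cleaning up",
]
gap = elapsed_between(sample, "Tx timeout", "HC died")
print(f"Tx timeout -> HC died: {gap:.6f} s")
```

Run against the full log, the same helper would show how long the host controller stayed unresponsive between the first Tx timeout and the point where xhci_hcd gave up on it.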