[Bug 1131976] New: e1000e shows Hardware Unit Hang on fast connections with high network load
http://bugzilla.suse.com/show_bug.cgi?id=1131976

            Bug ID: 1131976
           Summary: e1000e shows Hardware Unit Hang on fast connections with
                    high network load
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.0
          Hardware: x86-64
                OS: SUSE Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: werner@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

This seems to be known in the community, but I'd like to report it here because
this morning my workstation was not accessible over the network. After I
replugged the network cable, the driver reset everything. After this I changed
the module parameters for the e1000e module to

  InterruptThrottleRate=2000 SmartPowerDownEnable=1

and reloaded the module (it is not marked as busy).

[Tue Apr 9 14:03:15 2019] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <51>
  TDT                  <6c>
  next_to_use          <6c>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <10051dbdb>
  next_to_watch        <51>
  jiffies              <10051ddd8>
  next_to_watch.status <0>
MAC Status             <80283>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[Tue Apr 9 14:03:17 2019] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <51>
  TDT                  <6c>
  next_to_use          <6c>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <10051dbdb>
  next_to_watch        <51>
  jiffies              <10051dfc8>
  next_to_watch.status <0>
MAC Status             <80283>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[Tue Apr 9 14:03:19 2019] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
[Tue Apr 9 14:03:22 2019] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 2
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off (auto)
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

-- 
You are receiving this mail because:
You are on the CC list for the bug.
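
The values in the hang report are hexadecimal ring indices and timestamps, so a little arithmetic shows how far the transmit ring had fallen behind. The following is only a sketch: the helper names `ring_pending` and `jiffies_stalled` are made up for illustration, and the ring size of 256 descriptors is an assumption, since the report does not state it.

```shell
#!/bin/sh
# Hypothetical helpers for reading a "Detected Hardware Unit Hang" report.
# Arguments are the hexadecimal values from the log, without angle brackets.

# Descriptors queued between head (TDH) and tail (TDT) of the transmit ring.
# The ring size (third argument, decimal) is an assumption; 256 is used
# when it is omitted.
ring_pending() {
    head=$((0x$1))
    tail=$((0x$2))
    size=${3:-256}
    echo $(( (tail - head + size) % size ))
}

# Jiffies elapsed between buffer_info time_stamp and the jiffies value
# printed in the same report.
jiffies_stalled() {
    echo $(( 0x$2 - 0x$1 ))
}

ring_pending 51 6c
jiffies_stalled 10051dbdb 10051ddd8
```

For the first report this gives 27 descriptors between TDH and TDT and 509 jiffies since the stalled buffer was time-stamped. The second report, two seconds later, has the same indices and 1005 jiffies, which matches a ring that has not advanced at all.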
http://bugzilla.suse.com/show_bug.cgi?id=1131976

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bpoirier@suse.com,
                   |                            |dchang@suse.com,
                   |                            |tiwai@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c1

--- Comment #1 from Dr. Werner Fink <werner@suse.com> ---
This seems to help somehow; at least the messages have disappeared since I
applied the command

  ethtool -K eth0 tso off gso off gro off

with

  tso: TCP segmentation offload
  gso: Generic segmentation offload
  gro: Generic receive offload

I will watch whether the messages occur again.
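
To confirm that the three offloads are really off after the command above, the lower-case `ethtool -k` option lists the current feature state. A minimal sketch, assuming the usual `feature-name: on|off` output format; the helper name `offload_state` is hypothetical and only filters that output:

```shell
#!/bin/sh
# Hypothetical helper: reduce "ethtool -k <iface>" output (read from stdin)
# to the three offload features named in this comment. Indented sub-features
# such as tx-tcp-segmentation do not match because the pattern is anchored
# to the full feature name.
offload_state() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):' |
        sed -e 's/^[[:space:]]*//'
}

# Usage on a live system (interface name eth0 taken from the report):
#   ethtool -k eth0 | offload_state
```

Note that settings made with `ethtool -K` are not persistent; they are lost when the driver is reloaded or the machine reboots.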
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c2

--- Comment #2 from Dr. Werner Fink <werner@suse.com> ---
So far no reset has happened since the ethtool -K line.

lspci -s :19.0 -vvv
00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
        Subsystem: Intel Corporation Device 0000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 32
        Region 0: Memory at d0300000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at d0323000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 3100 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee13000  Data: 40c4
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e
        Kernel modules: e1000e
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c3

--- Comment #3 from Dr. Werner Fink <werner@suse.com> ---
The workaround is still preventing resets of the e1000e adapter.
http://bugzilla.suse.com/show_bug.cgi?id=1131976

Hannes Reinecke <hare@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hare@suse.com
           Assignee|kernel-maintainers@forge.pr |denis.kirjanov@suse.com
                   |ovo.novell.com              |
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c4

Denis Kirjanov <denis.kirjanov@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(werner@suse.com)

--- Comment #4 from Denis Kirjanov <denis.kirjanov@suse.com> ---
Could you please try to just switch off GRO?
Thank you.
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c5

--- Comment #5 from Dr. Werner Fink <werner@suse.com> ---
(In reply to Denis Kirjanov from comment #4)
> Could you please try to just switch off GRO?
> Thank you.

You mean GRO only and TSO, GSO back to on?
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c6

--- Comment #6 from Denis Kirjanov <denis.kirjanov@suse.com> ---
(In reply to Dr. Werner Fink from comment #5)
> (In reply to Denis Kirjanov from comment #4)
> > Could you please try to just switch off GRO?
> > Thank you.
>
> You mean GRO only and TSO, GSO back to on?

Right
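
The test agreed on here (GRO off, TSO and GSO back on) corresponds to a single `ethtool -K` call. A small sketch of that step; the wrapper name `set_gro_only_off` and the `DRY_RUN` switch are hypothetical and only there so the command can be inspected before it is run as root:

```shell
#!/bin/sh
# Hypothetical helper: disable only GRO and re-enable TSO and GSO on one
# interface. With DRY_RUN=1 the command is printed instead of executed.
set_gro_only_off() {
    iface="${1:-eth0}"
    # Build the command once so the printed and the executed form are identical.
    set -- ethtool -K "$iface" tso on gso on gro off
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (interface name eth0 taken from the report):
#   DRY_RUN=1 set_gro_only_off eth0   # show the command only
#   set_gro_only_off eth0             # apply it, needs root
```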
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c7

--- Comment #7 from Dr. Werner Fink <werner@suse.com> ---
(In reply to Denis Kirjanov from comment #6)
> (In reply to Dr. Werner Fink from comment #5)
> > (In reply to Denis Kirjanov from comment #4)
> > > Could you please try to just switch off GRO?
> > > Thank you.
> >
> > You mean GRO only and TSO, GSO back to on?
>
> Right

Done ... let's see if this is stable
http://bugzilla.suse.com/show_bug.cgi?id=1131976
http://bugzilla.suse.com/show_bug.cgi?id=1131976#c8

Dr. Werner Fink <werner@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(werner@suse.com)  |

--- Comment #8 from Dr. Werner Fink <werner@suse.com> ---
(In reply to Dr. Werner Fink from comment #7)
> (In reply to Denis Kirjanov from comment #6)
> > (In reply to Dr. Werner Fink from comment #5)
> > > (In reply to Denis Kirjanov from comment #4)
> > > > Could you please try to just switch off GRO?
> > > > Thank you.
> > >
> > > You mean GRO only and TSO, GSO back to on?
> >
> > Right
>
> Done ... let's see if this is stable

Currently I do not see any problems here:

dmesg -T | grep e1000e
[Tue Nov 19 12:57:27 2019] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[Tue Nov 19 12:57:27 2019] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[Tue Nov 19 12:57:27 2019] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to 2000
[Tue Nov 19 12:57:27 2019] e1000e 0000:00:19.0: PHY Smart Power Down Enabled
[Tue Nov 19 12:57:27 2019] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:1c:c0:a4:1c:14
[Tue Nov 19 12:57:27 2019] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[Tue Nov 19 12:57:27 2019] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 8, PBA No: FFFFFF-0FF
[Tue Nov 19 12:57:40 2019] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Tue Nov 19 14:48:07 2019] e1000e: eth0 NIC Link is Down
[Tue Nov 19 14:48:10 2019] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Wed Nov 20 10:42:31 2019] e1000e: eth0 NIC Link is Down
[Wed Nov 20 10:42:34 2019] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
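
Reading through the grep output by eye works for a short log; for longer uptimes the same check can be reduced to a count. A minimal sketch, assuming the message texts quoted in the original report; the helper name `count_e1000e_hangs` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count e1000e hang and unexpected-reset messages in
# kernel log text read from stdin. grep -c prints 0 but exits non-zero when
# nothing matches, so "|| true" keeps the function usable under "set -e".
count_e1000e_hangs() {
    grep -c -E 'e1000e.*(Detected Hardware Unit Hang|Reset adapter unexpectedly)' || true
}

# Usage on a live system:
#   dmesg -T | count_e1000e_hangs
```

Link down/up lines like the ones in the log above are not counted, since they also appear when a cable is simply replugged.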
https://bugzilla.suse.com/show_bug.cgi?id=1131976
https://bugzilla.suse.com/show_bug.cgi?id=1131976#c9

Denis Kirjanov <denis.kirjanov@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Denis Kirjanov <denis.kirjanov@suse.com> ---
Since the problem is gone, let's close it. Please reopen if you see the issue
again.