Sun X2100 (nv)ethernet problem with SUSE 10.0
Hi guys,

We have a bunch of Sun X2100 Opteron servers running SUSE 10.0, and one of them suddenly stopped responding to network requests in the middle of the night (after operating perfectly for a month). Powering the machine down solved the problem; a simple reboot did not! The log messages are below.

Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:42:56 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:42:56 server kernel: Badness in local_bh_enable at kernel/softirq.c:140
Apr 3 11:42:56 server kernel:
Apr 3 11:42:56 server kernel: Call Trace: <IRQ> <ffffffff801383c1>{local_bh_enable+49} <ffffffff881b8655>{:ip_conntrack:destroy_conntrack+53}
Apr 3 11:42:56 server kernel: <ffffffff802c0c1b>{__kfree_skb+219} <ffffffff8815c29a>{:forcedeth:nv_drain_tx+138}
Apr 3 11:42:56 server kernel: <ffffffff8815d3ec>{:forcedeth:nv_tx_timeout+92} <ffffffff802d5e30>{dev_watchdog+0}
Apr 3 11:42:56 server kernel: <ffffffff802d5ea8>{dev_watchdog+120} <ffffffff8013c519>{run_timer_softirq+361}
Apr 3 11:42:56 server kernel: <ffffffff801385d7>{__do_softirq+87} <ffffffff8010f8db>{call_softirq+31}
Apr 3 11:42:56 server kernel: <ffffffff80111380>{do_softirq+48} <ffffffff801113dd>{do_IRQ+77}
Apr 3 11:42:56 server kernel: <ffffffff8010eebc>{ret_from_intr+0} <EOI> <ffffffff803047a0>{udp_poll+0}
Apr 3 11:42:56 server kernel: <ffffffff8010d2b0>{default_idle+0} <ffffffff88086635>{:processor:acpi_processor_idle+292}
Apr 3 11:42:56 server kernel: <ffffffff8010d311>{cpu_idle+49} <ffffffff804a87aa>{start_kernel+458}
Apr 3 11:42:56 server kernel: <ffffffff804a81f4>{_sinittext+500}
Apr 3 11:44:02 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:44:02 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:45:07 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:45:07 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:46:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:46:13 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:47:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:47:13 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!

Now, these servers are supposedly certified with SLES 9 (see http://www.sun.com/servers/entry/x2100/os.jsp#SUSE ); however, I have not been able to get SLES 9 to install, although OES SP2 can be convinced to work with much crossing of fingers, hand holding and a serial console cable. Even OES, however, refuses to recognise any of the USB keyboards we have tried, which is why we run SUSE 10.0.

Is anyone else having problems with the nv ethernet interface on their X2100 servers (and USB keyboards)? We have temporarily moved all traffic to the second GigE interface on this server (which uses a different chipset), but this is not a long-term solution. I was planning to use link bonding between the two interfaces for redundancy across two different switches; however, the fact that the interface doesn't come back up without a complete power-off means that this is a dangerous bug...

--
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc
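Since the failure shows up as repeated "NETDEV WATCHDOG ... transmit timed out" lines, one way to notice it on other machines is to count those lines per interface in the system log. The following is only a minimal sketch, assuming the syslog line format shown in the message above; the names `WATCHDOG_RE` and `count_tx_timeouts` are made up for this illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches syslog lines of the form quoted in this thread, e.g.
#   Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
# (assumption: other syslog configurations may format the prefix differently)
WATCHDOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"NETDEV WATCHDOG:\s+(?P<iface>\w+):\s+transmit timed out"
)


def count_tx_timeouts(lines):
    """Return a Counter mapping interface name to its number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out",
        "Apr 3 11:42:56 server kernel: nv_stop_tx: TransmitterStatus remained busy"
        "<7>eth0: tx_timeout: dead entries!",
        "Apr 3 11:44:02 server kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    ]
    for iface, hits in count_tx_timeouts(sample).items():
        print(f"{iface}: {hits} transmit timeout(s)")
```

In practice the function could be fed the lines of an open log file instead of the sample list; where that log lives depends on the local syslog setup.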
Hi,

Peter Nixon wrote:
We have a bunch of Sun X2100 opteron servers running SUSE 10.0 and one of them magically stopped responding to network requests in the middle of the night (after operating perfectly for 1 month). A power down solved the problem, however a simple reboot did not! The log messages are below.
Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Is anyone else having problems with the nv ethernet interface on their X2100 servers (and USB keyboards)?
Known problem, a fix is in the works. Note: only SUSE Linux 10.0 has this bug; no other version is affected. However, a simple downgrade or upgrade of the driver is not possible without the risk of breaking something else.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
On Mon 03 Apr 2006 16:59, Carl-Daniel Hailfinger wrote:
Known problem, a fix is in the works. Note: only SUSE Linux 10.0 has this bug; no other version is affected. However, a simple downgrade or upgrade of the driver is not possible without the risk of breaking something else.
So what is the workaround/fix?

--
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc
Peter Nixon wrote:
So what is the workaround/fix?
Still investigating. Downgrading the driver breaks support for some chipsets, while upgrading it to a reasonably recent version carries other risks (new bugs). I would favor a downgrade, but that would hurt exactly those users who need the bugfix most, so it's not really an option. The forcedeth driver from Linux 2.6.15 may be a safe choice.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
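To see whether a given machine already runs a kernel at or above the 2.6.15 mentioned above, the release string can be compared numerically. This is a rough sketch only: the helper names are hypothetical, the release string "2.6.13-15-default" in the comment is just an illustrative example of the format, and the kernel version is merely a proxy, since a distribution may ship a forcedeth driver that is older or newer than the one in the matching mainline kernel.

```python
import platform
import re


def parse_kernel_version(release):
    """Extract the leading (major, minor, patch) numbers from a release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def kernel_at_least(release, minimum=(2, 6, 15)):
    """Return True if the release is numerically at or above the minimum version."""
    # Anything after the first three numbers (e.g. "-15-default") is ignored.
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # e.g. a string shaped like "2.6.13-15-default"
    print(running, "meets 2.6.15:", kernel_at_least(running))
```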