Mailinglist Archive: opensuse (3337 mails)

Sun X2100 (nv)ethernet problem with SUSE 10.0
  • From: Peter Nixon <listuser@xxxxxxxxxxxxxx>
  • Date: Mon, 3 Apr 2006 15:55:33 +0300
  • Message-id: <200604031555.40376.listuser@xxxxxxxxxxxxxx>
Hi Guys

We have a bunch of Sun X2100 Opteron servers running SUSE 10.0, and one of them
magically stopped responding to network requests in the middle of the night
(after operating perfectly for one month). A full power-down solved the problem;
a simple reboot did not! The log messages are below.

Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:42:56 server kernel: nv_stop_tx: TransmitterStatus remained
busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:42:56 server kernel: Badness in local_bh_enable at
kernel/softirq.c:140
Apr 3 11:42:56 server kernel:
Apr 3 11:42:56 server kernel: Call Trace: <IRQ>
<ffffffff801383c1>{local_bh_enable+49}
<ffffffff881b8655>{:ip_conntrack:destroy_conntrack+53}
Apr 3 11:42:56 server kernel: <ffffffff802c0c1b>{__kfree_skb+219}
<ffffffff8815c29a>{:forcedeth:nv_drain_tx+138}
Apr 3 11:42:56 server kernel:
<ffffffff8815d3ec>{:forcedeth:nv_tx_timeout+92}
<ffffffff802d5e30>{dev_watchdog+0}
Apr 3 11:42:56 server kernel: <ffffffff802d5ea8>{dev_watchdog+120}
<ffffffff8013c519>{run_timer_softirq+361}
Apr 3 11:42:56 server kernel: <ffffffff801385d7>{__do_softirq+87}
<ffffffff8010f8db>{call_softirq+31}
Apr 3 11:42:56 server kernel: <ffffffff80111380>{do_softirq+48}
<ffffffff801113dd>{do_IRQ+77}
Apr 3 11:42:56 server kernel: <ffffffff8010eebc>{ret_from_intr+0}
<EOI> <ffffffff803047a0>{udp_poll+0}
Apr 3 11:42:56 server kernel: <ffffffff8010d2b0>{default_idle+0}
<ffffffff88086635>{:processor:acpi_processor_idle+292}
Apr 3 11:42:56 server kernel: <ffffffff8010d311>{cpu_idle+49}
<ffffffff804a87aa>{start_kernel+458}
Apr 3 11:42:56 server kernel: <ffffffff804a81f4>{_sinittext+500}
Apr 3 11:44:02 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:44:02 server kernel: nv_stop_tx: TransmitterStatus remained
busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:45:07 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:45:07 server kernel: nv_stop_tx: TransmitterStatus remained
busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:46:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:46:13 server kernel: nv_stop_tx: TransmitterStatus remained
busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:47:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:47:13 server kernel: nv_stop_tx: TransmitterStatus remained
busy<7>eth0: tx_timeout: dead entries!
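
To check other machines for the same symptom, a small script along these lines
could count the watchdog timeouts per interface. This is only a sketch: the
function name and the regular expression are illustrative assumptions based on
the log format shown above, not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
# The pattern is an assumption derived from the excerpt above.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of timeouts seen."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It takes any iterable of log lines, so an open file handle on the system log
(for example /var/log/messages, if that is where your syslog writes kernel
messages) can be passed in directly.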

Now these servers are supposedly certified with SLES 9 (see
http://www.sun.com/servers/entry/x2100/os.jsp#SUSE ), but I have not been
able to get SLES 9 to install. OES SP2 can be convinced to work with
much crossing of fingers, hand holding and a serial console cable. Even OES,
however, refuses to recognise any of the USB keyboards we have tried, which is
why we run SUSE 10.0.

Is anyone else having problems with the nv ethernet interface on their X2100
servers (and USB keyboards)?

We have temporarily moved all traffic to the 2nd GigE interface on this server
(which uses a different chipset), but this is not a long-term solution. We
were planning to use link bonding between the 2 interfaces for redundancy to
2 different switches; however, the fact that the interface doesn't come back up
without a complete poweroff makes this a dangerous bug...

--

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc