
Hi Guys,

We have a bunch of Sun X2100 opteron servers running SUSE 10.0 and one of them magically stopped responding to network requests in the middle of the night (after operating perfectly for 1 month). A power down solved the problem, however a simple reboot did not! The log messages are below.

Apr 3 11:42:56 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:42:56 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:42:56 server kernel: Badness in local_bh_enable at kernel/softirq.c:140
Apr 3 11:42:56 server kernel:
Apr 3 11:42:56 server kernel: Call Trace: <IRQ> <ffffffff801383c1>{local_bh_enable+49} <ffffffff881b8655>{:ip_conntrack:destroy_conntrack+53}
Apr 3 11:42:56 server kernel: <ffffffff802c0c1b>{__kfree_skb+219} <ffffffff8815c29a>{:forcedeth:nv_drain_tx+138}
Apr 3 11:42:56 server kernel: <ffffffff8815d3ec>{:forcedeth:nv_tx_timeout+92} <ffffffff802d5e30>{dev_watchdog+0}
Apr 3 11:42:56 server kernel: <ffffffff802d5ea8>{dev_watchdog+120} <ffffffff8013c519>{run_timer_softirq+361}
Apr 3 11:42:56 server kernel: <ffffffff801385d7>{__do_softirq+87} <ffffffff8010f8db>{call_softirq+31}
Apr 3 11:42:56 server kernel: <ffffffff80111380>{do_softirq+48} <ffffffff801113dd>{do_IRQ+77}
Apr 3 11:42:56 server kernel: <ffffffff8010eebc>{ret_from_intr+0} <EOI> <ffffffff803047a0>{udp_poll+0}
Apr 3 11:42:56 server kernel: <ffffffff8010d2b0>{default_idle+0} <ffffffff88086635>{:processor:acpi_processor_idle+292}
Apr 3 11:42:56 server kernel: <ffffffff8010d311>{cpu_idle+49} <ffffffff804a87aa>{start_kernel+458}
Apr 3 11:42:56 server kernel: <ffffffff804a81f4>{_sinittext+500}
Apr 3 11:44:02 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:44:02 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:45:07 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:45:07 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:46:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:46:13 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 3 11:47:13 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 3 11:47:13 server kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!

Now these servers are supposedly certified with SLES 9 (see http://www.sun.com/servers/entry/x2100/os.jsp#SUSE ), however I have not been able to get SLES 9 to install, although OES SP2 can be convinced to work with much crossing of fingers, hand holding and a serial console cable. Even OES, however, refuses to recognise any of the USB keyboards we have tried, which is why we run SUSE 10.0.

Is anyone else having problems with the nv ethernet interface on their X2100 servers (and USB keyboards)?

We have temporarily moved all traffic to the 2nd GigE interface on this server (which uses a different chipset), however this is not a long-term solution. I was planning to use link bonding between the 2 interfaces for redundancy to 2 different switches, however the fact that the interface doesn't come back up without a complete poweroff means that this is a dangerous bug...

-- 
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc