tcp checksum errors

Dear List,

We've had a problem with a TCP app communicating with an external system. On diagnosis, tcpdump shows a checksum error on the outgoing packets that are never received. This is SUSE 10.0, kernel 2.6.13-15.8-smp, x86_64. The socket calls in the app, ::send and ::write, report no errors.

We've confirmed that such errors, as reported by tcpdump, occur reasonably frequently and without much consequence on Intel e1000, Intel e100, Broadcom and Nvidia CK804 adapters (Tyan, Shuttle and Iwill mainboards). We think the problem has been hidden because the apps seem to recover or not lose the packets. The only discernible difference is that there is a firewall on the route between the systems where the packets don't get through, but we're grasping at straws. We can duplicate the error conditions with scp and telnet (if the checksum error is real).

We see no problems on a couple of systems using PCI-card-based Realtek 8169 adapters. The same systems have issues with their mainboard adapters. Smells like a problem. It could be tcpdump / libpcap misreporting, but the lost packets indicate a real problem. We don't lose packets that have a good checksum, but we do occasionally see packets survive that are marked as having a bad checksum...

On similar hardware, CentOS 4.2 with 2.6.9-22.ELsmp doesn't report similar errors. We just installed the same CentOS on a system reporting the errors, and it no longer shows them. We saw some notes about APIC issues with various adapters; we tried booting with no APIC and got the same errors.

Any thoughts?

Matt.
matthurd@acm.org

On Monday 27 February 2006 17:12, Matt Hurd wrote:
> Any thoughts?
In 10.0, tcpdump on the local machine reports bogus checksum errors when TSO is enabled. You can disable TSO with ethtool or run tcpdump on a different machine. I don't think the 10.0 Nvidia driver uses TSO, though, and e100 probably doesn't either. BCM and E1000 do. As to why your packets are lost, I don't know.

-Andi

On 01/03/06, Andi Kleen <ak@suse.de> wrote:
> On Monday 27 February 2006 17:12, Matt Hurd wrote:
>> Any thoughts?
> In 10.0 tcpdump on the local machine reports bogus checksum errors when TSO is enabled. You can disable it with ethtool or run the tcpdump on a different machine.
> The 10.0 Nvidia driver shouldn't use TSO though I think and e100 also probably not. BCM and E1000 do.
> As to why your packets are lost I don't know.
Interesting. It is hard to know exactly what is going on. The tcpdump chksum might be a red herring. Might have to turn a switch port into a monitor so we can Lanalyze the pkts.

Thanks and regards,

Matt.
matthurd@acm.org
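
For readers who want to check a captured segment independently of tcpdump's verdict, the following is a minimal Python sketch of the standard TCP checksum test (one's-complement sum over the IPv4 pseudo-header plus the TCP segment). It is not from the thread: the function names, addresses and sample segment are hypothetical, and it assumes IPv4 and a complete, untruncated TCP segment. With offloads such as TSO enabled, a capture taken on the sending host commonly sees the packet before the adapter has finished the checksum, so a segment that fails this test locally may still be correct on the wire.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header that is covered by the TCP checksum."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_length)
    )


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """True if the checksum already stored in the segment verifies."""
    # A correct stored checksum makes the complemented sum come out as 0.
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return internet_checksum(data) == 0


def fill_tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Return a copy of the segment with its checksum field (bytes 16-17) set."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(zeroed)) + zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]


if __name__ == "__main__":
    # Hypothetical 20-byte TCP header (checksum field zero) plus a short payload.
    header = struct.pack("!HHIIBBHHH", 12345, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    good = fill_tcp_checksum("192.0.2.1", "192.0.2.2", header + b"hello")
    print(tcp_checksum_ok("192.0.2.1", "192.0.2.2", good))
```

Running the same check on a capture taken from a second machine or a switch monitor port avoids the offload ambiguity, since those packets have already passed through the sending adapter.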