
Hi,

we have some serious trouble to get multicast networking to work on our AMD Opteron machines. The problem seems to be that for some reason the amd64 machines are sending IGMP V3 messages with a wrong checksum whereas our Intel boxes are sending IGMP V2 messages. Following is a packet dump from one of our Opteron boxes as we start xntpd in multicast mode (IP addresses changed to protect the innocent):

"uname -a" output:

    Linux monster540 2.4.21-201-smp #1 SMP Wed Feb 18 19:17:53 UTC 2004 x86_64 x86_64 x86_64 GNU/Linux

"tethereal -V igmp" output:

    Capturing on eth0
    Frame 1 (54 bytes on wire, 54 bytes captured)
        Arrival Time: Feb 25, 2004 14:40:01.975282000
        Time delta from previous packet: 0.000000000 seconds
        Time relative to first packet: 0.000000000 seconds
        Frame Number: 1
        Packet Length: 54 bytes
        Capture Length: 54 bytes
    Ethernet II, Src: 00:09:3d:00:09:fe, Dst: 01:00:5e:00:00:16
        Destination: 01:00:5e:00:00:16 (01:00:5e:00:00:16)
        Source: 00:09:3d:00:09:fe (00:09:3d:00:09:fe)
        Type: IP (0x0800)
    Internet Protocol, Src Addr: some.ip.add.ress (some.ip.add.ress), Dst Addr: 224.0.0.22 (224.0.0.22)
        Version: 4
        Header length: 24 bytes
        Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
            1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
            .... ..0. = ECN-Capable Transport (ECT): 0
            .... ...0 = ECN-CE: 0
        Total Length: 40
        Identification: 0xa360 (41824)
        Flags: 0x04
            .1.. = Don't fragment: Set
            ..0. = More fragments: Not set
        Fragment offset: 0
        Time to live: 1
        Protocol: IGMP (0x02)
        Header checksum: 0x1ff5 (correct)
        Source: some.ip.add.ress (some.ip.add.ress)
        Destination: 224.0.0.22 (224.0.0.22)
        Options: (4 bytes)
            Router Alert: Every router examines packet
    Internet Group Management Protocol
        IGMP Version: 3
        Type: Membership Report (0x22)
        Header checksum: 0xf9fe (incorrect, should be 0xf8fc)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this looks suspicious
        Num Group Records: 1
        Group Record : 224.0.1.1 Change To Exclude Mode
            Record Type: Change To Exclude Mode (4)
            Aux Data Len: 0
            Num Src: 0
            Multicast Address: 224.0.1.1 (224.0.1.1)

If I do the same thing on a i686 machine it looks like this:

"uname -a":

    Linux greyhound 2.4.21-192-default #1 Wed Feb 18 19:26:28 UTC 2004 i686 i686 i386 GNU/Linux

"tethereal -V igmp":

    Frame 2 (46 bytes on wire, 46 bytes captured)
        Arrival Time: Feb 25, 2004 14:41:16.268523000
        Time delta from previous packet: 1.944834000 seconds
        Time relative to first packet: 1.944834000 seconds
        Frame Number: 2
        Packet Length: 46 bytes
        Capture Length: 46 bytes
    Ethernet II, Src: 00:06:5b:76:d0:03, Dst: 01:00:5e:00:01:01
        Destination: 01:00:5e:00:01:01 (01:00:5e:00:01:01)
        Source: 00:06:5b:76:d0:03 (00:06:5b:76:d0:03)
        Type: IP (0x0800)
    Internet Protocol, Src Addr: some.other.ip.addr (some.other.ip.addr), Dst Addr: 224.0.1.1 (224.0.1.1)
        Version: 4
        Header length: 24 bytes
        Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
            1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
            .... ..0. = ECN-Capable Transport (ECT): 0
            .... ...0 = ECN-CE: 0
        Total Length: 32
        Identification: 0x0bfd (3069)
        Flags: 0x04
            .1.. = Don't fragment: Set
            ..0. = More fragments: Not set
        Fragment offset: 0
        Time to live: 1
        Protocol: IGMP (0x02)
        Header checksum: 0xc6c3 (correct)
        Source: some.other.ip.addr (some.other.ip.addr)
        Destination: 224.0.1.1 (224.0.1.1)
        Options: (4 bytes)
            Router Alert: Every router examines packet
    Internet Group Management Protocol
        IGMP Version: 2
        Type: Membership Report (0x16)
        Max Response Time: 0.0 sec (0x00)
        Header checksum: 0x08fe (correct)
        Multicast Address: 224.0.1.1 (224.0.1.1)

Does anybody know why these two platforms behave differently and why the x86_64 machines uses IGMP V3 and not V2?

Karsten.

On Wed, 25 Feb 2004 14:56:24 -0500 Karsten Künne <kuenne@rentec.com> wrote:
> we have some serious trouble to get multicast networking to work on our AMD Opteron machines. The problem seems to be that for some reason the amd64 machines are sending IGMP V3 messages with a wrong checksum whereas our Intel boxes are sending IGMP V2 messages. Following is a packet dump from one of our Opteron boxes as we start xntpd in multicast mode (IP addresses changed to protect the innocent):

I filed a bug. What NIC is monster540 using?

> Does anybody know why these two platforms behave differently and why the x86_64 machines uses IGMP V3 and not V2?

Different kernel versions. IGMPv3 was recently added to support MLD.

-Andi

On Wednesday 25 February 2004 18:17, Andi Kleen wrote:
> On Wed, 25 Feb 2004 14:56:24 -0500
> Karsten Künne <kuenne@rentec.com> wrote:
>> we have some serious trouble to get multicast networking to work on our AMD Opteron machines. The problem seems to be that for some reason the amd64 machines are sending IGMP V3 messages with a wrong checksum whereas our Intel boxes are sending IGMP V2 messages. Following is a packet dump from one of our Opteron boxes as we start xntpd in multicast mode (IP addresses changed to protect the innocent):
> I filed a bug. What NIC is monster540 using?

It has two of these:

    Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)

But I don't think it's the driver. We tried the bcm5700 and the tg3 and both had the same problem. One of our guys also looked at the checksum routine and apparently fixed it! But I don't know all the details, I have to talk to that guy tomorrow. I'll let you know.

>> Does anybody know why these two platforms behave differently and why the x86_64 machines uses IGMP V3 and not V2?
> Different kernel versions. IGMPv3 was recently added to support MLD.
> -Andi

Karsten.

On Wednesday 25 February 2004 19:15, Karsten Künne wrote:
> But I don't think it's the driver. We tried the bcm5700 and the tg3 and both had the same problem. One of our guys also looked at the checksum routine and apparently fixed it! But I don't know all the details, I have to talk to that guy tomorrow. I'll let you know.
O.k., I follow up to my own mail. Apparently what we found out is that the checksum function ip_compute_csum for x86_64 is wrong. For i386 it looks like this:

from /usr/src/linux/include/asm-i386/checksum.h:

    static inline unsigned short ip_compute_csum(unsigned char * buff, int len)
    {
        return csum_fold (csum_partial(buff, len, 0));
    }

whereas for x86_64 it looks like this:

from /usr/src/linux/arch/x86_64/lib/csum-partial.c:

    unsigned short ip_compute_csum(unsigned char * buff, int len)
    {
        return ~csum_partial(buff,len,0);
    }

(don't ask me why the functions are in completely different areas of the tree)

If we change the x86_64 function to the same as the i386 function everything falls into place and IGMP starts working. Now it also uses IGMP V2 instead of V3 because apparently the same function is used in order to compute the checksum on incoming IGMP packets and they were all rejected because of the broken function so that the kernel fell back to the default IGMP V3.

Apparently ip_compute_csum is mainly used in IGMP, that might explain why this bug wasn't detected earlier. Credits for discovering it don't go to me but to one of our guys. He'll also send the fix to other mailing lists.

Karsten.
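To make the difference between the two versions concrete, here is a small user-space sketch in C. It is not the kernel code: model_csum_partial, fixed_csum, broken_csum and wire_view are hypothetical names, and the model assumes that the partial sum is accumulated over little-endian 32-bit words and that the 16-bit result is stored in the packet in little-endian order. The igmpv3_report bytes are reconstructed from the fields in the Opteron dump (checksum field zeroed), so they are an assumption as well. Under those assumptions, dropping the fold step yields exactly the bad checksum from the dump, and folding yields the value tethereal expected.

```c
#include <stddef.h>
#include <stdint.h>

/* IGMPv3 membership report rebuilt from the dump, checksum field zeroed:
 * type 0x22, one group record, Change To Exclude Mode (4), 224.0.1.1. */
static const uint8_t igmpv3_report[16] = {
    0x22, 0x00, 0x00, 0x00,   /* type, reserved, checksum (zeroed)     */
    0x00, 0x00, 0x00, 0x01,   /* reserved, number of group records     */
    0x04, 0x00, 0x00, 0x00,   /* record type, aux data len, num src    */
    0xe0, 0x00, 0x01, 0x01    /* multicast address 224.0.1.1           */
};

/* Simplified stand-in for csum_partial (assumption, not the kernel code):
 * adds the buffer as little-endian 32-bit words and returns a 32-bit sum
 * with end-around carry. len is assumed to be a multiple of 4. */
static uint32_t model_csum_partial(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 3 < len; i += 4) {
        sum += (uint32_t)buf[i]
             | ((uint32_t)buf[i + 1] << 8)
             | ((uint32_t)buf[i + 2] << 16)
             | ((uint32_t)buf[i + 3] << 24);
    }
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);
    return (uint32_t)sum;
}

/* i386-style result: fold the upper 16 bits into the lower 16, then invert. */
static uint16_t fixed_csum(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Broken x86_64-style result: invert and truncate, so the upper 16 bits
 * of the partial sum are silently thrown away. */
static uint16_t broken_csum(uint32_t sum)
{
    return (uint16_t)~sum;
}

/* The value a packet dissector shows when the 16-bit checksum was stored
 * in little-endian order and is read back in network (big-endian) order. */
static uint16_t wire_view(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

With these 16 bytes the modelled partial sum is 0x02010106. broken_csum keeps only the low half (0x0106), which shows up on the wire as 0xf9fe; fixed_csum first folds 0x0201 into 0x0106 and produces 0xf8fc on the wire, matching the "should be" value in the dump.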

On Thu, 26 Feb 2004 11:32:24 -0500 Karsten Künne <kuenne@rentec.com> wrote:
> unsigned short ip_compute_csum(unsigned char * buff, int len)
> {
>     return ~csum_partial(buff,len,0);
> }
> (don't ask me why the functions are in completely different areas of the tree)
Indeed. Silly bug. Thanks for the fix. I notice it's also used in ipt_REJECT, which explains some weird behaviour I once noticed in firewall rules, but never tracked down.
> Apparently ip_compute_csum is mainly used in IGMP, that might explain why this bug wasn't detected earlier. Credits for discovering it don't go to me but to one of our guys. He'll also send the fix to other mailing lists.

Not needed. I'm the responsible maintainer anyways and I have it already now.

-Andi

Howdy world,

I finally got 2.6.3 to work under SuSE (well damn close anyway). The point was to get SATA drives to work with ide drives and not headaches, but I lost support of serial modems (and I've not quite figured out where to load ppp during boot, but that seems trivial compared to the pain so far).

To get serial ports to work, I needed to add the following lines to modprobe.conf:

    alias /dev/tts serial
    alias /dev/tts/0 serial
    alias /dev/tts/1 serial
    alias /dev/tts/2 serial
    alias /dev/tts/3 serial
    alias /dev/ttyS* serial

and in the install blocks

    install serial /sbin/modprobe 8250 && /etc/init.d/setserial start
    remove serial /etc/init.d/setserial stop ; /sbin/modprobe -r 8250

(but I didn't include module unloading so the last line isn't used).

So, for anyone else attempting to get an AMD64 under SuSE to work with SATA, EIDE and modems, I hope that helps.

Patience, persistence, truth,
Dr. mike
participants (3)
- Andi Kleen
- Karsten Künne
- Mike Rosing