[Bug 668194] New: dhcp client not working properly in Xen domU due to partial checksum offload
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c0

Summary: dhcp client not working properly in Xen domU due to partial checksum offload
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: All
OS/Version: openSUSE 11.3
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Network
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: samuel.kvasnica@ims.co.at
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101203 SUSE/3.6.13-0.2.1 Firefox/3.6.13

Well, it is hard to say whether this ought to be fixed on the xen side or on the dhcp side, so I'm assigning it to the network component.

If a dhcp client is run from within a domU (no matter whether dhcpcd or dhclient), we start seeing this storm on the dhcp server from the first lease renewal on:

Jan 22 16:54:40 master dhcpd: DHCPREQUEST for 192.168.101.83 from 02:ee:ee:ee:11:01 (testsrv) via br0
Jan 22 16:54:40 master dhcpd: DHCPACK on 192.168.101.83 to 02:ee:ee:ee:11:01 (testsrv) via br0
Jan 22 16:54:43 master dhcpd: DHCPREQUEST for 192.168.101.83 from 02:ee:ee:ee:11:01 (testsrv) via br0
Jan 22 16:54:43 master dhcpd: DHCPACK on 192.168.101.83 to 02:ee:ee:ee:11:01 (testsrv) via br0
Jan 22 16:54:46 master dhcpd: DHCPREQUEST for 192.168.101.83 from 02:ee:ee:ee:11:01 (testsrv) via br0
Jan 22 16:54:46 master dhcpd: DHCPACK on 192.168.101.83 to 02:ee:ee:ee:11:01 (testsrv) via br0

and these on the client side:

Jan 22 16:54:44 logsrv dhcpcd[1836]: eth0: bad UDP checksum, ignoring
Jan 22 16:54:47 logsrv dhcpcd[1836]: eth0: bad UDP checksum, ignoring
Jan 22 16:54:50 logsrv dhcpcd[1836]: eth0: bad UDP checksum, ignoring

Googling around a bit, this seems to have been a known issue for quite a long time (2006?). It seems that dhcp does not play well with the partial checksum offload done on the dom0 side of the vif interface.
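To make the "bad UDP checksum" message concrete, here is a minimal Python sketch of the difference between a complete UDP checksum and a partial one. It assumes the Linux convention that a partially checksummed packet carries only the folded pseudo-header sum in the checksum field, with the rest left to the (virtual) NIC. The server address 192.168.101.1, the payload, and all function names are made up for illustration; only the client address and the DHCP ports correspond to the log above.

```python
import socket
import struct

IPPROTO_UDP = 17


def ones_complement_sum(data: bytes, start: int = 0) -> int:
    """Add 16-bit big-endian words with end-around carry (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = start
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src: str, dst: str, udp_len: int) -> int:
    """Sum over the IPv4 pseudo-header: addresses, protocol, UDP length."""
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, IPPROTO_UDP, udp_len))
    return ones_complement_sum(pseudo)


def full_checksum(src, dst, sport, dport, payload) -> int:
    """Value a verifying receiver expects to find in the UDP header."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    seed = pseudo_header_sum(src, dst, length)
    csum = ~ones_complement_sum(header + payload, seed) & 0xFFFF
    return csum or 0xFFFF  # 0 is reserved for "no checksum"


def partial_checksum(src, dst, payload) -> int:
    """Value left in the field when the rest is deferred to the NIC."""
    return pseudo_header_sum(src, dst, 8 + len(payload))


if __name__ == "__main__":
    # Hypothetical server address and payload, chosen only for illustration.
    addrs = ("192.168.101.1", "192.168.101.83")
    payload = b"DHCPACK!"
    print(hex(full_checksum(*addrs, 67, 68, payload)))
    print(hex(partial_checksum(*addrs, payload)))
```

The two values differ, so a client that recomputes the checksum itself on a packet that never passed through a checksumming NIC will reject it.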

As a workaround for now, setting 'ethtool -K $vif tx off' in dom0 helps for me. The question remains what the final solution should look like. Maybe a properly patched dhcpd or dhcpcd? There used to be a partial checksum patch in Red Hat, see:

http://www.redhat.com/archives/fedora-cvs-commits/2007-April/msg00452.html
http://rpmfind.net/linux/RPM/fedora/12/i386/dhcp-4.1.0p1-12.fc12.i686.html

Why was this not included in SUSE?

Reproducible: Always

Steps to Reproduce:
1. Try to use a dhcp client in a pvm xen domU domain
2. Wait until the first lease renewal
3.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c
wei wang
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c1
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c2
--- Comment #2 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c3
Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c4
--- Comment #4 from Martin Vidner
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c5
--- Comment #5 from Samuel Kvasnica
> AFAIK, it also works fine when you add ETHTOOL_OPTIONS='-K iface tx off' to the interface config (that is, to /etc/sysconfig/network/ifcfg-br0, assuming the bridge interface is named 'br0', or to ifcfg-$vif in the case of e.g. a routed setup) or, as you did, to the xen scripts.

Well, fiddling around with offload is a quick temporary workaround but not really a valid solution. It is a performance issue to switch offload off in general on br0. Also for vif. Partial checksumming was not introduced just for fun. So if all other distros include the dhcp patch, what is the reason SUSE does not want it?
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c6
--- Comment #6 from Marius Tomaschewski
> It is a performance issue to switch offload off in general on br0. Also for vif. Partial checksumming was not introduced just for fun.

It is a brief performance optimization that creates interoperability problems and was rejected by ISC, see:

https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001825.html
https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001832.html
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c7
--- Comment #7 from Samuel Kvasnica
> It is a brief performance optimization that creates interoperability problems and was rejected by ISC, see:
> https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001825.html
> https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001832.html

Yeah, I've already read that whole thread... and came to the conclusion that dhcpd is buggy. I'm not really happy to e.g. decrease the performance of a local nfs mount to satisfy a buggy dhcp server. The checksums are needed only if a packet leaves the machine by wire. And even that is done by hardware in the NIC. Here we try to emulate the NIC checksumming because of buggy userland software.
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c8
--- Comment #8 from Marius Tomaschewski
(In reply to comment #6)
>> It is a brief performance optimization that creates interoperability problems and was rejected by ISC, see:
>> https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001825.html
>> https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001832.html
> Yeah, I've already read that whole thread... and came to the conclusion that dhcpd is buggy.

No, it isn't. AFAIK it verifies the checksum if there is one, skips the check if the packet does not have one, and discards packets with an incorrect one.

> I'm not really happy to e.g. decrease the performance of a local nfs mount to satisfy a buggy dhcp server. The checksums are needed only if a packet leaves the machine by wire.

Yes, so just turn off checksums.

> And even that is done by hardware in the NIC.

Exactly.
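The receive-side policy described in comment #8 (verify a checksum if present, skip the check if absent, discard on mismatch) can be sketched as follows. This is an illustrative Python model, not code taken from ISC dhcp or dhcpcd; `check_udp` and its return strings are hypothetical names.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def check_udp(src: str, dst: str, datagram: bytes) -> str:
    """Classify a raw UDP datagram (8-byte header plus payload).

    Returns "unchecked" if the sender transmitted no checksum,
    "ok" if the checksum verifies, and "discard" otherwise.
    """
    (csum,) = struct.unpack("!H", datagram[6:8])
    if csum == 0:
        return "unchecked"  # UDP over IPv4 allows omitting the checksum
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 17, len(datagram)))
    # A correct checksum makes the sum over everything equal 0xFFFF.
    if ones_complement_sum(pseudo + datagram) == 0xFFFF:
        return "ok"
    return "discard"
```

Under this policy a packet whose checksum field still holds a partial (pseudo-header only) value lands in the "discard" branch, which matches the "bad UDP checksum, ignoring" messages from the client.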
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c9
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c11
Olaf Kirch
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c12
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c13
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c15
--- Comment #15 from Samuel Kvasnica
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c17
--- Comment #17 from Marius Tomaschewski
> Olaf, thanks for speaking the clear words on checksum offload topic ! :-)

I can't resist now :-) It is as Olaf writes:

> The bug here is that the packet with the partial checksum is delivered to the application.

That is, the bug is in the xen kernel (driver), and we are about to patch every application using UDP instead of xen. AFAIR the problem does not occur at all when the dhcp server is not in the xen dom0 of the same host, or when the dhcp server is running in a vm too [at least on kvm].

AFAIK, the OS/stack can create partial checksums that will be corrected/completed by the hardware (in this case a job for the xen NIC driver) _before_ the packets go over the wire to another host. And this correction does not happen here.

But if Olaf thinks that this is the way we have to go [rather than disabling partial checksums, which AFAIK happens automatically on kernels that do not support GSO but e.g. TSO only], I'll patch the apps. The question is, which application using UDP are we going to patch next: ipsec?
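The completion step mentioned in comment #17 can be sketched in Python as follows. It assumes the partial checksum field holds the folded pseudo-header sum (the Linux convention), so finishing the job only requires summing the UDP header and payload, field included, and complementing the result. `complete_partial_checksum` is a hypothetical name; in a real system this work is done by the NIC or the driver, not by an application.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def complete_partial_checksum(datagram: bytes) -> bytes:
    """Turn a partially checksummed UDP datagram into a complete one.

    Assumes bytes 6..7 already contain the pseudo-header sum, so the
    pseudo-header itself does not have to be rebuilt here.
    """
    csum = ~ones_complement_sum(datagram) & 0xFFFF
    csum = csum or 0xFFFF  # 0 would mean "no checksum"
    return datagram[:6] + struct.pack("!H", csum) + datagram[8:]
```

When this step is skipped between two domains on the same host, the receiver sees the seed value instead of the finished checksum.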
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c18
--- Comment #18 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c19
--- Comment #19 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c20
--- Comment #20 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c22
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c23
Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c24
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c26
Olaf Kirch
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c27
--- Comment #27 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c28
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c29
Marius Tomaschewski
> Very likely (though I'm not a networking expert at all). In 11.4 (further improved subsequently [2.6.38-rc, not yet committed] following a recent discussion on netdev, with a possibility of pulling this back into 11.4

11.3 and sles11 ? ;-)

> once sufficiently verified) I sync-ed checksum behavior with what pv-ops has been doing for a while, which particularly results in
>
>     if (rx->flags & NETRXF_csum_blank)
>         skb->ip_summed = CHECKSUM_PARTIAL;
>     else if (rx->flags & NETRXF_data_validated)
>         skb->ip_summed = CHECKSUM_UNNECESSARY;
>     else
>         skb->ip_summed = CHECKSUM_NONE;
>
> replacing the cited code sequence in netfront.
>
> So testing with the current 11.4 kernel in Dom0 and DomU-s would certainly be quite helpful.

OK, I can (back up my test boxes and) update. Maybe I got it all wrong: you write above that it is not committed yet, so I assume it is not yet in

http://download.opensuse.org/factory/repo/oss/suse/x86_64/kernel-xen-2.6.37-...

right? Do you have kernel-xen rpms where the above patch is applied?
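For readers who do not have the kernel sources at hand, the decision order in the netfront fragment quoted above can be modelled in a few lines of Python. The flag and state names mirror the quoted C code; the bit values and the `classify_rx` helper are assumptions made for this sketch and are not copied from the Xen or Linux headers.

```python
from enum import Enum, IntFlag


class RxFlags(IntFlag):
    # Bit positions are illustrative assumptions.
    DATA_VALIDATED = 1 << 0  # stands in for NETRXF_data_validated
    CSUM_BLANK = 1 << 1      # stands in for NETRXF_csum_blank


class IpSummed(Enum):
    NONE = "CHECKSUM_NONE"                # stack must verify the checksum
    UNNECESSARY = "CHECKSUM_UNNECESSARY"  # already verified, nothing to do
    PARTIAL = "CHECKSUM_PARTIAL"          # checksum still has to be completed


def classify_rx(flags: RxFlags) -> IpSummed:
    """Mirror the if / else-if / else chain from the quoted fragment."""
    if flags & RxFlags.CSUM_BLANK:
        return IpSummed.PARTIAL
    if flags & RxFlags.DATA_VALIDATED:
        return IpSummed.UNNECESSARY
    return IpSummed.NONE
```

The point worth noting is the ordering: a packet flagged as both blank and validated is treated as partial, so the blank-checksum flag takes precedence.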
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c30
Jan Beulich
>> Very likely (though I'm not a networking expert at all). In 11.4 (further improved subsequently [2.6.38-rc, not yet committed] following a recent discussion on netdev, with a possibility of pulling this back into 11.4
> 11.3 and sles11 ? ;-)

sle11 sp2 - yes. sp1 - probably not without an adequate amount of testing. 11.3 - not sure, but more no than yes.

> Maybe I got it all wrong -- you write above it is not committed yet,

All I said was that the extra changes to post-11.4 kernel code aren't committed yet.

> so I assume it is not yet in
> http://download.opensuse.org/factory/repo/oss/suse/x86_64/kernel-xen-2.6.37-...
> right?

The bits in 11.4 have been in git for a couple of weeks I think.
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c31
--- Comment #31 from Marius Tomaschewski
> sle11 sp2 - yes. sp1 - probably not without an adequate amount of testing. 11.3 - not sure, but more no than yes.

OK. As it AFAIS does not break anything, I've already submitted the ISC dhcp patch for 11.4 and it has been accepted:

  $ osc ls openSUSE:11.4 dhcp | grep xen-checksum
  dhcp-4.2.0-xen-checksum.patch

but so far it is built only in factory-snapshot:

http://download.opensuse.org/factory-snapshot/repo/oss/suse/x86_64/dhcp-4.2....

Everything older than 11.4 has to be decided by the Maintenance-Team anyway.
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c32
--- Comment #32 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c33
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c34
--- Comment #34 from Jan Beulich
> 11.3 and sles11 ? ;-)

Actually, for sle11 the proposed patch in bug 652942 comment 41 could be a less intrusive alternative to what is done in 11.4 and master, and hence more reasonable to consider applying before sp2.
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c35
--- Comment #35 from Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c39
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c40
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c41
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c42
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c43
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c
Marius Tomaschewski
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c44
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c45
--- Comment #45 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c46
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c47
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c48
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c49
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c50
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c51
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c52
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c53
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c54
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=668194
https://bugzilla.novell.com/show_bug.cgi?id=668194#c55
Swamp Workflow Management