[Bug 649584] New: bridge drops nfs udp packet -- breaks/blocks NFS server via udp !!
https://bugzilla.novell.com/show_bug.cgi?id=649584
https://bugzilla.novell.com/show_bug.cgi?id=649584#c0

           Summary: bridge drops nfs udp packet -- breaks/blocks NFS server via udp !!
    Classification: openSUSE
           Product: openSUSE 11.3
           Version: Final
          Platform: x86-64
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Network
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: koenig@linux.de
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

Created an attachment (id=397321)
 --> (http://bugzilla.novell.com/attachment.cgi?id=397321)
tcpdump log from nfs server

nfs server over udp does not work anymore with recent 11.3 update kernels when using a bridged network device!

this first showed up on my 64bit XEN dom0 server running kernel 2.6.34.7-0.4-xen being accessed as nfs server from a solaris 10 nfs client with udp protocol (nfs via tcp works fine).

tcpdump shows that the nfs v3 udp "null reply" packet (packet #4 in attached file nfs-server.udp.tcpdump) does not make it onto the ethernet wire: it does not show up on the nfs client side (nfs-client.udp.snoop -- use wireshark!) -- tested between two opensuse 11.3 boxes with loopback cable!

a few tests showed:

- using eth0 instead of br0 "solves" the problem (nfs mount over udp works again)
- using real kernel instead of xen dom0 doesn't seem to make a difference: real 2.6.34.7-0.4-desktop kernel breaks with udp nfs server too (so not likely a xen-only problem!)
- on a xen domU running suse 11.3 with distro kernel 2.6.34-12-xen 64bit the nfs server works fine with udp over bridge br0 in domU. so maybe that bug got introduced somewhere between 2.6.34-12 and 2.6.34.7-0.4
- update from 2.6.34.7-0.4-desktop to kernel-desktop-2.6.34.7-0.5.1.x86_64 did not help on my notebook as udp nfs server over br0 (works over eth0 and/or with tcp).
test with:

breaks: mount -o udp server:/dir /mnt
works:  mount -o tcp server:/dir /mnt

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
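The reproduction in comment #0 can be sketched as a small shell script. This is only an illustration under assumptions: `server`, `/dir` and `/mnt` are the placeholders from the report, while the `RUN_MOUNT_TEST` switch, the `MOUNT`/`UMOUNT` variables and the function name are invented here. The real mounts need root and a reachable NFS server, and a UDP mount that hits the bug may hang until it times out rather than fail quickly.

```shell
#!/bin/sh
# Placeholders taken from the report; replace with real values.
SERVER=${SERVER:-server}
EXPORT=${EXPORT:-/dir}
MNT=${MNT:-/mnt}
# The commands are variables only so the script can be dry-run
# (an assumption of this sketch, not part of the report).
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

# Mount once over the given transport (udp or tcp) and report the outcome.
try_proto() {
    if $MOUNT -o "$1" "$SERVER:$EXPORT" "$MNT"; then
        echo "$1: works"
        $UMOUNT "$MNT"
    else
        echo "$1: breaks"
    fi
}

# Run the real mounts only on request, so sourcing this file has no side effects.
if [ "${RUN_MOUNT_TEST:-0}" = 1 ]; then
    for proto in udp tcp; do
        try_proto "$proto"
    done
fi
```

Running it with `RUN_MOUNT_TEST=1` once while the server exports via br0 and once via eth0 should show the difference described above. In a second terminal on the server, `tcpdump -n -i br0 udp port 2049` and the same command with `-i eth0` would show whether the reply leaves the bridge; port 2049 is the usual NFS port and is assumed here.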
https://bugzilla.novell.com/show_bug.cgi?id=649584#c1

--- Comment #1 from Harald Koenig <koenig@linux.de> 2010-10-27 14:37:07 UTC ---
Created an attachment (id=397322)
 --> (http://bugzilla.novell.com/attachment.cgi?id=397322)
network dump from nfs client (solaris 10) -- use wireshark to view, not tcpdump!
https://bugzilla.novell.com/show_bug.cgi?id=649584#c2

--- Comment #2 from Harald Koenig <koenig@linux.de> 2010-10-27 14:42:46 UTC ---
(In reply to comment #0)
> this first showed up on my 64bit XEN dom0 server running kernel 2.6.34.7-0.4-xen being accessed as nfs server from a solaris 10 nfs client with udp protocol (nfs via tcp works fine).

FYI: until recently that xen dom0 was running opensuse 11.2 (2.6.31.12-0.2-xen), which worked fine with udp nfs over br0 too! the problem showed up immediately after upgrading from 11.2 to 11.3 :-(
https://bugzilla.novell.com/show_bug.cgi?id=649584#c4

Martin Vidner <mvidner@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mvidner@novell.com          |nfbrown@novell.com

--- Comment #4 from Martin Vidner <mvidner@novell.com> 2010-10-29 09:59:52 CEST ---
I deal with the userspace setup and this looks like a kernel problem.
Neil, can you help? I can't tell whether the problem is with NFS or bridging, but you should know better who to forward this to.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c5

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |koenig@linux.de

--- Comment #5 from Neil Brown <nfbrown@novell.com> 2010-10-29 08:30:07 UTC ---
So the UDP GETPORT reply gets through OK, but the UDP NFS/NULL reply doesn't. That seems significant. There must be some difference between the way NFS sends a reply and the way the GETPORT reply is sent.

Please check if you are running 'portmap' or 'rpcbind' on the NFS server and report which.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c6

Harald Koenig <koenig@linux.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
       InfoProvider|koenig@linux.de             |

--- Comment #6 from Harald Koenig <koenig@linux.de> 2010-10-29 11:59:43 UTC ---
(In reply to comment #5)
> So the UDP GETPORT reply gets through OK, but the UDP NFS/NULL reply doesn't. That seems significant. There must be some difference between the way NFS sends a reply and the way the GETPORT reply is sent.
>
> Please check if you are running 'portmap' or 'rpcbind' on the NFS server and report which.

rpcbind is running:

# lsof | grep -i tcp.*rpc
rpcbind    3715  root    8u  IPv4  8267  0t0  TCP *:sunrpc (LISTEN)
rpcbind    3715  root   11u  IPv6  8272  0t0  TCP *:sunrpc (LISTEN)
# ps p 3715
  PID TTY      STAT   TIME COMMAND
 3715 ?        Ss     0:51 /sbin/rpcbind
# rpm -qf /sbin/rpcbind
rpcbind-0.1.6+git20080930-10.1.x86_64
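The portmap-or-rpcbind question from comment #5 can also be answered straight from the process list. A minimal sketch, assuming a procps-style `ps`; the function name `classify_portmapper` is invented for this example:

```shell
#!/bin/sh
# Read process names (one per line) on stdin and print which RPC
# port mapper is among them: "rpcbind", "portmap", or "none".
classify_portmapper() {
    while read -r name; do
        case "$name" in
            rpcbind|portmap)
                echo "$name"
                return 0
                ;;
        esac
    done
    echo "none"
}

# Usage on a live system (prints only the first match found):
#   ps -e -o comm= | classify_portmapper
```

This only reports which daemon is running; it says nothing about which system call the daemon uses to send its replies.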
https://bugzilla.novell.com/show_bug.cgi?id=649584#c7

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #7 from Neil Brown <nfbrown@novell.com> 2010-11-23 07:16:33 UTC ---
I can definitely reproduce this.

I had a theory that something was going wrong with sendmsg. nfsd uses sendmsg, as does portmap, but rpcbind uses sendto. However, I have now ruled that out.

It looks like I might have to try bisecting. It is almost certainly a problem with the bridging interface interacting poorly with sunrpc in the kernel, but I cannot guess what it would look like.

I'll try to dig more deeply tomorrow.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c8

--- Comment #8 from Neil Brown <nfbrown@novell.com> 2010-11-24 00:31:27 UTC ---
More by luck than good management, I have discovered that if you compile a current opensuse-11.3 kernel with IPv6 disabled, it works fine. With IPv6 enabled, it doesn't.

So that narrows it down a little bit.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c9

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
       InfoProvider|                            |koenig@linux.de

--- Comment #9 from Neil Brown <nfbrown@novell.com> 2010-12-02 01:21:28 UTC ---
There is something really weird happening here.

I have reproduced this on my notebook with openSUSE:11.3, but not on my test machine with openSUSE:Factory installed, nor on a kvm virtual machine with openSUSE:11.3 installed.

Maybe it is hardware-specific ... what network controller do you have on your NFS server? (lspci | grep -i net).
https://bugzilla.novell.com/show_bug.cgi?id=649584#c10

--- Comment #10 from Neil Brown <nfbrown@novell.com> 2010-12-02 06:31:23 UTC ---
More exploration shows that this was definitely broken back at 2.6.34-12, so it isn't a recent regression. Also, it seems to work in 2.6.36, which is available in 'Factory'.

It affects my Intel 82577LM Gigabit network card (e1000e driver) but not the Broadcom BCM5721 card (bnx driver) in my test server.

I would suggest at least trying a kernel from
http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/
and confirming that it works. If it does, we might have to leave it at that. Finding and backporting the fix may not be worth the effort.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c11

Harald Koenig <koenig@linux.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
       InfoProvider|koenig@linux.de             |

--- Comment #11 from Harald Koenig <koenig@linux.de> 2010-12-02 16:29:21 UTC ---
(In reply to comment #9)
> There is something really weird happening here.
>
> I have reproduced this on my notebook with openSUSE:11.3, but not on my test machine with openSUSE:Factory installed, nor on a kvm virtual machine with openSUSE:11.3 installed.
>
> Maybe it is hardware-specific ... what network controller do you have on your NFS server? (lspci | grep -i net).

my notebook (for the "real" kernel test, I was using the internal e1000 as eth0):

$ /sbin/lspci | grep -i net
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)
03:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)
16:00.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)

and the XEN dom0 server (tested both interfaces):

$ /sbin/lspci | grep -i net
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
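Since the thread distinguishes affected and unaffected machines by network driver as well as by card model, it may help to report the kernel driver bound to each interface alongside the lspci output. A sketch, assuming the Linux sysfs layout `/sys/class/net/<iface>/device/driver`; the function name `nic_driver` and its optional sysfs-root argument are made up for this example:

```shell
#!/bin/sh
# Print the kernel driver bound to a network interface.
#   $1 = interface name, e.g. eth0
#   $2 = sysfs root (optional, defaults to /sys; only there so the
#        function can be tried against a fake directory tree)
# Returns non-zero for interfaces without a backing device, which is
# expected for virtual interfaces such as a bridge.
nic_driver() {
    link=$(readlink "${2:-/sys}/class/net/$1/device/driver") || return 1
    basename "$link"
}

# Usage on a live system:
#   for i in eth0 eth1; do
#       printf '%s: %s\n' "$i" "$(nic_driver "$i")"
#   done
```

Where ethtool is installed, `ethtool -i eth0` reports the same driver name together with its version.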
https://bugzilla.novell.com/show_bug.cgi?id=649584#c12

--- Comment #12 from Harald Koenig <koenig@linux.de> 2010-12-02 16:33:30 UTC ---
(In reply to comment #10)
> I would suggest at least trying a kernel from

both my notebook and the XEN server are "in production" and getting some slots for downtime (and my own work time) is hard right now (Xmas etc.;). maybe next week I can do some tests (cross fingers....)
https://bugzilla.novell.com/show_bug.cgi?id=649584#c13

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
       InfoProvider|                            |koenig@linux.de

--- Comment #13 from Neil Brown <nfbrown@novell.com> 2011-02-22 03:11:09 UTC ---
Hi,
have you had a chance to try a new kernel yet to see if that overcomes the problem?

Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c14

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
       InfoProvider|koenig@linux.de             |
         Resolution|                            |NORESPONSE

--- Comment #14 from Neil Brown <nfbrown@novell.com> 2011-04-11 07:10:46 UTC ---
Resolving as 'no response'. Please reopen if more information becomes available as mentioned in previous comment.
https://bugzilla.novell.com/show_bug.cgi?id=649584#c15

Erik Brakkee <erik@brakkee.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |erik@brakkee.org

--- Comment #15 from Erik Brakkee <erik@brakkee.org> 2012-02-11 19:22:46 UTC ---
I encountered a similar issue today. I moved a KVM virtual machine from a server to a laptop. The virtual machine is running opensuse 11.3, as was the host server on which it was running before. I use a bridged setup for the network interface. My laptop is running opensuse 12.1.

After moving the virtual machine to my laptop, I immediately noticed that my TVIX M-6500 box could no longer mount the NFS shares. The server logs of the virtual machine, however, did not show anything significant. When opening the NFS directory on the TVIX, the server logs show

Feb 11 20:11:25 shikra mountd[4032]: authenticated mount request from 192.168.2.4:636 for /data (/data)

This is similar to the output I get when mounting the same directory from a linux box (opensuse 11.3 and opensuse 12.1). Because of this I suspected that the TVIX box was using UDP while the linux machines were using TCP, which could be the difference.

Then I changed the device model for the network card in the VM from virtio to rtl8139, and after this it works. Therefore, my conclusion is that it is somehow an interoperability issue with the virtio driver in guest and host machines. Apparently this driver is not fully backward compatible.