[Bug 500855] New: EIO error writing to NFS
http://bugzilla.novell.com/show_bug.cgi?id=500855

Summary: EIO error writing to NFS
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Network
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: jnelson-suse@jamponi.net
QAContact: qa@suse.de
Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.0.9) Gecko/2009041500 SUSE/3.0.9-0.1.1 Firefox/3.0.9

I've noticed lately (with the latest kernel updates) that writing large files to an NFS share almost always results in an EIO error. I did not have this problem before, and I can narrow the set of changes down to two things:

1. the server is under Xen (although this has been true for a few kernel releases)
2. a new kernel on client and server (both openSUSE 11.1, up-to-date)

I think the new kernel is to blame, but I can't be sure.

The easiest way to trigger it is to rsync a large (multi-gigabyte) file to an NFS share. Almost invariably the process will exit with EIO.

Here is one of the lines from an strace I took of rsync (32% of the way through a 4.5GB file):

4458347 7850 <... write resumed> ) = -1 EIO (Input/output error)

so it's clearly getting an EIO. dmesg just says this:

nfs: server 192.168.2.1 not responding, timed out

and the NFS server says nothing at all: no errors of any kind. I don't notice any other problems.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
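For anyone who wants to try the reproduction without rsync, here is a minimal sketch that writes a large file in chunks and reports how far it got before the first error. The function name `write_until_error`, the chunk size, and the command-line handling are illustrative assumptions, not something taken from the report. On NFS an error may only surface at fsync or close, so those are checked too.

```python
import errno
import os
import sys


def write_until_error(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path in chunks.

    Returns (bytes_written, err): err is None on success, otherwise the
    errno of the first failing write, fsync, or close.
    """
    chunk = b"\0" * chunk_size
    written = 0
    err = None
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # os.write may write fewer bytes than asked; the loop handles that.
            written += os.write(fd, chunk[:n])
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    finally:
        try:
            os.close(fd)
        except OSError as exc:
            if err is None:
                err = exc.errno
    return written, err


if __name__ == "__main__":
    # Hypothetical usage: python repro.py /path/on/nfs/testfile 4500000000
    target, size = sys.argv[1], int(sys.argv[2])
    done, err = write_until_error(target, size)
    name = errno.errorcode.get(err, "OK") if err is not None else "OK"
    print("wrote %d of %d bytes, result: %s" % (done, size, name))
```

If the soft mount times out, the result line should name EIO along with the offset reached, which is the same information the strace line above gives for rsync.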
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c1
--- Comment #1 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c2
--- Comment #2 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
Zheng Chen
http://bugzilla.novell.com/show_bug.cgi?id=500855
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c3
--- Comment #3 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User sjayaraman@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c4
Suresh Jayaraman
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c5
--- Comment #5 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User sjayaraman@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c6
Suresh Jayaraman
Here is the problem: neither the client nor the server is doing anything else. There is no network load (gig-E, and I'm the only client) and no client or server load other than that induced by NFS. It seems the NFS soft mount option is almost totally worthless if this is the behavior that happens. Perhaps this is not the core issue, but is it pointing to a deeper problem that hard mounting masks?
Possible. In my brief attempts I was unable to reproduce this. If you are able to hit this consistently at a definite point (say, when rsync is at 32%), could you start a packet capture just before the failure is about to happen (e.g. when it is at 30%, otherwise the dump will be huge) and attach the capture?

For capturing packets, you could do:

tcpdump -s0 -w nfs-dbg.cap port 2049
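One way to start the capture only near the failure point is to poll the size of the destination file and launch tcpdump once it passes a chosen percentage. The sketch below is an assumption-laden illustration: the helper names `wait_for_percent` and `tcpdump_argv` are made up here, and it assumes the destination file grows in place (rsync normally writes to a temporary file in the destination directory unless `--inplace` is used). Only the tcpdump arguments come from the comment above.

```python
import os
import subprocess
import sys
import time


def wait_for_percent(path, total_bytes, percent, get_size=os.path.getsize,
                     sleep=time.sleep, poll_seconds=1.0):
    """Block until the file at path reaches percent% of total_bytes.

    Returns the size observed when the threshold was crossed.
    """
    threshold = total_bytes * percent // 100
    while True:
        try:
            size = get_size(path)
        except OSError:
            size = 0  # destination file does not exist yet
        if size >= threshold:
            return size
        sleep(poll_seconds)


def tcpdump_argv(outfile="nfs-dbg.cap", port=2049):
    """The capture command suggested in the comment, as an argv list."""
    return ["tcpdump", "-s0", "-w", outfile, "port", str(port)]


if __name__ == "__main__":
    # Hypothetical usage: python capture.py /path/on/nfs/bigfile 4500000000
    dest, total = sys.argv[1], int(sys.argv[2])
    wait_for_percent(dest, total, 30)
    # tcpdump usually needs root; stop it with Ctrl-C after the EIO appears.
    subprocess.call(tcpdump_argv())
```

Polling the size is crude but keeps the capture small, which is the point of starting at 30% rather than at the beginning of the transfer.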
http://bugzilla.novell.com/show_bug.cgi?id=500855
User sjayaraman@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c8
--- Comment #8 from Suresh Jayaraman
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c9
Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c10
--- Comment #10 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c11
--- Comment #11 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c12
--- Comment #12 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c13
--- Comment #13 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c14
--- Comment #14 from Jon Nelson
http://bugzilla.novell.com/show_bug.cgi?id=500855
User sjayaraman@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c15
Suresh Jayaraman
Created an attachment (id=303272) --> (http://bugzilla.novell.com/attachment.cgi?id=303272) [details] an strace of 'rm /isos/kubuntu-9.04-dvd-amd64.iso'
Can you add Neil Brown to the list of CC's on this? I tried but for some reason I couldn't.
I'm attaching a file, 'strace.txt', that shows how an rm *fails* (it reports an I/O error) even though the file is still actually removed.
The most interesting part is line 115.
This is *100%* reproducible.
Pick a large file (on some NFS share). The file I chose is 4.4G in size. rm it and watch the error.
It happens with NFSv3 and NFSv4, over TCP and UDP.
Is this due to the short timeo value?
This is easily explainable and very much expected. You are using the 'soft' mount option with a timeo value that is too low. With the 'soft' option the client will stop retrying after the timeo period. See Section E4 of the NFS FAQ: http://nfs.sourceforge.net/#section_e

Using 'soft' is only recommended when responsiveness is more important than data integrity. If you have to stick to 'soft' for some reason, try to adjust the timeo and retrans values appropriately.

Do you see any other issues other than this? If not, feel free to close this bug.
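To check which of these options a client is actually using, the mount options can be read from /proc/mounts. The sketch below parses one such line and reports whether the mount is soft, along with its timeo (converted from tenths of a second to seconds) and retrans values. The function names and the sample line are hypothetical; the sample options are not the reporter's real settings, which do not appear in this thread.

```python
def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict."""
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def describe_nfs_mount(proc_mounts_line):
    """Summarise soft/timeo/retrans for one /proc/mounts line.

    Returns None for non-NFS filesystems.
    """
    device, mountpoint, fstype, options = proc_mounts_line.split()[:4]
    if fstype not in ("nfs", "nfs4"):
        return None
    opts = parse_mount_options(options)
    return {
        "mountpoint": mountpoint,
        "soft": "soft" in opts,
        "retrans": int(opts["retrans"]) if "retrans" in opts else None,
        # timeo is expressed in tenths of a second
        "timeo_seconds": int(opts["timeo"]) / 10 if "timeo" in opts else None,
    }


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        for line in mounts:
            info = describe_nfs_mount(line)
            if info is not None:
                print(info)
```

For a hypothetical line such as `server:/export /mnt/nfs nfs rw,soft,timeo=10,retrans=2,proto=tcp 0 0`, this reports a soft mount with a one-second timeo, which is the kind of setting the comment above warns about.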
http://bugzilla.novell.com/show_bug.cgi?id=500855
User jnelson-suse@jamponi.net added comment
http://bugzilla.novell.com/show_bug.cgi?id=500855#c16
Jon Nelson