http://bugzilla.novell.com/show_bug.cgi?id=415607
User nfbrown@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=415607#c40
Neil Brown changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEEDINFO
Info Provider| |R.Vickers@cs.rhul.ac.uk
--- Comment #40 from Neil Brown 2009-08-12 22:47:13 MDT ---
Thanks for the trace.
As you say, no NFS traffic. It seems the sunrpc code is confused about the
status of the connection. It knows it needs to open a new connection, but
it seems to think that is already happening, despite the fact that it
isn't.
Matching the trace to the code, it seems that either XPRT_CLOSING is set,
or XPRT_CONNECTING is set.
When XPRT_CONNECTING is set, the code also schedules a task to run at most
5 minutes in the future, and that task is guaranteed to clear
XPRT_CONNECTING again, so that flag cannot stay set for very long.
The trace you captured was for less than 5 minutes so I cannot be sure
that the task didn't fire in that time, but it seems unlikely.
So that points to XPRT_CLOSING.
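
To make that reasoning concrete, here is a rough user-space model of it.
This is not the sunrpc code: every model_* name and the bit numbering are
made up for illustration, and the only behaviour modelled is "a new connect
is skipped while either flag is set, and only CONNECTING has a task that is
guaranteed to clear it".

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real ones live in the kernel headers. */
enum {
    MODEL_XPRT_CONNECTING = 1,
    MODEL_XPRT_CLOSING    = 2
};

struct model_xprt {
    unsigned long state;    /* bit field of the flags above */
};

static bool model_test_bit(int bit, const unsigned long *word)
{
    return (*word >> bit) & 1UL;
}

static void model_set_bit(int bit, unsigned long *word)
{
    *word |= 1UL << bit;
}

static void model_clear_bit(int bit, unsigned long *word)
{
    *word &= ~(1UL << bit);
}

/* Returns true if a new connection attempt would actually be started. */
static bool model_try_connect(struct model_xprt *xprt)
{
    if (model_test_bit(MODEL_XPRT_CLOSING, &xprt->state))
        return false;   /* transport believes a close is in progress */
    if (model_test_bit(MODEL_XPRT_CONNECTING, &xprt->state))
        return false;   /* transport believes a connect is under way */
    model_set_bit(MODEL_XPRT_CONNECTING, &xprt->state);
    return true;
}

/* Stands in for the delayed task that always clears CONNECTING. */
static void model_connect_task_done(struct model_xprt *xprt)
{
    model_clear_bit(MODEL_XPRT_CONNECTING, &xprt->state);
}
```

In this model a stuck CONNECTING bit heals itself once the delayed task
runs, whereas a stuck CLOSING bit blocks model_try_connect() indefinitely,
which is the behaviour the trace suggests.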
There is an upstream patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdif...
which moves the setting of that flag. I'm not at all certain that
this patch will fix the problem, but it is possible, and it is the only
credible possibility at the moment.
The important part of the patch is moving the "set_bit(XPRT_CLOSING" from
the TCP_CLOSE_WAIT case to the TCP_LAST_ACK case of the switch.
Are you in a position to compile your own kernel? Could you apply
that patch and try it out?
I'll get that patch into the 10.1 kernel (it is already in 11.1) so it
will be in any future update, but I don't know when that is likely to be.