[Bug 584720] New: ls: cannot access /nfs/turnip/windows: Stale NFS file handle
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c0

           Summary: ls: cannot access /nfs/turnip/windows: Stale NFS file handle
    Classification: openSUSE
           Product: openSUSE 11.2
           Version: Final
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: jnelson-suse@jamponi.net
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100204 SUSE/3.5.8-0.1.1 Firefox/3.5.8

I have an openSUSE 11.2 client and server pair. The server exports a number of filesystems with NFSv4. The pseudo filesystem is using 'crossmnt'.

One of the directories under the exported filesystem was removed with rmdir. It was not a mount and was not exported specifically. The client, when listing the mounted filesystem, took some time (it was not instant - perhaps 15-30 seconds) before it noticed that the directory had changed, and then it misbehaved. Instead of no longer showing the directory (because it doesn't exist), the client reported a Stale NFS file handle. This was unexpected and, IMO, not correct.

NOTE: the client is running a KOTD kernel, 2.6.33-25-desktop

Summary:

The server exports the /exports directory and, with the crossmnt option, everything beneath it. The /exports/windows directory did exist (on the server), but had nothing mounted on it. It was also empty.

The client, when listing the mount (on /nfs/turnip), saw /nfs/turnip/windows, as expected.

On the server, I removed the directory with rmdir /exports/windows.

On the client, ls /nfs/turnip continued to show the windows directory, unexpectedly. At some point, the client began grumping about a stale NFS file handle, and continued to show the windows directory. Eventually, the client stopped grumping, and the windows directory disappeared, as expected.

Expected: the windows directory should have disappeared more or less immediately from the client's point of view.

Unexpected: the windows directory did not disappear immediately. After some delay the client produced error messages (stale NFS file handle), and then eventually the directory disappeared.

I am readily available for debugging.

Reproducible: Always

-- 
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c2

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |jnelson-suse@jamponi.net

--- Comment #2 from Neil Brown <nfbrown@novell.com> 2010-03-08 21:07:08 UTC ---

This is not entirely unexpected, but does seem to be a bit worse in your case than normal.

The client caches information from the server, and there is a protocol for refreshing the cache when changes happen on the server. For NFSv4, this depends on the 'ctime' of a file or directory.

What should have happened is:

 - when you listed /nfs/turnip/ and saw 'windows', the client would have remembered this content and also stored the ctime of the directory /exports
 - when you removed /exports/windows, the ctime on the /exports directory should have changed
 - when you ran "ls /nfs/turnip" again, it should have checked with the server for the current ctime, found that it had changed, and requested a new list of files.

Apparently it didn't.

'ctime' often only has a resolution of 1 second, so multiple changes within the same second can cause this confusion, though I doubt that happened here. Something else must have confused the cache-coherency protocol.

What sort of filesystem contains /exports? ext3, or something else?

Could any other change have happened to /exports at much the same time as windows was removed?

Are you able to reproduce this symptom, or was it a one-off?
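The ctime-based refresh described in comment #2 can be sketched in user space. This is only a toy model run against a local temporary directory, not the NFS client's implementation: the class name `DirListingCache`, the `demo` helper and the `refreshes` counter are all made up for illustration, and the 1.1 second sleep is there purely to step past a 1-second ctime resolution.

```python
import os
import tempfile
import time


class DirListingCache:
    """Toy model: cache a directory listing and refresh it when ctime moves."""

    def __init__(self, path):
        self.path = path
        self.cached_ctime_ns = None   # ctime seen when the listing was cached
        self.cached_names = None
        self.refreshes = 0            # how many times a new listing was fetched

    def listdir(self):
        # Ask for the directory's current change time first.
        current = os.stat(self.path).st_ctime_ns
        if current != self.cached_ctime_ns:
            # ctime differs from the cached value: fetch a new listing.
            self.cached_names = sorted(os.listdir(self.path))
            self.cached_ctime_ns = current
            self.refreshes += 1
        return self.cached_names


def demo():
    """Create and remove a subdirectory, listing through the cache each time."""
    with tempfile.TemporaryDirectory() as exports:
        child = os.path.join(exports, "windows")
        os.mkdir(child)
        cache = DirListingCache(exports)
        before = cache.listdir()
        time.sleep(1.1)               # avoid two changes within one ctime tick
        os.rmdir(child)               # removing an entry updates the parent's ctime
        after = cache.listdir()
        return before, after, cache.refreshes
```

In this model the second listing no longer contains 'windows' because the rmdir changed the parent directory's ctime. If both changes fell within one tick of a coarse ctime, the comparison would see no difference and keep serving the old listing, which is the confusion the comment mentions.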
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c3

Jon Nelson <jnelson-suse@jamponi.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|jnelson-suse@jamponi.net    |

--- Comment #3 from Jon Nelson <jnelson-suse@jamponi.net> 2010-03-08 21:34:34 UTC ---

The filesystem this time around was jfs. I also tried on another exported filesystem (ext4) and got identical results.

NOTE: *these* tests were performed on a different client running the *stock* (updated) openSUSE 11.2 kernel - 2.6.31.12. I did *not* get the "Stale File Handle" error on this kernel. I will be testing again using the KOTD when I can find time.

Although no errors appeared, the cache coherency still seems off. I ran the following test.

On the client, with the NFSv4 *root* mounted on /mnt:

    while true; do date; sleep 0.8; ls -la /mnt ; done

On the server:

    mkdir /exports/foo

and then, on the client, I quickly memorized the 'date' output. Approx. 55 seconds pass before 'foo' shows up.

After a bit (say, a few minutes), I memorize the 'date' output again and switch to the server, where I rmdir /exports/foo. Approx. 25 seconds pass before 'foo' goes away, but this time without error.

These times are repeatable.

Other questions you asked, answered:

/exports almost certainly did not undergo other changes during this time.

I am able to reproduce the cache coherency issues, but not the "Stale NFS handle" issues. I will not be able to try with a KOTD kernel until later.
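The delays in comment #3 were measured by eye from the 'date' output. A small polling helper could time them instead. The sketch below is an assumption about how one might do that, not something used in this report; the function name `wait_for_entry` and its parameters are hypothetical. It lists the directory itself, much like "ls /mnt", and returns how long it took for a name to appear or disappear.

```python
import os
import time


def wait_for_entry(directory, name, present=True, timeout=120.0, interval=0.1):
    """Poll `directory` until `name` is listed (or no longer listed).

    Returns the number of seconds waited, or None if the expected state
    was not observed within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        seen = name in os.listdir(directory)
        elapsed = time.monotonic() - start
        if seen == present:
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(interval)
```

For example, starting `wait_for_entry("/mnt", "foo")` on the client just before running mkdir /exports/foo on the server would report roughly how long the new entry takes to show up, and `wait_for_entry("/mnt", "foo", present=False)` would do the same for the rmdir case. The result is only as precise as the polling interval and the time it takes to switch between the two machines.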
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c4

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Neil Brown <nfbrown@novell.com> 2010-04-20 07:07:48 UTC ---

I can duplicate something that is at least a lot like this.

If the name you give to "ls -l" includes any path components that are in the NFS filesystem then you see changes immediately; if not, you don't.

So if "/mnt" is a mount point then

    while true; do sleep 0.9; ls -l /mnt/foo ; done

will immediately show changes to 'foo', while

    cd /mnt/foo
    while true; do sleep 0.9 ; ls -l . ; done

will not. "ls -l /mnt" acts like "ls -l ." because no part of "/mnt" is within the mounted filesystem.

This inconsistency could certainly be seen as a bug.

NFS normally claims 'close-to-open' consistency, which means that if one process updates a file then closes it, and another process subsequently opens the file (with the open being after the close), then the second process will see the changes even if it is on another host.

I'm not sure if this is supposed to apply equally to directories, and it cannot apply exactly, as you don't open a directory in order to change it. However, the equivalent should be that if you open a directory to read from it after a change has been made, you should see that change.

When opening the mount point or the current directory, NFS is never told that the directory was opened. The first it knows about it is a readdir request against an inode which was a mountpoint or current directory.

For other directories it does the equivalent of 'open' processing during the lookup of the name of the directory. But these two directories don't have names (in the same sense), so that doesn't work.

A possible solution would be to revalidate the directory at the start of readdir if f_pos is zero, i.e.

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a1f6b44..df4f0a6 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -560,6 +560,9 @@ static int nfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	desc->entry = &my_entry;
 
 	nfs_block_sillyrename(dentry);
+	if (filp->f_pos == 0)
+		/* Force attribute validity at open */
+		NFS_I(inode)->cache_validity |= NFS_INO_REVAL_PAGECACHE;
 	res = nfs_revalidate_mapping(inode, filp->f_mapping);
 	if (res < 0)
 		goto out;

This appears to work, but may be over-zealous. I'll ask upstream.

This does not address the fact that you sometimes get 'stale file handle'. I have managed to reproduce that, though only once and with slightly older kernels. I might explore that side again once this side is resolved.
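The idea behind the draft patch in comment #4 (treat a read that starts at offset zero like a fresh open and drop the cached listing first) can be illustrated with a toy user-space model. Nothing here is kernel code or the NFS client's real data structures: `ToyServer`, `ToyClientDir`, `compare` and the entry name 'linux' are invented for the illustration, and the model deliberately ignores the attribute-cache timeouts that eventually refresh the real client.

```python
class ToyServer:
    """Stands in for the exported directory on the server."""

    def __init__(self, names):
        self.names = set(names)


class ToyClientDir:
    """Stands in for the client's cached view of one directory."""

    def __init__(self, server, revalidate_at_start):
        self.server = server
        self.revalidate_at_start = revalidate_at_start
        self.cache = None  # None means there is no valid cached listing

    def readdir(self, pos=0):
        # Modelled change: a read starting at offset 0 is treated like a
        # fresh open, so the cached listing is dropped before it is used.
        if pos == 0 and self.revalidate_at_start:
            self.cache = None
        if self.cache is None:
            self.cache = sorted(self.server.names)
        return self.cache[pos:]


def compare():
    """List a directory before and after a server-side removal, both ways."""
    server = ToyServer({"windows", "linux"})
    old = ToyClientDir(server, revalidate_at_start=False)
    new = ToyClientDir(server, revalidate_at_start=True)
    old.readdir()
    new.readdir()                     # both clients now hold a cached listing
    server.names.discard("windows")   # the rmdir on the server
    return old.readdir(), new.readdir()
```

In this model the client without the change keeps returning 'windows' from its cache, while the client with the change sees the removal on its next listing from offset zero. A read continuing from a non-zero offset still uses the cache in both cases, which matches the f_pos test in the patch.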
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c5

--- Comment #5 from Neil Brown <nfbrown@novell.com> 2010-05-06 04:15:57 UTC ---

Upstream is being rather quiet on this. It needs input from Al Viro and he seems to be unavailable.

My current draft patch is

diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..256ae13 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -719,7 +719,11 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 done:
 	path->mnt = mnt;
 	path->dentry = dentry;
-	__follow_mount(path);
+	if (__follow_mount(path) &&
+	    (path->mnt->mnt_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+		if (!path->dentry->d_op->d_revalidate(path->dentry, nd))
+			return -ESTALE;
+	}
 	return 0;
 
 need_lookup:
@@ -1619,6 +1623,7 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	switch (nd->last_type) {
 	case LAST_DOTDOT:
 		follow_dotdot(nd);
+	case LAST_DOT:
 		dir = nd->path.dentry;
 		if (nd->path.mnt->mnt_sb->s_type->fs_flags & FS_REVAL_DOT) {
 			if (!dir->d_op->d_revalidate(dir, nd)) {
@@ -1627,7 +1632,6 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 			}
 		}
 		/* fallthrough */
-	case LAST_DOT:
 	case LAST_ROOT:
 		if (open_flag & O_CREAT)
 			goto exit;
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c6

--- Comment #6 from Neil Brown <nfbrown@novell.com> 2010-05-31 00:29:36 UTC ---

Update: Al Viro has applied a patch which fixes the problem with "ls -l .".

I've submitted a simple patch to fix the problem with "ls -l /nfs/mount/point":

http://www.spinics.net/lists/linux-nfs/msg13232.html

No response yet.
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c7

--- Comment #7 from Neil Brown <nfbrown@novell.com> 2010-08-10 05:27:07 UTC ---

Still no response from upstream, which is unusual. I've sent another email.
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c8

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
       InfoProvider|                            |jnelson-suse@jamponi.net

--- Comment #8 from Neil Brown <nfbrown@novell.com> 2010-08-11 03:20:06 UTC ---

OK, it worked that time - upstream has accepted the 2nd patch.

So this problem should be resolved in 2.6.36, and that will filter through to Factory in due course.

Jon: is this still an issue for you (after all this time)? Are you still on 11.2 or have you migrated to 11.3?

I'll submit patches to 11.3 (and probably Factory so it gets in soon), but will only bother with 11.2 if that will particularly help you. Let me know.
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c9

Jon Nelson <jnelson-suse@jamponi.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
       InfoProvider|jnelson-suse@jamponi.net    |

--- Comment #9 from Jon Nelson <jnelson-suse@jamponi.net> 2010-08-11 03:22:22 UTC ---

Cool! I've been following the progress here. Very impressive!

I'm on openSUSE 11.3 now, and in fact on one machine I even run the KOTD.
http://bugzilla.novell.com/show_bug.cgi?id=584720

http://bugzilla.novell.com/show_bug.cgi?id=584720#c10

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #10 from Neil Brown <nfbrown@novell.com> 2010-08-11 04:10:56 UTC ---

Thanks.

The required patches are now in 11.3 and Factory, so they should appear in the next KOTD and the next release.

So: resolving as 'fixed'.

Thanks for your patience.
https://bugzilla.novell.com/show_bug.cgi?id=584720

https://bugzilla.novell.com/show_bug.cgi?id=584720#c11

Swamp Workflow Management <swamp@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|                            |maint:running:35398:moderate

--- Comment #11 from Swamp Workflow Management <swamp@suse.com> 2010-08-24 15:22:53 UTC ---

The SWAMPID for this issue is 35398.
This issue was rated as moderate.
Please submit fixed packages until 2010-09-07.
When done, please reassign the bug to security-team@suse.de.
Patchinfo will be handled by security team.
https://bugzilla.novell.com/show_bug.cgi?id=584720

https://bugzilla.novell.com/show_bug.cgi?id=584720#c12

Swamp Workflow Management <swamp@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|maint:running:35398:moderate|maint:running:35398:moderate maint:released:11.3:35403

--- Comment #12 from Swamp Workflow Management <swamp@suse.com> 2010-09-08 13:08:18 UTC ---

Update released for: kernel-debug, kernel-debug-base, kernel-debug-base-debuginfo, kernel-debug-debuginfo, kernel-debug-debugsource, kernel-debug-devel, kernel-debug-devel-debuginfo, kernel-default, kernel-default-base, kernel-default-base-debuginfo, kernel-default-debuginfo, kernel-default-debugsource, kernel-default-devel, kernel-default-devel-debuginfo, kernel-desktop, kernel-desktop-base, kernel-desktop-base-debuginfo, kernel-desktop-debuginfo, kernel-desktop-debugsource, kernel-desktop-devel, kernel-desktop-devel-debuginfo, kernel-devel, kernel-ec2-devel, kernel-pae, kernel-pae-base, kernel-pae-base-debuginfo, kernel-pae-debuginfo, kernel-pae-debugsource, kernel-pae-devel, kernel-pae-devel-debuginfo, kernel-source, kernel-source-vanilla, kernel-syms, kernel-trace, kernel-trace-base, kernel-trace-base-debuginfo, kernel-trace-debuginfo, kernel-trace-debugsource, kernel-trace-devel, kernel-trace-devel-debuginfo, kernel-vanilla, kernel-vanilla-base, kernel-vanilla-base-debuginfo, kernel-vanilla-debuginfo, kernel-vanilla-debugsource, kernel-vanilla-devel, kernel-vanilla-devel-debuginfo, kernel-vmi-devel, kernel-xen, kernel-xen-base, kernel-xen-base-debuginfo, kernel-xen-debuginfo, kernel-xen-debugsource, kernel-xen-devel, kernel-xen-devel-debuginfo, preload-kmp-default, preload-kmp-desktop

Products:
openSUSE 11.3 (debug, i586, x86_64)
https://bugzilla.novell.com/show_bug.cgi?id=584720

https://bugzilla.novell.com/show_bug.cgi?id=584720#c

Swamp Workflow Management <swamp@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|maint:running:35398:moderate maint:released:11.3:35403|maint:released:11.3:35403