[Bug 942178] New: NFS4 Java file lock problem with kernel 3.16.7-24.1
http://bugzilla.opensuse.org/show_bug.cgi?id=942178

            Bug ID: 942178
           Summary: NFS4 Java file lock problem with kernel 3.16.7-24.1
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: 13.2
          Hardware: x86-64
                OS: openSUSE 13.2
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: timo.boehme@ontochem.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

When a file on an NFS4-mounted file system is locked via the Java FileLock class under Linux kernel 3.16.7-24.1, the lock is not recognized by another Java program on another openSUSE computer running kernel 3.16.7-21.1. After going back to kernel 3.16.7-21.1, all is fine.

The NFS4 file system is mounted on both machines as:

  rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=XXX.XXX.XXX.XXX,local_lock=none,addr=XXX.XXX.XXX.XXX

According to http://stackoverflow.com/questions/23562369/is-a-filelock-a-posix-advisory-f... the Java FileLock class (Sun Java 6) uses POSIX fcntl locks.

The problem is critical because the standard Java Logging implementation is now broken. The implementation tests whether it can exclusively lock the .lck file that guards a log file. If the machine with kernel 3.16.7-24.1 created this file (and locked it), the other machine nevertheless gets an exclusive lock on the .lck file, assumes it created the file itself, and moves the first process's log file away.

It looks as if the local_lock parameter is effectively set to a value other than 'none', even though the mount option is 'none'.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
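The .lck check described in the report can be pictured with a minimal sketch (Java 7+ NIO). The class `LckGuard` and its methods are hypothetical names for illustration; this is a simplified model of the idea, not the actual `java.util.logging.FileHandler` source.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Simplified, hypothetical illustration of the ".lck" guard idea.
// It is NOT the actual java.util.logging.FileHandler code.
class LckGuard {
    private final FileChannel channel;
    private final FileLock lock;

    private LckGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /**
     * Tries to take an exclusive lock on "<logFile>.lck".
     * Returns a guard if this process now owns the log file, or null if
     * another process already holds a conflicting lock on the .lck file.
     */
    static LckGuard tryAcquire(Path logFile) throws IOException {
        Path lck = Paths.get(logFile.toString() + ".lck");
        FileChannel ch = FileChannel.open(lck,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            // Non-blocking exclusive lock over the whole file. Per the bug
            // report, FileLock uses POSIX fcntl() locks on Linux, so on a
            // correctly working NFSv4 mount other clients should see it.
            FileLock l = ch.tryLock();
            if (l == null) {
                ch.close();   // another process owns the log file
                return null;
            }
            return new LckGuard(ch, l);
        } catch (IOException | RuntimeException e) {
            ch.close();
            throw e;
        }
    }

    /** Releases the lock and closes the .lck channel. */
    void release() throws IOException {
        try {
            lock.release();
        } finally {
            channel.close();
        }
    }
}
```

On a working mount, a second client calling `tryAcquire` on the same log file while the first still holds the guard should get `null`; the report describes the second client obtaining a lock instead.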
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c1

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timo.boehme@ontochem.com,
                   |                            |tiwai@suse.com
              Flags|                            |needinfo?(timo.boehme@ontochem.com)

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
At a quick glance, I could find only one patch relevant to NFS4, and it's not clear whether it really causes this.

I created a KMP for the NFS modules with that patch reverted in the OBS home:tiwai:bnc942178 repo. The KMPs are being built now, so grab the KMP corresponding to your running kernel, install it, and retest.

After installing the KMP, check whether the updated nfs module is used, e.g. by checking "modinfo nfs | grep filename". It should show the updates path, /lib/modules/3.16..../updates/....

If this works, the culprit is the patch. If not, it's somewhere else.
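The module check above could also be scripted. A minimal sketch, assuming Java 7+ and that `modinfo` is on the PATH; `NfsModuleCheck` and its methods are hypothetical. It only inspects the `filename:` line that `modinfo nfs` prints and tests whether that path lies under an `updates/` directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: runs "modinfo nfs" and reports whether the module
// file it resolves lives under an updates/ directory (where a KMP installs).
class NfsModuleCheck {
    /** Returns the path from the "filename:" line of modinfo output, or null. */
    static String moduleFilename(String modinfoOutput) {
        for (String line : modinfoOutput.split("\\r?\\n")) {
            String t = line.trim();
            if (t.startsWith("filename:")) {
                return t.substring("filename:".length()).trim();
            }
        }
        return null;
    }

    /** True if the module file is under an updates/ directory. */
    static boolean isUpdatedModule(String modinfoOutput) {
        String path = moduleFilename(modinfoOutput);
        return path != null && path.contains("/updates/");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("modinfo", "nfs").redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        System.out.println("nfs module file: " + moduleFilename(out.toString()));
        System.out.println(isUpdatedModule(out.toString())
                ? "updated (KMP) module is in use"
                : "stock kernel module is in use");
    }
}
```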
Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfbrown@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c2

--- Comment #2 from Timo Böhme <timo.boehme@ontochem.com> ---
Thanks for the quick fix. After installing it and restarting the system, locking worked again.
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c3

--- Comment #3 from Takashi Iwai <tiwai@suse.com> ---
Thanks. So the patch 0001-NFSv4-When-returning-a-delegation-don-t-reclaim-an-i.patch seems to have a side effect. Unfortunately Neil is now on vacation, IIRC...
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c4

--- Comment #4 from Neil Brown <nfbrown@suse.com> ---
Not on vacation quite yet - a few hours to go.

This is really strange. That patch really fixes a bug and should have no negative consequences. The fact that it does strongly suggests that there is a bug elsewhere that has been hiding.

Is it possible to get a tcpdump trace showing the problem? I.e. on one client run

  tcpdump -w /tmp/tcp.trace -iany -s 0 port 2049 &

then run the Java program on both this and the other client in a way that demonstrates the problem. Then kill the tcpdump, and compress and attach /tmp/tcp.trace.

I might try to have a look next week to see what is happening.
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c5

Timo Böhme <timo.boehme@ontochem.com> changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
              Flags|needinfo?(timo.boehme@ontochem.com)|

--- Comment #5 from Timo Böhme <timo.boehme@ontochem.com> ---
Created attachment 644276 --> http://bugzilla.opensuse.org/attachment.cgi?id=644276&action=edit
tcpdump traces of NFS communications while locking/release-locking of a file

Here are 4 TCP traces of locking a file on an NFS file system and releasing the lock with a small Java test program: 2 traces show locking with kernel 3.16.7-24 that was not recognized by applications on other computers, 1 shows locking with the same kernel that was recognized by the others, and 1 shows the older kernel 3.16.7-21, which works in every case, for comparison.
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c6

--- Comment #6 from Timo Böhme <timo.boehme@ontochem.com> ---
Created attachment 644279 --> http://bugzilla.opensuse.org/attachment.cgi?id=644279&action=edit
Java test program for locking a file

Java test program used to test an exclusive file lock (run with Sun Java 6 and OpenJDK 1.7). Simply provide the name of the file to lock as a parameter. The file lock is released after pressing 'ENTER'.
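The attached program is not reproduced here. As a rough sketch of what such a test can look like (Java 7+ NIO, so unlike the attachment it would not run on Sun Java 6), the hypothetical class `HoldLock` takes a non-blocking exclusive lock on the named file and holds it until a line is read from standard input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a lock-holding test; not the attached program.
class HoldLock {
    /**
     * Takes an exclusive lock on `file`, waits for one line on `in`, then
     * releases the lock. Returns false if another process already holds
     * an overlapping lock.
     */
    static boolean holdUntilEnter(Path file, InputStream in) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock(); // exclusive, non-blocking
            if (lock == null) {
                System.out.println("Lock is held by another process: " + file);
                return false;
            }
            try {
                System.out.println("Locked " + file + "; press ENTER to release.");
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)).readLine();
            } finally {
                lock.release();
            }
            System.out.println("Released " + file);
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HoldLock <file-to-lock>");
            System.exit(2);
        }
        boolean ok = holdUntilEnter(Paths.get(args[0]), System.in);
        System.exit(ok ? 0 : 1);
    }
}
```

Running it on two clients against the same file on the NFS mount, the second instance should print the "held by another process" message while the first waits for ENTER.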
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c7

--- Comment #7 from Timo Böhme <timo.boehme@ontochem.com> ---
Some observations:
- the lock works on the same computer/client; only other clients don't see the lock
- during testing the lock sometimes also worked globally, mostly after several tries or when other clients had obtained a lock before; after waiting approx. half a minute, locking the file again worked only locally; I have added a trace of such a working case to the attachments
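To check from another client whether the lock is visible, and how that changes over time, a probe can repeatedly try to take the lock and release it immediately. A minimal sketch, assuming Java 7+; `LockProbe`, the 12 attempts and the 5-second interval are hypothetical choices. It must run in a different JVM (ideally on another client) from the lock holder, because within one JVM `tryLock()` throws `OverlappingFileLockException` instead of returning `null`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe, meant to run on a second client while the first
// client holds the lock.
class LockProbe {
    /** True if some other process currently holds a conflicting lock. */
    static boolean isLockedElsewhere(Path file) throws IOException {
        // WRITE without CREATE: the file must already exist.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return true;
            }
            lock.release();   // we got it, so nobody else held it
            return false;
        }
    }

    /** Probes `attempts` times, `intervalMillis` apart; one result per attempt. */
    static List<Boolean> probe(Path file, int attempts, long intervalMillis)
            throws IOException, InterruptedException {
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            if (i > 0) {
                Thread.sleep(intervalMillis);
            }
            results.add(isLockedElsewhere(file));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        // Hypothetical schedule: 12 probes, 5 s apart (about one minute).
        List<Boolean> results = probe(file, 12, 5000);
        for (int i = 0; i < results.size(); i++) {
            System.out.println("t=" + (i * 5) + "s  lock visible: " + results.get(i));
        }
    }
}
```

While the lock is held on the first client, every probe on a second client should report `true`; entries reading `false` would reproduce the behaviour described in the observations above.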
Martin Pluskal <mpluskal@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpluskal@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c8

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS

--- Comment #8 from Neil Brown <nfbrown@suse.com> ---
(back from vacation and conference and jetlag...)

Thanks for the traces. They show that when there is a problem, the server is providing a "write" delegation. This is interesting, as that patch should not change behaviour at all for "write" delegations.

...except that I now see that the patch is horribly broken. I clearly needed a holiday :-(

I'll fix it, do some testing, and let you know when I have something else for you to try.

Thanks.
http://bugzilla.opensuse.org/show_bug.cgi?id=942178#c9

--- Comment #9 from Neil Brown <nfbrown@suse.com> ---
It turns out this has been fixed upstream, and the fix looks quite convincing. I have submitted that fix for the 13.2 kernel.

I don't know how to make a KMP; I think a new kernel will appear in http://download.opensuse.org/repositories/Kernel:/openSUSE-13.2/standard/x86... in a couple of days.