[Bug 784982] New: nfsd4: utime sometimes takes 40+ seconds
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c0 Summary: nfsd4: utime sometimes takes 40+ seconds Classification: openSUSE Product: openSUSE 12.2 Version: Final Platform: All OS/Version: openSUSE 12.2 Status: NEW Severity: Normal Priority: P5 - None Component: Kernel AssignedTo: kernel-maintainers@forge.provo.novell.com ReportedBy: mawa@uni-freiburg.de QAContact: qa-bugs@suse.de Found By: --- Blocker: --- User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1 my homedir is nfs4-mounted from a 12.2-i386 server. my desktop-pc has 12.2-x64. if i terminate mutt there is a very long delay. strace shows: 13:23:51.482867 utime("/home/mawa/mail/mbox", [2012/10/13-13:23:25, 2012/10/13-13:03:06] 13:26:01.981866 close(3) = 0 Reproducible: Always Steps to Reproduce: 1.nfs4-mount homedir from 12.2-i386 server on a 12.2-x64 pc 2.start mutt 3.terminate mutt Actual Results: exit of mutt is delayed in utime call. Expected Results: no delay the problems is well known. see e.g.: http://www.spinics.net/lists/linux-nfs/msg32092.html -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c Jeff Mahoney <jeffm@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|kernel-maintainers@forge.pr |nfbrown@suse.com |ovo.novell.com | -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c1 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #1 from Neil Brown <nfbrown@suse.com> 2013-07-15 23:09:33 UTC --- The fix in the thread that you linked to is for a bug that was introduced in Linux 3.5. As 12.2 used 3.4, the bug should not be present, and certainly the fix would not be appropriate. Still the symptoms do look very similar. I'll see if I can get a 12.2 installation to experiment with. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c2 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |NEEDINFO InfoProvider| |mawa@uni-freiburg.de --- Comment #2 from Neil Brown <nfbrown@suse.com> 2013-07-16 07:45:23 UTC --- I cannot reproduce this. Can you get a tcpdump of the network traffic. i.e. run tcpdump -w /tmp/mutt.pcap -s 0 -i INTERFACENAME port 2049 then perform the mutt test. Then interrupt the tcpdump, and attached the mutt.pcap file. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c3 --- Comment #3 from Martin Walter <mawa@uni-freiburg.de> 2013-07-19 10:07:15 UTC --- Created an attachment (id=548627) --> (http://bugzilla.novell.com/attachment.cgi?id=548627) little test programm -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c4 --- Comment #4 from Martin Walter <mawa@uni-freiburg.de> 2013-07-19 10:09:10 UTC --- Created an attachment (id=548629) --> (http://bugzilla.novell.com/attachment.cgi?id=548629) tcpdump -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c5 --- Comment #5 from Martin Walter <mawa@uni-freiburg.de> 2013-07-19 10:09:52 UTC --- Created an attachment (id=548630) --> (http://bugzilla.novell.com/attachment.cgi?id=548630) strace -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c6 Martin Walter <mawa@uni-freiburg.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED InfoProvider|mawa@uni-freiburg.de | --- Comment #6 from Martin Walter <mawa@uni-freiburg.de> 2013-07-19 10:14:21 UTC --- The problem occurs also in the little test program. It's from http://www.spinics.net/lists/linux-nfs/msg32092.html -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c7 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |NEEDINFO InfoProvider| |mawa@uni-freiburg.de --- Comment #7 from Neil Brown <nfbrown@suse.com> 2013-07-23 02:12:52 UTC --- Thanks. That looks exactly like the post you linked to - but that bug shouldn't exist in the 12.2 kernel. Exactly what kernel are you running on the server? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c8 Martin Walter <mawa@uni-freiburg.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED InfoProvider|mawa@uni-freiburg.de | --- Comment #8 from Martin Walter <mawa@uni-freiburg.de> 2013-07-23 15:02:44 UTC --- $ rpm -qa 'kernel*' kernel-default-3.4.47-2.38.1.x86_64 kernel-firmware-20120719git-2.9.1.noarch $ uname -a Linux z3 3.4.47-2.38-default #1 SMP Fri May 31 20:17:40 UTC 2013 (3961086) x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/SuSE-release openSUSE 12.2 (x86_64) VERSION = 12.2 CODENAME = Mantis -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c9 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |NEEDINFO InfoProvider| |mawa@uni-freiburg.de --- Comment #9 from Neil Brown <nfbrown@suse.com> 2013-07-29 05:13:28 UTC --- Thanks. That kernel definitely doesn't have the bug. So there must be another problem. The problem is that when the server tries to call back to the client to ask it to release the delegation it was given, the delegation doesn't get returned. As you are using gss authentication I'd guess something is going wrong with that. The server should check that it can contact the client and get a reply, and not give a delegation unless that works. So if there is a problem with the back channel, it should still work but not use delegations. That doesn't seem to be happening. Unfortunately I cannot see the back channel traffic as it uses a different port than 2049. Could you do the 'tcpdump' again but dont' specify 'port 2049'. That might collect a lot more traffic, but will at least collect all that I need. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c10 Martin Walter <mawa@uni-freiburg.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #548629|0 |1 is obsolete| | Attachment #548630|0 |1 is obsolete| | --- Comment #10 from Martin Walter <mawa@uni-freiburg.de> 2013-07-29 13:48:16 UTC --- Created an attachment (id=549911) --> (http://bugzilla.novell.com/attachment.cgi?id=549911) tcpdump2 and strace2 - with all ports -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c11 --- Comment #11 from Neil Brown <nfbrown@suse.com> 2013-07-30 00:22:30 UTC --- Thanks. There is no sign of any call-back attempt in that trace, so there is definitely something wrong on the server. This is different from the other bug as in that case there was a callback, but it used the wrong authentication. My guess is that something goes with with gss signing. The server must have managed to sign the 'null' probe request to check that the call-back was working This would have happened before you started the packet capture. However when it tries callback to reclaim the delegation I guess that the rpc.svcgssd doesn't work for some reason. Are there any 'gss' related errors in logs on the server? Maybe kill rpc.svcgssd and rerun it as rpc.svcgssd -f -vvvv -rrrr 2>&1 | tee /tmp/svgssd.log and then try the experiment. Maybe the log will contain something useful. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c12 Martin Walter <mawa@uni-freiburg.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED InfoProvider|mawa@uni-freiburg.de | --- Comment #12 from Martin Walter <mawa@uni-freiburg.de> 2013-08-08 09:18:12 UTC --- here the log from start of rpc.svcgssd: ############### Warning: rpcsec_gss library does not support setting debug level entering poll leaving poll handling null request svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel sname = mawa@HOME DEBUG: serialize_krb5_ctx: lucid version! prepare_krb5_rfc4121_buffer: protocol 1 prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32 doing downcall mech: krb5, hndl len: 4, ctx len 52, timeout: 1376039268 (86385 from now), clnt: <null>, uid: 1003, gid: 1000, num aux grps: 4: ( 1) 1000 ( 2) 33 ( 3) 1001 ( 4) 120 sending null reply writing message: \x \x6082025006..... finished handling null request entering poll leaving poll handling null request svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel sname = host/m1.home@HOME DEBUG: serialize_krb5_ctx: lucid version! prepare_krb5_rfc4121_buffer: protocol 1 prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32 doing downcall mech: krb5, hndl len: 4, ctx len 52, timeout: 1376037370 (84487 from now), clnt: host@m1.home, uid: -1, gid: -1, num aux grps: 0: sending null reply writing message: \x \x6082026206...... finished handling null request entering poll ############ the utime-test program caused *no* output. I suggest that we close this bug and I upgrade both server and client to 12.3 . If the bug is also in 12.3 we could reopen the bug. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c13 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |WORKSFORME --- Comment #13 from Neil Brown <nfbrown@suse.com> 2013-08-14 01:32:51 UTC --- OK - closed. Sorry we couldn't sort this out. Nothing looks obviously wrong in the gssd trace. Do re-open if it doesn't work with 12.3. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c14 Joschi Brauchle <joschibrauchle@gmx.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P5 - None |P2 - High Status|RESOLVED |REOPENED CC| |joschibrauchle@gmx.de Component|Kernel |Kernel AssignedTo|nfbrown@suse.com |kernel-maintainers@forge.pr | |ovo.novell.com Resolution|WORKSFORME | Product|openSUSE 12.2 |openSUSE 12.3 Target Milestone|--- |Final Severity|Normal |Major OS/Version|openSUSE 12.2 |openSUSE 12.3 --- Comment #14 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-10 22:33:31 UTC --- Sorry for reopening this without testing explicitly on openSUSE 12.3. We are experiencing this exact problem on **SLES11SP3** as NFS server, and openSUSE 12.3 clients. The problem does **not** appear on SLES11SP1 as NFS server. After a discussion on the linux-nfs mailing list (http://www.spinics.net/lists/linux-nfs/msg39040.html), there seem to be two results: 1) rpc.gssd is not running on the nfs server. Without it, delegations are not available. The delay seems to be caused by the client waiting for the delegation to timeout. => The nfsserver init script for openSUSE (12.x) and SLES11SPx does not start rpc.gssd on the nfs server (only on the nfs client) when using NFSv4. Is this by correct and by design? According to the linux-nfs list, rpc.gssd should be running on client and server to support delegations. 2) Although rpc.gssd is not started on the server, it looks like the server is giving out delegations although it should not. This seems to be the reason why the client then has to wait for its delegation to time out on the utime() call. If you feel like this bug should be reported for SLES direcly, please let me know. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c15 Neil Brown <nfbrown@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |NEEDINFO CC| |nfbrown@suse.com InfoProvider| |joschibrauchle@gmx.de --- Comment #15 from Neil Brown <nfbrown@suse.com> 2013-09-11 05:53:41 UTC --- Thanks for pointing out the probable cause. Yes, we do need to run rpc.gssd for a server, not just for a client. I'll attach a tar archive with two files:nfs.init and nfsserver.init. Please copy them to /etc/init.d/nfs and /etc/init.d/nfsserver and confirm that rpc.gssd is successfully started on the server. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c16 --- Comment #16 from Neil Brown <nfbrown@suse.com> 2013-09-11 05:54:23 UTC --- Created an attachment (id=556849) --> (http://bugzilla.novell.com/attachment.cgi?id=556849) updated init.d files -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c17 --- Comment #17 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-11 07:49:27 UTC --- Hello, I will test the files soon. As I have clients connected to the NFS server, I have to wait for some low usage time. Just one question about nfsserver.init: Under "start)" you added "do_start_gssd" to start rpc.gssd. But under "stop)" you now have ----------------- # svcgssd if [ "$NEED_SVCGSSD" = yes ]; then if [ ! -f $GSSD_CLIENT_STATE ]; then echo -n " svcgssd" killproc $SVCGSSD_BIN || { rc_status -v rc_exit } fi rm -f $GSSD_SERVER_STATE fi ----------------- where only rpc.svcgssd is stopped. What about rpc.gssd? Also, the new if statement before stopping rpc.svcgssd seems odd, is this correct? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c18 --- Comment #18 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-11 08:03:46 UTC --- I'd also like to make a suggestion to the init files (client and server) to make it a little more foolproof: There is a "RPC_PIPEFS_DIR" variable in /etc/sysconfig/nfs and the rpc.gssd daemon has a "-p pipedir" switch, which tells rpc.gssd where to look for the rpc_pipefs filesystem. The default value is "/var/lib/nfs/rpc_pipefs". Changing "RPC_PIPEFS_DIR" to something other than the default will make rpc.gssd fail to start, if not "-p $RPC_PIPEFS_DIR" is also specified. Thus, the init scripts of nfs client and server should automatically add "-p $RPC_PIPEFS_DIR" when starting rpc.gssd. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c19 --- Comment #19 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-12 13:44:25 UTC --- I would like to add another problems: 1) The init script "nfs" unmount the RPC_PIPEFS_DIR on the stop command, without checking if an nfsserver is running. 2) The init script of the "nfsserver" mounts RPC_PIPEFS_DIR on start, but never unmounts it on stop. The RPC_PIPEFS_DIR should be unmounted if and only if no daemon needed by the nfs client nor server is using it! -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c20 Joschi Brauchle <joschibrauchle@gmx.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |REOPENED InfoProvider|joschibrauchle@gmx.de | --- Comment #20 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-13 08:47:24 UTC --- After testing for about 3 days in production use with about 20 NFS client machines, we made the following observation: The hangs on "utime()" calls have **not** disappeared by simply starting rpc.gssd on the server. The problem persists! I seems like a) on machines that are already connected to the NFS server when rpc.gssd is started, the hangs dissappear *mostly*. That is, running the utime-test-program causes about 1 spurious hang every 10 minutes. b) on machines that connect to the NFS server at a later time (rpc.gssd already running on the server), the hangs seem appear every "utime()" call. The server emits spurious "kernel: Please check user daemon is running." messages, although rpc.gssd is running. Thus, running a kerberized NFS server on SLES11SP3 is currently not possible in production as the client workstations are not usable. We have to revert so SP1 and wait for this issue to be fixed. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c21 --- Comment #21 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-15 19:32:07 UTC --- Created an attachment (id=558013) --> (http://bugzilla.novell.com/attachment.cgi?id=558013) modified init files w/ comment 17,18,19 Please find a modified version of nfs.init and nfsserver.init attached. The files include the suggestions I made in comments 17,18 and 19. I can confirm that "rpc.gssd" is started and stopped correctly for the nfsserver with the nfsserver.init script. The interaction with an nfs client on the same machine has not been tested. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c22 --- Comment #22 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-15 19:45:22 UTC --- Do I need to open an explicit bug report for SLES11SPx for these changes to be included there? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c23 --- Comment #23 from Joschi Brauchle <joschibrauchle@gmx.de> 2013-09-26 12:40:25 UTC --- Service request 10857595131 was opened with Novell for SLES11SPx. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784982 https://bugzilla.novell.com/show_bug.cgi?id=784982#c27 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED CC| |tiwai@suse.com Resolution| |FIXED --- Comment #27 from Takashi Iwai <tiwai@suse.com> 2014-03-17 16:19:36 UTC --- OK, let's close as this bug should have been already fixed. Please reopen if the problem is reproduced in the updated stack. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com