[Bug 272268] New: lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
https://bugzilla.novell.com/show_bug.cgi?id=272268

           Summary: lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
           Product: openSUSE 10.2
           Version: Final
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: knweiss@science-computing.de
         QAContact: qa@suse.de

We have a problem with openSUSE 10.2's latest kernel (2.6.18.8-0.3-default) as an NFS server and HP-UX 11.11 as an NFS client. Accessing files from the HP-UX machine running the CAD application Catia V5 (which uses locking) results in the following kernel message:

  lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
                                                ^missing hostname!!!

and the file access fails.

The creation of the hostname file in the sm directory fails because the hostname is the empty string (I've verified this by putting some debug code in the nsm_create() function of the SUSE kernel's lockd patch).

The filesystem gets exported with the following options:

  /net/XXX/fs1 @XXX(rw,insecure,sync,insecure_locks,no_subtree_check)

The same HP-UX NFS client does NOT have this problem with a SUSE 9.3 based NFS server (kernel 2.6.11.4-21.15-smp)!

Our current workaround for this problem is:

  echo "0" >/proc/sys/fs/nfs/nsm_use_hostnames

By clearing this sysctl flag the IP addresses (instead of the hostnames) of the NFS clients will be used to populate the /var/lib/nfs/sm directory. This works so far.
Here's the code from the SUSE 10.2 kernel patch which creates the filenames used:

+/*
+ * Build the NSM file name
+ */
+static char *
+nsm_filename(struct nsm_handle *nsm)
+{
+	char *name;
+
+	name = (char *) __get_free_page(GFP_KERNEL);
+	if (name == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	if (nsm_use_hostnames) {
+		snprintf(name, PAGE_SIZE, "%s/%s",
+			NSM_SM_PATH, nsm->sm_name);
+	} else {
+		/* FIXME IPV6 */
+		snprintf(name, PAGE_SIZE, "%s/%u.%u.%u.%u",
+			NSM_SM_PATH,
+			NIPQUAD(nsm->sm_addr.sin_addr));
+	}
+	return name;
+}

In our error case nsm->sm_name is the empty string but nsm->sm_addr.sin_addr contains the correct IP address of the client, i.e. with nsm_use_hostnames cleared the file creation in nsm_create() succeeds.

How does the NFS server determine the hostname of the client in this case?

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
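The failure mode described in the report can be reproduced in userspace. The following is a minimal sketch that mirrors only the formatting logic of nsm_filename(); it is not the kernel code. The helper name nsm_path() and the value of NSM_SM_PATH are assumptions (the value is inferred from the error message). With an empty name the formatted path is the sm directory itself, and an attempt to create a regular file at a directory path would be expected to fail with EISDIR, which is errno 21 on Linux and so consistent with err=-21.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value, inferred from the path in the kernel message. */
#define NSM_SM_PATH "/var/lib/nfs/sm"

/*
 * Hypothetical userspace stand-in for the patch's nsm_filename().
 * Formats the monitor file path from either the client-supplied name
 * or the client's IPv4 address.
 * Returns 1 if the path has a non-empty last component (a file name),
 * 0 if it ends in '/' and therefore names only the directory.
 */
static int nsm_path(char *buf, size_t len, int use_hostnames,
                    const char *sm_name, const unsigned char ip[4])
{
    assert(buf != NULL && len > 0 && sm_name != NULL);

    if (use_hostnames)
        snprintf(buf, len, "%s/%s", NSM_SM_PATH, sm_name);
    else
        snprintf(buf, len, "%s/%u.%u.%u.%u", NSM_SM_PATH,
                 (unsigned)ip[0], (unsigned)ip[1],
                 (unsigned)ip[2], (unsigned)ip[3]);

    return buf[strlen(buf) - 1] != '/';
}
```

Calling nsm_path() with use_hostnames set and sm_name "" leaves "/var/lib/nfs/sm/" in the buffer and returns 0; with use_hostnames cleared the same call produces a dotted-quad file name and returns 1, which is why the workaround helps.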
https://bugzilla.novell.com/show_bug.cgi?id=272268

lmb@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |knweiss@science-computing.de

------- Comment #1 from lmb@novell.com 2007-05-08 08:18 MST -------
Does this bug persist with the 10.3 kernel?
https://bugzilla.novell.com/show_bug.cgi?id=272268

knweiss@science-computing.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|knweiss@science-computing.de|

------- Comment #2 from knweiss@science-computing.de 2007-05-08 09:38 MST -------
First, two small additional comments regarding my original posting:

1. We've tested UDP and TCP mounts on 10.2 - doesn't make a difference.
2. Accessing the export from the shell (changing the directory to trigger the nfs mount via automounter) works fine even on HP-UX 11.11! I.e. we only noticed this problem when we access the nfs export with Catia V5 so far.

Now to your question: Well, I did not follow the 10.3 changes regarding nfs. I've now simply installed kernel 2.6.21-3-default on the 10.2 test machine and did not change any other package. Upon accessing the nfs export with catia(!) from the HP-UX 11.11 nfs client (wu0c0202) I get this on the nfs server:

statd: server localhost not responding, timed out
lockd: cannot monitor wu0c0202
statd: server localhost not responding, timed out
lockd: cannot monitor wu0c0202
statd: server localhost not responding, timed out
lockd: cannot monitor wu0c0202
statd: server localhost not responding, timed out
lockd: cannot monitor wu0c0202

I've also noticed that /proc/sys/fs/nfs/nsm_use_hostnames is set to "0" by default now. Changing it to "1" doesn't make a difference however: The access with Catia V5 does not work and I don't see any files in /var/lib/nfs/sm/.

Accessing the export from another Linux machine (e.g. SUSE 9.3) (or the HP-UX client without Catia) works fine but I don't see any files in /var/lib/nfs/sm/ either.

Do you want me to update nfs-utils or other packages as well?
https://bugzilla.novell.com/show_bug.cgi?id=272268

lmb@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-                     |nfbrown@novell.com
                   |maintainers@forge.provo.nove|
                   |ll.com                      |
https://bugzilla.novell.com/show_bug.cgi?id=272268

------- Comment #3 from knweiss@science-computing.de 2007-05-11 02:52 MST -------
BTW: I have now appended

  fs.nfs.nsm_use_hostnames = 0

to /etc/sysctl.conf to activate the workaround persistently on our test machine. When I execute "/etc/init.d/boot.sysctl start" interactively this works fine. However, it does not work during the boot process because /etc/init.d/boot.sysctl gets executed before /etc/init.d/nfsserver, which loads the nfs kernel modules which provide the /proc/sys/fs/nfs/nsm_use_hostnames file. IMHO this is also a bug.
https://bugzilla.novell.com/show_bug.cgi?id=272268

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |knweiss@science-computing.de

------- Comment #4 from nfbrown@novell.com 2007-05-21 22:30 MST -------
The host name (sm_name) is taken from the lock request, i.e. the client supplies its own name. Apparently HP-UX is supplying an empty client name.

You can confirm this by using tcpdump to capture the traffic. Find the port number that lockd is using (use rpcinfo -p) and then on the server:

   tcpdump -s 0 -w /tmp/trace host CLIENTIP and port LOCKD_PORT

Then run the test that fails. Look in /tmp/trace using wireshark (aka ethereal). If you attach that file to this bug I can confirm what is happening.

If your client cannot be trusted to provide a valid name, then setting nsm_use_hostnames to 0 is the correct work-around. I'm surprised we made '1' the default actually. There are cases where it is needed, but in most cases '0' is safer.

If you add the line

   install lockd modprobe --ignore-install lockd $CMDLINE_OPTS ; sysctl -q -e -p ;

to /etc/modprobe.conf.local, it will make sure that sysctl setting gets set correctly.

Setting NEEDINFO for result of tcpdump.
https://bugzilla.novell.com/show_bug.cgi?id=272268

------- Comment #5 from knweiss@science-computing.de 2007-05-22 01:20 MST -------
Thanks for your answer and the interesting modprobe idea!

Regarding the nsm_use_hostnames=1 default: The reason probably can be found in the source code comment quoted in the bug report above:

  /* FIXME IPV6 */

Regarding your comment "Apparently HP-UX is supplying an empty client name": The funny thing is that we don't have this problem e.g. with the SUSE 9.3 (latest errata) kernel with the *same* HP-UX client.

Anyway, I'll trace the locking later today and we'll see...
https://bugzilla.novell.com/show_bug.cgi?id=272268

knweiss@science-computing.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|knweiss@science-computing.de|

------- Comment #6 from knweiss@science-computing.de 2007-05-22 08:32 MST -------
Created an attachment (id=141500)
 --> (https://bugzilla.novell.com/attachment.cgi?id=141500&action=view)
requested tcpdump (both udp and tcp nlockmgr port)

The attached tarball contains the (brief) communication on the nlockmgr port. I first traced the tcp port number and then, in a second try, the udp port number, because the tcp trace is basically empty. The udp trace shows the HP-UX client hostname (wu0c0202).
https://bugzilla.novell.com/show_bug.cgi?id=272268

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |knweiss@science-computing.de

------- Comment #7 from nfbrown@novell.com 2007-05-24 20:18 MST -------
Thanks for the tcpdump trace. Clearly my guess was wrong. I cannot find any other way that the hostname would be empty - yet.

Could you test (with nsm_use_hostnames=1) again, but with sunrpc.nlm_debug = 1023, e.g.

   echo 1023 > /proc/sys/sunrpc/nlm_debug

before running the test. That will produce more trace output in the kernel logs which should be useful.

BTW, I would not expect a 10.3 kernel to work with 10.2 userspace in this instance. In 10.3, statd will be in userspace, not in the kernel as it is with 10.2.
https://bugzilla.novell.com/show_bug.cgi?id=272268

knweiss@science-computing.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|knweiss@science-computing.de|

------- Comment #8 from knweiss@science-computing.de 2007-05-25 02:09 MST -------
With

  echo 1023 > /proc/sys/sunrpc/nlm_debug

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd: request from 353c99d3
lockd: LOCK_MSG called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=)
lockd: host garbage collection
lockd: nlmsvc_mark_resources
lockd: LOCK called
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c99d3)
lockd: 90 callback returned 0
lockd: release host
lockd: request from 353c99d3
lockd: LOCK_MSG called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=)
lockd: get host
lockd: LOCK called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=wu0c0202)
lockd: get host
lockd: nsm_monitor()
lockd: creating statd monitor file for
lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
nsm_monitor() failed: errno=21
lockd: release host
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c99d3)
lockd: 92 callback returned 0
lockd: release host
lockd: request from 353c99d3
lockd: LOCK_MSG called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=)
lockd: get host
lockd: LOCK called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=wu0c0202)
lockd: get host
lockd: nsm_monitor()
lockd: creating statd monitor file for
lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
nsm_monitor() failed: errno=21
lockd: release host
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c99d3)
lockd: 93 callback returned 0
lockd: release host
lockd: request from 353c99d3
lockd: LOCK_MSG called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=)
lockd: get host
lockd: LOCK called
lockd: nlm_lookup_host(53.60.153.211, p=17, v=4, my role=server, name=wu0c0202)
lockd: get host
lockd: nsm_monitor()
lockd: creating statd monitor file for
lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
nsm_monitor() failed: errno=21
lockd: release host
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c99d3)
lockd: 94 callback returned 0
lockd: release host
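When reading this trace it helps to see that the value in "lockd: request from 353c99d3" is the same client address that nlm_lookup_host() prints as 53.60.153.211, written as eight hex digits with the first octet first. The sketch below converts between the two forms; the function name lockd_hex_to_quad() is hypothetical, and the byte order is an assumption based only on how the two values line up in the log.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn the 8-hex-digit address from a
 * "lockd: request from ..." line into dotted-quad notation.
 * Assumes the first hex byte is the first octet, as the trace suggests.
 * Returns 0 on success, -1 if the input is not exactly 8 hex digits.
 */
static int lockd_hex_to_quad(const char *hex, char *out, size_t len)
{
    unsigned long v;
    size_t i;

    assert(hex != NULL && out != NULL && len > 0);

    if (strlen(hex) != 8)
        return -1;
    for (i = 0; i < 8; i++)
        if (!isxdigit((unsigned char)hex[i]))
            return -1;

    v = strtoul(hex, NULL, 16);
    snprintf(out, len, "%lu.%lu.%lu.%lu",
             (v >> 24) & 0xFF, (v >> 16) & 0xFF,
             (v >> 8) & 0xFF, v & 0xFF);
    return 0;
}
```

For example, passing "353c99d3" with a 16-byte output buffer yields "53.60.153.211", matching the address in the nlm_lookup_host() lines.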
https://bugzilla.novell.com/show_bug.cgi?id=272268

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #9 from nfbrown@novell.com 2007-05-25 03:38 MST -------
Thanks. I can see exactly what is happening now. The fact that LOCK_MSG is the first locking call makes a mess of things.

I will try to come up with a fix, but it is unlikely to be until after the weekend.
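The thread does not spell out the mechanism, but the trace in comment #8 is consistent with one reading: the first nlm_lookup_host() call arrives with an empty name and creates the host entry, and the later call carrying name=wu0c0202 reports "get host", i.e. it finds the existing entry, whose name is still empty. The sketch below models only that reading. The struct, the cache, and lookup_host() are hypothetical simplifications, not the lockd implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOSTS 8
#define NAME_LEN  64

/* Hypothetical, heavily simplified stand-in for a lockd host entry. */
struct host {
    uint32_t addr;
    char     name[NAME_LEN];
    int      used;
};

static struct host cache[MAX_HOSTS];

/*
 * Look a host up by address only. On a miss, create the entry with the
 * name supplied by the caller. On a hit, return the existing entry
 * unchanged, so a name recorded as "" by the first caller sticks.
 * Returns NULL if the cache is full.
 */
static struct host *lookup_host(uint32_t addr, const char *name)
{
    int i;

    assert(name != NULL);

    for (i = 0; i < MAX_HOSTS; i++)
        if (cache[i].used && cache[i].addr == addr)
            return &cache[i];          /* "get host" in the trace */

    for (i = 0; i < MAX_HOSTS; i++) {
        if (!cache[i].used) {
            cache[i].used = 1;
            cache[i].addr = addr;
            strncpy(cache[i].name, name, NAME_LEN - 1);
            cache[i].name[NAME_LEN - 1] = '\0';
            return &cache[i];
        }
    }
    return NULL;
}
```

In this model, a first call lookup_host(0x353c99d3, "") followed by lookup_host(0x353c99d3, "wu0c0202") returns the same entry both times, and its name is still empty when the monitor file name is built from it.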
https://bugzilla.novell.com/show_bug.cgi?id=272268

------- Comment #10 from nfbrown@novell.com 2007-05-27 20:40 MST -------
Created an attachment (id=142447)
 --> (https://bugzilla.novell.com/attachment.cgi?id=142447&action=view)
Patch to avoid empty hostname
https://bugzilla.novell.com/show_bug.cgi?id=272268

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |knweiss@science-computing.de

------- Comment #11 from nfbrown@novell.com 2007-05-27 20:41 MST -------
Could you please try the above patch? I would really need an HP-UX client to test it properly, and I don't have one of those.

Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=272268

knweiss@science-computing.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|knweiss@science-computing.de|

------- Comment #12 from knweiss@science-computing.de 2007-05-29 02:50 MST -------
Unfortunately, it did not work. Here's the output. ("KAW" is a dprintk from me to see if the modified lockd kernel module gets used or not)

Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd: request from 353c98dc
lockd: LOCK_MSG called
lockd: LOCK called
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: host garbage collection
lockd: nlmsvc_mark_resources
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c98dc)
lockd: 92 callback returned 0
lockd: release host
lockd: request from 353c98dc
lockd: LOCK_MSG called
lockd: LOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: get host
lockd: nsm_monitor()
lockd: creating statd monitor file for
lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
nsm_monitor() failed: errno=21
lockd: release host
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c98dc)
lockd: 94 callback returned 0
lockd: release host
lockd: request from 353c98dc
lockd: LOCK_MSG called
lockd: LOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: get host
lockd: nsm_monitor()
lockd: creating statd monitor file for
lockd/statd: failed to create /var/lib/nfs/sm/: err=-21
nsm_monitor() failed: errno=21
lockd: release host

However, if I go to the nfs destination directory in a shell on the HP-UX client before I do the test in catiav5 it suddenly works fine:

lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host
lockd: call procedure 12 on (async)
lockd: nlm_bind_host(353c98dc)
lockd: 104 callback returned 0
lockd: release host
lockd: request from 353c98dc
lockd: LOCK_MSG called
lockd: LOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: host garbage collection
lockd: nlmsvc_mark_resources
lockd: delete host
lockd: nsm_monitor(w0008028)
lockd: creating statd monitor file for w0008028
lockd: nlm_file_lookup (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: creating file for (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: found file ffff810426e2ca80 (count 0)
lockd: nlmsvc_lock(sdb1/135, ty=0, pi=1674, 0-9223372036854775807, bl=0)
lockd: nlmsvc_lookup_block f=ffff810426e2ca80 pd=1674 0-9223372036854775807 ty=0
lockd: posix_lock_file returned 0
lockd: nlmsvc_lock returned 0
lockd: LOCK status 0
lockd: release host w0008028
lockd: nlm_release_file(ffff810426e2ca80, ct = 1)
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host w0008028
lockd: call procedure 12 on w0008028 (async)
lockd: nlm_bind_host(353c98dc)
lockd: 105 callback returned 0
lockd: release host w0008028
lockd: request from 353c98dc
lockd: UNLOCK_MSG called
lockd: UNLOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: get host w0008028
lockd: nlm_file_lookup (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: found file ffff810426e2ca80 (count 0)
lockd: nlmsvc_unlock(sdb1/135, pi=1674, 0-9223372036854775807)
lockd: nlmsvc_cancel(sdb1/135, pi=1674, 0-9223372036854775807)
lockd: nlmsvc_lookup_block f=ffff810426e2ca80 pd=1674 0-9223372036854775807 ty=2
lockd: UNLOCK status 0
lockd: release host w0008028
lockd: nlm_release_file(ffff810426e2ca80, ct = 1)
lockd: closing file sdb1/135
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host w0008028
lockd: call procedure 14 on w0008028 (async)
lockd: nlm_bind_host(353c98dc)
lockd: 107 callback returned 0
lockd: release host w0008028
lockd: request from 353c98dc
lockd: LOCK_MSG called
lockd: LOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: get host w0008028
lockd: nsm_monitor(w0008028)
lockd: nlm_file_lookup (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: creating file for (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: found file ffff8104242c25c0 (count 0)
lockd: nlmsvc_lock(sdb1/135, ty=0, pi=1674, 0-9223372036854775807, bl=0)
lockd: nlmsvc_lookup_block f=ffff8104242c25c0 pd=1674 0-9223372036854775807 ty=0
lockd: posix_lock_file returned 0
lockd: nlmsvc_lock returned 0
lockd: LOCK status 0
lockd: release host w0008028
lockd: nlm_release_file(ffff8104242c25c0, ct = 1)
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host w0008028
lockd: call procedure 12 on w0008028 (async)
lockd: nlm_bind_host(353c98dc)
lockd: 108 callback returned 0
lockd: release host w0008028
lockd: request from 353c98dc
lockd: UNLOCK_MSG called
lockd: UNLOCK called
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=w0008028)
lockd: get host w0008028
lockd: nlm_file_lookup (01000001 11000800 00000080 00000087 00000001 00000000 00000000 00000000)
lockd: found file ffff8104242c25c0 (count 0)
lockd: nlmsvc_unlock(sdb1/135, pi=1674, 0-9223372036854775807)
lockd: nlmsvc_cancel(sdb1/135, pi=1674, 0-9223372036854775807)
lockd: nlmsvc_lookup_block f=ffff8104242c25c0 pd=1674 0-9223372036854775807 ty=2
lockd: UNLOCK status 0
lockd: release host w0008028
lockd: nlm_release_file(ffff8104242c25c0, ct = 1)
lockd: closing file sdb1/135
KAW
lockd: nlm_lookup_host(53.60.152.220, p=17, v=4, my role=server, name=)
lockd: get host w0008028
lockd: call procedure 14 on w0008028 (async)
lockd: nlm_bind_host(353c98dc)
lockd: 109 callback returned 0
lockd: release host w0008028
https://bugzilla.novell.com/show_bug.cgi?id=272268

------- Comment #13 from knweiss@science-computing.de 2007-05-29 03:01 MST -------
FYI: This nfs directory is provided via the am-utils automounter, i.e. it gets mounted at the moment when I access it from catiav5.
https://bugzilla.novell.com/show_bug.cgi?id=272268#c14
Neil Brown