[Bug 990356] New: Cannot start NFSv4 only server without RPC
http://bugzilla.opensuse.org/show_bug.cgi?id=990356

            Bug ID: 990356
           Summary: Cannot start NFSv4 only server without RPC
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Network
          Assignee: bnc-team-screening@forge.provo.novell.com
          Reporter: trcs@gmx.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Hi.

I want to set up an NFSv4-only server without RPC. I have followed the steps from the SUSE Linux Enterprise Server 12 SP1 Release Notes¹:

1) set NFS3_SERVER_SUPPORT=no in /etc/sysconfig/nfs
2) systemctl disable rpcbind.socket

However, rpcbind, rpc.mountd, rpc.statd and rpc.idmapd are started, so both 1) and 2) have no effect. rpciod is also running, but I don't know whether it is related to NFS or not.

I have done some tests enabling and disabling rpcbind.socket and rpcbind.service, with the following results:

socket     service    result
------------------------------------------
disabled   disabled   no difference
masked     disabled   system doesn't boot²
disabled   masked     system doesn't boot²
masked     masked     system doesn't boot²

I have also removed the references to rpcbind.socket in nfs-server.service, but nothing changed. And if I remove all the rpc-xxx.yyy references in that file, the system won't boot.

When the system can't boot, the following start-up messages are shown:

[ OK ] Started NFS Mount Daemon
[Failed] Failed to start NFS status monitor for NFSv2/3 locking..
See 'systemctl status rpc-statd.service' for details³
         Starting NFS server and services...
...
[ **** ] A start job is running for NFS server and services (xxx / no limit)

[1] 3.1.4.5 NFSv4 only configuration
[2] A start job is running for NFS server and services (xxx / no limit)
[3] I cannot run the command because all the VTs are blocked (no login prompt) by the message described in [2]

--
You are receiving this mail because:
You are on the CC list for the bug.
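For readers who want to repeat step 1) without hand-editing, here is a minimal sketch. The helper name set_sysconfig_var is made up for this illustration and is not part of any openSUSE tool; it assumes the usual KEY="value" layout of files under /etc/sysconfig.

```shell
# set_sysconfig_var FILE KEY VALUE   (hypothetical helper, not a distro tool)
# Rewrites an existing KEY=... line in FILE, or appends one if KEY is absent.
set_sysconfig_var() (
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp) || exit 1
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp"
        # Write back through the original file so its owner and mode are kept.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
)

# Intended use on the server, as root, followed by step 2):
#   set_sysconfig_var /etc/sysconfig/nfs NFS3_SERVER_SUPPORT no
#   systemctl disable rpcbind.socket
```

As the rest of the thread shows, these two steps alone were not enough on the affected Tumbleweed snapshot.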
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c2
--- Comment #2 from jc sl
What exactly do you mean by "with RPC" ?? RPC is the protocol that NFS uses. Without the RPC protocol, there is no NFS.
Bad title, right. What I want is to start only the mandatory programs/services/daemons or whatever is needed by a NFSv4 only server.
rpc.mountd is an intrinsic part of the NFS service, even for NFSv4. If you disable NFSv3 service, then rpc.mountd shouldn't listen for mount requests, but it still needs to be running.
I'm not sure about this. I have read this in the CentOS and Red Hat documentation:

- CentOS: rpc.mountd ─ ... This is not used with NFSv4.
- Red Hat: The mounting and locking protocols have been incorporated into the NFSv4 protocol. The rpc.mountd daemon is still required on the NFS server to set up the exports, but is not involved in any over-the-wire operations.

I don't understand the last part of the Red Hat paragraph. I read quite a bit about NFS some time ago, and IIRC I set up a perfectly working NFSv4 without rpc.mountd running on 13.2, but my memory is crap. If the mount protocol has been incorporated into the NFSv4 protocol, why should rpc.mountd be needed at all?
rpcbind is not technically necessary for NFSv4, however "systemctl disable rpcbind.socket" isn't sufficient to disable it. I guess those release notes are wrong.
systemctl mask rpcbind.socket
should stop rpcbind from running. That, in turn, should stop rpc.statd from running, which is only needed for NFSv2 and NFSv3.
As I wrote in my report, if you mask rpcbind.socket the system doesn't boot. It gets stuck in what is probably an infinite wait: "A start job is running for NFS server and services (xxx / no limit)", with xxx being a timer. I set up a virtual machine to confirm this behaviour, so I think it is reproducible.
rpc.idmapd, like rpc.mountd, is an intrinsic part of NFSv4 service. It doesn't make sense to try to turn it off.
I agree with rpc.idmapd being mandatory, not sure about rpc.mountd however.

References:
- https://www.centos.org/docs/5/html/5.2/Deployment_Guide/s2-nfs-how-daemons.h...
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm...

Greetings.
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c3
--- Comment #3 from Neil Brown
I agree with rpc.idmapd being mandatory, not sure about rpc.mountd however.
Having myself written a large part of the code with which the kernel communicates with mountd, I can assure you without a shadow of doubt that rpc.mountd is required for any NFS service. The Redhat documentation you found is correct. The "over-the-wire" reference means that the NFS client doesn't communicate over the network directly with mountd; it only communicates with the kernel. But the kernel definitely communicates with mountd.

If the NFS server systemd unit isn't starting when rpcbind.socket is masked, that suggests that nfsserver.service still 'Requires' rpcbind, rather than 'Wants' it. This was fixed in late May 2016.

What does:

   rpm -q --changelog nfs-kernel-server | head

report? If it doesn't contain

* Tue May 24 2016 nfbrown@suse.com
- 0001-systemd-Decouple-the-starting-and-stopping-of-rpcbin.patch
  0002-systemd-unit-files-fix-up-dependencies-on-rpcbind.patch
  Fix systemd dependencies to ensure rpcbind is started
  when needed. (bsc#975265)

then it is not up-to-date. If it does, then this should work and I'll need to look deeper.
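The 'Requires' versus 'Wants' question raised above can also be checked directly on a running system. The following is only a sketch: rpcbind_dependency_kind is a name invented here, and it assumes the property output format of "systemctl show" (one "Name=value ..." line per property).

```shell
# rpcbind_dependency_kind   (hypothetical helper)
# Reads "systemctl show" property lines on stdin and prints "requires",
# "wants" or "none", depending on where an rpcbind unit is listed.
# A hard Requires= takes precedence over Wants=.
rpcbind_dependency_kind() {
    awk -F= '
        $1 == "Requires" && $2 ~ /rpcbind/ { kind = "requires" }
        $1 == "Wants" && $2 ~ /rpcbind/ && kind == "" { kind = "wants" }
        END { print (kind == "" ? "none" : kind) }
    '
}

# Usage on the server:
#   systemctl show -p Requires -p Wants nfs-server.service | rpcbind_dependency_kind
```

If this prints "requires", masking rpcbind.socket would be expected to block the NFS server unit, which is the situation the May 2016 patches were meant to remove.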
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c4
--- Comment #4 from jc sl
I agree with rpc.idmapd being mandatory, not sure about rpc.mountd however.
Having myself written a large part of the code with which the kernel communicates with mountd, I can assure you without a shadow of doubt that rpc.mountd is required for any NFS service. The Redhat documentation you found is correct. The "over-the-wire" reference means that the NFS client doesn't communicate over the network directly with mountd; it only communicates with the kernel. But the kernel definitely communicates with mountd.
Thanks for clarifying this. Some sources made me think that it wasn't needed anymore.
If the NFS server systemd unit isn't starting when rpcbind.socket is masked, that suggests that nfsserver.service still 'Requires' rpcbind, rather than 'Wants' it.
This was fixed in late May 2016. What does:
rpm -q --changelog nfs-kernel-server | head
report? If it doesn't contain
* Tue May 24 2016 nfbrown@suse.com
- 0001-systemd-Decouple-the-starting-and-stopping-of-rpcbin.patch
  0002-systemd-unit-files-fix-up-dependencies-on-rpcbind.patch
  Fix systemd dependencies to ensure rpcbind is started
  when needed. (bsc#975265)
then it is not up-to-date. If it does, then this should work and I'll need to look deeper.
The output of the command does contain that text.

If it helps, masking rpcbind.socket on the client does not prevent rpcbind from starting either. I have to mask rpcbind.service explicitly and leave rpcbind.socket disabled, because if it is enabled the system doesn't boot (with the aforementioned "A start job is running for NFS server and service..." message).

Greetings.
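The changelog check discussed above can be reduced to a yes/no answer. This is a small sketch; the function name has_rpcbind_dependency_fix is invented for illustration, and it simply looks for the second patch file name quoted in the changelog entry.

```shell
# has_rpcbind_dependency_fix   (hypothetical helper)
# Reads a package changelog on stdin and succeeds if it mentions the
# patch that reworked the systemd dependencies on rpcbind (bsc#975265).
has_rpcbind_dependency_fix() {
    grep -q -F '0002-systemd-unit-files-fix-up-dependencies-on-rpcbind.patch'
}

# Usage on the server:
#   rpm -q --changelog nfs-kernel-server | has_rpcbind_dependency_fix \
#       && echo "fix present" || echo "package predates the fix"
```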
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c5
--- Comment #5 from Neil Brown
if it is enabled the system doesn't boot
I've been experimenting, and I wonder if this is just an inappropriate delay. Could you try again and wait for at least 7 minutes? When doing some component testing I see a delay of 6.5 minutes that really shouldn't be there, but I haven't managed to trace through the code yet to understand why it is so long, or how to remove it.
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c6
--- Comment #6 from jc sl
if it is enabled the system doesn't boot
I've been experimenting, and I wonder if this is just an inappropriate delay. Could you try again and wait for at least 7 minutes? When doing some component testing I see a delay of 6.5 minutes that really shouldn't be there, but I haven't managed to trace through the code yet to understand why it is so long, or how to remove it.
After masking rpcbind.socket and rebooting, the system doesn't boot, as usual, but if I wait long enough extra messages appear:

11min 42s  svc: failed to register nfsdv3 RPC service (errno 110).
16min 43s  svc: failed to register nfsaclv3 RPC service (errno 110).

After 30 minutes I pressed ctrl+alt+backspace to reboot.

The weird thing is that I have not been able to reproduce the issue on the client today. rpcbind is still started when rpcbind.socket is disabled or masked, but the boot no longer stops to wait for anything.

socket     service
--------------------------------
disabled   enabled    → rpcbind starts
masked     enabled    → rpcbind starts
masked     disabled   → rpcbind doesn't start
enabled    masked     → rpcbind doesn't start
masked     masked     → rpcbind doesn't start

Greetings.
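A table like the one above is easier to fill in if the state of both units is read in one go. The sketch below uses only "systemctl is-enabled" and "systemctl is-active"; the function name rpcbind_unit_states is made up for this note.

```shell
# rpcbind_unit_states   (hypothetical helper)
# Prints one line per unit: "<unit> <enablement state> <active state>".
# is-enabled reports values such as enabled, disabled or masked;
# is-active reports values such as active, inactive or failed.
rpcbind_unit_states() (
    for unit in rpcbind.socket rpcbind.service; do
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null)
        active=$(systemctl is-active "$unit" 2>/dev/null)
        printf '%-16s %-10s %s\n' "$unit" "${enabled:-unknown}" "${active:-unknown}"
    done
)

# Usage after each reboot of a test combination:
#   rpcbind_unit_states
```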
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c7
--- Comment #7 from Neil Brown
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c8
--- Comment #8 from jc sl
11min 42s svc: failed to register nfsdv3 RPC service (errno 110).
This message strongly suggests that you haven't disabled NFSv3, i.e. that NFS3_SERVER_SUPPORT=no is not set in /etc/sysconfig/nfs.
It should only try to register nfsdv4, not nfsdv3 or nfsaclv3. Obviously this doesn't explain all of the delay but it might explain some of it.
I had already double checked that, but one more time doesn't hurt:
   $ grep NFS3_SERVER_SUPPORT /etc/sysconfig/nfs
   NFS3_SERVER_SUPPORT="no"
Greetings.
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c9
--- Comment #9 from Neil Brown
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c10
--- Comment #10 from jc sl
Maybe if you run systemctl restart nfs-config
it will cause the NFS3_SERVER_SUPPORT setting to take effect. That should run more often than it does. The issue is fixed upstream but I haven't brought the change to openSUSE yet.
The cause of the long delays I have found to be upstream kernel commit
Commit: 4b0ab51db32e ("SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose")
in Linux 4.3. Reverting that makes the delays go away. I've submitted this revert for the tumbleweed kernel (and SLE12-SP2) so it should appear in a kernel update in a few days.
I ran systemctl restart nfs-config but nothing seems to have changed. If I mask rpcbind.socket and restart I still get

...
[ OK ] Starting NFS status monitor for NFSv2/3 locking...
[FAILED] Failed to start NFS status monitor for NFSv2/3 locking..
See 'systemctl status rpc-statd.service' for details
         Starting NFS server and services...
...
11min 40s svc: failed to register nfsdv3 RPC service (errno 110).
...

When rpcbind.socket is not masked and the system boots:

...
[ OK ] Starting NFS status monitor for NFSv2/3 locking...
[ OK ] Starting NFS mount daemon
[ OK ] Started NFS status monitor for NFSv2/3 locking...
...
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c11
--- Comment #11 from Neil Brown
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c12
--- Comment #12 from jc sl
When the nfs-config service runs, it runs /usr/lib/systemd/scripts/nfs-utils_env.sh
which reads /etc/sysconfig/nfs and writes /run/sysconfig/nfs-utils
so /run/sysconfig/nfs-utils should contain
RPCNFSDARGS= --no-nfs-version 2 --no-nfs-version 3 4
Here that line is different:

   $ grep RPCNFSDARGS /run/sysconfig/nfs-utils
   RPCNFSDARGS= --nfs-version 4 --nfs-version 4.1 4

Greetings.
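The comparison above (expected versus actual RPCNFSDARGS) can be turned into a quick pass/fail check. This is a sketch under the assumption that the generated file keeps the single-line RPCNFSDARGS= format shown in this thread; v2_v3_disabled is a name invented here.

```shell
# v2_v3_disabled [FILE]   (hypothetical helper)
# Succeeds only if the RPCNFSDARGS line in FILE passes both
# "--no-nfs-version 2" and "--no-nfs-version 3" to rpc.nfsd.
# FILE defaults to the path written by nfs-config.
v2_v3_disabled() (
    args=$(sed -n 's/^RPCNFSDARGS=//p' "${1:-/run/sysconfig/nfs-utils}")
    case "$args" in *"--no-nfs-version 2"*) ;; *) exit 1 ;; esac
    case "$args" in *"--no-nfs-version 3"*) ;; *) exit 1 ;; esac
)

# Usage on the server:
#   systemctl restart nfs-config
#   v2_v3_disabled && echo "v2/v3 off" || echo "v2/v3 still enabled"
```

With the line reported in this comment the check fails, which matches the "failed to register nfsdv3" messages seen at boot.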
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c13
--- Comment #13 from Neil Brown
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c16
--- Comment #16 from jc sl
Argg.... Could you please edit /usr/lib/systemd/scripts/nfs-utils_env.sh and change the one instance of VERSION_PARAMS (all uppercase) to version_params (all lower case)
While you are there, if there is still a "version_parms" somewhere (missing 'a'), change it to "version_params" as well.
That should cause the /run/sysconfig file to be correct, and then NFSv2 and NFSv3 won't be enabled.
Hi.

There was one instance each of VERSION_PARAMS and version_parms. Once they were fixed, two lines in /run/sysconfig/nfs-utils changed. Here are the values of the changed variables before and after the edit:

# Before
RPCMOUNTDARGS=
RPCNFSDARGS= --nfs-version 4 --nfs-version 4.1 4

# After
RPCMOUNTDARGS= --no-nfs-version 2 --no-nfs-version 3 --nfs-version 4 --nfs-version 4.1
RPCNFSDARGS= --no-nfs-version 2 --no-nfs-version 3 --nfs-version 4 --nfs-version 4.1 4

Now the system boots correctly with rpcbind.socket masked. However, it takes its time: more than 7 minutes (about 5 minutes more than the usual boot time). It seems that NFSv2/3 locking still wants to be started:

...
         Starting NFS status monitor for NFSv2/3 locking...
         Starting NFS mount daemon...
...
[ OK ] Started NFS mount daemon
[FAILED] Failed to start NFS status monitor for NFSv2/3 locking..
See 'systemctl status rpc-statd.service' for details
         Starting NFS server and services...
...
[ **** ] A start job is running for NFS server and services (xxx / no limit)
...

Greetings.
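The manual edit requested above could also be done with sed. This is only a sketch: fix_version_params is a hypothetical name, the actual contents of nfs-utils_env.sh are not shown in this thread, and the output should be reviewed before it replaces the packaged script.

```shell
# fix_version_params FILE   (hypothetical helper)
# Prints FILE with both misspellings of the variable name normalised to
# "version_params". The file itself is left untouched.
fix_version_params() {
    sed -e 's/VERSION_PARAMS/version_params/g' \
        -e 's/version_parms/version_params/g' "$1"
}

# Review the change first, for example:
#   fix_version_params /usr/lib/systemd/scripts/nfs-utils_env.sh \
#       | diff /usr/lib/systemd/scripts/nfs-utils_env.sh -
# then apply it and regenerate the environment file:
#   systemctl restart nfs-config
```

A package update overwrites the script, so a local edit like this only lasts until the fixed nfs-client package is installed.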
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c17
--- Comment #17 from jc sl
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c18
--- Comment #18 from Neil Brown
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c19
--- Comment #19 from jc sl
Once you get nfs-client 1.3.4 installed, those changes should stay. The fix for the delays caused by rpcbind not running should be in the next kernel update, though I don't know how the timetable for that works.
Ok, I'll wait until it is released.
rpc.statd: Starting NFS status monitor for NFSv2/3 locking...
will always be started by default, but it will fail if rpcbind isn't running. A failure of rpc.statd won't cause nfsdv4 to fail.
Out of curiosity, why is it started unconditionally?
You probably have rpcbind.service still enabled. I'm not sure if you did that or some system thing did. Anyway, if you systemctl disable rpcbind.service and keep rpcbind.socket masked, it shouldn't start at all. Then "NFS status monitor" won't successfully start either, but nfsd will.
Yes, rpcbind.service was enabled. At this point I don't know whether it was the default or I changed it. Once it is disabled, the "Starting NFS status monitor..." message is still shown and fails, and the boot delay comes back. So I'll leave it enabled until the updated packages appear.
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c20
--- Comment #20 from Neil Brown
Out of curiosity, why is it started unconditionally?
Because it is not very easy to arrange for something to start conditionally in systemd when the condition is based on a config file.

It could be done with a "generator" to parse the config file and create a drop-in which modified the start-up behaviour of some service. But that sort of complexity is best kept for when it is really needed.

On the other hand, I just wrote a generator to make sure nfs-server start-up was ordered properly w.r.t. various mount points becoming available. Extending that to make nfs-server depend on rpc-statd only when v2 or v3 is enabled would certainly be possible...

BTW kernel-default-4.7.0-2.2 should fix the delays.
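To make the generator idea above concrete, here is a rough sketch of what such a generator could do. It is not the generator mentioned in the comment: the function name, the drop-in file name 10-statd.conf and the reuse of NFS3_SERVER_SUPPORT as the switch are all assumptions made for illustration. It also assumes the packaged nfs-server.service would no longer pull in rpc-statd by itself, since a drop-in can add a Wants= but cannot remove one.

```shell
# generate_statd_dropin SYSCONFIG OUTDIR   (hypothetical generator body)
# systemd runs generators early at boot with three output directories as
# arguments; OUTDIR here stands for the first of them. The drop-in is
# written only when NFSv3 support has not been switched off.
generate_statd_dropin() (
    sysconfig=$1 outdir=$2
    value=$(sed -n 's/^NFS3_SERVER_SUPPORT=//p' "$sysconfig" 2>/dev/null \
        | tail -n 1 | tr -d "\"'")
    # NFSv4-only configuration: leave nfs-server without rpc-statd.
    [ "$value" = "no" ] && exit 0
    mkdir -p "$outdir/nfs-server.service.d"
    printf '[Unit]\nWants=rpc-statd.service\nAfter=rpc-statd.service\n' \
        > "$outdir/nfs-server.service.d/10-statd.conf"
)

# Installed as an executable under /usr/lib/systemd/system-generators/,
# the script body would end with something like:
#   generate_statd_dropin /etc/sysconfig/nfs "$1"
```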
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c21
--- Comment #21 from jc sl
Out of curiosity, why is it started unconditionally?
Because it is not very easy to arrange for something to start conditionally in systemd when the condition is based on a config file.
It could be done with a "generator" to parse the config file and create a drop-in which modified the start-up behaviour of some service. But that sort of complexity is best kept for when it is really needed.
On the other hand, I just wrote a generator to make sure nfs-server start-up was ordered properly w.r.t. various mount points becoming available. Extending that to make nfs-server depend on rpc-statd only when v2 or v3 is enabled would certainly be possible...
BTW kernel-default-4.7.0-2.2 should fix the delays.
Thanks for the explanation. It would be nice if it were started only when needed, but I understand that maintenance is also quite important. If it doesn't have side effects, keeping it simple is better.

Back on the issue: after the kernel and related packages have been updated, the server doesn't start. I think it all comes from this error:

nfs-config.service: Failed at step EXEC spawning /usr/libexec/nfs-utils/nfs-utils_env.sh: No such file or directory

Indeed the file isn't there. zypper se -f nfs-utils_env.sh shows that the file comes from nfs-client, but rpm -ql nfs-client says that the file is in /usr/lib/systemd/scripts/nfs-utils_env.sh.

Greetings.
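A mismatch like the one reported above, where a unit points at a path the package does not ship, can be spotted by checking every ExecStart= program of the unit. The two helper names below are invented for this sketch; it relies only on "systemctl cat" printing the unit file text.

```shell
# exec_paths   (hypothetical helper)
# Reads unit file text on stdin and prints the program path of each
# ExecStart= line, with systemd's special prefix characters stripped.
exec_paths() {
    sed -n 's/^ExecStart=[-@+!:]*\([^ ]*\).*/\1/p'
}

# missing_exec_paths   (hypothetical helper)
# Prints only those program paths that are not executable files.
missing_exec_paths() {
    exec_paths | while IFS= read -r path; do
        [ -x "$path" ] || printf '%s\n' "$path"
    done
}

# Usage on the server:
#   systemctl cat nfs-config.service | missing_exec_paths
#   rpm -ql nfs-client | grep nfs-utils_env.sh
```

Any path printed by the first command that differs from the location listed by rpm points to a packaging error like the libexec one here.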
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c22
--- Comment #22 from Neil Brown
nfs-config.service: Failed at step EXEC spawning /usr/libexec/nfs-utils/nfs-utils_env.sh: No such file or directory
Oh dear, that was careless. I knew that had changed upstream, but forgot to allow for it. And 'libexec' doesn't exist on openSUSE, so I need to change it to use /usr/lib/nfs-utils...

I've submitted a new update. The built package should appear in http://download.opensuse.org/repositories/Base:/System/openSUSE_Tumbleweed/ shortly.

Thanks!
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c24
jc sl
http://bugzilla.opensuse.org/show_bug.cgi?id=990356
Michal Hlavac
http://bugzilla.opensuse.org/show_bug.cgi?id=990356#c25
jc sl
http://bugzilla.opensuse.org/show_bug.cgi?id=990356
jc sl