[Bug 849387] New: NFS file systems unmounted 15 minutes after reboot
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c0

           Summary: NFS file systems unmounted 15 minutes after reboot
    Classification: openSUSE
           Product: openSUSE 12.3
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.3
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Network
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: R.Vickers@cs.rhul.ac.uk
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0

We have a server which is both an NFS server and NFS client. After booting,
all the NFS file systems in fstab successfully mount, but 15 minutes later an
sm-notify failure causes them to be unmounted again. This has happened after
at least 2 reboots.

So there are 2 mysteries here:
(1) Why does sm-notify fail?
(2) Why does this failure cause systemd to tear down the NFS client services?

The key messages in the log are

2013-11-07T08:31:59.117550+00:00 teaching sm-notify[1953]: Version 1.2.7 starting
2013-11-07T08:31:59.284828+00:00 teaching sm-notify[1953]: Backgrounding to notify hosts...
2013-11-07T08:31:59.296556+00:00 teaching nfs[1899]: Starting NFS client services: sm-notify idmapd..done
2013-11-07T08:32:04.435916+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec572] RPC status 1
2013-11-07T08:32:04.528118+00:00 teaching sm-notify[2022]: Version 1.2.7 starting
2013-11-07T08:32:04.528564+00:00 teaching sm-notify[2022]: Already notifying clients; Exiting!
2013-11-07T08:32:04.529097+00:00 teaching nfsserver[1900]: Starting kernel based NFS server: idmapd mountd statd nfsd sm-notify..done
2013-11-07T08:32:11.616881+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec573] RPC status 1
2013-11-07T08:32:15.621965+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec574] RPC status 1
2013-11-07T08:32:23.632100+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec575] RPC status 1
2013-11-07T08:32:39.644970+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec576] RPC status 1
2013-11-07T08:33:11.658055+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec577] RPC status 1
2013-11-07T08:34:15.724413+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec578] RPC status 1
2013-11-07T08:36:15.826645+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec579] RPC status 1
2013-11-07T08:38:15.919664+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec57a] RPC status 1
2013-11-07T08:40:15.960524+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec57b] RPC status 1
2013-11-07T08:42:16.062979+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec57c] RPC status 1
2013-11-07T08:44:16.165623+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec57d] RPC status 1
2013-11-07T08:46:16.253668+00:00 teaching sm-notify[1960]: nsm_parse_reply: [0x527ec57e] RPC status 1
2013-11-07T08:48:16.353764+00:00 teaching sm-notify[1960]: Unable to notify mailhost.cs.rhul.ac.uk, giving up
2013-11-07T08:48:16.465687+00:00 teaching nfs[7573]: Shutting down NFS client services:umount.nfs4: /rmt/csfiles/pgrads: device is busy
2013-11-07T08:48:16.497619+00:00 teaching nfs[7573]: umount.nfs4: /rmt/csfiles/staff: device is busy
2013-11-07T08:48:16.567285+00:00 teaching nfs[7573]: umount: /var/lib/nfs/rpc_pipefs: target is busy.
2013-11-07T08:48:16.568078+00:00 teaching nfs[7573]: (In some cases useful info about processes that use
2013-11-07T08:48:16.568571+00:00 teaching nfs[7573]: the device is found by lsof(8) or fuser(1))
2013-11-07T08:48:16.569186+00:00 teaching nfs[7573]: ..failed
2013-11-07T08:48:16.569418+00:00 teaching systemd[1]: nfs.service: control process exited, code=exited status=1
2013-11-07T08:48:16.581167+00:00 teaching systemd[1]: Unit nfs.service entered failed state

There is only one host which sm-notify is failing to contact. It is an NFS
client and the corresponding file in /var/lib/nfs is

-rw------- 1 statd nogroup 94 Jul 31 13:20 /var/lib/nfs/sm.bak/mailhost.cs.rhul.ac.uk

and its contents are

0100007f 000186b5 00000003 00000010 cbb2ce7f5dc6110000c869900388ffff 134.219.205.131 teaching

If I run sm-notify by hand I get:

teaching# sm-notify -df -m 1
sm-notify: Version 1.2.7 starting
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Effective UID, GID: 103, 65534
sm-notify: Sending PMAP_GETPORT for 100024, 1, udp
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Host mailhost.cs.rhul.ac.uk due in 2 seconds
sm-notify: Received packet...
sm-notify: nsm_parse_reply: [0x527306d7] RPC status 1
sm-notify: Host mailhost.cs.rhul.ac.uk due in 2 seconds
sm-notify: Sending PMAP_GETPORT for 100024, 1, udp
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Host mailhost.cs.rhul.ac.uk due in 4 seconds
sm-notify: Received packet...
sm-notify: nsm_parse_reply: [0x527306d8] RPC status 1
sm-notify: Host mailhost.cs.rhul.ac.uk due in 4 seconds
sm-notify: Sending PMAP_GETPORT for 100024, 1, udp
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Host mailhost.cs.rhul.ac.uk due in 8 seconds
sm-notify: Received packet...
sm-notify: nsm_parse_reply: [0x527306d9] RPC status 1
sm-notify: Host mailhost.cs.rhul.ac.uk due in 8 seconds
sm-notify: Sending PMAP_GETPORT for 100024, 1, udp
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Host mailhost.cs.rhul.ac.uk due in 16 seconds
sm-notify: Received packet...
sm-notify: nsm_parse_reply: [0x527306da] RPC status 1
sm-notify: Host mailhost.cs.rhul.ac.uk due in 16 seconds
sm-notify: Sending PMAP_GETPORT for 100024, 1, udp
sm-notify: Added host mailhost.cs.rhul.ac.uk to notify list
sm-notify: Host mailhost.cs.rhul.ac.uk due in 32 seconds
sm-notify: Received packet...
sm-notify: nsm_parse_reply: [0x527306db] RPC status 1
sm-notify: Host mailhost.cs.rhul.ac.uk due in 32 seconds
sm-notify: Unable to notify mailhost.cs.rhul.ac.uk, giving up

Reproducible: Always

Steps to Reproduce:
It is reproducible on my server, but I don't know how to reproduce it on other
machines. Both server and client are running openSUSE 12.3.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
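The retry timing in the syslog excerpt above can be checked with a small helper. This is only a sketch: the function name retry_gaps is made up for illustration, it assumes ISO-style timestamps in the first field as in the quoted log, and it assumes all lines fall on the same day.

```shell
#!/bin/sh
# retry_gaps TAG (hypothetical helper, not part of nfs-utils):
# read syslog lines on stdin and print the whole seconds elapsed between
# consecutive lines that contain TAG, e.g. 'sm-notify[1960]'.
# Assumes field 1 looks like 2013-11-07T08:32:04.435916+00:00 and that
# every matching line is from the same day.
retry_gaps() {
    awk -v tag="$1" '
        index($0, tag) {
            # Characters 12-19 of the timestamp are HH:MM:SS.
            split(substr($1, 12, 8), t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) print now - prev
            prev = now
            seen = 1
        }'
}
```

Fed the log from the report (for example `retry_gaps 'sm-notify[1960]' < logfile`, where logfile is whatever file holds those messages), the gaps roughly double at first and then settle at about 120 seconds until the "giving up" line, a little over 16 minutes after sm-notify started. That looks consistent with the 15-minute default retry window described in sm-notify(8), which would account for the "15 minutes after reboot" in the summary.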
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c

zhang jiajun <jzhang@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jzhang@suse.com
         AssignedTo|bnc-team-screening@forge.pr |suse-beta@cboltz.de
                   |ovo.novell.com              |
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c1

Christian Boltz <suse-beta@cboltz.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|suse-beta@cboltz.de         |nfbrown@suse.com

--- Comment #1 from Christian Boltz <suse-beta@cboltz.de> 2013-11-08 13:00:45 CET ---
(In reply to comment #0)
> So there are 2 mysteries here:

There's a 3rd mystery - why did Jia Jun Zhang assign this bug to me?

I don't know much about NFS, and also don't see anything that could be
related to AppArmor (which is the only possible reason why this bug could
have been assigned to me).

I'll reassign to Neil, the maintainer of nfs-utils.
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c2

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |fcrozat@suse.com

--- Comment #2 from Neil Brown <nfbrown@suse.com> 2013-11-10 23:11:02 UTC ---
sm-notify fails presumably because mailhost.cs.rhul.ac.uk doesn't want to talk
to it. "RPC status 1" means the RPC request was denied by the server.

This could be a configuration problem on mailhost.cs.rhul.ac.uk, but it
certainly shouldn't be fatal.

Looks like a systemd problem ... or at least a systemd weirdness.

Frederic: can you offer any explanation? A child of a process run by
/etc/init.d/nfs is failing and this seems to cause systemd to run
"/etc/init.d/nfs stop". Surely it shouldn't be doing that!
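For readers decoding the "RPC status" number themselves, here is a minimal sketch. It assumes the value sm-notify prints is the reply_stat field of the ONC RPC reply (RFC 5531), which is consistent with Neil's reading above. Both function names are hypothetical.

```shell
#!/bin/sh
# rpc_reply_stat_name N (hypothetical helper): name the ONC RPC reply_stat
# value N as defined in RFC 5531 (0 = accepted, 1 = denied).
rpc_reply_stat_name() {
    case "$1" in
        0) echo "MSG_ACCEPTED" ;;
        1) echo "MSG_DENIED" ;;
        *) echo "unknown" ;;
    esac
}

# explain_line LINE (hypothetical helper): pull the number after
# "RPC status" out of an sm-notify log line and print its name.
# Prints nothing and returns non-zero if the line has no such number.
explain_line() {
    status=$(printf '%s\n' "$1" | sed -n 's/.*RPC status \([0-9][0-9]*\).*/\1/p')
    [ -n "$status" ] && rpc_reply_stat_name "$status"
}
```

For example, `explain_line 'sm-notify: nsm_parse_reply: [0x527306d7] RPC status 1'` prints MSG_DENIED. A denial only says the remote side rejected the call; why it did so has to be found out on mailhost itself.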
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c3

Frederic Crozat <fcrozat@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
                 CC|                            |systemd-maintainers@suse.de
       InfoProvider|fcrozat@suse.com            |

--- Comment #3 from Frederic Crozat <fcrozat@suse.com> 2013-12-11 12:58:24 UTC ---
For a "forking" service (like nfs), systemd tries to track one of the PIDs as
the "main" PID. Usually, this is best handled when a PIDFile exists (and
systemd is told about this pidfile, using PIDFile: in the LSB header);
otherwise, systemd will pick "one" PID and use it as the "main" PID. If it
disappears, systemd will stop the service.

The "best" way to handle this would be to split the nfs initscript into
several services, one for each daemon started.

In the meantime, try adding in the LSB header:

# X-Systemd-RemainAfterExit: true

This will prevent systemd from "stopping" the service if the main (guessed)
PID terminates.

Of course, if you have a way to specify the real PID to track, use it instead.
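The workaround from comment #3 could be applied by hand or with a small script. The sketch below is only an illustration: the function name add_remain_after_exit is made up, and it assumes the init script has a standard LSB header that ends with a "### END INIT INFO" line. Try it on a copy of the script first.

```shell
#!/bin/sh
# add_remain_after_exit FILE (hypothetical helper):
# insert "# X-Systemd-RemainAfterExit: true" just before the line that
# closes the LSB header of FILE, unless the tag is already present.
# Assumes FILE contains a "### END INIT INFO" line.
add_remain_after_exit() {
    script=$1
    # Idempotent: leave the file alone if the tag already exists.
    if grep -q '^# X-Systemd-RemainAfterExit:' "$script"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    awk '
        /^### END INIT INFO/ && !inserted {
            print "# X-Systemd-RemainAfterExit: true"
            inserted = 1
        }
        { print }
    ' "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite in place, keeping mode and owner
    rm -f "$tmp"
}
```

Usage would be something like `add_remain_after_exit /etc/init.d/nfs` as root. systemd most likely needs a `systemctl daemon-reload` (or a reboot) before it picks up the changed header.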
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c4

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |R.Vickers@cs.rhul.ac.uk

--- Comment #4 from Neil Brown <nfbrown@suse.com> 2013-12-12 02:25:05 UTC ---
Thanks Frederic.

Bob: can you please check if adding that "# X-Systemd....." line near the top
of /etc/init.d/nfs makes the problem go away?

I've been looking into creating a set of unit files for nfs but it'll be a
while before that makes its way into an update.
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c5

--- Comment #5 from Bob Vickers <R.Vickers@cs.rhul.ac.uk> 2013-12-12 12:32:59 UTC ---
Thanks for the suggestion, which I have just implemented. We have scheduled
reboots of this server every Thursday morning, so we should find out next week
if it works (I read your message just too late to do it before today's
reboot).
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c6

--- Comment #6 from Bob Vickers <R.Vickers@cs.rhul.ac.uk> 2013-12-19 14:50:46 UTC ---
That seems to have done the trick. The file
/var/lib/nfs/sm.bak/mailhost.cs.rhul.ac.uk
still exists, and I still see the message

sm-notify[1946]: Unable to notify mailhost.cs.rhul.ac.uk, giving up

but now NFS file systems remain mounted. Hoorah!
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c7

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
       InfoProvider|R.Vickers@cs.rhul.ac.uk     |

--- Comment #7 from Neil Brown <nfbrown@suse.com> 2014-01-21 02:24:57 UTC ---
Great!

I hope to re-write the nfs init scripts to be systemd unit files soonish. So
I won't commit the above change but will just replace everything instead :-)

I'll leave this bug open to make sure I remember to make sure sm-notify can
fail without aborting everything.
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c8

--- Comment #8 from Swamp Workflow Management <swamp@suse.de> 2014-02-12 11:05:15 UTC ---
openSUSE-RU-2014:0227-1: An update that has three recommended fixes can now be
installed.

Category: recommended (moderate)
Bug References: 849387,849476,859221
CVE References:
Sources used:
openSUSE 13.1 (src):    nfs-utils-1.2.8-4.9.1
https://bugzilla.novell.com/show_bug.cgi?id=849387

https://bugzilla.novell.com/show_bug.cgi?id=849387#c9

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #9 from Neil Brown <nfbrown@suse.com> 2014-02-18 02:38:19 UTC ---
This is fixed now and my under-development systemd unit files don't have this
problem, so closing.