[Bug 819515] New: Non NFS V4 Mounts Stall the Boot Process for Minutes upon Minutes Until Huge Value Time-outs
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c0

Summary: Non NFS V4 Mounts Stall the Boot Process for Minutes upon Minutes Until Huge Value Time-outs
Classification: openSUSE
Product: openSUSE 12.2
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.2
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Bootloader
AssignedTo: jsrain@suse.com
ReportedBy: secure@aphofis.com
QAContact: jsrain@suse.com
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0

The new boot loading and starting of services has gone back to stalling when it is unable to mount an NFS drive whose server is not running. We had this problem in 11.1/11.2/11.3/11.4. At version 11.4 the mount options were changed from the defaults to massively decrease the timeout value and to basically ignore the device IP if it could not be reached.

Currently, on an x86-64 quad-core machine, the timeout period is in excess of 5 minutes, on top of other services being started which are lightning fast. If I didn't know better I would swear the PC had frozen, and no SysRq key combination lets you halt, bypass or shut down the PC. I would think most users just reset the hardware and go make a cup of coffee whilst it loads the services again.

You can ESC the splash screen, which shows the service started but stalled looking for the NFS device. With or without the splash screen, the very long timeout creates the feeling that the PC has frozen, especially as the list of other services runs like lightning.

I don't know why we are back at 11.1, where this issue was first reported. I don't know if the developers there have ANY NFS drives on their PCs, but I can assure you they don't, as this error sticks out like...well you know.
Reproducible: Always

Steps to Reproduce:
1. Create a non-V4 NFS mount
2. Boot the PC when the NFS drive's PC is not operating on the network
3. ESC from the boot splash screen

Actual Results:
The fast boot is wonderful and the starting of services wonderful...except where there is a service that cannot complete. It's more than the excessive timeout with NFS. The same is true if ANY of the services started at boot are not present.

Expected Results:
Time to put NFS into the kernel for good so it can constantly retry to connect to an NFS server whenever it comes online. The mounting of network drives must be the responsibility of the kernel, if you want my opinion. When any of the services are loading and any fail, I think we should tell the user very quickly that there is an issue, with a "continue or skip" option window.

There's a whole lot of bugs in 12.2 with NEW status which are probably directly related to this issue, and probably more users who think their PC is just freezing on boot. I was going to update the dependencies but thought better of it, as so many of the current bugs appear to be related to this one.

TELL ME PRECISELY what files you need, as the mounting of services during quick boot has changed things a little bit.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c
Jiri Srain
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c1
Scott Couston
From there another service starts which invokes the kernel handling of NFS drives.
My comment about putting this issue to bed forever by moving all NFS services to the kernel was based on the following thoughts. It is helpful to me, and I hope to you, if I use the example of Novell Netware. The Netware kernel, for want of a better term, will continue to try to connect back to any network drive whether it is present or not present. Along the same lines, why can't we leave the entire job of NFS drives to the kernel and NO other service running outside of it?

At the moment, the way 12.2 handles NFS drives is that after LSB: fails to start, the kernel is invoked somewhere to perform...(I have no idea). All I know is that NFS services, both server and client, now take up to 5 minutes in total and will wait until a huge timeout parameter is met before the boot process goes any further.

I think that in solving this issue we also need to add the smarts so that IF an NFS service cannot be started because its opposite client/server cannot be found, the kernel will continue to try to mount the NFS mount point, and when the other PC comes online, the NFS drive is auto mounted without any user intervention. Somewhat like the way Netware's shell retries to reconnect any network drive or device, or completes the NDS structure, on a dynamic basis.

The other thought I had is: why don't we have all NFS services advertise continually, so that a lost NFS mount is dynamically reconnected? When the client PC cannot find its server and both PCs come online later, the NFS mount should load automatically and dynamically, because advertised services becoming available would be acted on without the user doing everything.

Jiri, we have got to fix this for Enterprise builds. This is such a fundamental flaw: you just watch the boot process stall, and even when the opposite device comes online, no auto connect or reconnect takes place.

Anyway mate, let me know what logs you require from me to help. I have NO idea how this issue escapes the day to day use of Suse v X while NFS continues to fail, without any developer noticing the problem themselves??? Let me know how I can help.

Scott
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c
FeiXiang Zhang
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c2
--- Comment #2 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c3
--- Comment #3 from Scott Couston
If you have filesystems listed in /etc/fstab, then the correct thing for the boot process to do is to wait for all the filesystems to be available before continuing.
If you don't want that then you have 2 options.
1/ add the "bg" option to the NFS mounts in /etc/fstab. This will cause the mount to be attempted once with a fairly short timeout, and if that fails, to continue attempting in the background and let the system boot complete.
2/ use an automounter to have the NFS filesystem only mounted on demand. That way an unavailable NFS server will only block the applications that need it, not the whole system.
If you have hard mount requests for NFS filesystems in /etc/fstab, but expect the system to boot when the server is not available, then that is a configuration error.
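As a rough illustration of option 1/ above, the following sketch scans fstab-style text and lists the NFS entries whose option field lacks "bg". It is only a sketch under stated assumptions: the function name, the host "fileserver" and the export paths are hypothetical, and it works on a sample string rather than reading the real /etc/fstab.

```python
# Sketch: find NFS entries in fstab-style text that do not use the "bg" option.
# The host name "fileserver" and all paths are hypothetical examples.

NFS_TYPES = {"nfs", "nfs4"}


def nfs_entries_without_bg(fstab_text):
    """Return (device, mount_point) pairs for NFS entries lacking 'bg'."""
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry needs at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        device, mount_point, fstype, options = fields[:4]
        if fstype in NFS_TYPES and "bg" not in options.split(","):
            missing.append((device, mount_point))
    return missing


SAMPLE = """\
# hypothetical fstab content
/dev/sda2                /          ext4  defaults     1 1
fileserver:/export/home  /mnt/home  nfs   defaults     0 0
fileserver:/export/data  /mnt/data  nfs   bg,defaults  0 0
"""

if __name__ == "__main__":
    for device, mount_point in nfs_entries_without_bg(SAMPLE):
        print(f"{mount_point}: {device} is mounted without 'bg'")
```

Any entry it reports would make the boot wait for the server; adding "bg" to that entry's fourth field is the change described in option 1/.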
I didn't really create this bug to help only myself. The issue is that the default settings after using Yast > Network Services, for both NFS Server and NFS Client, are totally inadequate and create the illusion, to the user, that the PC has frozen. If there are options, let us put them in the Yast GUI, but as a default something has to change. If at time of boot the NFS server and/or client is not available, sure, wait a reasonable time like 6 seconds, but don't stall the whole boot process of a PC to the point that users think their PC has frozen. The question of whether or not to use NFS v4 is not clear in the help. The option to use GSS Security is totally unclear.

With respect to services being activated and run as PART of the kernel for NFS drives, I didn't make myself clear. I don't understand why, with network drives (mount points), the kernel cannot monitor ALL network mount points and exported services! If any become available after boot, or unavailable, the kernel should auto mount or auto dismount them dynamically. We shouldn't need services to be running to accomplish this.

To solve this recurring problem, can we just have a default that mounts if it's there, or waits 6 or so seconds and, if it's not there, continues the boot process? It's as simple as that! When we work out what to do with Yast in the future we can add the options in the GUI...Users just want it to work. Admins expect the Yast GUI to offer clear choices and all options and to simply WORK - NOTHING MORE. We can no longer sit between editing files and having GUIs that accomplish everything an admin wants to do...Hate it or not, the GUI simply has to work!
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c4
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c5
--- Comment #5 from Scott Couston
So it seems your real issue is with Yast. Consequently I am reassigning this to the yast developers.
Main issues seem to be:
- autofs is much better for mounting NFS filesystems than /etc/fstab. It should be possible and easy to configure NFS mounts to use autofs.
- if /etc/fstab is to be used, the "bg" option should be available so that missing servers don't cause the boot to hang.
There might be others that I missed.
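To make the first point more concrete, here is a minimal sketch of what moving an NFS mount from /etc/fstab to autofs could look like: it turns one fstab line into an autofs direct-map entry of the form "key [-options] location". The function name, the host "fileserver" and the paths are hypothetical, and the set of options dropped as boot-only is an assumption of this sketch, not something stated in this thread.

```python
# Sketch: convert one NFS line from fstab into an autofs direct-map entry.
# Assumption: these options only matter for boot-time mounting via fstab,
# so they are dropped when the mount is handed over to the automounter.
BOOT_ONLY = {"bg", "fg", "defaults", "auto", "noauto"}


def fstab_to_direct_map(fstab_line):
    """Return an autofs direct-map entry: 'key [-options] location'."""
    device, mount_point, fstype, options = fstab_line.split()[:4]
    if fstype not in ("nfs", "nfs4"):
        raise ValueError(f"not an NFS entry: {fstype}")
    opts = [o for o in options.split(",") if o not in BOOT_ONLY]
    # autofs treats a map entry as NFS by default; name the type only for nfs4.
    if fstype == "nfs4":
        opts.insert(0, "fstype=nfs4")
    opt_field = "-" + ",".join(opts) + " " if opts else ""
    return f"{mount_point} {opt_field}{device}"


if __name__ == "__main__":
    # Hypothetical fstab line; prints the corresponding direct-map entry.
    print(fstab_to_direct_map("fileserver:/export/data /mnt/data nfs defaults,rw,bg 0 0"))
```

A direct map like this is referenced from the autofs master map with the key "/-", and the filesystem is then mounted on first access rather than at boot, so an unreachable server only blocks the applications that touch that path.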
Thank you so much for agreeing to commit manpower to this problem. This bug goes back to 10.3 and this is the first commitment of resources, and for that I thank you. I don't think any NETWORK drive should be written to /etc/fstab. Autofs should sit and wait for NFS drives to become present and then not present. If a network drive becomes available, auto mount it; then, when it is not available, auto dismount it, and so forth. I think the NFS server needs to advertise relentlessly, and if NFS clients come online the drive is auto mounted.

The ability to run applications over an NFS drive is perfectly wonderful right now. I run Evolution and all locally held data is on an NFS drive. It makes backing up data from a single source very easy to manage. It's a small data server concept that can be hugely taken advantage of.

Most of Yast no longer works or has never worked. Take Security Centre and Hardening in Yast: under Miscellaneous, change the file permissions to 'secure' or 'paranoid' and the PC is dead and destroyed, and the change cannot be backed out of. Again, thank you for the commitment of resources in this one small area.
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c6
--- Comment #6 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c7
--- Comment #7 from Scott Couston
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c
FeiXiang Zhang
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c
Steffen Winterfeldt
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c8
--- Comment #8 from Scott Couston
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c
Scott Couston
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c9
Scott Couston
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c10
Scott Couston
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c11
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c12
--- Comment #12 from Scott Couston
Neil has already given you several solutions to the problem. We're not going to rewrite NFS in the kernel in a way that violates the spec because you don't want to use them.
Your reports get closed as WONTFIX because you don't accept the reasons we give for why a particular issue isn't a code problem and then expect us to develop based on your personal requirements.
For example, if you want YaST issues fixed that the YaST developers don't have time for or don't agree with, write the code yourself and submit it. openSUSE is the community project. Support is best-effort and development is driven by people working on the things they want to work on.
Thank you Jeff. Agreed, no contest from me personally. I understand the complexity of limited resources and funding. I am disappointed, from an openSUSE product point of view, that our product fails in this regard.
https://bugzilla.novell.com/show_bug.cgi?id=819515
https://bugzilla.novell.com/show_bug.cgi?id=819515#c13
Scott Couston
http://bugzilla.novell.com/show_bug.cgi?id=819515
Jiri Slaby