On Sunday, 2010-05-16 at 12:54 -0700, Marc Chamberlin wrote:
On 5/5/2010 1:54 PM, Carlos E. R. wrote:
Wowza! This took a hell of a lot of digging to figure out why my second network interface was coming up so slowly! I still cannot say exactly where the fault really is, but I can make a pretty good guess... It appears that, with openSUSE 11.2, a lot of work has gone into reworking the boot process to make it faster. But in doing so, there is now a staging problem that I believe is leading to a deadlock, which in turn is resolved via a timeout... Hence the slowness of my second interface to finally start up.
I have a central server on my network that handles chores such as DHCP, Named, and Bacula (a backup service), amongst others. The Bacula daemon requires an external multi-terabyte NAS storage drive to be mounted as a CIFS device. To accomplish that, I was using fstab to mount this external drive, which sits on the network served by my second interface card, eth1. My suspicion is that fstab is now being used to set up the mounts BEFORE the second interface eth1 is initialized. It was through a lot of instrumentation of scripts that I finally discovered that the fstab mount of my external storage drive was failing because it could not resolve the network name for the external storage device. (Even though I had it defined in my hosts file, which totally surprised me!!!) This causes the mount process to enter a long timeout period, and somehow that blocks the initialization of the second interface card, eth1. So we are deadlocked for a while, until the initial attempt to mount this drive times out. When the timeout occurs, eth1 is initialized properly; the rest of the daemons that also depend on eth1 being initialized, such as DHCP, Named, etc., can then run; and the rest of my system, except for Bacula and that particular mount point, comes up. (I could do a mount -a at this point manually, and also manually restart the Bacula daemons...)
Wow.
The workaround is to use the automount service instead, so that the storage drive Bacula needs is mounted only when it is first referenced. This holds off the need to resolve the network name for this device until the Bacula daemons actually refer to it, which luckily enough does not happen during the boot process...
I think you can use the "nofail" option in fstab, so that the mount process will not halt. Or perhaps "noauto", which makes it not even attempt to mount it. But then, the "nfs" client service script will not attempt to mount it either. My guess is that it might succeed with "nofail". I'm not sure. Or perhaps there is an option to tell the initial mount not to try to mount remote filesystems (it is done this early because a system may need to mount /usr remotely).

And you are right, the network service starts after "local_fs". However, there is a "remote_fs", and I'm not sure which script provides it. The idea would be to make sure that the script providing "remote_fs" has the full network as "Required-Start". Ah, it is defined in /etc/insserv.conf (man 8 insserv):

    #
    # All remote filesystems are mounted (note in some cases /usr may
    # be remote. Most applications that care will probably require
    # both $local_fs and $remote_fs)
    #
    $remote_fs $local_fs +nfs +smbfs

So, it is "nfs". And it is in the required start:

    # Provides: nfs
    # Required-Start: $network $portmap

Now, when service "network" succeeds, is your second interface up? Are you sure your eth1 is listed as "MANDATORY"? I think it is here:

    /etc/sysconfig/network/config:MANDATORY_DEVICES=""
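A rough sketch of how one could look for fstab entries of the kind discussed here, that is, network filesystems that carry neither "nofail" nor "noauto". It is only an illustration: the function name, the sample entries, the share names and the credentials path are all made up, and whether "nofail" really avoids the delay on 11.2 remains a guess.

```python
# Sketch only: flag network mounts in fstab text that have neither
# "nofail" nor "noauto". All names and sample entries are hypothetical.

NETWORK_FS = {"cifs", "smbfs", "nfs", "nfs4"}
SAFE_OPTS = {"nofail", "noauto"}


def risky_network_mounts(fstab_text):
    """Return (device, mountpoint, fstype) for each network mount
    whose option list contains neither nofail nor noauto."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable fstab entry
        device, mountpoint, fstype = fields[:3]
        # The options field may be absent; treat that as "defaults".
        options = fields[3] if len(fields) > 3 else "defaults"
        opts = set(options.split(","))
        if fstype in NETWORK_FS and not (opts & SAFE_OPTS):
            risky.append((device, mountpoint, fstype))
    return risky


# Made-up sample; on a real system one would read /etc/fstab instead.
SAMPLE = """\
# device        mountpoint    type  options                            dump pass
/dev/sda1       /             ext4  defaults                           1 1
//nas/backup    /srv/bacula   cifs  credentials=/etc/nas.cred          0 0
//nas/media     /srv/media    cifs  nofail,credentials=/etc/nas.cred   0 0
"""

if __name__ == "__main__":
    for dev, mnt, fs in risky_network_mounts(SAMPLE):
        print(dev, "on", mnt, "(" + fs + "): neither nofail nor noauto")
```

To check a real machine, pass the contents of /etc/fstab to risky_network_mounts() instead of SAMPLE. The script only reports candidates; it does not say whether a given entry actually stalls the boot.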
One comment I might add, which the openSUSE/Linux core teams should really think about... The role of a system architect requires people who have had a LOT of experience with designing and developing operating systems. And I do mean a LOT!!!! Avoiding deadlocks such as this requires an enormous amount of experience and insight when redesigning how an OS starts up or shuts down. From my observations of the recent releases of openSUSE, I would have to say that the quality of the system architecture is starting to show signs of design flaws that are caused by inexperience...
Not much to object to there :-) Maybe there is something else we do not know (we: you, me, etc) about how this is supposed to work.

--
Cheers,
       Carlos E. R.