Re: [opensuse] Slow startup of network interface
On 4/25/2010 3:09 PM, Hans Witvliet wrote:
On Sun, 2010-04-25 at 13:47 -0700, Marc Chamberlin wrote:
I have a system acting as a gateway for my SOHO network, running openSuSE 11.2 x86_64, which has two different NICs installed. eth0 is my external network interface and eth1 is my internal network interface. Both are configured to initialize during boot, and I use the traditional ifup/ifdown method of controlling my network. eth0 sets up just fine during boot and is fully functional by the time the KDE desktop is up and running. But eth1 takes several minutes (counted from when the KDE desktop is up and running) before it comes online.

This temporarily breaks several services such as Named, DHCP, and James (my email server), though when eth1 finally gets up and running these services do start up automatically as well. BUT this also causes problems on other computers on my internal network... What is worse, however, is that I mount several remote disk directories from fstab, and these mounts do NOT automatically establish themselves until I manually do a mount -a as superuser on this gateway/server. And that breaks other services such as vsftp and bacula.

This is a serious problem for me, as this system MUST be able to reboot and come up OK with all these services running without my attendance. (I run this system remotely and use a crude means of keeping the system working, i.e. I use a device called iBoot which will power cycle any computer which stops responding to pings from it... hence the reason it must be able to reboot without my attendance...)

So how do I debug and fix this slow startup of eth1? I would really prefer to fix this issue and not find/use workarounds.. Any help offered sure will be appreciated, I am stuck and have run out of ideas myself...
Marc...
Hi Marc,
You should have a look at the file: /etc/sysconfig/network/config
There are two fields you must change: MANDATORY_DEVICES="" and WAIT_FOR_INTERFACES="30"
By default, the system initializes but waits no more than 30 seconds for any device. For several systems I had to raise it to 90 seconds or even higher.... (ymmv) And as shown, the field for mandatory devices is empty.
In case your system acts as nfs/dhcp/ldap/what-ever-server, the startup of those services could happen before your network is up, with all kinds of unpredictable consequences.
So make both eth0 and eth1 mandatory, and increase the timeout.
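As a rough sketch, the two settings described above might end up looking like this. The 90-second value is the one mentioned in the message; the space-separated list format for MANDATORY_DEVICES is an assumption and should be checked against the comments in the file itself.

```shell
# /etc/sysconfig/network/config (excerpt, values are illustrative)

# Assumption: interfaces are given as a space-separated list.
MANDATORY_DEVICES="eth0 eth1"

# Seconds to wait for mandatory devices before the network script gives up.
WAIT_FOR_INTERFACES="90"
```

The change would presumably take effect on the next boot or network restart.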
hw
Thanks Hans for your reply and thoughts... I tried to do as you suggested, but no joy! The eth1 interface remains slow to initialize... So do you or anyone else have any suggestions I might try? Thanks again in advance... Marc..
On 2010-05-05 22:10, Marc Chamberlin wrote:
my attendance. (I run this system remotely and use a crude means of keeping the system working, i.e. I use a device called iBoot which will power cycle any computer which stops responding to pings from it... hence the reason it must be able to reboot without my attendance...)
Power-cycling a system that way could destroy the filesystem.
Simply set it up to wait as long as necessary (mandatory interface). Determining why it is slow is a different matter, and we have no data. It could be a bad cable, for example. Look at the error counters on the interface, for example with ifstat, ethtool...

--
Cheers / Saludos,
Carlos E. R. (from 11.2 x86_64 "Emerald" GM (Elessar))

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
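For the "look at errors in the interface" suggestion, a minimal sketch follows. It assumes a Linux system exposing per-interface counters under /sys/class/net. The helper name iface_errors and the SYSFS_NET override are made up for illustration; the override exists only so the function can be tried without the real hardware.

```shell
#!/bin/bash
# Hypothetical helper: print error and drop counters for one interface,
# read from sysfs. SYSFS_NET can be overridden for a dry run.
iface_errors() {
    local iface="$1"
    local dir="${SYSFS_NET:-/sys/class/net}/$iface/statistics"
    local name
    for name in rx_errors tx_errors rx_dropped tx_dropped; do
        # Skip counters the kernel does not expose for this device.
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}

# Manual checks on the real device (usually need root):
#   ethtool eth1        # negotiated speed, duplex, "Link detected"
#   ethtool -S eth1     # driver-level statistics, if the driver supports them
#   ip -s link show eth1
```

Counters that keep climbing between two calls of `iface_errors eth1` would point towards cabling, duplex, or driver trouble instead of a boot-ordering problem.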
On 5/5/2010 1:54 PM, Carlos E. R. wrote:
Power-cycling a system that way could destroy the filesystem.
Granted, but I have no other options... So it is a chance I have to take... (Which is also why I back up my systems every night!!!)
Wowza! This took a hell of a lot of digging to figure out why my second network interface was coming up so slowly! I still cannot say exactly where the fault really is, but I can make a pretty good guess... It appears that, with openSuSE 11.2, a lot of work has gone into reworking the boot process to make it faster. But in doing so, there is now a staging problem that I believe is leading to a deadlock, which in turn is resolved via a timeout... Hence the slowness of my second interface to finally start up..

I have a central server on my network that is handling chores such as DHCP, Named, and Bacula (a backup service) amongst others. The Bacula daemon requires an external NAS multi-terabyte storage drive to be mounted as a CIFS device. To accomplish that, I was using fstab to mount this external drive, which is connected to the network served by my second interface card, eth1. My suspicion is that fstab is now being used to set up the mounts BEFORE the second interface eth1 is initialized.

It was through a lot of instrumentation of scripts that I finally discovered that fstab was failing to mount my external storage drive because it could not resolve the network name for the external storage device. (Even though I had it defined in my hosts file, which totally surprised me!!!) This causes the mount process to enter a long timeout period, and somehow that is blocking the initialization of the second interface card, eth1. So we are deadlocked for a while until the initial attempt to mount this drive times out.. When the timeout occurs, eth1 is initialized properly, the rest of the daemons which also depend on eth1 being initialized, such as DHCP, Named etc., can also run, and the rest of my system, except for Bacula and that particular mount point, comes up.... (I could do a mount -a at this point manually, and also manually restart the Bacula daemons...)

The workaround is to use the automount service instead to mount the storage drive that Bacula needs, when it is referenced. This holds off the need to resolve the network name for this device until the Bacula daemons actually refer to it, which luckily enough is not done/needed during the boot process...

One comment I might add, which the openSuSE/Linux core teams should really think about... The role of a system architect requires people who have had a LOT of experience with designing and developing operating systems. And I do mean a LOT!!!! Avoiding deadlocks such as this requires an enormous amount of experience and insight when redesigning how an OS starts up or shuts down. From my observations of the recent releases of openSuSE, I would have to say that the quality of the system architecture is starting to show signs of design flaws that are caused by inexperience...

Marc..
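The automount workaround described above could look roughly like the following. The actual maps used are not shown in the thread, so every name here is a placeholder: the host "nas", the share "backup", the /mnt/auto mount root, the /etc/auto.nas map file, and the credentials path. The helper only writes the two map files into a staging directory so they can be reviewed before anything under /etc is touched.

```shell
#!/bin/bash
# Hypothetical helper: write example autofs map files into a staging directory.
# Host, share, paths, and timeout are placeholders, not values from the thread.
write_autofs_sketch() {
    local stage="$1"
    mkdir -p "$stage/etc" || return 1

    # Master map: entries under /mnt/auto are described by /etc/auto.nas
    # and are unmounted again after 300 idle seconds.
    cat > "$stage/etc/auto.master" <<'EOF'
/mnt/auto  /etc/auto.nas  --timeout=300
EOF

    # Indirect map: the first access to /mnt/auto/backup triggers the CIFS
    # mount, so the NAS host name is only resolved at that moment.
    cat > "$stage/etc/auto.nas" <<'EOF'
backup  -fstype=cifs,credentials=/etc/nas.credentials  ://nas/backup
EOF
}
```

Calling `write_autofs_sketch /tmp/autofs-review` leaves the files under /tmp/autofs-review/etc for inspection. With maps like these in place and the autofs service enabled, Bacula would be pointed at /mnt/auto/backup, and the matching fstab entry would be removed.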
On Sunday, 2010-05-16 at 12:54 -0700, Marc Chamberlin wrote:
Wow.
The workaround is to use the automount service instead, to mount the storage drive that Bacula needs, when referenced. This holds off the need to resolve the network name for this device until after the Bacula daemons actually refer to it, which luckily enough is not done/needed during the boot up process...
I think you can use the "nofail" option in fstab, so that the mount process will not halt. Or perhaps "noauto", which makes it not even attempt to mount it. But then the "nfs" client service script will not attempt to mount it either. My guess is that it might succeed with "nofail". I'm not sure. Or perhaps there is an option to tell the initial mount not to try to mount remote filesystems (it is done this early because a system may need to mount /usr remotely).

And you are right, the network service starts after "local_fs". However, there is a "remote_fs", and I'm not sure which script provides it. The idea would be to make sure that the script providing "remote_fs" has the full network as "Required-Start".

Ah, it is defined in /etc/insserv.conf (man 8 insserv):

    #
    # All remote filesystems are mounted (note in some cases /usr may
    # be remote. Most applications that care will probably require
    # both $local_fs and $remote_fs)
    #
    $remote_fs      $local_fs +nfs +smbfs

So, it is "nfs". And it is in the required start:

    # Provides:       nfs
    # Required-Start: $network $portmap

Now, when service "network" succeeds, is your second interface up?

Are you sure your eth1 is listed as "MANDATORY"? I think it is here:

    /etc/sysconfig/network/config:MANDATORY_DEVICES=""
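To check what a given init script declares without opening it in an editor, a small helper along these lines could be used. The function name lsb_field is made up, and the sketch only assumes that the script carries LSB-style comment headers like the ones quoted above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one LSB header field
# (e.g. Provides, Required-Start) from an init script.
lsb_field() {
    local field="$1" script="$2"
    # Match "# Field:   value" and print only the value.
    sed -n "s/^#[[:space:]]*${field}:[[:space:]]*//p" "$script"
}

# Example on a real system:
#   lsb_field Required-Start /etc/init.d/nfs
#   lsb_field Required-Start /etc/init.d/smbfs
```

Running it over the scripts named in the $remote_fs line shows at a glance whether each of them really lists $network as a required start.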
One comment I might add, which the openSuSE/Linux core teams should really think about... The role of a system architect requires people who have had a LOT of experience with designing and developing operating systems. And I do mean a LOT!!!! Avoiding deadlocks such as this requires an enormous amount of experience and insight when redesigning how an OS starts up or shuts down. From my observations of the recent releases of openSuSE, I would have to say that the quality of the system architecture is starting to show signs of design flaws that are caused by inexperience...
Not much to object about that :-) Maybe there is something else we do not know (we: you, me, etc) about how this is supposed to work.

--
Cheers,
Carlos E. R.
On 5/16/2010 1:36 PM, Carlos E. R. wrote:
Carlos - Thanks for your response... I took a bit of time to try and chase down an answer to your questions, so I temporarily reconfigured my system back to using fstab to mount the network drive...
Neither the "nofail" nor the "noauto" options corrected the delay of getting the second network interface started.
Now, when service "network" succeeds, is your second interface up?
I tried to instrument the network startup script in init.d and discovered a hitch... The network is started before the syslog daemon, therefore I cannot get it to record what is happening when the script is run during boot... (I was using the -xv option for bash, embedded echo statements, and an ifconfig at the end of the "start" section.) And syslog is defined as being dependent on network... I am afraid I do not know enough about Linux to figure out how to instrument the startup of the network daemon during the early phases of boot, so I will need some help to accomplish that in order to answer your question...
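One way around the missing syslog might be to send the trace to a plain file instead. This is only a sketch: it assumes the filesystem holding the log file is already mounted read-write when the network script runs, which has not been verified in this thread. The helper name start_trace and the log path in the usage comment are made up.

```shell
#!/bin/bash
# Hypothetical helper: from this point on, append stderr (and therefore the
# bash -x trace) to a file, so no syslog daemon is needed.
start_trace() {
    local logfile="$1"
    exec 2>>"$logfile"                                   # redirect stderr for the rest of the script
    echo "=== trace started: $(date) (pid $$) ===" >&2   # marker line per boot
    set -x                                               # print each command before running it
}

# Intended use near the top of the script being debugged:
#   start_trace /var/log/network-boot-trace.log
#   ...
#   ifconfig >&2     # snapshot of interface state at the end of "start"
```

Because the redirection stays in effect for the whole script, error messages from ifup and friends land in the same file, in order, next to the trace lines.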
Are you sure your eth1 is listed as "MANDATORY"? I think it is here:
/etc/sysconfig/network/config:MANDATORY_DEVICES=""
Yes, both eth0 and eth1 are specified as mandatory.... Marc...
On 2010-05-18 01:06, Marc Chamberlin wrote:
Neither the "nofail" nor the "noauto" options corrected the delay of getting the second network interface started.
That's not good...
Mmmm... there are some services that are split in two, one of them named "early" or something similar. I guess syslog starts after network because the logs can be sent over the network to another machine.

Indeed, there is a service named "earlysyslog" and another named "syslog". The first does not depend on anything, the second depends on the network. And looking at the code of the "earlysyslog" script, I see that the author is not very sure that it will work! Look:

    case "$SYSLOG_DAEMON" in
        syslog-ng)
            while read line ; do
                case "$line" in
                    \#*|"") continue ;;
                    *udp\ *|*udp\(*) exit 0 ;;
                    *tcp\ *|*tcp\(*) exit 0 ;;
                esac
            done < ${config}
            ;;
        *)
            # in hope this works with the rsyslog.early.conf file
            # (hard to implement for rsyslog with its includes/if
            # statements)...
            while read select action ; do
                case "$select" in
                    \#*|"") continue ;;
                esac
                case "$action" in
                    *@*) exit 0 ;;
                esac
            done < ${config}
            ;;
    esac

and rsyslog is the default in oS 11.2, so... that's your problem precisely, network init problems will not be logged.
Yes, both eth0 and eth1 are specified as mandatory....
Well... dunno. You have a solution that works, so you may leave it at that. You could report the problem in Bugzilla, so that the devs try to solve it for good sometime.

But I can tell you what I would do, everybody has his own preferred bag of tricks O:-)

I would define the lines in fstab for those external mounts as "noauto,nofail". This means that they will not be automatically mounted, nor will they abort the boot sequence attempting to fsck a non-existent device. I hope. Then I would add a new service that would run "mount /externalwhatever", adjusted to run late in the start sequence (the "skeleton" script is just that). Easier, "/etc/init.d/after.local" should work:

    #!/bin/bash
    mount /externalwhatever

And, if that hangs or delays, you could try:

    mount /externalwhatever &

so that it runs in the background.

--
Cheers / Saludos,
Carlos E. R. (from 11.2 x86_64 "Emerald" GM (Minas Tirith))
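If the late mount still fails now and then because eth1 or name resolution is not quite ready, the after.local idea could be extended with a small retry loop. This is a sketch only. The retry helper is made up, and the attempt count and delay in the usage comment are arbitrary; only the mount point name comes from the message above.

```shell
#!/bin/bash
# Hypothetical helper: retry TRIES DELAY CMD...
# Runs CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
    local tries="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        # No point in sleeping after the final attempt.
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

# Possible use in /etc/init.d/after.local, in the background so that
# the rest of the boot is never held up:
#   retry 10 15 mount /externalwhatever &
```

With the fstab entry marked "noauto,nofail" as suggested, this would be the only place where the mount is attempted during boot.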