http://bugzilla.novell.com/show_bug.cgi?id=518219
Summary: sm-notify runs before dhcpcd sets hostname
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: i586
OS/Version: openSUSE 11.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Network
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: zuziak@math.ku.dk
QAContact: qa@suse.de
Found By: ---
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.11) Gecko/2009060200 SUSE/3.0.11-0.1.1 Firefox/3.0.11
Sometimes sm-notify is run before the dhcp client has set the correct hostname. This means that the notify messages sent to NFS servers contain the wrong hostname, so any locks held by the client are not released.
This seems to happen on about every other boot when parallel execution of init scripts is disabled.
Reproducible: Sometimes
User meissner@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c1
Marcus Meissner meissner@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |meissner@novell.com,
                   |                            |mt@novell.com
         AssignedTo|bnc-team-screening@forge.pr |nfbrown@novell.com
                   |ovo.novell.com              |
--- Comment #1 from Marcus Meissner meissner@novell.com 2009-07-02 01:04:13 MDT ---
hmm.
dhcp / nfs-client?
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c2
--- Comment #2 from Martin Zuziak zuziak@math.ku.dk 2009-07-02 01:47:14 MDT ---
dhcpcd-3.2.3-44.1 and nfs-client-1.1.3-18.2.1.
User nfbrown@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c3
Neil Brown nfbrown@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |mt@novell.com
--- Comment #3 from Neil Brown nfbrown@novell.com 2009-07-09 22:16:29 MDT ---
/etc/init.d/nfs declares:
# Required-Start: $network $portmap
so it shouldn't be started before the network. Maybe that doesn't really work for dhcp? Maybe that is a problem with dhcp... I presume the dhcp client is being run from init scripts, not from NetworkManager after login?
Marius: what would you expect to happen here? Do the init scripts wait for the dhcp process to finish before 'nfs' gets run? Or do I need to hook nfs into some script that the dhcp client runs when the network is finally up?
User mt@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c4
Marius Tomaschewski mt@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|mt@novell.com               |
--- Comment #4 from Marius Tomaschewski mt@novell.com 2009-07-10 01:56:28 MDT ---
Several scenarios are possible.
There are two network scripts:
- /etc/init.d/network: this is the script referenced via $network and is started before nfs ($remote_fs scripts). It activates only those "ifup" interfaces that don't require an already mounted $remote_fs: basically normal ethernets, bonds, bridges and vlans. It does not start WLAN or PPP (ISDN, DSL or whatever), nor NetworkManager when that is enabled.
- /etc/init.d/network-remotefs: this is a second network script that starts either NetworkManager or all the "ifup" interfaces that require $remote_fs (WLAN, PPP, ...).
Besides /etc/init.d/nfs, the /etc/init.d/nfsserver script also calls sm-notify. It has no dependency on $remote_fs:
# Required-Start: $network $named $portmap
even though all its binaries are in /usr.
IMO that is a bug, and it is probably the script that calls sm-notify before the hostname is set.
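To check which init scripts declare a dependency on a given facility, something like the following could be used. It is only a sketch: the helper names required_start and depends_on are made up for illustration, and the demo runs against a temporary file that carries the header quoted above rather than the real /etc/init.d/nfsserver.

```shell
#!/bin/sh
# Print the value of the first "# Required-Start:" line of an init script.
required_start() {
    sed -n 's/^#[[:space:]]*Required-Start:[[:space:]]*//p' "$1" | head -n 1
}

# Return 0 if the script ($1) lists the facility ($2, e.g. '$remote_fs')
# in its Required-Start line, 1 otherwise.
depends_on() {
    for dep in $(required_start "$1"); do
        [ "$dep" = "$2" ] && return 0
    done
    return 1
}

# Demo on a temporary file containing the header quoted in this comment.
tmp=$(mktemp)
printf '%s\n' '### BEGIN INIT INFO' \
    '# Required-Start: $network $named $portmap' \
    '### END INIT INFO' > "$tmp"
if depends_on "$tmp" '$remote_fs'; then
    echo "has \$remote_fs"
else
    echo "missing \$remote_fs"   # this branch is taken for the header above
fi
rm -f "$tmp"
```

On a real system the same call would be depends_on /etc/init.d/nfsserver '$remote_fs'.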
The question is also where the interface that sets the hostname is started. /etc/init.d/network waits for the mandatory interfaces and also until dhcp finishes. This should work.
But when there are two interfaces, one started in network (and used for the nfs mounts) and a second one started in network-remotefs that sets the hostname, it is a configuration problem, not a bug.
Another possibility is some if-up.d/ifservices or NetworkManager hooks, e.g.:
/etc/NetworkManager/dispatcher.d/nfs
This script does not care about the interface at all and supports only a single interface, restarting the nfs client and unmounting everything...
But I don't think that this is the problem, since this script is called very late (network-remotefs => NetworkManager => nfs-hook).
User mt@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c5
Marius Tomaschewski mt@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |zuziak@math.ku.dk
--- Comment #5 from Marius Tomaschewski mt@novell.com 2009-07-10 01:58:39 MDT ---
Martin, please provide more information about your (network) setup and attach the /var/log/boot.msg file.
User mt@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c6
--- Comment #6 from Marius Tomaschewski mt@novell.com 2009-07-10 02:17:31 MDT ---
Another possibility is that the timeouts:
/etc/sysconfig/network/config
WAIT_FOR_INTERFACES="30"
/etc/sysconfig/network/dhcp
DHCLIENT_WAIT_AT_BOOT="15"
are not sufficient for your setup. When the dhcp client takes too long to set up the interface, /etc/init.d/network will continue and report "in background".
A common reason is a bridge with STP enabled. Please verify that all your bridges have these two settings in their ifcfg-* files:
BRIDGE_STP='off'
BRIDGE_FORWARDDELAY='0'
A bridge with STP enabled may need up to 50 secs before it enables the interface / starts to forward packets, see also:
http://tldp.org/HOWTO/BRIDGE-STP-HOWTO/advanced-bridge.html#STP
You can tune STP, e.g. using
BRIDGE_FORWARDDELAY='4'
BRIDGE_HELLOTIME='1'
BRIDGE_MAXAGE='4'
and increasing the timeouts to:
WAIT_FOR_INTERFACES="60"
DHCLIENT_WAIT_AT_BOOT="30"
When the ISC dhclient is in use (not dhcpcd), it may need +15 sec:
# Note: RFC 2131 specifies, that the dhcp client should wait a random time
# between one and ten seconds to desynchronize the use of DHCP at startup.
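The bridge check requested above could be scripted roughly as follows. This is a sketch under assumptions: the function name check_bridge_cfg is invented here, and it assumes that bridge configs are marked with BRIDGE='yes' in their ifcfg-* file. It lists every bridge config that does not set both BRIDGE_STP='off' and BRIDGE_FORWARDDELAY='0' explicitly.

```shell
#!/bin/sh
# List ifcfg-* files in directory $1 that define a bridge (assumed marker:
# BRIDGE='yes') but lack BRIDGE_STP='off' or BRIDGE_FORWARDDELAY='0'.
check_bridge_cfg() {
    for f in "$1"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Skip configs that are not bridges.
        grep -Eq "^BRIDGE=['\"]?yes['\"]?[[:space:]]*$" "$f" || continue
        # Report the file if either of the two settings is missing.
        if ! grep -Eq "^BRIDGE_STP=['\"]?off['\"]?[[:space:]]*$" "$f" ||
           ! grep -Eq "^BRIDGE_FORWARDDELAY=['\"]?0['\"]?[[:space:]]*$" "$f"; then
            echo "$f"
        fi
    done
}

# Intended use (not executed here):
#   check_bridge_cfg /etc/sysconfig/network
```

An empty result means every bridge config found carries both settings; any file that is printed is a candidate for the STP delay described above.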
User nfbrown@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c7
--- Comment #7 from Neil Brown nfbrown@novell.com 2009-07-10 02:29:34 MDT ---
I just discovered that there is a bug which causes sm-notify not to work at all - bug #520311. Maybe that is the problem and it is something to do with dhcp?
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c8
--- Comment #8 from Martin Zuziak zuziak@math.ku.dk 2009-07-10 06:13:57 MDT ---
Created an attachment (id=304394)
--> (http://bugzilla.novell.com/attachment.cgi?id=304394)
boot.msg
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c9
--- Comment #9 from Martin Zuziak zuziak@math.ku.dk 2009-07-10 06:14:47 MDT ---
Created an attachment (id=304395)
--> (http://bugzilla.novell.com/attachment.cgi?id=304395)
/etc/sysconfig/network/config
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c10
--- Comment #10 from Martin Zuziak zuziak@math.ku.dk 2009-07-10 06:15:13 MDT ---
Created an attachment (id=304396)
--> (http://bugzilla.novell.com/attachment.cgi?id=304396)
/etc/sysconfig/network/dhcp
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c11
--- Comment #11 from Martin Zuziak zuziak@math.ku.dk 2009-07-10 06:15:37 MDT ---
Created an attachment (id=304397)
--> (http://bugzilla.novell.com/attachment.cgi?id=304397)
/etc/sysconfig/network/ifcfg-eth0
User zuziak@math.ku.dk added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c12
Martin Zuziak zuziak@math.ku.dk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|zuziak@math.ku.dk           |
--- Comment #12 from Martin Zuziak zuziak@math.ku.dk 2009-07-10 06:18:31 MDT ---
This problem is far less prevalent than I thought. I've been keeping an eye on my openSuSE 11.1 machines (about 15) and they don't show this problem. So I've only seen it on my test machine, which is a virtual machine running on vmware-server (on another openSuSE 11.1). I assumed that this problem was the cause of some stale NFS locks on 11.1 clients, but that seems to be caused by bug #520311 instead.
Running on vmware-server makes it slow, which might be what triggers this. I'm sorry, I should have mentioned that, but it would be nice if it worked on virtual machines too.
eth0 isn't set up as mandatory in /etc/sysconfig/network/config. I've tried doing that but it didn't help. I guess it should be mandatory automatically as a physical interface with STARTMODE='auto'.
Anyway, there's only a single ethernet adapter, it has no exotic configuration. I've attached boot.msg and config, dhcp and ifcfg-eth0 from /etc/sysconfig/network.
I've inserted a loop in the nfs init script just before sm-notify runs. It logs the current hostname and sleeps for 1 second until the hostname is correct. From the log:
Jul 10 13:25:13 linux ifup: eth0 device: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
Jul 10 13:25:15 linux logger: nfs: hostname is linux
Jul 10 13:25:15 linux SuSEfirewall2: SuSEfirewall2 not active
Jul 10 13:25:15 linux dhcpcd[2501]: eth0: setting hostname to `pc000c292d9978.math.ku.dk'
Jul 10 13:25:15 linux dhcpcd[2501]: eth0: forking to background
Jul 10 13:25:15 linux dhcpcd[3012]: eth0: waiting for 1800 seconds
Jul 10 13:25:15 linux dhcpcd[2501]: eth0: exiting
Jul 10 13:25:16 linux logger: nfs: hostname is pc000c292d9978.math.ku.dk
Jul 10 13:25:17 linux sm-notify[3018]: sm-notify running as root. chown /var/lib/nfs/sm to choose different user
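The loop itself is not attached to this bug, so the following is only a guess at what such a debugging loop might look like. All names (wait_for_hostname, LOG_CMD) are hypothetical. It treats "correct" as "no longer the placeholder name", and it adds an upper bound on the number of tries so that boot cannot hang forever, which the loop described above may not have had.

```shell
#!/bin/sh
# Command used for logging; logger(1) by default, overridable for testing.
LOG_CMD=${LOG_CMD:-logger}

# Poll once per second until the hostname is no longer the placeholder.
# $1 = placeholder hostname (e.g. "linux")
# $2 = maximum number of tries (default 30; an addition for safety)
# $3 = command that prints the current hostname (default: hostname)
# Returns 0 once the hostname has changed, 1 if the tries ran out.
wait_for_hostname() {
    placeholder=$1
    tries=${2:-30}
    get_name=${3:-hostname}
    while [ "$tries" -gt 0 ]; do
        name=$($get_name)
        $LOG_CMD "nfs: hostname is $name"
        [ "$name" != "$placeholder" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Intended use just before sm-notify is started (not executed here):
#   wait_for_hostname linux 30
```

Such a loop is a diagnostic aid or a workaround only; it does not address why the network script returns before the dhcp client has set the hostname.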
Even when I set eth0 as mandatory it doesn't show in the following output. Is that a problem?
# /etc/init.d/rc5.d/S*network start -o debug fake
CONFIG =
INTERFACE =
AVAILABLE_IFACES = eth0
PHYSICAL_IFACES = eth0
BONDING_IFACES =
VLAN_IFACES =
DIALUP_IFACES =
TUNNEL_IFACES =
BRIDGE_IFACES =
SLAVE_IFACES =
MANDATORY_DEVICES =
VIRTUAL_IFACES =
SKIP =
start order : eth0 ; ;
Setting up network interfaces:
ifup eth0 -o rc eth0 returned 0 done
.. still waiting for hotplug devices:
SUCCESS_IFACES= eth0
MANDATORY_DEVICES=
.. final
SUCCESS_IFACES= eth0
MANDATORY_DEVICES=
FAILED=0
ifup-route noiface -o rc
Setting up service network . . . . . . . .
User nfbrown@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c13
Neil Brown nfbrown@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfbrown@novell.com
         AssignedTo|nfbrown@novell.com          |bnc-team-screening@forge.pr
                   |                            |ovo.novell.com
--- Comment #13 from Neil Brown nfbrown@novell.com 2009-10-30 01:07:14 MDT ---
(sorry for the delays...)
I think this is more of a network/interface issue than an sm-notify issue. I'm not even sure what is expected in the output at the end of the last comment.
So could this please be assigned to someone with an understanding of network configuration, dhcp, etc.?
zhu rensheng rszhu@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rszhu@novell.com
         AssignedTo|bnc-team-screening@forge.pr |tambet@novell.com
                   |ovo.novell.com              |
User tambet@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c14
Tambet Ingo tambet@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|tambet@novell.com           |bnc-team-screening@forge.pr
                   |                            |ovo.novell.com
--- Comment #14 from Tambet Ingo tambet@novell.com 2009-10-30 02:32:17 MDT ---
This has nothing to do with NetworkManager, restoring the previous assignee.
User meissner@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c15
Marcus Meissner meissner@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.pr |mt@novell.com
                   |ovo.novell.com              |
--- Comment #15 from Marcus Meissner meissner@novell.com 2009-10-30 02:37:47 MDT ---
-> marius perhaps.
User mt@novell.com added comment http://bugzilla.novell.com/show_bug.cgi?id=518219#c16
Marius Tomaschewski mt@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
--- Comment #16 from Marius Tomaschewski mt@novell.com 2009-11-13 05:53:27 MST ---
(In reply to comment #12)
> eth0 isn't set up as mandatory in /etc/sysconfig/network/config. I've tried doing that but it didn't help. I guess it should be mandatory automatically as a physical interface with STARTMODE='auto'.
Yes. When MANDATORY_DEVICES in network/config is not set, physical interfaces are added to the set.
The documentation of the MANDATORY_DEVICES variable does not mention the "boot" option (set automatically when started via init). Without it, there are no mandatory devices, so the waiting at boot isn't executed. Further, the network script runs twice, with the localfs and remotefs filters.
Please use instead:
rcnetwork start -o boot debug fake localfs
It should look like this (OK, this is a more complex setup):
# rcnetwork start -o boot debug fake localfs
* Modifications by localfs filter:
PHYSICAL_IFACES => eth0 eth1 eth2 + = eth0 eth1 eth2
NOT_PHYSICAL_IFACES => br0 br1 + = br0 br1
MANDATORY_DEVICES => eth0 eth1 + = eth0 eth1
MANDATORY_SLAVES => + =
VIRTUAL_IFACES => br0 br1 + = br0 br1
CONFIG =
INTERFACE =
AVAILABLE_IFACES = br0 br1 eth0 eth1 eth2
PHYSICAL_IFACES = eth0 eth1 eth2
BONDING_IFACES =
VLAN_IFACES =
DIALUP_IFACES =
TUNNEL_IFACES =
BRIDGE_IFACES = br0 br1
SLAVE_IFACES = eth0 eth1
MANDATORY_DEVICES = eth0 eth1 __NSC__
VIRTUAL_IFACES = br0 br1
SKIP =
start order : eth0 eth1 eth2 ; eth0 eth1 __NSC__ ; br0 br1
Setting up (localfs) network interfaces:
ifup lo -o rc onboot lo returned 0 done
ifup eth0 -o rc onboot eth0 returned 0 done
ifup eth1 -o rc onboot eth1 returned 0 done
ifup eth2 -o rc onboot eth2 returned 0 done
.. still waiting for hotplug devices:
SUCCESS_IFACES= lo eth0 eth1 eth2
MANDATORY_DEVICES=eth0 eth1 __NSC__
.. final
SUCCESS_IFACES= lo eth0 eth1 eth2
MANDATORY_DEVICES=
FAILED=0
ifup br0 -o rc onboot br0 returned 0 done
ifup br1 -o rc onboot br1 returned 0 done
ifup-route noiface -o rc onboot
Setting up service (localfs) network . . . . . . . . . . done
> Anyway, there's only a single ethernet adapter, it has no exotic configuration. I've attached boot.msg and config, dhcp and ifcfg-eth0 from /etc/sysconfig/network.
Yes, I don't see anything special in your configs.
I guess I know where the problem is: ifup-dhcp does not know exactly when the dhcp client sets the hostname, so it may return OK to the network script too early. I have to take a closer look at how to fix this (synchronize them)...
http://bugzilla.novell.com/show_bug.cgi?id=518219#c17
--- Comment #17 from Marius Tomaschewski mt@novell.com 2009-11-18 19:35:02 UTC ---
I'm able to reproduce the problem. It is as described in the comment above: the hostname is set too late in dhcpcd, and ifup-dhcp does not wait until it is set (I guess due to the changes made to fix 427681)...
We are working on a fix.
http://bugzilla.novell.com/show_bug.cgi?id=518219#c20
Swamp Workflow Management swamp@suse.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|                            |maint:released:11.2:29703
--- Comment #20 from Swamp Workflow Management swamp@suse.com 2010-01-04 12:55:02 UTC ---
Update released for: sysconfig, sysconfig-debuginfo, sysconfig-debugsource
Products: openSUSE 11.2 (debug, i586, x86_64)
http://bugzilla.novell.com/show_bug.cgi?id=518219#c21
Swamp Workflow Management swamp@suse.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|maint:released:11.2:29703   |maint:released:11.2:29702
--- Comment #21 from Swamp Workflow Management swamp@suse.com 2010-01-04 12:55:33 UTC ---
Update released for: dhcpcd, dhcpcd-debuginfo, dhcpcd-debugsource
Products: openSUSE 11.2 (debug, i586, x86_64)
http://bugzilla.novell.com/show_bug.cgi?id=518219#c22
Marius Tomaschewski mt@novell.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
--- Comment #22 from Marius Tomaschewski mt@novell.com 2010-01-22 11:34:21 UTC ---
Submitted package and patchinfo to 11.1 and sle11 as well.
http://bugzilla.novell.com/show_bug.cgi?id=518219#c23
Swamp Workflow Management swamp@suse.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|maint:released:11.2:29702   |maint:released:11.2:29702
                   |                            |maint:released:11.1:30429
--- Comment #23 from Swamp Workflow Management swamp@suse.com 2010-02-01 11:02:00 UTC ---
Update released for: dhcpcd, dhcpcd-debuginfo, dhcpcd-debugsource, sysconfig, sysconfig-debuginfo, sysconfig-debugsource
Products: openSUSE 11.1 (debug, i586, ppc, x86_64)
http://bugzilla.novell.com/show_bug.cgi?id=518219#c24
--- Comment #24 from Bernhard Wiedemann bwiedemann@suse.com ---
This is an autogenerated message for OBS integration:
This bug (518219) was mentioned in
https://build.opensuse.org/request/show/26967 11.2:Test / sysconfig
https://build.opensuse.org/request/show/30275 11.1:Test / dhcpcd
https://build.opensuse.org/request/show/34696 Factory / dhcp