[Bug 427313] New: dbus segfaults at boot when LDAP client is enabled
https://bugzilla.novell.com/show_bug.cgi?id=427313

Summary: dbus segfaults at boot when LDAP client is enabled
Product: openSUSE 11.0
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.0
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Booting
AssignedTo: jsrain@novell.com
ReportedBy: syseng@adnovum.ch
QAContact: jsrain@novell.com
Found By: ---

The system is configured as an LDAP client for authenticating users and looking up automount entries. When LDAP is enabled in /etc/nsswitch.conf, dbus-daemon segfaults at boot after the kernel has loaded and the system is entering runlevel 5. Error message:

Starting D-Bus daemon/etc/init.d/dbus: line 45: 1401 segmentation fault $DBUS_DAEMON_BIN $DBUS_DAEMON_PARAMETER

If a 32-bit OS is installed on EM64T hardware, the boot process also freezes when entering runlevel 5 while the acpid daemon and the LDAP client are enabled. Also, /usr/sbin/automount produces a core file in /. There is no core file from dbus-daemon. This behaviour can be reproduced by enabling or disabling LDAP in /etc/nsswitch.conf.
/etc/nsswitch.conf:

# dbus-daemon starts at boot
passwd: files
group: files

# dbus-daemon segfaults at boot
passwd: files ldap
group: files ldap

# dbus-daemon segfaults at boot
passwd: compat
group: compat
passwd_compat: ldap
group_compat: ldap

Hardware: Dell Precision 670, VMware Workstation 6.0.5
Operating System: openSUSE 11.0 (32-bit and 64-bit editions + latest patches)

adnws005:~ # uname -a
Linux adnws005 2.6.25.5-1.1-pae #1 SMP 2008-06-07 01:55:22 +0200 i686 i686 i386 GNU/Linux

adnws005:/ # rpm -qa |grep dbus
dbus-1-qt3-0.62-179.1
dbus-1-qt3-devel-0.62-179.1
dbus-1-x11-1.2.1-18.1
ndesk-dbus-0.6.0-28.1
dbus-1-glib-devel-0.74-88.1
dbus-1-1.2.1-15.1
dbus-1-python-0.82.4-49.1
dbus-1-glib-0.74-88.1
libdbus-1-qt3-0-0.8.1-24.1
ndesk-dbus-glib-0.4.1-0.1
dbus-1-devel-1.2.1-15.1

adnws005:~ # cat /etc/openldap/ldap.conf
#
# /etc/ldap.conf for SUSE Linux and LDAPS
#
uri ldaps://ldap-net12.example.com:636 ldaps://ldap-net5.example.com:636 ldaps://ldap-net1.example.com:636
base ou=zh,dc=example,dc=com
scope one
ldap_version 3
#
# SSL/TLS Settings (cert checking does not work)
#
ssl on
# sslpath /etc/ssl/certs/cert7.db
# tls_cacertfile /etc/ssl/certs/adnovum-ca.pem
tls_reqcert never
tls_checkpeer no
tls_crlcheck none
#
# Bind User
#
binddn cn=proxyagent,ou=special_users,dc=example,dc=com
bindpw *********
#
# Misc. Settings
#
debug 0
timelimit 30
bind_timelimit 30
idle_timelimit 60
#
# Change NSS search base due to localized automount tables
#
nss_base_passwd ou=people,dc=example,dc=com
nss_base_shadow ou=people,dc=example,dc=com
nss_base_group ou=group,dc=example,dc=com
nss_base_hosts ou=hosts,dc=example,dc=com
nss_base_services ou=services,dc=example,dc=com
nss_base_networks ou=networks,dc=example,dc=com
nss_base_protocols ou=protocols,dc=example,dc=com
nss_base_rpc ou=rpc,dc=example,dc=com
nss_base_ethers ou=ethers,dc=example,dc=com
nss_base_netmasks ou=networks,dc=example,dc=com
nss_base_netgroup ou=netgroup,dc=example,dc=com

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c1
--- Comment #1 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c2
--- Comment #2 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
Jiri Srain
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c3
--- Comment #3 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c4
--- Comment #4 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c5
--- Comment #5 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c6
Ralf Haferkamp
You should add a line "bind_policy soft" to your /etc/ldap.conf file. That should at least keep the system from hanging during boot.
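For illustration, a minimal sketch of the suggested change, assuming the nss_ldap configuration file /etc/ldap.conf. As documented for nss_ldap, the "hard" reconnect policies block with exponential backoff before retrying an unreachable server, while "soft" makes nss_ldap return immediately on server failure, so lookups made early in boot fail fast instead of stalling:

```text
# /etc/ldap.conf (nss_ldap)
# Return immediately on LDAP server failure instead of
# blocking with exponential backoff (the "hard" policies)
bind_policy soft
```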
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c7
--- Comment #7 from Bernd Nies
/etc/ldap.conf has been replaced by a symlink to /etc/openldap/ldap.conf, because I don't understand why there should be two different ldap.conf files, one used by the automounter and the other by NSS.
/etc/ldap.conf and /etc/openldap/ldap.conf are for completely different
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c8
--- Comment #8 from Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c9
--- Comment #9 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c10
--- Comment #10 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c11
Timo Hoenig
https://bugzilla.novell.com/show_bug.cgi?id=427313
User R.Vickers@cs.rhul.ac.uk added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c12
Bob Vickers
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c13
--- Comment #13 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c14
Timo Hoenig
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c15
--- Comment #15 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c16
--- Comment #16 from Timo Hoenig
> Just a wild guess: I assume the screwed parts are nscd and/or lib_nss and dbus-daemon is a side effect.

Doesn't sound too wild ;-)

> (1) Dbus doesn't need to look up LDAP at boot because all system users/groups are in /etc/passwd and /etc/group and the lookup order in /etc/nsswitch.conf is "files ldap".

OK, are we sure that this is working? I mean, if D-Bus hangs somewhere looking up some uid/gid, that sounds fishy. It does hang, right?

> (2) nscd crashes very frequently (within less than an hour) on openSUSE 11.0. On SUSE Linux 10.1 with the same setup it ran fine for months at least.

OK, but if nscd crashes, does that affect D-Bus?

> (3) lib_nss seems to have another bug where it ignores the lookup order defined in /etc/nsswitch.conf. It occurs when trying to get a hostname with a multicast-reserved .local domain name with getent: See https://bugzilla.novell.com/show_bug.cgi?id=435261

OK, do we have something like that? D-Bus starts, looks up {u,g}id for messagebus, this ends up in nscd/lib_nss, and this fails as of (3).
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c17
Ralf Haferkamp
> It is quite clear that D-Bus and LDAP are not playing well together, as the dependencies are screwed:
> * Network needs D-Bus -- at least when using NetworkManager
> * LDAP needs D-Bus

LDAP doesn't need D-Bus. OK, in order to successfully resolve users it needs the network, which might need D-Bus, but nss_ldap can handle the case when the network is not available and (!) "bind_policy soft" is set. We set "bind_policy soft" by default.

> * D-Bus can't look up uid/gid as it assumes that this needs to go through LDAP

What kind of lookups does it need to do on startup? And how is it handled e.g. with other network-based NSS sources (winbind)? I am not exactly sure what the remaining problem of the bug is. The original bug report was about dbus crashing during boot. I am not able to reproduce the problem here, but it might just be another incarnation of bug#407552, for which a fix should hopefully be released rather soon now. With "bind_policy soft", dbus doesn't crash anymore, and that setting is absolutely essential for using nss_ldap successfully nowadays.

The VMware problem might be something completely different. I suspect it is similar to the Thunderbird/LDAP problem (bnc#157078): conflicting versions of libraries might be loaded and cause problems. IIRC VMware comes with its own version of the OpenSSL libraries, which caused us trouble in the past. Bernd: could you please try to temporarily disable SSL for nss_ldap to verify my assumption ("ssl off" in /etc/ldap.conf)? If you can start VMware, then we know at least that that problem is unrelated to dbus.
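A minimal sketch of that diagnostic, assuming the nss_ldap configuration from the original report. Only "ssl off" comes from the suggestion; because the report's URIs use ldaps:// on port 636, which may imply SSL by itself, the sketch assumes they are also switched to plain ldap:// on port 389 for the duration of the test, which in turn assumes the servers accept unencrypted connections there:

```text
# /etc/ldap.conf -- temporary diagnostic only, revert afterwards
# was: ssl on
ssl off
# assumption: plain LDAP on port 389 is reachable for the test
uri ldap://ldap-net12.example.com:389 ldap://ldap-net5.example.com:389 ldap://ldap-net1.example.com:389
```

With SSL off, the proxy user's bindpw is sent unencrypted, so this only suits a short test.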
https://bugzilla.novell.com/show_bug.cgi?id=427313
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c18
--- Comment #18 from Timo Hoenig
(In reply to comment #14 from Timo Hoenig)
> * Network needs D-Bus -- at least when using NetworkManager
> * LDAP needs D-Bus
> LDAP doesn't need D-Bus.

That, of course, should read "LDAP needs network".

> OK, in order to successfully resolve users it needs the network, which might need D-Bus, but nss_ldap can handle the case when the network is not available and (!) "bind_policy soft" is set. We set "bind_policy soft" by default.

Since when? According to comment it wasn't set? Or did the user use their own configuration file (not our defaults)?

> * D-Bus can't look up uid/gid as it assumes that this needs to go through LDAP
> What kind of lookups does it need to do on startup? And how is it handled e.g. with other network-based NSS sources (winbind)? I am not exactly sure what the remaining problem of the bug is. The original bug report was about dbus crashing during boot. I am not able to reproduce the problem here, but it might just be another incarnation of bug#407552, for which a fix should hopefully be released rather soon now. With "bind_policy soft", dbus doesn't crash anymore, and that setting is absolutely essential for using nss_ldap successfully nowadays.

OK. All this sounds as if anyone who can reproduce the issue should wait for the fix (bug #407552) and retest whether D-Bus still crashes.

> The VMware problem might be something completely different. I suspect it is similar to the Thunderbird/LDAP problem (bnc#157078): conflicting versions of libraries might be loaded and cause problems. IIRC VMware comes with its own version of the OpenSSL libraries, which caused us trouble in the past. Bernd: could you please try to temporarily disable SSL for nss_ldap to verify my assumption ("ssl off" in /etc/ldap.conf)? If you can start VMware, then we know at least that that problem is unrelated to dbus.

OK, thanks for all the information, Ralf
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c19
--- Comment #19 from Ralf Haferkamp
Hi,
> Just a wild guess: I assume the screwed parts are nscd and/or lib_nss and dbus-daemon is a side effect.
> (1) Dbus doesn't need to look up LDAP at boot because all system users/groups are in /etc/passwd and /etc/group and the lookup order in /etc/nsswitch.conf is "files ldap".

Unfortunately it is not that easy. It is perfectly OK for a user in /etc/passwd to be a member of an LDAP-based group. So in order to find out the group memberships of a local user, the NSS system might end up trying to do LDAP lookups. (This is where it could get hairy ;) ) In cases where the initgroups() call is causing this trouble, we have a chance to blacklist users in the nss_ldap configuration (nss_initgroups_ignoreusers). Timo: Does dbus make any initgroups() calls for users other than root?
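A minimal sketch of that blacklist, assuming PADL nss_ldap's nss_initgroups_ignoreusers option, which takes a comma-separated list of users for which nss_ldap's initgroups() returns "not found" immediately, so their supplementary groups come only from the remaining sources (here /etc/group). The user list is a hypothetical example: messagebus is the D-Bus system user mentioned earlier in the thread, and other local system accounts would be added the same way:

```text
# /etc/ldap.conf (nss_ldap)
# Skip the LDAP part of initgroups() for these local system users;
# the list is an example, not taken from the report
nss_initgroups_ignoreusers root,messagebus
```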
> (2) nscd crashes very frequently (within less than an hour) on openSUSE 11.0. On SUSE Linux 10.1 with the same setup it ran fine for months at least.

Hopefully we'll get a more robust nscd for 11.0 soon (see Petr's comments in bnc#157078#c59).

> (3) lib_nss seems to have another bug where it ignores the lookup order defined in /etc/nsswitch.conf. It occurs when trying to get a hostname with a multicast-reserved .local domain name with getent: See https://bugzilla.novell.com/show_bug.cgi?id=435261

I'd say this is completely unrelated to this bug, as the ".local" domain is handled completely differently.
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c20
--- Comment #20 from Bernd Nies
> OK, are we sure that this is working? I mean, if D-Bus hangs somewhere looking up some uid/gid, that sounds fishy. It does hang, right?

In most cases dbus-daemon segfaulted when "bind_policy soft" was not set in /etc/ldap.conf. Sometimes the boot process hung when entering runlevel 5 and dbus was launched. I thought it could also be an ACPI problem. It was not clear, because the printed boot messages do not always seem to be in sync with the daemons being started. But it never hung or crashed with LDAP disabled or with "bind_policy soft". When it crashed at boot there were no core files from that process, just from automount.

> OK, but if nscd crashes, does that affect D-Bus?

No. It affects launching VMware Workstation (freezes) or Thunderbird (segfaults) as an LDAP user. Restarting nscd periodically fixed that issue. I needed to increase the cache sizes from 211 to 1999 in /etc/nscd.conf, because we already have more than 211 users in LDAP and the cache was already full just from typing "getent passwd".

> OK, do we have something like that? D-Bus starts, looks up {u,g}id for messagebus, this ends up in nscd/lib_nss, and this fails as of (3).

Sounds reasonable.

> Bernd: could you please try to temporarily disable SSL for nss_ldap to verify my assumption ("ssl off" in /etc/ldap.conf)? If you can start VMware, then we know at least that that problem is unrelated to dbus.

This was already one of the first steps in identifying the problem, because it seems that very few people use LDAP and even fewer use LDAP over SSL. Disabling SSL did not help. Only disabling LDAP altogether or starting the crashed nscd again helped. Bye, Bernd
(In reply to comment #17 from Ralf Haferkamp)
> (In reply to comment #14 from Timo Hoenig)
> [..]
> OK, in order to successfully resolve users it needs the network, which might need D-Bus, but nss_ldap can handle the case when the network is not available and (!) "bind_policy soft" is set. We set "bind_policy soft" by default.
> Since when?

I don't know the exact SUSE version, but IIRC it has been present since 10.2 at least.
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c21
--- Comment #21 from Ralf Haferkamp
> According to comment it wasn't set? Or did the user use their own configuration file (not our defaults)?

I guess it's a manual configuration. See comment #7.

> * D-Bus can't look up uid/gid as it assumes that this needs to go through LDAP
> What kind of lookups does it need to do on startup? And how is it handled e.g. with other network-based NSS sources (winbind)? I am not exactly sure what the remaining problem of the bug is. The original bug report was about dbus crashing during boot. I am not able to reproduce the problem here, but it might just be another incarnation of bug#407552, for which a fix should hopefully be released rather soon now. With "bind_policy soft", dbus doesn't crash anymore, and that setting is absolutely essential for using nss_ldap successfully nowadays.
> OK. All this sounds as if anyone who can reproduce the issue should wait for the fix (bug #407552) and retest whether D-Bus still crashes.

Yes, probably. An alternative would be to test with the packages from http://download.opensuse.org/repositories/network:/ldap:/OpenLDAP:/RE24/open... Those already have the fixes on board.
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c22
--- Comment #22 from Bernd Nies
> According to comment it wasn't set? Or did the user use their own configuration file (not our defaults)?

The /etc/ldap.conf was written by an AutoYaST init script, because our LDAP setup cannot be done with (Auto)YaST. For example, we have different automount search bases for different enterprise locations. I created the entire AutoYaST setup procedure when SUSE 9.0 was the latest release and just adapted it to the changes in every release. At that time there was no "bind_policy soft" in the default config file. Apparently it came with SUSE 10.1. Bye Bernd

> I created the entire AutoYaST setup procedure when SUSE 9.0 was the latest release and just adapted it to the changes in every release. At that time there was no "bind_policy soft" in the default config file.

Yes, the different "bind_policy" feature was made available in nss_ldap at some
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c23
Ralf Haferkamp
> OK, but if nscd crashes, does that affect D-Bus?
> No. It affects launching VMware Workstation (freezes) or Thunderbird (segfaults) as an LDAP user. Restarting nscd periodically fixed that issue.

If restarting nscd fixes the startup freezes of VMware, that suggests this is a separate issue which most probably has nothing to do with D-Bus. Please file a separate bug report for that.

> I needed to increase the cache sizes from 211 to 1999 in /etc/nscd.conf, because we already have more than 211 users in LDAP and the cache was already full just from typing "getent passwd".

Please note that setting the cache size in nscd.conf to N does not mean that nscd will only cache N users/groups/...; the relation is a bit more complex AFAIK.
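For reference, a sketch of the nscd.conf change, assuming the "cache size" in question is glibc nscd's suggested-size option, whose default is 211. That value is the size of nscd's internal hash table rather than a cap on the number of cached entries, which fits the remark above; the nscd.conf documentation recommends keeping it a prime number, and 1999 is prime:

```text
# /etc/nscd.conf
# suggested-size <service> <value>: internal hash table size
# (default 211), not a limit on the number of cached entries
suggested-size passwd 1999
suggested-size group  1999
```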
> Bernd: could you please try to temporarily disable SSL for nss_ldap to verify my assumption ("ssl off" in /etc/ldap.conf)? If you can start VMware, then we know at least that that problem is unrelated to dbus.
> This was already one of the first steps in identifying the problem, because it seems that very few people use LDAP and even fewer use LDAP over SSL.

Well, my experience regarding the usage is a bit different :)
https://bugzilla.novell.com/show_bug.cgi?id=427313
User thoenig@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c24
--- Comment #24 from Timo Hoenig
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c25
--- Comment #25 from Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=427313
User syseng@adnovum.ch added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c26
--- Comment #26 from Bernd Nies
https://bugzilla.novell.com/show_bug.cgi?id=427313
User rhafer@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=427313#c27
Ralf Haferkamp