[Bug 1136139] New: sssd ad/ldap domain are offline after first boot (cannot resolv srv _ldap._srv)
http://bugzilla.suse.com/show_bug.cgi?id=1136139

Bug ID: 1136139
Summary: sssd ad/ldap domain are offline after first boot (cannot resolv srv _ldap._srv)
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.1
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: luizluca@tre-sc.jus.br
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 805903
  --> http://bugzilla.suse.com/attachment.cgi?id=805903&action=edit
sssd logs since boot until after workaround is done (and sssd is working)

Hello,

After a fresh 15.1 install, sssd starts all domains (AD or LDAP) in offline mode when it boots. My system uses DHCP and wicked. I can also always reproduce it in a LiveDVD-like system that uses overlayfs with a fixed squashfs. It didn't happen with 15.0.

If I restart the sssd service afterwards, it simply works as expected. I can also make it work with "kill -s SIGUSR2 $(pidof sssd)". "netconfig -f update" is not enough. However, it is enough if I remove /etc/resolv.conf first.

It looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1379415

When I increased the debug level, it showed that sssd cannot resolve the SRV record (_ldap._tcp) for an LDAP domain:

sd[be[mydomain]]] [resolve_srv_send] (0x0200): The status of SRV lookup is neutral
sssd[be[mydomain]]] [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'mydomain.xxx.com'
sssd[be[mydomain]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.mydomain.xxx.com'
sssd[be[mydomain]]] [request_watch_destructor] (0x0400): Deleting request watch
sssd[be[mydomain]]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
sssd[be[mydomain]]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
sssd[be[mydomain]]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158237]: SRV lookup error
sssd[be[mydomain]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'LDAP' as 'not resolved'
sssd[be[mydomain]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158237]: SRV lookup error
sssd[be[mydomain]]] [be_resolve_server_process] (0x1000): Trying with the next one!
sssd[be[mydomain]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
sssd[be[mydomain]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
sssd[be[mydomain]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
sssd[be[mydomain]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
sssd[be[mydomain]]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Erro de entrada/saída
sssd[be[mydomain]]] [dp_req_done] (0x0400): DP Request [Online Check #62]: Request handler finished [0]: Sucesso
sssd[be[mydomain]]] [_dp_req_recv] (0x0400): DP Request [Online Check #62]: Receiving request data.
sssd[be[mydomain]]] [dp_req_destructor] (0x0400): DP Request [Online Check #62]: Request removed.
sssd[be[mydomain]]] [dp_req_destructor] (0x0400): Number of active DP request: 0
sssd[be[mydomain]]] [be_check_online_done] (0x0400): Backend is offline
sssd[be[mydomain]]] [be_ptask_execute] (0x0400): Back end is offline
sssd[be[mydomain]]] [be_ptask_schedule] (0x0400): Task [enumeration]: scheduling task 300 seconds from now [1558642974]

"Erro de entrada/saída" = Input/output error
"Sucesso" = Success

When sniffing the network, I cannot see any of these sssd DNS queries, while any other client on the system (dig) can make them.

The USR2 signal seems to force a resolv.conf reload:

[sssd] [signal_res_init] (0x0040): Reloading Resolv.conf.

And everything works afterwards.

sssd does seem to monitor /etc/resolv.conf for changes. However, it is failing in this specific situation. /etc/resolv.conf is missing in the squashfs (or in a clean installation). It is created by netconfig as a symlink to /var/run/netconfig/resolv.conf.

The same workaround works for both AD and LDAP domains. I'll attach complete debug logs for an AD domain.

--
You are receiving this mail because:
You are on the CC list for the bug.
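The failure signature in the log excerpt above can be picked out mechanically. The following is only a minimal sketch: the helper names `parse_debug_line` and `dns_unreachable` are made up for illustration, and the line layout ("[function] (0xLEVEL): message") is assumed from the excerpt in this report, not from any sssd documentation.

```python
import re

# Tail of a debug line as it appears in this report:
# "[function] (0xLEVEL): message". The prefix before it is ignored.
LINE_RE = re.compile(
    r"\[(?P<func>\w+)\] \((?P<level>0x[0-9a-fA-F]{4})\): (?P<msg>.*)"
)


def parse_debug_line(line):
    """Return (function, level, message), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group("func"), int(match.group("level"), 16), match.group("msg")


def dns_unreachable(lines):
    """True if any line reports an SRV query that could not reach DNS servers."""
    for line in lines:
        parsed = parse_debug_line(line)
        if parsed is None:
            continue
        func, _level, msg = parsed
        if func == "resolv_discover_srv_done" and "Could not contact DNS servers" in msg:
            return True
    return False
```

Feeding it the backend log of an affected boot should flag the "Could not contact DNS servers" line, which is the symptom that distinguishes this bug from an ordinary unreachable LDAP server.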
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c1
--- Comment #1 from Luiz Angelo Daros de Luca
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c2
--- Comment #2 from Luiz Angelo Daros de Luca
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c9
Samuel Cabrero
sssd could read the /etc/resolv.conf link and monitor both /etc/resolv.conf and its target for changes (if inotify can monitor files before the path is available).
I am still not able to reproduce this. Could you attach the full autoinst.xml please?
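On the question of watching a path before it is available: an inotify watch can only be added on a path that already exists, so the usual approach is to watch the nearest existing parent directory and wait for the file to appear. A minimal sketch of finding that directory follows; the helper name `nearest_existing_dir` is hypothetical and this is not how sssd itself is implemented.

```python
import os


def nearest_existing_dir(path):
    """Walk up from path until a directory that already exists is found."""
    current = os.path.dirname(os.path.abspath(path))
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current
```

For a target such as /var/run/netconfig/resolv.conf this returns /var/run/netconfig once netconfig has created it, and a higher directory before that.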
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c10
--- Comment #10 from Samuel Cabrero
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c15
Samuel Cabrero
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c16
--- Comment #16 from Luiz Angelo Daros de Luca
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c18
--- Comment #18 from Samuel Cabrero
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c19
--- Comment #19 from Daniel Bischof
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c20
Andrew Daugherity
Could you try again with this new patched package please?
https://build.opensuse.org/package/show/home:scabrero:branches:openSUSE:Leap:15.1:Update/sssd
I have also hit this bug, but unfortunately those packages do not fix the problem for me. Like the others' reports, restarting sssd or signaling with USR2 fixes it, but it fails on boot.
I think I have found the problem. SSSD sets up an inotify watch on resolv.conf and its parent directory so that it is notified when resolv.conf is created or changed. However, because it does not resolve the symlink, it watches the directory containing the symlink (/etc) and not /var/run/netconfig.
This is probably the right track, as I do not have this issue in Leap 15.0, which has the same version of sssd (1.16.1) but an older sysconfig-netconfig which writes /etc/resolv.conf as a normal file rather than a symlink.

However, sssd debug logs do not indicate it re-reading resolv.conf after wicked has updated it. Rather, it still keeps trying lookups using an empty list of DNS servers, e.g.:

====
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158237]: SRV lookup error
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'AD' as 'not resolved'
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158237]: SRV lookup error
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
(Mon Jul 22 16:28:02 2019) [sssd[be[dor.tamu.edu]]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
====

But this is well after the network was up:

# journalctl -u 'wicked*' -u sssd
-- Logs begin at Mon 2019-07-22 16:26:54 CDT, end at Mon 2019-07-22 16:30:20 CDT. --
Jul 22 16:27:01 dhcp-125 sssd[648]: Starting up
Jul 22 16:27:01 dhcp-125 sssd[be[762]: Starting up
Jul 22 16:27:02 dhcp-125 sssd[766]: Starting up
Jul 22 16:27:02 dhcp-125 sssd[764]: Starting up
Jul 22 16:27:02 dhcp-125 sssd[765]: Starting up
Jul 22 16:27:05 dhcp-125 sssd[be[762]: Backend is offline
Jul 22 16:27:05 dhcp-125 wickedd-dhcp4[639]: eth0: Request to acquire DHCPv4 lease with UUID 292a365d-f0fd-0800-a602-000004000>
Jul 22 16:27:06 dhcp-125 wickedd-dhcp4[639]: eth0: Committed DHCPv4 lease with address 10.95.0.125 (lease time 22576 sec, rene>
Jul 22 16:27:11 dhcp-125 wicked[768]: lo up
Jul 22 16:27:11 dhcp-125 wicked[768]: eth0 up

And in fact:

# stat $(readlink /etc/resolv.conf)
File: /var/run/netconfig/resolv.conf
[...]
Modify: 2019-07-22 16:27:07.167912609 -0500

----
This "replace resolv.conf with a symlink" change (for what benefit?) seems to have had unintended consequences. If I replace it with a normal file, sssd starts properly.
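The symlink diagnosis quoted above suggests that a watcher needs to cover two directories: the one holding the link and the one holding the link's target. A minimal sketch of computing both is shown here. The function name `directories_to_watch` is made up for illustration, and this is not the actual sssd patch, only an outline of the idea.

```python
import os


def directories_to_watch(path):
    """Return the parent directory of the path itself and, if different,
    the parent directory of the fully resolved target."""
    link_dir = os.path.dirname(os.path.abspath(path))
    # realpath follows symlinks and still returns the target path
    # when the target does not exist yet (a dangling link).
    target_dir = os.path.dirname(os.path.realpath(path))
    dirs = [link_dir]
    if target_dir != link_dir:
        dirs.append(target_dir)
    return dirs
```

For a regular file, as written by the older sysconfig-netconfig in Leap 15.0, only one directory comes back. For the Leap 15.1 symlink the second entry is the directory that actually receives the write when wicked updates resolv.conf.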
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c21
--- Comment #21 from Andrew Daugherity
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c22
--- Comment #22 from Andrew Daugherity
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c24
Samuel Cabrero
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c27
Ladislav Slezák
It is just yast2-installer leaving a broken link as /etc/resolv.conf after a second stage with network. This seems to break inotify. Yast2-installer should not leave leftovers, and sssd should deal with /etc/resolv.conf being a (temporarily) broken link.
Lada - can you comment on this part? Thanks
Knut was looking at the problem with the /etc/resolv.conf symlink in the past. Could you check it please?
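To check which of the states discussed in this bug a machine is in (no file, regular file, working symlink, broken symlink), something like the following sketch can help. The function name `classify_path` and the returned labels are hypothetical, chosen only for this example.

```python
import os


def classify_path(path):
    """Describe what currently sits at path without modifying anything."""
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False for a dangling one.
        return "symlink" if os.path.exists(path) else "broken symlink"
    if os.path.isfile(path):
        return "regular file"
    if os.path.lexists(path):
        return "other"
    return "missing"
```

Calling `classify_path("/etc/resolv.conf")` early in boot and again after wicked has brought the interface up would show whether the link is left dangling in between.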
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c28
--- Comment #28 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c29
--- Comment #29 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c30
Knut Alejandro Anderssen González
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c31
Samuel Cabrero
Samuel,
The bug seems to be solved after the latest patches. Do you still need something from us?
Hi Knut, not at the moment, but thanks for asking. I am still waiting for upstream to merge the patch, so I am leaving the bug open as a reminder.
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c33
--- Comment #33 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c34
--- Comment #34 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c36
--- Comment #36 from Swamp Workflow Management
http://bugzilla.suse.com/show_bug.cgi?id=1136139#c37
Samuel Cabrero
http://bugzilla.suse.com/show_bug.cgi?id=1136139
Knut Alejandro Anderssen González