https://www.reddit.com/r/openSUSE/comments/sp5m9u/idp_problem_postmortem/
Yesterday I fixed a small outage that likely started on 2022-02-03 at 08:16 and lasted until 2022-02-09 at 16:30 UTC.
The effect was that user password changes via https://idp-portal.suse.com threw an error. Other IDP functions that create and update accounts may have been affected as well.
Background: SUSE split out from MicroFocus in 2020 and could no longer use their Novell Accessmanager service for handling openSUSE user accounts. Since then, we have operated our own identity provider (IDP) based on Univention Corporate Server (UCS), a Debian-based solution with professional support.
So what was the problem?
The IDP setup uses a main server that receives all writes via Kerberos, plus several replicas that handle authentication, mostly via LDAP. Yesterday we learned that password updates were broken.
With the help of Univention support, I found that kpasswd did not work in a shell. Running tcpdump -epni eth0 host 10.x.x.x showed it trying to communicate over UDP port 88 and getting a "Port unreachable" reply. So I checked the main server, and indeed ss -uanp showed that port 88 was bound on only half of the IPs, and not on the one kpasswd tried to reach.
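A check like this could be scripted. The following is only a rough sketch, not what was run during the incident: the function name missing_kdc_listeners and the 192.0.2.x addresses are made-up placeholders, and it assumes IPv4 addresses and the usual column layout of ss -uan. It reads ss output on stdin and prints every expected address that has no UDP socket bound on port 88.

```shell
#!/bin/sh
# Hypothetical helper: list expected IPv4 addresses that have no UDP
# socket bound on port 88 (the KDC port), based on "ss -uan" output on stdin.
missing_kdc_listeners() {
    # Collect every field of the ss output that ends in ":88".
    bound=$(awk '{ for (i = 1; i <= NF; i++) if ($i ~ /:88$/) print $i }')

    # A wildcard bind covers every local address, so nothing is missing.
    if printf '%s\n' "$bound" | grep -qxF -e '0.0.0.0:88' -e '*:88'; then
        return 0
    fi

    # Otherwise report each expected address without an exact match.
    for ip in "$@"; do
        printf '%s\n' "$bound" | grep -qxF "$ip:88" || echo "$ip"
    done
}

# Example (placeholder addresses, replace with the server's real IPs):
#   ss -uan | missing_kdc_listeners 192.0.2.10 192.0.2.11
```

Empty output means every listed address has a listener; any printed address is one that clients would get "Port unreachable" from.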
With a simple /etc/init.d/heimdal-kdc restart on the main server, the Kerberos process started listening on all IPs, which fixed password changes. While the immediate outage was over, I spent the next morning finding out why it had failed like this. With systemd-analyze plot > plot.svg I could see that the kdc was started long before network-online.target was reached. Since the service still uses old SysV init scripts, I added $network to the Required-Start line of its init script, and on the next boot the .svg looked better. This gave us back an IDP that keeps working even after a reboot.
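To verify that an init script really declares this dependency, something like the following could be used. It is a minimal sketch: the function name has_network_dependency is hypothetical, and it only inspects the LSB "Required-Start" comment line, looking for the literal string $network.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given SysV init script lists the
# $network facility in its LSB "Required-Start" header line.
has_network_dependency() {
    # Select only the Required-Start header line, then look for the
    # literal string "$network" (-F: fixed string, no regex expansion).
    grep -E '^#[[:space:]]*Required-Start:' "$1" | grep -qF '$network'
}

# Example:
#   has_network_dependency /etc/init.d/heimdal-kdc && echo "waits for network"
```

The function returns a non-zero status both when $network is absent and when the script has no Required-Start line at all, so either case shows up as "not waiting for the network".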
The only remaining mystery is why this issue has not shown up earlier. At least https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=heimdal-kdc does not have reports in that direction and the debian.tar.xz in https://packages.debian.org/de/bullseye/heimdal-kdc contains the same problematic Required-Start line. So that mystery will probably remain...
Bernhard M. Wiedemann