Re: [heroes] forums unreachable for past 30+ minutes

18 Jan 2022


On 18/01/2022 01.49, Felix Miata wrote:
> ...
> I have a help request response ready, but can't submit it. :(

I fixed it and here is the writeup:

https://www.reddit.com/r/openSUSE/comments/s6s4u6/service_outage_postmortem/
...
Some people noticed problems with our forums earlier today: they were down from around 00:01 to 06:16 UTC.
Other services were also affected: the wiki and id.o.o, and with that, logins to openqa and chat.o.o were impossible as well.
So why did this outage happen? Here is what I found: on our login-proxy we have a custom AppArmor profile for the apache2-worker / httpd processes to limit what they can do in case of a break-in. We also have a symlink `/etc/systemd/system/timers.target.wants/suse-online-update.timer -> /usr/lib/systemd/system/suse-online-update.timer` that enables a timer which auto-installs updates daily.
Both of these nice security features did what we asked them to, and since computers do what we say (not what we mean), the update timer installed a new `apache2-worker-2.4.51-3.37.1` package. It included a minor upstream version update from the previous 2.4.43 version and now wanted to create a `/run/httpd.pid.Gy7vP` on start. The AppArmor profile prevented that, so startup failed and there was no proxying to the services behind it, not even a nice error-503 page with an upside-down chameleon.
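To illustrate why the old rule did not cover the new file name, here is a minimal sketch that uses shell `case` globbing as a stand-in for AppArmor's path matching. This is an assumption for illustration only: AppArmor's matcher is not the shell's (for example, its `*` does not cross `/`), but for these particular paths the two agree. The helper name `rule_allows` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: does a path match the glob pattern of a profile rule?
# Shell globbing stands in for AppArmor's matcher here (an approximation).
rule_allows() {
    # $1 = pattern as written in the rule, $2 = path the process wants to create
    case "$2" in
        $1) echo "allowed" ;;
        *)  echo "denied" ;;
    esac
}

rule_allows '/run/httpd.pid'  '/run/httpd.pid.Gy7vP'   # exact-path rule
rule_allows '/run/httpd.pid*' '/run/httpd.pid.Gy7vP'   # rule with trailing glob
rule_allows '/run/httpd.pid*' '/run/httpd.pid'         # plain PID file is still covered
```

An exact-path rule only matches that one name, so a temporary file with a random suffix falls outside it, while a trailing `*` covers both the suffixed name and the plain PID file.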
And since the job defaults to "daily", which systemd treats as shorthand for 00:00, there were no admins around to fix it.
So now, apart from the immediate fix of the AppArmor profile to allow not just `/run/httpd.pid` but also `/run/httpd.pid*`, I used `systemctl edit suse-online-update.timer` to ensure that updates happen at a more convenient time of day:
    [Timer]
    OnCalendar=*-*-* 8:00:00
So chances are, the next outage will not be as long.
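A note on the timer override, offered as a sketch and not as a record of what was actually deployed: `OnCalendar=` is a list setting in systemd, so a drop-in adds to the entries of the shipped unit unless an empty assignment resets the list first. If the shipped `suse-online-update.timer` sets `OnCalendar=daily` (an assumption based on the "daily" default mentioned above, since the unit file itself is not shown here), the midnight trigger would stay active alongside the 8:00 one. The script below writes such a drop-in into a scratch directory for inspection. The `DROPIN_DIR` variable is a hypothetical knob. On a real host, `systemctl edit` stores the file as `/etc/systemd/system/suse-online-update.timer.d/override.conf`.

```shell
#!/bin/sh
# Sketch: write the drop-in to a scratch directory so it can be inspected
# without touching the running system. DROPIN_DIR is a hypothetical override.
dropin_dir="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/override.conf" <<'EOF'
[Timer]
# An empty assignment clears the OnCalendar= entries inherited from the shipped unit
OnCalendar=
OnCalendar=*-*-* 8:00:00
EOF

echo "wrote $dropin_dir/override.conf"
```

After a real edit, `systemctl list-timers suse-online-update.timer` shows when the timer will next elapse, and `systemd-analyze calendar daily` shows how systemd normalizes the `daily` shorthand.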
Ciao
Bernhard M.

Bernhard M. Wiedemann