Hello,

(all times mentioned below are UTC)

1) identifying the problem

Yesterday, 2018-01-03, at 19:37 the IRC bot sent monitoring messages that the websites events.opensuse.org and progress.opensuse.org were returning error 500. I verified the issue and noticed that the mysql service was down on all three nodes. I pinged darix to help me get it back up. He figured out the cause: logrotate ran at the same time on all nodes (~19:30) and restarted the mysql service on each of them, and because of that the cluster went down.

2) solving the problem

After we verified that we had recent backups (on the second node), we first took a backup of /var/lib/mysql on all three nodes. Then we started recreating the cluster on the master node, which succeeded and brought the websites back up. We then continued with the two slave nodes, also successfully. As a last step we verified that all the websites were up again and that there was no data loss. We finished around 20:15, so the total downtime was around 45 minutes.

3) what we did to prevent the problem from happening again

We changed the time at which the daily cron jobs run on the second and the third node, so that the issue cannot happen again. The daily cron jobs now run at 19:00 on the first node, at 19:30 on the second node and at 20:00 on the third node.

4) what we could also do to improve the situation

- more frequent backups (e.g. 4 times per day)
- enable backups on galera3 as well
- make sure that the auto-update script doesn't run at the same time on all three hosts
- add the connect.opensuse.org webpage to the monitoring

Special thanks to darix, who saved the day!

--
Theo Chatzimichos <tampakrap@opensuse.org> <tchatzimichos@suse.com>
System Administrator
SUSE Operations and Services Team