Hello,

On Wednesday, 10 January 2018, 11:54:03 CET, Theo Chatzimichos wrote:
> It happened again twice, so it was definitely not a logrotate issue, but
> instead it was an upstream bug, possibly this [1]. So now I updated the
> hosts and mariadb was updated to a new patch version. Let's see if it
> crashes again now; if it does, I'll file a ticket against our package.

It turned out that the mariadb update didn't change anything, and the
cluster crashed at 19:31 UTC again - at least it was timely ;-)

At least now we know what triggers the crash - as I already guessed [1]
yesterday, it's the backup script (no kidding!).

This script does database dumps and then optimizes all tables. The good
news is that creating the database dumps works. The problematic part is
optimizing all tables, so we have now disabled that part of the script.
After this change, the galera cluster survived two test runs of the
backup script.

The relevant part of the script that triggers the crash is:

    MYSQL_CHECK="/usr/bin/mysqlcheck"
    # ...
    MYSQL="/usr/bin/mysql"
    # ...
    function optimize() {
        if [ -x "$MYSQL_CHECK" ]; then
            LOG "Starting automatic repair and optimization of the databases/tables"
            "$MYSQL_CHECK" \
                --all-databases \
                --skip-database=lost+found \
                --compress \
                --auto-repair \
                --optimize \
                -u root 1>/dev/null 2>"$TMPFILE"
            "$MYSQL" -e "FLUSH QUERY CACHE;" 2>>"$TMPFILE"
        fi
    }

It typically takes 2 seconds from writing the log entry to the crash.

I found two bug reports that describe our problem exactly, including an
exact match of our mysqld.log:
    https://github.com/codership/galera/issues/486
    https://jira.percona.com/browse/PXC-881

I also found
http://msutic.blogspot.de/2015/10/confusion-and-problems-with-lostfound.html
- but the precondition "/var/lib/mysql/lost+found/ exists" doesn't match
our setup.

However, we have a root-owned mysql_upgrade_info (only) on galera1. It's
a file, not a directory like lost+found would be, but it _could_ [2]
still somehow be related.


Regards,

Christian Boltz

[1] educated guess after reading the backup script, checking the
    database dumps' content and timestamps etc.

[2] wild guess, and given that lost+found looks like a database
    directory to mysql while a file doesn't, I doubt the
    mysql_upgrade_info file is really the problem.
    OTOH - who would have thought that optimizing all tables crashes
    the cluster?
;-)

-- 
In C we had to code our own bugs. In C++ we can inherit them.
[Prof. Gerald Karam]
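
PS: A minimal sketch of how the optimize step from the script quoted above could be made switchable rather than removed. This is an illustration under assumptions, not the change that was actually made: RUN_OPTIMIZE is a hypothetical variable that does not exist in the original script, and the LOG and TMPFILE definitions here are stand-ins for the script's own, which are not shown in this mail. The mysqlcheck call itself is copied unchanged.

```shell
#!/bin/bash
# Hypothetical switch (NOT in the original script): "yes" re-enables optimization.
RUN_OPTIMIZE="${RUN_OPTIMIZE:-no}"

MYSQL_CHECK="/usr/bin/mysqlcheck"
MYSQL="/usr/bin/mysql"

# Stand-ins for definitions the mail does not show (assumptions).
TMPFILE="${TMPFILE:-$(mktemp)}"
LOG() { echo "$(date '+%F %T') $*"; }

optimize() {
    # Skip the step that triggers the crash unless explicitly enabled.
    if [ "$RUN_OPTIMIZE" != "yes" ]; then
        LOG "Skipping table optimization (RUN_OPTIMIZE=$RUN_OPTIMIZE)"
        return 0
    fi
    if [ -x "$MYSQL_CHECK" ]; then
        LOG "Starting automatic repair and optimization of the databases/tables"
        "$MYSQL_CHECK" \
            --all-databases \
            --skip-database=lost+found \
            --compress \
            --auto-repair \
            --optimize \
            -u root 1>/dev/null 2>"$TMPFILE"
        "$MYSQL" -e "FLUSH QUERY CACHE;" 2>>"$TMPFILE"
    fi
}
```

With a switch like this, the dump part of the backup would keep running every night, and optimization could be re-enabled for a single test run (RUN_OPTIMIZE=yes) once a fixed package is available.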