[Bug 516339] New: mysqld regularly crashes
http://bugzilla.novell.com/show_bug.cgi?id=516339

Summary: mysqld regularly crashes
Classification: openSUSE
Product: openSUSE 11.0
Version: Final
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: per.jessen@enidan.com
QAContact: qa@suse.de
Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070730 SUSE/2.0.0.6-25 Firefox/2.0.0.6

I'm not sure where this report really belongs, but as I am running an openSUSE built mysql binary I thought I'd start here.

Not quite every day but perhaps 3-4 times a week, my mysql server crashes. It happens at roughly the same time every day, during some morning batch processing. Usually the table 'quarantine_archive' will need repairing, but not always. It has some 38 million rows.

The batchjob that is running does a few hundred queries like this:

mysql -Ddb_spamchek -h localhost -u spamchek_batch --password=spamchek -s -B -e "INSERT IGNORE INTO quarantine_archive SELECT * FROM quarantine WHERE state=1 AND domain='example.com'; DELETE FROM quarantine WHERE state=1 AND domain='example.com'"

They're not running in parallel, but one-by-one.

I'll attach the mysqld.log and some hardware info.

Reproducible: Sometimes

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c1

--- Comment #1 from Per Jessen <per.jessen@enidan.com> 2009-06-25 01:17:18 MDT ---
Created an attachment (id=300276)
 --> (http://bugzilla.novell.com/attachment.cgi?id=300276)
mysqld log file

I have replaced the actual domain names with 'example.com'.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c2

--- Comment #2 from Per Jessen <per.jessen@enidan.com> 2009-06-25 01:19:59 MDT ---
Created an attachment (id=300279)
 --> (http://bugzilla.novell.com/attachment.cgi?id=300279)
output from hwinfo -all
http://bugzilla.novell.com/show_bug.cgi?id=516339

User meissner@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c3

Marcus Meissner <meissner@novell.com> changed:

What          |Removed                                   |Added
--------------------------------------------------------------------------------
Status        |NEW                                       |NEEDINFO
CC            |                                          |meissner@novell.com
Info Provider |                                          |per.jessen@enidan.com
AssignedTo    |bnc-team-screening@forge.provo.novell.com |mhrusecky@novell.com

--- Comment #3 from Marcus Meissner <meissner@novell.com> 2009-06-25 11:28:15 MDT ---
It's still lacking info...

Can you attach gdb to mysqld and try getting a better backtrace? Perhaps with debuginfo installed too?
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c4

Per Jessen <per.jessen@enidan.com> changed:

What          |Removed               |Added
------------------------------------------------------------
Status        |NEEDINFO              |NEW
Info Provider |per.jessen@enidan.com |

--- Comment #4 from Per Jessen <per.jessen@enidan.com> 2009-06-26 00:47:43 MDT ---
How about a core dump? That would be easy to acquire.

I can certainly attach gdb, but this problem usually happens between 5 and 6 in the morning. On the weekend it might be possible to delay the batchjob and then try to reproduce with gdb attached. Can you provide detailed instructions for gdb, please?

I take 'debuginfo' to mean the 'mysql-debug' package? Yep, I can install that, but I guess you would also want me to run it instead of the regular server?
http://bugzilla.novell.com/show_bug.cgi?id=516339

User meissner@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c5

--- Comment #5 from Marcus Meissner <meissner@novell.com> 2009-06-26 02:27:38 MDT ---
mysql-debuginfo actually, from
http://download.opensuse.org/debug/update/11.0/

These are just add-on debuginfo packages for gdb only, no special binaries.

A core would be fine too; try getting a backtrace from it using gdb.
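Getting a backtrace out of a core file could look roughly like the sketch below. The binary path /usr/sbin/mysqld and the core file location are assumptions, and `core_backtrace_cmd` is a hypothetical helper that only prints the gdb command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command that dumps a backtrace of every
# thread recorded in a core file. Both default paths are assumptions.
core_backtrace_cmd() {
    binary="${1:-/usr/sbin/mysqld}"    # assumed location of the server binary
    core="${2:-/var/lib/mysql/core}"   # assumed location of the core file
    # --batch runs the -ex commands and exits;
    # "thread apply all bt" prints a backtrace for each thread.
    printf '%s\n' "gdb --batch -ex \"thread apply all bt\" $binary $core"
}

# Usage: review the printed command, then run it and redirect to a file.
#   core_backtrace_cmd /usr/sbin/mysqld /var/lib/mysql/core.1234
```

With the matching mysql-debuginfo package installed, the frames should show function names and source lines instead of bare addresses.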
http://bugzilla.novell.com/show_bug.cgi?id=516339

User mhrusecky@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c6

Michal Hrusecky <mhrusecky@novell.com> changed:

What          |Removed |Added
------------------------------------------------------------
Status        |NEW     |NEEDINFO
Info Provider |        |per.jessen@enidan.com

--- Comment #6 from Michal Hrusecky <mhrusecky@novell.com> 2009-07-02 07:43:28 MDT ---
A trace may also provide some clue as to what's going wrong. You can get a trace easily by installing the mysql-debug package and setting MYSQLD_DEBUG=yes in /etc/sysconfig/mysqld. After restarting MySQL, the trace can be found in /var/lib/mysql/mysqld.trace. I guess it will be quite a huge trace in your case... Maybe the best way would be to upload something like the last thousand lines of the trace (tail -n 1000 /var/lib/mysql/mysqld.trace > mysql-upload.trace) and keep a copy of the full trace somewhere locally (in case additional info is needed later).

I guess this bug won't be easy to solve. As you are running openSUSE 11.0, you may be interested in trying MySQL from openSUSE 11.1 in the meantime (I don't know whether it is affected as well). It can be found in the server:database:STABLE repository
(http://download.opensuse.org/repositories/server:/database:/STABLE/openSUSE_...)
It may cause some problems with other packages linked against MySQL, but if you are using MySQL only as a server, it may be worth trying.
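The trimming step for the trace could be wrapped as in the sketch below. The default paths and the line count are the ones mentioned in the comment; `trim_trace` itself is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last N lines of a (possibly huge)
# trace file for upload. The full trace is left untouched.
trim_trace() {
    trace="${1:-/var/lib/mysql/mysqld.trace}"   # trace path from the comment
    out="${2:-mysql-upload.trace}"              # file to attach to the bug
    lines="${3:-1000}"                          # number of trailing lines
    tail -n "$lines" "$trace" > "$out"
}

# Usage:
#   trim_trace /var/lib/mysql/mysqld.trace mysql-upload.trace 1000
```

Because the crash is the last thing the server does, the tail of the trace is the part most likely to be relevant.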
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c7

--- Comment #7 from Per Jessen <per.jessen@enidan.com> 2009-07-02 07:53:32 MDT ---
I've enabled core-files in mysqld_safe, and I'm just waiting for it to happen again. The last crash was about 40 hours ago.

I'm willing to try the trace, but given that this server is running about 50 queries/sec on average, the trace file could be gigantic. I only have some 250GB of space for it.

Upgrading the server might be an option, I'll check.
http://bugzilla.novell.com/show_bug.cgi?id=516339

Michal Hrusecky <mhrusecky@novell.com> changed:

What     |Removed   |Added
------------------------------------------------------------
Priority |P5 - None |P2 - High
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c8

--- Comment #8 from Per Jessen <per.jessen@enidan.com> 2009-07-20 02:03:12 MDT ---
I've had a few crashes since the last update, but no core files. I've just now realized that I had to modify /proc/sys/fs/suid_dumpable to get the core.
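A quick way to check that kernel setting is sketched below. The comment does not say which value was used, so the `echo 2` in the usage note is an assumption; `suid_dumpable_status` is a hypothetical helper name. The setting matters here because mysqld is typically started as root and then switches to an unprivileged user, and the kernel does not write core files for such processes while the value is 0.

```shell
#!/bin/sh
# Hypothetical check: report whether the kernel will write core files for
# processes that have changed their user ID.
suid_dumpable_status() {
    file="${1:-/proc/sys/fs/suid_dumpable}"
    value=$(cat "$file") || return 1
    case "$value" in
        0)   echo "disabled" ;;           # kernel default: no core dump
        1|2) echo "enabled ($value)" ;;   # 1 = debug, 2 = suidsafe
        *)   echo "unknown ($value)" ;;
    esac
}

# To change the setting at runtime (as root), something like:
#   echo 2 > /proc/sys/fs/suid_dumpable
```

The change made with `echo` does not survive a reboot unless it is also added to the sysctl configuration.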
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c9

--- Comment #9 from Per Jessen <per.jessen@enidan.com> 2009-07-27 00:35:05 MDT ---
Okay, I've finally managed to get a couple of core dumps, three in fact. They're all about 2.3GB; compressed they're about 400MB each. I can make them available if needed. I'm attaching the backtraces.

The mysql-debuginfo didn't match:

# rpm --install /home/per/downloads/mysql-debuginfo-5.0.51a-27.2.x86_64.rpm
error: Failed dependencies:
        mysql = 5.0.51a-27.2 is needed by mysql-debuginfo-5.0.51a-27.2.x86_64

I installed it with nodeps, but I'm guessing it won't be much good. Let me know what you need me to do.
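The dependency error means the debuginfo package was built for a different version-release of mysql than the one installed. A small check along these lines could confirm that before installing; `versions_match` is a hypothetical helper, and the rpm query formats in the usage note are a sketch that has not been run against this system.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the installed package and the
# debuginfo package carry the same version-release string.
versions_match() {
    installed="$1"   # e.g. output of: rpm -q --qf '%{VERSION}-%{RELEASE}' mysql
    debuginfo="$2"   # e.g. output of: rpm -qp --qf '%{VERSION}-%{RELEASE}' FILE.rpm
    [ "$installed" = "$debuginfo" ]
}

# Usage on an RPM-based system:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' mysql)
#   debuginfo=$(rpm -qp --qf '%{VERSION}-%{RELEASE}' \
#       /home/per/downloads/mysql-debuginfo-5.0.51a-27.2.x86_64.rpm)
#   versions_match "$installed" "$debuginfo" || echo "debuginfo does not match"
```

When the strings differ, gdb may still load the symbols, but line numbers and local variables in the backtrace cannot be trusted.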
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c10

--- Comment #10 from Per Jessen <per.jessen@enidan.com> 2009-07-27 00:37:08 MDT ---
Created an attachment (id=307810)
 --> (http://bugzilla.novell.com/attachment.cgi?id=307810)
backtrace
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c11

--- Comment #11 from Per Jessen <per.jessen@enidan.com> 2009-07-27 00:37:56 MDT ---
Created an attachment (id=307811)
 --> (http://bugzilla.novell.com/attachment.cgi?id=307811)
backtrace
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c12

Per Jessen <per.jessen@enidan.com> changed:

What          |Removed               |Added
------------------------------------------------------------
Status        |NEEDINFO              |NEW
Info Provider |per.jessen@enidan.com |

--- Comment #12 from Per Jessen <per.jessen@enidan.com> 2009-07-27 00:38:57 MDT ---
Created an attachment (id=307815)
 --> (http://bugzilla.novell.com/attachment.cgi?id=307815)
backtrace
http://bugzilla.novell.com/show_bug.cgi?id=516339

Michal Hrusecky <mhrusecky@novell.com> changed:

What   |Removed |Added
------------------------------------------------------------
Status |NEW     |ASSIGNED
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c13

--- Comment #13 from Per Jessen <per.jessen@enidan.com> 2009-07-31 03:16:21 MDT ---
Have now had another 3-4 dumps, and the backtraces all look very similar. I'm not certain, but it looks like this is more prone to happen when the server is doing a lot of work. We are very tempted to try to upgrade to a newer version this weekend.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User mhrusecky@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c14

--- Comment #14 from Michal Hrusecky <mhrusecky@novell.com> 2009-08-03 06:33:41 MDT ---
My findings so far:

According to the backtraces, MySQL crashes somewhere around the bmove512 function while working with the key cache in the MyISAM storage engine. Most likely there is some kind of bad pointer (I read the code around this function and, although it looks quite ugly, it looks correct). So what is needed to fix this is to find out where the bad pointer comes from and how something like this could have happened.

Possibilities that have come to mind so far:

A somehow corrupted key file. If this is the case, dumping the table, deleting it and recreating it should help. But as MySQL tries to fix the key file after the crash (mentioned in the logs), this does not look very probable.

Some kind of race condition somewhere. This would explain why it is happening only under heavy load, but it will be hard to find and fix. The newer version differs in some mutexes, so this is quite possible.

As the problem seems to be MyISAM related, a possible workaround may be to migrate the affected table to a different storage engine, for example InnoDB.

Still trying to figure out what went wrong, but wanted to provide some feedback in the meantime.
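The key-file check and the storage-engine workaround mentioned above map onto standard MySQL statements. The sketch below only prints them for review; `table_check_sql` is a hypothetical helper, and running ALTER TABLE on a table with tens of millions of rows would take a long time and lock the table, which is why it is emitted as a SQL comment.

```shell
#!/bin/sh
# Hypothetical helper: print SQL for inspecting a MyISAM table and, as a
# commented-out suggestion, for converting it to InnoDB. Nothing is executed.
table_check_sql() {
    table="$1"
    # Shows the storage engine, row count and data/index sizes.
    printf '%s\n' "SHOW TABLE STATUS LIKE '$table';"
    # Verifies the table and its key file.
    printf '%s\n' "CHECK TABLE $table;"
    # Possible workaround, left as a SQL comment on purpose.
    printf '%s\n' "-- ALTER TABLE $table ENGINE=InnoDB;"
}

# Usage: review the statements, then pipe them to the mysql client.
#   table_check_sql quarantine_archive
```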
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c15

--- Comment #15 from Per Jessen <per.jessen@enidan.com> 2009-08-04 02:22:27 MDT ---
(In reply to comment #14)
> My findings so far:
> Possibilities that have come to mind so far:
> A somehow corrupted key file. If this is the case, dumping the table, deleting
> it and recreating it should help. But as MySQL tries to fix the key file after
> the crash (mentioned in the logs), this does not look very probable.

I did in fact dump the table, truncate it, and have tried to reload it since. Just reloading it causes the problem, so it is in fact now very reproducible.

> Some kind of race condition somewhere. This would explain why it is happening
> only under heavy load, but it will be hard to find and fix. The newer version
> differs in some mutexes, so this is quite possible.

It is no longer only under heavy load, see above.

> As the problem seems to be MyISAM related, a possible workaround may be to
> migrate the affected table to a different storage engine, for example InnoDB.

Out of the question.

> Still trying to figure out what went wrong, but wanted to provide some feedback
> in the meantime.

Thanks, feedback much appreciated.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c16

--- Comment #16 from Per Jessen <per.jessen@enidan.com> 2009-08-04 02:27:10 MDT ---
I should have added - the table in question is large, some 40 million rows or thereabouts. It isn't actually used during the day, it is an archive table which is updated once a day. When I try to reload it now, it is not in use by anything else.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User mhrusecky@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c17

Michal Hrusecky <mhrusecky@novell.com> changed:

What          |Removed  |Added
------------------------------------------------------------
Status        |ASSIGNED |NEEDINFO
Info Provider |         |per.jessen@enidan.com

--- Comment #17 from Michal Hrusecky <mhrusecky@novell.com> 2009-08-04 07:08:34 MDT ---
If the problem is easily reproducible now, can you provide an example of the dump? I don't mean all 40 million rows, but what the tables look like and some example data. I can probably generate something similar and try to reproduce the problem on my machine...
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c18

Per Jessen <per.jessen@enidan.com> changed:

What          |Removed               |Added
------------------------------------------------------------
Status        |NEEDINFO              |ASSIGNED
Info Provider |per.jessen@enidan.com |

--- Comment #18 from Per Jessen <per.jessen@enidan.com> 2009-08-05 01:21:37 MDT ---
Here is a partial dump:

http://jessen.ch/files/bug516339-mysqldump-quarantine-archive.lzma (7MB)

I did think about reproducing this myself, but I don't have any similar hardware I can test on.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c19

--- Comment #19 from Per Jessen <per.jessen@enidan.com> 2009-08-28 03:07:59 MDT ---
I've stopped the daily maintenance of the quarantine_archive table, which has reduced the crashes considerably. Nonetheless, I still had a crash 4 days ago which did not involve this table.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c20

--- Comment #20 from Per Jessen <per.jessen@enidan.com> 2009-09-02 07:40:13 MDT ---
Any progress report on this? Today I had 7 consecutive crashes within 3 minutes.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User mhrusecky@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c21

--- Comment #21 from Michal Hrusecky <mhrusecky@novell.com> 2009-09-09 05:33:09 MDT ---
Sorry, I still don't know what might cause this. MySQL is obviously trying to access some unallocated memory but I still can't find out where this wrong address comes from :-(
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c22

--- Comment #22 from Per Jessen <per.jessen@enidan.com> 2009-09-09 06:33:56 MDT ---
Michal, as you've probably seen I ran out of time and had to go for the upgrade. I'm now running 5.1.36 and it seems to be stable. I'd like to keep this open for another week or so, but if it's still running stable then, we can close this.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User per.jessen@enidan.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c23

--- Comment #23 from Per Jessen <per.jessen@enidan.com> 2009-09-29 01:24:06 MDT ---
Well, 5.1.36 has been running stable for 18 days now which is good enough for me. Feel free to close this issue.
http://bugzilla.novell.com/show_bug.cgi?id=516339

User mhrusecky@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=516339#c24

Michal Hrusecky <mhrusecky@novell.com> changed:

What       |Removed  |Added
------------------------------------------------------------
Status     |ASSIGNED |RESOLVED
Resolution |         |WONTFIX

--- Comment #24 from Michal Hrusecky <mhrusecky@novell.com> 2009-10-05 05:33:17 MDT ---
OK, thanks for the bug report. Closing as WONTFIX, as it seems this issue is fixed in the new version and it is hard to fix in the current one.