[Bug 547074] New: Kernel Oops in connection with mysqldump or some other mysql queries
http://bugzilla.novell.com/show_bug.cgi?id=547074

User lisa@underdogmedia.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=547074#c1

Summary: Kernel Oops in connection with mysqldump or some other mysql queries
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.1
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: lisa@underdogmedia.com
QAContact: qa@suse.de
Found By: ---

Created an attachment (id=322555)
 --> (http://bugzilla.novell.com/attachment.cgi?id=322555)
my.cnf files before (during) and after the event

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3

Mysqldump hangs with ** DEAD ** processes in the processlist, and concurrently an Oops is reported in the kernel messages. We've also seen this in relation to other scripts that issue mysql queries such as SHOW TABLES LIKE 'xxx'.

So far this has only been seen in production. After forcibly restarting mysql and initiating the mysqldump prior to any other scripts, it completes successfully.

We have seen this on both
openSUSE 11.1 (x86_64)
openSUSE 10.3 (X86-64)
with mysql versions:
Server version: 5.0.67 SUSE MySQL RPM
Server version: 5.0.45 SUSE MySQL RPM

Following are examples of the ** DEAD ** processes:

| *** DEAD *** | SHOW TRIGGERS LIKE 'pubstats\_333'
| *** DEAD *** | SHOW TRIGGERS LIKE 'advstats\_132'
| 4019 | LogicMonitor | 207.178.145.207:44287 | NULL | Query | 7660 | *** DEAD *** | show table status from logicmonitor

Following are a couple of examples of the Oops info from the kernel:

Program Xnest tried to access /dev/mem between 0->8000000.
BUG: unable to handle kernel paging request at 00007f83a78aa000 IP: [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 PGD 22a5dc067 PUD 22a5b9067 PMD 22a5bd067 PTE 0 Oops: 0002 [1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 2 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler dell_rbu(X) binfmt_misc ipv6 fuse loop dm_mod joydev bnx2 ses sg enclosure sr_mod cdrom shpchp pci_hotplug rtc_cmos iTCO_wdt rtc_core iTCO_vendor_support dcdbas(X) rtc_lib button pcspkr i5000_edac serio_raw edac_core usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata dock thermal processor thermal_sys hwmon megaraid_sas scsi_mod Supported: Yes, External Pid: 5332, comm: mysqld Tainted: G 2.6.27.29-0.1-default #1 RIP: 0010:[<ffffffffa00fe5c9>] [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP: 0018:ffff88022ada1e18 EFLAGS: 00010297 RAX: 00007f83a78aa000 RBX: 00007f82bae4d8e8 RCX: 0000000000000028 RDX: 00000000000000b3 RSI: 0000000000000000 RDI: ffff88022c8eb924 RBP: ffff88022ada1f78 R08: ffff88017b94c8c0 R09: 0000000000000292 R10: 000432f66fa83498 R11: ffffffff80220fe6 R12: 00007f82b429c9a0 R13: 00007f82b58aee98 R14: 00007f82bae4aa30 R15: 00007f82bae4aa28 FS: 00007f82bae4d950(0000) GS:ffff88022f157ec0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f83a78aa000 CR3: 000000022c184000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mysqld (pid: 5332, threadinfo ffff88022ada0000, task ffff8802268a05c0) Stack: 00000000000009b4 0000000000000ff0 0000000000000010 00000000802afefe 0000000000000010 0000100000000001 00007f82b58aee98 000009b4ae11d200 612f636f72702f2f 0000000061616161 00007f83a78a9000 00007f83a78aa000 Call Trace: Inexact backtrace: [<ffffffff802b4952>] ? 
sys_newfstat+0x20/0x29 [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b Code: 85 08 ff ff ff 48 03 85 f8 fe ff ff 48 89 45 e0 48 89 55 d8 89 4d d4 c7 45 d0 00 00 00 00 eb 1b 48 8b 45 d8 0f b6 10 48 8b 45 e0 <88> 10 48 83 45 e0 01 48 83 45 d8 01 83 45 d0 01 8b 45 d0 3b 45 RIP [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP <ffff88022ada1e18> CR2: 00007f83a78aa000 ---[ end trace 0c5179e3ad8fa603 ]--- BUG: unable to handle kernel paging request at 00007f83a78a8000 IP: [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 PGD 22a5dc067 PUD 22a5b9067 PMD 22a5bd067 PTE 0 Oops: 0002 [2] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 1 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler dell_rbu(X) binfmt_misc ipv6 fuse loop dm_mod joydev bnx2 ses sg enclosure sr_mod cdrom shpchp pci_hotplug rtc_cmos iTCO_wdt rtc_core iTCO_vendor_support dcdbas(X) rtc_lib button pcspkr i5000_edac serio_raw edac_core usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata dock thermal processor thermal_sys hwmon megaraid_sas scsi_mod Supported: Yes, External Pid: 5881, comm: mysqld Tainted: G D 2.6.27.29-0.1-default #1 RIP: 0010:[<ffffffffa00fe5c9>] [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP: 0018:ffff880228d5de18 EFLAGS: 00010283 RAX: 00007f83a78a8000 RBX: 00007f83a77e48e8 RCX: 0000000000000020 RDX: 0000000000000009 RSI: 0000000000000000 RDI: ffff88022c8eb924 RBP: ffff880228d5df78 R08: ffffffff806dc380 R09: 0000000000000292 R10: 0001e0d9de9eaa7e R11: ffffffff80220fe6 R12: 00007f82ae009eb0 R13: 00007f82ad54f128 R14: 00007f83a77e1a30 R15: 00007f83a77e1a28 FS: 00007f83a77e4950(0000) GS:ffff88022f0d8ac0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f83a78a8000 CR3: 000000022c184000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process 
mysqld (pid: 5881, threadinfo ffff880228d5c000, task ffff88022ac7a6c0) Stack: 00000000000009b1 ffff88019609c000 000000000000000a ffffffff802afefe 000000000000000a 00001000802b475d 00007f82ad54f128 000009b1b5496920 612f636f72702f2f 0000000061616161 00007f83a78a7000 00007f83a78a8000 Call Trace: Inexact backtrace: [<ffffffff802afefe>] ? do_sys_open+0xb9/0xc5 [<ffffffff802b4952>] ? sys_newfstat+0x20/0x29 [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b Code: 85 08 ff ff ff 48 03 85 f8 fe ff ff 48 89 45 e0 48 89 55 d8 89 4d d4 c7 45 d0 00 00 00 00 eb 1b 48 8b 45 d8 0f b6 10 48 8b 45 e0 <88> 10 48 83 45 e0 01 48 83 45 d8 01 83 45 d0 01 8b 45 d0 3b 45 RIP [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP <ffff880228d5de18> CR2: 00007f83a78a8000 ---[ end trace 0c5179e3ad8fa603 ]--- BUG: unable to handle kernel paging request at 00007f83a78ac000 IP: [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 PGD 22a5dc067 PUD 22a5b9067 PMD 22a5bd067 PTE 0 Oops: 0002 [3] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0e:0d.0/local_cpus CPU 3 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler dell_rbu(X) binfmt_misc ipv6 fuse loop dm_mod joydev bnx2 ses sg enclosure sr_mod cdrom shpchp pci_hotplug rtc_cmos iTCO_wdt rtc_core iTCO_vendor_support dcdbas(X) rtc_lib button pcspkr i5000_edac serio_raw edac_core usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata dock thermal processor thermal_sys hwmon megaraid_sas scsi_mod Supported: Yes, External Pid: 5815, comm: mysqld Tainted: G D 2.6.27.29-0.1-default #1 RIP: 0010:[<ffffffffa00fe5c9>] [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP: 0018:ffff8802278f5e18 EFLAGS: 00010283 RAX: 00007f83a78ac000 RBX: 00007f82bad8a8e8 RCX: 0000000000000020 RDX: 0000000000000090 RSI: 0000000000000000 RDI: ffff88022c8eb924 RBP: ffff8802278f5f78 R08: ffff8800280876f0 R09: 0000000000000292 R10: 000454252003b65c R11: 0000000000000003 R12: 00007f82b52702f0 R13: 
00007f82b4f8ac08 R14: 00007f82bad87a30 R15: 00007f82bad87a28 FS: 00007f82bad8a950(0000) GS:ffff88022f18fec0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f83a78ac000 CR3: 000000022c184000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mysqld (pid: 5815, threadinfo ffff8802278f4000, task ffff8802279b8700) Stack: 000000000000001a ffff88022dc44000 0000000000000006 ffffffff802afefe 0000000000000006 00001000802b475d 00007f82b4f8ac08 0000001ab5131270 612f636f72702f2f 0000000061616161 00007f83a78ab000 00007f83a78ac000 Call Trace: Inexact backtrace: [<ffffffff802afefe>] ? do_sys_open+0xb9/0xc5 [<ffffffff802b4952>] ? sys_newfstat+0x20/0x29 [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b Code: 85 08 ff ff ff 48 03 85 f8 fe ff ff 48 89 45 e0 48 89 55 d8 89 4d d4 c7 45 d0 00 00 00 00 eb 1b 48 8b 45 d8 0f b6 10 48 8b 45 e0 <88> 10 48 83 45 e0 01 48 83 45 d8 01 83 45 d0 01 8b 45 d0 3b 45 RIP [<ffffffffa00fe5c9>] 0xffffffffa00fe5c9 RSP <ffff8802278f5e18> CR2: 00007f83a78ac000 ---[ end trace 0c5179e3ad8fa603 ]---

Reproducible: Sometimes

Steps to Reproduce:
We have only seen this on a system that is in production. I haven't been able to reproduce it on a non-live system. We moved our production system to our backup system and the issue followed it. I am able to reproduce this issue almost all the time on the production system if I do the following:
1. run a script that does mysql dumps of each database, outputting them to text files
2. watch the processlist in mysql and wait for a ** DEAD ** process
3. check dmesg and look for an Oops

Actual Results:
1. the mysqldump script hangs
2. a ** DEAD ** process is listed in the mysql processlist
3. an Oops is reported in the dmesg output

Expected Results:
1. mysqldump should finish without incident.
This started happening on 10/27/09. I've attached the my.cnf that was in effect when this started happening (my.cnf_before). We have since tried different settings to optimize our system and see if we can fix this issue. A copy of the my.cnf that is currently in use is also attached (my.cnf_after).

We had not done any upgrades of the system OS or of mysql anytime in the months prior to this event. We have since applied system patches and run the DELL hardware diagnostics (all passed). We have also replicated the event on another system, with hardware that also passes all diagnostics.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
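Step 3 of the reproduction ("check dmesg and look for Oops") can be sketched as a small log scanner. This is only an illustration, not something from the report: `summarize_oopses` and the three patterns are hypothetical names, and the patterns assume the header layout seen in the excerpts above (`BUG: unable to handle kernel paging request at ...`, `Oops: NNNN [n]`, `Pid: n, comm: name`). Other kernel versions may print these lines differently.

```python
import re

# Patterns modelled on the oops headers quoted in this report (an assumption;
# other kernels may format these lines differently).
FAULT_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
OOPS_RE = re.compile(r"Oops: ([0-9a-f]{4}) \[(\d+)\]")
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")


def summarize_oopses(log_text):
    """Return one summary dict per paging-request oops found in log_text."""
    # Each fault header starts a new oops; slice the log between headers so
    # fields from neighbouring oopses are not mixed up.
    starts = [m.start() for m in FAULT_RE.finditer(log_text)]
    results = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(log_text)
        chunk = log_text[start:end]
        fault = FAULT_RE.match(chunk)
        oops = OOPS_RE.search(chunk)
        proc = PID_RE.search(chunk)
        results.append({
            "address": int(fault.group(1), 16),
            "error_code": int(oops.group(1), 16) if oops else None,
            "sequence": int(oops.group(2)) if oops else None,
            "pid": int(proc.group(1)) if proc else None,
            "comm": proc.group(2) if proc else None,
        })
    return results
```

The function takes the log as a string, so it can be fed saved dmesg output; how that text is collected on the affected host is left open here.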
http://bugzilla.novell.com/show_bug.cgi?id=547074

zhu rensheng <rszhu@novell.com> changed:

What       |Removed                                   |Added
----------------------------------------------------------------------------
CC         |                                          |rszhu@novell.com
AssignedTo |bnc-team-screening@forge.provo.novell.com |kernel-maintainers@forge.provo.novell.com
http://bugzilla.novell.com/show_bug.cgi?id=547074
http://bugzilla.novell.com/show_bug.cgi?id=547074#c1

Jeff Mahoney <jeffm@novell.com> changed:

What       |Removed                                   |Added
----------------------------------------------------------------------------
Priority   |P5 - None                                 |P2 - High
CC         |                                          |jeffm@novell.com
AssignedTo |kernel-maintainers@forge.provo.novell.com |rjw@novell.com

--- Comment #1 from Jeff Mahoney <jeffm@novell.com> 2009-11-21 17:40:11 UTC ---
This is _really_ strange. This is one of the most frequently used call paths in the kernel. sys_newfstat+0x20/0x29 is the end of sys_newfstat, where it calls cp_new_stat. 0x00007f83a78aa000 is a userspace address, but faulting on a userspace address shouldn't cause an oops.

Rafael, any ideas?
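The "userspace address" observation in comment #1 can be checked mechanically. The sketch below is not from the thread: it assumes the usual x86-64 split with 48-bit virtual addresses (lower canonical half for user space, upper half for the kernel) and the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). `classify_address` and `decode_error_code` are hypothetical helper names.

```python
# Assumed layout: x86-64 with 48-bit virtual addresses.
USER_MAX = 0x0000_7FFF_FFFF_FFFF    # top of the lower canonical half
KERNEL_MIN = 0xFFFF_8000_0000_0000  # bottom of the upper canonical half


def classify_address(addr):
    """Label a 64-bit virtual address as user, kernel, or non-canonical."""
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 1 means the fault came from user mode
    }


# Values taken from the first oops in the report.
print(classify_address(0x00007F83A78AA000))  # faulting address (CR2)
print(classify_address(0xFFFFFFFFA00FE5C9))  # instruction pointer (RIP)
print(decode_error_code(0x0002))             # "Oops: 0002"
```

Under those assumptions, CR2 falls in the user half, RIP falls in the kernel half, and error code 0002 would read as a kernel-mode write to a page that is not present, which lines up with the `PTE 0` shown in the dump.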
http://bugzilla.novell.com/show_bug.cgi?id=547074
http://bugzilla.novell.com/show_bug.cgi?id=547074#c2

Rafael Wysocki <rjw@novell.com> changed:

What          |Removed |Added
----------------------------------------------------------------------------
Status        |NEW     |NEEDINFO
Info Provider |        |lisa@underdogmedia.com

--- Comment #2 from Rafael Wysocki <rjw@novell.com> 2009-11-23 23:11:20 UTC ---
(In reply to comment #1)
> This is _really_ strange. This is one of the most frequently used call paths in
> the kernel. sys_newfstat+0x20/0x29 is the end of sys_newfstat, where it calls
> cp_new_stat. 0x00007f83a78aa000 is a userspace address, but faulting on a
> userspace address shouldn't cause an oops.
>
> Rafael, any ideas?

It looks like a page table corruption of some sort.

I guess the mysqldump script is the only program failing like this on the affected systems?
https://bugzilla.novell.com/show_bug.cgi?id=547074
https://bugzilla.novell.com/show_bug.cgi?id=547074#c3

Rafael Wysocki <rjw@novell.com> changed:

What         |Removed                |Added
----------------------------------------------------------------------------
Status       |NEEDINFO               |CLOSED
InfoProvider |lisa@underdogmedia.com |
Resolution   |                       |NORESPONSE

--- Comment #3 from Rafael Wysocki <rjw@novell.com> 2011-02-03 22:16:52 UTC ---
openSUSE 11.1 is not supported any more, so closing. Please reopen if the problem is still present in openSUSE 11.3/11.4.