[Bug 353252] New: Spamassassin sa-learn causing kernel errors
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c2 Summary: Spamassassin sa-learn causing kernel errors Product: openSUSE 10.3 Version: Final Platform: i686 OS/Version: openSUSE 10.3 Status: NEW Severity: Major Priority: P5 - None Component: Other AssignedTo: bnc-team-screening@forge.provo.novell.com ReportedBy: drankinatty@suddenlinkmail.com QAContact: qa@suse.de Found By: --- The sa-learn part of spam assassin is causing kernel errors when run under openSuSE 10.3. The error and backtrace and calltrace are shown in the error log and output shown below: Jan 10 13:00:02 bonza /usr/sbin/cron[9911]: (david) CMD (/home/david/linux/scripts/Learn_as_spam_cron) Jan 10 13:00:02 bonza imapd[9931]: PAM audit_log_acct_message() failed: Operation not permitted Jan 10 13:00:06 bonza kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044 Jan 10 13:00:06 bonza kernel: printing eip: Jan 10 13:00:06 bonza kernel: c02c5027 Jan 10 13:00:06 bonza kernel: *pde = 00000000 Jan 10 13:00:06 bonza kernel: Oops: 0002 [#2] Jan 10 13:00:06 bonza kernel: SMP Jan 10 13:00:06 bonza kernel: last sysfs file: /class/input/input4/event4/dev Jan 10 13:00:06 bonza kernel: Modules linked in: joydev st ide_disk ide_cd ide_core af_packet iptable_filter ip_tables ip6_tables x_tables apparmor loop 8139cp 8139too mii button sr_mod i2c_nforce2 shpchp nvidia_agp cdrom rtc_cmos rtc_core rtc_lib pci_hotplug agpgart i2c_core sg dm_mirror dm_log ehci_hcd sd_mod ohci_hcd usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan sata_nv pata_amd libata scsi_mod thermal processor Jan 10 13:00:06 bonza kernel: CPU: 0 Jan 10 13:00:06 bonza kernel: EIP: 0060:[<c02c5027>] Tainted: G N VLI Jan 10 13:00:06 bonza kernel: EFLAGS: 00210286 (2.6.22.13-0.3-default #1) Jan 10 13:00:06 bonza kernel: EIP is at _spin_lock+0x1/0xf Jan 10 13:00:06 bonza kernel: eax: 00000044 ebx: d4cdf978 ecx: d4cdf888 edx: 00000001 Jan 10 13:00:06 bonza kernel: esi: 00000000 edi: 0000004e ebp: d4cdf930 esp: d940dd84 Jan 10 13:00:06 bonza kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Jan 10 13:00:06 bonza kernel: Process sa-learn (pid: 10213, ti=d940c000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ task=ce19bab0 task.ti=d940c000) Jan 10 13:00:06 bonza kernel: Stack: c018de77 c1990b40 c1990b60 c1990b80 c1990ba0 c1990bc0 d4cdf978 d4cdf888 Jan 10 13:00:06 bonza kernel: 00000000 0000004e 0000004e c0180a14 00000080 d4cdfa7c e23cdabc 00019ed8 Jan 10 13:00:06 bonza kernel: dfffeaa0 00000096 000280d2 c015add0 00004950 000024a8 00000000 00cf6c00 Jan 10 13:00:06 bonza kernel: Call Trace: Jan 10 13:00:06 bonza kernel: [<c018de77>] remove_inode_buffers+0x3b/0xc8 Jan 10 13:00:06 bonza kernel: [<c0180a14>] shrink_icache_memory+0xc5/0x1b6 Jan 10 13:00:06 bonza kernel: [<c015add0>] shrink_slab+0xde/0x13b Jan 10 13:00:06 bonza kernel: [<c015b6b8>] try_to_free_pages+0x173/0x247 Jan 10 13:00:06 bonza kernel: [<c015731a>] __alloc_pages+0x1a1/0x2d6 Jan 10 13:00:06 bonza kernel: [<c0162361>] find_mergeable_anon_vma+0x5e/0xb2 Jan 10 13:00:06 bonza kernel: [<c015f61a>] __handle_mm_fault+0x482/0xa83 Jan 10 13:00:06 bonza kernel: [<c011c854>] kunmap_atomic+0x54/0x7c Jan 10 13:00:06 bonza kernel: [<c011c860>] kunmap_atomic+0x60/0x7c Jan 10 13:00:06 bonza kernel: [<c0137c62>] getnstimeofday+0x30/0xbe Jan 10 13:00:06 bonza kernel: [<c0161668>] vma_adjust+0x1ca/0x360 Jan 10 13:00:06 bonza kernel: [<c02c67ca>] do_page_fault+0x26c/0x5a6 Jan 10 13:00:06 bonza kernel: [<c02c655e>] do_page_fault+0x0/0x5a6 Jan 10 13:00:06 bonza kernel: [<c02c5362>] error_code+0x72/0x80 Jan 10 13:00:06 bonza kernel: ======================= Jan 10 13:00:06 bonza kernel: Code: 05 90 ff 00 30 d2 89 d0 c3 89 c2 90 81 28 00 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 90 81 02 00 00 00 01 30 c9 89 c8 c3 90 <fe> 08 79 09 f3 90 80 38 00 7e f9 eb f2 c3 90 81 28 00 00 00 01 Jan 10 13:00:06 bonza kernel: EIP: [<c02c5027>] _spin_lock+0x1/0xf SS:ESP 0068:d940dd84 The PAM error is not the cause of the sa-learn failure. See: https://bugzilla.novell.com/show_bug.cgi?id=331683 The output suggests that there is a perl error causing the mess. I will attach the file for your review. The installed packages are: 09:47 bonza~> uname -r 2.6.22.13-0.3-default 09:47 bonza~> rpm -qa | grep spamassassin perl-spamassassin-3.2.3-11.1 spamassassin-3.2.3-11.1 The script calling sa-learn is: #!/bin/bash /usr/bin/fetchmail -a -s -n --folder Mail/spam-learn -m '/usr/bin/sa-learn --spam' Please let me know what additional information I can provide and I will be happy to run any tests and provide any additional information you may require. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c1 --- Comment #1 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-11 08:51:24 MST --- Created an attachment (id=190265) --> (https://bugzilla.novell.com/attachment.cgi?id=190265) sa-learn script output showing perl errors -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 Mark Gordon <mtgordon@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #190265|application/octet-stream |text/plain mime type| | -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 Martin Mrazik <mmrazik@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bnc-team-screening@forge.provo.novell.com |kernel-maintainers@forge.provo.novell.com -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 Martin Mrazik <mmrazik@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Other |Kernel -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c2 --- Comment #2 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-15 13:13:27 MST --- Guys, It looks like spamassassin starts the cascading error that accumulates over time. Before 13:00:01 after the server restart and ~ 12:40, there were no PAM errors despite the large amount of mail going across the server. You can see the start of the problem in the logs the first time the spamassassin daemon (program) is called on the hour to clean and learn from spam in the Mail/spam-learn imap folder: Jan 15 13:00:01 bonza /usr/sbin/cron[4029]: (david) CMD (/home/david/linux/scripts/Learn_as_spam_cron) Jan 15 13:00:25 bonza imapd[4050]: PAM audit_log_acct_message() failed: Operation not permitted Jan 15 13:01:32 bonza sshd[5039]: Accepted publickey for david from 192.168.7.101 port 41909 ssh2 Jan 15 13:01:38 bonza su: (to root) david on /dev/pts/2 Jan 15 13:05:01 bonza /usr/sbin/cron[5100]: (drr) CMD (/usr/local/bin/Learn_as_spam_cron) Jan 15 13:05:03 bonza imapd[5102]: PAM audit_log_acct_message() failed: Operation not permitted Jan 15 13:06:53 bonza imapd[5333]: PAM audit_log_acct_message() failed: Operation not permitted Jan 15 13:06:55 bonza imapd[5329]: PAM audit_log_acct_message() failed: Operation not permitted Jan 15 13:06:55 bonza imapd[5435]: PAM audit_log_acct_message() failed: Operation not permitted Jan 15 13:06:56 bonza imapd[5440]: PAM audit_log_acct_message() failed: Operation not permitted I am including the syslog (/var/log/messages) for approximately the 20 hours before this problem cascaded into a kernel melt-down. Hopefully, there will be something useful in the log. The pertinent part of the log to look at is the Learn_as_spam_cron script entries that are followed by the PAM error that appear to ultimately cascade into kernel errors over the course of a day. Before this meltdown, the server had been up ~3 days. For the record, the permission on the script are: root@bonza:/home/david/linux/error # l ../scripts/ total 16 drwxr-xr-x 2 david users 4096 2008-01-06 20:47 ./ drwxr-xr-x 7 david users 4096 2008-01-12 19:28 ../ -rwxr----- 1 david users 95 2008-01-06 20:47 Learn_as_spam_cron* I'll attach the log after this post goes through. If you want/need anything else or want me to run more tests let me know. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c3 --- Comment #3 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-15 13:20:23 MST --- Created an attachment (id=190615) --> (https://bugzilla.novell.com/attachment.cgi?id=190615) /var/logmessages Hope this helps! -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c4 --- Comment #4 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-15 21:08:46 MST --- The process that generates the error is fetchmail, but the error is a result of authentication against the UW imap server: /usr/bin/fetchmail -a -v -n --folder Mail/spam-learn --ssl --sslcert imap.key.pub fetchmail: 6.3.8 querying mail.******.com (protocol IMAP) at Tue 15 Jan 2008 09:55:20 PM CST: poll started Trying to connect to 192.168.7.15/993...connected. fetchmail: Issuer Organization: RankinLaw fetchmail: Issuer CommonName: mail.******.com fetchmail: Server CommonName: mail.******.com fetchmail: mail.rbpllc.com key fingerprint: BD:9C:2D:7A:DB:<snipped> fetchmail: Server certificate verification error: self signed certificate ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ fetchmail: IMAP< * OK [CAPABILITY IMAP4REV1 LITERAL+ SASL-IR LOGIN-REFERRALS AUTH=PLAIN AUTH=LOGIN] bonza IMAP4rev1 2006c.374 at Tue, 15 Jan 2008 21:55:20 -0600 (CST) fetchmail: IMAP> A0001 CAPABILITY fetchmail: IMAP< * CAPABILITY IMAP4REV1 LITERAL+ IDLE UIDPLUS NAMESPACE MAILBOX-REFERRALS BINARY UNSELECT SCAN SORT THREAD=REFERENCES THREAD=ORDEREDSUBJECT MULTIAPPEND SASL-IR LOGIN-REFERRALS AUTH=PLAIN AUTH=LOGIN fetchmail: IMAP< A0001 OK CAPABILITY completed fetchmail: IMAP> A0002 LOGIN "david" * fetchmail: IMAP< A0002 OK [CAPABILITY IMAP4REV1 LITERAL+ IDLE UIDPLUS NAMESPACE MAILBOX-REFERRALS BINARY UNSELECT SCAN SORT THREAD=REFERENCES THREAD=ORDEREDSUBJECT MULTIAPPEND] User david authenticated fetchmail: IMAP> A0003 SELECT "Mail/spam-learn" fetchmail: IMAP< * 0 EXISTS fetchmail: IMAP< * 0 RECENT fetchmail: IMAP< * OK [UIDVALIDITY 1168962957] UID validity status fetchmail: IMAP< * OK [UIDNEXT 3238] Predicted next UID fetchmail: IMAP< * FLAGS (Junk NonJunk \Answered \Flagged \Deleted \Draft \Seen) fetchmail: IMAP< * OK [PERMANENTFLAGS (Junk NonJunk \* \Answered \Flagged \Deleted \Draft \Seen)] Permanent flags fetchmail: IMAP< A0003 OK [READ-WRITE] SELECT completed fetchmail: No mail for david at mail.*****.com (folder Mail/spam-learn) fetchmail: IMAP> A0004 LOGOUT fetchmail: IMAP< * BYE bonza.*****.com IMAP4rev1 server terminating connection fetchmail: IMAP< A0004 OK LOGOUT completed fetchmail: 6.3.8 querying mail.******.com (protocol IMAP) at Tue 15 Jan 2008 09:55:20 PM CST: poll completed fetchmail: normal termination, status 1 If run fetchmail with --sslcertck, authentication fails, but no PAM errors are generated: 21:41 bonza~/linux/scripts/test> /usr/bin/fetchmail -a -v -n --folder Mail/spam-learn --ssl --sslcert imap.key.pub --sslcertck fetchmail: 6.3.8 querying mail.*****.com (protocol IMAP) at Tue 15 Jan 2008 09:41:43 PM CST: poll started Trying to connect to 192.168.7.15/993...connected. fetchmail: Issuer Organization: RankinLaw fetchmail: Issuer CommonName: mail.******.com fetchmail: Server CommonName: mail.******.com fetchmail: mail.rbpllc.com key fingerprint: BD:9C:2D:7A:DB:19:0A: fetchmail: Server certificate verification error: self signed certificate 12789:error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed:s3_clnt.c:894: fetchmail: SSL connection failed. fetchmail: socket error while fetching from david@mail.*****.com fetchmail: 6.3.8 querying mail.*****.com (protocol IMAP) at Tue 15 Jan 2008 09:41:43 PM CST: poll completed fetchmail: Query status=2 (SOCKET) fetchmail: normal termination, status 2 This tells us what process is causing the imapd PAM error, but does not answer why it causes kernel errors or a breakdown in the kernel protections. Something in the interchange is violating somebody else's space. It is repeatable 100% of the time. Just run fetchmail against a UW imap setup with a self-signed certificate. After fetchmail is run, subsequent imap mail server access by any mail client will then generate the imapd PAM when the folders are accessed. These may be a few more clues to help. I hope. Again, if you need more information, just ask. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c5 --- Comment #5 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-16 09:18:53 MST --- Damn!, I hate being wrong. fetchmail may be "one of the causes" for the imap PAM error, but it is *not* the "sole cause" of the problem. For the previous post, I disabled all calls to fetchmail on the system and rebooted and watched the logs for a while and had no errors. I check the logs again this morning and they are flooded with imapd PAM errors. AARGH! Seriously, I've attached another log that covers the time from reboot last night to present. Hope this help! Again, if there is anything I can do, just ask. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c6 --- Comment #6 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-16 09:22:15 MST --- Created an attachment (id=190730) --> (https://bugzilla.novell.com/attachment.cgi?id=190730) syslog from reboot 15/01/08 w/fetchmail disabled After identifying fetchmail as one of the causes of the imapd PAM error, fetchmail was disabled (all scripts removed that call it) and the system was rebooted. 100s of imapd PAM error still occur. This is the syslog from reboot on 15/01/08 until resent. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c7 --- Comment #7 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-16 09:36:59 MST --- Guys, I may have had another stroke of genius! (or confusion from an actual stroke), but could this be related to the fact I am running a bios raid? For some strange reason, it looks like smartd is triggering the imapd PAM errors in the latest logs, could the nvidia BIOS raid be causing the PAM error as it attempts to mirror the imapd transactions regardless of what process actually created the imapd action? Just another SWAG (swinging wild ass guess) with a bit of logic applied or totally misapplied as the case may be. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c8 --- Comment #8 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-26 00:55:35 MST --- Guys, I have captured another crash and trace after the sa-lean process was called. Hopefully this little additional information will help. Here is the information: Jan 25 22:00:01 bonza /usr/sbin/cron[12116]: (david) CMD (/home/david/linux/scripts/Learn_as_spam_cron) Jan 25 22:00:07 bonza syslog-ng[2830]: SIGHUP received, restarting syslog-ng Jan 25 22:00:08 bonza syslog-ng[2830]: new configuration initialized Jan 25 22:00:57 bonza su: (to nobody) root on none Jan 25 22:01:08 bonza syslog-ng[2830]: last message repeated 6 times Jan 25 22:02:00 bonza kernel: BUG: unable to handle kernel paging request at virtual address 00009100 Jan 25 22:02:00 bonza kernel: printing eip: Jan 25 22:02:00 bonza kernel: c017e889 Jan 25 22:02:00 bonza kernel: *pde = 00000000 Jan 25 22:02:00 bonza kernel: Oops: 0000 [#1] Jan 25 22:02:00 bonza kernel: SMP Jan 25 22:02:00 bonza kernel: last sysfs file: /class/input/input4/event4/dev Jan 25 22:02:00 bonza kernel: Modules linked in: joydev nls_utf8 cifs af_packet iptable_filter ip_tables ip6_tables x_tables apparmor loop 8139cp 8139too mii button rtc_cmos rtc_core rtc_lib nvidia_agp agpgart i2c_nforce2 i2c_core shpchp pci_hotplug sr_mod cdrom sg dm_mirror dm_log sd_mod ehci_hcd ohci_hcd usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan sata_nv pata_amd libata scsi_mod thermal processor Jan 25 22:02:00 bonza kernel: CPU: 0 Jan 25 22:02:00 bonza kernel: EIP: 0060:[<c017e889>] Tainted: G N VLI Jan 25 22:02:00 bonza kernel: EFLAGS: 00010206 (2.6.22.13-0.3-default #1) Jan 25 22:02:00 bonza kernel: EIP is at __d_lookup+0xb9/0xd9 Jan 25 22:02:00 bonza kernel: eax: d8973cb5 ebx: d4cdfbe8 ecx: 00000011 edx: c2019180 Jan 25 22:02:00 bonza kernel: esi: c04b36bc edi: 00009100 ebp: c04d825c esp: c2ca3db8 Jan 25 22:02:00 bonza kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Jan 25 22:02:00 bonza kernel: Process find (pid: 12436, ti=c2ca2000 task=c4096570 task.ti=c2ca2000) Jan 25 22:02:00 bonza kernel: Stack: c2ca3e3c 0000000a d8973cb5 c9ce3000 d8973cb5 c04b36bc c9ce3000 c2ca3f04 Jan 25 22:02:00 bonza kernel: c0175f80 c2ca3e48 c2ca3e3c c2ca3f04 dfff1ca0 d8973cb5 c04b36bc c9ce3000 Jan 25 22:02:00 bonza kernel: c2ca3f04 c0177eb5 d34020ac 0000000c c9ce300a dfff1d00 00000000 db1268f0 Jan 25 22:02:00 bonza kernel: Call Trace: Jan 25 22:02:00 bonza kernel: [<c0175f80>] do_lookup+0x24/0x140 Jan 25 22:02:00 bonza kernel: [<c0177eb5>] __link_path_walk+0x7c9/0xc26 Jan 25 22:02:00 bonza kernel: [<c018237d>] mntput_no_expire+0x13/0x6c Jan 25 22:02:00 bonza kernel: [<c01783bb>] link_path_walk+0xa9/0xb3 Jan 25 22:02:00 bonza kernel: [<c0178356>] link_path_walk+0x44/0xb3 Jan 25 22:02:00 bonza kernel: [<c0178644>] do_path_lookup+0x168/0x1cc Jan 25 22:02:00 bonza kernel: [<c0177413>] getname+0x59/0xab Jan 25 22:02:00 bonza kernel: [<c0178e29>] __user_walk_fd+0x2f/0x40 Jan 25 22:02:00 bonza kernel: [<c01729b1>] vfs_lstat_fd+0x16/0x3d Jan 25 22:02:00 bonza kernel: [<c0172a1d>] sys_lstat64+0xf/0x23 Jan 25 22:02:00 bonza kernel: [<c0104e22>] sysenter_past_esp+0x6b/0xa9 Jan 25 22:02:00 bonza kernel: ======================= Jan 25 22:02:00 bonza kernel: Code: 54 24 0c e8 26 08 05 00 85 c0 75 14 31 d2 f6 43 04 10 75 05 90 ff 03 89 da b0 01 86 43 08 eb 21 b0 01 86 43 08 8b 3f 85 ff 74 14 <8b> 07 0f 18 00 90 8d 5f f0 8b 44 24 08 39 43 1c 75 e8 eb 84 31 Jan 25 22:02:00 bonza kernel: EIP: [<c017e889>] __d_lookup+0xb9/0xd9 SS:ESP 0068:c2ca3db8 Jan 25 22:02:00 bonza kernel: klogd 1.4.1, ---------- state change ---------- -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c9 --- Comment #9 from David Rankin <drankinatty@suddenlinkmail.com> 2008-01-31 07:34:41 MST --- Here are a few more log entries. Hopefully they will add another piece to the puzzle: Jan 31 08:13:05 bonza syslog-ng[2748]: STATS: dropped 0 Jan 31 08:13:30 bonza kernel: VM: killing process sa-learn Jan 31 08:13:30 bonza kernel: swap_free: Unused swap offset entry 00000040 This machine never swaps: 08:26 bonza~> free -tm total used free shared buffers cached Mem: 2027 1790 237 0 482 804 -/+ buffers/cache: 503 1524 Swap: 2125 0 2125 Total: 4153 1790 2362 So why did the kernel have to free an unused swap offset after killing sa-learn? Let me know what else I can send you. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 Greg Kroah-Hartman <gregkh@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|kernel-maintainers@forge.provo.novell.com |jeffm@novell.com -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #190615|text/plain |application/x-bzip2 mime type| | -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c10 --- Comment #10 from David Rankin <drankinatty@suddenlinkmail.com> 2008-02-11 10:37:33 MST --- Guys, This may be a pure kernel/Gigabyte issue as opposed to a sa-learn only issue. The sa-learn problem looks like it may the the result of a larger issue concerning kernel stability/instability on the gigabyte platform. I have experienced several additional meltdowns. Some the logs are blank. Some the logs contain useful information. The latest, yesterday involved the simple routine daily mandb process. The following was collected: Feb 10 04:30:09 bonza kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044 Feb 10 04:30:09 bonza kernel: printing eip: Feb 10 04:30:09 bonza kernel: c02c6157 Feb 10 04:30:09 bonza kernel: *pde = 00000000 Feb 10 04:30:09 bonza kernel: Oops: 0002 [#2] Feb 10 04:30:09 bonza kernel: SMP Feb 10 04:30:09 bonza kernel: last sysfs file: /devices/platform/i2c-adapter/i2c-9191/9191-0290/name Feb 10 04:30:09 bonza kernel: Modules linked in: af_packet it87 hwmon_vid hwmon i2c_isa iptable_filter ip_tables ip6_tables x_tables apparmor loop 8139cp 8139too mii button sr_mod cdrom shpchp i2c_nforce2 rtc_cmos pci_hotplug rtc_core nvidia_agp rtc_lib i2c_core agpgart sg dm_mirror dm_log sd_mod ehci_hcd ohci_hcd usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan sata_nv pata_amd libata scsi_mod thermal processor Feb 10 04:30:09 bonza kernel: CPU: 0 Feb 10 04:30:09 bonza kernel: EIP: 0060:[<c02c6157>] Tainted: G N VLI Feb 10 04:30:09 bonza kernel: EFLAGS: 00010286 (2.6.22.16-0.2-default #1) Feb 10 04:30:09 bonza kernel: EIP is at _spin_lock+0x1/0xf Feb 10 04:30:09 bonza kernel: eax: 00000044 ebx: d4cdf978 ecx: d4cdf888 edx: 00000001 Feb 10 04:30:09 bonza kernel: esi: 00000000 edi: 00000039 ebp: d4cdf930 esp: d56dfd84 Feb 10 04:30:09 bonza kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Feb 10 04:30:09 bonza kernel: Process mandb (pid: 3975, ti=d56de000 task=f7f79570 task.ti=d56de000) Feb 10 04:30:09 bonza kernel: Stack: c018eee3 00000282 00000246 00000000 dfafba40 dfb77e00 d4cdf978 d4cdf888 Feb 10 04:30:09 bonza kernel: 00000000 00000039 00000039 c0181a80 00000080 d4cdfc68 d4ccbe94 00027fd8 Feb 10 04:30:09 bonza kernel: dfffeaa0 0000008e 000280d2 c015bdfc 00003c90 00001e48 00000000 013fec00 Feb 10 04:30:09 bonza kernel: Call Trace: Feb 10 04:30:09 bonza kernel: [<c018eee3>] remove_inode_buffers+0x3b/0xc8 Feb 10 04:30:09 bonza kernel: [<c0181a80>] shrink_icache_memory+0xc5/0x1b6 Feb 10 04:30:09 bonza kernel: [<c015bdfc>] shrink_slab+0xde/0x13b Feb 10 04:30:09 bonza kernel: [<c015c6e4>] try_to_free_pages+0x173/0x247 Feb 10 04:30:09 bonza kernel: [<c015833e>] __alloc_pages+0x1a1/0x2d6 Feb 10 04:30:09 bonza kernel: [<c0163392>] find_mergeable_anon_vma+0x5e/0xb2 Feb 10 04:30:09 bonza kernel: [<c0160646>] __handle_mm_fault+0x482/0xa83 Feb 10 04:30:09 bonza kernel: [<c0170c32>] do_sync_read+0x0/0x10a Feb 10 04:30:09 bonza kernel: [<c01632a8>] arch_get_unmapped_area_topdown+0x9b/0x127 Feb 10 04:30:09 bonza kernel: [<c0162c52>] vma_merge+0x168/0x178 Feb 10 04:30:09 bonza kernel: [<c02c78fa>] do_page_fault+0x26c/0x5a6 Feb 10 04:30:09 bonza kernel: [<c02c768e>] do_page_fault+0x0/0x5a6 Feb 10 04:30:09 bonza kernel: [<c02c6492>] error_code+0x72/0x80 Feb 10 04:30:09 bonza kernel: ======================= Feb 10 04:30:09 bonza kernel: Code: 05 90 ff 00 30 d2 89 d0 c3 89 c2 90 81 28 00 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 90 81 02 00 00 00 01 30 c9 89 c8 c3 90 <fe> 08 79 09 f3 90 80 38 00 7e f9 eb f2 c3 90 81 28 00 00 00 01 Feb 10 04:30:09 bonza kernel: EIP: [<c02c6157>] _spin_lock+0x1/0xf SS:ESP 0068:d56dfd84 Feb 10 04:30:09 bonza su: (to nobody) root on none Feb 10 04:30:35 bonza syslog-ng[2748]: last message repeated 3 times Let me know what else I can provide. I am not the only user affected by instability on Gigabyte boards. Per Jessen on the openSuSE list is likewise seeing the exact same issues. I am more than happy to provide whatever you need to see if we can solve this. This has happened regardless of the kernel version installed. Currently we have: 11:35 bonza~> uname -r 2.6.22.16-0.2-default Also, if you need/would like direct access to the box, contact me privately and I will be glad to arrange it. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c11 --- Comment #11 from David Rankin <drankinatty@suddenlinkmail.com> 2008-02-11 11:17:12 MST --- Here is a bit more information: Since this has occurred with both mandriva and opensuse, on different power supplies and with different ram and hardrives, it is pretty much limited to being a gigabyte motherboard problem. The relevant information is: Gigabyte GA-N700SL 2 Gig of Ram Raid1: 2 Seagate (ST3250410AS_6RY28WW1) SATA 250G drives in Raid 1 (md raid) ATA: Maxtor_5T020H2_T2H34J2 as swap This occurred without the raid as well so that isn't the culprit. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User jeffm@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c12 Jeff Mahoney <jeffm@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO Info Provider| |drankinatty@suddenlinkmail.com --- Comment #12 from Jeff Mahoney <jeffm@novell.com> 2008-02-20 13:18:04 MST --- The Oops in the initial description was caused by buffer_mapping (inode->i_data.assoc_mapping) being NULL in remove_inode_buffers(). spin_lock hits it and derefs the NULL pointer to find the lock. The Oops in comment #3 was caused by the buffer head being NULL, and it's accessing bh->b_data (0x19 = 25). NULL in this case likely means 0x1. I haven't decoded the others because I'm not really sure how we should proceed here. You're getting Oopses all over the kernel in *core* data paths. These paths are well tested, and more importantly used by every Linux user every day. I know you've said you've tried using different memory, etc - but I'd stil like to see the results of a memtest run. Also, is this system overclocked? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c13 --- Comment #13 from David Rankin <drankinatty@suddenlinkmail.com> 2008-02-20 21:48:50 MST --- Jeff, The system is "bone stock" no overclocking whatsoever. I run all my servers very conservatively. I will install and give a full memory test just to make triple sure. Also, the system, and all of the meltdowns seem to be associated with cron running. The last two meltdowns have been the direct result of the cron.daily script being run. I have set the cron.daily run time to 12:15 (lunch) just to monitor this behavior. The result of the meltdown today is provided below. Even though all of these data paths are well tested, there seems to be a problem with their operation on this Gigabyte board. I'll run memtest, but it seems unlikely that I could have gotten 3 bad sets of memory. Here is the latest explosion: Feb 20 12:15:07 bonza syslog-ng[2858]: STATS: dropped 0 Feb 20 12:15:38 bonza kernel: BUG: unable to handle kernel paging request at virtual address 03188010 Feb 20 12:15:38 bonza kernel: printing eip: Feb 20 12:15:38 bonza kernel: c01890c1 Feb 20 12:15:38 bonza kernel: *pde = 00000000 Feb 20 12:15:38 bonza kernel: Oops: 0000 [#1] Feb 20 12:15:38 bonza kernel: SMP Feb 20 12:15:38 bonza kernel: last sysfs file: /devices/platform/i2c-adapter/i2c-9191/9191-0290/name Feb 20 12:15:38 bonza kernel: Modules linked in: af_packet it87 hwmon_vid hwmon i2c_isa iptable_filter ip_tables ip6_tables x_tables apparmor loop 8139cp 8139too mii button snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd ns558 parport_pc parport gameport soundcore nvidia_agp shpchp snd_page_alloc agpgart rtc_cmos pci_hotplug i2c_nforce2 rtc_core sr_mod rtc_lib cdrom i2c_core sg dm_mirror dm_log sd_mod ehci_hcd ohci_hcd usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan sata_nv pata_amd libata scsi_mod thermal processor Feb 20 12:15:38 bonza kernel: CPU: 0 Feb 20 12:15:38 bonza kernel: EIP: 0060:[<c01890c1>] Tainted: G N VLI Feb 20 12:15:38 bonza kernel: EFLAGS: 00010202 (2.6.22.17-0.1-default #1) Feb 20 12:15:38 bonza kernel: EIP is at __writeback_single_inode+0x1af/0x31c Feb 20 12:15:38 bonza kernel: eax: 03188000 ebx: 00000001 ecx: 00000000 edx: c01589ec Feb 20 12:15:38 bonza kernel: esi: d4cdfe4c edi: c21a7600 ebp: d4cdfef4 esp: dfb0def0 Feb 20 12:15:38 bonza kernel: ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Feb 20 12:15:38 bonza kernel: Process pdflush (pid: 182, ti=dfb0c000 task=dff5aab0 task.ti=dfb0c000) Feb 20 12:15:38 bonza kernel: Stack: 00000002 df8369a0 df8369a0 00000000 dfb0df74 00000000 00000000 00000000 Feb 20 12:15:38 bonza kernel: 00000002 f8c7483e df8d7ec0 df8d7ec0 00000000 00000002 d4cdfe4c dfb78100 Feb 20 12:15:38 bonza kernel: c21a6e00 dfb0df74 c018950f c21a6e7c c21a6e84 02015354 00000000 c21a6e00 Feb 20 12:15:38 bonza kernel: Call Trace: Feb 20 12:15:38 bonza kernel: [<f8c7483e>] dm_table_any_congested+0x33/0x4c [dm_mod] Feb 20 12:15:38 bonza kernel: [<c018950f>] sync_sb_inodes+0x16a/0x225 Feb 20 12:15:38 bonza kernel: [<c018990c>] writeback_inodes+0x6a/0xb2 Feb 20 12:15:39 bonza kernel: [<c0159445>] wb_kupdate+0x83/0xe5 Feb 20 12:15:39 bonza kernel: [<c0159825>] pdflush+0x123/0x1c2 Feb 20 12:15:39 bonza kernel: [<c01593c2>] wb_kupdate+0x0/0xe5 Feb 20 12:15:39 bonza kernel: [<c0159702>] pdflush+0x0/0x1c2 Feb 20 12:15:39 bonza kernel: [<c01350ee>] kthread+0x38/0x5e Feb 20 12:15:39 bonza kernel: [<c01350b6>] kthread+0x0/0x5e Feb 20 12:15:39 bonza kernel: [<c0106117>] kernel_thread_helper+0x7/0x10 Feb 20 12:15:39 bonza kernel: ======================= Feb 20 12:15:39 bonza kernel: Code: 89 86 38 01 00 00 b0 01 86 05 84 eb 35 c0 8b 54 24 10 89 e8 e8 77 fd fc ff 80 e3 03 89 44 24 14 74 3b 8b 86 9c 00 00 00 8b 40 20 <83> 78 10 00 74 1f 89 f0 e8 ff 94 ff ff 85 c0 75 14 8b 86 9c 00 Feb 20 12:15:39 bonza kernel: EIP: [<c01890c1>] __writeback_single_inode+0x1af/0x31c SS:ESP 0068:dfb0def0 Feb 20 12:15:39 bonza kernel: klogd 1.4.1, ---------- state change ---------- Feb 20 12:15:48 bonza kernel: BUG: unable to handle kernel paging request at virtual address 0318800c Feb 20 12:15:48 bonza kernel: printing eip: Feb 20 12:15:48 bonza kernel: c0189754 Feb 20 12:15:48 bonza kernel: *pde = 00000000 Feb 20 12:15:48 bonza kernel: Oops: 0000 [#2] Feb 20 12:15:48 bonza kernel: SMP Feb 20 12:15:48 bonza kernel: last sysfs file: /devices/platform/i2c-adapter/i2c-9191/9191-0290/name Feb 20 12:15:48 bonza kernel: Modules linked in: af_packet it87 hwmon_vid hwmon i2c_isa iptable_filter ip_tables ip6_tables x_tables apparmor loop 8139cp 8139too mii button snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd ns558 parport_pc parport gameport soundcore nvidia_agp shpchp snd_page_alloc agpgart rtc_cmos pci_hotplug i2c_nforce2 rtc_core sr_mod rtc_lib cdrom i2c_core sg dm_mirror dm_log sd_mod ehci_hcd ohci_hcd usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan sata_nv pata_amd libata scsi_mod thermal processor Feb 20 12:15:48 bonza kernel: CPU: 0 Feb 20 12:15:48 bonza kernel: EIP: 0060:[<c0189754>] Tainted: G N VLI Feb 20 12:15:48 bonza kernel: EFLAGS: 00010202 (2.6.22.17-0.1-default #1) Feb 20 12:15:48 bonza kernel: EIP is at __mark_inode_dirty+0x1d/0x16b Feb 20 12:15:48 bonza kernel: eax: 03188000 ebx: 00000001 ecx: 00000000 edx: 00000001 Feb 20 12:15:48 bonza kernel: esi: d4cdfe4c edi: 00001000 ebp: d4cdfe4c esp: ed5bdf0c Feb 20 12:15:48 bonza kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Feb 20 12:15:48 bonza kernel: Process mandb (pid: 17139, ti=ed5bc000 task=ef319ab0 task.ti=ed5bc000) Feb 20 12:15:48 bonza kernel: Stack: d4cdfe4c 223215a6 47bc6e53 c21a7600 c0128f0a d4cdfe4c c21a7600 00001000 Feb 20 12:15:48 bonza kernel: 00000000 d4cdfe4c 00001000 ed5bc000 c017394f ed5bdf44 d4d05534 dfff18a0 Feb 20 12:15:48 bonza kernel: 00000000 bfb71228 083ab5a4 00000000 00000001 00000000 000e2ae9 00000000 Feb 20 12:15:48 bonza kernel: Call Trace: Feb 20 12:15:48 bonza kernel: [<c0128f0a>] current_fs_time+0x41/0x46 Feb 20 12:15:48 bonza kernel: [<c017394f>] sys_readlinkat+0x76/0xa4 Feb 20 12:15:48 bonza kernel: [<c01739a4>] sys_readlink+0x27/0x2b Feb 20 12:15:48 bonza kernel: [<c0104e22>] sysenter_past_esp+0x6b/0xa9 Feb 20 12:15:48 bonza kernel: ======================= Feb 20 12:15:48 bonza kernel: Code: f7 ff ff b2 01 86 15 84 eb 35 c0 5b 5e c3 55 89 c5 57 56 53 89 d3 83 ec 20 8b 80 9c 00 00 00 f6 c2 03 89 44 24 18 74 0e 8b 40 20 <8b> 50 0c 85 d2 74 04 89 e8 ff d2 f0 83 04 24 00 89 d8 23 85 38 Feb 20 12:15:48 bonza kernel: EIP: [<c0189754>] __mark_inode_dirty+0x1d/0x16b SS:ESP 0068:ed5bdf0c Feb 20 12:15:48 bonza su: (to nobody) root on none Feb 20 12:16:34 bonza syslog-ng[2858]: last message repeated 6 times (then it dies) -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c14 David Rankin <drankinatty@suddenlinkmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |NEW Info Provider|drankinatty@suddenlinkmail.com | --- Comment #14 from David Rankin <drankinatty@suddenlinkmail.com> 2008-02-27 22:39:45 MST --- Sorry, forgot the check the button to turn off the need status. Also, I'm beginning to suspect a mother board or memory problem. I'm looking into that. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=353252 User drankinatty@suddenlinkmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=353252#c15 David Rankin <drankinatty@suddenlinkmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED --- Comment #15 from David Rankin <drankinatty@suddenlinkmail.com> 2008-03-03 17:32:55 MST --- You can close this bug. Apparently there was a spurious memory address in my ocz platinum RAM that was the culprit. It only appeared under certain loads. I replaced the ram 4 days ago and it is stable. It is hard to believe this was the problem after memtesting the ram less that 3 months ago. This is the first ram I have had just "go bad" and this wasn't cheepo ram either. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com