[Bug 970249] New: Kernel (evergreen) 3.12.53 + adaptec 6805 sas hba adapter = backtrace
http://bugzilla.opensuse.org/show_bug.cgi?id=970249

            Bug ID: 970249
           Summary: Kernel (evergreen) 3.12.53 + adaptec 6805 sas hba adapter = backtrace
    Classification: openSUSE
           Product: openSUSE 13.1
           Version: Final
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: bruno@ioda-net.ch
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

As discussed on the mailing list during the Update Staging process, I'm opening this bug for reference.

For months I have been using the kernel prepared at
http://download.opensuse.org/home:/mkubecek:/evergreen-13.1/openSUSE_13.1/
on the following machines:

AMD FX8350 + nvidia blob as desktop (high 3d usage / kde 4.x)
AMD FX8350 as server with adaptec raid 8805
AMD Opteron(tm) Processor 2431 + adaptec raid 5805, 8x 1 TB nearline raid6
Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz + LSI MegaRaid 1 (2x400G intel ssd + 2x 2 TB HSGD)
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz + 8x 2T Seagate raid soft mdadm (web/mail/postgresql/kvm)
AMD Athlon(tm) II X2 250 Processor (high throughput firewall)
AMD A10-5800K APU with Radeon(tm) HD Graphics + adaptec 5800 with 8x 2 TB WDC (raid6) + 8x 4 TB Seagate (raid6 soft) + 10Gbps network intel card (heavy io : backup)

But on two machines that have an adaptec raid controller 6805, I'm getting a kernel backtrace; see the attachment for the captured screen (sorry, no remote serial log).

On one of them I have this history of installed kernels (for a while I was using kernel:standard before switching back to evergreen):

2014-04-17 22:24:57|kernel-default|3.11.6-4.1|x86_64|root@clochette.disney.interne|openSUSE-13.1-1.10
2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
2014-04-18 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
2014-05-06 18:17:35|kernel-default|3.14.2-1.1.g1474ea5|x86_64||kernel-stable
2014-05-20 18:30:49|kernel-default|3.11.10-11.1|x86_64||updates
2014-05-20 18:35:06|kernel-default|3.14.4-1.1.gbebeb6f|x86_64||kernel-stable
2014-06-06 17:15:46|kernel-default|3.14.4-2.1.g0de0f93|x86_64||kernel-stable
2014-07-01 17:38:11|kernel-default|3.15.2-1.1.gfb7c781|x86_64||kernel-stable
2014-07-01 17:41:17|kernel-default|3.11.10-17.2|x86_64||updates
2014-07-10 18:47:49|kernel-default|3.15.4-1.1.g2b59ae6|x86_64||kernel-stable
2014-07-29 18:28:29|kernel-default|3.15.6-2.1.gedc5ddf|x86_64||kernel-stable
2014-08-01 17:21:02|kernel-default|3.15.7-1.1.g972d9a6|x86_64||kernel-stable
2014-08-13 19:27:38|kernel-default|3.11.10-21.1|x86_64||updates
2014-08-13 19:28:49|kernel-default|3.15.8-2.1.g258e3b0|x86_64||kernel-stable
2014-09-12 17:56:36|kernel-default|3.16.2-1.1.gdcee397|x86_64||kernel-stable
2014-09-30 11:15:49|kernel-default|3.16.3-1.1.gd2bbe7f|x86_64||kernel-stable
2014-10-17 18:07:46|kernel-default|3.17.0-1.1.gc467423|x86_64||kernel-stable
2014-12-03 17:53:49|kernel-default|3.17.4-2.1.g2d23787|x86_64||kernel-stable
2015-01-06 17:54:51|kernel-default|3.18.1-1.1.g5f2f35e|x86_64|root@clochette|kernel-stable
2015-01-20 17:59:23|kernel-default|3.18.2-2.1.g88366a3|x86_64||kernel-stable
2015-02-03 17:44:26|kernel-default|3.18.5-1.1.gf378da4|x86_64||kernel-stable
2015-03-03 17:57:05|kernel-default|3.19.0-4.1.g7f0e735|x86_64||kernel-stable
2015-03-11 18:07:01|kernel-default|3.19.1-2.1.gc0946e9|x86_64||kernel-stable
2015-03-21 09:05:51|kernel-default|3.19.2-1.1.gf2f9797|x86_64||kernel-stable
2015-04-02 17:00:19|kernel-default|3.19.3-1.1.gf10e7fc|x86_64||kernel-stable
2015-04-17 19:04:27|kernel-default|3.19.4-1.1.g74c332b|x86_64||kernel-stable
2015-05-13 18:15:53|kernel-default|4.0.2-1.1.ga425d38|x86_64||kernel-stable
2015-06-02 18:18:37|kernel-default|4.0.4-4.1.gad54361|x86_64||kernel-stable
2015-06-16 18:13:00|kernel-default|4.0.5-2.1.g0e899eb|x86_64||kernel-stable
2015-07-14 17:59:47|kernel-default|4.1.1-2.1.gcac28b3|x86_64||kernel-stable
2015-07-29 13:56:16|kernel-default|4.1.3-5.1.ga0f869c|x86_64||kernel-stable
2015-08-11 11:26:26|kernel-default|4.1.4-1.1.ga37e14f|x86_64||kernel-stable
2015-08-15 10:53:25|kernel-default|4.1.5-2.1.g83fbd4e|x86_64||kernel-stable
2016-02-02 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
2016-02-17 18:45:43|kernel-default|3.12.51-2.1|x86_64|root@clochette|kernel-evergreen
2016-02-17 19:37:15|kernel-default|3.11.10-34.2|x86_64|root@sysresccd|updates
2016-03-01 17:52:16|kernel-default|3.12.53-1.1|x86_64||kernel-evergreen

The highest version used was 4.4.0, and the first working version above 3.11 was 3.14.1.

This is how the arcconf tools see the controller and system on a pure 3.11.10-34-default:

--------------------------------------------------------
Controller Version Information 6805
--------------------------------------------------------
BIOS       : 5.2-0 (19147)
Firmware   : 5.2-0 (19147)
Driver     : 1.2-0 (30200)
Boot Flash : 5.2-0 (19147)

On another one, which has a different controller but is working with 3.12.53:

--------------------------------------------------------
Controller Version Information 5805
--------------------------------------------------------
BIOS       : 5.2-0 (18948)
Firmware   : 5.2-0 (18948)
Driver     : 1.2-1 (40709)
Boot Flash : 5.2-0 (18948)

We saw the driver get an update from 1.2-0 to 1.2-1.

I'm not really an expert in this area but it looks like an IRQ is received and handled before all the device data structures are set up properly (a pointer which is still null is dereferenced).
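
The ordering problem described above can be illustrated with a small user-space sketch. This is not the aacraid code: the `Controller` class, `irq_handler` and the two `probe_*` functions are hypothetical names invented for the illustration, and `None` stands in for the NULL pointer. It only shows why registering a handler before the data it uses exists can fail, and that reordering the two steps avoids it.

```python
class Controller:
    """Hypothetical stand-in for a driver's per-device state."""

    def __init__(self):
        self.queue = None      # set up late, like a pointer that starts out NULL
        self.handler = None    # the registered interrupt handler, if any

    def register_irq(self, handler):
        self.handler = handler

    def setup_queue(self):
        self.queue = []

    def fire_irq(self, event):
        # Simulates the hardware raising an interrupt at an arbitrary moment.
        if self.handler is not None:
            self.handler(self, event)


def irq_handler(dev, event):
    # Uses dev.queue without checking it; fails if setup has not run yet.
    dev.queue.append(event)


def probe_buggy(dev):
    dev.register_irq(irq_handler)   # handler is live from here on
    dev.fire_irq("early")           # interrupt arrives before setup completes
    dev.setup_queue()


def probe_fixed(dev):
    dev.setup_queue()               # finish initialisation first
    dev.register_irq(irq_handler)   # only then accept interrupts
    dev.fire_irq("early")


def crashes(probe):
    """Return True if the given probe order hits the uninitialised queue."""
    try:
        probe(Controller())
    except AttributeError:
        return True
    return False
```

In this sketch `crashes(probe_buggy)` reports the failure and `crashes(probe_fixed)` does not; whether the real driver problem has this exact shape is only the guess made in the comment above.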
The funniest part is that, in the list of systems, 3 of them share almost every hardware piece:

same motherboard: Asus CROSSHAIR V FORMULA-Z, BIOS 2101 04/17/2014
same ram: TridentX - F3-2400C10D-8GTX - G.SKILL DDR3 Memory x4
same cpu: AMD FX(tm)-8350 Eight-Core Processor

The main differences are:
one has a 8805 and intel PT1000 + nvidia GeForce GTX 560 (with nvidia blob) (working)
and the two failing have a 6805 + Intel 10-Gigabit X540-AT2 + Nvidia GT218 (pci-e 1x) with nouveau

As the crash message really involves aacraid, that's how I deduced that the 6800 is the culprit in the stack.

We updated the firmware on the controller to the latest available at pmc/adaptec and currently have this running:

Controllers Found: 2
----------------------------------------------------------------------
Controller Information
----------------------------------------------------------------------
Controller Status                        : OK
Channel Description                      : SAS/SATA
Controller Model                         : Adaptec 6805
Controller World Wide Name               : 50000D1104872180
Controller Alarm                         : Enabled
Temperature                              : 52 C/ 125 F (Normal)
Installed Memory                         : 512 MB
Global task priority                     : Low
Performance Mode                         : Default/Dynamic
Host Bus Type                            : PCIe
Host Bus Speed                           : 5000 MHz
Host Bus Link Width                      : 8 bit(s)/link(s)
Stayawake Period                         : Disabled
Spinup limit internal drives             : 0
Spinup limit external drives             : 0
Defunct Disk Drive Count                 : 0
Logical Devices/Failed/Degraded          : 1/0/0
NCQ Status                               : Enabled
Statistics Data Collection Mode          : Enabled
--------------------------------------------------------
RAID Properties
--------------------------------------------------------
Copyback                                 : Disabled
Automatic Failover                       : Enabled
Background consistency check             : Enabled
Background Consistency Check Period      : 30
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                                     : 5.2-0 (19176)
Firmware                                 : 5.2-0 (19176)
Driver                                   : 1.2-0 (30200)
Boot Flash                               : 5.2-0 (19176)
SEEPROM (Load version/ Flash version)    : 2/ 8
--------------------------------------------------------
Controller ZMM Information
--------------------------------------------------------
Status                                   : ZMM Optimal

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c1
--- Comment #1 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c2
--- Comment #2 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c3
--- Comment #3 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c4
--- Comment #4 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c5
--- Comment #5 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c6
Bruno Friedmann
I talked about the issue with our SCSI expert and he thinks he has seen either this or a very similar problem recently, there should already be a fix available.
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c13
--- Comment #13 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c14
--- Comment #14 from Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970249#c15
--- Comment #15 from Bruno Friedmann