Bug ID 970249
Summary Kernel (evergreen) 3.12.53 + adaptec 6805 sas hba adapter = backtrace
Classification openSUSE
Product openSUSE 13.1
Version Final
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-maintainers@forge.provo.novell.com
Reporter bruno@ioda-net.ch
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

As discussed on the mailing list during the Update Staging process,
I'm opening this bug for reference.

For months I have been using the kernel prepared at
http://download.opensuse.org/home:/mkubecek:/evergreen-13.1/openSUSE_13.1/
on these machines:

AMD FX8350 + nvidia blob as desktop (high 3d usage / kde 4.x)
AMD FX8350 as server with adaptec raid 8805
AMD Opteron(tm) Processor 2431 + adaptec raid 5805, 8x 1TB nearline raid6
Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz + LSI MegaRaid 1 (2x400G intel ssd + 2x 2TB HSGD)
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz + 8x 2TB Seagate software raid (mdadm) (web/mail/postgresql/kvm)
AMD Athlon(tm) II X2 250 Processor (high throughput firewall)
AMD A10-5800K APU with Radeon(tm) HD Graphics + adaptec 5800 with 8x 2TB WDC (raid6) + 8x 4TB Seagate (software raid6) + 10Gbps intel network card (heavy io: backup)

But on two machines, both with an Adaptec 6805 RAID controller, I'm getting a
kernel backtrace.
See the attachment for a captured screen (sorry, no remote serial log).

On one of them I have this history of installed kernels (for a while I was
using kernel:standard before switching back to evergreen).

2014-04-17 22:24:57|kernel-default|3.11.6-4.1|x86_64|root@clochette.disney.interne|openSUSE-13.1-1.10
2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
2014-04-18 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
2014-05-06 18:17:35|kernel-default|3.14.2-1.1.g1474ea5|x86_64||kernel-stable
2014-05-20 18:30:49|kernel-default|3.11.10-11.1|x86_64||updates
2014-05-20 18:35:06|kernel-default|3.14.4-1.1.gbebeb6f|x86_64||kernel-stable
2014-06-06 17:15:46|kernel-default|3.14.4-2.1.g0de0f93|x86_64||kernel-stable
2014-07-01 17:38:11|kernel-default|3.15.2-1.1.gfb7c781|x86_64||kernel-stable
2014-07-01 17:41:17|kernel-default|3.11.10-17.2|x86_64||updates
2014-07-10 18:47:49|kernel-default|3.15.4-1.1.g2b59ae6|x86_64||kernel-stable
2014-07-29 18:28:29|kernel-default|3.15.6-2.1.gedc5ddf|x86_64||kernel-stable
2014-08-01 17:21:02|kernel-default|3.15.7-1.1.g972d9a6|x86_64||kernel-stable
2014-08-13 19:27:38|kernel-default|3.11.10-21.1|x86_64||updates
2014-08-13 19:28:49|kernel-default|3.15.8-2.1.g258e3b0|x86_64||kernel-stable
2014-09-12 17:56:36|kernel-default|3.16.2-1.1.gdcee397|x86_64||kernel-stable
2014-09-30 11:15:49|kernel-default|3.16.3-1.1.gd2bbe7f|x86_64||kernel-stable
2014-10-17 18:07:46|kernel-default|3.17.0-1.1.gc467423|x86_64||kernel-stable
2014-12-03 17:53:49|kernel-default|3.17.4-2.1.g2d23787|x86_64||kernel-stable
2015-01-06 17:54:51|kernel-default|3.18.1-1.1.g5f2f35e|x86_64|root@clochette|kernel-stable
2015-01-20 17:59:23|kernel-default|3.18.2-2.1.g88366a3|x86_64||kernel-stable
2015-02-03 17:44:26|kernel-default|3.18.5-1.1.gf378da4|x86_64||kernel-stable
2015-03-03 17:57:05|kernel-default|3.19.0-4.1.g7f0e735|x86_64||kernel-stable
2015-03-11 18:07:01|kernel-default|3.19.1-2.1.gc0946e9|x86_64||kernel-stable
2015-03-21 09:05:51|kernel-default|3.19.2-1.1.gf2f9797|x86_64||kernel-stable
2015-04-02 17:00:19|kernel-default|3.19.3-1.1.gf10e7fc|x86_64||kernel-stable
2015-04-17 19:04:27|kernel-default|3.19.4-1.1.g74c332b|x86_64||kernel-stable
2015-05-13 18:15:53|kernel-default|4.0.2-1.1.ga425d38|x86_64||kernel-stable
2015-06-02 18:18:37|kernel-default|4.0.4-4.1.gad54361|x86_64||kernel-stable
2015-06-16 18:13:00|kernel-default|4.0.5-2.1.g0e899eb|x86_64||kernel-stable
2015-07-14 17:59:47|kernel-default|4.1.1-2.1.gcac28b3|x86_64||kernel-stable
2015-07-29 13:56:16|kernel-default|4.1.3-5.1.ga0f869c|x86_64||kernel-stable
2015-08-11 11:26:26|kernel-default|4.1.4-1.1.ga37e14f|x86_64||kernel-stable
2015-08-15 10:53:25|kernel-default|4.1.5-2.1.g83fbd4e|x86_64||kernel-stable
2016-02-02 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
2016-02-17 18:45:43|kernel-default|3.12.51-2.1|x86_64|root@clochette|kernel-evergreen
2016-02-17 19:37:15|kernel-default|3.11.10-34.2|x86_64|root@sysresccd|updates
2016-03-01 17:52:16|kernel-default|3.12.53-1.1|x86_64||kernel-evergreen

The highest version used was 4.4.0, and the first working version after 3.11 was 3.14.1.
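
For anyone who wants to check these two figures against a longer history, here is a minimal Python sketch. It assumes the pipe-separated layout shown above (timestamp, package, version, arch, requester, repository) with each entry on a single line. The names `HISTORY` and `parse_history` are made up for this illustration, and only a few of the entries above are copied in. The script cannot tell whether a kernel actually worked; it only orders the versions.

```python
# A few entries copied from the history above, one per line.
HISTORY = """\
2014-04-17 22:24:57|kernel-default|3.11.6-4.1|x86_64|root@clochette.disney.interne|openSUSE-13.1-1.10
2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
2014-04-18 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
2016-02-02 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
2016-03-01 17:52:16|kernel-default|3.12.53-1.1|x86_64||kernel-evergreen
"""


def parse_history(text):
    """Split each history line into its fields (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        parts = line.split("|")
        if len(parts) != 6:
            continue  # skip blank or wrapped lines
        stamp, _package, version, _arch, _requester, repo = parts
        # "3.14.1-1.1.geafcebd" -> (3, 14, 1): keep only the upstream part
        upstream = tuple(int(n) for n in version.split("-")[0].split("."))
        entries.append(
            {"date": stamp, "version": version, "upstream": upstream, "repo": repo}
        )
    return entries


entries = parse_history(HISTORY)

# Highest upstream version ever installed.
highest = max(entries, key=lambda e: e["upstream"])

# First entry, in installation order, newer than the 3.11 series.
first_after_311 = next(e for e in entries if e["upstream"][:2] > (3, 11))

print(highest["version"], highest["repo"])
print(first_after_311["version"], first_after_311["repo"])
```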

This is how the arcconf tool sees the controller and system on a plain
3.11.10-34-default kernel:
   --------------------------------------------------------
   Controller Version Information 6805
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (19147)
   Firmware                                 : 5.2-0 (19147)
   Driver                                   : 1.2-0 (30200)
   Boot Flash                               : 5.2-0 (19147)



On another machine, which has a different controller but works with 3.12.53:
   --------------------------------------------------------
   Controller Version Information 5805
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (18948)
   Firmware                                 : 5.2-0 (18948)
   Driver                                   : 1.2-1 (40709)
   Boot Flash                               : 5.2-0 (18948)

We see that the driver was updated from 1.2-0 to 1.2-1.
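
To compare the two listings field by field, a small Python sketch like the following could be used. It only assumes the "name : value" layout of the arcconf output pasted above. `parse_versions` and the two variable names are invented for this example, and the values are copied from the two listings.

```python
def parse_versions(text):
    """Collect 'name : value' pairs from an arcconf-style block (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip separator and heading lines
        key, _, value = line.partition(" : ")
        fields[key.strip()] = value.strip()
    return fields


# 6805 on 3.11.10-34-default, copied from the first listing.
ctrl_6805 = parse_versions("""\
   BIOS                                     : 5.2-0 (19147)
   Firmware                                 : 5.2-0 (19147)
   Driver                                   : 1.2-0 (30200)
   Boot Flash                               : 5.2-0 (19147)
""")

# 5805 on 3.12.53, copied from the second listing.
ctrl_5805 = parse_versions("""\
   BIOS                                     : 5.2-0 (18948)
   Firmware                                 : 5.2-0 (18948)
   Driver                                   : 1.2-1 (40709)
   Boot Flash                               : 5.2-0 (18948)
""")

# Print every field whose value differs between the two controllers.
for name in ctrl_6805:
    if ctrl_6805[name] != ctrl_5805.get(name):
        print(f"{name}: {ctrl_6805[name]} -> {ctrl_5805.get(name)}")
```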


> I'm not really an expert in this area but it looks like an IRQ is 
> received and handled before all the device data structures are set up 
> properly (a pointer which is still null is dereferenced).
> 


The funniest part is that three of the systems in the list share almost every
hardware component:
same motherboard Asus CROSSHAIR V FORMULA-Z, BIOS 2101 04/17/2014
same ram TridentX - F3-2400C10D-8GTX - G.SKILL DDR3 Memory x4
same cpu AMD FX(tm)-8350 Eight-Core Processor
The main difference is that one has an 8805 + intel PT1000 + nvidia GeForce GTX
560 (with nvidia blob) (working)

The two failing ones have a 6805 + Intel 10-Gigabit X540-AT2 + Nvidia GT218
(pci-e 1x) with nouveau.
As the crash message clearly involves aacraid, I deduced that the 6800 is
the culprit in the stack.


We updated the controller firmware to the latest available from pmc/adaptec,
and this is what is currently running:

Controllers Found: 2
----------------------------------------------------------------------
Controller Information
----------------------------------------------------------------------
   Controller Status                        : OK
   Channel Description                      : SAS/SATA
   Controller Model                         : Adaptec 6805
   Controller World Wide Name               : 50000D1104872180
   Controller Alarm                         : Enabled
   Temperature                              : 52 C/ 125 F (Normal)
   Installed Memory                         : 512 MB
   Global task priority                     : Low
   Performance Mode                         : Default/Dynamic
   Host Bus Type                            : PCIe
   Host Bus Speed                           : 5000 MHz
   Host Bus Link Width                      : 8 bit(s)/link(s)
   Stayawake Period                         : Disabled
   Spinup limit internal drives             : 0
   Spinup limit external drives             : 0
   Defunct Disk Drive Count                 : 0
   Logical Devices/Failed/Degraded          : 1/0/0
   NCQ Status                               : Enabled
   Statistics Data Collection Mode          : Enabled
   --------------------------------------------------------
   RAID Properties
   --------------------------------------------------------
   Copyback                                 : Disabled
   Automatic Failover                       : Enabled
   Background consistency check             : Enabled
   Background Consistency Check Period      : 30
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (19176)
   Firmware                                 : 5.2-0 (19176)
   Driver                                   : 1.2-0 (30200)
   Boot Flash                               : 5.2-0 (19176)
   SEEPROM (Load version/ Flash version)    : 2/ 8
   --------------------------------------------------------
   Controller ZMM Information
   --------------------------------------------------------
   Status                                   : ZMM Optimal

