[Bug 666106] New: Oops when MPTSAS raid is in degraded state
https://bugzilla.novell.com/show_bug.cgi?id=666106

https://bugzilla.novell.com/show_bug.cgi?id=666106#c0

Summary: Oops when MPTSAS raid is in degraded state
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: Linux
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: jengelh@medozas.de
QAContact: qa@suse.de
Found By: Beta-Customer
Blocker: ---

Observing a NULL dereference when the kernel is booted on a degraded hardware RAID set, on at least two machines. (A third one last booted in synchronized state and showed no oops.)

# uname -a
Linux xen35 2.6.34.7-0.7-xen #1 SMP 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux

Hardware is a SUN X4100.

[ 0.377835] mptsas 0000:02:03.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[ 0.377994] mptbase: ioc0: Initiating bringup
[ 1.304009] ioc0: LSISAS1064 A3: Capabilities={Initiator}
[ 7.304017] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 2
[ 7.304021] mptbase: ioc0: volume is now degraded, enabled, resync in progress
[ 7.308028] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 7.308396] IP: [<ffffffffa0000212>] scsi_device_lookup+0x22/0x90 [scsi_mod]
[ 7.308667] PGD 2e50bb067 PUD 2e5009067 PMD 0
[ 7.309123] Oops: 0000 [#1] SMP
[ 7.309463] last sysfs file:
[ 7.309587] CPU 0
[ 7.309704] Modules linked in: mptsas(+) mptscsih mptbase scsi_transport_sas scsi_mod
[ 7.310610]
[ 7.310728] Pid: 82, comm: mpt/0 Not tainted 2.6.34.7-0.7-xen #1 Sun Fire X4100 Server/Sun Fire X4100 Server
[ 7.310880] RIP: e030:[<ffffffffa0000212>] [<ffffffffa0000212>] scsi_device_lookup+0x22/0x90 [scsi_mod]
[ 7.311145] RSP: e02b:ffff8802e50b7cf0 EFLAGS: 00010286
[ 7.311277] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000000
[ 7.311417] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[ 7.311557] RBP: ffff8802e50ad5c0 R08: 0000000000000000 R09: 0000000000000001
[ 7.311697] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[ 7.311838] R13: ffff8802e512e000 R14: 0000000000000105 R15: 0000000000000001
[ 7.311981] FS: 00007f12329f4700(0000) GS:ffff88000200f000(0000) knlGS:0000000000000000
[ 7.312004] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7.312004] CR2: 0000000000000058 CR3: 00000002e50d0000 CR4: 0000000000000660
[ 7.312004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.312004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7.312004] Process mpt/0 (pid: 82, threadinfo ffff8802e50b6000, task ffff8802e63ec200)
[ 7.312004] Stack:
[ 7.312004] ffff8802e50ad64d ffff8802e50ad5c0 ffff8802e50b7da0 ffff8802e512e000
[ 7.312004] <0> 0000000000000105 ffffffffa0086126 0000000000000000 ffff880200000000
[ 7.312004] <0> ffff8802e50b7d50 ffff880002019c80 ffff8802e50b7d70 ffffffff80036a54
[ 7.312004] Call Trace:
[ 7.312004] [<ffffffffa0086126>] mptsas_send_raid_event+0x1a6/0x2a0 [mptsas]
[ 7.312004] [<ffffffff800603ac>] run_workqueue+0xec/0x220
[ 7.312004] [<ffffffff8006057b>] worker_thread+0x9b/0x100
[ 7.312004] [<ffffffff80063f3e>] kthread+0x8e/0xa0
[ 7.312004] [<ffffffff80007e04>] kernel_thread_helper+0x4/0x10
[ 7.312004] Code: ff 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 4c 89 6c 24 18 41 89 f4 4c 89 74 24 20 <48> 8b 7f 58 41 89 d5 41 89 ce e8 4f ea 40 e0 44 89 e6 44 89 f1
[ 7.312004] RIP [<ffffffffa0000212>] scsi_device_lookup+0x22/0x90 [scsi_mod]
[ 7.312004] RSP <ffff8802e50b7cf0>
[ 7.312004] CR2: 0000000000000058
[ 7.325821] ---[ end trace 58c7baf0246ea174 ]---
[ 7.345076] scsi0 : ioc0: LSISAS1064 A3, FwRev=01040000h, Ports=1, MaxQ=511, IRQ=28
[ 7.373231] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 4, phy 2, sas_addr 0x5000cca00075bcb9
[ 7.375586] scsi 0:0:0:0: Direct-Access HITACHI H101414SCSUN146G SA25 PQ: 0 ANSI: 5
[ 7.383266] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 3, phy 3, sas_addr 0x5000cca000766139
[ 7.385103] scsi 0:0:1:0: Direct-Access HITACHI H101414SCSUN146G SA25 PQ: 0 ANSI: 5
[ 7.391033] mptsas: ioc0: attaching raid volume, channel 1, id 2
[ 7.493664] scsi 0:1:2:0: Direct-Access LSILOGIC Logical Volume 3000 PQ: 0 ANSI: 2

I would love to try with 2.6.37, but that would depend on bug #655329.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=666106
https://bugzilla.novell.com/show_bug.cgi?id=666106#c
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=666106
https://bugzilla.novell.com/show_bug.cgi?id=666106#c1
James Bottomley
[ 0.377835] mptsas 0000:02:03.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[ 0.377994] mptbase: ioc0: Initiating bringup
[ 1.304009] ioc0: LSISAS1064 A3: Capabilities={Initiator}
[ 7.304017] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 2
[ 7.304021] mptbase: ioc0: volume is now degraded, enabled, resync in progress
[ 7.308028] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
The bug occurs after mpt_attach() (that's the "ioc0: Initiating bringup" message) but apparently before any SCSI host is added. This means that the RAID event handler was called before ioc->sh had a value, so it fed a NULL host pointer into scsi_device_lookup().

Fixing this is a bit tricky: the way it should be fixed is to hold off firmware RAID events until the ioc is fully brought up. It looks like mptsas has code for this (the fw_events_off flag), but there's a race while the thread is in mpt_attach(), before the events get turned off.
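The ordering problem described above can be modelled with a small user-space C sketch. This is not the mptsas driver code and not a proposed patch: `struct mock_ioc`, `struct mock_host`, `mock_scsi_device_lookup()`, `mock_raid_event()`, `mock_host_added()` and the `pending_events` counter are all hypothetical names invented for illustration. The sketch only shows the general idea of holding back a RAID event while the host pointer is still unset and replaying it once the host exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct mock_host {
    int host_no;
};

struct mock_ioc {
    struct mock_host *sh;   /* NULL until the SCSI host has been added */
    int pending_events;     /* RAID events held back during bringup */
    int handled_events;     /* RAID events that reached the lookup */
};

/* Models a lookup that dereferences the host pointer unconditionally.
 * The assert stands in for the oops seen in the report. */
static int mock_scsi_device_lookup(struct mock_host *sh)
{
    assert(sh != NULL);
    return sh->host_no;
}

/* RAID event handler: returns 1 if handled now, 0 if deferred. */
static int mock_raid_event(struct mock_ioc *ioc)
{
    if (ioc->sh == NULL) {
        /* Host not registered yet: remember the event, do not look up. */
        ioc->pending_events++;
        return 0;
    }
    (void)mock_scsi_device_lookup(ioc->sh);
    ioc->handled_events++;
    return 1;
}

/* Called once the host is registered; replays any deferred events. */
static void mock_host_added(struct mock_ioc *ioc, struct mock_host *sh)
{
    ioc->sh = sh;
    while (ioc->pending_events > 0) {
        ioc->pending_events--;
        mock_raid_event(ioc);
    }
}
```

In this model, an event delivered before `mock_host_added()` is counted as pending rather than passed to the lookup, and is handled once the host pointer is set. A real driver would additionally need locking around the flag and the event queue, which this single-threaded sketch leaves out.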
https://bugzilla.novell.com/show_bug.cgi?id=666106
https://bugzilla.novell.com/show_bug.cgi?id=666106#c5
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=666106
https://bugzilla.novell.com/show_bug.cgi?id=666106#c6
--- Comment #6 from Jeff Mahoney