[Bug 670816] New: Kernel oops during bootup in ses_intf_add function for lun 0 scsi device
https://bugzilla.novell.com/show_bug.cgi?id=670816

https://bugzilla.novell.com/show_bug.cgi?id=670816#c0

Summary: Kernel oops during bootup in ses_intf_add function for lun 0 scsi device
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: All
OS/Version: SLES 11
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: somasundaram.krishnasamy@lsi.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)

During device discovery (bootup), the scsi mid layer sends an INQUIRY command to LUN 0. If LUN 0 is not mapped to the host, the mid layer creates a temporary scsi_device with LUN id 0 and sends a REPORT_LUNS command to it. After REPORT_LUNS succeeds, it walks through the LUN table returned by REPORT_LUNS and adds each LUN found to sysfs. At the end of the REPORT_LUNS lun table scan, it deletes the temporary scsi_device of LUN 0.

When scsi devices are added to sysfs, the add_dev function of every registered class interface is called. If the ses driver is registered, this calls ses_intf_add(). On the ses_intf_add()->scsi_device_enclosure() path, that function tries to access the inquiry data of the temporary lun 0 scsi_device.

static inline int scsi_device_enclosure(struct scsi_device *sdev)
{
        return sdev->inquiry[6] & (1<<6);
}

Since sdev->inquiry was not allocated for the temporary LUN 0 scsi_device, this causes a NULL pointer exception. The scanning thread dies and leaves the class mutex held. The following message is printed in the console log.
<1>[ 6.713599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000006
<1>[ 6.717522] IP: [<ffffffffa02adf8c>] ses_intf_add+0x4ec/0x508 [ses]
<4>[ 6.717522] PGD 233b41067 PUD 233754067 PMD 0
<0>[ 6.717522] Oops: 0000 [#1] SMP
<0>[ 6.717522] last sysfs file: /sys/module/sd_mod/initstate
<4>[ 6.717522] CPU 1
<4>[ 6.717522] Modules linked in: qla2xxx bnx2 i2c_i801 iTCO_wdt rtc_cmos iTCO_vendor_support rtc_core ibmpex(X) ibmaem(X) sr_mod i5000_edac ses scsi_transport_fc rtc_lib cdrom ipmi_msghandler pcspkr serio_raw i2c_core enclosure i5k_amb button tpm_tis scsi_tgt tpm edac_core tpm_bios shpchp ioatdma pci_hotplug dca uhci_hcd ehci_hcd usbcore mppVhba edd ext3 mbcache jbd fan processor ide_pci_generic piix ide_core ata_generic ata_piix libata thermal thermal_sys hwmon aacraid mppUpper sg sd_mod crc_t10dif scsi_mod
<4>[ 6.717522] Supported: Yes
<6>[ 6.717522] Pid: 1427, comm: scsi_wq_3 Tainted: G X 2.6.32.12-0.7-default #1 IBM System x3550 -[7978AC1]-
<6>[ 6.717522] RIP: 0010:[<ffffffffa02adf8c>] [<ffffffffa02adf8c>] ses_intf_add+0x4ec/0x508 [ses]
<6>[ 6.717522] RSP: 0018:ffff88023356baa0 EFLAGS: 00010246
<6>[ 6.717522] RAX: 0000000000000000 RBX: ffff880233a0c000 RCX: 0000000000000000
<6>[ 6.717522] RDX: 0000000000000000 RSI: ffffffff811de0a0 RDI: ffff880235790980
<6>[ 6.717522] RBP: ffff8802352c9800 R08: 0000000000000002 R09: ffffffff81927200
<6>[ 6.717522] R10: ffff880011113718 R11: ffffffff810b6720 R12: ffff8802352c9800
<6>[ 6.717522] R13: ffff880235740938 R14: ffff880235792a00 R15: ffff880235740800
<6>[ 6.717522] FS: 0000000000000000(0000) GS:ffff880011100000(0000) knlGS:0000000000000000
<6>[ 6.717522] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<6>[ 6.717522] CR2: 0000000000000006 CR3: 0000000233b40000 CR4: 00000000000006e0
<6>[ 6.717522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[ 6.717522] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 6.717522] Process scsi_wq_3 (pid: 1427, threadinfo ffff88023356a000, task ffff88023357a500)
<0>[ 6.717522] Stack:
<4>[ 6.717522] ffff880235740b48 ffff880200000000 ffff880235740938 ffff880235740b38
<4>[ 6.717522] <0> ffff880235792a20 0000000000000000 ffff880235740b38 ffffffffa02ae4a0
<4>[ 6.717522] <0> ffff880235740b48 ffff880233f93028 ffff880233eeb3c0 ffffffff81291cb2
<0>[ 6.717522] Call Trace:
<4>[ 6.717522] [<ffffffff81291cb2>] device_add+0x382/0x4e0
<4>[ 6.717522] [<ffffffffa000d86f>] scsi_sysfs_add_sdev+0x7f/0x2c0 [scsi_mod]
<4>[ 6.717522] [<ffffffffa000aa76>] scsi_add_lun+0x4f6/0x510 [scsi_mod]
<4>[ 6.717522] [<ffffffffa000af04>] scsi_probe_and_add_lun+0x1b4/0x480 [scsi_mod]
<4>[ 6.717522] [<ffffffffa000b498>] scsi_report_lun_scan+0x2c8/0x450 [scsi_mod]
<4>[ 6.717522] [<ffffffffa000bc2e>] __scsi_scan_target+0xde/0x1d0 [scsi_mod]
<4>[ 6.717522] [<ffffffffa000c1e0>] scsi_scan_target+0xc0/0xd0 [scsi_mod]
<4>[ 6.717522] [<ffffffffa029a23a>] fc_scsi_scan_rport+0xaa/0xb0 [scsi_transport_fc]
<4>[ 6.717522] [<ffffffff8105f788>] run_workqueue+0xb8/0x140
<4>[ 6.717522] [<ffffffff8105f8a6>] worker_thread+0x96/0x110
<4>[ 6.717522] [<ffffffff81063966>] kthread+0x96/0xa0
<4>[ 6.717522] [<ffffffff81003fba>] child_rip+0xa/0x20
<0>[ 6.717522] Code: 19 eb 3b 0f 1f 40 00 49 8b 3f 48 89 de e8 2d 24 d5 ff 48 85 c0 48 89 c3 74 24 8b 8b 84 00 00 00 85 c9 75 e3 48 8b 83 a8 00 00 00 <f6> 40 06 40 75 d6 48 89 de 4c 89 e7 e8 23 f9 ff ff eb c9 31 ed
<1>[ 6.717522] RIP [<ffffffffa02adf8c>] ses_intf_add+0x4ec/0x508 [ses]
<4>[ 6.717522] RSP <ffff88023356baa0>
<0>[ 6.717522] CR2: 0000000000000006
<4>[ 7.716384] ---[ end trace 661c67574856f0ca ]---

Any other thread trying to add/delete entry to sysfs on the scsi device class will hang forever in the following stack trace.
#0 [ffff880233cf98c8] schedule at ffffffff81394db2
#1 [ffff880233cf9980] __mutex_lock_slowpath at ffffffff81395e0f
#2 [ffff880233cf99f0] mutex_lock at ffffffff813957ba
#3 [ffff880233cf9a08] device_add at ffffffff81291c68
#4 [ffff880233cf9a48] scsi_sysfs_add_sdev at ffffffffa000d86f
#5 [ffff880233cf9a78] scsi_add_lun at ffffffffa000aa76

Reproducible: Always

Steps to Reproduce:
1. Don't map lun 0 to host
2. Load ses driver
3. Rescan the HBAs for scsi devices

Actual Results:
The scanning kernel thread oopsed, leaving the class mutex held. This makes all other scanning threads hang forever. The oops messages are also displayed on the console.

Expected Results:
The ses driver should have skipped checking the temporary scsi device's inquiry data.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=670816

https://bugzilla.novell.com/show_bug.cgi?id=670816#c1

--- Comment #1 from Somasundaram Krishnasamy <somasundaram.krishnasamy@lsi.com> 2011-02-09 23:19:36 UTC ---

The code change required to fix this problem:-

drivers/scsi/ses.c:ses_intf_add()

        shost_for_each_device(tmp_sdev, sdev->host) {
                if (tmp_sdev->lun != 0 ||
                    tmp_sdev->sdev_state != SDEV_RUNNING ||
                    scsi_device_enclosure(tmp_sdev))   <<<===== changed line
                        continue;
                ses_match_to_enclosure(edev, tmp_sdev);
        }

I will also submit a patch for this change.
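The skip condition proposed in comment #1 can be tried outside the kernel with a small userspace sketch. Everything named `*_sketch` below is a hypothetical stand-in invented for illustration, not a kernel type or function; only the bit test on inquiry byte 6 and the shape of the condition come from the report. The explicit NULL check is an extra assumption added here as a defensive guard; it is not part of the reporter's change, which relies on the state check alone.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's device states. */
enum state_sketch { SKETCH_CREATED = 1, SKETCH_RUNNING };

/* Hypothetical stand-in for struct scsi_device: only the fields used here. */
struct sdev_sketch {
    unsigned int lun;
    enum state_sketch state;
    const unsigned char *inquiry;   /* NULL models the temporary LUN 0 device */
};

/* Same bit test as scsi_device_enclosure() in the report: byte 6, bit 6. */
static int enclosure_sketch(const struct sdev_sketch *sdev)
{
    return sdev->inquiry[6] & (1 << 6);
}

/*
 * Returns nonzero when the matching loop should skip this device.
 * The state check mirrors the proposed change; the NULL check is an
 * assumption of this sketch. Both are evaluated before the inquiry
 * buffer is read, so short-circuiting keeps a device without inquiry
 * data away from enclosure_sketch().
 */
static int should_skip_sketch(const struct sdev_sketch *sdev)
{
    return sdev->lun != 0 ||
           sdev->state != SKETCH_RUNNING ||
           sdev->inquiry == NULL ||
           enclosure_sketch(sdev);
}
```

With this ordering, a device modelled like the temporary LUN 0 (lun 0, not running, no inquiry buffer) is skipped without its inquiry data being touched, while a running LUN 0 device whose byte 6 has bit 6 clear is passed on for matching.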
https://bugzilla.novell.com/show_bug.cgi?id=670816

https://bugzilla.novell.com/show_bug.cgi?id=670816#c

Somasundaram Krishnasamy <somasundaram.krishnasamy@lsi.com> changed:

What     |Removed    |Added
----------------------------------------------------------------------------
Priority |P5 - None  |P3 - Medium