New subject: [Bug 1007716] scsi scan work was stuck forever when reboot storage controller

31 Oct 2016

      http://bugzilla.novell.com/show_bug.cgi?id=1007716

            Bug ID: 1007716
           Summary: scsi scan work was stuck forever when reboot storage
                    controller
    Classification: openSUSE
           Product: openSUSE.org
           Version: unspecified
          Hardware: x86-64
                OS: SLES 11
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: 3rd party software
          Assignee: opensuse-communityscreening@forge.provo.novell.com
          Reporter: liuteng.liu@huawei.com
        QA Contact: opensuse-communityscreening@forge.provo.novell.com
          Found By: ---
           Blocker: ---

My server runs SLES 11 SP2, with an Emulex HBA connected to SAN storage. I
mapped 100 LUNs to the host server. Rebooting storage controller A causes the
SCSI scan work to hang with the following call trace:

kbox: Hung task kworker/u:5:1023 is in D state,more than 120 seconds!
kworker/u:5     D  1023   1023      2
Call Trace:
 [<ffffffff803f7fad>] schedule_timeout+0x21d/0x2c0
 [<ffffffff803f6e95>] wait_for_common+0xe5/0x210
 [<ffffffff801faf58>] blk_execute_rq+0xb8/0xf0
 [<ffffffffa0087ab5>] alua_vpd_inquiry+0xb5/0x3a0 [scsi_dh_alua]
 [<ffffffffa0087e4e>] alua_initialize+0xae/0x130 [scsi_dh_alua]
 [<ffffffffa008833e>] alua_bus_attach+0x6e/0x19c [scsi_dh_alua]
 [<ffffffffa007223a>] scsi_dh_handler_attach+0x2a/0x80 [scsi_dh]
 [<ffffffff803fde47>] notifier_call_chain+0x37/0x70
 [<ffffffff8006e68b>] __blocking_notifier_call_chain+0x5b/0x90
 [<ffffffff802cdee2>] device_add+0x2b2/0x4e0
 [<ffffffffa000f8a1>] scsi_sysfs_add_sdev+0xb1/0x310 [scsi_mod]
 [<ffffffffa000c888>] scsi_add_lun+0x518/0x530 [scsi_mod]
 [<ffffffffa000cd89>] scsi_probe_and_add_lun+0x1b9/0x480 [scsi_mod]
 [<ffffffffa000d326>] scsi_report_lun_scan+0x2d6/0x440 [scsi_mod]
 [<ffffffffa000dab6>] __scsi_scan_target+0xf6/0x1f0 [scsi_mod]
 [<ffffffffa000e0b1>] scsi_scan_target+0xd1/0xf0 [scsi_mod]
 [<ffffffffa05a712a>] fc_scsi_scan_rport+0xaa/0xb0 [scsi_transport_fc]
 [<ffffffff80060b78>] process_one_work+0x168/0x350
 [<ffffffff8006452a>] worker_thread+0x17a/0x480
 [<ffffffff80068126>] kthread+0x96/0xa0
 [<ffffffff80402894>] kernel_thread_helper+0x4/0x10

The scan work is stuck in submit_vpd_inquiry(), waiting for the low-level
driver to complete the EVPD INQUIRY request. The I/O never completes because
the request queue has QUEUE_FLAG_STOPPED set, so __blk_run_queue() returns
without dispatching the request and its end_io callback is never invoked.
The stopped flag was set by fc_remote_port_delete(), which the lpfc driver
calls when it receives an RSCN event.

        CPU1                CPU2
--------------------------------------------------------------------
                    fc_remote_port_add <-- queue scan work
                    fc_scsi_scan_rport                                        
                    scsi_scan_target
                    __scsi_scan_target
                    scsi_add_lun
                    scsi_dh_handler_attach
                    alua_initialize //alua mode
                    alua_vpd_inquiry
                    submit_vpd_inquiry
fc_remote_port_delete
scsi_target_block
device_block
scsi_internal_device_block
blk_stop_queue
queue_flag_set(QUEUE_FLAG_STOPPED, q);
                    blk_execute_rq
                    __blk_run_queue
                    if(unlikely(blk_queue_stopped(q)))
                        return;
                    wait_for_completion <<< never completes, waits forever
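
The interleaving above can be replayed in a small userspace model. This is
only a sketch to illustrate the ordering, not kernel code: every model_*
name below is hypothetical and merely mirrors the kernel function it is
named after, and the blocking wait is replaced by a return value so the
program terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct model_queue {
    bool stopped;     /* stands in for QUEUE_FLAG_STOPPED */
    bool rq_pending;  /* the VPD inquiry request is queued */
    bool rq_done;     /* set when the request is completed (end_io) */
};

/* Models blk_stop_queue(), reached from fc_remote_port_delete(). */
static void model_stop_queue(struct model_queue *q)
{
    q->stopped = true;
}

/* Models __blk_run_queue(): a stopped queue dispatches nothing. */
static void model_run_queue(struct model_queue *q)
{
    if (q->stopped)
        return;  /* request stays queued, nothing signals completion */
    if (q->rq_pending) {
        q->rq_pending = false;
        q->rq_done = true;  /* the driver completes the request */
    }
}

/*
 * Models blk_execute_rq(): queue the request, run the queue, then wait.
 * The real code sleeps in wait_for_completion() with no timeout; the
 * model reports whether that wait would ever have been satisfied.
 */
static bool model_execute_rq(struct model_queue *q)
{
    assert(q != NULL);
    q->rq_pending = true;
    q->rq_done = false;
    model_run_queue(q);
    return q->rq_done;
}

/* Replays the scan path, with or without the port delete racing in first. */
static bool model_scan(bool port_deleted_first)
{
    struct model_queue q = { false, false, false };

    if (port_deleted_first)
        model_stop_queue(&q);  /* CPU1 wins the race */
    return model_execute_rq(&q);  /* CPU2: scan work */
}
```

In this model, model_scan(false) returns true because the request completes,
while model_scan(true) returns false, which corresponds to the scan worker
sleeping forever in D state.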

We hope SUSE can fix this problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
