Comment # 7 on bug 1048942 from
(In reply to Hannes Reinecke from comment #6)
> Could you try this?
>
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c

Unfortunately, that didn't work.
The system is still looping with:

-->
[   40.974837] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=47b portid=770403 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.974969] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=47d portid=7c0700 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975117] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=480 portid=770505 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975277] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=482 portid=770403 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975394] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=484 portid=7c0700 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975396] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=485 portid=770505 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975722] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=489 portid=770403 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975726] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=48a portid=7c0700 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
[   40.975852] qla2xxx [0000:02:03.1]-5046:7: Async-gpdb failed - hdl=48b portid=770505 status=30 mb0=4006 mb1=0 mb2=0 mb6=0 mb7=0.
--<

The system eventually crashes.
Still, I'm more concerned about the loop than the crash.
It seems that this loop causes a CPU lockup, which in turn triggers the NMI
watchdog:

-->
[   56.389236] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0

--<

According to the code, the message originates in the function
qla2x00_mbx_iocb_entry in drivers/scsi/qla2xxx/qla_isr.c:

-->
static void
qla2x00_mbx_iocb_entry(scsi_qla_host_t *vha, struct req_que *req,
    struct mbx_entry *mbx)
{
[...]
        ql_log(ql_log_warn, vha, 0x5046,
            "Async-%s failed - hdl=%x portid=%02x%02x%02x status=%x "
            "mb0=%x mb1=%x mb2=%x mb6=%x mb7=%x.\n", type, sp->handle,
            fcport->d_id.b.domain, fcport->d_id.b.area, fcport->d_id.b.al_pa,
            status, le16_to_cpu(mbx->mb0), le16_to_cpu(mbx->mb1),
            le16_to_cpu(mbx->mb2), le16_to_cpu(mbx->mb6),
            le16_to_cpu(mbx->mb7));
--<
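
As an aid for reading the log lines: going by the format string above, the
portid field is domain, area and al_pa printed as two hex digits each. A
minimal standalone sketch that splits such a value back into its three bytes
(struct port_id and parse_portid are hypothetical names for illustration, not
part of the driver):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type, not a driver structure: the three bytes
 * that the log prints as portid=%02x%02x%02x. */
struct port_id {
    unsigned int domain;
    unsigned int area;
    unsigned int al_pa;
};

/* Split a six-hex-digit portid string such as "770403".
 * Returns 0 on success, -1 if the string does not match. */
int parse_portid(const char *s, struct port_id *out)
{
    assert(s != NULL && out != NULL);

    /* %2x consumes at most two hex digits per field. */
    if (sscanf(s, "%2x%2x%2x", &out->domain, &out->area, &out->al_pa) != 3)
        return -1;
    return 0;
}
```

With this, portid=770403 from the first log line would split into domain 0x77,
area 0x04 and al_pa 0x03.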

It is called from a while loop in the function
qla2x00_process_response_queue:

-->
qla2x00_process_response_queue(struct rsp_que *rsp)
{
        struct scsi_qla_host *vha;
        struct qla_hw_data *ha = rsp->hw;
        struct device_reg_2xxx __iomem *reg = &ha->iobase->isp;
        sts_entry_t     *pkt;
        uint16_t        handle_cnt;
        uint16_t        cnt;

        vha = pci_get_drvdata(ha->pdev);

        if (!vha->flags.online)
                return;

        while (rsp->ring_ptr->signature != RESPONSE_PROCESSED) {
[...]
                case MBX_IOCB_TYPE:
                        qla2x00_mbx_iocb_entry(vha, rsp->req,
                            (struct mbx_entry *)pkt);
                        break;
--<

But as you can see there is a break after calling qla2x00_mbx_iocb_entry, so I
don't really understand how this can loop.
It would also be interesting to know what:

 status=30 mb0=4006

in the log means. Maybe that indicates an error?
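
One note on the break, as general C behaviour and not a statement about what
the driver actually does here: a break inside a switch only leaves the switch,
not the enclosing while loop, so the loop keeps running for as long as its
condition holds. A minimal standalone sketch of that pattern (enum entry_type
and process_entries are hypothetical names, not driver code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry types standing in for the IOCB types in the ring. */
enum entry_type { ENTRY_MBX, ENTRY_OTHER };

/* Walk n entries and count the ENTRY_MBX ones that were handled. */
int process_entries(const enum entry_type *ring, size_t n)
{
    size_t i = 0;
    int handled = 0;

    assert(ring != NULL || n == 0);

    while (i < n) {
        switch (ring[i]) {
        case ENTRY_MBX:
            handled++;
            break;      /* leaves the switch only, not the while loop */
        default:
            break;
        }
        i++;            /* the loop goes on with the next entry */
    }
    return handled;
}
```

For a ring of three ENTRY_MBX entries this returns 3, i.e. the handler branch
runs once per entry despite the break.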

If you could give me any pointers, I'm happy to investigate further.
After all, learning a bit more about the qlogic driver could be my alternative
hackweek project. :)

