[opensuse] SCSI Problem
Hello,

I have a problem with the SCSI subsystem. We have a tape library with ten tape devices connected over a SAN. The devices are visible with lsscsi, and sg_map -x shows the mappings between the generic SCSI devices and the SCSI addresses (H:B:T:L). The output looks like this:

  ..
  /dev/sg235 7 0 0 0 12
  /dev/sg236 7 0 0 1 8
  /dev/sg237 7 0 0 2 8
  /dev/sg238 7 0 1 0 1 /dev/nst0
  /dev/sg239 7 0 2 0 1 /dev/nst1
  /dev/sg240 7 0 2 1 8
  /dev/sg241 7 0 3 0 1 /dev/nst2
  /dev/sg242 7 0 4 0 1 /dev/nst3
  ...

But sometimes sg_map prints the message "Strange, could not find device /dev/nstxx mapped to sg device??" at the top of the output, and near the end there is a strange "SCSI address" (see sg254):

  /dev/sg252 9 0 1 0 1
  /dev/sg253 10 0 0 0 1 /dev/nst11
  /dev/sg254 -2 -2 -2 -2 -2
  /dev/sg255 10 0 2 0 1

The device file /dev/sg254 exists in the /dev directory. If I disconnect the device, the device file is removed, and if I reconnect it, the device file is created again automatically.

Does anyone have a clue why this strange behavior occurs?

Thanks,
Meike
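As an aside for readers who want to flag such rows automatically, here is a minimal Python sketch (not part of the original post). It parses sg_map -x style output and reports every sg device whose address fields are negative. The function name find_odd_entries is hypothetical, and the column layout (device, host, bus, target, LUN, type, optional mapped device) is an assumption based on the sample lines above.

```python
def find_odd_entries(sg_map_output):
    """Return the sg device names whose H:B:T:L or type fields are negative."""
    odd = []
    for line in sg_map_output.splitlines():
        fields = line.split()
        # Expect at least: device name plus five numeric columns.
        if len(fields) < 6 or not fields[0].startswith("/dev/sg"):
            continue
        try:
            numbers = [int(value) for value in fields[1:6]]
        except ValueError:
            continue  # not a data row (e.g. a warning message)
        if any(number < 0 for number in numbers):
            odd.append(fields[0])
    return odd


# Sample rows copied from the post above.
sample = """\
/dev/sg252 9 0 1 0 1
/dev/sg253 10 0 0 0 1 /dev/nst11
/dev/sg254 -2 -2 -2 -2 -2
/dev/sg255 10 0 2 0 1
"""

if __name__ == "__main__":
    print(find_odd_entries(sample))
```

In practice the input would come from running sg_map -x and capturing its standard output; the sketch only looks at the text and does not touch any device.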
On Fri, 14 Mar 2014 10:28:07 +0100 Meike Stone wrote:
> Does anyone have a clue why this strange behavior occurs?
Hello Meike,

Do you know the manufacturer, model & serial numbers for the subsystem? For the tape drives? These devices usually have firmware, too, so you want those version numbers as well. Is the subsystem single-ended SCSI or is it differentially driven? And which SCSI flavor? (1, 2, fast, fast-wide, and so on...?) The same information is helpful to have for the host controller, as well, including firmware version. If it turns out that all the hardware, including controllers, cables, connectors and terminators, are in good repair you'll need this information to start debugging at the driver / firmware / software level.

The above being said, the first thing I would check is the affected device. Is it the last on the bus, i.e. furthest away, electrically, from the host controller? If so, it needs to be properly terminated. Inspect the length, condition and quality of all the cables, connectors and sockets and the terminator, too, to confirm that everything is within the physical design specifications.

Likewise, check the condition of the subsystem power supply and wiring. It could be the affected device is receiving marginal or 'flaky' power due to the supply degrading over time.

hth & regards,
Carl
2014-03-14 11:12 GMT+01:00 Carl Hartung <opensuse@cehartung.com>:
> Do you know the manufacturer, model & serial numbers for the subsystem? For the tape drives? These devices usually have firmware, too, so you want those version numbers as well. Is the subsystem single-ended SCSI or is it differentially driven? And which SCSI flavor? (1, 2, fast, fast-wide, and so on...?) The same information is helpful to have for the host controller, as well, including firmware version. If it turns out that all the hardware, including controllers, cables, connectors and terminators, are in good repair you'll need this information to start debugging at the driver / firmware / software level.
Hello,

there are 10 tape devices from HP (Ultrium 4-SCSI and 6-SCSI). They are connected via FC, so no electrical wiring or termination is necessary.

The FC HBAs are:
  Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)

With udevadm I get the following information for a failing device:

================================================================
~# udevadm info --attribute-walk --name /dev/sg254

Udevadm info starts with the device specified by the devpath and then walks up the chain of parent devices. It prints for every device found, all possible attributes in the udev rules key format. A rule to match, can be composed by the attributes of the device and the attributes from one single parent device.

  looking at device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1/host10/rport-10:0-1/target10:0:1/10:0:1:0/scsi_generic/sg254':
    KERNEL=="sg254"
    SUBSYSTEM=="scsi_generic"
    DRIVER==""

  looking at parent device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1/host10/rport-10:0-1/target10:0:1/10:0:1:0':
    KERNELS=="10:0:1:0"
    SUBSYSTEMS=="scsi"
    DRIVERS=="st"
    ATTRS{device_blocked}=="0"
    ATTRS{type}=="1"
    ATTRS{scsi_level}=="7"
    ATTRS{vendor}=="HP "
    ATTRS{model}=="Ultrium 6-SCSI "
    ATTRS{rev}=="J3KZ"
    ATTRS{state}=="running"
    ATTRS{timeout}=="900"
    ATTRS{iocounterbits}=="32"
    ATTRS{iorequest_cnt}=="0x714a5e"
    ATTRS{iodone_cnt}=="0x714a5e"
    ATTRS{ioerr_cnt}=="0x2c"
    ATTRS{modalias}=="scsi:t-0x01"
    ATTRS{evt_media_change}=="0"
    ATTRS{dh_state}=="alua"
    ATTRS{queue_depth}=="16"
    ATTRS{queue_ramp_up_period}=="120000"
    ATTRS{queue_type}=="none"

  looking at parent device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1/host10/rport-10:0-1/target10:0:1':
    KERNELS=="target10:0:1"
    SUBSYSTEMS=="scsi"
    DRIVERS==""

  looking at parent device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1/host10/rport-10:0-1':
    KERNELS=="rport-10:0-1"
    SUBSYSTEMS==""
    DRIVERS==""

  looking at parent device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1/host10':
    KERNELS=="host10"
    SUBSYSTEMS=="scsi"
    DRIVERS==""
    ATTRS{fw_dump}==""
    ATTRS{optrom}==""

  looking at parent device '/devices/pci0000:20/0000:20:03.0/0000:21:00.1':
    KERNELS=="0000:21:00.1"
    SUBSYSTEMS=="pci"
    DRIVERS=="qla2xxx"
    ATTRS{vendor}=="0x1077"
    ATTRS{device}=="0x2532"
    ATTRS{subsystem_vendor}=="0x103c"
    ATTRS{subsystem_device}=="0x3263"
    ATTRS{class}=="0x0c0400"
    ATTRS{irq}=="68"
    ATTRS{local_cpus}=="00000000,000000f0"
    ATTRS{local_cpulist}=="4-7"
    ATTRS{modalias}=="pci:v00001077d00002532sv0000103Csd00003263bc0Csc04i00"
    ATTRS{numa_node}=="1"
    ATTRS{dma_mask_bits}=="64"
    ATTRS{consistent_dma_mask_bits}=="64"
    ATTRS{enable}=="1"
    ATTRS{broken_parity_status}=="0"
    ATTRS{msi_bus}==""

  looking at parent device '/devices/pci0000:20/0000:20:03.0':
    KERNELS=="0000:20:03.0"
    SUBSYSTEMS=="pci"
    DRIVERS=="pcieport"
    ATTRS{vendor}=="0x8086"
    ATTRS{device}=="0x3c08"
    ATTRS{subsystem_vendor}=="0x103c"
    ATTRS{subsystem_device}=="0x18a8"
    ATTRS{class}=="0x060400"
    ATTRS{irq}=="69"
    ATTRS{local_cpus}=="00000000,000000f0"
    ATTRS{local_cpulist}=="4-7"
    ATTRS{modalias}=="pci:v00008086d00003C08sv0000103Csd000018A8bc06sc04i00"
    ATTRS{numa_node}=="1"
    ATTRS{dma_mask_bits}=="32"
    ATTRS{consistent_dma_mask_bits}=="32"
    ATTRS{enable}=="2"
    ATTRS{broken_parity_status}=="0"
    ATTRS{msi_bus}=="1"

  looking at parent device '/devices/pci0000:20':
    KERNELS=="pci0000:20"
    SUBSYSTEMS==""
    DRIVERS==""
================================================================

It looks like everything is OK...

The sg_map command tries to gather information from the device via the ioctl system call with request code SG_GET_SCSI_ID, after opening the device with open().

But the open fails with -1 :-O ...

Thanks,
Meike
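The open-then-ioctl sequence described above can be reproduced outside sg_map to see which errno the failing open() actually reports. The following is a minimal Python sketch under stated assumptions, not sg_map's own code: the helper name probe_sg is hypothetical, the open flags (read-only, non-blocking) are an assumption about how such a tool would open the device, and the request code 0x2276 and the struct layout are taken from the Linux <scsi/sg.h> header. An EBUSY result, for example, would typically suggest that another process currently holds the device, while ENODEV or ENXIO would point at the device itself.

```python
import errno
import fcntl
import os
import struct

SG_GET_SCSI_ID = 0x2276      # ioctl request code from <scsi/sg.h>
# struct sg_scsi_id: host_no, channel, scsi_id, lun, scsi_type (int each),
# h_cmd_per_lun, d_queue_depth (short each), then two unused ints.
SG_SCSI_ID_FMT = "5i2h2i"


def probe_sg(path):
    """Open an sg node and query its SCSI address, reporting errno names."""
    try:
        # Assumption: read-only, non-blocking open so the call cannot hang.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return {"path": path, "open_error": name}
    try:
        buf = bytearray(struct.calcsize(SG_SCSI_ID_FMT))
        fcntl.ioctl(fd, SG_GET_SCSI_ID, buf)  # kernel fills buf in place
        host, channel, target, lun, scsi_type = struct.unpack(
            SG_SCSI_ID_FMT, buf)[:5]
        return {"path": path, "host": host, "channel": channel,
                "target": target, "lun": lun, "type": scsi_type}
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return {"path": path, "ioctl_error": name}
    finally:
        os.close(fd)


if __name__ == "__main__":
    # The device name is the one from this thread; adjust as needed.
    print(probe_sg("/dev/sg254"))
```

Run as root on the affected host, the returned dictionary either contains the H:B:T:L values or the symbolic errno of the step that failed, which narrows down where to look next.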
Btw, I forgot to mention that there are no rx/tx errors, frame drops, link up/downs, or anything else that looks like an error on the FC ports.

2014-03-14 12:32 GMT+01:00 Meike Stone <meike.stone@googlemail.com>:
> The sg_map command tries to gather information from the device via the ioctl system call with request code SG_GET_SCSI_ID, after opening the device with open().
> But the open fails with -1 :-O ...
On Fri, 14 Mar 2014 13:49:35 +0100 Meike Stone wrote:
> Btw, I forgot to mention that there are no rx/tx errors, frame drops, link up/downs, or anything else that looks like an error on the FC ports.
Hi Meike,

Thanks for the clarification and I apologize for the delay. It was a work day, today, so...

In any case, no errors in communications with the subsystem is very good to know. When you combine that with the anomaly being confined to a single drive, this makes me think it's in the media or the drive, itself. Depending upon the exact year and model, the drive should have at least two on-board trouble / error codes that would be helpful to have but, again, you're going to need physical access to obtain them.

hth, good luck & regards,
Carl
participants (2)
- Carl Hartung
- Meike Stone