What | Removed | Added |
---|---|---|
Status | RESOLVED | REOPENED |
CC | alan@softiron.co.uk | |
Resolution | NORESPONSE | --- |
Hi guys,

I know what the problem is, as I have seen this issue and chased it down.

The symptom: plug in a specific thumb drive, and it is unusable (e.g., cannot be mounted) for 60 seconds. dmesg shows that after 60 seconds the drive is reset and then becomes usable (e.g., can be mounted).

Using a USB analyzer, I determined that the host sends a SCSI INQUIRY request with the EVPD bit set, asking for page 0x80, which holds the device's serial number string. The device then incorrectly returns a non-zero value in a reserved field, which the host incorrectly treats as part of a length field. The host then re-sends the SCSI INQUIRY request with a much larger length field, and the thumb drive hangs.

Make no mistake, this is a bug in the thumb drive, but we need to work around it, like we always do.

See the SCSI documentation at [1]. The section numbers below come from this document.

Summary of events:

1. The device is connected.
2. The host enumerates the device.
3. The host sends a SCSI INQUIRY with the EVPD bit set, requesting page 0x80 (serial numbers). See 3.6.1, Table 45, and the information on EVPD in the same section.
4. The device returns the contents of page 0x80, but incorrectly puts 0x6 in byte 2, which is supposed to be a reserved field. See 4.4.10. This is a bug in the thumb drive.
5. The host interprets bytes 2 and 3 together as a 16-bit length value, relying on the device to have zeroed byte 2 (as required), which this thumb drive does not do.
6. Because the length returned by the device is much larger than expected, the host re-sends the INQUIRY with the new, larger, incorrect length (in this case 0x6fc instead of just 0xfc).
7. The thumb drive, for whatever reason, can't handle this request and hangs.
8. After 60 seconds, the request times out and the device is reset. After the reset, the host will NOT issue the INQUIRY with EVPD for page 0x80, and the device works fine (e.g., it can be mounted).

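To make steps 5 and 6 concrete, here is a minimal C sketch of the two ways of reading the page 0x80 header, plus how the resulting length lands in the follow-up INQUIRY. This is not the sg_inq source: the function names are made up for illustration, the CDB layout assumes the 6-byte INQUIRY with a 16-bit allocation length in bytes 3 and 4, and the byte 3 value of 0xfc in the usage note is assumed so that the numbers match those in step 6.

```c
#include <stdint.h>

#define INQUIRY_OPCODE 0x12 /* SCSI INQUIRY operation code */
#define INQUIRY_EVPD   0x01 /* EVPD bit in CDB byte 1 */

/* Page length as the document at [1] describes it for page 0x80:
 * byte 2 is reserved, so only byte 3 carries the length. */
unsigned vpd_len_strict(const uint8_t *resp)
{
    return resp[3];
}

/* Page length as the host computes it: bytes 2 and 3 taken together
 * as a big-endian 16-bit value. This is only correct if the device
 * zeroed the reserved byte 2. */
unsigned vpd_len_16bit(const uint8_t *resp)
{
    return ((unsigned)resp[2] << 8) | resp[3];
}

/* Fill in a 6-byte INQUIRY CDB requesting one VPD page.
 * (Hypothetical helper; layout assumed as described above.) */
void build_evpd_inquiry(uint8_t cdb[6], uint8_t page, uint16_t alloc_len)
{
    cdb[0] = INQUIRY_OPCODE;
    cdb[1] = INQUIRY_EVPD;
    cdb[2] = page;
    cdb[3] = (uint8_t)(alloc_len >> 8);   /* allocation length, MSB */
    cdb[4] = (uint8_t)(alloc_len & 0xff); /* allocation length, LSB */
    cdb[5] = 0x00;                        /* control byte */
}
```

With a response header of 00 80 06 fc, vpd_len_strict() gives 0xfc while vpd_len_16bit() gives 0x6fc, and passing the latter to build_evpd_inquiry() puts 06 fc into bytes 3 and 4 of the second request. Masking off or ignoring byte 2 would be one way for a host-side tool to tolerate this particular device, though it does not address the larger question of whether the request should be sent at all.
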
I tried to find where in the kernel this SCSI INQUIRY with EVPD was originating from, and eventually determined that it does not originate in the kernel at all. It is the result of an ioctl() issued from user space by the sg_inq process, which is part of the sg3_utils package. sg_inq is run by a udev rule [2], which is also part of sg3_utils, and it sends the SCSI commands directly to the device.

It's worth pointing out that the Linux kernel's own code, which can read these VPD pages, will _not_ issue these EVPD INQUIRY requests for USB devices. In drivers/usb/storage/scsiglue.c is the code:

    /* Some devices don't handle VPD pages correctly */
    sdev->skip_vpd_pages = 1;

Indeed, this device does not handle VPD correctly. There's a reason the kernel doesn't issue these requests to USB devices, so they shouldn't be issued from user space either.

1. Is there a reason to need this sg3_utils udev file, which pokes at the drive from user space when it's attached?
2. If the answer to the above is yes, can it be modified to _not_ run for USB devices?

My recommendation is to remove the udev rule at [2] completely. The kernel is very good at working around broken hardware and giving each device the interactions it needs in order to work properly. This udev rule cuts through all those checks and workarounds and issues SCSI commands directly from user space, re-exposing the same hardware issues that the kernel is very careful to work around.

I'm running Leap 42.1 on aarch64, with sg3_utils version 1.41. The latest sg3_utils source [3] still has the issue. Again though, the larger issue here is not that sg3_utils is doing something incorrectly (even though it is); the issue is that we are sending SCSI commands from user space to devices where they are not appropriate (e.g., USB devices).

Thanks!
Alan.

[1] http://www.seagate.com/staticfiles/support/disc/manuals/scsi/100293068a.pdf
[2] /usr/lib/udev/rules.d/55-scsi-sg3_id.rules
[3] https://github.com/hreinecke/sg3_utils/blob/master/src/sg_inq.c#L3058