USB storage read 64k problem
Hi,

One of my USB storage devices fails after 64k read (when copying files) with an IO error (the device worked fine with previous SuSEs, works with Windows, too, but I did see a similar problem on a Debian distribution). It's a Mambo-X USB MP3 stick, which reports an Atmel ID (so the driver chip is from Atmel). Writing is ok, and it plays all written files well, so the device itself is "ok" (it also can play files with anoraK). The other USB storage devices I've attached so far work well.

I turned on USB storage verbose debug support, recompiled the kernel, and looked at the output. The highly suspect entry is:

Jan 6 00:40:14 vimes kernel: usb-storage: Bulk Status S 0x53425355 T 0x2a556 R 65536 Stat 0x0

The "R 65536" certainly should be 0 (all other entries indicate that the transfer was ok). For all other sizes <64k, it *is* 0, as it should be. Faking the residual to 0 when it is reportedly longer fixes all problems (see attached patch, against the SuSE 9.2 2.6.8-24.10 kernel, x86_64 version).

The source of the problem is obviously buggy code inside the device itself, i.e. it subtracts the residual using a 16 bit counter only. This explains nicely why it works with Windows (that just doesn't do >=64k SCSI reads ;-).
This bad hack "fixes" my problem, but it's not a real solution. It is probably better to be able to restrict certain USB storage devices (specifically this one ;-) to less than 64k transfers, or at least check for this ID when doing the hack (the transfer itself is ok, just the reported bytes are not). The device ID is (from lsusb)

Bus 004 Device 002: ID 03eb:2002 Atmel Corp.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
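The 16 bit counter theory can be checked with a small user-space model. This is a minimal sketch, assuming the firmware computes the residue as the requested length minus a byte count held in a 16 bit counter. The function names are hypothetical and the device's actual firmware is not known; the sketch only shows that this assumption reproduces the logged value.

```c
#include <stdint.h>

/* Hypothetical model of the device firmware: the bytes sent are
 * counted in a 16 bit register, so the count wraps modulo 65536. */
uint32_t reported_residue(uint32_t requested, uint32_t actually_sent)
{
    uint16_t counted = (uint16_t)actually_sent; /* wraps at 0x10000 */
    return requested - counted;
}

/* What a device with a wide enough counter would report. */
uint32_t correct_residue(uint32_t requested, uint32_t actually_sent)
{
    return requested - actually_sent;
}
```

For a fully transferred 65536 byte request the 16 bit count wraps to 0, so the modelled residue is 65536, matching the "R 65536" in the log. For any smaller fully transferred request it is 0.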
> The source of the problem is obviously buggy code inside the device itself, i.e. it subtracts the residual using a 16 bit counter only. This explains nicely why it works with Windows (that just doesn't do >=64k SCSI reads ;-).
Thanks for debugging.
> This bad hack "fixes" my problem, but it's not a real solution. It is
Given the typo I'm surprised it still works at all.

However I would suspect a better fix would be to limit the device to smaller transfer sizes. You can do that by lowering max_sectors in sysfs or adding an entry in drivers/usb/storage/scsiglue.c:slave_configure(). There is already a similar workaround for another device there.

That would probably be the right workaround. Given that there is an easy workaround I don't think we'll take such a patch for 9.2, but if it gets into mainline the next release will likely have it.
> --- linux-2.6.8-24.10/drivers/usb/storage/transport.c.orig 2005-01-06 00:52:32.283149967 +0100
> +++ linux-2.6.8-24.10/drivers/usb/storage/transport.c 2005-01-06 00:52:58.935858513 +0100
> @@ -1055,6 +1055,8 @@
>   /* try to compute the actual residue, based on how much data
>    * was really transferred and what the device tells us */
> + if(residue = 0x10000)

Surely you mean if (residue == 0x10000) ?

> +   residue = 0; // fake residue to 0
>   residue = min(residue, transfer_length);
>   srb->resid = max(srb->resid, (int) residue);
-Andi
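The max_sectors suggestion can be illustrated outside the kernel. This is a minimal sketch, assuming 512 byte sectors and a cap of 64 sectors (an arbitrary choice; any value below 128 sectors keeps a transfer under 64k). The struct and function names are hypothetical stand-ins and not the real scsiglue.c code; only the ID 03eb:2002 comes from the lsusb output in this thread.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u        /* assumption: 512 byte sectors */
#define QUIRK_MAX_SECTORS 64u   /* assumption: 64 * 512 = 32k per transfer */

/* Hypothetical stand-in for the per-device state a driver would hold. */
struct fake_dev {
    uint16_t id_vendor;
    uint16_t id_product;
    unsigned int max_sectors;
};

/* Clamp the transfer size only for the device reported as 03eb:2002. */
void configure_quirk(struct fake_dev *dev)
{
    if (dev->id_vendor == 0x03eb && dev->id_product == 0x2002 &&
        dev->max_sectors > QUIRK_MAX_SECTORS)
        dev->max_sectors = QUIRK_MAX_SECTORS;
}

/* Largest single transfer in bytes for the current setting. */
unsigned int max_transfer_bytes(const struct fake_dev *dev)
{
    return dev->max_sectors * SECTOR_SIZE;
}
```

Keying the clamp on the vendor and product ID leaves every other device at its original max_sectors, so only the stick with the broken residue pays the cost of smaller transfers.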
On Thursday 06 January 2005 02:40, Andi Kleen wrote:
>> This bad hack "fixes" my problem, but it's not a real solution. It is
>
> Given the typo I'm surprised it still works at all.
Well, it works because "variable = 0x10000" is always true, so it always ignores the reported residue from the device, which is wrong, anyway :-(. I expected GCC to give me a warning, so what happened to -Wall on kernel compiles?
> However I would suspect a better fix would be to limit the device to smaller transfer sizes. You can do that by lowering max_sectors in sysfs or adding an entry in drivers/usb/storage/scsiglue.c:slave_configure(). There is already a similar workaround for another device there.
Yes, that was the place I wanted to know (see attachment, this time with real == comparisons ;-).
> That would probably be the right workaround. Given that there is an easy workaround I don't think we'll take such a patch for 9.2, but if it gets into mainline the next release will likely have it.
Ok.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
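Why the "=" version still appeared to fix the device can be shown in a small user-space sketch. The helper names are hypothetical and the min() from the patch is written out by hand, so this is not the kernel code itself, only a model of the two conditions.

```c
/* Mirrors the first patch: '=' assigns 0x10000 to residue, the result is
 * nonzero and therefore true, so the residue is cleared on every call.
 * The extra parentheses only keep this sketch free of warnings. */
unsigned int residue_buggy(unsigned int residue, unsigned int transfer_length)
{
    if ((residue = 0x10000))
        residue = 0;
    return residue < transfer_length ? residue : transfer_length;
}

/* Mirrors the corrected patch: only the bogus 64k value is faked to 0,
 * every other residue the device reports is kept. */
unsigned int residue_fixed(unsigned int residue, unsigned int transfer_length)
{
    if (residue == 0x10000)
        residue = 0;
    return residue < transfer_length ? residue : transfer_length;
}
```

With the single-parenthesis form used in the patch, gcc -Wall (through -Wparentheses) reports "suggest parentheses around assignment used as truth value".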
On Thu, Jan 06, 2005 at 01:36:02PM +0100, Bernd Paysan wrote:
> On Thursday 06 January 2005 02:40, Andi Kleen wrote:
>>> This bad hack "fixes" my problem, but it's not a real solution. It is
>>
>> Given the typo I'm surprised it still works at all.
>
> Well, it works because "variable = 0x10000" is always true, so it always ignores the reported residue from the device, which is wrong, anyway :-(. I expected GCC to give me a warning, so what happened to -Wall on kernel compiles?
It's still there. Perhaps you overlooked the warning.
>> That would probably be the right workaround. Given that there is an easy workaround I don't think we'll take such a patch for 9.2, but if it gets into mainline the next release will likely have it.
>
> Ok
I assume Matthew Dharm will take the patch.

-Andi