Hi,
Adam encouraged me to release a 0.0.2d even if I was gone, as someone
else might be able to fix it (or at least play with it). So here it is.
To help nail this problem down I've written it up from my point of view,
so that either I discover the bug while explaining it, or at least give
someone an idea of what to look for.
The bug manifests itself by causing corruption of some of the data
written to disc. It _appears_ that this corruption only happens when
the queue is unplugged due to memory pressure (the buffer cache is
filled, and something needs to be written out).
The corruption looks like data ending up in the wrong places. Copying
several big .bz2 files to disc might show some of the blocks from
file1 in file2, and so on.
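For anyone who wants to check for this pattern, comparing a copy read back
from the disc against the originals block by block is one way to do it. What
follows is only a rough user-space sketch, not anything from the driver: the
2048-byte block size and the names block_digests and find_misplaced_blocks
are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 2048  # assumed block size; adjust to the media in use


def block_digests(data, block_size=BLOCK_SIZE):
    """Return the MD5 digest of each block_size chunk of data."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def find_misplaced_blocks(originals, copy_name, copy_data,
                          block_size=BLOCK_SIZE):
    """Report blocks of a copy that differ from the original.

    originals maps a file name to its original contents (bytes).
    Returns (block_index, source_name, source_block_index) tuples; the
    source fields are None when the bad block matches no known block.
    """
    # Index every original block by digest so a stray block can be traced.
    index = {}
    for name, data in originals.items():
        for i, digest in enumerate(block_digests(data, block_size)):
            index.setdefault(digest, (name, i))

    expected = block_digests(originals[copy_name], block_size)
    actual = block_digests(copy_data, block_size)
    report = []
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            source = index.get(got, (None, None))
            report.append((i, source[0], source[1]))
    return report
```

With real files the contents would be read in binary mode, e.g.
open(path, "rb").read(). An entry such as (1, "file1", 0) would mean that
block 1 of the copy actually holds block 0 of file1.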
I removed the buffer hole-merge support, even though I couldn't see any
bugs in that. The corruption was still there, so that does not appear
to be the culprit. This pre-release does not have the hole merge support
back in, for this reason.
I even went as far as md5 summing the data when I had the entire request,
but before scheduling it for another queue, and then comparing each
individual buffer when it was completed (in the end_io handler). Nothing
showed up there, so at least the data is consistent between when we let
it go and when the drive has written it out (unless the drive is garbling
data, not very likely though...). I tried this because I thought that
someone might be using the buffer even though we have it locked, but that
does not appear to be what is happening.
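The idea behind that check can be sketched in user space roughly as below.
This is not the driver code, just an illustration of the technique: the
class RequestChecker and its submit/complete methods are hypothetical names,
and the buffers are modelled as plain byte arrays keyed by an id.

```python
import hashlib


class RequestChecker:
    """Remember a digest per buffer at submit time, verify at completion."""

    def __init__(self):
        # buffer id -> MD5 digest recorded when the request was gathered
        self._digests = {}

    def submit(self, buffers):
        """Record digests for a whole request; buffers maps id -> bytes-like."""
        for buf_id, data in buffers.items():
            self._digests[buf_id] = hashlib.md5(bytes(data)).digest()

    def complete(self, buf_id, data):
        """Return True if the buffer is unchanged since submission."""
        expected = self._digests.pop(buf_id)
        return hashlib.md5(bytes(data)).digest() == expected
```

A False result from complete() would mean something modified the buffer
between gathering the request and its completion. That is the case that
never showed up in my testing.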
Other changes since 0.0.2c include:
* - (scsi) use implicit segment recounting for all hba's
* - fix speed setting, was consistently off on most drives
* - only print capacity when opening for write
* - fix off-by-two error in getting/setting write+read speed (affected
* reporting as well as actual speed used)
* - possible to enable write caching on drive
* - do ioctl marshalling on sparc64 from Ben Collins