[opensuse] Issue with dd. bug or feature?
All,
I am a heavy user of dd and think I know it well, but we had some unexpected behavior out of dd recently. At least it was unexpected by us.
In its simplest reproducible form we did something like:
md5sum big_file
cat big_file | dd bs=32k conv=sync,noerror | md5sum
big_file was a multiple of 32k long, so we expected the results to be identical. They were not.
It appears that if dd consumes data faster than cat can produce it, then dd will get partial block reads, i.e. the 32k reads will get less than 32k out of the pipe. When that happens, I suspect dd flags it as an error read and, with the above 'conv=sync,noerror' arg, dd fills the 32k block with zeros. Thus the data passed into md5 has been corrupted from the original and we get a different md5sum.
We found that totally dropping the 'conv=sync,noerror' arg eliminates the problem, although I'm not positive how that works. dd still reports a lot of read errors, but the md5 is right. It is not very comforting to see it report hundreds of read errors just reading a couple of GB of data.
Also, changing the block size to 4k fixes it, I assume because cat is working in blocks of 4k or a multiple thereof. No read errors are reported at all this way.
Anyway, I don't know if the above is by design, or if we have come across a bug. If a bug, is it in dd? the kernel?
FYI: I know conv=sync,noerror is not needed in the above. We tend to use it just in case something goes wrong. In this case, it made things go wrong!
Thanks
Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
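The effect described above can be reproduced on a small scale. The following is only a sketch: slow_producer is a hypothetical stand-in for "cat big_file", and the one-second pause is an assumption chosen so that each read() dd issues on the pipe returns before the next write arrives.

```shell
# Hypothetical stand-in for "cat big_file": write 3 bytes, pause, then
# write 3 more, so each read() by dd returns less than a full block.
slow_producer() {
    printf 'abc'
    sleep 1
    printf 'def'
}

# Without conv=sync the short reads pass through unchanged: 6 bytes out.
plain_bytes=$(slow_producer | dd bs=8 2>/dev/null | wc -c | tr -d ' ')

# With conv=sync every short read is padded with NULs up to the 8-byte
# block size, so two 3-byte reads become 16 bytes and any checksum of the
# stream no longer matches the original data.
padded_bytes=$(slow_producer | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')

echo "plain: $plain_bytes bytes, padded: $padded_bytes bytes"
```

If the two counts differ, the extra bytes are the zero padding that changed the md5sum in the original report.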
On Wednesday 06 May 2009 00:33:48 Greg Freemyer wrote:
All,
I am a heavy user of dd and think I know it well, but we had some unexpected behavior out of dd recently. At least it was unexpected by us.
In its simplest reproducible form we did something like:
md5sum big_file
cat big_file | dd bs=32k conv=sync,noerror | md5sum
[...]
Anyway, I don't know if the above is by design, or if we have come across a bug. If a bug, is it in dd? the kernel?
You really don't want to use conv=sync here. noerror is fine, but sync is not. With noerror, dd will continue, and try its best to fill up the output buffer. With sync, it will bad the output buffer with zeros, which will do nasty things to your md5sum.
Anders
Let me try that again
On Wednesday 06 May 2009 00:44:40 I wrote:
You really don't want to use conv=sync here. noerror is fine, but sync is not. With noerror, dd will continue, and try its best to fill up the output
..by reading some more from the input...
buffer. With sync, it will bad
pad
the output buffer with zeros
..instead of reading from the input...
, which will do nasty things to your md5sum.
Anders
ds.
On Tue, May 5, 2009 at 6:44 PM, Anders Johansson <ajohansson@suse.de> wrote:
On Wednesday 06 May 2009 00:33:48 Greg Freemyer wrote:
All,
I am a heavy user of dd and think I know it well, but we had some unexpected behavior out of dd recently. At least it was unexpected by us.
In its simplest reproducible form we did something like:
md5sum big_file
cat big_file | dd bs=32k conv=sync,noerror | md5sum
[...]
Anyway, I don't know if the above is by design, or if we have come across a bug. If a bug, is it in dd? the kernel?
You really don't want to use conv=sync here. noerror is fine, but sync is not. With noerror, dd will continue, and try its best to fill up the output buffer. With sync, it will bad the output buffer with zeros, which will do nasty things to your md5sum.
Anders
Anders,
I understand what happened, but it really came as a surprise.
And I'm not sure dd should consider a partial successful read as a failure that requires the data be padded out. dd does NOT consider it a big enough issue to actually stop the dd command if you leave off the noerror arg.
I.e., I normally dd with if=/dev/sdx. In that case conv=noerror,sync is critical. If you are copying from unreliable media and you leave off noerror, dd will stop after the first failed read. Those failed reads we obviously want padded out with zeros.
This is different. dd is reading from a pipe and sometimes only a partial block is available. In the absence of conv=noerror,sync, it will process all the data coming from cat just fine, and the md5sum is accurate.
Thus it seems to me that dd's definition of a read error is not consistent with my definition. Hence my uncertainty whether it is a bug or just a really nasty feature.
Greg
On Tuesday May 5 2009, Greg Freemyer wrote:
...
Anders,
I understand what happened, but it really came as a surprise.
And I'm not sure dd should consider a partial successful read as a failure that requires the data be padded out. dd does NOT consider it a big enough issue to actually stop the dd command if you leave off the noerror arg.
That doesn't make sense / can't be done. The Linux read() system call takes a maximum byte count and returns a count of bytes actually read for each call. If there was an error, it returns -1 and a specific error code that characterizes the failure that occurred. End-of-file is signalled by a zero-byte read. If some bytes are returned, then the count is returned and thus by definition it's not an error.
If you're running dd in "sync" mode, then it will request the buffer size in the read and, if fewer than that many bytes are returned, it will pad the buffer and write it to the output side.
You should keep in mind that the sync mode of dd is mostly oriented towards tape devices, not usually disks and files, and certainly not pipes and sockets.
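One way to see that dd treats a short read as a partial record rather than an error is to look at its statistics, which use the form FULL+PARTIAL. The sketch below assumes a hypothetical slow_producer function and a one-second pause so that each read on the pipe comes back short; the exact wording of the statistics line is assumed to be the usual "records in" message.

```shell
# Hypothetical producer: two 3-byte writes separated by a pause, so each
# read() that dd issues returns fewer bytes than the 8-byte block size.
slow_producer() {
    printf 'abc'
    sleep 1
    printf 'def'
}

# dd writes its statistics to stderr. Redirect stderr into the pipe and
# discard the copied data, keeping only the "records in" line.
# LC_ALL=C keeps the message untranslated.
records_in=$(slow_producer | LC_ALL=C dd bs=8 2>&1 >/dev/null | grep 'records in')

# Run dd again and capture its exit status from inside the pipeline.
dd_status=$(slow_producer | { dd bs=8 >/dev/null 2>&1; echo "$?"; })

echo "$records_in (exit status $dd_status)"
```

The number after the plus sign counts blocks that were read short, and the exit status shows whether dd regarded any of them as a failure.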
ie. I normally dd with if=/dev/sdx. In that case conv=noerror,sync is critical. If you have unreliable media you are copying from and you leave off noerror, dd will stop after the first failed read. Those failed reads we obviously want padded out with zeros.
This is different. dd is reading from a pipe and sometimes only a partial block is available.
You certainly should not be using sync from a pipe or a socket, since they are not random-access or record-oriented sources, nor can they be assumed to sustain non-blocking reads on their output side.
...
Greg
Randall Schulz
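If padding is still wanted when the input is a pipe, one option not mentioned in the thread is the iflag=fullblock flag of GNU dd, which makes dd keep reading until an input block is full or end-of-file is reached. This is a sketch only: it assumes a GNU coreutils dd recent enough to support the flag, and slow_producer is again a hypothetical stand-in for the real data source.

```shell
# Hypothetical producer: two 3-byte writes separated by a pause.
slow_producer() {
    printf 'abc'
    sleep 1
    printf 'def'
}

# With iflag=fullblock dd accumulates input until the 6-byte block is
# full, so conv=sync finds nothing to pad and 6 bytes come out.
full_bytes=$(slow_producer | dd bs=6 iflag=fullblock conv=sync 2>/dev/null | wc -c | tr -d ' ')

# A genuinely short final block is still padded: 6 bytes of input with
# bs=4 gives one full block plus a 2-byte tail padded to 4, i.e. 8 bytes.
tail_bytes=$(slow_producer | dd bs=4 iflag=fullblock conv=sync 2>/dev/null | wc -c | tr -d ' ')

echo "bs=6: $full_bytes bytes, bs=4: $tail_bytes bytes"
```

With this flag, an input whose length is a multiple of the block size should pass through conv=sync without any padding, which is the result the original test expected.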
participants (3)
- Anders Johansson
- Greg Freemyer
- Randall R Schulz