[oS-en] disk imaging code question
Hi,

I use this bash code (script) to image partitions - I arrived there with help here:

function hacer()
{
    echo
    echo "Doing partition $1 ($2) on $3"
    echo "copying, compressing, and calculating md5..."
    mkfifo mdpipe
    dd if=/dev/$1 status=progress bs=16M | tee mdpipe | pigz > $3.gz &    # Try using "--fast" next time
    md5sum -b mdpipe | tee -a md5checksum_expanded
    wait
    rm mdpipe
    echo "$3" >> md5checksum_expanded
    echo "Verifying..."
    pigz --test $3.gz
    echo
    echo "·········"
}

Used this way:

time hacer sda1 "5S" "sda1_WINBOOT"
time hacer sda2 "250M" "sda2_WINDOWS"

(the current directory is on the destination disk, external rotating rust via USB2)

This works fine.

Now, I noticed something.

On gkrellm I observe, as the script runs, that the reads on the source disk (SSD, in this case) and the writes to the destination disk alternate; they are not simultaneous. And it took 5 hours to image perhaps 300 GB; of course, the destination is on USB2 and the CPU is old, but that alternation doesn't help.

Could that be improved somehow? Short of writing my own binary code to implement it all... I have never coded with parallelization, so that would not be trivial.

--
Cheers
Carlos E. R. (from 15.2 x86_64 at Telcontar)
Carlos, et al --

...and then Carlos E. R. said...
%
% Hi,
%
% I use this bash code (script) to image partitions - I arrived there with
% help here:

Thanks! This is a tight little function.

%
% function hacer()
% {
%     echo
%     echo "Doing partition $1 ($2) on $3"
...
%
% Used this way:
%
% time hacer sda1 "5S" "sda1_WINBOOT"
% time hacer sda2 "250M" "sda2_WINDOWS"
[snip]

I'm still curious about $2, though. It isn't used anywhere else, so it looks like it's just a way for you to provide eye candy. Is that supposed to actually drive the copy size, or is it just a visual reminder?

TIA & Happy New Year

:-D

--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
On 30/12/2021 12.44, David T-G wrote:
Carlos, et al --
...and then Carlos E. R. said...
%
% Hi,
%
% I use this bash code (script) to image partitions - I arrived there with
% help here:
Thanks! This is a tight little function.
Yes :-)

The idea is to read from disk only once. Calculating the checksum is one huge disk read operation, and reading the disk again in order to copy it elsewhere would be another huge read operation.

Compression is fast on a fast CPU, but my laptops aren't fast, so the --fast switch should improve things. Space is not that important; I just don't want to waste it, for instance on empty sectors.

The verification phase is for paranoia, and it is very slow and doesn't parallelize. But I think it is better to be safe than sorry with a backup.
%
% function hacer()
% {
%     echo
%     echo "Doing partition $1 ($2) on $3"
...
%
% Used this way:
%
% time hacer sda1 "5S" "sda1_WINBOOT"
% time hacer sda2 "250M" "sda2_WINDOWS"
[snip]
I'm still curious about $2, though. It isn't used anywhere else, so it looks like it's just a way for you to provide eye candy. Is that supposed to actually drive the copy size, or is it just a visual reminder?
Just a visual reminder of the time it usually takes: 5 seconds, 250 minutes... :-) But it could be anything else you'd want.
TIA & Happy New Year
:-D
Same :-) -- Cheers / Saludos, Carlos E. R. (from 15.2 x86_64 at Telcontar)
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ? -- Per Jessen, Zürich (14.4°C)
On 30/12/2021 14.51, Per Jessen wrote:
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ?
Yesss :-D

The first laptop where we developed and used this code had several cores, I forget the number. The code did run fast - so fast that the CPU overheated and clocked itself down. There is a range of pseudo-powerful laptops designed so that they can apply full CPU power for only about a minute; then they overheat and clock down. Maybe they lack a fan, or it is not big enough. Short sprints of speed, but they cannot maintain it. So I modified something in order to make the script run slower. I don't remember what I did; the laptop is now with her owner. That laptop also has USB3.

My own laptops only have two ancient cores. Still, my laptop would do it faster if there wasn't that alternation between read and write operations (writing via USB2).

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
Carlos E. R. wrote:
On 30/12/2021 14.51, Per Jessen wrote:
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ?
Yesss :-D
Okay :-) I was just wondering if you had too many processes competing for CPU time, essentially causing a serialisation.
Still, my laptop would do it faster if there wasn't that alternation between read and write operations (writing via USB2).
For some reason your read process is not reading ahead even if it has plenty of time to fill up some buffers while the slow write IO is taking place. -- Per Jessen, Zürich (13.3°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On 30/12/2021 19.49, Per Jessen wrote:
Carlos E. R. wrote:
On 30/12/2021 14.51, Per Jessen wrote:
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ?
Yesss :-D
Okay :-)
I was just wondering if you had too many processes competing for CPU time, essentially causing a serialisation.
Ah, ok, I see your thought. No, I run this from a dedicated external hard disk, with an XFCE graphical system used for rescue/backup/restore operations. There is nothing installed or running on it, no email, browser, etc. Ok, the programs may be there, but never started. As I routinely run a terminal with top and atop, I would notice if something is using resources.
Still, my laptop would do it faster if there wasn't that alternation between read and write operations (writing via USB2).
For some reason your read process is not reading ahead even if it has plenty of time to fill up some buffers while the slow write IO is taking place.
Right. Or no buffering of the read: dd fills that 16M buffer and then sends it forward to the pipe. Better processing would start reading another 16M buffer at the same time.

So, I wonder if there is something that could be done. Another pipe stage that just buffers in RAM what it gets from dd, freeing dd? I think tape readers used some program called "buffer", but I have never used it. The manual is confusing.

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
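A buffering stage like that can be sketched with mbuffer, if it happens to be installed; the 1G buffer size below is only an illustration, not a tested value:

  mkfifo mdpipe
  # mbuffer keeps reading from dd into a RAM buffer even while the slower
  # consumers (tee/pigz and the USB2 write) are still draining earlier data
  dd if=/dev/$1 status=progress bs=16M | mbuffer -q -m 1G \
      | tee mdpipe | pigz > $3.gz &
  md5sum -b mdpipe | tee -a md5checksum_expanded
  wait
  rm mdpipe

The idea is simply to decouple the read side from the write side, so the SSD read does not have to pause while each chunk crawls out to the USB disk.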
Carlos E. R. wrote:
On 30/12/2021 19.49, Per Jessen wrote:
Carlos E. R. wrote:
On 30/12/2021 14.51, Per Jessen wrote:
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ?
Yesss :-D
Okay :-)
I was just wondering if you had too many processes competing for CPU time, essentially causing a serialisation.
Ah, ok, I see your thought.
No, I run this from a dedicated external hard disk, with an XFCE graphical system used for rescue/backup/restore operations. There is nothing installed or running on it, no email, browser, etc. Ok, the programs may be there, but never started. As I routinely run a terminal with top and atop, I would notice if something is using resources.
No, never mind everything else. Your script causes a number of inter-dependent processes. Every time you pipe something, for instance. Your pigz is also parallelized.
Still, my laptop would do it faster if there wasn't that alternation between read and write operations (writing via USB2).
For some reason your read process is not reading ahead even if it has plenty of time to fill up some buffers while the slow write IO is taking place.
Right. Or no buffering of the read. It fills that 16M of the buffer in dd, and then sends it forward to the pipe. A better processing would start reading another 16M buffer.
No doubt it does - unless it cannot get rid of the buffer.
So, I wonder if there is something that could be done.
I'm sure there is, but your writing to USB will remain your bottleneck. -- Per Jessen, Zürich (12.9°C)
On 30/12/2021 20.42, Per Jessen wrote:
Carlos E. R. wrote:
On 30/12/2021 19.49, Per Jessen wrote:
Carlos E. R. wrote:
On 30/12/2021 14.51, Per Jessen wrote:
Carlos E. R. wrote:
(the current directory is on the destination disk, external rotating rust via USB2)
This works fine.
Now, I noticed something.
On gkrellmn I observe, as the script runs, that the reads on the source disk (ssd, in this case) and the writes to the destination disk alternate, are not simultaneous. And it took 5 hours to image perhaps 300gigs; of course, the destination is on USB2, and the CPU is old, but that alternation doesn't help.
Could that be improved somehow?
I am assuming you have more than one core for running this on ?
Yesss :-D
Okay :-)
I was just wondering if you had too many processes competing for CPU time, essentially causing a serialisation.
Ah, ok, I see your thought.
A curiosity: "pigz --fast diskimage" also alternates read and write in the same fashion as the script, but this time both the read and the write are on the external disk. Hmm, maybe not relevant; it's not the normal use case.
No, I run this from a dedicated external hard disk, with an XFCE graphical system used for rescue/backup/restore operations. There is nothing installed or running on it, no email, browser, etc. Ok, the programs may be there, but never started. As I routinely run a terminal with top and atop, I would notice if something is using resources.
No, never mind everything else. Your script causes a number of inter-dependent processes. Every time you pipe something, for instance. Your pigz is also parallelized.
Yes. I hope the pipe is in RAM.
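For what it's worth, an anonymous pipe is indeed a kernel buffer in RAM; on Linux its default capacity is 64 KiB per pipe, and the largest size an unprivileged process may request can be checked like this (a quick sanity check, nothing specific to this script):

  cat /proc/sys/fs/pipe-max-size    # typically prints 1048576, i.e. 1 MiB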
Still, my laptop would do it faster if there wasn't that alternation between read and write operations (writing via USB2).
For some reason your read process is not reading ahead even if it has plenty of time to fill up some buffers while the slow write IO is taking place.
Right. Or no buffering of the read. It fills that 16M of the buffer in dd, and then sends it forward to the pipe. A better processing would start reading another 16M buffer.
No doubt it does - unless it cannot get rid of the buffer.
So, I wonder if there is something that could be done.
I'm sure there is, but your writing to USB will remain your bottleneck.
Certainly. But the read is from the internal disk (SSD), and the write is to external rotating rust (normally). Writing to USB continuously, instead of in chunks, could speed it up significantly.

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
On 12/30/21 20:07, Carlos E. R. wrote:
It fills that 16M of the buffer in dd, and then sends it forward to the pipe. A better processing would start reading another 16M buffer.
So, I wonder if there is something that could be done.
To avoid the read-from-file-and-write-to-stdout in dd, you could try to avoid it completely, and instead associate /dev/XXX directly as stdin to tee(1):

  mkfifo mdpipe
- dd if=/dev/$1 status=progress bs=16M | tee mdpipe | pigz > $3.gz &
+ tee mdpipe < /dev/$1 | pigz > $3.gz &    # Try using "--fast" next time
  md5sum -b mdpipe | tee -a md5checksum_expanded

Not sure if that's much better in regard to performance, though, but obviously it saves one pass of reading plus one pass of writing of the whole stuff. And of course you won't get the 'progress' output from dd. ;-)

FWIW, to avoid the fifo file you could use the process substitution operator >(...) which newer shells support. Examples can be found in the tee documentation: https://www.gnu.org/software/coreutils/tee

Finally, you can also play with another compression algorithm/tool than pigz. E.g. zstd seems to be quite fast, and better at compression.

The following with pigz takes 2min49s here, with sdb1 being a 20G file system (which is 80% used), and the resulting GZ file has 6.2G:

  $ tee < /dev/sdb1 >( pigz > sdb1.gz ) | md5sum > sdb1.md5

while using the zstd tool runs 1min17s and results in a 6.0G file:

  $ tee < /dev/sdb1 >( zstd > sdb1.gz ) | md5sum > sdb1.md5

and using 'zstd --fast' finished after 57s.

BTW: I haven't read the complete thread, but I think we can assume that the partition you're copying is not mounted (or at least mounted read-only), right?

Have a nice day,
Berny
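To see the fifo-free idea in the shape of the original function, here is a rough sketch (bash with process substitution, same assumptions as above; not a tested drop-in replacement):

  function hacer()
  {
      # $1 = partition (e.g. sda1), $2 = reminder text, $3 = output base name
      echo "Doing partition $1 ($2) on $3"
      # tee duplicates the raw partition stream: one copy goes to md5sum via
      # process substitution, the other copy is compressed into the image file
      tee >(md5sum -b | tee -a md5checksum_expanded) < /dev/$1 | pigz > $3.gz
      # note: bash does not wait for the >(...) process, so the checksum line
      # may, in principle, land in the file slightly after the next echo
      echo "$3" >> md5checksum_expanded
      echo "Verifying..."
      pigz --test $3.gz
  }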
On 31/12/2021 01.50, Bernhard Voelker wrote:
On 12/30/21 20:07, Carlos E. R. wrote:
It fills that 16M of the buffer in dd, and then sends it forward to the pipe. A better processing would start reading another 16M buffer.
So, I wonder if there is something that could be done.
To avoid the read-from-file-and-write-to-stdout in dd, you could try to completely avoid it, and instead to associate /dev/XXX directly as stdin to tee(1).
  mkfifo mdpipe
- dd if=/dev/$1 status=progress bs=16M | tee mdpipe | pigz > $3.gz &
+ tee mdpipe < /dev/$1 | pigz > $3.gz &    # Try using "--fast" next time
  md5sum -b mdpipe | tee -a md5checksum_expanded
Interesting concoction.
Not sure if that's much better in regards of performance, though, but obviously that saves 1x pass of reading + 1x pass of writing of the whole stuff. And of course you won't get the 'progress' output from dd. ;-)
Ok, trying the code now, as I have the environment ready, with a small 9G partition.

[...]

Unfortunately, it alternates reading and writing, too.

I did another run yesterday, using my version but without compression, and it did not alternate. So the culprit is pigz. I think it happens even when pigz is working on its own.
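One rough way to confirm that would be to run each compressor on the same partition read and watch the I/O pattern in gkrellm or iostat during each run; a sketch only, with the output paths purely illustrative (assuming the external disk is mounted at /data):

  # same read source, different compressor; compare the read/write pattern
  dd if=/dev/sda9 bs=16M | pigz --fast > /data/test_pigz.gz
  dd if=/dev/sda9 bs=16M | zstd        > /data/test_zstd.zst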
FWIW to avoid the fifo file, you could use the process substitution operator >(...) which newer shells support. Examples can be found in the tee documentation: https://www.gnu.org/software/coreutils/tee
Finally, you can also play with another compression algorithm/tool than pigz. E.g. zstd seems to be quite fast, and better in compression:
The following with pigz takes 2min49s here with sdb1 being a 20G file system (which is 80% used), and the resulting GZ file has 6.2G:
$ tee < /dev/sdb1 >( pigz > sdb1.gz ) | md5sum > sdb1.md5
while using zstd tool runs 1min17s and results in a 6.0G file:
$ tee < /dev/sdb1 >( zstd > sdb1.gz ) | md5sum > sdb1.md5
and using 'zstd --fast' has finished after 57s.
BTW: I haven't read the complete thread, but I think we can assume that the partition you're copying is not (or at least read-only) mounted, right?
Sure, not mounted.

Ok, I will try zstd on my version in a minute.

[...] several minutes, 9 gigs takes time [...]

Huh, I was testing on a large partition, not the small one (bug in my script).

With:

  tee mdpipe < /dev/$1 | pigz > $3.gz &

2022-01-01 12:08:48+01:00 Starting dd:
2022-01-01 12:14:28+01:00 Ending compression phase:
2022-01-01 12:16:11+01:00 Ending verification phase and all:

time:

real 7m22.876s
user 8m45.837s
sys 0m43.694s

It does the r/w alternation. File size: 2.1GiB (2.3GB) ('ls -lh' vs 'ls -l --si').

With:

  dd if=/dev/$1 status=progress bs=16M | \
      tee mdpipe | zstd > $3.zst &

No alternation! :-D

No constant reading speed, though. Maybe it varies with the compression rate of the current chunks.

(reading is from fast internal SSD; writing goes to external rotating rust over USB2)

Writing is intermittent. That means the compression is doing its job, compensating for the slow I/O speed by having to write fewer bytes.

Very nice! :-D

I forgot to check CPU load.

2022-01-01 12:41:22+01:00 Starting dd:
2022-01-01 12:44:32+01:00 Ending compression phase:
2022-01-01 12:45:50+01:00 Ending verification phase and all:

time:

real 4m27.691s
user 3m28.830s
sys 0m48.674s

Size: 1.9GiB (2.1GB). Wonderful!

I'll run it again to watch CPU load.

[...]

During compression, one core is almost at 100%, not constant, and the other is at 40%. I think the zstd manual says it uses one thread for compression and another for I/O. And I'm doing the md5sum at the same time.

During verification, it reads at nearly 30MB/s and the two cores stay under 40%.

So zstd stays, at least when using USB2 transport and a slow CPU. On a fast computer with USB3, I would have to reevaluate.

Thanks for the idea!

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
On 01/01/2022 12.54, Carlos E. R. wrote:
On 31/12/2021 01.50, Bernhard Voelker wrote:
On 12/30/21 20:07, Carlos E. R. wrote:
...
with:
dd if=/dev/$1 status=progress bs=16M | \
    tee mdpipe | zstd > $3.zst &
No alternation! :-D
No constant reading speed, though. Maybe it varies with the compression rate of the current chunks.
(reading is from fast internal SSD; writing goes to external rotating rust over USB2)
Writing is intermittent. That means the compression is doing its job, compensating for the slow I/O speed by having to write fewer bytes.
Very nice! :-D
I forgot to check CPU load.
2022-01-01 12:41:22+01:00 Starting dd:
2022-01-01 12:44:32+01:00 Ending compression phase:
2022-01-01 12:45:50+01:00 Ending verification phase and all:
time:
real 4m27.691s
user 3m28.830s
sys 0m48.674s
Size: 1.9GiB (2.1GB).
One more modification and test, after reading the manual a bit:

  dd if=/dev/$1 status=progress bs=16M | tee mdpipe | \
      zstd --size-hint=$4 --adapt > $3.zst &

  --adapt[=min=#,max=#]
      zstd will dynamically adapt compression level to perceived I/O
      conditions. Compression level adaptation can be observed live by using
      command -v. Adaptation can be constrained between supplied min and max
      levels. The feature works when combined with multi-threading and --long
      mode. It does not work with --single-thread. It sets window size to 8 MB
      by default (can be changed manually, see wlog). Due to the chaotic
      nature of dynamic adaptation, compressed result is not reproducible.
      note: at the time of this writing, --adapt can remain stuck at low speed
      when combined with multiple worker threads (>=2).

  --size-hint=#
      When handling input from a stream, zstd must guess how large the source
      size will be when optimizing compression parameters. If the stream size
      is relatively small, this guess may be a poor one, resulting in a higher
      compression ratio than expected. This feature allows for controlling the
      guess when needed. Exact guesses result in better compression ratios.
      Overestimates result in slightly degraded compression ratios, while
      underestimates may result in significant degradation.

What it doesn't say is what units the hint expects. I used "7G" and it did not complain. Now I have to consider how to calculate the size of a partition to feed that value automatically.

2022-01-01 13:34:42+01:00 Starting dd:
2022-01-01 13:37:02+01:00 Ending compression phase:
2022-01-01 13:39:37+01:00 Ending verification phase and all:

real 4m55.131s
user 1m25.702s
sys 0m51.150s

Size: 4.1GiB (4.3G)

It actually takes more time to run, and the file size is doubled.

Compression time is 2:20 (vs 3:10)
Verification time is 2:35 (vs 1:18)
Total time is 4:55 (vs 4:28)

So, it is the verification that is much worse; the compression time is much better. But as the size is doubled, the read effort to do the verification suffers a lot - it is USB2.

So, don't use --adapt. Instead, perhaps, try different compression ratios manually and compare the actual results - as I use a dedicated external disk for each computer that I back up, I can use tuned scripts for each. Or set min,max values for the hint switch - no, because the hint heuristics cannot take the verification phase into consideration.

(the default compression level is 3)

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
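As a possible way to fill in that size automatically, a sketch: blockdev --getsize64 prints the partition size in bytes, and this assumes --size-hint also accepts a plain byte count (the "7G" suffix form was accepted above):

  # derive the size hint from the partition itself instead of passing it by hand
  HINT=`blockdev --getsize64 /dev/$1`
  dd if=/dev/$1 status=progress bs=16M | tee mdpipe | \
      zstd --size-hint=$HINT > $3.zst &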
On Saturday, 2022-01-01 at 14:17 +0100, Carlos E. R. wrote:
On 01/01/2022 12.54, Carlos E. R. wrote:
On 31/12/2021 01.50, Bernhard Voelker wrote:
On 12/30/21 20:07, Carlos E. R. wrote:
...
So, it is the verification that is much worse, the compression time is much better. But as the size is doubled, the read effort to do the verification suffers a lot, it is USB2.
So, don't use --adapt. Instead, perhaps, try different compression ratios manually and compare the actual results - as I use a dedicated external disk for each computer that I back up, I can use tuned scripts for each.
Ok, doing some testing. The code is now:

#!/bin/bash

TIMESTAMPFILE=0_timestamp_dd_4

# Exit script on Control-C (signal 2)
trap 'echo "Control-C pressed."; rm mdpipe ; exit 1;' 2

function hacer()
{
    # parameters: device, estimated time, file name, device size, compression level
    DATE=`date --rfc-3339=seconds`
    SECS1=`date +%s`
    echo >> $TIMESTAMPFILE
    echo "Starting processing partition $1 ($2) to $3 (Level $5)" | tee -a $TIMESTAMPFILE
    echo " copying, compressing, and calculating md5..." | tee -a $TIMESTAMPFILE
    echo " $DATE Starting dd" | tee -a $TIMESTAMPFILE
    mkfifo mdpipe
    dd if=/dev/$1 status=progress bs=16M | tee mdpipe | zstd --size-hint=$4 -$5 > $3.zst &
    md5sum -b mdpipe | tee -a md5checksum_expanded
    wait
    rm mdpipe
    echo "$3" >> md5checksum_expanded
    SECS2=`date +%s`
    DATE=`date --rfc-3339=seconds`
    echo " $DATE Ending compression phase" | tee -a $TIMESTAMPFILE
    echo
    echo " Verifying..."
    zstd --test $3.zst
    DATE=`date --rfc-3339=seconds`
    SECS3=`date +%s`
    DIFF_COMP=`expr $SECS2 - $SECS1`
    DIFF_VERIF=`expr $SECS3 - $SECS2`
    DIFF_TOT=`expr $SECS3 - $SECS1`
    SIZE=`stat --printf="%s" $3.zst`
    SIZEM=`expr $SIZE / 1000000`
    echo " $DATE Ending verification phase and all ($DIFF_TOT = $DIFF_COMP + $DIFF_VERIF secs; $SIZEM MB)" | tee -a $TIMESTAMPFILE
    echo
    echo "·········"
}

echo "-----------" >> md5checksum_expanded

DATE=`date --rfc-3339=seconds`
SECS_TOT_1=`date +%s`
echo | tee -a $TIMESTAMPFILE
echo "==================" | tee -a $TIMESTAMPFILE
echo "Start at $DATE" | tee -a $TIMESTAMPFILE

echo "Doing test"

time hacer sda9 "XX:YY" "sda9_Other_01" 7G 1
echo
time hacer sda9 "XX:YY" "sda9_Other_02" 7G 2
echo
...
time hacer sda9 "XX:YY" "sda9_Other_10" 7G 10
echo

DATE3=`date --rfc-3339=seconds`
SECS_TOT_2=`date +%s`
DIFF_TOTAL=`expr $SECS_TOT_2 - $SECS_TOT_1`
echo >> $TIMESTAMPFILE
echo "End at $DATE3 (Total time= $DIFF_TOTAL S)" | tee -a $TIMESTAMPFILE

There is no detection if the verification fails; I don't know how to do that. Maybe an exit code from zstd? I could test for non-zero.

And the results are:

(Level 1)  (320 = 177 + 143 secs; 3027 MB)
(Level 2)  (318 = 176 + 142 secs; 3012 MB)   <==== sweet spot
(Level 3)  (335 = 195 + 140 secs; 3013 MB)
(Level 4)  (402 = 261 + 141 secs; 2851 MB)
(Level 5)  (431 = 303 + 128 secs; 2789 MB)
(Level 6)  (453 = 321 + 132 secs; 2785 MB)
(Level 7)  (458 = 325 + 133 secs; 2784 MB)
(Level 8)  (455 = 322 + 133 secs; 2784 MB)
(Level 9)  (600 = 469 + 131 secs; 2783 MB)
(Level 10) (600 = 471 + 129 secs; 2783 MB)

Of course, compressing a different partition with different files stored would alter the results a bit (sda9 is a small install of Leap 15.0 Beta, ext4 filesystem).

Testing for verification failure:

    zstd --test $3.zst
    if [ ! $? ]
    then
        echo " Failed verification!" | tee -a $TIMESTAMPFILE
    fi

Then I used "wxHexEditor sda9_Other_02.zst" and changed 2 bytes at random, and ran the script, skipping the writing phase. I get:

Starting processing partition sda9 (XX:YY) to sda9_Other_02 (Level 2)
 copying, compressing, and calculating md5...
 2022-01-01 22:47:26+01:00 Starting dd
 2022-01-01 22:47:26+01:00 Ending compression phase

 Verifying...
sda9_Other_02.zst : 2423 MB...
sda9_Other_02.zst : Decoding error (36) : Destination buffer is too small
 2022-01-01 22:47:58+01:00 Ending verification phase and all (done sda9 as sda9_Other_02 at level 2; 32 = 0 + 32 secs; 3012 MB)

zstd prints an error to the screen, but the exit code is still zero, no error detected.
The important thing is that my log file doesn't record the error; when compressing 9 partitions, if there is an error on the first one, it will scroll out of the terminal and not be seen.

So the function is now:

function hacer()
{
    # parameters: device, estimated time, file name, device size, compression level
    DATE=`date --rfc-3339=seconds`
    SECS1=`date +%s`
    echo >> $TIMESTAMPFILE
    echo "Starting processing partition $1 ($2) to $3 (Level $5)" | tee -a $TIMESTAMPFILE
    echo " copying, compressing, and calculating md5..." | tee -a $TIMESTAMPFILE
    echo " $DATE Starting dd" | tee -a $TIMESTAMPFILE
    mkfifo mdpipe
    dd if=/dev/$1 status=progress bs=16M | tee mdpipe | zstd --size-hint=$4 -$5 > $3.zst &
    md5sum -b mdpipe | tee -a md5checksum_expanded
    wait
    rm mdpipe
    echo "$3" >> md5checksum_expanded
    SECS2=`date +%s`
    DATE=`date --rfc-3339=seconds`
    echo " $DATE Ending compression phase" | tee -a $TIMESTAMPFILE
    echo
    echo " Verifying..."
    zstd --test $3.zst
    if [ ! $? ]
    then
        echo " Failed verification!" | tee -a $TIMESTAMPFILE
    fi
    DATE=`date --rfc-3339=seconds`
    SECS3=`date +%s`
    DIFF_COMP=`expr $SECS2 - $SECS1`
    DIFF_VERIF=`expr $SECS3 - $SECS2`
    DIFF_TOT=`expr $SECS3 - $SECS1`
    SIZE=`stat --printf="%s" $3.zst`
    SIZEM=`expr $SIZE / 1000000`
    echo " $DATE Ending verification phase and all (done $1 as $3 at level $5; $DIFF_TOT = $DIFF_COMP + $DIFF_VERIF secs; $SIZEM MB)" | tee -a $TIMESTAMPFILE
    echo
    echo "·········"
}

--
Cheers,
Carlos E. R. (from openSUSE 15.2 x86_64 at Telcontar)
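A side note on that check, for what it's worth: in bash, [ ! $? ] only tests whether the string $? expands to is empty, and since $? is always a number, the test gives the same result whether zstd succeeded or failed, so the "Failed verification!" branch can never fire that way. zstd --test normally exits non-zero on a decoding error, so testing the command directly should be more reliable; a sketch of that step only:

    if ! zstd --test $3.zst
    then
        echo " Failed verification!" | tee -a $TIMESTAMPFILE
    fi

or, keeping the separate call, use an explicit numeric comparison such as "if [ $? -ne 0 ]" immediately after zstd --test.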
On 1/1/22 14:17, Carlos E. R. wrote:
--size-hint=# When handling input from a stream, zstd must guess how large the source size will be when optimizing compression parameters.
Yes, that zstd parameter seems quite useful with input from a pipe.

Did you try without the redundant dd(1) - and instead let tee(1) read from the partition? Just out of curiosity ...

Have a nice day,
Berny
On 03/01/2022 00.54, Bernhard Voelker wrote:
On 1/1/22 14:17, Carlos E. R. wrote:
--size-hint=# When handling input from a stream, zstd must guess how large the source size will be when optimizing compression parameters.
yes, that zstd parameter seems much useful with the input from a pipe.
Did you try without the redundant dd(1) - and instead let tee(1) read from the partition? Just out of curiosity ...
Yes, some mails before - the one posted at 12:54 CET. It works, but the disk I/O alternates, it is not constant. The culprit was pigz.

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
On 1/3/22 10:30, Carlos E. R. wrote:
On 03/01/2022 00.54, Bernhard Voelker wrote:
Did you try without the redundant dd(1) - and instead let tee(1) read from the partition? Just out of curiosity ...
It works, but the disk i/o alternates, is not constant. The culprit was pigz.
I see, thanks. Have a nice day, Berny
On 25/12/2021 15.16, Carlos E. R. wrote:
Hi,
I use this bash code (script) to image partitions - I arrived there with help here:
I just did a run with a modification (for real use):
function hacer()
{
    echo
    echo "Doing partition $1 ($2) on $3"
    echo "copying, compressing, and calculating md5..."
    mkfifo mdpipe
    dd if=/dev/$1 status=progress bs=16M | tee mdpipe | pigz > $3.gz &
    dd if=/dev/$1 status=progress bs=16M | tee mdpipe > $3.img &
    md5sum -b mdpipe | tee -a md5checksum_expanded
    wait
    rm mdpipe
    echo "$3" >> md5checksum_expanded
    echo
    echo "·········"
}
Used this way:
time hacer sda8 "XX:YY H" "sda_8_Home"

I did it that way to image, without compression, a 252G partition (I thought it would be faster without compression, and I intended to delete the image afterwards). Source on SSD, writing to rotating rust over USB2.

It took 2:27 hours, or 147 minutes. 30..33MB/s, continuous, *no alternating*.

Actually, that external disk is mounted as btrfs with compression and encryption, so there is always some compression:

/dev/mapper/cr_data on /data type btrfs (rw,relatime,lazytime,compress=zlib:3,space_cache,subvolid=5,subvol=/)

Erebor:/data/Portatil_entero_9.1 # compsize --bytes sda_8_Home.img
Type       Perc     Disk Usage    Uncompressed  Referenced
TOTAL       99%   269950705664  270011826176  269657550848
none       100%   269859127296  269859127296  269505048576
zlib        59%       91578368     152698880     152502272
Erebor:/data/Portatil_entero_9.1 #

Then I did a restore of the files with rsync (because I am reformatting the disk with a different filesystem):

md theloop
mount -o ro,loop sda_8_Home.img theloop/
md thehome
mount -v /dev/sda8 thehome/

resulting in these mounts:

/dev/sda8 on /data/Portatil_entero_9.1/thehome type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/data/Portatil_entero_9.1/sda_8_Home.img on /data/Portatil_entero_9.1/theloop type reiserfs (ro,relatime)

And restoring the files with:

time rsync --archive --acls --xattrs --hard-links --sparse \
     --partial-dir=.rsync-partial --stats --human-readable \
     --checksum /data/Portatil_entero_9.1/theloop/ \
     /data/Portatil_entero_9.1/thehome

resulting in these filesystems:

/dev/sda8    252G  179G   73G  72%  /data/Portatil_entero_9.1/thehome
/dev/loop0   252G  180G   73G  72%  /data/Portatil_entero_9.1/theloop

With this screen output:

2021-12-31 21:39:15+01:00 Starting rsync:

Number of files: 552,872 (reg: 529,077, dir: 21,541, link: 2,214, special: 40)
Number of created files: 552,871 (reg: 529,077, dir: 21,540, link: 2,214, special: 40)
Number of deleted files: 0
Number of regular files transferred: 462,412
Total file size: 191.62G bytes
Total transferred file size: 191.41G bytes
Literal data: 191.41G bytes
Matched data: 0 bytes
File list size: 30.21M
File list generation time: 0.250 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 191.51G
Total bytes received: 12.12M

sent 191.51G bytes  received 12.12M bytes  11.06M bytes/sec
total size is 191.62G  speedup is 1.00

real    288m29.496s
user    31m20.928s
sys     24m50.115s

End at 2022-01-01 02:27:45+01:00

The point here is how much slower the rsync restore (288 min, or 4:48 hours) is than the image write (147 min, or 2:27 hours), so an image backup/restore can be faster than an rsync strategy.

--
Cheers / Saludos,
Carlos E. R. (from 15.2 x86_64 at Telcontar)
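For completeness, restoring from one of the compressed images made earlier in the thread would be roughly the reverse pipeline - a sketch only, with illustrative device and file names, to be run only onto a partition whose contents may be overwritten:

  # decompress the image and write it straight back to the partition
  zstd -d -c sda2_WINDOWS.zst | dd of=/dev/sda2 bs=16M status=progress

and then the md5 of the partition could be recomputed and compared with the stored checksum.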
On 25/12/2021 15.16, Carlos E. R. wrote:
Hi,
I use this bash code (script) to image partitions - I arrived there with help here:
Testing on my "powerful" desktop machine. Current version: mkfifo mdpipe dd if=/dev/$1 status=progress bs=16M | tee mdpipe | \ zstd --size-hint=$4 -$5 > $3.zst & md5sum -b mdpipe | tee -a md5checksum_expanded wait rm mdpipe $1 $2 $3 $4 $5 nvme0n1p2 "23m" "nvme0n1p2__nvme-swap" 100G 4 nvme0n1p5 "16m" "nvme0n1p5__nvme-main" 150G 3 Write speed was: 107374182400 bytes (107 GB, 100 GiB) copied, 1106.29 s, 97.1 MB/s 161059176448 bytes (161 GB, 150 GiB) copied, 705.491 s, 228 MB/s Then I did a comparison using dd alone, no compression; I got these speeds: 107374182400 bytes (107 GB, 100 GiB) copied, 317.29 s, 338 MB/s 161059176448 bytes (161 GB, 150 GiB) copied, 265.229 s, 607 MB/s Now, notice the much faster write speed on the destination without using zstd compression (writing to rotating rust). This is a first. The differences are a powerful CPU (AMD Ryzen 5 3600X 6-Core Processor), on Leap 15.3 (previous testing were old laptops and Leap 15.2). Source disk is M.2 nvme "disk", destination is rotating rust over USB3, running LUKS encrypted and compressed btrfs partition on 15.3. mount output: /dev/mapper/cr_backup on /backup type btrfs (rw,relatime,compress=zlib:3,space_cache,subvolid=5,subvol=/) Previous testing were Leap 15.2, SSD source, destination USB2/USB3, running LUKS encrypted and compressed btrfs partition on 15.2. Obviously, that I get a constant write speed above 150MB/S (the hardware maximum for rotating rust) has to be due to the effective btrfs compression, something that did not happen on my other machines. Although: Erebor4:~ # hdparm -tT /dev/sdb4 /dev/sdb4: Timing cached reads: 27872 MB in 2.00 seconds = 13952.21 MB/sec Timing buffered disk reads: 604 MB in 3.00 seconds = 201.21 MB/sec Erebor4:~ # Compression ratios obtained: Erebor4:/backup/images/001 # compsize nvme0n1p2__nvme-swap.img Processed 1 file, 609160 regular extents (609160 refs), 0 inline. Type Perc Disk Usage Uncompressed Referenced TOTAL 47% 47G 100G 100G none 100% 31G 31G 31G zlib 23% 16G 68G 68G Erebor4:/backup/images/001 # compsize nvme0n1p2__nvme-swap.zst Processed 1 file, 311 regular extents (311 refs), 0 inline. Type Perc Disk Usage Uncompressed Referenced TOTAL 100% 34G 34G 34G none 100% 34G 34G 34G Erebor4:/backup/images/001 # Erebor4:/backup/images/001 # compsize nvme0n1p5__nvme-main.img Processed 1 file, 1025001 regular extents (1025001 refs), 0 inline. Type Perc Disk Usage Uncompressed Referenced TOTAL 26% 40G 149G 149G none 100% 27G 27G 27G zlib 10% 12G 122G 122G Erebor4:/backup/images/001 # compsize nvme0n1p5__nvme-main.zst Processed 1 file, 4342 regular extents (4342 refs), 0 inline. Type Perc Disk Usage Uncompressed Referenced TOTAL 99% 33G 33G 33G none 100% 33G 33G 33G zlib 14% 11M 74M 74M Erebor4:/backup/images/001 # man says: The fields above are: Type compression algorithm Perc disk usage/uncompressed (compression ratio) Disk Usage blocks on the disk; this is what storing these files actually costs you (save for RAID considerations) Uncompressed uncompressed extents; what you would need without compression - includes deduplication savings and pinned extent waste Referenced apparent file sizes (sans holes); this is what a traditional filesystem that supports holes and efficient tail packing, or tar -S, would need to store these files What it does not explain are the TOTAL, none, zlib rows. I think we have to look at the "TOTAL" rows. Compression is less than what "zstd 3" gets, but if the goal is speed, then it is better without zstd (which also facilitates recovery). 
At least on a powerful computer running 15.3. So, the results can vary a lot per machine.

Maybe the kernel varies the btrfs compression effort depending on CPU?

--
Cheers / Saludos,
Carlos E. R. (from oS Leap 15.3 x86_64 (Erebor-4))
On 18/01/2022 15.13, Carlos E. R. wrote:
On 25/12/2021 15.16, Carlos E. R. wrote:
... (btrfs compression)
What it (the man) does not explain are the TOTAL, none, zlib rows.
I think we have to look at the "TOTAL" rows. Compression is less than what "zstd 3" gets, but if the goal is speed, then it is better without zstd (which also facilitates recovery). At least on a powerful computer running 15.3.
So, the results can vary a lot per machine.
Maybe the kernel varies the btrfs compression effort depending on CPU?
Something else: it seems that the kernel continues working on compression after the file is written to disk.

For example, I imaged sdc8, which is xfs, holding /opt. Compsize run immediately after writing the image said:

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       12%      1.1G          9.6G          9.6G

but hours later said:

TOTAL        8%      4.3G           50G           50G

On sdd9, ext4, I got:

TOTAL       44%       28G           63G           63G
...
TOTAL       44%       28G           64G           64G

--
Cheers / Saludos,
Carlos E. R. (from oS Leap 15.3 x86_64 (Erebor-4))
participants (4):
- Bernhard Voelker
- Carlos E. R.
- David T-G
- Per Jessen