On Tuesday, 2009-09-15 at 11:04 -0400, Brian K. White wrote:
Per Jessen wrote:
Carlos E. R. wrote:
Well, what I mean is that a compressed tar is not reliable as a backup procedure. I don't know what is reliable in Linux, but tar isn't. It is much less reliable than, for example, the old PC Tools backup from Central Point Software was twenty years ago.
Okay, that's an interesting point - I don't think I have heard anyone complain about tar's reliability before. What do you see as a more reliable tool/utility, then?
I do not know :-( The old PCBackup for MS-DOS from PC Tools was reliable (per the definition below), but it had other problems, like incompatible version upgrades. And of course it is closed source, and probably no longer available. I would like to find something similar for Linux, and free. An example: I have an old backup made on a set of about 80 floppies (the 360 KB type) which is still fully retrievable, even though it has read errors: the software corrects them.
It's not that tar has an unsuspected Achilles heel or weakness. Tar is fine. It's just that tar lacks any extra strength. Tar is fine as long as the underlying media is perfect.
Exactly. That's the point. There is no integrity check, there is no "forward" error correction...
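To make that concrete, here is a rough Python sketch (the member names and sizes are invented for the demo): flip one byte in the middle of a plain tar and you usually lose only part of one member, while the same flip in the gzipped copy of that same tar breaks the single stream that spans the whole archive.

import gzip, io, os, tarfile

# Build a small tar with ten 10 KiB members of random data.
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w") as tar:
    for i in range(10):
        data = os.urandom(10 * 1024)
        info = tarfile.TarInfo(name="file%d.bin" % i)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

plain = raw.getvalue()
packed = gzip.compress(plain)

def flip_middle_byte(blob):
    mid = len(blob) // 2
    return blob[:mid] + bytes([blob[mid] ^ 0xFF]) + blob[mid + 1:]

# Damaged plain tar: the member headers around the bad byte are usually
# intact, so the other members can still be listed and extracted.
with tarfile.open(fileobj=io.BytesIO(flip_middle_byte(plain))) as tar:
    print("damaged plain tar still lists", len(tar.getnames()), "members")

# Damaged .tar.gz: the one flipped byte corrupts the single stream that
# covers the whole archive; depending on where it lands, either the
# DEFLATE decoding or the final CRC check fails.
try:
    gzip.decompress(flip_middle_byte(packed))
    print("gzip happened to survive the flip (unlikely)")
except Exception as exc:
    print("damaged .tar.gz is rejected:", exc)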
In fact, all the low-level protocols for networking and media storage have some form of error detection and correction built in, so that you, the user or higher-level protocol, do not have to worry about it. But backups and tars are a special case.
Exactly.
Sometimes you treat a tar or a compressed tar as a simple file in a filesystem. In that case, your "media" is perfect. If it's not, as was said, you have bigger problems. But really the media isn't perfect; it's merely that the disk has automatic bad-block detection and remapping, and the RAID controller or software RAID maintains extra data to detect and transparently correct media errors, so that you never see them at the application level. Similarly, TCP/IP guarantees that you, at the application level, never see any of the garbage that's actually on the wire. The network cards and the TCP/IP stack do whatever amount of checksumming and retrying is required, transparently, and present you with a magically perfect communication channel. In these contexts, you can simply handle plain data any way you want and always trust the media 100%. Things like compression aren't dangerous because the media is perfect.
Exactly.
But in the particular case of backups, you also sometimes write these things directly to raw media. In that case everything changes drastically. Now your file had better have redundancy and error compensation built in, because there is no magic lower layer protecting your data from the imperfect media.
Exactly :-)
All the potential problems that compression can amplify apply to encryption exactly the same. If you restore a plain file with some garbage in the middle, you can probably still actually use that file, and the customer would very much rather have 99% of his database file with a few bad records than nothing. But (in the simple case) encrypt that same file and then inflict the exact same damage on it, and it's impossible to decrypt. You've lost the entire file.
Exactly again :-)
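A small sketch of that worst case, using AES-GCM from the third-party "cryptography" Python package (just one example of an authenticated cipher; any scheme that treats the whole file as one unit behaves the same way): flip a single byte anywhere and decryption refuses the whole thing.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)

# The "customer database": one big blob encrypted as a single unit.
plain = b"customer database record " * 1000
cipher = AESGCM(key).encrypt(nonce, plain, None)

# Flip one byte in the middle of the ciphertext.
mid = len(cipher) // 2
damaged = cipher[:mid] + bytes([cipher[mid] ^ 0x01]) + cipher[mid + 1:]

try:
    AESGCM(key).decrypt(nonce, damaged, None)
except Exception as exc:  # cryptography.exceptions.InvalidTag
    print("decryption refused, entire file lost:", type(exc).__name__)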
...Unless you take special steps to mitigate exactly that scenario: make your encryptor or compressor work on defined, self-contained blocks of data, so that you can lose a block and still decrypt the rest, or embed some other form of error compensation.
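For example, a toy Python sketch of that block idea (the chunk size and on-disk layout are made up for the illustration): each chunk is compressed independently and carries its own length and CRC32, so a damaged chunk is detected and skipped while every other chunk still decompresses.

import struct, zlib

CHUNK = 64 * 1024

def pack(data):
    # Compress independent chunks, each prefixed with length and CRC32.
    out = []
    for i in range(0, len(data), CHUNK):
        comp = zlib.compress(data[i:i + CHUNK])
        out.append(struct.pack(">II", len(comp), zlib.crc32(comp)) + comp)
    return b"".join(out)

def unpack(blob):
    # Walk the chunks; a chunk whose CRC does not match is lost, but it
    # does not take the rest of the archive with it.  (A real format
    # would also add sync markers so a damaged length field cannot
    # derail the walk.)
    chunks, pos = [], 0
    while pos + 8 <= len(blob):
        length, crc = struct.unpack(">II", blob[pos:pos + 8])
        comp = blob[pos + 8:pos + 8 + length]
        if zlib.crc32(comp) == crc:
            chunks.append(zlib.decompress(comp))
        else:
            chunks.append(None)  # damaged chunk: lost, but isolated
        pos += 8 + length
    return chunks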
It's not quite as bad as that though. At least for simple compression, bzip2, gzip, and plain old zip all have recovery tools that can sometimes salvage all the good parts of a damaged compressed tar.
I understand that zip has verification and perhaps error-recovery fields, and that the compression is an integral part of the archive. But not tar.gz. I have heard of cases where a single error made it impossible to uncompress the archive, and a large part of it, if not all, was unrecoverable. There are independent tools to attempt repair, but they do not always work.
As for tools designed to embed data to assist in recovering from, or at least minimizing, loss of data due to media errors, including in concert with compression: in the free world, right off the top of my head, dar and xar. And dar just uses parchive, which you could use separately with something else too. And any of the official backup programs (amanda, zmanda, bru, bacula, arkeia, ...?). And basically any of the commercial programs.
I wonder why not in free software. ...
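For what it's worth, the idea behind parchive-style recovery data can be sketched in a few lines of Python. This toy version uses plain XOR parity over fixed-size blocks, so it can rebuild exactly one lost block; the real par2 uses Reed-Solomon codes and can rebuild many.

BLOCK = 4096

def split_blocks(data):
    # Pad to a whole number of blocks and slice.
    padded = data.ljust(-(-len(data) // BLOCK) * BLOCK, b"\0")
    return [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]

def parity(blocks):
    # One extra block: the XOR of all data blocks.
    out = bytearray(BLOCK)
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def rebuild(blocks_with_gap, par):
    # Recompute the single missing block (marked None) by XORing the
    # parity block with every surviving block.
    out = bytearray(par)
    for block in blocks_with_gap:
        if block is not None:
            for i, b in enumerate(block):
                out[i] ^= b
    return bytes(out)

# Usage: lose block 2 to a media error, then get it back from parity.
blocks = split_blocks(b"some backup payload " * 1000)
par = parity(blocks)
missing = blocks[2]
blocks[2] = None
assert rebuild(blocks, par) == missing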
The archive was essentially a plain old tar (well, enhanced but compatible), with every file inside compressed with "compress" ("ncompress" today), with no .Z extension added to the filenames. You could unpack a ctar archive with most ordinary tar programs; you'd just have to uncompress each file yourself afterwards.
The point, though, is that even that very simple inversion of the normal process limited the loss: in that kind of archive, when the media was damaged, you lost as little as the one affected file, because the compression only spanned one file at a time.
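Something like that per-file approach is easy to sketch in Python with the standard tarfile module (the gzip-per-member format and the .gz member names here are my own invention, not what ctar did): compress each file separately and store the result in an ordinary uncompressed tar, so no compression stream spans more than one file.

import glob, gzip, io, tarfile

def per_file_compressed_tar(paths, archive):
    # Note mode="w": the tar container itself is NOT compressed; only
    # the individual members are, so damage costs at most one member.
    with tarfile.open(archive, mode="w") as tar:
        for path in paths:
            with open(path, "rb") as fh:
                comp = gzip.compress(fh.read())
            info = tarfile.TarInfo(name=path + ".gz")
            info.size = len(comp)
            tar.addfile(info, io.BytesIO(comp))

# Usage (hypothetical paths):
# per_file_compressed_tar(sorted(glob.glob("/etc/*.conf")), "backup.tar")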
True. There was a script included with SuSE some versions back that did that with cpio or afio, I think.

--
Cheers,
Carlos E. R.