Per Jessen wrote:
> Carlos E. R. wrote:
>> On Monday, 2009-09-14 at 08:28 +0200, Per Jessen wrote:
>>> That is only a variation of (b) above - if you can't trust your backup procedure, a missing backup is not your primary problem.
>> Well, what I mean is that a compressed tar is not reliable as a backup procedure. I don't know what is reliable in Linux, but tar isn't. It is much less reliable than, for example, the old PC Tools backup from Central Point Software was twenty years ago.
> Okay, that's an interesting point - I don't think I have heard anyone complain about tar's reliability before. What do you see as a more reliable tool/utility, then?
It's not that tar has an unsuspected Achilles heel or weakness.** Tar is fine. It's just that tar lacks any extra strength. Tar is fine as long as the underlying media is perfect.

The comparison is, the iso9660 and udf filesystems contain extra data similar to the mnp5 error-correcting stream that modems use, or the zmodem & kermit file-transfer protocols. (I would say also like raid5, but the striping-across-multiple-disks part of raid5 doesn't apply and may confuse the issue.) They can suffer some amount of media / communication errors without losing the payload data, and if there is too much media/transmission loss to recreate the data from the ECC data, they at least detect and know which bits are garbage instead of delivering them as good, and they recover and resume as soon as the media/transmission clears up. Because CDs, DVDs, telephone lines, even local serial lines are not perfect media by a long shot. In fact all the low-level protocols for networking and media storage have some form of error detection and correction built in, so that you, the user or the higher-level protocol, do not have to worry about it.

But backups and tars are a special case. Sometimes you treat a tar or a compressed tar as a simple file in a filesystem. In that case, your "media" is perfect. If it's not, as was said, you have bigger problems. But really the media isn't in fact perfect; it's just that the disk has automatic bad-block detection and remapping, and the raid controller or software raid maintains extra data to detect and transparently correct media errors, so that you never see them at the application level. Similarly, tcp/ip guarantees that you at the application level never see any of the garbage that's actually on the wire. The network cards and the tcp/ip stack do whatever amount of checksumming and retrying is required, transparently, and present you with a magically perfect communication channel. In these contexts you can handle plain data any way you want and always trust the media 100%. Things like compression aren't dangerous, because the media is perfect.

But in the particular case of backups, you also sometimes write these things directly to raw media. In that case everything changes drastically. Now your file had better have redundancy and error compensation built in, because there is no magic lower layer protecting your data from the imperfect media.

All the potential problems that compression amplifies apply to encryption in exactly the same way. If you restore a plain file with some garbage in the middle, you can probably still actually use that file, and the customer would very much rather have 99% of his database file with a few bad records than nothing. But (in the simple case) encrypt that same file and then inflict the same exact damage on it, and it's impossible to decrypt. You've lost the entire file. ...Unless you take special steps to mitigate exactly that scenario, and make your encrypter or compressor work with defined, self-contained blocks of data so that you can lose a block and still decrypt the rest, or embed some other form of error compensation.

It's not quite as bad as that, though. At least for simple compression, bzip2, gzip, and plain old zip all have recovery tools that can sometimes salvage all the good parts of a damaged compressed tar (rough example just below).

As for tools designed to embed data to assist in recovering from, or at least minimizing, loss of data due to media errors, including in concert with compression: in the free world, right off the top of my head, dar and xar.
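To illustrate the salvage case: with a damaged .tar.bz2, something along these lines will usually pull out whatever is still intact. This is off the top of my head and "backup.tar.bz2" is just a stand-in name, so check the bzip2 man pages for the details:

    # split the damaged file into its individual bzip2 blocks
    # (writes rec00001backup.tar.bz2, rec00002backup.tar.bz2, ...)
    bzip2recover backup.tar.bz2

    # test each piece and set aside the ones that fail
    bzip2 -t rec*backup.tar.bz2

    # decompress the surviving pieces in order and see what tar can still read
    bzip2 -dc rec*backup.tar.bz2 | tar -tvf -

tar will usually complain when it hits the spot where a bad block was dropped, but everything before that point comes back, and you can often fish more out of the later pieces by hand.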
And dar just uses parchive, which you could use separately with something else too (see the P.S. below for roughly what that looks like). And any of the official backup programs (amanda, zmanda, bru, bacula, arkeia, ...?), and basically any of the commercial programs.

Years and years ago there was a simple thing called ctar, free code posted on usenet, that a few different companies went on to enhance and build long-running commercial products on (ctar, lone-tar, backupedge, at least; ctar is no longer sold, but lone-tar and backupedge are, and are far, far advanced since then).

ctar had a very simple form of compression error mitigation, which was the simple fact that when you told ctar to use compression, it compressed each file individually, separately, inside the archive, while the archive itself was not technically compressed. The archive was essentially a plain old tar, well, enhanced but compatible, with every file inside compressed with "compress" ("ncompress" today), with no .Z extension added to the filenames. You could unpack a ctar archive with most ordinary tar programs, you'd just have to uncompress each file yourself afterwards. The point, though, is that even that very simple inversion of the normal process mitigated loss, because in that kind of archive, when the media was damaged, you lost as little as one affected file, since the compression only spanned one file at a time.

**Actually tar does have one problem, in that it doesn't handle all forms of files. It goes beyond the already-mentioned ACLs & other extended attributes. Most tar implementations can't even handle fifos, device nodes, hard links, sometimes not even symlinks. One reason certain things use cpio so often is that most systems' cpio can, even on very old systems.

Star is one free tar implementation that can handle all that stuff, plus lots of other extended attributes and long path & file names, though not every possible new kind of extended attribute or metadata. It's been around long enough that it exists on most commonly used platforms, so it's reasonably safe to use its enhanced features even if portability is a concern. You just have to be willing to get star for any target system. It makes every attempt to be backwards compatible with ordinary tar implementations, so that if an archive is unpacked with a plain tar, you will get most or all of your data, just not any ACLs or other extended attributes that your tar program doesn't understand. Backupedge, ctar, and lone-tar are all like that too. But star doesn't have the stuff that xar/dar/parchive does for error correction; I was still just talking about extended attributes and long filenames and such here.

Wow, sorry for the book...

-- 
bkw
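P.S. Since I mentioned parchive: using it by itself next to an ordinary tarball looks roughly like this. Syntax is from memory and "backup.tar.gz" is just a stand-in name, so check the par2 man page:

    # create ~10% worth of recovery data alongside the archive
    # (writes backup.tar.gz.par2 plus several *.vol*.par2 recovery volumes)
    par2 create -r10 backup.tar.gz.par2 backup.tar.gz

    # later, after copying everything back off the possibly-flaky media:
    par2 verify backup.tar.gz.par2
    par2 repair backup.tar.gz.par2   # rebuilds damaged parts from the recovery volumes

Keep the .par2 files next to the archive (or on separate media) and you can lose a chunk of the tarball and still get all of it back, as long as the damage stays under the redundancy you asked for.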