Hi Istvan:

Saw this yesterday, didn't have time to respond. But ...

On Saturday 25 February 2006 07:37, Istvan Gabor wrote:
...
I used the 370 MB rar archive that resulted in a 3.2 GB tar archive after extraction. As I know tar doesn't compress so I suppose that the final also should be around 3.2 GB. Am I right or wrong?
You are right that tar does not compress, but there is something else going on here (other than inodes). ext2/3 are conventional filesystems in the way they allocate space to files: the minimum "chunk" of space that can be allocated to a file is a cluster (ext2/3 itself calls this a block), and a cluster can belong to only one file at a time. If your cluster size is 2048 bytes, a 1-byte file takes one cluster and a 2049-byte file takes two clusters. The wasted space in both cases is the same (2047 bytes), but the _percentage_ overhead in the first case is much higher. So if your tar file contains a lot of ~200-byte files and your cluster size is 2K, you will get an overhead of ~900% for those files (not counting inodes), which is in line with what you reported.

[Note: all this unused space at the end of clusters is not overwritten; it just contains whatever was there before the cluster was freed when some other file was deleted. This is one of the places that forensic recovery tools look for data.]

Reiserfs takes a different approach: it "stuffs" the tails of small files together instead of giving each one a cluster of its own (tail packing), so it is much more efficient in its use of available disk space for small files. This explains some of the benefit that you saw.

Warm regards,
Robert
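
P.S. In case it helps to see the arithmetic, here is a minimal Python sketch of the allocation rule described above. The function names and the 2048-byte default are only illustrative assumptions, and the sketch ignores inodes, indirect blocks, and any other filesystem metadata, so treat it as a rough model and not as what ext2/3 actually does internally.

```python
def allocated_size(file_size, cluster_size=2048):
    """Bytes occupied on disk when space is handed out in whole clusters."""
    # Ceiling division: round the file size up to a whole number of clusters.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size


def wasted_bytes(file_size, cluster_size=2048):
    """Unused bytes at the end of the file's last cluster."""
    return allocated_size(file_size, cluster_size) - file_size


def overhead_percent(file_size, cluster_size=2048):
    """Wasted space as a percentage of the file's real size."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return 100 * wasted_bytes(file_size, cluster_size) / file_size


if __name__ == "__main__":
    # The three file sizes mentioned in the message (hypothetical examples).
    for size in (1, 200, 2049):
        print(size, allocated_size(size), wasted_bytes(size),
              overhead_percent(size))
```

With a 2048-byte cluster, both the 1-byte and the 2049-byte file waste 2047 bytes, while a 200-byte file wastes 1848 bytes, which is 924% of its own size, i.e. the "~900%" figure above.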