On Sat, August 30, 2008 21:28, Linda Walsh wrote:
The real Question (missing feature in Linux?):
Windows (NT-based) does have transparent, on-the-fly compression that can be enabled on NTFS. The transparent compress/decompress works with all apps. If one enables compression for a volume, NT will look for "opportunities" to save space on fixed-size blocks that are 16 times the cluster size, up to a maximum "compression unit", "CU", of 64k. This implicitly makes 4k the largest cluster size that still allows file compression to be enabled on NTFS.
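(To make that arithmetic concrete, a back-of-the-envelope sketch in Python -- the names are mine, only the 16-clusters-per-CU and 64k figures come from the description above:)

    MAX_CU_BYTES = 64 * 1024      # largest compression unit NTFS will use
    CLUSTERS_PER_CU = 16          # a CU always spans 16 clusters

    def compression_possible(cluster_bytes):
        # 16 clusters have to fit inside the 64k compression unit, so any
        # cluster size above 4k rules out NTFS compression.
        return CLUSTERS_PER_CU * cluster_bytes <= MAX_CU_BYTES

    for size in (512, 1024, 2048, 4096, 8192):
        print(size, compression_possible(size))   # True up through 4096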
NT requires that compression save at least 1 cluster, or it doesn't bother, but a file of all ones, for example, could be stored in about 1/16th the cluster space of a non-compressed file (a file of all zeros can get better savings with "sparse files"). But this means each integral "CU" is checked for possible compression savings -- so data files might have some ranges compressed, but not others. The fact that NT implements compression in "CU"-sized chunks means there is very little speed hit for random access. Obviously, the speed hit will vary with the ratio of CPU power to disk speed... with the hit being more noticeable on a 15K RPM SAS (SCSI)-based RAID than on a slower IDE-based non-RAID system.
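The "save at least one cluster" rule could be sketched roughly like this -- zlib stands in for NTFS's own compressor, which this doesn't try to reproduce; only the per-CU decision rule is taken from the behaviour described above:

    import zlib

    CLUSTER = 4096
    CU = 16 * CLUSTER                 # one 64k compression unit

    def clusters_needed(nbytes):
        return -(-nbytes // CLUSTER)  # round up to whole clusters

    def store_cu(cu_data):
        packed = zlib.compress(cu_data)
        # Only keep the compressed form if it saves at least one cluster.
        if clusters_needed(len(packed)) <= clusters_needed(len(cu_data)) - 1:
            return ("compressed", packed)
        return ("raw", cu_data)

    flag, data = store_cu(b"\x01" * CU)        # a CU of all ones
    print(flag, clusters_needed(len(data)))    # compressed, about 1 cluster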
On Linux, I don't know of an implementation that does this as transparently as NT does with NTFS. Since the clusters are marked 'compressed' in the file system, there likely has to be some flag available on a per-CU basis in the NTFS meta-information. On a data read of a compressed section, NTFS decompresses the contents of the CU AND allocates the full, uncompressed file space needed by the CU as *backing store* for the data that's decompressed into the block buffer. If the data isn't modified, the backing store doesn't get written to, so the CU stays compressed. If the user modifies the data, the modified data can be written back to disk automatically, into the sectors that were allocated as backing store for the memory-mapped file. At *some* point during writing, NT will try to compress the data in the CU again, to check for space savings of a cluster or more.
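A very rough model of that read/modify/write-back cycle -- again with zlib as a stand-in and with the record layout entirely made up, not how NTFS actually stores its metadata:

    import zlib

    CLUSTER = 4096

    def read_cu(cu_record):
        # cu_record is {"flag": "compressed" or "raw", "data": bytes}
        if cu_record["flag"] == "compressed":
            # Decompress into a full-sized buffer; the uncompressed backing
            # space is reserved, but nothing is written to it unless the
            # buffer is dirtied later.
            return bytearray(zlib.decompress(cu_record["data"]))
        return bytearray(cu_record["data"])

    def write_back(cu_record, buffer, dirty):
        if not dirty:
            return cu_record          # untouched CU stays compressed on disk
        packed = zlib.compress(bytes(buffer))
        # Same rule as before: recompress, keep it only if a cluster is saved.
        if -(-len(packed) // CLUSTER) <= -(-len(buffer) // CLUSTER) - 1:
            return {"flag": "compressed", "data": packed}
        return {"flag": "raw", "data": bytes(buffer)}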
Anyway -- I suppose the actual Linux implementation is left as an exercise for the user...? :-)
You should take a look at ZFS, which implements LZJB compression. From the Wikipedia article on ZFS:

  "Variable block sizes: ZFS uses variable-sized blocks of up to 128 kilobytes. The currently available code allows the administrator to tune the maximum block size used, as certain workloads do not perform well with large blocks. Automatic tuning to match workload characteristics is contemplated. If data compression (LZJB) is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput (though at the cost of increased CPU use for the compression and decompression operations)."

ZFS is currently available in OpenSolaris, BSD and Mac OS X. Because of a license issue, ZFS cannot be implemented in the Linux kernel, but technically it would be a Simple Matter Of Programming. However, there is a FUSE (yes, FUSE again) implementation of ZFS.
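To illustrate the "smallest block size that fits" idea from that quote, a rough sketch -- zlib standing in for LZJB (which has no stdlib binding), and the power-of-two block ladder is my assumption, not ZFS's actual allocator:

    import zlib   # stand-in here for LZJB

    BLOCK_SIZES = [512 << i for i in range(9)]   # 512 bytes ... 128k, powers of two

    def pick_block(record):
        # record is one logical block of up to 128k.
        packed = zlib.compress(record)
        for size in BLOCK_SIZES:
            if len(packed) <= size and size < len(record):
                # Compressed data fits a smaller block, so store it compressed.
                return size, packed
        return len(record), record   # no smaller block fits; store uncompressed

    print(pick_block(b"A" * 128 * 1024)[0])   # a tiny block instead of 128k

-- Amedee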