Mailinglist Archive: opensuse (1606 mails)

Re: [opensuse] Re: Transparent content compression for space savings on linux like on NTFS?
  • From: "Amedee Van Gasse" <amedee@xxxxxxxxx>
  • Date: Mon, 1 Sep 2008 13:10:54 +0200 (CEST)
  • Message-id: <18042.193.121.250.194.1220267454.squirrel@xxxxxxxxxxxxxxxx>
On Sat, August 30, 2008 21:28, Linda Walsh wrote:

The real Question (missing feature in Linux?):

Windows (NT-based) does have transparent, on-the-fly compression that can
be enabled on NTFS. The transparent compress/decompress works with all
apps. If one enables compression for a volume, NT will look for
"opportunities" to save space in fixed-size blocks that are 16 times the
cluster size, up to a maximum "compression unit" ("CU") of 64k. This
implicitly makes 4k the largest cluster size that will still allow file
compression to be enabled on NTFS.
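
Just to make the arithmetic concrete, a quick Python sketch of that
relationship -- the 16-cluster CU and the 64k cap are taken straight from
the description above; none of this is NTFS code:

CU_CLUSTERS = 16            # one compression unit = 16 clusters
MAX_CU_BYTES = 64 * 1024    # compression only applies to CUs up to 64k

for cluster_size in (512, 1024, 2048, 4096, 8192):
    cu_size = cluster_size * CU_CLUSTERS
    ok = cu_size <= MAX_CU_BYTES
    print(f"cluster {cluster_size:5d} B -> CU {cu_size:6d} B, "
          f"compression {'possible' if ok else 'not possible'}")

With 4k clusters the CU lands exactly on the 64k cap; anything larger
pushes the CU past it, which is why 4k is the limit.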

NT requires that compression save at least one cluster, or it doesn't
bother, but a file of all ones, for example, could be stored in about
1/16th the cluster space of an uncompressed file (a file of all zeros can
get better savings with "sparse files"). Each integral CU is checked for
possible compression savings -- so data files might have some ranges
compressed, but not others. The fact that NT implements compression in
CU-sized chunks means there is very little speed hit for random access.
Obviously, the speed hit will vary with the ratio of CPU power to disk
speed -- more noticeable if the drive is a 15K RPM SAS (SCSI) RAID than
on a slower IDE-based non-RAID system.
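
For illustration only, here is a toy version of that per-CU decision in
Python, with zlib standing in for NTFS's own LZ variant; the rule -- keep
the compressed form only if it frees at least one whole cluster -- is the
one described above:

import os
import zlib

CLUSTER = 4096                     # bytes per cluster
CU = 16 * CLUSTER                  # one 64k compression unit

def clusters(nbytes):
    return -(-nbytes // CLUSTER)   # ceiling division

def store_cu(raw):
    packed = zlib.compress(raw)
    # keep the compressed form only if it saves at least one cluster
    if clusters(len(packed)) <= clusters(len(raw)) - 1:
        return "compressed", clusters(len(packed))
    return "raw", clusters(len(raw))

print(store_cu(b"\x01" * CU))      # all-ones CU -> stored in ~1 cluster
print(store_cu(os.urandom(CU)))    # incompressible CU -> stays raw, 16 clusters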

On Linux, I don't know of an implementation that does this as
transparently as NT does with NTFS. Since the clusters are marked
'compressed' in the file system, there likely has to be some flag
available on a per-CU basis in the NTFS metadata. On a data read of a
compressed section, NTFS decompresses the contents of the CU AND
allocates the full, uncompressed file space needed by the CU as *backing
store* for the data that's decompressed into the block buffer. If the
data isn't modified, the backing store doesn't get written to, so the CU
stays compressed. If the user modifies the data, the modified data can be
automatically written back to disk into the sectors that were allocated
as backing store for the memory-mapped file. At *some* point (in
writing), NT will try to compress the data in the CU to check for space
savings of a cluster or more.
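
Very roughly, that read/modify/write-back lifecycle could be sketched as
the following in-memory toy (again with zlib standing in for the real
compressor; the actual kernel paths are of course nothing this simple):

import zlib

CLUSTER = 4096
CU = 16 * CLUSTER
disk = {}                          # cu_index -> (is_compressed, stored bytes)

def clusters(n):
    return -(-n // CLUSTER)

def write_cu(idx, buf):
    packed = zlib.compress(buf)
    if clusters(len(packed)) <= clusters(len(buf)) - 1:   # must save a cluster
        disk[idx] = (True, packed)
    else:
        disk[idx] = (False, buf)

def read_cu(idx):
    compressed, data = disk[idx]
    # decompress into the "block buffer"; the on-disk copy stays compressed
    return zlib.decompress(data) if compressed else data

write_cu(0, b"\x00" * CU)          # stored compressed
buf = bytearray(read_cu(0))        # full 64k visible to the application
buf[100:108] = b"modified"         # user changes part of the CU
write_cu(0, bytes(buf))            # write-back recompresses the whole CU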

Anyway -- I suppose the actual Linux implementation is left as an
exercise for the user...? :-)

You should take a look at ZFS, which implements LZJB compression. From
the Wikipedia article on ZFS:


Variable block sizes

ZFS uses variable-sized blocks of up to 128 kilobytes. The currently
available code allows the administrator to tune the maximum block size
used as certain workloads do not perform well with large blocks. Automatic
tuning to match workload characteristics is contemplated.[citation needed]

If data compression (LZJB) is enabled, variable block sizes are used. If a
block can be compressed to fit into a smaller block size, the smaller size
is used on the disk to use less storage and improve IO throughput (though
at the cost of increased CPU use for the compression and decompression
operations).
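
For what it's worth, the quoted behaviour boils down to something like
this sketch (zlib standing in for LZJB, and 512-byte sectors as an
assumed allocation unit -- the idea, not ZFS internals):

import os
import zlib

RECORDSIZE = 128 * 1024            # ZFS logical block size ("recordsize")
SECTOR = 512                       # assumed on-disk allocation unit

def round_up(n):
    return -(-n // SECTOR) * SECTOR    # round up to whole sectors

def on_disk_bytes(block):
    packed = zlib.compress(block)
    # keep the compressed copy only if it is actually smaller on disk
    return min(round_up(len(packed)), round_up(len(block)))

print(on_disk_bytes(b"A" * RECORDSIZE))       # tiny on disk once compressed
print(on_disk_bytes(os.urandom(RECORDSIZE)))  # full 128k, kept uncompressed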


ZFS is currently available in OpenSolaris, BSD and Mac OS X. Because of a
license issue, ZFS cannot be implemented in the Linux kernel, but
technically it would be a Simple Matter Of Programming. However, there is
a FUSE (yes, FUSE again) implementation of ZFS.

--
Amedee

--
To unsubscribe, e-mail: opensuse+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: opensuse+help@xxxxxxxxxxxx
