[opensuse-es] [OT] A new FS is coming with kernel 2.6.30: NILFS
It came out in Linux Magazine. I'm pasting the text in English for those who can't access it, but apparently its best performance is on solid-state storage (SSD): http://www.linux-mag.com/cache/7345/1.html

NILFS: A File System to Make SSDs Scream

The 2.6.30 kernel is chock full of next-gen file systems. One such example is NILFS, a new log-structured file system that dramatically improves write performance.

It’s difficult to write storage articles at this time and not focus on the upcoming 2.6.30 kernel. Why? This kernel is loaded with a number of new file systems, some of which we’ve already covered, like ext4 and btrfs. Another of the hot new file systems in 2.6.30 is NILFS. This file system is definitely one that you should be testing.

NILFS2 (New Implementation of a Log-Structured File System Version 2) is a very promising new log-structured file system that has continuous snapshots and versioning of the entire file system. This means that you can recover files that were deleted or unintentionally modified as well as perform backups at any time from a snapshot without a performance penalty normally associated with creating snapshots. In addition, there is evidence that NILFS has extremely good performance on SSD drives.

Log-Structured File System?

Log-Structured File Systems are a bit different than other file systems with both good points and bad points. Rather than write to a tree structure such as a b-tree or an h-tree, either with or without a journal, a log-structured file system writes all data and metadata sequentially in a continuous stream that is called a log (actually it is a circular log). The concept was developed by John Ousterhout of Tcl fame and Fred Douglis.

The motivation behind log-structured file systems is that typical file systems lay out data based on spatial locality for rotating media (hard drives). But rotating media tends to have slow seek times, which limits write performance.
In addition, it was presumed that most IO would become write dominated (this observation is supported by a study that was summarized in a recent article). So a log-structured file system takes a different approach: it treats the file system as a circular log and writes sequentially to the “head” of the log, never overwriting existing log data. Seeks are kept to a minimum because everything is sequential, which improves write performance.

A log-structured file system, because of its design, makes it very easy to create snapshots (in NILFS they are called checkpoints) of both the data and metadata. NILFS can then mount these checkpoints (or snapshots) alongside the primary NILFS file system. From these checkpoints, you can recover erased files (if the checkpoint has a date and time prior to when the file was erased), or you can use them for backups or even disaster recovery images.

Another benefit of log-structured file systems is that recovering from a crash is easier than with the more typical tree-based file systems (e.g. ext2, ext3, etc.). When a log-structured file system is remounted after a crash, it can reconstruct its state from the last consistent point in the log. It starts at the head of the circular log and backs up until the file system is consistent. This point should be very close to the head, so little if any data or metadata will be lost. This bears repeating: a log-structured file system recovers from a crash extremely fast, and the recovery time is independent of the size of the file system. In contrast, other file systems have to replay their journal and possibly even walk their data structures to make sure the file system is consistent (i.e. run “fsck”). Everyone who has run fsck on a very large file system knows how much time it can take.
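The append-only write path and the "back up from the head" recovery described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how NILFS is implemented: the names `ToyLog`, `Record`, `write_file`, `checkpoint` and `recover` are hypothetical, the log lives in a Python list rather than on disk, and a CRC32 per record stands in for whatever consistency check a real file system uses.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str       # "data" or "checkpoint" (hypothetical record types)
    payload: bytes
    crc: int        # CRC32 of payload, lets recovery detect a torn write


@dataclass
class ToyLog:
    records: list = field(default_factory=list)

    def _append(self, kind, payload):
        # New records always go to the head; nothing is overwritten in place.
        self.records.append(Record(kind, payload, zlib.crc32(payload)))

    def write_file(self, name, content):
        self._append("data", f"{name}={content}".encode())

    def checkpoint(self):
        # Marks a point at which everything written so far is consistent.
        self._append("checkpoint", b"")

    def recover(self):
        """Back up from the head to the newest intact checkpoint.

        Returns how many records past that checkpoint were discarded.
        """
        i = len(self.records) - 1
        while i >= 0:
            rec = self.records[i]
            if rec.kind == "checkpoint" and zlib.crc32(rec.payload) == rec.crc:
                break
            i -= 1
        discarded = len(self.records) - (i + 1)
        del self.records[i + 1:]
        return discarded

    def files(self):
        # Demonstration only: replay the log to show the visible file contents.
        view = {}
        for rec in self.records:
            if rec.kind == "data":
                name, _, content = rec.payload.decode().partition("=")
                view[name] = content
        return view


log = ToyLog()
log.write_file("a.txt", "v1")
log.checkpoint()
log.write_file("a.txt", "v2")   # written after the last checkpoint
lost = log.recover()            # simulate a remount after a crash
print(lost, log.files())
```

Note that `recover` only inspects records between the head and the newest checkpoint, which is why its cost in this model depends on how much was written since the last checkpoint and not on the total length of the log.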
One problematic aspect of log-structured file systems is that they need a fairly sophisticated “garbage collection” capability to reclaim free space. Free space needs to be reclaimed from the tail of the log, primarily from old checkpoints, so that the file system doesn’t become full when the head of the log wraps around to the tail; without it, the file system would fill far too quickly. There are many techniques for reclaiming space; one is covered in the Wikipedia article about log-structured file systems.

A Log Structured File System for Linux - NILFS

The Nippon Telegraph and Telephone (NTT) CyberSpace Laboratories has been developing NILFS (also referred to as NILFS2, since it is version 2 of the file system) for Linux. It is released under the GPL 2.0 license and is included in the 2.6.30 kernel. It spent a great deal of time in the -mm kernels and underwent much testing since its initial announcement.

One of the most noticeable features of NILFS is that it can “continuously and automatically save instantaneous states of the file system without interrupting service”. NILFS refers to these as checkpoints. In contrast, other file systems, such as ZFS, can provide snapshots but they have to suspend operation to perform the snapshot operation. NILFS doesn’t have to do this. The snapshots (checkpoints) are part of the file system design itself.

One of the really cool features of NILFS is that these checkpoints can actually be mounted alongside the primary file system. This has many, many uses, one of which is to mount a checkpoint to recover files that were unintentionally erased.
In addition to being able to recover recently erased files and extremely fast crash recovery times, there are a number of other features of NILFS that are very attractive:

* The file size and inode numbers are stored as 64-bit fields
* File sizes of up to 8 EiB (Exbibyte - approximately an Exabyte)
* Block sizes that are smaller than a page size (e.g. 1KB-2KB). This can potentially make NILFS much faster for small files than other file systems.
* File and inode blocks use a B-tree (the use of B-trees in a log-structured file system stems from the implementation, which uses something called segments)
* NILFS uses 32-bit checksums (CRC32) on data and metadata for integrity assurance
* Correctly ordered data and metadata writes
* Redundant superblock
* Read-ahead for metadata files as well as data files (helps read performance)
* Continuous checkpointing, which can be used for snapshots. These can be used for backups or they can even be used for recovering files.

Checkpoints and Snapshots

One of the features that users can really enjoy with NILFS is the ability to recover erased or modified files. NILFS creates a checkpoint “every few seconds or per synchronous write basis (unless there is no change)” (from the kernel documentation). The user can then select a checkpoint and convert it into a snapshot. Snapshots are preserved until they are converted back into checkpoints.

Checkpoints are not preserved for the life of the file system: after a period of time the garbage collection process will recover the space in the checkpoint. This means that users can’t recover files from a long time in the past. But there is no limit to the number of snapshots that can be created, at least until the file system volume becomes full. There are many uses for the snapshots, including recovery of erased or modified files, and administrators can use them for backups.

There are a few user-space commands that help with checkpoints and snapshots.
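The checkpoint and snapshot life cycle described in this section can be modelled in a few lines. This is a rough sketch, assuming a simple age-based rule for the garbage collector; the class `CheckpointTable`, its methods, and the `protection_period` argument are names made up for the illustration and do not describe the actual NILFS data structures.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    cno: int            # checkpoint number
    created: float      # creation time in seconds (any epoch)
    mode: str = "cp"    # "cp" = plain checkpoint, "ss" = snapshot


class CheckpointTable:
    def __init__(self):
        self.entries = {}
        self.next_cno = 1

    def make_checkpoint(self, now):
        cp = Checkpoint(self.next_cno, now)
        self.entries[cp.cno] = cp
        self.next_cno += 1
        return cp.cno

    def change_mode(self, cno, mode):
        # Convert a checkpoint to a snapshot ("ss") or back again ("cp").
        if mode not in ("cp", "ss"):
            raise ValueError(f"unknown mode: {mode}")
        self.entries[cno].mode = mode

    def collect_garbage(self, now, protection_period):
        """Drop plain checkpoints older than protection_period; keep snapshots."""
        doomed = [c.cno for c in self.entries.values()
                  if c.mode == "cp" and now - c.created > protection_period]
        for cno in doomed:
            del self.entries[cno]
        return doomed


table = CheckpointTable()
first = table.make_checkpoint(now=0)
second = table.make_checkpoint(now=10)
third = table.make_checkpoint(now=90)
table.change_mode(first, "ss")                        # pin the oldest state
removed = table.collect_garbage(now=100, protection_period=50)
print(removed, sorted(table.entries))
```

In this model the oldest checkpoint survives only because it was turned into a snapshot; converting it back to `"cp"` makes it eligible for collection on the next pass.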
The NILFS web site explains the process, which is paraphrased here. The first step is to list the checkpoints using the lscp command.

    $ lscp
     CNO  DATE        TIME      MODE  SKT  NBLKINC  ICNT
       1  2008-05-08  14:45:49  cp    -         11     3
       2  2008-05-08  14:50:22  cp    -     200523    81
       3  2008-05-08  20:40:34  cp    -        136    61
       4  2008-05-08  20:41:20  cp    -     187666  1604
       5  2008-05-08  20:41:42  cp    -         51  1634
       6  2008-05-08  20:42:00  cp    -         37  1653
       7  2008-05-08  20:42:42  cp    -     272146  2116
       8  2008-05-08  20:43:13  cp    -     264649  2117
       9  2008-05-08  20:43:44  cp    -     285848  2117
      10  2008-05-08  20:44:16  cp    -     139876  7357

Notice that the output of lscp lists the date and time of the checkpoints. The column labeled “MODE” contains either “cp”, which stands for “checkpoint”, or “ss”, which stands for “snapshot.”

If a user does not want to wait for a checkpoint and wants to create one immediately, the mkcp command can be used. In general you need to tell mkcp the device containing a NILFS file system; otherwise it searches /proc/mounts for NILFS file systems.

The mkcp command can also create a snapshot: mkcp -s creates the new checkpoint directly as a snapshot. To convert an existing checkpoint, use the chcp command, which changes a checkpoint into a snapshot or vice versa. Again, from the NILFS web site, here is an example of creating a snapshot.

    $ sudo chcp ss 2
    $ lscp
     CNO  DATE        TIME      MODE  SKT  NBLKINC  ICNT
       1  2008-05-08  14:45:49  cp    -         11     3
       2  2008-05-08  14:50:22  ss    -     200523    81
       3  2008-05-08  20:40:34  cp    -        136    61
       4  2008-05-08  20:41:20  cp    -     187666  1604
       5  2008-05-08  20:41:42  cp    -         51  1634
       6  2008-05-08  20:42:00  cp    -         37  1653
       7  2008-05-08  20:42:42  cp    -     272146  2116
       8  2008-05-08  20:43:13  cp    -     264649  2117
       9  2008-05-08  20:43:44  cp    -     285848  2117
      10  2008-05-08  20:44:16  cp    -     139876  7357
      11  2008-05-08  21:05:23  cp    -         10  7357

Notice that the chcp command changes the second checkpoint into a snapshot. This is indicated under the “MODE” column, where the second checkpoint is now listed as “ss”, or snapshot.
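If you want to script around this, the lscp listing is easy to pick apart. The following is a minimal sketch that assumes the seven-column layout shown in the article's sample output (CNO, DATE, TIME, MODE, SKT, NBLKINC, ICNT); other versions of the tool may print different columns, and `parse_lscp` and `CheckpointRow` are hypothetical helper names, not part of the NILFS utilities.

```python
from typing import NamedTuple


class CheckpointRow(NamedTuple):
    cno: int
    date: str
    time: str
    mode: str      # "cp" or "ss"
    skt: str
    nblkinc: int
    icnt: int


def parse_lscp(text):
    """Parse lscp-style output (layout assumed from the sample in the article)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, prompts and blank lines: data rows start with a number.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        cno, date, time, mode, skt, nblkinc, icnt = fields
        rows.append(CheckpointRow(int(cno), date, time, mode, skt,
                                  int(nblkinc), int(icnt)))
    return rows


SAMPLE = """\
CNO DATE TIME MODE SKT NBLKINC ICNT
1 2008-05-08 14:45:49 cp - 11 3
2 2008-05-08 14:50:22 ss - 200523 81
3 2008-05-08 20:40:34 cp - 136 61
"""

snapshots = [row.cno for row in parse_lscp(SAMPLE) if row.mode == "ss"]
print(snapshots)
```

Filtering on the mode field gives the checkpoint numbers that are currently protected from garbage collection, which is handy before deciding what to mount or release.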
Now that the checkpoint is a snapshot, it won’t be deleted during garbage collection. However, you can remove the snapshot by using the rmcp command.

NILFS implements garbage collection in a unique way: it uses a user-space daemon to perform the garbage collection. This daemon is activated when the file system is mounted via the “mount” command. This also means that garbage collection can be activated at any time (if the file system is mounted).

Don’t forget that NILFS will delete checkpoints after a certain period of time unless the checkpoint is converted to a snapshot. How long a checkpoint is held before being deleted is controlled by parameters in the /etc/nilfs_cleanerd.conf file. You can adjust the garbage collection (GC) parameters in the file and then restart the GC daemon (or unmount and remount the file system) so that the new parameter values are used.

You must have root access, or at least sudo ability, to mount a snapshot. Also recall that snapshots are mounted read-only. From the NILFS web site example, one can mount the snapshot previously created (it was created from the second checkpoint):

    # mount -t nilfs2 -r -o cp=2 /dev/sdb1 /nilfs-cp
    # df -t nilfs2
    Filesystem  1K-blocks     Used  Available  Use%  Mounted on
    /dev/sdb1    71679996  3203068   64888832    5%  /nilfs
    /dev/sdb1    71679996  3203068   64888832    5%  /nilfs-cp
    # mount -t nilfs2
    /dev/sdb1 on /nilfs type nilfs2 (rw,gcpid=13296)
    /dev/sdb1 on /nilfs-cp type nilfs2 (ro,cp=2)

The snapshot is mounted on /nilfs-cp in read-only mode (ro). Depending upon the options and permissions on the /nilfs-cp mount point, users could copy files from the snapshot. Alternatively, the root user could restore the file(s) for the user. The snapshot could also easily be used for creating backups, or an administrator could use it to create a disaster recovery image of the file system.
Just as a reminder: the mounted snapshot, while being a NILFS file system, is mounted read-only, so checkpoints of the snapshot are not created. After you are finished with the snapshot, don’t forget to unmount it and either delete the snapshot or convert it back to a checkpoint and allow garbage collection to recover the space.

Speed, Glorious Speed

Recall that one reason log-structured file systems were developed was to increase write performance (assuming the read performance would be dominated by caching effects). And who doesn’t like increased write performance?

One of the earliest reviews of NILFS was in 2007 by Chris Samuel. He wrote a very comprehensive review, Emerging File Systems (how prescient was that review?), covering a number of file systems including NILFS, with benchmarks. The performance was good for such a young file system, and even at that time it had the best performance by far for Sequential Deletes. It was even better than ZFS/OpenSolaris for most tests performed.

In Feb. 2008, there was a presentation by Dongjun Shin from Samsung as part of the Linux Storage & File System Workshop 2008 (LSF ‘08). He benchmarked NILFS, Btrfs, Ext2, Ext3, Ext4, ReiserFS, and XFS when running on an SSD device. Granted, the testing is a little old, but the results are very, very exciting. The benchmark, Postmark, simulates an email server. Two groups of file sizes were tested: (1) 9 - 15KB (S), and (2) 0.1 - 3MB (L). For each group, two tests were run: one with a small number of files (S) and one with a larger number of files (L). Figures 1 and 2 show the test results.

Figure 1: Postmark Results for Small File Size

Figure 2: Postmark Results for Large File Size

Notice that in both cases, the performance of NILFS exceeds that of other file systems. For small files NILFS was about 25-38% faster than the nearest competitor (btrfs).
For large files NILFS was about 15-25% faster than the nearest competitor (reiserfs and/or ext4). It is pretty amazing to see such a boost in performance from a change in file system, but it does show you that the coupling of file system design with hardware, in this case SSDs, can produce a big boost in performance.

But “There Ain’t No Such Thing As A Free Lunch” (TANSTAAFL). There are some current issues with NILFS and SSDs. There was a recent posting to the NILFS mailing list about using NILFS as the root drive for a Linux system. It was pointed out that the root file system produces a great deal of traffic. Couple this with the fact that NILFS file system activity can be reasonably write heavy, and you have the potential for quickly wearing out SSD drives (remember that the NAND chips which make up SSDs have a limited number of rewrites). But the developers of NILFS are aware of this, and a better garbage collection (GC) algorithm is under investigation.

There was also a question on the Linux kernel mailing list about the effect of age on performance (i.e. would the performance of NILFS still remain far above the others on the Postmark test after it was used for a few months?). The answer is that the developers don’t believe the performance suffers after the file system has been used for a period of time, but there isn’t any data to back up that claim at the present time. However, virtually all file systems suffer degrading performance with age.

NILFS - It’s Definitely Worth Testing

NILFS has a great deal going for it in many regards. It is a modern file system in almost every respect (OK, no built-in RAID, but that can be worked around). The log-structured design of NILFS means that its write performance should be very, very good, and there is evidence of this from the performance report that benchmarked Postmark. Additionally, the fact that NILFS continuously creates checkpoints that can be used to create snapshots is of great benefit.
These checkpoints can be used to recover erased or modified user files. They can also be used for backups or for creating disaster recovery images of data. Moreover, creating these checkpoints or snapshots does not result in decreased performance as it does for file systems such as ZFS.

NILFS holds great promise for Linux. There are many scenarios where it would work extremely well. In particular it works very well for user directories or work directories. In the HPC world it would work extremely well for high-speed storage that is dominated by write performance. Coupling the performance boosts with the snapshot features makes NILFS a potential system administrator’s dream file system. It is well worth trying NILFS on your system.
On Wednesday 03 June 2009 17:07:28 Juan Erbes wrote:
* File sizes of up to 8 EiB (Exbibyte - approximately an Exabyte)
What a photo would fit in there!! As Fry would say, "More resolution than real life."
Regards.
Alfredo J.V.P.
--
"Once the impossible has been ruled out, whatever remains is the truth,
however improbable it may seem" (Sherlock Holmes)
On 2009-06-03 at 12:07 -0300, Juan Erbes wrote:
It came out in Linux Magazine. I'm pasting the text in English for those who can't access it, but apparently its best performance is on solid-state storage (SSD):
NILFS2 (New Implementation of a Log-Structured File System Version 2) is a very promising new log-structured file system that has continuous snapshots and versioning of the entire file system. This means that you can recover files that were deleted or unintentionally modified as well as perform backups at any time from a snapshot without a performance penalty normally associated with creating snapshots. In addition, there is evidence that NILFS has extremely good performance on SSD drives.
Log-Structured File System?
Log-Structured File Systems are a bit different than other file systems with both good points and bad points. Rather than write to a tree structure such as a b-tree or an h-tree, either with or without a journal, a log-structured file system writes all data and metadata sequentially in a continuous stream that is called a log (actually it is a circular log).
... Very interesting... if I understand it correctly, it's like a continuous, incremental database. It keeps saving the changes.

--
Regards
Carlos E. R.