On 06/02/2011 04:22 PM, Greg Freemyer wrote:
On Thu, Jun 2, 2011 at 4:00 PM, jdd <jdd@dodin.org> wrote:
On 02/06/2011 21:27, Carlos E. R. wrote:
Which means run fsck on all opened filesystems.
It shouldn't. I usually see only a journal check.
jdd
Remember that metadata journaling is fairly common.
Data journaling much less so.
Data journaling will be more robust, so if robustness is your issue, give it a shot.
I don't know which filesystems offer data journaling, but ext3 definitely does. From the man page, in the ext3 section:
==============
data={journal|ordered|writeback}
    Specifies the journalling mode for file data. Metadata is always journaled. To use modes other than ordered on the root filesystem, pass the mode to the kernel as boot parameter, e.g. rootflags=data=journal.

    journal
        All data is committed into the journal prior to being written into the main filesystem.

    ordered
        This is the default mode. All data is forced directly out to the main file system prior to its metadata being committed to the journal.

    writeback
        Data ordering is not preserved - data may be written into the main filesystem after its metadata has been committed to the journal. This is rumoured to be the highest-throughput option. It guarantees internal filesystem integrity, however it can allow old data to appear in files after a crash and journal recovery.
================
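To see which of these modes a mounted ext3 filesystem is actually using, something like the following Python sketch can parse /proc/mounts-style lines. It is only an illustration: the function names are hypothetical, and it assumes that an ext3 mount without an explicit data= option runs in the ordered default stated in the man page text; a kernel built with a different default would make that assumption wrong.

```python
from typing import Dict

VALID_MODES = {"journal", "ordered", "writeback"}
DEFAULT_MODE = "ordered"  # assumption: the default given in the ext3 man page text


def data_mode(options: str) -> str:
    """Return the data= journaling mode from a comma-separated mount option string."""
    mode = DEFAULT_MODE
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if key == "data" and sep:
            if value not in VALID_MODES:
                raise ValueError(f"unknown data= mode: {value!r}")
            # If the option is repeated, this sketch keeps the last one (an assumption).
            mode = value
    return mode


def ext3_modes(mounts_text: str) -> Dict[str, str]:
    """Map mount point -> data= mode for every ext3 line.

    Each line is assumed to look like /proc/mounts:
    "device mountpoint fstype options dump pass".
    """
    result: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ext3":
            result[fields[1]] = data_mode(fields[3])
    return result
```

On a Linux box, `ext3_modes(open("/proc/mounts").read())` would return a dict along the lines of `{"/": "ordered"}`; a filesystem booted with rootflags=data=journal should show up as "journal".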
writeback is the least robust. Data can be written in any order and conceivably sit in cache for extended periods. 5+ years ago, I think this was the normal behavior for most mainstream filesystems.
ext3 now defaults to data=ordered. (Remember that the journals are flushed on every mount, so it is easy to switch from one mode to another.)
I don't know if "data=journal" is any safer than "data=ordered" or not.
The choice between the two isn't one of robustness. It's a choice of workload. They'll both have your data on disk when fsync() returns, and neither can make any guarantees about data being written before then. Within the confines of existing APIs, file systems can't make any promises WRT file contents beyond a chunk of data at a certain offset. A file system only understands its own metadata.

Writes are still cached before being written to disk. In both cases, writes can be split into multiple transactions. The writes are split up into page-sized chunks (along with associated metadata, like bitmaps or indirect blocks), each of which may be in its own transaction. In neither case will a 32 MB write() be performed as a single atomic chunk. Each mode will place the blocks on a list that will be flushed during commit. The mode determines where they will be flushed: to the general file system or to the journal.

For robustness, use fsync(). That's what it's there for.

The descriptions of each mode you've pasted give the "what it does" aspect of each mode, but not the effects.

data=writeback means that the journal will not stall on large writes when the journal must be flushed to the general file system or when fsync() is called. This will perform the best for most write loads, but it can introduce corruption at the end of a file if the system crashes after the file has been extended (metadata) but before the file data itself has been written out (data).

data=ordered means that data writes go directly to the file system and are guaranteed to hit disk before the transaction commits. This protects against old file data appearing in sections of a file that have grown but weren't written yet. It's a bit of a heavy hammer for that purpose, since it writes out all of the outstanding writes to the file before the transaction is committed, not just the ones that fall outside the file's boundaries. A side effect of this is that it can stall transaction commits when there are large writes queued up.
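Since fsync() is the tool for robustness here, a minimal Python sketch of a write that returns only once fsync() has succeeded might look like this. The helper name write_durably is hypothetical, and the final fsync() on the parent directory is a common POSIX practice for making a newly created file's name durable, not something described in this thread; it assumes a POSIX system where a directory can be opened read-only and fsync()ed.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write() may write fewer bytes than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Until this returns, neither data=ordered nor data=journal
        # promises that the contents are on disk.
        os.fsync(fd)
    finally:
        os.close(fd)

    # The directory entry of a newly created file is metadata of the parent
    # directory; fsync() it too so the name survives a crash (POSIX assumption).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even with this, a large buffer is not written atomically: with 4 KiB pages (an assumption; page size varies by architecture), a 32 MiB write() is split into 32 MiB / 4 KiB = 8192 page-sized chunks, which may land in different transactions. fsync() only tells you that all of them are on disk once it has returned.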
There are fairly severe performance consequences when there is fsync activity on a file system with a lot of streaming writes. This is because the fsync can't be honored until the transaction is committed, and there may be other transactions queued to be committed before it. Even a small write can stall behind the ordered writeout of a large write list associated with another file.

data=journal means that _every_ write to the file system must go through the journal. For streaming workloads, this will usually result in choppy, bursty performance as the journal fills up again and again and must be flushed to the file system, stalling progress as it does so. Administrators should be aware that any increase in journal size carries a corresponding increase in latency when the journal must be flushed. So you may get longer bursts, but they'll be further apart.

The flip side of this is that for fsync-heavy workloads on small files, like a mail spool, the fsync() call can be honored just by committing the write to the journal. This limits seeking to the journal area and allows the file system to write to the general file system at its leisure, queuing and sequencing writes to minimize seeking.

Chris Mason, some time ago, started playing around with the idea of a data=guarded mode. This mode would only queue up writes that are outside the current boundaries of the file, so that most of the latency associated with data=ordered would be eliminated. I didn't really follow what happened with this effort. If I had to guess, I'd say that the overhead associated with making it work well would be too high to bother with, since it would require an extent map of the file data waiting to be written. I'd also bet that an opportunistically created extent map wouldn't be complete enough to make it worthwhile.
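A toy Python model of the difference between data=ordered and the data=guarded idea may help. It is not how jbd or Chris Mason's patches actually track writes; the (offset, length) representation and all function names are hypothetical. It only shows which pending data writes each policy would have to flush before a transaction commit, given the file's size as already recorded on disk.

```python
from typing import List, Tuple

# Hypothetical representation of a pending data write: (offset, length) in bytes.
Write = Tuple[int, int]


def ordered_flush_set(pending: List[Write], on_disk_size: int) -> List[Write]:
    """data=ordered: every pending data write is flushed before the commit.

    on_disk_size is unused; it is kept so both policies share one signature.
    """
    return list(pending)


def guarded_flush_set(pending: List[Write], on_disk_size: int) -> List[Write]:
    """data=guarded idea: only writes reaching past the on-disk file size
    must be flushed before the new size is committed."""
    return [(off, length) for off, length in pending if off + length > on_disk_size]


def bytes_before_commit(flush_set: List[Write]) -> int:
    """Total bytes that must hit disk before the transaction can commit."""
    return sum(length for _, length in flush_set)


MiB = 1024 * 1024
# A 32 MiB overwrite inside an existing 64 MiB file, plus a 4 KiB append.
pending = [(0, 32 * MiB), (64 * MiB, 4096)]
ordered_bytes = bytes_before_commit(ordered_flush_set(pending, 64 * MiB))
guarded_bytes = bytes_before_commit(guarded_flush_set(pending, 64 * MiB))
```

In this example ordered has to push out 32 MiB + 4 KiB before the commit, while the guarded policy only needs the 4 KiB that extends the file; that gap is the latency guarded mode was meant to remove. The cost Jeff mentions shows up here too: the guarded policy needs an accurate list of pending extents per file to make that decision.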
-Jeff

-- 
Jeff Mahoney
SUSE Labs