Danny,
On Thursday 12 August 2004 14:21, Danny Sauer wrote:
Randall wrote regarding 'Re: [SLE] KMail question' on Wed, Aug 11 at 20:14:
[... comparing size of maildir to mbox...]
...
So, using Reiser, the maildir actually only took up ~5MB more space, or about 4%. It's worth noting that, while there was 127MB of files, it was only taking up 118MB of space on a Reiser filesystem. It's also worth noting that it took a little over 4 times as long to create the mbox file as it did the maildir files, even though the maildir was created first so those files were more likely to be already cached.
Well, I'm not using ReiserFS, for one thing. I'm using XFS.
On a mailing list folder (no binary attachments) with 65591 messages, du reports 363MB, but it's actually only taking up 350MB of disk space (as compared to 280MB for the mbox). That's 25% less space, but it's just 25MB for well over 65 *thousand* messages. It's hard to find a drive that's less than 10GB now. Say that costs $100. That's $10/GB. Less than 1 cent per megabyte. The performance hit and file corruption risk on an mbox is not worth the 25 cents worth of disk space saved, IMHO.
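For anyone who wants to compare the two figures discussed above (the summed file sizes versus the space actually allocated on disk) on their own mail folders, here is a rough Python sketch. The function name `tree_sizes` is made up for illustration, and it assumes a POSIX system where `st_blocks` is reported in 512-byte units; it is not how du itself is implemented.

```python
import os

def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for every non-directory entry under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Apparent size: the length of the file's contents.
            apparent += st.st_size
            # Allocated size: assumes POSIX, where st_blocks counts 512-byte units.
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Calling it on a maildir folder and then on the directory holding the matching mbox file gives both numbers for each layout. Which one is larger depends on the file system (block size, tail packing and so on).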
But I have the drives I have in the cabinet and power supply I have. To me, waste is waste. Also, since my work (software development) demands high performance computing hardware, I only buy 10,000 RPM, Ultra 160 (or maybe now SATA) drives. Not quite as cheap, and usually of more modest capacity. Right now I have two 37 GB drives in my system.
With an mbox, you've gotta do a lot of seeking to find the message you're interested in (or pre-scan the file generating file offsets at the time of opening). With maildir, you list a directory and go to the file corresponding to your message. 6000 messages in a folder? It takes a long time to read that 114MB text file, but not so long to read the contents of 2 directories. :)
Look in your ~/Mail directory. The mailbox files are indexed. There's no need to hunt around for individual messages. The overhead of accessing those indexes and then seeking to the mbox file offset required is certainly less than the overhead of accessing and maintaining file system directories.
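To make the index-plus-seek idea concrete, here is a minimal Python sketch. It is not KMail's actual index format; the names `build_mbox_index` and `read_message` are hypothetical, and it assumes the classic mbox convention that every message starts with a "From " line and that body lines beginning with "From " were escaped (for example as ">From ") when the mail was written.

```python
def build_mbox_index(path):
    """Scan an mbox file once and return the byte offset where each message starts."""
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:
            # Assumption: an unescaped "From " at line start marks a new message.
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets

def read_message(path, offsets, n):
    """Seek straight to message n and return its raw bytes."""
    start = offsets[n]
    with open(path, "rb") as f:
        f.seek(start)
        if n + 1 < len(offsets):
            # Read up to the start of the next message.
            return f.read(offsets[n + 1] - start)
        # Last message: read to end of file.
        return f.read()
```

The scan is paid once; after that, fetching any single message is one seek and one read, however large the mbox is. A real mail client would also store the offsets on disk so the scan isn't repeated every time the folder is opened.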
Scenario: A file is open and being written to when the computer loses power due to the UPS malfunctioning. That file is only half written, and the journal didn't manage to catch it. The file's corrupted.
Results:
mbox - all mail is in one file, a bunch is lost due to the failure. maildir - one message per file, part of one message is lost due to the failure.
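The structural difference behind those two results can be seen with Python's standard `mailbox` module. This is only a sketch of the two on-disk layouts, not of how KMail stores mail; the helper names `store_both_ways` and `count_files` and the paths passed in are made up for illustration.

```python
import mailbox
import os

def store_both_ways(messages, mbox_path, maildir_path):
    """Write the same messages to an mbox file and to a maildir."""
    box = mailbox.mbox(mbox_path)  # one file holds every message
    for text in messages:
        box.add(text)
    box.flush()
    box.close()

    md = mailbox.Maildir(maildir_path, create=True)  # one file per message
    for text in messages:
        md.add(text)
    md.close()

def count_files(root):
    """Count the files under root (or 1 if root is itself a plain file)."""
    if os.path.isfile(root):
        return 1
    return sum(len(files) for _dir, _subdirs, files in os.walk(root))
```

With three messages, the mbox path ends up as a single file and the maildir holds three, so a damaged file puts either the whole folder or a single message at risk.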
UPS malfunctioning? Perhaps. New mail is always appended to the end of the file. No power failure is going to disrupt a single sector write (modern "Winchester" drives never experience uncontrolled shut-down--they must retract the heads to their landing zone and have internal power reserves sufficient to do so regardless of how or when they lose power). So either the new message is added, it's partially added or it's not added at all (or the file system indexing doesn't reflect the data being written, which is effectively the same as the new message not being added to the mbox file). No existing mail is likely to be disturbed by a crash or power failure. In all my many years, what few disk problems I've encountered have all been absolute catastrophes (meaning there's no recourse but to go to your most recent backup).
Disks are cheap, esp for a 5MB difference. Data recovery time is not.
Waste is waste. If a crash happens, data recovery might be required. If it is, fewer files mean easier and faster recovery.
_That_'s (a few reasons) why I prefer maildir. :)
--Danny
Randall Schulz