Randall wrote regarding 'Re: [SLE] KMail question' on Thu, Aug 12 at 16:46:
Danny,
On Thursday 12 August 2004 14:21, Danny Sauer wrote:
Randall wrote regarding 'Re: [SLE] KMail question' on Wed, Aug 11 at 20:14:
[... comparing size of maildir to mbox...]
...
So, using Reiser, the maildir actually only took up ~5MB more space, or about 4%. It's worth noting that, while there was 127MB of files, it was only taking up 118MB of space on a Reiser filesystem. It's also worth noting that it took a little over 4 times as long to create the mbox file as it did the maildir files, even though the maildir was created first so those files were more likely to be already cached.
Well, I'm not using ReiserFS, for one thing. I'm using XFS.
XFS is good for big files, and mbox is a big file. Reiser is good for lots of small files. Maildir uses small files. If using Maildir, it oughtta be on a ReiserFS.
On a mailing list folder (no binary attachments) with 65591 messages, du reports 363MB, but it's actually only taking up 350MB of disk space (as compared to 280MB for the mbox). That's 25% less space, but it's just 25MB for well over 65 *thousand* messages. It's hard to find a drive that's less than 10GB now. Say that costs $100. That's $10/GB. Less than 1 cent per megabyte. The performance hit and file corruption risk on an mbox is not worth the 25 cents worth of disk space saved, IMHO.
But I have the drives I have in the cabinet and power supply I have. To me, waste is waste. Also, since my work (software development) demands high performance computing hardware, I only buy 10,000 RPM, Ultra 160 (or maybe now SATA) drives. Not quite as cheap and usually of more modest capacity. Right now I have two 37 GB drives in my system.
Perhaps you should consider storing those mbox files gzipped, too. They're text files, and compress very well. Waste is waste, you know? :)
With an mbox, you've gotta do a lot of seeking to find the message you're interested in (or pre-scan the file generating file offsets at the time of opening). With maildir, you list a directory and go to the file corresponding to your message. 6000 messages in a folder? It takes a long time to read that 114MB text file, but not so long to read the contents of 2 directories. :)
Look in your ~/Mail directory. The mailbox files are indexed. There's no need to hunt around for individual messages. The overhead of accessing those indexes and then seeking to the mbox file offset required is certainly less than the overhead of accessing and maintaining file system directories.
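For anyone reading along in the archive, the "pre-scan the file generating file offsets" idea can be sketched roughly as below. This is only an illustration, not how KMail or any particular client builds its index files; the function names `build_mbox_index` and `read_message` are made up for the example, and it assumes the common mbox convention that a message starts at a line beginning with "From " that follows a blank line.

```python
import os

def build_mbox_index(path):
    """Scan an mbox file once and return the byte offset of each message."""
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        prev_blank = True  # treat the start of the file like a blank line
        for line in f:
            # Assumed convention: "From " after a blank line starts a message.
            if prev_blank and line.startswith(b"From "):
                offsets.append(pos)
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    return offsets

def read_message(path, offsets, n):
    """Seek straight to message n using the index, without rescanning."""
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

The scan is linear in the size of the file, so the index only pays off if it is saved and kept in step with the mbox; any rewrite of the file shifts the offsets after the change.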
I don't have a ~/Mail directory. However, I do have a system that switched from mbox to Maildir that now performs much better in all measurable ways. While it's true that an mbox file is faster to seek within when there are only a small number of messages, it scales badly. Maildir scales much more nicely. Clearly mbox serves your needs, but others searching the archive may benefit from knowing the limitations of both schemes. That, BTW, is my reason for posting.
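A switch from mbox to Maildir like the one described can be sketched with Python's standard `mailbox` module. This is a minimal example under assumptions, not the procedure actually used on that system: the helper name `mbox_to_maildir` is hypothetical, and the code takes no lock, so it assumes nothing else is delivering to the mbox while it runs.

```python
import mailbox

def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    # Assumption: no other process is writing to the mbox during the copy.
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            dst.add(msg)  # each message becomes its own file under new/
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

The original mbox is left untouched, so the two layouts can be compared for size and speed before anything is deleted.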
Scenario: A file is open and being written to when the computer loses power due to the UPS malfunctioning. That file is only half written, and the journal didn't manage to catch it. The file's corrupted.
Results:
mbox - all mail is in one file, a bunch is lost due to the failure.
maildir - one message per file, part of one message is lost due to the failure.
UPS malfunctioning? Perhaps.
I was gonna just say "power failure" but realized that most have UPSs by now. :)
New mail is always appended to the end of the file. No power failure is going to disrupt a single sector write (modern "Winchester" drives never experience uncontrolled shut-down--they must retract the heads to their landing zone and have internal power reserves sufficient to do so regardless of how or when they lose power). So either the new message is added, it's partially added or it's not added at all (or the file system indexing doesn't reflect the data being written, which is effectively the same as the new message not being added to the mbox file). No existing mail is likely to be disturbed by a crash or power failure.
What happens when you delete a message? That's right, the portion of the file after that deleted message has to be rewritten. When you pop your mail, and the pop server's deleting the first few messages, what happens when the SMTP server on the same machine starts delivering a message? I hope the locking mechanism works perfectly. With Maildir, you can do whatever you want with messages while new messages are being delivered.

This may not matter to you personally, but a few years back I had an IMAP server crash while accessing my mbox. It didn't clear the filesystem lock it had on the mbox file. So, new messages started piling up in the SMTP queue. I didn't realize this, though, because my IMAP client didn't notify me. If I'd been using maildir, that wouldn't have happened. I've also had things corrupt the first line in an mbox file. Then, the accessing program can't get to any of the messages in there until I go in and fix the first line by hand. Again, this isn't an issue with Maildir.

My priority is data reliability under modification. The space lost is nearly nothing. If setting up a new system where lots of messages will be stored, Maildir is really the only sensible choice. There are a few isolated situations where a simplistic low capacity system will work better, or where someone is stuck with a legacy setup. I'm sure mbox is fine, there. So far, I know exactly 1 person for whom mbox works better, though. :)

--Danny, noting that NNTP storage works like maildir in most cases, and that such a decision was made for a reason ;)
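The point above about Maildir not needing a lock during delivery comes from how a message is delivered: it is written under tmp/ and then renamed into new/, so a reader never sees a half-written file. A rough sketch follows, assuming a POSIX filesystem where rename within one filesystem is atomic. The `deliver` function and its file-naming scheme are illustrative only, not what any particular mail server does.

```python
import os
import socket
import time

_counter = 0  # per-process sequence number so names stay unique

def deliver(maildir, raw_message):
    """Write raw_message (bytes) into maildir/new and return the final path."""
    global _counter
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    _counter += 1
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    # Illustrative unique name: time, process id, sequence number, host name.
    unique = "%d.P%dQ%d.%s" % (time.time(), os.getpid(), _counter, host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL makes the create fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(raw_message)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before it becomes visible
    os.rename(tmp_path, new_path)  # the message appears in new/ all at once
    return new_path
```

If the machine dies partway through the write, the debris is a stray file in tmp/ and every message already delivered is untouched, which is the failure mode described in the power-loss scenario earlier in the thread.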