Danny,

On Friday 13 August 2004 08:56, Danny Sauer wrote:
Randall wrote regarding 'Re: [SLE] KMail question' on Thu, Aug 12 at 16:46:
...
Perhaps you should consider storing those mbox files gzipped, too. They're text files, and compress very well. Waste is waste, you know? :)
I don't see this option in KMail's options dialogs. How does one do this? Under Windows, I always applied system-level compression to the files that stored my mailbox files (Eudora always uses mbox format, by the way). Actually, I used compression for my archived mail, but kept active mailboxes (those still receiving new mail) uncompressed.
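For an archived (inactive) mbox file, the compression can be done outside the mail client. The following is only a minimal sketch: the helper name `compress_mbox` is made up for illustration, and it assumes the mail client is not holding the file open while it runs. It leaves the original file in place, so removing it afterwards is a separate, deliberate step.

```python
import gzip
import shutil
from pathlib import Path


def compress_mbox(path: Path) -> Path:
    """Write a gzipped copy of an mbox file next to it and return the new path.

    The original file is left untouched; delete it only after checking the copy.
    """
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        # Stream in chunks so a large mailbox is never held in memory at once.
        shutil.copyfileobj(src, dst)
    return target
```

A compressed mailbox has to be decompressed before a client can append to it or seek within it, which is why this suits archives better than active folders.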
With an mbox, you've gotta do a lot of seeking to find the message you're interested in (or pre-scan the file generating file offsets at the time of opening). With maildir, you list a directory and go to the file corresponding to your message. 6000 messages in a folder? It takes a long time to read that 114MB text file, but not so long to read the contents of 2 directories. :)
Look in your ~/Mail directory. The mailbox files are indexed. There's no need to hunt around for individual messages. The overhead of accessing those indexes and then seeking to the mbox file offset required is certainly less than the overhead of accessing and maintaining file system directories.
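The index-plus-seek idea can be sketched roughly as below. This is not KMail's actual index format, which the thread does not describe; the function names are hypothetical, and the sketch assumes body lines beginning with "From " have been escaped (for example as ">From "), so that an unescaped "From " at the start of a line always marks a new message.

```python
def build_offset_index(fh):
    """Scan a binary mbox stream once and record where each message starts."""
    offsets = []
    fh.seek(0)
    while True:
        pos = fh.tell()
        line = fh.readline()
        if not line:
            break
        if line.startswith(b"From "):
            offsets.append(pos)
    return offsets


def read_message(fh, offsets, n):
    """Fetch message n with one seek and one read, using the saved offsets."""
    fh.seek(offsets[n])
    if n + 1 < len(offsets):
        return fh.read(offsets[n + 1] - offsets[n])
    return fh.read()  # last message runs to end of file
```

The full scan is needed only when the index is missing or stale; after that, opening any one message costs a lookup plus a seek, regardless of how large the mbox file is.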
I don't have a ~/Mail directory. However, I do have a system that switched from mbox to Maildir that now performs much better in all measurable ways. While it's true that an mbox file is faster to seek within when there are only a small number of messages, it scales badly. Maildir scales much more nicely. Clearly mbox serves your needs, but others searching the archive may benefit from knowing the limitations of both schemes. That, BTW, is my reason for posting.
Overhead in space and time for multiple files will always be greater than for the same or very similar amount of data stored in a single file. Keep in mind that index files are a practical necessity even for maildir storage, since without them displaying the summaries in the mailbox pane of the mail client would require access to all the files each time those summaries were created (each time you switched mail folders in KMail).
Scenario: A file is open and being written to when the computer loses power due to the UPS malfunctioning. That file is only half written, and the journal didn't manage to catch it. The file's corrupted.
Results:
mbox - all mail is in one file, a bunch is lost due to the failure.
maildir - one message per file, part of one message is lost due to the failure.
UPS malfunctioning? Perhaps.
I was gonna just say "power failure" but realized that most have UPSs by now. :)
New mail is always appended to the end of the file. No power failure is going to disrupt a single sector write (modern "Winchester" drives never experience uncontrolled shut-down--they must retract the heads to their landing zone and have internal power reserves sufficient to do so regardless of how or when they lose power). So either the new message is added, it's partially added, or it's not added at all (or the file system indexing doesn't reflect the data being written, which is effectively the same as the new message not being added to the mbox file). No existing mail is likely to be disturbed by a crash or power failure.
What happens when you delete a message? That's right, the portion of the file after that deleted message has to be rewritten. When you pop your mail, and the pop server's deleting the first few messages, what happens when the SMTP server on the same machine starts delivering a message? I hope the locking mechanism works perfectly. With Maildir, you can do whatever you want with messages while new messages are being delivered. This may not matter to you personally, but a few years back I had an IMAP server crash while accessing my mbox. It didn't clear the filesystem lock it had on the mbox file. So, new messages started piling up in the SMTP queue. I didn't realize this, though, because my IMAP client didn't notify me. If I'd been using maildir, that wouldn't have happened. I've also had things corrupt the first line in an mbox file. Then, the accessing program can't get to any of the messages in there until I go in and fix the first line by hand. Again, this isn't an issue with Maildir.
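The reason maildir delivery needs no lock can be sketched as follows. The tmp/new/cur directory layout is the standard maildir convention; the function name `deliver` and the exact unique-name scheme used here are assumptions made for illustration, not any particular mail server's code.

```python
import os
import socket
import time

_delivery_count = 0  # distinguishes deliveries made within the same second


def deliver(maildir, raw_message):
    """Deliver one message (bytes) into a maildir without taking any lock."""
    global _delivery_count
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    _delivery_count += 1
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), _delivery_count, host)

    # Write the whole message under tmp/, where readers never look.
    tmp_path = os.path.join(maildir, "tmp", unique)
    with open(tmp_path, "xb") as fh:  # "x" refuses to overwrite an existing file
        fh.write(raw_message)
        fh.flush()
        os.fsync(fh.fileno())

    # The rename is atomic on the same file system: a reader sees either the
    # complete message in new/ or nothing at all.
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)
    return new_path
```

Because every message is its own file, deleting one is a single unlink and never touches its neighbours, whereas removing a message from the middle of an mbox means rewriting everything after it.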
You seem to be confusing local mail storage by the client (KMail) and mail storage in mail drop files or on servers. The latter is not what we're discussing. SMTP, POP, IMAP, mail drop files etc. are all irrelevant here. Locking mechanisms either work or they don't. I, too, "hope" the Linux kernel works. And again, I'm a single user on a personal computer. No one else is going to be manipulating my mailbox files, concurrently or otherwise. I'm certainly not going to accidentally edit a mailbox file while I'm using KMail. The deleted messages are simply abandoned. The index no longer refers to them. When the mailbox file is compacted (I compact when exiting), that abandoned space is recovered. I also "hope" that KMail's file compaction is implemented intelligently, i.e., so as to minimize the likelihood of data loss in the unlikely event that some sort of hardware or software failure occurs during compaction.
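How KMail actually compacts is not stated in this thread. One common way to make such a rewrite safe, shown here purely as a sketch with a hypothetical `compact` function, is to copy the surviving messages into a temporary file in the same directory and then atomically swap it in, so that a crash leaves either the old file or the new one, never a half-written mix. The `keep` list of (offset, length) pairs is assumed to come from the mailbox index.

```python
import os
import tempfile


def compact(mbox_path, keep):
    """Rewrite mbox_path so it holds only the byte spans listed in keep.

    keep is a list of (offset, length) pairs for the messages that survive.
    """
    directory = os.path.dirname(os.path.abspath(mbox_path))
    # The temporary file must be on the same file system for an atomic replace.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".compact-")
    try:
        with os.fdopen(fd, "wb") as out, open(mbox_path, "rb") as src:
            for offset, length in keep:
                src.seek(offset)
                out.write(src.read(length))
            out.flush()
            os.fsync(out.fileno())  # data is on disk before the swap
        os.replace(tmp_path, mbox_path)  # atomic: old file or new file
    except BaseException:
        # On any failure the original mailbox is still intact; drop the copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The cost is temporary disk space for a second copy of the surviving mail, and the index has to be regenerated afterwards because every offset after the first removed message changes.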
My priority is data reliability under modification. The space lost is nearly nothing. If setting up a new system where lots of messages will be stored, Maildir is really the only sensible choice. There are a few isolated situations where a simplistic low capacity system will work better, or where someone is stuck with a legacy setup. I'm sure mbox is fine, there. So far, I know exactly 1 person for whom mbox works better, though. :)
--Danny, noting that NNTP storage works like maildir in most cases, and that such a decision was made for a reason ;)
NNTP server implementation issues are also irrelevant to the matter of storing email client-managed mail stores. We are talking about using KMail, right? And please stop winking at me, OK?

Randall Schulz