[Bug 726206] New: Akonadi uses too much disk space
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c0

           Summary: Akonadi uses too much disk space
    Classification: openSUSE
           Product: openSUSE 12.1
           Version: Factory
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: KDE4 Applications
        AssignedTo: kde-maintainers@suse.de
        ReportedBy: suse-beta@cboltz.de
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

I have a mailstore (~/Mail, a mix of maildir and mailbox folders) containing about 5 GB of mails, most of them from mailinglists (which means lots of short mails, not too many attachments).

Akonadi uses (wastes?) about 2.5 GB of disk space for indexing them :-(
(and KMail seems to be even slower than in KDE 4.6, but that's another topic)

I'm sorry to say that, but the only word that describes using additional 50% of the mail size for indexing is "insane" ;-)

# du -h ~/.local/share/akonadi/    # except directories < 1 MB
1,5G    /home/cb/.local/share/akonadi/db_data/akonadi
1,7G    /home/cb/.local/share/akonadi/db_data
803M    /home/cb/.local/share/akonadi/file_db_data
2,5G    /home/cb/.local/share/akonadi/

The files in file_db_data (lots of files with some kB) seem to be copies of mail headers. Isn't that pointless for mails that are stored locally?

Additionally, all mail headers are stored in the akonadi database - that probably explains why the database is that large, but it's still crazy. For example, I don't see the need for caching the "Received:" headers in the database - and mailinglist posts have lots of them...

BTW: my mail accounts are all POP3 - the only exception: I used IMAP for some days after updating from openSUSE 11.4 to 12.1 beta. I'd guess I accessed a total mail volume of 30 MB over IMAP.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
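For readers who want to reproduce the per-directory breakdown without `du`, here is a minimal Python sketch. The function names `dir_size_bytes` and `sizes_by_subdir` are made up for this illustration. Note that it sums apparent file sizes, whereas `du` reports allocated disk blocks, so the numbers can differ somewhat for directories with many small files.

```python
import os


def dir_size_bytes(path):
    """Return the summed apparent size of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sizes_by_subdir(path):
    """Map each direct subdirectory of path to its total size in bytes."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(path)
        if entry.is_dir(follow_symlinks=False)
    }


if __name__ == "__main__":
    # Path taken from the report above; nothing is printed if it is absent.
    root = os.path.expanduser("~/.local/share/akonadi")
    if os.path.isdir(root):
        for name, size in sorted(sizes_by_subdir(root).items()):
            print(f"{size / 1024 / 1024:8.1f} MB  {name}")
```

Run as a script, it prints one line per direct subdirectory (for example db_data and file_db_data), which is enough to see where the space goes.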
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c1

Christian Trippe <ctrippe@opensuse.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ctrippe@opensuse.org

--- Comment #1 from Christian Trippe <ctrippe@opensuse.org> 2011-10-24 20:01:13 UTC ---
(In reply to comment #0)
The files in file_db_data (lots of files with some kB) seem to be copies of mail headers. Isn't that pointless for mails that are stored locally?
That's probably https://bugs.kde.org/show_bug.cgi?id=282160, which seems to be fixed since 4.7.2. This is supposed to be a cache, but it did not work. But the old data does not get deleted.

The rest is probably by design and, in my opinion, should better go upstream anyway.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c2

Will Stephenson <wstephenson@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium
             Status|NEW                         |ASSIGNED
                 CC|                            |wstephenson@suse.com
         AssignedTo|kde-maintainers@suse.de     |wstephenson@suse.com

--- Comment #2 from Will Stephenson <wstephenson@suse.com> 2011-11-23 22:36:14 UTC ---
I don't know what to do about this. It's beyond my level of Akonadi-fu. Asking on the upstream bug for advice.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c3

Andras Mantia <amantia@kde.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amantia@kde.org

--- Comment #3 from Andras Mantia <amantia@kde.org> 2011-11-24 07:01:37 UTC ---
See my explanation in the KDE bugreport. If you had your akonadi setup from before akonadi 1.6.2 (KDE 4.7.2), then you might have items that are not deleted. The KDE report has information on what you could do (or you could just start with a clean akonadi setup and add your accounts again - don't forget to fix the local filters after).

In case you still have the old KMail .index.* files in your maildir directories, I'd like to make a test: count how much space do they require and how much is in ~/.kde4/share/apps/kmail and compare with the size of ~/.local/share/akonadi.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c4

--- Comment #4 from Christian Boltz <suse-beta@cboltz.de> 2011-11-24 22:08:28 CET ---
(In reply to comment #3)
In case you still have the old KMail .index.* files in your maildir directories, I'd like to make a test: count how much space do they require and
I knew that backups are good for something ;-) All *index* files had a total of 187 MB.
how much is in ~/.kde4/share/apps/kmail
That directory has about 200 kB in my backup. I assume you were asking for the mailstore which is ~/Mail for me (yes, the mail location from the good old (IIRC KDE3) times ;-) I didn't check the size in the backup, but one more month of mails doesn't make a real difference in my grown-for-years mailstore. Therefore I'd say ~/Mail had about 5 GB - 100 MB more or less don't change too much here.
and compare with the size of ~/.local/share/akonadi.
I'm afraid you won't like that answer... ;-)

It had 2.5 GB when I opened this bugreport and has slightly grown to 2.6 GB in the meantime. Now compare that with 187 MB of the old *index* files :-/

BTW: One of the reasons could be that Akonadi caches _all_ headers (checked in the database). IMHO that's pointless - ignoring the "Received:" headers would already make a big difference, and I doubt they are useful for searching.
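To put the numbers from this comment side by side, here is a small worked calculation. It is only a sketch using the rounded figures quoted above and assumes 1 GB = 1024 MB; the helper name `overhead_percent` is made up for this illustration.

```python
# Rounded figures quoted in the thread (assumption: 1 GB = 1024 MB).
MAILSTORE_MB = 5 * 1024    # ~/Mail, about 5 GB
OLD_INDEX_MB = 187         # old KMail 1 *index* files
AKONADI_MB = 2.6 * 1024    # ~/.local/share/akonadi, about 2.6 GB


def overhead_percent(extra_mb, store_mb):
    """Return the extra space as a percentage of the mail store size."""
    return 100.0 * extra_mb / store_mb


old_overhead = overhead_percent(OLD_INDEX_MB, MAILSTORE_MB)
new_overhead = overhead_percent(AKONADI_MB, MAILSTORE_MB)
growth_factor = AKONADI_MB / OLD_INDEX_MB

print(f"KMail 1 index overhead: {old_overhead:.1f} % of the mail store")
print(f"Akonadi overhead:       {new_overhead:.1f} % of the mail store")
print(f"Akonadi vs. old index:  {growth_factor:.1f}x")
```

With these inputs the old index files come to under 4 % of the mail store, while the Akonadi directory is a bit over half of it, roughly fourteen times the old index size.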
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c5

--- Comment #5 from Christian Boltz <suse-beta@cboltz.de> 2011-11-25 22:35:05 CET ---
BTW: Please add another 1.3 GB in ~/.kde4/share/apps/nepomuk/repository/main/data/virtuosobackend to the calculation. I don't know how much/how direct it is connected with akonadi, but I'm quite sure it wasn't there with KMail 1.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c6

--- Comment #6 from Andras Mantia <amantia@kde.org> 2011-11-26 09:42:48 UTC ---
Thanks for the numbers, indeed the index files were much smaller, it means.... Try akonadictl vacuum and see how much the database decreases and akonadictl fsck to see if you have stale files in file_db_data.

Regarding email headers, not the full header is cached, only the envelope (according to the main akonadi developer).

But it looks we should certainly reduce the space used somehow.

Regarding virtuosobackend: no, it is not Akonadi. It is the "semantic desktop search database", which indeed indexes your mail for full text search and finding relation between mails, addressbook and other files on your system, but it might contain also indexes from your other files on the file system. So it interacts with KMail/Akonadi, but it is a different server and functionality, outside of Akonadi's control.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c7

--- Comment #7 from Christian Boltz <suse-beta@cboltz.de> 2011-11-26 13:29:14 CET ---
(In reply to comment #6)
Thanks for the numbers, indeed the index files were much smaller, it means.... Try akonadictl vacuum and see how much the database decreases and akonadictl fsck to see if you have stale files in file_db_data.
akonadictl does not support those parameters (on openSUSE 12.1, KDE 4.7.2) - were they added in a later KDE release?
Regarding email headers, not the full header is cached, only the envelope (according to the main akonadi developer).
That seems to depend (but I have no idea on what ;-). I just checked the content of "parttable" and the result is mixed:

select name, count(name),
       round(avg(datasize)) as datasize_avg,
       round(sum(datasize)/1024/1024) as datasize_sum_MB
from parttable
where name like 'PLD%'
group by name;

+--------------+-------------+--------------+-----------------+
| name         | count(name) | datasize_avg | datasize_sum_MB |
+--------------+-------------+--------------+-----------------+
| PLD:ENVELOPE |      340754 |          355 |             115 |
| PLD:HEAD     |      340754 |         3141 |            1021 |
| PLD:RFC822   |       32933 |        10657 |             335 |
+--------------+-------------+--------------+-----------------+

PLD:ENVELOPE rows only contain the most important headers.
PLD:HEAD rows contain the full mail headers (which can already be pretty big on mailinglist posts).
PLD:RFC822 rows contain complete mails with header and body (!) (I don't know if those mails are stored in my maildir somewhere...)

There seems to be a connection between PLD:HEAD and PLD:ENVELOPE because the count() of both is the same.
But it looks we should certainly reduce the space used somehow.
Indeed ;-)
Regarding virtuosobackend: no, it is not Akonadi. It is the "semantic desktop search database", which indeed indexes your mail for full text search and finding relation between mails, addressbook and other files on your system, but it might contain also indexes from your other files on the file system. So it interacts with KMail/Akonadi, but it is a different server and functionality, outside of Akonadi's control.
Thanks for the explanation. Sounds like something I don't really need ;-) - is it possible to disable it and to delete its cache?
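The parttable query from this comment can be tried without touching a live Akonadi database. The following is a minimal sketch that runs an equivalent aggregate against an in-memory SQLite database. The table layout here is a simplified assumption (only the columns the query needs), the sample rows are invented, and `make_demo_db` and `summarize_parts` are made-up helper names. Unlike the MySQL original, the SQLite version divides by 1024.0, because SQLite would otherwise use integer division.

```python
import sqlite3

# Same aggregate as in the comment above, adapted for SQLite.
QUERY = """
SELECT name,
       COUNT(name)                            AS part_count,
       ROUND(AVG(datasize))                   AS datasize_avg,
       ROUND(SUM(datasize) / 1024.0 / 1024.0) AS datasize_sum_mb
FROM parttable
WHERE name LIKE 'PLD%'
GROUP BY name
ORDER BY name
"""


def make_demo_db():
    """Create an in-memory database with a simplified, assumed parttable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parttable ("
        "id INTEGER PRIMARY KEY, pimitemid INTEGER, "
        "name TEXT, datasize INTEGER)"
    )
    rows = [  # invented sample data, not taken from the report
        (1, "PLD:ENVELOPE", 400),
        (1, "PLD:HEAD", 3 * 1024 * 1024),
        (2, "PLD:ENVELOPE", 300),
        (2, "PLD:HEAD", 1024 * 1024),
        (2, "PLD:RFC822", 2 * 1024 * 1024),
        (2, "OTHER:demo", 10),  # made-up part name, filtered out by LIKE
    ]
    conn.executemany(
        "INSERT INTO parttable (pimitemid, name, datasize) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def summarize_parts(conn):
    """Return (name, count, average bytes, total MB) per payload part."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in summarize_parts(make_demo_db()):
        print(row)
```

The per-part totals make it easy to see at a glance whether the envelope, the full headers, or the complete messages dominate the database size.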
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c8

--- Comment #8 from Andras Mantia <amantia@kde.org> 2011-11-26 21:14:49 UTC ---
Oh, seems Volker didn't backport fsck and vacuum to 1.6.2 :( It will be in the next release only then. Alternatively you can compile it yourself or get it from the soon to be released KDE 4.8 beta 1. Only the Akonadi server (not the libraries) is needed and it is of course compatible with the old version.

As far as I know (and as I see in the akonadi database), PLD:ENVELOPE is stored for every mail. For all the others there is a PLD:HEAD and PLD:RFC822 (the whole mail), but this should have real data only for disconnected IMAP accounts. For all the others, including maildir and mbox, the header and body part should be stored only for a short time, for me it is 1 minute. You can check in the top level folder properties. (I am ignoring contacts/calendars, as I don't know how long they are cached.)

Just FYI (not that it matters for disc space) PLD:RFC822 itself holds the mail only if it is less than 4KB, anything bigger is stored in external files. But again, the storage is permanent only for disconnected imap.

Back to your results. I have 2.7GB local mail in maildir. My ~/.local/share/akonadi/file_db_data is 350MB, which roughly matches the amount of disconnected IMAP email cached there permanently. My mysql database is 692MB. Your query returns:

+--------------+-------------+--------------+-----------------+
| name         | count(name) | datasize_avg | datasize_sum_MB |
+--------------+-------------+--------------+-----------------+
| PLD:ENVELOPE |      272777 |          375 |              97 |
| PLD:HEAD     |      272777 |          194 |              51 |
| PLD:RFC822   |      102905 |         3815 |             374 |
+--------------+-------------+--------------+-----------------+

The RFC822 part is large and worries me as well. But the HEAD for me is much less than for you, which makes me wonder what goes wrong in your case.

Did you have any failed migration?
Do you really need the mixed maildir/mailbox folder (the old KMail style storage)? What is the cache policy for your local folders? (Folder Properties->Retrieval)

We *might* see a mixedmaildir specific issue as well here. The mixedmaildir thing is really there for KMail1 users, as it is a non-standard mail storage type. I urge you to change your folders to "clean" maildir if possible. And if you did that (or if you happened to have a failed migration), you could:
- stop akonadi
- remove the akonadi db (.local/share/akonadi) and config (.config/akonadi)
- start akonadi
- add back the agents, but now use the Maildir and not mixedmaildir (KMail Maildir) agent

The upside is that you are sure you have a clean database; the downside is that your filters will be wrong and you have to fix them.

You can also run a query like this:

select parttable.name, count(parttable.name),
       round(avg(parttable.datasize)) as datasize_avg,
       round(sum(parttable.datasize)/1024/1024) as datasize_sum_MB
from parttable, pimitemtable, collectiontable
where pimitemtable.collectionid=collectiontable.id
  and collectiontable.resourceid=RESOURCEID
  and parttable.pimitemid=pimitemtable.id
  and parttable.name like 'PLD%'
group by parttable.name

and get the RESOURCEID from the resourcetable, to see how much space is needed per account type. Might help even more to nail down which resource is using (abusing?) the cache.

Eg. for me, from the above amount:
- maildir resource: 90M (this corresponds to the index files from KMail1 !)
- disconnected imap: 431MB (expected, as this is a local cache)
- kolab resource: 1MB (converts calendar mails to calendar data)
- some small amounts (<1MB) for addressbook and local calendar resource

Looking at this data - it is the first time I do such a research -, I have to say that it is not that bad.
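The per-resource breakdown suggested in this comment can be sketched the same way. The example below joins the four tables named in the thread (parttable, pimitemtable, collectiontable, resourcetable) in an in-memory SQLite database. The schemas are reduced to the columns used in the join and are an assumption, the rows are invented, and `make_demo_db` and `usage_per_resource` are made-up helper names. It groups by resource name instead of filtering on a single RESOURCEID, so all resources appear in one result.

```python
import sqlite3

# Payload size per resource and part type, using explicit joins.
QUERY = """
SELECT r.name           AS resourcename,
       p.name           AS partname,
       COUNT(p.name)    AS part_count,
       SUM(p.datasize)  AS total_bytes
FROM parttable p
JOIN pimitemtable i    ON p.pimitemid = i.id
JOIN collectiontable c ON i.collectionid = c.id
JOIN resourcetable r   ON c.resourceid = r.id
WHERE p.name LIKE 'PLD%'
GROUP BY r.name, p.name
ORDER BY r.name, p.name
"""


def make_demo_db():
    """Build simplified, assumed versions of the four tables with demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resourcetable (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE collectiontable (id INTEGER PRIMARY KEY, resourceid INTEGER);
        CREATE TABLE pimitemtable (id INTEGER PRIMARY KEY, collectionid INTEGER);
        CREATE TABLE parttable (id INTEGER PRIMARY KEY, pimitemid INTEGER,
                                name TEXT, datasize INTEGER);
        INSERT INTO resourcetable VALUES
            (6, 'akonadi_mixedmaildir_resource_0'),
            (7, 'akonadi_maildir_resource_1');
        INSERT INTO collectiontable VALUES (1, 6), (2, 7);
        INSERT INTO pimitemtable VALUES (10, 1), (11, 1), (20, 2);
        INSERT INTO parttable (pimitemid, name, datasize) VALUES
            (10, 'PLD:ENVELOPE', 355), (10, 'PLD:HEAD', 3000),
            (11, 'PLD:ENVELOPE', 345), (11, 'PLD:HEAD', 3200),
            (20, 'PLD:ENVELOPE', 327), (20, 'PLD:HEAD', 14);
        """
    )
    return conn


def usage_per_resource(conn):
    """Return (resource, part, count, total bytes) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in usage_per_resource(make_demo_db()):
        print(row)
```

Grouping by resource in a single pass avoids looking up each RESOURCEID by hand, at the cost of a slightly longer result set.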
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c9

--- Comment #9 from Christian Boltz <suse-beta@cboltz.de> 2011-11-27 02:18:03 CET ---
(In reply to comment #8)
others, including maildir and mbox, the header and body part should be stored only for a short time, for me it is 1 minute. You can check in the top level folder properties.
It's also set to 1 minute for me.
Just FYI (not that it matters for disc space) PLD:RFC822 itself holds the mail only if it is less than 4KB, anything bigger is stored in external files. But again, the storage is permanent only for disconnected imap.
.. which I don't use. Nearly all of my mails are in a mixedmaildir. I used IMAP for some days (= some hundred mails) directly after the update to KMail2 - that doesn't explain the >32000 PLD:RFC822 mails...
Back to your results. I have 2.7GB local mail in maildir. My ~/.local/share/akonadi/file_db_data is 350MB, which roughly matches the amount of disconnected IMAP email cached there permanently. My mysql database is 692MB. Your query returns:

+--------------+-------------+--------------+-----------------+
| name         | count(name) | datasize_avg | datasize_sum_MB |
+--------------+-------------+--------------+-----------------+
| PLD:ENVELOPE |      272777 |          375 |              97 |
| PLD:HEAD     |      272777 |          194 |              51 |
| PLD:RFC822   |      102905 |         3815 |             374 |
+--------------+-------------+--------------+-----------------+

The RFC822 part is large and worries me as well. But the HEAD for me is much less than for you, which makes me wonder what goes wrong in your case.
Most of my mails come via mailinglists, which means small mails, but lots of headers (mailinglist headers, and the longer transport route means more Received headers). If most of the mails you receive have big attachments, the header vs. body ratio is of course different.

Nevertheless the average PLD:HEAD size of 194 bytes is really small - that would mean From, To, Subject, Date and maybe 2 or 3 other lines. Check what is stored in your database, for example the last 10 entries:

select * from parttable where name='PLD:HEAD' order by id desc limit 10
Did you have any failed migration?
Yes, but IIRC I deleted the akonadi db and config afterwards. Maybe I overlooked something.
Do you really need the mixed maildir/mailbox folder (the old KMail style storage)?
I have 5 GB of mails in a mixedmaildir and don't really want to know how long it would take to migrate them to maildir. Migrating would mean to create a new maildir storage and move the mails there with drag&drop, right?

I tried something like that with a mailinglist folder with 60000 mails some weeks ago. The result was that KMail was busy and unusable for 10 minutes - and I didn't even move all the 60000 mails around, but used blocks of maybe 5000. And if I was really lucky, KMail even crashed...

In other words: migrating all my mails to a maildir would probably take days. _If_ I do that, then the target is most probably a nice local dovecot server which (sorry to say that) seems to be a more usable, stable and faster solution these days.
What is the cache policy for your local folders? (Folder Properties->Retrieval)
[x] sync when opening the folder
sync automatically after ___ never ___
[x] fetch message content when needed
Keep message content for ___ 1 minute ___
We *might* see a mixedmaildir specific issue as well here. The mixedmaildir thing is really there for KMail1 users, as it is a non-standard mail storage
I'd call it a mix of two standard mail storage types ;-)
type. I urge you to change your folders to "clean" maildir if possible.
See above - this isn't really an option.
And if you did that (or if you happened to have a failed migration), you could: [delete akonadi database and config]
I'm not too keen on that - re-syncing all my mail folders would take a day. I had to "test" that more than once already...
the downside is that your filters will be wrong and you have to fix them.
Been there, done that ;-) Besides that, I'm afraid that (at least) my sent mails since using KMail2 are not stored on disk, but only in the database. I have 165 mails/1 MB in the "sent messages" folder in KMail (a maildir storage which claims to be in /home/cb/.local/share/local-mail), but that directory is completely empty on the disk...
You can also run a query like this: .. and get the RESOURCEID from the resourcetable, to see how much space is needed per account type. Might help even more to nail down which resource is using (abusing?) the cache.
Interesting idea, I'll check that tomorrow.
Looking at this data - it is the first time I do such a research -, I have to say that it is not that bad.
I hope you aren't surprised if I disagree ;-)
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c10

--- Comment #10 from Christian Boltz <suse-beta@cboltz.de> 2011-11-27 14:16:04 CET ---
select resourcetable.id as res_id, resourcetable.name as resourcename,
       parttable.name, count(parttable.name),
       round(avg(parttable.datasize)) as datasize_avg,
       round(sum(parttable.datasize)/1024/1024) as datasize_sum_MB
from parttable, pimitemtable, collectiontable, resourcetable
where pimitemtable.collectionid=collectiontable.id
  and parttable.pimitemid=pimitemtable.id
  and parttable.name like 'PLD%'
  and collectiontable.resourceId = resourcetable.id
group by parttable.name, collectiontable.resourceId;

+--------+---------------------------------+--------------+-----------------------+--------------+-----------------+
| res_id | resourcename                    | name         | count(parttable.name) | datasize_avg | datasize_sum_MB |
+--------+---------------------------------+--------------+-----------------------+--------------+-----------------+
|      6 | akonadi_mixedmaildir_resource_0 | PLD:ENVELOPE |                339848 |          355 |             115 |
|      7 | akonadi_maildir_resource_1      | PLD:ENVELOPE |                   171 |          327 |               0 |
|      6 | akonadi_mixedmaildir_resource_0 | PLD:HEAD     |                339848 |         3149 |            1021 |
|      7 | akonadi_maildir_resource_1      | PLD:HEAD     |                   171 |           14 |               0 |
|      6 | akonadi_mixedmaildir_resource_0 | PLD:RFC822   |                 32689 |        10897 |             340 |
|      7 | akonadi_maildir_resource_1      | PLD:RFC822   |                   165 |          169 |               0 |
|     26 | akonadi_contacts_resource_0     | PLD:RFC822   |                   114 |            0 |               0 |
+--------+---------------------------------+--------------+-----------------------+--------------+-----------------+

This is not really surprising - nearly all data belongs to my mixedmaildir.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c11

--- Comment #11 from Andras Mantia <amantia@kde.org> 2011-11-27 18:52:03 UTC ---
Thanks, I guess I have to import my data into mixedmaildir, test and report to the maintainer, as clearly what you have is not good.

It would be interesting to know if those headers stored in the database (that really should not stay there for a long time) are from the maildir part or the mbox part of your data. I don't know if this could be found out easily via a query though.
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c12

--- Comment #12 from Andras Mantia <amantia@kde.org> 2011-12-01 09:12:02 UTC ---
A question: when did you create the akonadi database last time (when did you migrate/import the data)? Was it with KDE 4.7.2 and Akonadi 1.6.2 or with some earlier versions and you used KMail for a while with the earlier version?
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c13

--- Comment #13 from Christian Boltz <suse-beta@cboltz.de> 2011-12-01 19:00:50 CET ---
(In reply to comment #12)
A question: when did you create the akonadi database last time
It was with 12.1 beta or a slightly newer factory version. I don't remember the exact KDE version number, but you should be able to find it out from the changelog ;-)
https://bugzilla.novell.com/show_bug.cgi?id=726206
https://bugzilla.novell.com/show_bug.cgi?id=726206#c14

--- Comment #14 from Christian Trippe <ctrippe@opensuse.org> 2011-12-08 18:48:52 UTC ---
@Christian: According to bug 722418 you used kmail2 already with KDE 4.7.1 which probably answers comment 12.