On 2006-01-09 12:03 Carlos E. R. wrote:
The Monday 2006-01-09 at 09:01 +0100, Anders Norrbring wrote:
But then again, 45 minutes after a fresh reboot, I ran top and saw this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3165 vscan     15   0  438m 384m  29m S  0.0 38.1   0:15.87 amavisd
 2470 vscan     15   0  437m 383m  30m S  0.0 38.0   0:18.42 amavisd
 3440 wwwrun    15   0  8684 6464 3988 S  0.0  0.6   0:00.15 httpd2-prefork
  596 root      19   0  8936 6268 1716 S  0.0  0.6   0:01.35 java
  599 root      15   0  8936 6268 1716 S  0.0  0.6   0:00.01 java
  600 root      15   0  8936 6268 1716 S  0.0  0.6   0:00.14 java
amavisd consuming 76% of memory? Yikes!
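The 76% figure is just the two amavisd rows added together. A minimal Python sketch of that arithmetic, using the top output quoted above (the function name `mem_by_command` is made up for illustration). Note that %MEM is based on RES, which includes shared pages, so a per-command sum is an upper bound rather than an exact figure:

```python
from collections import defaultdict

# The top output quoted in this thread, one process per line.
TOP_OUTPUT = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3165 vscan     15   0  438m 384m  29m S  0.0 38.1   0:15.87 amavisd
 2470 vscan     15   0  437m 383m  30m S  0.0 38.0   0:18.42 amavisd
 3440 wwwrun    15   0  8684 6464 3988 S  0.0  0.6   0:00.15 httpd2-prefork
  596 root      19   0  8936 6268 1716 S  0.0  0.6   0:01.35 java
  599 root      15   0  8936 6268 1716 S  0.0  0.6   0:00.01 java
  600 root      15   0  8936 6268 1716 S  0.0  0.6   0:00.14 java
"""


def mem_by_command(top_text):
    """Sum the %MEM column per COMMAND (hypothetical helper)."""
    lines = top_text.strip().splitlines()
    header = lines[0].split()
    mem_col = header.index("%MEM")
    cmd_col = header.index("COMMAND")
    totals = defaultdict(float)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= cmd_col:
            continue  # skip short or malformed rows
        totals[fields[cmd_col]] += float(fields[mem_col])
    return dict(totals)


if __name__ == "__main__":
    totals = mem_by_command(TOP_OUTPUT)
    # Largest consumer first.
    for cmd, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{cmd:16s} {pct:5.1f}%")
```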
I read the full thread before popping in, but I was going to tell you from the start that it is an "out of memory" problem, as explained by Carl Hartung.
The important messages in the first log you posted are marked here with '*':
  Jan 4 05:17:27 iris master[1030]: process 5996 exited, signaled to death by 9
  Jan 4 05:23:24 iris postfix/master[2376]: warning: unix_trigger_event: read timeout for service public/flush
* Jan 4 05:23:28 iris kernel: ldt allocation failed
* Jan 4 05:23:28 iris kernel: VM: killing process httpd2-prefork
"VM" is virtual memory. The kernel starts killing everything in sight to save itself, and fails, because the system dies anyway. Linux copes very badly with running out of memory. How much do you have, by the way?
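To pull such lines out of a large syslog file, something like the following Python sketch may help. It is only a sketch: the pattern is built from the two kernel messages quoted in this thread, other kernel versions word these messages differently, and the names `OOM_PATTERN` and `oom_events` are made up for illustration.

```python
import re

# The log excerpt quoted in this thread.
LOG = """\
Jan 4 05:17:27 iris master[1030]: process 5996 exited, signaled to death by 9
Jan 4 05:23:24 iris postfix/master[2376]: warning: unix_trigger_event: read timeout for service public/flush
Jan 4 05:23:28 iris kernel: ldt allocation failed
Jan 4 05:23:28 iris kernel: VM: killing process httpd2-prefork
"""

# Assumption: only the two message forms seen above are matched.
OOM_PATTERN = re.compile(
    r"kernel: (?:.*allocation failed|VM: killing process (?P<victim>\S+))"
)


def oom_events(log_text):
    """Return (line, victim) pairs; victim is None for allocation failures."""
    events = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match:
            events.append((line, match.group("victim")))
    return events


if __name__ == "__main__":
    for line, victim in oom_events(LOG):
        tag = f"killed {victim}" if victim else "allocation failure"
        print(f"[{tag}] {line}")
```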
Then you mention that amavis is eating lots of memory. That is where you should look: limit the number of children that amavis can spawn, for starters. My guess is that if you had run "ps afx" you would have seen a lot of children there.
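A quick way to see how many children there are, and how much memory they hold in total, is to count processes per command name. The Python sketch below assumes a procps-style ps that accepts "-e -o rss= -o comm="; the helper names `summarize` and `current_processes` are made up for illustration. In amavisd-new the child limit is, as far as I know, the $max_servers setting in amavisd.conf, but check the documentation for your version.

```python
import subprocess
from collections import Counter


def summarize(ps_text):
    """Map command name -> (process count, total RSS in KB).

    Expects one "<rss_kb> <command>" pair per line, as printed by
    "ps -e -o rss= -o comm=".
    """
    counts = Counter()
    rss_kb = Counter()
    for line in ps_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        rss, comm = parts[0], parts[1].strip()
        counts[comm] += 1
        rss_kb[comm] += int(rss)
    return {comm: (counts[comm], rss_kb[comm]) for comm in counts}


def current_processes():
    """Run ps on the local machine (assumes a procps-style ps is installed)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling summarize(current_processes()) and sorting by total RSS should put a runaway amavisd or spamd at the top of the list. As with top, RSS counts shared pages once per process, so the totals overstate real usage somewhat.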
I have seen this situation before, caused by amavis and/or spamassassin.
Why at a certain hour? Probably it is the hour chosen by some spammers to send you problematic emails. Or a virus out there.
Hi and thanks!

It seems it was spamassassin. When it ran a lint, every bit of memory was grabbed. I have 2GB of memory in the box, so it seemed a little weird at first. However, I'm running an old SA on that box, apparently not well configured either. :)

I found that the bayes database files were almost 16GB in size! When I deleted those, a lint ran just fine.

At 04:35 in the morning, cron triggers a rules_du_jour update of the SA rules, and after that a lint is run...

I hope this will help, at least until I have the time and inspiration to upgrade it all to SuSE 10.

-- 
Anders Norrbring
Norrbring Consulting
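For anyone wanting to check for the same oversized bayes files before they bite, a minimal Python sketch follows. The directory is an assumption: SpamAssassin normally keeps files named bayes_* under the .spamassassin directory of the user it runs as, but a site-wide bayes_path setting can move them. The function name `bayes_db_size` is made up for illustration.

```python
from pathlib import Path


def bayes_db_size(directory):
    """Return ({file name: size in bytes}, total bytes) for bayes_* files."""
    sizes = {
        path.name: path.stat().st_size
        for path in Path(directory).glob("bayes_*")
        if path.is_file()
    }
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    # Assumption: the current user's default SpamAssassin directory.
    sizes, total = bayes_db_size(Path.home() / ".spamassassin")
    for name, size in sorted(sizes.items()):
        print(f"{name:20s} {size / 2**20:10.1f} MB")
    print(f"{'total':20s} {total / 2**20:10.1f} MB")
```

Run from cron ahead of the nightly lint, a check like this could warn when the total grows past whatever limit suits the box.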