Rikard wrote regarding 'Re: [SLE] [9.0] How can i tell if Spamassassin is learning?' on Fri, Aug 20 at 11:12:
On Friday 20 August 2004 17.23, Danny Sauer wrote:
Rikard wrote regarding '[SLE] [9.0] How can i tell if Spamassassin is learning?' on Fri, Aug 20 at 07:03:
Hi all!
How can I determine if SA is actually learning via sa-learn? I get a message that it processed xx files, but it keeps missing the same types of mail that I have fed it some 10 times. It only catches approximately 10-20% of the spam I am receiving. I have a bayes database and its contents change after an sa-learn run, but it still fails to recognize spam.
The bayesian filter is only part of the weighted score a spam sees. Do you have long reports enabled? If not, turn those on and see if the probability that a message is spam according to the bayes DB goes up. You may also look at the spam score in the headers. If you're getting a lot of spam that's scored 4.9, you might move your threshold down to 4 instead of leaving it at 5...
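One way to see where the missed spam clusters relative to the threshold is to pull the score out of the X-Spam-Status header of each message in a folder. What follows is a minimal sketch, assuming Maildir-style folders with one message per file and a header that carries either hits= or score= (which one appears depends on the SpamAssassin version). spam_scores is a hypothetical helper name, not a SpamAssassin tool.

```shell
# spam_scores DIR: print "<score> <file>" for every message in DIR that
# carries an X-Spam-Status header, sorted from lowest to highest score.
# Assumption: one message per file, as in a Maildir "cur" directory.
spam_scores() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        score=$(awk '
            /^X-Spam-Status:/ {
                # Accept both spellings of the score field.
                if (match($0, /(hits|score)=-?[0-9.]+/)) {
                    s = substr($0, RSTART, RLENGTH)
                    sub(/^[a-z]+=/, "", s)
                    print s
                }
                exit
            }
            /^$/ { exit }   # blank line: end of headers, stop reading
        ' "$f")
        if [ -n "$score" ]; then
            printf '%s %s\n' "$score" "$f"
        fi
    done | LC_ALL=C sort -n
}
```

Running something like spam_scores /home/rikjoh/Mail/missed_spam/cur would then show at a glance whether most of the misses sit just under the threshold or far below it.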
Note that the Bayes DB needs to learn from spam *and* ham to work well. If you haven't trained it with roughly equal amounts of ham and spam, it's not going to work well. Also, if it hasn't seen on the order of a few thousand messages of each kind, it's not going to be working to its full potential. It takes time and lots of experience for it to learn, much like most things. :)
I know that doesn't directly answer your question, but maybe it helps nonetheless. If sa-learn says it processed all of those messages and doesn't throw an error, then it worked. It will alert you if it doesn't work.
--Danny
How do I enable "long reports", and where can I read those reports?
In /etc/mail/spamassassin/local.cf, set report_safe to 1 or 2 (1 is probably more reasonable), and add a line like

    add_header all Report _REPORT_

to that file. Then you'll get an additional header in every message, spam or non-spam, reporting on all of the tests applied to it, whether it's marked as spam or not.
The missed spams score between 1.5 and almost 5 (my threshold is set to 5). I keep teaching SA about once a week.
perldoc Mail::SpamAssassin::Conf is good reading. You have to have at least 200 ham and 200 spam messages in bayes before it's even used. You may want to make sure you're at that point... :)
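To check whether the database has reached that point, and whether a training run actually added anything, the counters printed by sa-learn --dump magic can be compared before and after running sa-learn. The sketch below assumes a dump format in which the value sits in the third column and the line ends in nspam or nham. That layout may differ between SpamAssassin versions, so look at the raw output first. bayes_counts and bayes_ready are hypothetical helper names.

```shell
# bayes_counts: read `sa-learn --dump magic` output on stdin and print
# the number of spam and ham messages the Bayes DB has learned from.
# Assumption: the count is in column 3 and the line ends in nspam/nham.
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}

# bayes_ready: succeed only if both counters have reached the
# 200-message minimum mentioned above.
bayes_ready() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END { exit !(spam >= 200 && ham >= 200) }
    '
}

# Usage (needs SpamAssassin installed):
#   sa-learn --dump magic | bayes_counts
#   sa-learn --dump magic | bayes_ready && echo "bayes is in use"
```

If nspam goes up after a sa-learn --spam run, the messages were learned. If it stays the same, they had most likely been learned already.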
I move all missed spam manually to a specific mailfolder and run sa-learn manually:
#> sa-learn --spam /home/rikjoh/Mail/missed_spam/cur/
Learned from 710 message(s) (2148 message(s) examined).
I also run "sa-learn --ham" on a couple of folders, which brings me to a script question: how can I make sa-learn scan all "ham" folders automatically? There are 103 of them scattered all under my ~/Mail folder (e.g. Mail/.Computer related.directory/QNX/cur, Mail/.Computer related.directory/.Linux.directory/SuSE/cur, etc.).
Just list them all. IIRC, sa-learn will accept multiple directories as arguments. Just do

    sa-learn --ham \
        /path/to/dir1/cur \
        /path/to/dir2/cur \
        /path/to/dir3/cur

Or, if they happen to have a common structure, you can just do something like:

    sa-learn --ham `find /path/to/hamfolders -type d -name cur`

(Note that the backticks split the find output on whitespace, so this form only works if none of the folder names contain spaces.)

Probably easier, though, would be to copy your good messages to another folder, run sa-learn --ham weekly or so, and then clear that folder out. It'll automatically learn from messages that score low enough and high enough, so there's little reason to train the bayesian filter a lot unless you get lots of false positives or false negatives.

--Danny
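For folder names that contain spaces, such as the "Computer related.directory" paths Rikard lists, a variant that lets find hand the directories to sa-learn directly avoids the word-splitting problem. This is a minimal sketch under a few assumptions: every directory named cur under the root holds ham except the one folder being pruned (missed_spam by default), and sa-learn accepts several directories in one call, as suggested above. learn_ham_folders and the SA_LEARN override are hypothetical names introduced here for illustration.

```shell
# learn_ham_folders ROOT [SKIP]: run `sa-learn --ham` on every directory
# named "cur" below ROOT, without descending into any directory named
# SKIP (default: missed_spam), so the spam folder is not learned as ham.
# SA_LEARN is a hypothetical override for dry runs,
# e.g. SA_LEARN=echo to print the command instead of running it.
learn_ham_folders() {
    root=$1
    skip=${2:-missed_spam}
    # -exec ... {} + passes each path as a single argument, so spaces in
    # folder names are preserved.
    find "$root" -type d -name "$skip" -prune -o \
        -type d -name cur -exec "${SA_LEARN:-sa-learn}" --ham {} +
}

# Usage (needs SpamAssassin installed):
#   learn_ham_folders "$HOME/Mail"
```

Pruning by folder name is a deliberately simple choice. If spam ends up in more than one folder, keeping ham and spam under separate top-level directories is probably less error-prone than extending the exclusion.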