Re: [SLE] maildir and spam

20 Feb 2004


On Thursday 19 February 2004 16:45, Carlos E. R. wrote:
...
|The Bayesian database support in Spamassassin tries to identify spam by
|looking at what are called tokens; short phrases that are commonly found
|in spam or ham. If I've handed 100 messages to sa-learn that have the
|phrase penis enlargement and told it that those are all spam, when the
|101st message comes in with the phrase penis enlargement, the Bayesian
|code is pretty sure that the new message is spam and raises the spam
|score of that message.
So it is looking at phrases, not words.
And perhaps that's why it DOES do a good job with the
random-word messages.  They contain virtually none of the usual
"if, and, or, the, a, it, is, am, are", etc.  Therefore it's pretty
plain that these random-word messages are spam.
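
For what it's worth, the scoring idea quoted above can be sketched as a toy
in Python. This is only an illustration under my own assumptions, not how
SpamAssassin or sa-learn is actually implemented: the class name ToyBayes,
the single-word tokenizer, and the add-one smoothing are all made up for
the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: tokens are lowercase single words, counted once per message.
    # A real filter uses a richer tokenizer than this.
    return set(re.findall(r"[a-z']+", text.lower()))


class ToyBayes:
    """Hypothetical toy classifier, not SpamAssassin's code."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages seen with it
        self.ham_counts = Counter()   # token -> number of ham messages seen with it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Rough analogue of handing one message to a learner as spam or ham.
        if is_spam:
            self.spam_counts.update(tokenize(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokenize(text))
            self.n_ham += 1

    def spam_probability(self, text):
        # Sum per-token log-likelihood ratios, using add-one smoothing so
        # that unseen tokens neither crash the math nor decide the result.
        log_odds = 0.0
        for tok in tokenize(text):
            p_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep exp() well inside floating-point range.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = ToyBayes()
    for _ in range(100):
        f.learn("cheap pills special offer", is_spam=True)
        f.learn("if the meeting is at noon it is ok", is_spam=False)
    print(f.spam_probability("special offer on pills"))
    print(f.spam_probability("is the meeting at noon"))
```

Note that this toy only scores the tokens a message contains. It does not
reward or punish the absence of common words, so it would not by itself
catch the random-word messages; that part would need something more.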

-- 
_____________________________________
John Andersen