On Thursday 19 February 2004 16:45, Carlos E. R. wrote:
|The Bayesian database support in Spamassassin tries to identify spam by
|looking at what are called tokens; short phrases that are commonly found
|in spam or ham. If I've handed 100 messages to sa-learn that have the
|phrase penis enlargement and told it that those are all spam, when the
|101st message comes in with the phrase penis enlargement, the Bayesian
|code is pretty sure that the new message is spam and raises the spam
|score of that message.
So it is looking at phrases, not words.
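To make the quoted description concrete, here is a rough Python sketch of the idea. It is only a toy, not SpamAssassin's actual code: the names `tokens` and `TinyBayes` are made up for illustration, treating every two-word phrase as a token is an assumption based on the description above, and the smoothing and the way token scores are combined are generic naive-Bayes choices rather than whatever sa-learn really does.

```python
import math
from collections import Counter


def tokens(text):
    """Split text into lowercase two-word phrases (illustrative tokenizer only)."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


class TinyBayes:
    """Toy token-based spam scorer (hypothetical, not SpamAssassin's design)."""

    def __init__(self):
        # Number of learned messages of each kind that contain a given token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one message as 'spam' or 'ham', like handing it to a trainer."""
        self.messages[label] += 1
        self.counts[label].update(set(tokens(text)))

    def token_prob(self, token):
        """Estimate P(spam | token), with add-one smoothing for unseen tokens."""
        spam_rate = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        ham_rate = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text):
        """Combine per-token estimates by summing their log-odds."""
        log_odds = 0.0
        for token in set(tokens(text)):
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


# Train on 100 spam and 100 ham messages, then score a new one.
clf = TinyBayes()
for _ in range(100):
    clf.learn("cheap penis enlargement pills", "spam")
    clf.learn("meeting notes attached for review", "ham")

print(clf.spam_probability("penis enlargement offer"))   # close to 1
print(clf.spam_probability("meeting notes for monday"))  # close to 0
```

A phrase the scorer has never seen comes out at 0.5 and so contributes nothing either way; only phrases it was trained on move the score.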
And perhaps that's why it DOES do a good job with the random-word messages. They contain virtually no "if and or the a it is am are" etc. Therefore it's pretty plain that these random-word messages are spam.

-- 
_____________________________________
John Andersen