-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The Wednesday 2006-02-01 at 10:50 +0100, Per Jessen wrote:
Carlos E. R. wrote:
SA should be perfectly capable of picking a lot of spam without using bayes.
It should... but, for example in my case, it doesn't. There are many spam emails I get that are not tagged by any rule except bayes: I had to increase the scoring so that a 99% mark by the bayes filter gives 5 points. On the other hand, I'm getting more false positives than a few months back (and not from the bayes filter); some of those emails are also examined by a commercial filter on the mail server, and that one gives a correct score.
(I forgot to say that many of those false positives are from newsletters).
Interesting. I'm using spamassassin (amongst others) to provide a service for our customers, and I'm not currently using bayes. The hit-rate for SA is still very good - in the 95-98% range.
Looks good... it must depend on the kind of spam you receive, I suppose. Also, I suppose you must be using the network tests: it's true that they flag a lot of spam, but sometimes they are unfair.
I don't like how SA is scoring spam recently...
I have changed some of the scores, but usually to a lower level. And I'm still using 2.64, not the 3.x series.
I have the feeling that it worked better, at least the scoring. I'm using spamassassin-3.1.0-0.1 myself.

Look, you can see some of the false positives in this very same mailing list. One chap got his (mistaken) unsubscription mail tagged as spam by the SuSE list server:

  X-Virus-Scanned: by amavisd-new at Relay1.suse.de
  X-Spam-Status: Yes, hits=8.1 tagged_above=-20.0 required=5.0
        tests=BAYES_95, DNS_FROM_RFC_POST, DNS_FROM_RFC_WHOIS,
        HTML_90_100, HTML_MESSAGE, MISSING_SUBJECT
  X-Spam-Level: ********

The scoring would be something like (in my SA 3.1):

  BAYES_95            0.0001  0.0001  3.0    3.0
  DNS_FROM_RFC_POST   0       1.440   0      1.708  Envelope sender in postmaster.rfc-ignorant.org
  DNS_FROM_RFC_WHOIS  0       0.879   0      1.447  Envelope sender in whois.rfc-ignorant.org
  HTML_90_100         0.584   0       0.567  0.113  Message is 90% to 100% HTML
  HTML_MESSAGE        0.001                         HTML included in message
  MISSING_SUBJECT     1.729   1.345   2.035  1.816  Missing Subject: header
  total                                      ^^^^^ = 8.085

That does add up to those 8.1 points, taking the last column (bayes plus network tests). Let's see with a 3.0.3 score set:

  BAYES_95            0       0       3.514  3.0
  DNS_FROM_RFC_POST   0       1.376   0      1.614
  DNS_FROM_RFC_WHOIS  0       0.492   0      0.296
  HTML_90_100         0.346   0.189   0.043  0.022
  HTML_MESSAGE        0.001
  MISSING_SUBJECT     1.109   1.570   1.282  1.226

Lower: 6.159. So SuSE seems to be running the stock 3.1 scores, but with a badly trained Bayesian database: mine scores that same email at 5%, not 95%.

And version 2.63? BAYES_95 did not exist, value not considered.

  BAYES_90            0       0       2.454  2.101
  BAYES_99            0       0       5.400  5.400
  HTML_90_100         0.308   1.073   0      1.187
  HTML_MESSAGE        0.160   0.001   0.100  0.100
  DNS_FROM_RFC_POST   --
  DNS_FROM_RFC_WHOIS  --
  MISSING_SUBJECT     --

Ha! Version 2.63 did not have many of those tests. It would have got 3.39, i.e., not spam.

However... it proves my point that the postmaster (ISP) being ignorant of the RFC doesn't prove that their users send spam, and the goal of spam filters is to remove spam, and only spam.
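The column sums for these score tables can be double-checked with a short script. This is only a sketch under a few assumptions: the score tuples are copied from the tables above, the fourth column (Bayes plus network tests) is taken to be the one in effect on the list server, and for 2.63 BAYES_90 is assumed to stand in for the missing BAYES_95. The names `SCORES_*`, `parse_status` and `total` are made up for this example and are not part of SpamAssassin.

```python
import re

# One score per SpamAssassin score set, in table order:
# (no Bayes/no net, net only, Bayes only, Bayes + net).
# A rule listed with a single value uses it for all four sets.
SCORES_3_1 = {
    "BAYES_95": (0.0001, 0.0001, 3.0, 3.0),
    "DNS_FROM_RFC_POST": (0.0, 1.440, 0.0, 1.708),
    "DNS_FROM_RFC_WHOIS": (0.0, 0.879, 0.0, 1.447),
    "HTML_90_100": (0.584, 0.0, 0.567, 0.113),
    "HTML_MESSAGE": (0.001,) * 4,
    "MISSING_SUBJECT": (1.729, 1.345, 2.035, 1.816),
}
SCORES_3_0_3 = {
    "BAYES_95": (0.0, 0.0, 3.514, 3.0),
    "DNS_FROM_RFC_POST": (0.0, 1.376, 0.0, 1.614),
    "DNS_FROM_RFC_WHOIS": (0.0, 0.492, 0.0, 0.296),
    "HTML_90_100": (0.346, 0.189, 0.043, 0.022),
    "HTML_MESSAGE": (0.001,) * 4,
    "MISSING_SUBJECT": (1.109, 1.570, 1.282, 1.226),
}
# 2.63 lacks BAYES_95 and the rfc-ignorant / missing-subject rules.
SCORES_2_63 = {
    "BAYES_90": (0.0, 0.0, 2.454, 2.101),
    "HTML_90_100": (0.308, 1.073, 0.0, 1.187),
    "HTML_MESSAGE": (0.160, 0.001, 0.100, 0.100),
}

STATUS = (
    "Yes, hits=8.1 tagged_above=-20.0 required=5.0 "
    "tests=BAYES_95, DNS_FROM_RFC_POST, DNS_FROM_RFC_WHOIS, "
    "HTML_90_100, HTML_MESSAGE, MISSING_SUBJECT"
)


def parse_status(value):
    """Pull the reported score, the threshold and the rule names out of the header."""
    hits = float(re.search(r"hits=(-?[\d.]+)", value).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", value).group(1))
    tests = [t.strip() for t in value.split("tests=", 1)[1].split(",")]
    return hits, required, tests


def total(scores, tests, score_set=3):
    """Sum one column of a score table; rules missing from the table count as 0."""
    return round(sum(scores[t][score_set] for t in tests if t in scores), 3)


hits, required, tests = parse_status(STATUS)
no_rfc = [t for t in tests if not t.startswith("DNS_FROM_RFC_")]
tests_2_63 = ["BAYES_90" if t == "BAYES_95" else t for t in tests]

print("reported:            ", hits, "required:", required)
print("3.1 total:           ", total(SCORES_3_1, tests))
print("3.1 without RFC rules:", total(SCORES_3_1, no_rfc))
print("3.0.3 total:         ", total(SCORES_3_0_3, tests))
print("2.63 total:          ", total(SCORES_2_63, tests_2_63))
```

With the values as listed, the 3.1 column sums to 8.085, which rounds to the 8.1 in the header; 3.0.3 gives 6.159 and 2.63 gives 3.388. Dropping the two rfc-ignorant rules from the 3.1 sum leaves 4.93, just under the required 5.0.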
IMO, of course ;-)

- -- 
Cheers,
       Carlos Robinson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFD4NCXtTMYHG2NR9URAghyAJkBOX6FedaiToJzkadBpsJUSVl++ACfYmbC
vllFbKC2KfN0FCvVtVaxvOw=
=xpEB
-----END PGP SIGNATURE-----