On Friday 2006-02-03 at 12:33 +0100, Per Jessen wrote:
Carlos E. R. wrote:
(I forgot to say that many of those false positives are from newsletters).
Same here. I'm in the process of building bayes-style filters that are meant for recognising just newsletters. That way I'll be able to add perhaps a couple of points, stopping a newsletter from ending up as a false positive.
Ah... I simply use "whitelist_from" in the file .spamassassin/user_prefs. It is faster to use, for a limited number of senders. The snag is that I could get faked newsletters instead.
DNS_FROM_RFC_WHOIS 0 0.879 0 1.447 Envelope sender in whois.rfc-ignorant.org
I don't use rfc-ignorant other than as an indicator of a possibly dodgy server. Given the number of poorly configured mail servers, using rfc-ignorant is a very aggressive step, IMHO.
And my HO too. Unfortunately, they are active by default in the spamassassin configuration that SuSE (and we users) use.

Also, I think that quite a few of those tests are redundant: if one RBL says that an IP or a domain is bad, some others will say the same. But that, I think, doesn't necessarily mean that the email's spamminess is higher. Those scores should not be arithmetically added; some other type of algorithm should be used. I don't know which, but something like:

  only A says it's bad   --> X points
  only B says it's bad   --> Y points
  A and B say it's bad   --> W points

where W should perhaps be the average or the maximum of (X, Y), but not the sum.

An IP being listed on a dozen blacklists could mean that they all think the same, or that they all copy data. I rather think it means that the IP or domain is very probably bad, but it doesn't mean that the probability of being spam is 500%. IMO, of course :-)
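To make the "maximum or average, not the sum" idea concrete, here is a minimal sketch in Python. It is only an illustration: the list names and point values are invented, and this is not how spamassassin combines rule scores (it adds them up).

```python
from statistics import mean

# Hypothetical per-list points; these names and values are invented for
# illustration and are not real SpamAssassin rules or scores.
RBL_POINTS = {"list_a": 1.5, "list_b": 2.0, "list_c": 0.9}


def combine_sum(hits):
    """Plain arithmetic addition: every listing adds its full points."""
    return sum(RBL_POINTS[name] for name in hits)


def combine_max(hits):
    """Treat overlapping listings as one piece of evidence: keep the strongest."""
    return max((RBL_POINTS[name] for name in hits), default=0.0)


def combine_mean(hits):
    """Alternative: average the points of the lists that fired."""
    points = [RBL_POINTS[name] for name in hits]
    return mean(points) if points else 0.0


hits = ["list_a", "list_b"]   # both A and B say the sender is bad
print(combine_sum(hits))      # 3.5  (X + Y)
print(combine_max(hits))      # 2.0  (W = max(X, Y))
print(combine_mean(hits))     # 1.75 (W = average of X and Y)
```

With the maximum or the average, a sender listed on a dozen lists that copy from one another never scores higher than the strongest single listing, whereas the sum keeps growing with every extra list.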
Even lower. SuSE must be using heavily altered values, and a badly trained Bayesian database: mine scores that same email at 5%, not 95%.
Bayes is a double-edged sword - you've got to be very particular about what you record as spam/ham. Especially if you're not just training your bayes filters for purely personal use. And you've got to be careful with cleaning up the database too.
Very true. I doubt the usefulness of site-wide Bayesian databases. Also, I disabled autolearn for the same reason.

-- 
Cheers,
Carlos Robinson