Mailinglist Archive: opensuse (3767 mails)

Re: [SLE] Spam
  • From: "Carlos E. R." <robin1.listas@xxxxxxxxxx>
  • Date: Wed, 1 Feb 2006 16:15:24 +0100 (CET)
  • Message-id: <Pine.LNX.4.61.0602011349180.28255@xxxxxxxxxxxxxxxx>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Wednesday 2006-02-01 at 10:50 +0100, Per Jessen wrote:

> Carlos E. R. wrote:
>
> >> SA should be perfectly capable of
> >> picking a lot of spam without using bayes.
> >
> > It should... but, for example in my case, it doesn't. There are many
> > spam emails I get that are not tagged by any other rule except bayes:
> > I had to increase the scoring so that a 99% mark by the bayes filter
> > gives 5 points. On the other hand, I'm getting more false positives
> > than a few months back (and not by the bayes filter); some of those
> > emails are also examined by a commercial filter of the mail server,
> > and that one gives a correct score.

(I forgot to say that many of those false positives are from newsletters).

>
> Interesting. I'm using spamassassin (amongst others) to provide a
> service for our customers, and I'm not currently using bayes. The
> hit-rate for SA is still very good - in the 95-98% range.

Looks good... it must depend on the kind of spam you receive, I suppose.
Also, I suppose you must be using the network tests: it's true that they
flag a lot of spam, but sometimes they are unfair.


> > I don't like how SA is scoring spam recently...
>
> I have changed some of the scores, but usually to a lower level. And
> I'm still using 2.64, not the 3.x series.

I have the feeling that that version worked better, at least the scoring. I'm using
spamassassin-3.1.0-0.1 myself.


Look, you can see some of the false positives on this very mailing list:
One chap got his (mistaken) unsubscription mail tagged as spam by the SuSE
list server:

X-Virus-Scanned: by amavisd-new at Relay1.suse.de
X-Spam-Status: Yes, hits=8.1 tagged_above=-20.0 required=5.0
    tests=BAYES_95, DNS_FROM_RFC_POST, DNS_FROM_RFC_WHOIS, HTML_90_100,
    HTML_MESSAGE, MISSING_SUBJECT
X-Spam-Level: ********
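
For anyone who wants to pull those numbers out of such a header mechanically, here is a minimal sketch in Python. The function name parse_spam_status is made up for illustration, and the regular expressions only assume the amavisd-new style layout shown above (Yes/No, then key=value pairs, then a comma-separated tests= list); other setups may format this header differently.

```python
import re

# The X-Spam-Status value quoted above, with its folded continuation lines.
STATUS = (
    "Yes, hits=8.1 tagged_above=-20.0 required=5.0\n"
    " tests=BAYES_95,\n"
    " DNS_FROM_RFC_POST, DNS_FROM_RFC_WHOIS, HTML_90_100, HTML_MESSAGE,\n"
    " MISSING_SUBJECT"
)


def parse_spam_status(value):
    """Split a header value like STATUS into verdict, scores and test names.

    Hypothetical helper: it only understands the layout shown above.
    """
    # Everything before the first comma is the Yes/No verdict.
    verdict = value.split(",", 1)[0].strip()
    # Numeric key=value pairs: hits, tagged_above, required.
    numbers = dict(re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)", value))
    # The test names follow "tests=" and may span several folded lines.
    match = re.search(r"tests=(.*)", value, re.DOTALL)
    tests = []
    if match:
        tests = [t.strip() for t in match.group(1).split(",") if t.strip()]
    return {
        "is_spam": verdict.lower() == "yes",
        "hits": float(numbers["hits"]),
        "required": float(numbers["required"]),
        "tests": tests,
    }


parsed = parse_spam_status(STATUS)
print(parsed["hits"], parsed["required"], parsed["tests"])
```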


The scoring would be something like (in my SA 3.1):

BAYES_95            0.0001  0.0001  3.0    3.0
DNS_FROM_RFC_POST   0       1.440   0      1.708  Envelope sender in postmaster.rfc-ignorant.org
DNS_FROM_RFC_WHOIS  0       0.879   0      1.447  Envelope sender in whois.rfc-ignorant.org
HTML_90_100         0.584   0       0.567  0.113  Message is 90% to 100% HTML
HTML_MESSAGE        0.001                         HTML included in message
MISSING_SUBJECT     1.729   1.345   2.035  1.816  Missing Subject: header

total                                      ^^^^^ = 5.085

It doesn't add up to those 8.1 points :-? Let's see with a 3.0.3 score set:


BAYES_95            0      0      3.514  3.0
DNS_FROM_RFC_POST   0      1.376  0      1.614
DNS_FROM_RFC_WHOIS  0      0.492  0      0.296
HTML_90_100         0.346  0.189  0.043  0.022
HTML_MESSAGE        0.001
MISSING_SUBJECT     1.109  1.570  1.282  1.226


Even lower. SuSE must be using very altered values. And a badly trained
Bayesian database: mine scores that same email at 5%, not 95%.
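
As a cross-check on the sums above, here is a small Python sketch that adds up each of the four columns. It assumes the usual SpamAssassin convention for score lines (first column: Bayes and network tests both off; second: network tests only; third: Bayes only; fourth: both on; a single value applies to every set). The dictionary and function names are invented for the example.

```python
from decimal import Decimal

# Scores copied from the two tables above, one string per rule.
SCORES_3_1 = {
    "BAYES_95": "0.0001 0.0001 3.0 3.0",
    "DNS_FROM_RFC_POST": "0 1.440 0 1.708",
    "DNS_FROM_RFC_WHOIS": "0 0.879 0 1.447",
    "HTML_90_100": "0.584 0 0.567 0.113",
    "HTML_MESSAGE": "0.001",
    "MISSING_SUBJECT": "1.729 1.345 2.035 1.816",
}
SCORES_3_0_3 = {
    "BAYES_95": "0 0 3.514 3.0",
    "DNS_FROM_RFC_POST": "0 1.376 0 1.614",
    "DNS_FROM_RFC_WHOIS": "0 0.492 0 0.296",
    "HTML_90_100": "0.346 0.189 0.043 0.022",
    "HTML_MESSAGE": "0.001",
    "MISSING_SUBJECT": "1.109 1.570 1.282 1.226",
}


def totals(table, skip=()):
    """Return the four per-score-set sums, ignoring any rule named in skip."""
    sums = [Decimal("0")] * 4
    for rule, values in table.items():
        if rule in skip:
            continue
        cols = [Decimal(v) for v in values.split()]
        if len(cols) == 1:
            # Assumption: a single value is used for all four score sets.
            cols = cols * 4
        sums = [s + c for s, c in zip(sums, cols)]
    return sums


print("3.1   :", totals(SCORES_3_1))
print("3.0.3 :", totals(SCORES_3_0_3))
print("3.1 without BAYES_95:", totals(SCORES_3_1, skip=("BAYES_95",)))
```

With these figures the fourth column of the 3.1 set comes to 8.085 when BAYES_95 is included, which is close to the 8.1 in the header; 5.085 is the same column with the BAYES_95 row left out. The fourth column of the 3.0.3 set gives 6.159.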


And version 2.63?

BAYES_95            did not exist, value not considered.
BAYES_90            0      0      2.454  2.101
BAYES_99            0      0      5.400  5.400

HTML_90_100         0.308  1.073  0      1.187
HTML_MESSAGE        0.160  0.001  0.100  0.100
DNS_FROM_RFC_POST   --
DNS_FROM_RFC_WHOIS  --
MISSING_SUBJECT     --

Ha! Version 2.63 did not have many of those tests. It would have scored 3.39,
i.e., not spam.
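
The 2.63 figure can be checked against the threshold in a few lines of Python. This is only a sketch: it assumes the fourth column applies, that BAYES_90 stands in for the missing BAYES_95 (which appears to be how 3.39 was reached), and that a message counts as spam once the total reaches the required=5.0 from the header. The names are hypothetical.

```python
from decimal import Decimal

REQUIRED = Decimal("5.0")  # required=5.0 from the X-Spam-Status header

# Fourth-column 2.63 scores for the tests that existed in that version.
HITS_2_63 = {
    "BAYES_90": Decimal("2.101"),  # assumption: stands in for BAYES_95
    "HTML_90_100": Decimal("1.187"),
    "HTML_MESSAGE": Decimal("0.100"),
}


def verdict(hits, required=REQUIRED):
    """Return (total score, True if the total reaches the threshold)."""
    total = sum(hits.values(), Decimal("0"))
    return total, total >= required


total, is_spam = verdict(HITS_2_63)
print(total, "spam" if is_spam else "not spam")  # prints "3.388 not spam"
```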


However... it proves my point that the postmaster (ISP) being ignorant of
the RFC doesn't prove that their users send spam, and the goal of spam
filters is to remove spam, and only spam. IMO, of course ;-)

- --
Cheers,
Carlos Robinson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFD4NCXtTMYHG2NR9URAghyAJkBOX6FedaiToJzkadBpsJUSVl++ACfYmbC
vllFbKC2KfN0FCvVtVaxvOw=
=xpEB
-----END PGP SIGNATURE-----

