Mailinglist Archive: opensuse (3767 mails)

Re: [SLE] Spam
  • From: "Carlos E. R." <robin1.listas@xxxxxxxxxx>
  • Date: Fri, 3 Feb 2006 16:12:29 +0100 (CET)
  • Message-id: <Pine.LNX.4.61.0602031553560.3684@xxxxxxxxxxxxxxxx>

On Friday 2006-02-03 at 12:33 +0100, Per Jessen wrote:

> Carlos E. R. wrote:
> > (I forgot to say that many of those false positives are from
> > newsletters).
> Same here. I'm in the process of building bayes-style filters that are
> meant for recognising just newsletters. That way I'll be able to add
> perhaps a couple of points, stopping a newsletter from ending up as a
> false positive.

Ah... I simply use "whitelist_from" in the file .spamassassin/user_prefs.
It is faster to use, for a limited number of senders. The snag is that I
could get faked newsletters instead.

> > DNS_FROM_RFC_WHOIS 0 0.879 0 1.447 Envelope sender in
> >
> I don't use rfc-ignorant other than as an indicator of a possibly dodgy
> server. Given the number of poorly configured mail-servers, using
> rfc-ignorant is a very aggressive step, IMHO.

And in my HO too. Unfortunately, those tests are active by default in the
SpamAssassin configuration that SuSE (and we users) use.

Also, I think that quite a few of those tests are redundant: if one RBL
says that an IP or a domain is bad, some others will say the same. But
that, I think, doesn't necessarily mean that the email's spamminess is
higher. Those scores should not be arithmetically added; some other type
of algorithm should be used. I don't know which, but something like:

only A says it's bad --> X points
only B says it's bad --> Y points
A and B say it's bad --> W points.

where W should be perhaps the average or the maximum of (X, Y), but not
the sum.
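
To make the idea concrete, here is a minimal sketch in Python. It is not how SpamAssassin actually combines scores; the list names "A" and "B", the point values, and the function combined_rbl_score are all made up for illustration. It only shows the "maximum or average instead of sum" rule described above.

```python
from typing import Dict, Iterable

# Hypothetical scores: X points for list A, Y points for list B.
# These names and values are illustrative assumptions only.
RBL_SCORES: Dict[str, float] = {"A": 2.0, "B": 1.5}


def combined_rbl_score(hits: Iterable[str],
                       scores: Dict[str, float],
                       mode: str = "max") -> float:
    """Combine the scores of the blacklists that flagged a message.

    mode="max"  -> W is the largest single score (no stacking)
    mode="mean" -> W is the average of the scores that hit
    mode="sum"  -> plain arithmetic addition, shown for comparison
    """
    hit_scores = [scores[name] for name in hits if name in scores]
    if not hit_scores:
        return 0.0  # no list flagged the message
    if mode == "max":
        return max(hit_scores)
    if mode == "mean":
        return sum(hit_scores) / len(hit_scores)
    if mode == "sum":
        return sum(hit_scores)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for hits in (["A"], ["B"], ["A", "B"]):
        print(hits,
              combined_rbl_score(hits, RBL_SCORES, "max"),
              combined_rbl_score(hits, RBL_SCORES, "mean"),
              combined_rbl_score(hits, RBL_SCORES, "sum"))
```

With "max" or "mean", a second listing never pushes the total above what the strongest single list would have given, which is the behaviour argued for above.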

An IP being listed on a dozen blacklists could mean that they all reached
the same conclusion, or that they all copy each other's data. I rather
think it means that the IP or domain is very probably bad, but it doesn't
mean that the probability of the mail being spam is 500%.

IMO, of course :-)

> > Even lower. SuSE must be using very altered values. And a badly
> > trained Bayesian database: mine scores that same email at 5%, not 95%.
> Bayes is a double-edged sword - you've got to be very particular about
> what you record as spam/ham. Especially if you're not just training
> your bayes filters for purely personal use. And you've got to be
> careful with cleaning up the database too.

Very true.

I doubt the usefulness of site-wide Bayesian databases. Also, I
disabled autolearn for the same reason.

--
Carlos Robinson

