I'm getting hammered by a lot of emails which have different subject lines and different senders. KMail displays them in plain text; it does not say that this is an HTML message which can be viewed if you click here, but an HTML view is available if I scroll down to the description box below the email.

ClamAV does not pick up any virus. SpamAssassin and bogofilter, although set to learn, do not pick them up. (All up to date.)

The message of the email starts with something along the lines of: "ListPrice: $550.00 OurPrice: $69.95 YouSave: $480.05 ( 87%) Availability: Available for INSTANT download!" followed by several references to Microsoft-type products and no links, so I assume you contact the sender (which suggests the email address is valid and not one from a compromised machine).

I'm running SUSE 10.0 OSS on a standalone machine, using KMail (direct access via KMail to POP and SMTP from my ISP).

Do I set a filter on body or message, containing the ListPrice etc. quote from above, to mark as spam? Or could that compromise some valid emails where I have requested details on, say, holidays from somewhere else?

Peter C
Peter Collier wrote:
I'm running suse 10.0 OSS on a standalone machine. Using Kmail. (Direct access via kmail to pop and smtp from my isp). Do I set a filter on body or message, containing the listprice etc quote from above, to mark as spam? Or could that compromise some valid emails where I have requested details on say holidays etc from somewhere else?
Can't help much with Kmail I'm afraid, but I can sing the praises of Thunderbird on this one. I was getting a lot of spam messages like this, but you can easily teach Thunderbird (press the Junk button). The result is that I don't see them - except when I check my Junk folder.

Cheers, Colin
On Tuesday 31 January 2006 10:05, Peter Collier wrote:
I'm getting hammered by a lot of emails which have different subject lines and a different sender. Kmail displays it in plain text, does not say that it is a html message which can be viewed if you click here but there is html view available if scroll down to the description box, below the email.
Hi, could it be that you get both a text and an HTML version? Then the default behaviour of kmail is to display the text version. You could unset this behaviour globally (which I would not recommend, certainly not for spam) or on a folder-by-folder basis. I do the latter for a few trusted email senders after moving their mail to a folder 'HTML' with a 'From' filter. I can then read the mails directly in HTML format.
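For illustration, a message that carries both a text and an HTML version is normally a MIME multipart/alternative message. The following is a minimal Python sketch using only the standard library; the addresses and body text are made up for the example and have nothing to do with the spam in question.

```python
from email.message import EmailMessage

# Build a message with two alternative bodies, the way many mailers do.
msg = EmailMessage()
msg["From"] = "sender@example.com"   # hypothetical address
msg["To"] = "reader@example.com"     # hypothetical address
msg["Subject"] = "Example"
msg.set_content("Plain-text version of the message.")
msg.add_alternative("<p>HTML version of the message.</p>", subtype="html")

# Collect the content type of every leaf part of the message.
part_types = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]

print(msg.get_content_type())
print(part_types)
```

A mail client that prefers plain text shows the first part and only offers the second one on request, which would match the behaviour described above.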
"ListPrice: $550.00
I'm running suse 10.0 OSS on a standalone machine. Using Kmail. (Direct access via kmail to pop and smtp from my isp). Do I set a filter on
I do the same...
body or message, containing the listprice etc quote from above, to mark as spam? Or could that compromise some valid emails where I have requested details on say holidays etc from somewhere else?
Hard to say - depends on the emails you have on the system. You could set up a filter that moves the messages into a folder 'Junk' or so instead of deleting them (I do that with bogofilter-classified spam). Then you could still check whether you 'misclassified' (is that a word?) important emails. If you filter on 'body' I suspect it will slow down kmail.

Anyway, I noticed that bogofilter picks up spam mails more or less quickly after you have classified them manually as spam a few times...

gl
-- 
Günter Lichtenberg ========>mailto:lichten@sron.nl
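As a rough illustration of the body-filter idea, here is a Python sketch (not KMail's filter syntax) that flags a message only when the three price fields from the quoted spam appear together, and that chooses a 'Junk' folder instead of deleting. The function names, the folder names and the two sample messages are assumptions made for the example.

```python
import re
from email import message_from_string

# Require the three price fields together and in order, as in the quoted spam.
SPAM_PATTERN = re.compile(
    r"ListPrice:\s*\$[\d.,]+\s+OurPrice:\s*\$[\d.,]+\s+YouSave:\s*\$[\d.,]+",
    re.IGNORECASE,
)

def plain_text(raw):
    """Concatenate the text/plain parts of a raw message given as a string."""
    msg = message_from_string(raw)
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)

def target_folder(raw):
    """Return 'Junk' for a match and 'inbox' otherwise; nothing is deleted."""
    return "Junk" if SPAM_PATTERN.search(plain_text(raw)) else "inbox"

# Hypothetical sample messages: one like the spam, one legitimate enquiry reply.
spam = "Subject: offer\n\nListPrice: $550.00 OurPrice: $69.95 YouSave: $480.05 ( 87%)\n"
holiday = "Subject: your enquiry\n\nThe list price for the week in Crete is $550.00.\n"

print(target_folder(spam), target_folder(holiday))
```

Matching the whole three-field pattern rather than a single word such as "price" is what keeps an ordinary holiday quote out of the Junk folder in this sketch.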
On Tuesday 31 January 2006 12:52, Guenter Lichtenberg wrote:
On Tuesday 31 January 2006 10:05, Peter Collier wrote:
I'm getting hammered by a lot of emails which have different subject lines and a different sender. Kmail displays it in plain text, does not say that it is a html message which can be viewed if you click here but there is html view available if scroll down to the description box, below the email.
Hi could it be that you get a text and a html version? Then the default behaviour of kmail is to display the text message. You could unset this behaviour globally (which I would not recommend, certainly not for spam) or on a folder-by-folder base. I do the latter for certain few trusted email senders after moving them to a folder 'HTML' with a 'From' filter. I can then read the mails directly in HTML format.
"ListPrice: $550.00
I'm running suse 10.0 OSS on a standalone machine. Using Kmail. (Direct access via kmail to pop and smtp from my isp). Do I set a filter on
I do the same...
body or message, containing the listprice etc quote from above, to mark as spam? Or could that compromise some valid emails where I have requested details on say holidays etc from somewhere else?
Hard to say - depends on the emails you have on the system. You could do a filtering and move the messages into a folder 'Junk' or so instead of deleting them (I do that with bogofilter classified spam). Then you could still check if you 'misclassified' (is that a word?) important emails. If you do a filtering on 'body' I suspect it will slow down kmail.
Anyway I noticed that bogofilter picks up spam mails more or less quickly after you have them classified manually as spam a few times...
gl
OK, thanks for the tips everyone. I'll keep classifying as spam for now and see how it picks them up. I do have kmail set up to move spam as unread, to a spam folder, where I can check to make sure I don't want them. I've done that because once in a blue moon, an odd spam email has some interest. Peter C
The Tuesday 2006-01-31 at 09:05 -0000, Peter Collier wrote:
Clamav does not pick any virus. Spamassassin and bogofilter although set to learn, do not pick them up. (All uptodate).
SA needs some hundreds of spam emails in order to learn effectively. Perhaps 300..500. In my system it is very effective, but I don't use kmail.

-- 
Cheers, Carlos Robinson
Carlos E. R. wrote:
SA needs some hundreds of spam emails in order to learn effectively. Perhaps 300..500.
That's only if you use the bayes part. SA should be perfectly capable of picking a lot of spam without using bayes. /Per Jessen, Zürich -- http://www.spamchek.com/ - managed anti-spam and anti-virus solution. Let us analyse your spam- and virus-threat - up to 2 months for free.
The Tuesday 2006-01-31 at 16:09 +0100, Per Jessen wrote:
SA needs some hundreds of spam emails in order to learn effectively. Perhaps 300..500.
That's only if you use the bayes part.
Of course. But the OP mentioned "learn", and that means bayes.
SA should be perfectly capable of picking a lot of spam without using bayes.
It should... but, for example in my case, it doesn't. There are many spam emails I get that are not tagged by any other rule except bayes: I had to increase the scoring so that a 99% mark by the bayes filter gives 5 points.

On the other hand, I'm getting more false positives than a few months back (and not by the bayes filter); some of those emails are also examined by a commercial filter of the mail server, and that one gives a correct score.

I don't like how SA is scoring spam recently...

-- 
Cheers, Carlos Robinson
Carlos E. R. wrote:
SA should be perfectly capable of picking a lot of spam without using bayes.
It should... but, for example in my case, it doesn't. There are many spam emails I get that are not tagged by any other rule except bayes: I had to increase the scoring so that a 99% mark by the bayes filter gives 5 points. On the other hand, I'm getting more false positives than a few months back (and not by the bayes filter); some of those emails are also examined by a commercial filter of the mail server, and that one gives a correct score.
Interesting. I'm using spamassassin (amongst others) to provide a service for our customers, and I'm not currently using bayes. The hit-rate for SA is still very good - in the 95-98% range.
I don't like how SA is scoring spam recently...
I have changed some of the scores, but usually to a lower level. And I'm still using 2.64, not the 3.x series. /Per Jessen, Zürich
The Wednesday 2006-02-01 at 10:50 +0100, Per Jessen wrote:
Carlos E. R. wrote:
SA should be perfectly capable of picking a lot of spam without using bayes.
It should... but, for example in my case, it doesn't. There are many spam emails I get that are not tagged by any other rule except bayes: I had to increase the scoring so that a 99% mark by the bayes filter gives 5 points. On the other hand, I'm getting more false positives than a few months back (and not by the bayes filter); some of those emails are also examined by a commercial filter of the mail server, and that one gives a correct score.
(I forgot to say that many of those false positives are from newsletters).
Interesting. I'm using spamassassin (amongst others) to provide a service for our customers, and I'm not currently using bayes. The hit-rate for SA is still very good - in the 95-98% range.
Looks good... it must depend on the kind of spam you receive, I suppose. Also, I suppose you must be using the network tests: it's true that they flag a lot of spam, but sometimes they are unfair.
I don't like how SA is scoring spam recently...
I have changed some of the scores, but usually to a lower level. And I'm still using 2.64, not the 3.x series.
I have the feeling that it worked better, at least the scoring. I'm using spamassassin-3.1.0-0.1 myself.

Look, you can see some of the false positives in this very same mail list. One chap got his (mistaken) unsubscription mail tagged as spam by the SuSE list server:

    X-Virus-Scanned: by amavisd-new at Relay1.suse.de
    X-Spam-Status: Yes, hits=8.1 tagged_above=-20.0 required=5.0
        tests=BAYES_95, DNS_FROM_RFC_POST, DNS_FROM_RFC_WHOIS, HTML_90_100,
        HTML_MESSAGE, MISSING_SUBJECT
    X-Spam-Level: ********

The scoring would be something like (in my SA 3.1):

    BAYES_95            0.0001  0.0001  3.0     3.0
    DNS_FROM_RFC_POST   0       1.440   0       1.708   Envelope sender in postmaster.rfc-ignorant.org
    DNS_FROM_RFC_WHOIS  0       0.879   0       1.447   Envelope sender in whois.rfc-ignorant.org
    HTML_90_100         0.584   0       0.567   0.113   Message is 90% to 100% HTML
    HTML_MESSAGE        0.001                           HTML included in message
    MISSING_SUBJECT     1.729   1.345   2.035   1.816   Missing Subject: header
                                                ^^^^^
    total                                     = 5.085

It doesn't add to those 8.1 points :-? Let's see with a 3.0.3 score set:

    BAYES_95            0       0       3.514   3.0
    DNS_FROM_RFC_POST   0       1.376   0       1.614
    DNS_FROM_RFC_WHOIS  0       0.492   0       0.296
    HTML_90_100         0.346   0.189   0.043   0.022
    HTML_MESSAGE        0.001
    MISSING_SUBJECT     1.109   1.570   1.282   1.226

Even lower. SuSE must be using very altered values. And a badly trained Bayesian database: mine scores that same email at 5%, not 95%.

And version 2.63?

    BAYES_95            did not exist, value not considered
    BAYES_90            0       0       2.454   2.101
    BAYES_99            0       0       5.400   5.400
    HTML_90_100         0.308   1.073   0       1.187
    HTML_MESSAGE        0.160   0.001   0.100   0.100
    DNS_FROM_RFC_POST   --
    DNS_FROM_RFC_WHOIS  --
    MISSING_SUBJECT     --

Ha! Version 2.63 did not have many of those tests. It would have got 3.39, i.e., not spam.

However... it proves my point that the postmaster (ISP) being ignorant of the RFC doesn't prove that their users send spam, and the goal of spam filters is to remove spam, and only spam.

IMO, of course ;-)

-- 
Cheers, Carlos Robinson
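As a cross-check of the totals, the quoted score columns can be summed with a few lines of Python. This is only a sketch: it assumes the fourth column (Bayes and network tests both enabled, which is how SpamAssassin documents four-value `score` lines) is the one in effect, and it treats the single HTML_MESSAGE value as applying to every column. The dictionary and function names are made up for the example.

```python
# Columns as quoted: (no Bayes/no net, net only, Bayes only, Bayes + net).
SCORES_3_1 = {
    "BAYES_95": (0.0001, 0.0001, 3.0, 3.0),
    "DNS_FROM_RFC_POST": (0, 1.440, 0, 1.708),
    "DNS_FROM_RFC_WHOIS": (0, 0.879, 0, 1.447),
    "HTML_90_100": (0.584, 0, 0.567, 0.113),
    "HTML_MESSAGE": (0.001,) * 4,  # assumption: the single value applies to all columns
    "MISSING_SUBJECT": (1.729, 1.345, 2.035, 1.816),
}
SCORES_3_0_3 = {
    "BAYES_95": (0, 0, 3.514, 3.0),
    "DNS_FROM_RFC_POST": (0, 1.376, 0, 1.614),
    "DNS_FROM_RFC_WHOIS": (0, 0.492, 0, 0.296),
    "HTML_90_100": (0.346, 0.189, 0.043, 0.022),
    "HTML_MESSAGE": (0.001,) * 4,  # same assumption as above
    "MISSING_SUBJECT": (1.109, 1.570, 1.282, 1.226),
}

BAYES_AND_NET = 3  # index of the fourth column

def total(scores, column, skip=()):
    """Sum one score column over all tests, optionally leaving some tests out."""
    return round(sum(v[column] for k, v in scores.items() if k not in skip), 3)

print(total(SCORES_3_1, BAYES_AND_NET))                      # all six tests
print(total(SCORES_3_1, BAYES_AND_NET, skip=("BAYES_95",)))  # without BAYES_95
print(total(SCORES_3_0_3, BAYES_AND_NET))                    # 3.0.3 score set
```

Under those assumptions the six 3.1 values add up to 8.085, which rounds to the 8.1 in the header; 5.085 is the same column with BAYES_95 left out, and the 3.0.3 column comes to 6.159.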
Carlos E. R. wrote:
(I forgot to say that many of those false positives are from newsletters).
Same here. I'm in the process of building bayes-style filters that are meant for recognising just newsletters. That way I'll be able to add perhaps a couple of points, stopping a newsletter from ending up as a false positive.
Looks good... it must depend on the kind of spam you receive, I suppose. Also, I suppose you must be using the network tests: it's true that they flag a lot of spam, but sometimes they are unfair.
Yep, I'm using network tests, my own blacklists, honeypots etc.
BAYES_95 0.0001 0.0001 3.0 3.0 DNS_FROM_RFC_POST 0 1.440 0 1.708 Envelope sender in postmaster.rfc-ignorant.org DNS_FROM_RFC_WHOIS 0 0.879 0 1.447 Envelope sender in whois.rfc-ignorant.org
I don't use rfc-ignorant other than as an indicator of a possibly dodgy server. Given the number of poorly configured mail servers, using rfc-ignorant is a very aggressive step, IMHO.
Even lower. SuSE must be using very altered values. And a badly trained Bayesian database: mine scores that same email at 5%, not 95%.
Bayes is a double-edged sword - you've got to be very particular about what you record as spam/ham. Especially if you're not just training your bayes filters for purely personal use. And you've got to be careful with cleaning up the database too.
However... it proves my point that the postmaster (ISP) being ignorant of the RFC doesn't prove that their users send spam,
Totally agree. /Per Jessen, Zürich
The Friday 2006-02-03 at 12:33 +0100, Per Jessen wrote:
Carlos E. R. wrote:
(I forgot to say that many of those false positives are from newsletters).
Same here. I'm in the process of building bayes-style filters that are meant for recognising just newsletters. That way I'll be able to add perhaps a couple of points, stopping a newsletter from ending up as a false positive.
Ah... I simply use "whitelist_from" in the file .spamassassin/user_prefs. It is faster to use, for a limited number of senders. The snag is that I could get faked newsletters instead.
DNS_FROM_RFC_WHOIS 0 0.879 0 1.447 Envelope sender in whois.rfc-ignorant.org
I don't use rfc-ignorant other than as an indicator of a possibly dodgy server. Given the number of poorly configured mail servers, using rfc-ignorant is a very aggressive step, IMHO.
And my HO too. Unfortunately, they are active by default in the spamassassin configuration that SuSE (and we users) use.

Also, I think that quite a few of those tests are redundant: if one RBL says that an IP or a domain is bad, some others will say the same. But that, I think, doesn't necessarily mean that the email's spamminess is higher. Those scores should not be arithmetically added; some other type of algorithm should be used. I don't know which, but something like:

    only A says it's bad   --> X points
    only B says it's bad   --> Y points
    A and B say it's bad   --> W points

where W should be perhaps the average or the maximum of (X, Y), but not the sum.

The result of an IP being listed on a dozen black lists could mean that all think the same, or that all copy data; I rather think it means that it is very probably true that that IP or domain is bad, but it doesn't mean that the probability of being spam is 500%.

IMO, of course :-)
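The combination rule suggested above can be sketched in a few lines of Python. This is not how SpamAssassin computes scores; it only illustrates the idea of counting a group of correlated blacklist hits once, at their maximum, while still adding unrelated tests. The rule names and point values are hypothetical.

```python
def combined_score(hits, scores, group):
    """Add up the scores of the rules that hit, counting `group` only once (at its maximum)."""
    grouped = [scores[name] for name in hits if name in group]
    others = [scores[name] for name in hits if name not in group]
    return sum(others) + (max(grouped) if grouped else 0.0)

# Hypothetical rules: two blacklists (A and B) and one unrelated content test.
scores = {"RBL_A": 2.0, "RBL_B": 1.5, "SOME_CONTENT_TEST": 0.5}
blacklists = {"RBL_A", "RBL_B"}

hits = ["RBL_A", "RBL_B", "SOME_CONTENT_TEST"]
plain_sum = sum(scores[name] for name in hits)    # X + Y + content test
capped = combined_score(hits, scores, blacklists)  # max(X, Y) + content test

print(plain_sum, capped)
```

Replacing `max` with an average gives the other variant mentioned in the message; either way a dozen listings that copy each other's data would no longer multiply the score.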
Even lower. SuSE must be using very altered values. And a badly trained Bayesian database: mine scores that same email at 5%, not 95%.
Bayes is a double-edged sword - you've got to be very particular about what you record as spam/ham. Especially if you're not just training your bayes filters for purely personal use. And you've got to be careful with cleaning up the database too.
Very true. I doubt the usefulness of site-wide Bayesian databases. Also, I disabled autolearn for the same reason.

-- 
Cheers, Carlos Robinson
Carlos E. R. wrote:
Also, I think that quite some of those tests are redundant: if one RBL says that an IP or a domain is bad, some others will say the same. But that, I think, doesn't necessarily mean that the email spamminess is higher. Those scores should not be arithmetically added, but some other type of algorithm should be used. Don't know what but kind of:
only A says it's bad --> X points only B says it's bad --> Y points A and B says it's bad --> W points.
where W should be perhaps the average or the maximum of (X, Y), but not the sum.
You can do that quite easily with SA - well, relatively easy. You'd use meta rules, ie. rules that only work as part of other rules. Bit too much to get into here, but it's really not too complicated. /Per Jessen, Zürich
The Friday 2006-02-03 at 18:27 +0100, Per Jessen wrote:
only A says it's bad --> X points only B says it's bad --> Y points A and B says it's bad --> W points.
where W should be perhaps the average or the maximum of (X, Y), but not the sum.
You can do that quite easily with SA - well, relatively easy. You'd use meta rules, ie. rules that only work as part of other rules. Bit too much to get into here, but it's really not too complicated.
Hum! I'll wait till I'm bored ;-)

-- 
Cheers, Carlos Robinson
participants (5)
- Carlos E. R.
- Colin Fraser
- Guenter Lichtenberg
- Per Jessen
- Peter Collier