Hi guys,

I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config. Summary in the tagged spam says:

---- Start SpamAssassin results
4.80 points, 4 required;
* 0.1 -- BODY: HTML included in message
* 0.5 -- BODY: Message is 80% to 90% HTML
* 0.7 -- URI: Uses a username in a URL
* 1.5 -- URI: URL contains username and (optional) password
* 1.9 -- Date: is 12 to 24 hours after Received: date
* 0.1 -- Message only has text/html MIME parts
---- End of SpamAssassin results

My /etc/mail/spamassassin/local.cf:

rewrite_subject 0
use_terse_report 1
use_bayes 1
auto_learn 0

I disabled auto_learn after reading of a lot of people having trouble with spamassassin becoming less accurate for some reason that's related to auto_learn - I'm still trying to find time to read all the why. I prefer to teach my spamassassin by hand too.

~/.spamassassin/user_prefs:

required_hits 4
whitelist_from *@slashdot.org *@mweb.com *@mweb.co.za *@sterkinekor.co.za
detailed_phrase_score 1
defang_mime 0
dns_available yes
use_bayes 1
auto_learn 0

Both are exactly as they came with spamassassin, I just added my whitelists, bayes and auto_learn.

This has worked fabulously with all versions of SUSE before, I'm not sure what's up now. Am I missing something important?

Second problem is that a lot more spam slips through lately (absence of bayes might contribute) - I see a lot of spam that scores less than 1... Is this just a case of the spammers getting smarter?

Thanks
Hans
* Hans du Plooy
I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config. Summary in the tagged spam says:
a quickie, look in the ~/.spamassassin directory and check the dates on bayes_journal, bayes_seen and bayes_toks. -- Patrick Shanahan Registered Linux User #207535 http://wahoo.no-ip.org @ http://counter.li.org HOG # US1244711
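The check suggested above could be sketched roughly as follows. This is only an illustration: the helper name bayes_file_dates is made up here, and the directory defaults to ~/.spamassassin as mentioned in the message.

```shell
# Sketch: list the three Bayes database files with their modification times.
# bayes_file_dates is a hypothetical helper name, not part of SpamAssassin.
bayes_file_dates() {
    dir="${1:-$HOME/.spamassassin}"
    for name in bayes_journal bayes_seen bayes_toks; do
        if [ -e "$dir/$name" ]; then
            # ls -l prints the size and last-modified time of the file
            ls -l "$dir/$name"
        else
            # A missing file is reported rather than silently skipped
            echo "$dir/$name: missing"
        fi
    done
}
```

Calling bayes_file_dates with no argument looks in the current user's ~/.spamassassin. Files that are missing, or that have not changed since the last training run, would suggest the database is not being updated.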
On Saturday 10 April 2004 15:30, Patrick Shanahan wrote:
* Hans du Plooy [04-10-04 07:46]:
I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config. Summary in the tagged spam says:
a quickie, look in the ~/.spamassassin directory and check the dates on bayes_journal, bayes_seen and bayes_toks.

I wiped them two days ago, and sa-learned from my collected spam and ham, so they should be accurate. Or not?

I enabled rbl checks and razor, I hope that makes a difference.

Thanks
Hans
On Saturday 10 April 2004 05:38, Hans du Plooy wrote:
On Saturday 10 April 2004 15:30, Patrick Shanahan wrote:
* Hans du Plooy
[04-10-04 07:46]: I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config. Summary in the tagged spam says:
a quickie, look in the ~/.spamassassin directory and check the dates on bayes_journal, bayes_seen and bayes_toks.
I wiped them two days ago, and sa-learned from my collected spam and ham, so they should be accurate. Or not?
I enabled rbl checks and razor, I hope that makes a difference.
RBL checks are fine. Razor checks are useless, because spam assassin catches more spam than razor does (by a LOT). Submitting known spam to razor helps users who only use razor, but does nothing for you (because your system already called it spam).

BTW... I've found that installing spam assassin via CPAN is always preferable to using the included RPMs because you can get updates more frequently.

--
_____________________________________
John Andersen
On Sunday 11 April 2004 10:32, John Andersen wrote:
RBL checks are fine. Razor checks are useless, because spam assassin catches more spam than razor does (by a LOT).

That's interesting. But: isn't there a good chance that the handful of spam that razor catches might just include something that spamassassin missed?

BTW... I've found that installing spam assassin via CPAN is always preferable to using the included RPMs because you can get updates more frequently.

Is there a way to make an rpm out of this - I'd like to keep the rpm database happy.

Thanks
Hans
On Sunday 11 April 2004 02:23, Hans du Plooy wrote:
On Sunday 11 April 2004 10:32, John Andersen wrote:
RBL checks are fine. Razor checks are useless, because spam assassin catches more spam than razor does (by a LOT).
That's interesting. But: isn't there a good chance that the handful of spam that razor catches might just include something that spamassassin missed?
Not in my experience. Perhaps with some other spam, but all the spam that gets thru to me ALSO failed razor (because my SA calls razor). As I have it configured, razor alone does not generate enough points to rate a trip to /dev/null, so it would end up in my spam box. There I would notice if razor caught something that SA did not catch by any other means.
BTW... I've found that installing spam assassin via CPAN is always preferable to using the included RPMs because you can get updates more frequently.
Is there a way to make an rpm out of this - I'd like to keep the rpm database happy.
Not that I know of, because the CPAN install is totally automated. On the other hand, there is no reason to keep rpm aware of the change if you never install with rpm in the first place. -- _____________________________________ John Andersen
John, thanks for all your replies.

On Monday 12 April 2004 10:39, John Andersen wrote:
Is there a way to make an rpm out of this - I'd like to keep the rpm database happy.
Not that I know of, because the CPAN install is totally automated. On the other hand, there is no reason to keep rpm aware of the change if you never install with rpm in the first place.

I'm not bothered on my own machine. I just compiled the source and installed. I'll reinstall (with the new SUSE) in a few months in any case. I'm more concerned with SuSE running on my client's servers. There's one machine running SuSE 8.0 Pro, and I don't see it being upgraded or reinstalled before the machine itself gets replaced. I'd like to keep the rpm database happy on machines like this so that if anything does go wrong I can, if necessary, remove a package and start over.

Back to the problem I'm having. I upgraded to Spamassassin-2.63 and adjusted my config files as follows:

~/.spamassassin/user_prefs:

required_hits 4
[...snip whitelists...]
use_terse_report 0
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 1
detailed_phrase_score 1
defang_mime 0
use_bayes 1
dns_available yes
auto_learn 0

/etc/mail/spamassassin/local.cf contains the same.

Adding rbl checks seems to have made a difference, but I still don't see that bayes is being used:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 SUBJ_HAS_SPACES        Subject contains lots of white space
 1.3 GAPPY_SUBJECT          Subject: contains G.a.p.p.y-T.e.x.t
 0.5 FREE_TRIAL             BODY: Free Trial
 4.3 MONEY_BACK             BODY: Money back guarantee
 0.6 PENIS_ENLARGE2         BODY: Information on getting larger penis/breasts
 0.4 HTML_FONT_INVISIBLE    BODY: HTML font color is same as background
 0.1 HTML_FONT_BIG          BODY: HTML has a big font
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.3 HTML_TAG_BALANCE_BODY  BODY: HTML has unbalanced "body" tags
 0.1 HTML_70_80             BODY: Message is 70% to 80% HTML
 0.1 HTML_FONTCOLOR_RED     BODY: HTML font color is red
 0.1 HTML_FONTCOLOR_UNKNOWN BODY: HTML font color is unknown to us
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 HTML_FONTCOLOR_UNSAFE  BODY: HTML font color not in safe 6x6x6 palette
 0.2 SUBJ_HAS_UNIQ_ID       Subject contains a unique ID
 0.1 RCVD_IN_SORBS          RBL: SORBS: sender is listed in SORBS
                            [24.232.186.11 listed in dnsbl.sorbs.net]
 0.1 RCVD_IN_RFCI           RBL: Sent via a relay in ipwhois.rfc-ignorant.org
                            [66.18.76.123 has inaccurate or missing WHOIS]
                            [data at the RIR]
                            [66.18.70.48 has inaccurate or missing WHOIS data]
                            [at the RIR]
 0.0 CLICK_BELOW            Asks you to click below

Thanks
Hans
* Hans du Plooy
Adding rbl checks seems to have made a difference, but I still don't see that bayes is being used:
as I said before: look in the ~/.spamassassin directory and check the dates on bayes_journal, bayes_seen and bayes_toks. -- Patrick Shanahan Registered Linux User #207535 http://wahoo.no-ip.org @ http://counter.li.org HOG # US1244711
Dunno how I missed it. From /var/log/mail:

"Apr 12 21:36:10 sigaar spamd[5006]: debug: bayes: Not available for scanning, only 2 ham(s) in Bayes DB < 200"

Ran a bunch of the SLE list mails through sa-learn --ham and now it's working - much better!

Thanks for all who answered!
Hans
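For anyone chasing the same symptom, a minimal sketch of pulling that debug line out of the mail log might look like this. The helper name last_bayes_unavailable is invented for illustration, and the sketch assumes spamd is logging debug output to /var/log/mail as in the message above.

```shell
# Sketch: print the most recent "bayes: Not available" debug line, if any.
# last_bayes_unavailable is a hypothetical helper; the default log path is
# the one quoted in the thread and may differ on other systems.
last_bayes_unavailable() {
    log="${1:-/var/log/mail}"
    # grep selects the matching lines; tail keeps only the newest one
    grep 'bayes: Not available for scanning' "$log" | tail -n 1
}
```

If the function prints nothing, no such message was logged. If it prints a line like the one quoted, training more ham (as was done here with sa-learn --ham) is the fix the thread arrived at.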
John Andersen wrote: [snip]
Is there a way to make an rpm out of this - I'd like to keep the rpm database happy.
Not that I know of, because the CPAN install is totally automated. On the other hand, there is no reason to keep rpm aware of the change if you never install with rpm in the first place.
Dunno if this will help anyone:

First, it's easy to upgrade spamassassin using rpm. Download the source tar.gz for the latest version (2.63), then use the spec file from the 2.60 src.rpm on SuSE ftp (or the original one on the install disks) and alter a few things, like the version name. Building against this spec file produces the two rpms (spamassassin 2.63 and perl-spamassassin 2.63).

Second, to improve the spam-catching rate, I'd had to install a bunch of custom rules.cf files from the spamassassin sites. It doesn't take spammers long to work out ways round the current version of spamassassin, and then the hit rate starts to go down. Some of the third-party rule sets are very effective at dealing with the latest tricks. With the custom rules and a few of my own, my hit rate is now up there at nearly 100% again. I've never used razor because the addition of custom rules seems to have plugged the gap.

Never had a problem with bayes db here. It's always worked.

Just my 2 cents.

:)
Fish
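The "alter the version" step described above could be sketched like this. It is only a rough illustration: set_spec_version is a made-up helper, and it assumes the spec file has a single Version: tag at the start of a line. Other tags (such as the source file name) may need editing too, which this does not attempt.

```shell
# Sketch: rewrite the Version: tag of an RPM spec file in place.
# set_spec_version is a hypothetical helper; the version string is assumed
# to contain no "/" characters, since it is pasted into a sed expression.
set_spec_version() {
    spec="$1"
    version="$2"
    [ -n "$spec" ] && [ -n "$version" ] || return 1
    # Write to a temporary file first so a failed sed leaves the spec intact
    sed "s/^Version:.*/Version:      $version/" "$spec" > "$spec.new" &&
        mv "$spec.new" "$spec"
}
```

For example, set_spec_version spamassassin.spec 2.63 (the spec file name is an assumption) followed by a build with rpmbuild -ba on that spec file. Older rpm releases use rpm -ba instead.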
On Monday 12 April 2004 04:34 pm, Mark Crean wrote:
Second, to improve the spam-catching rate, I'd had to install a bunch of custom rules.cf files from the spamassassin sites. It doesn't take spammers long to work out ways round the current version of spamassassin, and then the hit rate starts to go down.
Some of the third-party rule sets are very effective at dealing with the latest tricks. With the custom rules and a few of my own, my hit rate is now up there at nearly 100% again. I've never used razor because the addition of custom rules seems to have plugged the gap.
Can you give a few pointers to those rules.cf files? And where do they get placed on your system? I don't really find any current rules.cf file on my system. -- +----------------------------------------------------------------------------+ + Bruce S. Marshall bmarsh@bmarsh.com Bellaire, MI 04/12/04 20:29 + +----------------------------------------------------------------------------+ "Where does it go? It doesn't matter. Flush it."
On Tue, 2004-04-13 at 01:35, Bruce Marshall wrote:
Can you give a few pointers to those rules.cf files? And where do they get placed on your system? I don't really find any current rules.cf file on my system.
See:

http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm
http://wiki.apache.org/spamassassin/
http://wiki.apache.org/spamassassin/CustomRulesets

You just drop them into /etc/mail/spamassassin/. They all need to take the .cf file extension. Then stop spamassassin. Now as root run spamassassin -D --lint - this will test the new rules and make sure they compile OK. Study the output - it will tell you if there's something wrong. Assuming there isn't, restart with rcspamd start, etc.

Doing the above will apply the rules site-wide, of course. If you have a set-up with many users running individual bayes dbs then you wouldn't do it this way, I guess.

The standard rulesets that come with spamassassin are in /usr/share/spamassassin

Works for me, anyway.

:)
Fish
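The drop-in-and-lint steps above might be scripted along these lines. This is a sketch, not a tested procedure: install_ruleset and the SA_CMD override are invented for illustration, and stopping and restarting spamd is left to the rcspamd commands mentioned in the message.

```shell
# Sketch: copy one ruleset into the site-wide config directory, then lint.
# install_ruleset is a hypothetical helper. SA_CMD can be overridden for a
# dry run; by default the real spamassassin command is used.
install_ruleset() {
    src="$1"
    dest="${2:-/etc/mail/spamassassin}"
    case "$src" in
        *.cf) ;;  # rules files need the .cf extension
        *) echo "ruleset must end in .cf: $src" >&2; return 1 ;;
    esac
    cp "$src" "$dest/" || return 1
    # --lint checks the installed configuration as a whole, so the default
    # destination is the one that matters; -D adds debug output to study
    ${SA_CMD:-spamassassin} -D --lint
}
```

Run as root, for example install_ruleset antidrug.cf, then read the lint output before restarting spamd.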
On Tuesday 13 April 2004 02:27 pm, Mark Crean wrote:
On Tue, 2004-04-13 at 01:35, Bruce Marshall wrote:
Can you give a few pointers to those rules.cf files? And where do they get placed on your system? I don't really find any current rules.cf file on my system.
See:
http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm http://wiki.apache.org/spamassassin/ http://wiki.apache.org/spamassassin/CustomRulesets
You just drop them into /etc/mail/spamassassin/. They all need to take the .cf file extension. Then stop spamassassin. Now as root run spamassassin -D --lint - this will test the new rules and make sure they compile OK. Study the output - it will tell you if there's something wrong. Assuming there isn't, restart with rcspamd start, etc.
Doing the above will apply the rules site-wide, of course. If you have a set-up with many users running individual bayes dbs then you wouldn't do it this way, I guess.
The standard rulesets that come with spamassassin are in /usr/share/spamassassin
Works for me, anyway.
Thanks a bunch!! I did a lot of digging and googling on rules.cf but didn't come up with much.
:)
Fish
-- +----------------------------------------------------------------------------+ + Bruce S. Marshall bmarsh@bmarsh.com Bellaire, MI 04/13/04 15:22 + +----------------------------------------------------------------------------+ "Everything should be built top-down, except the first time."
On Tuesday 13 April 2004 02:27 pm, Mark Crean wrote:
Doing the above will apply the rules site-wide, of course. If you have a set-up with many users running individual bayes dbs then you wouldn't do it this way, I guess.
The standard rulesets that come with spamassassin are in /usr/share/spamassassin
Works for me, anyway.
Hmmm I'm suspicious of all the errors I was getting from those (and the standard) .cf files. I started to go through and comment out those rules with problems, but there were a ton of them and it was a long process.

I got suspicious and ran the spamassassin -D --lint repeatedly a couple of times and got a different error each time, with no changes in between.

Ever see anything like that? It's like the compile has a problem and stops with an error wherever and whenever the error occurs.

--
+----------------------------------------------------------------------------+
+ Bruce S. Marshall bmarsh@bmarsh.com Bellaire, MI 04/13/04 15:48 +
+----------------------------------------------------------------------------+
"Depression is merely anger without the enthusiasm."
On Tue, 2004-04-13 at 20:50, Bruce Marshall wrote:
Hmmm I'm suspicious of all the errors I was getting from those (and the standard) .cf files. I started to go through and comment out those rules with problems, but there were a ton of them and it was a long process.
I got suspicious and ran the spamassassin -D --lint repeatedly a couple of times and got a different error each time, with no changes in between.
Ever see anything like that? It's like the compile has a problem and stops with an error wherever and whenever the error occurs.
Well I'm just a user but I guess the porpoise of the lint test is to find fishy code. FWIW I'm running the following rulesets from those sites in my previous post and they pass the lint test, though I may have changed or commented out any rules that failed - can't remember:

antidrug.cf
coding_html.cf
drugs_diet.cf
general_body.cf
header_abuse.cf
ratware.cf
sare_adult.cf
sare_biz_market.cf

plus a few rules of my own in another file, mainly to catch the large amounts of spam I currently receive using symbols, accents and the like.

A while spent monitoring the incoming rubbish showed that SA was picking up the new rules and using them. All this may easily be overkill or even a bad idea in the sense that it could skew the bayes process. It depends what spam you're getting to a degree. I do also run sa-learn spam/ham once a day (via a cron job) after checking mailboxes.

:)
Fish
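A daily training job of the kind mentioned above could be sketched as a small script for cron to call. The folder paths, the helper name learn_sorted_mail and the SA_LEARN override are all assumptions made for illustration. Depending on the mailbox format and SpamAssassin version, sa-learn may need an extra format option (for example for mbox files).

```shell
# Sketch: train Bayes from hand-sorted spam and ham folders.
# learn_sorted_mail is a hypothetical helper and both default paths are
# placeholders. SA_LEARN can be overridden for a dry run.
learn_sorted_mail() {
    spam_dir="${1:-$HOME/Mail/spam}"
    ham_dir="${2:-$HOME/Mail/ham}"
    # Learn spam first; only go on to ham if that step succeeded
    ${SA_LEARN:-sa-learn} --spam "$spam_dir" &&
        ${SA_LEARN:-sa-learn} --ham "$ham_dir"
}
```

Saved in a script that ends by calling learn_sorted_mail, this could be scheduled with a crontab entry such as 30 3 * * * /path/to/that/script, where both the time and the path are placeholders.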
Quoting Hans du Plooy
Hi guys,
I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config. Summary in the tagged spam says:
IIRC, Bayesian filtering does not kick in until either 200 or 400 messages are in the database. HTH, Jeffrey
On Saturday 10 April 2004 14:46, Hans du Plooy wrote:
Hi guys,
I'm running stock standard spamassassin with SUSE 9.0 Pro. It does not seem to be using bayes at all, even though I have specified in both my systemwide config and my user config.
[snip]

Just noticed something else. /etc/sysconfig/spamd has two switches as default:

-L = local only tests
-a = use auto whitelist

I guess that was taking priority over my config files. I changed that and will report back as soon as I have received any spam to see if it did kick in. I also added -D for debug messages - hopefully there's a clue in there.

Thanks
Hans
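A quick way to check for the -L switch described above might look like the sketch below. The helper name has_local_only is invented, and it assumes the switches are written separately (as "-L", not bundled with other letters). Note that -L, as described, restricts spamd to local tests; elsewhere in the thread the Bayes problem itself turned out to be the ham count.

```shell
# Sketch: succeed if a spamd option string contains the -L switch.
# has_local_only is a hypothetical helper; the argument is one string of
# space-separated switches, e.g. the value found in /etc/sysconfig/spamd.
has_local_only() {
    # $1 is deliberately unquoted so the string splits into separate words
    for arg in $1; do
        case "$arg" in
            -L) return 0 ;;
        esac
    done
    return 1
}
```

For example, has_local_only "-L -a" succeeds, while has_local_only "-a -D" does not.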
participants (6): Bruce Marshall, Hans du Plooy, Jeffrey L. Taylor, John Andersen, Mark Crean, Patrick Shanahan