[opensuse] SPAM filter in openSUSE 11.1
I have an openSUSE 11.1 system that uses ProcMail for local mail delivery. Via that, I send mail to Spam Assassin with rules like this:

    # Check for SPAM
    # pipe files smaller than 256k through stand alone spamassassin
    :0fw
    * < 256000
    | spamc

    # Spam
    :0:
    * ^X-Spam-Status: Yes
    ".SPAM/"

All works great. Spam Assassin detects lots of messages as being spam.

My question is how Spam Assassin can learn about messages that were missed as being SPAM? Since this is done outside of my mail reader (evolution), I can't just tell the reader to tell that it is SPAM and that the filter should learn. At least I would not imagine this would be the case.

The reason I am doing it this way is that ProcMail is also putting the mail into Maildir folders for my Courier IMAP daemon. I access my mail from numerous mail programs at various locations, and using ProcMail/IMAP to sort mail once is really great.

--
Roger Oberholtzer
OPQ Systems / Ramböll RST
Ramböll Sverige AB
Krukmakargatan 21
P.O. Box 17009
SE-104 62 Stockholm, Sweden
Office: Int +46 8-615 60 20
Mobile: Int +46 70-815 1696

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
* Roger Oberholtzer <roger@opq.se> [09-07-09 10:05]:
I have an openSUSE 11.1 system that uses ProcMail for local mail delivery. Via that, I send mail to Spam Assassin with rules like this:
    # Check for SPAM
    # pipe files smaller than 256k through stand alone spamassassin
    :0fw
    * < 256000
    | spamc

    # Spam
    :0:
    * ^X-Spam-Status: Yes
    ".SPAM/"
All works great. Spam Assassin detects lots of messages as being spam.
My question is how Spam Assassin can learn about messages that were missed as being SPAM? Since this is done outside of my mail reader (evolution), I can't just tell the reader to tell that it is SPAM and that the filter should learn. At least I would not imagine this would be the case.
09:13 wahoo:~ > spamassassin --help
SpamAssassin version 3.2.5
  running on Perl version 5.10.0

For more information read the spamassassin man page.

Usage:
    spamassassin [options] [ < *mailmessage* | *path* ... ]
    spamassassin -d [ < *mailmessage* | *path* ... ]
    spamassassin -r [ < *mailmessage* | *path* ... ]
    spamassassin -k [ < *mailmessage* | *path* ... ]
    spamassassin -W|-R [ < *mailmessage* | *path* ... ]

Options:
    -L, --local                       Local tests only (no online tests)
    -r, --report                      Report message as spam
    -k, --revoke                      Revoke message as spam
    -d, --remove-markup               Remove spam reports from a message
    -C path, --configpath=path, --config-file=path
                                      Path to standard configuration dir
    -p prefs, --prefspath=file, --prefs-file=file
                                      Set user preferences file
    --siteconfigpath=path             Path for site configs
                                      (def: /etc/mail/spamassassin)
    --cf='config line'                Additional line of configuration
    -x, --nocreate-prefs              Don't create user preferences file
    -e, --exit-code                   Exit with a non-zero exit code if the
                                      tested message was spam
    --mbox                            read in messages in mbox format
    --mbx                             read in messages in UW mbx format
    -t, --test-mode                   Pipe message through and add extra
                                      report to the bottom
    --lint                            Lint the rule set: report syntax errors
    -W, --add-to-whitelist            Add addresses in mail to persistent
                                      address whitelist
    --add-to-blacklist                Add addresses in mail to persistent
                                      address blacklist
    -R, --remove-from-whitelist       Remove all addresses found in mail from
                                      persistent address list
    --add-addr-to-whitelist=addr      Add addr to persistent address whitelist
    --add-addr-to-blacklist=addr      Add addr to persistent address blacklist
    --remove-addr-from-whitelist=addr Remove addr from persistent address list
    --ipv4only, --ipv4-only, --ipv4   Disable attempted use of ipv6 for DNS
    --progress                        Print progress bar
    -D, --debug [area=n,...]          Print debugging messages
    -V, --version                     Print version
    -h, --help                        Print usage message

I believe that you want to "report" as spam.
It's not so hard :^)

--
Patrick Shanahan Plainfield, Indiana, USA HOG # US1244711
http://wahoo.no-ip.org Photo Album: http://wahoo.no-ip.org/gallery2
Registered Linux User #207535 @ http://counter.li.org
Roger Oberholtzer said the following on 09/07/2009 10:04 AM:
[...]
All works great. Spam Assassin detects lots of messages as being spam.
My question is how Spam Assassin can learn about messages that were missed as being SPAM?
RTFM. See 'sa-learn'
Since this is done outside of my mail reader (evolution), I can't just tell the reader to tell that it is SPAM and that the filter should learn. At least I would not imagine this would be the case.
Yes you can.

1. Create a folder SPAM, and subfolders "isSpam", "false-positive" and "false-negative".
2. Have procmail put the spam it detects in "isSpam".
3. Manually put items that go into "isSpam" that aren't spam into "false-positive".
4. Manually put items that are in your inbox that *ARE* spam into "false-negative".
5. Have CRON run 'sa-learn' with '--spam' on "false-negative" and then empty it.
6. Have CRON run 'sa-learn' with '--ham' on "false-positive" and then empty it.
7. Clean out "isSpam" periodically, either by CRON or manually.

You are telling the automatically run learning mechanism of SpamAssassin ('sa-learn') what are examples of spam that it missed ("false-negative") and what it thought was spam that wasn't ("false-positive").

In practice, I would not rely 100% on SpamAssassin. I run blacklist and whitelist filters and sanitizer from procmail *before* SpamAssassin:

    INCLUDERC=$LIB/whitelist.rc
    INCLUDERC=$LIB/mailinglists.rc # Another kind of whitelist
    # INCLUDERC=$LIB/blacklist.rc
    INCLUDERC=$LIB/attachment.rc
    INCLUDERC=$LIB/likelyspam.rc
    INCLUDERC=$LIB/fonts.rc

This reduces the load on SpamAssassin.

You can find examples of the white/black/mailing/font filters in the many examples of how to use procmail by googling.

--
Sed quis custodiet ipsos custodes? [Who watches the watchers?]
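[Editor's sketch] The two cron steps in the list above (train on a folder, then empty it) could be scripted roughly as follows. This is only a sketch under assumptions that are not from the thread: the function name train_and_empty, the SA_LEARN override variable and the example paths are made up for illustration, and each folder is assumed to hold one message per file. It runs sa-learn with --spam or --ham on a directory and empties the directory only if sa-learn exits successfully.

```shell
#!/bin/sh
# Sketch of "run sa-learn on a correction folder, then empty it".
# Hypothetical names: train_and_empty, SA_LEARN, and the example paths.

# Command used for training; overridable so the script can be dry-run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# train_and_empty MODE DIR
#   MODE is "spam" or "ham"; DIR holds one message per file.
train_and_empty() {
    mode=$1
    dir=$2

    case "$mode" in
        spam|ham) ;;
        *) echo "usage: train_and_empty spam|ham DIR" >&2; return 2 ;;
    esac

    # Nothing to do if the folder is missing or holds no files.
    [ -d "$dir" ] || return 0
    [ -n "$(find "$dir" -type f -print | head -n 1)" ] || return 0

    # Empty the folder only if sa-learn reported success.
    if $SA_LEARN "--$mode" "$dir"; then
        find "$dir" -type f -exec rm -f {} +
    else
        echo "sa-learn failed on $dir; leaving messages in place" >&2
        return 1
    fi
}

# Example calls from a cron-run script (hypothetical paths):
#   train_and_empty spam "$HOME/Mail/SPAM/false-negative"
#   train_and_empty ham  "$HOME/Mail/SPAM/false-positive"
```

One caveat with this simple form: a message that arrives between the sa-learn run and the cleanup would be deleted without being learned, so a more careful version would snapshot the file list first.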
On Mon, 2009-09-07 at 10:51 -0400, Anton Aylward wrote:
[Useful step list deleted]

Makes sense. My stumbling block is the fact that the mail is in MailDir folders. I guess I would need the cron job to:

1. Get the contents of some MailDir folders into, say, mbox format directories. This is to satisfy the input formats that sa-learn supports, which do not include MailDir.
2. Delete the contents of the MailDir folders that have served their purpose.
3. Run sa-learn as appropriate.

I guess there are tools for moving the mail from one format to another that can be run from a cron job. How about deleting the IMAP folder contents after they are no longer needed (step 2)?

The reason I want to stick with IMAP is that I could send wrongly classified messages to their folder from any mail client that I read from.

--
Roger Oberholtzer
On Monday, 2009-09-07 at 17:23 +0200, Roger Oberholtzer wrote:
On Mon, 2009-09-07 at 10:51 -0400, Anton Aylward wrote:
[Useful step list deleted]
Makes sense. My stumbling block is the fact that the mail is in MailDir folders. I guess I would need the cron job to:
1. Get the contents of some MailDir folders into, say, mbox format directories. This is to satisfy the input formats that sa-learn supports, which do not include MailDir.
No need. This should work for maildir:

    sa-learn --spam --showdots --dir /pathto/spam

It is not documented, though:

    --dir     Ignored; historical compatability
    --file    Ignored; historical compatability
    --mbox    Input sources are in mbox format

--
Cheers, Carlos E. R.
On Mon, 2009-09-07 at 21:09 +0200, Carlos E. R. wrote:
Makes sense. My stumbling block is the fact that the mail is in MailDir folders. I guess I would need the cron job to:
1. Get the contents of some MailDir folders into, say, mbox format directories. This is to satisfy the input formats that sa-learn supports, which do not include MailDir.
No need. This should work for maildir:
sa-learn --spam --showdots --dir /pathto/spam
To follow up:

I made an IMAP folder called SPAM/falseNegative, where I put missed SPAM. The command to learn that these were really SPAM was then:

    sa-learn --spam --showdots --dir $MYHOME/Maildir/.SPAM.falseNegative/cur

The program ran and said it had learned from the messages there.

I will see about a CRON job after I see how to empty the folder when I am finished.

Is there any sort of learning status summary that one can look at? I am curious how the learning proceeds. Of course, if it goes well, I will see less SPAM. But I am curious if SA can tell the rules it has. Then I could see what changes as I feed it learning material. Just a curious guy here.

Thanks for all the help thus far.

--
Roger Oberholtzer
On Wednesday, 2009-09-09 at 22:55 +0200, Roger Oberholtzer wrote:
sa-learn --spam --showdots --dir /pathto/spam
To follow up:
I made an IMAP folder called SPAM/falseNegative, where I put missed SPAM. The command to learn that these were really SPAM was then:
sa-learn --spam --showdots --dir $MYHOME/Maildir/.SPAM.falseNegative/cur
The program ran and said it had learned from the messages there.
I thought that "$MYHOME/Maildir/.SPAM.falseNegative/cur" would be enough. Perhaps not :-?
I will see about a CRON job after I see how to empty the folder when I am finished.
No hurry. The process knows which emails were processed and doesn't reread them - and a bunch of stored spam is needed if you want to retrain.
Is there any sort of learning status summary that one can look at?
Not that I know, but it could be; the manual should say. Or you can pipe the output of sa-learn to somewhere (just don't use "showdots" in that case). Try things like "--dump":

cer@nimrodel:~> sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      26824          0  non-token data: nspam
0.000          0      12774          0  non-token data: nham
0.000          0     211497          0  non-token data: ntokens
0.000          0 1202213771          0  non-token data: oldest atime
0.000          0 1252528851          0  non-token data: newest atime
0.000          0 1252528858          0  non-token data: last journal sync atime
0.000          0 1252521698          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

but don't ask me what it means :-)
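[Editor's sketch] For anyone wanting a quick learning summary from that dump: the third column of the nspam, nham and ntokens lines holds the number of messages learned as spam, the number learned as ham, and the number of tokens in the Bayes database; the atime values are Unix timestamps. A small filter can pull those three counters out. The function name bayes_summary and the output wording are made up for illustration.

```shell
#!/bin/sh
# bayes_summary: read `sa-learn --dump magic` output on stdin and print
# the learned spam/ham message counts and the token count.
# (Hypothetical helper name; the field positions follow the dump above.)
bayes_summary() {
    awk '
        /non-token data: nspam/   { nspam = $3 }
        /non-token data: nham/    { nham = $3 }
        /non-token data: ntokens/ { ntokens = $3 }
        END {
            printf "spam messages learned: %d\n", nspam
            printf "ham messages learned:  %d\n", nham
            printf "tokens in database:    %d\n", ntokens
        }
    '
}

# Usage:
#   sa-learn --dump magic | bayes_summary
```

Running this before and after a training session shows whether the counters actually moved, which is about as close to a "learning status" as the dump gets.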
I am curious how the learning proceeds. Of course, if it goes well, I will see less SPAM. But I am curious if SA can tell the rules it has. Then I could see what changes as I feed it learning material. Just a curious guy here.
Me too :-)

--
Cheers, Carlos E. R.
On Thu, 2009-09-10 at 00:05 +0200, Carlos E. R. wrote:
To follow up:
I made an IMAP folder called SPAM/falseNegative, where I put missed SPAM. The command to learn that these were really SPAM was then:
sa-learn --spam --showdots --dir $MYHOME/Maildir/.SPAM.falseNegative/cur
The program ran and said it had learned from the messages there.
I thought that "$MYHOME/Maildir/.SPAM.falseNegative/cur" would be enough. Perhaps not :-?
It was enough. Which is why I used it.
I will see about a CRON job after I see how to empty the folder when I am finished.
No hurry. The process knows which emails were processed and doesn't reread them - and a bunch of stored spam is needed if you want to retrain.
This is good. I will make a cron job. Then, when I think the folder is too big, I can empty it. I guess the job needs to run as root, which is how I ran sa-learn by hand.

--
Roger Oberholtzer
Roger Oberholtzer said the following on 09/10/2009 07:25 AM:
This is good. I will make a cron job. Then, when I think the folder is too big, I can empty it. I guess the job needs to run as root, which is how I ran sa-learn by hand.
I wouldn't do that if I were you. Sa-learn is not exactly the fastest kid on the block, and if you don't purge the old mail it still has to parse it all before getting to the new stuff. Having the CRON entry delete the candidates after they have been processed is not onerous.

Whether it runs as root or as a user depends on other things. The way I have it configured, it runs as a user. But then I try to run as little as possible as root, especially things like mail processing where the incoming mail may contain malware and traps.

--
Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. --Albert Einstein
This is good. I will make a cron job. Then, when I think the folder is too big, I can empty it. I guess the job needs to run as root, which is how I ran sa-learn by hand.
I wouldn't do that if I were you. Sa-learn is not exactly the fastest kid on the block,
That is being very kind [seriously]. Someone really needs to do a rewrite of SpamAssassin in something other than Perl.
and if you don't purge the old mail it still has to parse it all before getting to the new stuff. Having the CRON entry delete the candidates after they have been processed is not onerous.
If you use Cyrus IMAPd you can use an ipurge event to automatically delete messages X number of days after they are received.
As for whether it runs as root or a user depends on other things. The way I have it configured, it runs as a user.
Agree. Running sa-learn as non-superuser is pretty easy; just run spamd as non-root and run sa-learn as the same user. It is even easier if you pull messages via fetchmail rather than reading the filesystem directly (avoids yet more permission issues).

    fetchmail --all --silent --norewrite --keep --folder 'Shared Folders.departments.cis.spamreport' --mda 'sa-learn --spam'
Adam Tauno Williams said the following on 09/10/2009 09:11 AM:
As for whether it runs as root or a user depends on other things. The way I have it configured, it runs as a user.
Agree. Running sa-learn as non-superuser is pretty easy; just run spamd as non-root and run sa-learn as the same user. It is even easier if you pull messages via fetchmail rather than reading the filesystem directly (avoids yet more permission issues).
fetchmail --all --silent --norewrite --keep --folder 'Shared Folders.departments.cis.spamreport' --mda 'sa-learn --spam'
I won't say I'm paranoid, but I'm a UNIX/Linux user, not a Windows user, and if I can avoid using root, if I can compartmentalise, I see no reason why I shouldn't. After all, it's not difficult, and in many ways it's less difficult than doing everything as root. This isn't like chrooting, this is just running with reduced privileges.

In addition, fetchmail lets me delegate mail out to many mailboxes that I've acquired over the course of my life, pull their contents via POP, and after processing make the results available to me on a local IMAP server.

--
The universe we observe has precisely the properties we should expect if there is, at bottom, no design, no purpose, no evil and no good, nothing but blind pitiless indifference. -- Richard Dawkins, River Out of Eden: A Darwinian View of Life (1995), quoted from Victor J. Stenger, Has Science Found God? (2001)
On Mon, 2009-09-07 at 21:09 +0200, Carlos E. R. wrote:
sa-learn --spam --showdots --dir /pathto/spam
I just noticed that my SPAM info in the message (put there by SA) is:

    tests=BAYES_99,HTML_IMAGE_ONLY_04, HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,MIME_HTML_ONLY,TVD_SPACE_RATIO autolearn=no

The autolearn item is odd. I am guessing this is not referring to sa-learn. If not that, how would SA do autolearn? If it does not get it right the first time, short of telling it when it was wrong, how would it learn?

Or, does this mean that whatever it has learned in sa-learn will not be used? Is learning one thing, and using it another? If so, software gets more and more like people every day!

--
Roger Oberholtzer
Roger Oberholtzer said the following on 09/09/2009 05:02 PM:
The autolearn item is odd. I am guessing this is not referring to sa-learn. If not that, how would SA do autolearn? If it does not get it right the first time, short of telling it when it was wrong, how would it learn?
Or, does this mean that whatever it has learned in sa-learn will not be used? Is learning one thing, and using it another? If so, software gets more and more like people every day!
Not quite sa-learn. Have you set up your configuration file? I suspect not.

Try "man Mail::SpamAssassin::Conf"

<quote>
bayes_auto_learn ( 0 | 1 ) (default: 1)
    Whether SpamAssassin should automatically feed high-scoring mails (or low-scoring mails, for non-spam) into its learning systems. The only learning system supported currently is a naive-Bayesian-style classifier.
</quote>

RTFM

--
When you know that you're capable of dealing with whatever comes, you have the only security the world has to offer. -- Harry Browne
On Wednesday, 2009-09-09 at 18:13 -0400, Anton Aylward wrote: ...
Try "man Mail::SpamAssassin::Conf"
It is not easy to know/guess that there is documentation hiding in there...

--
Cheers, Carlos E. R.
Carlos E. R. said the following on 09/09/2009 06:58 PM:
On Wednesday, 2009-09-09 at 18:13 -0400, Anton Aylward wrote:
...
Try "man Mail::SpamAssassin::Conf"
It is not easy to know/guess that there is documentation hiding in there...
I don't know about that. Anything "new" I RTFM. The "=no" looked like a configuration option. So I ran "apropos spamassassin" to see what FMs there were to R and looked at a few. That one first. I have "more" as my man page pager, so I used that to grep for "learn".

Heck, any wannabe rocket scientist could have done it.

--
Why do they put Braille on the drive-through bank machines?
Anton Aylward wrote:
Carlos E. R. said the following on 09/09/2009 06:58 PM:
On Wednesday, 2009-09-09 at 18:13 -0400, Anton Aylward wrote:
Try "man Mail::SpamAssassin::Conf" It is not easy to know/guess that there is documentation hiding in there...
I don't know about that. Anything "new" I RTFM.
Heck, any wannabe rocket scientist could have done it.
Another way (this is Perl, TMTOWTDI :) is, go to <http://search.cpan.org/> and type 'spamassassin'. The doc you want comes up as the second hit, but if you happened to go to the first one, it appears in the See Also list.

Heck, any wannabe Perl hacker could have done it :)

Cheers, Dave
On Wednesday, 2009-09-09 at 23:02 +0200, Roger Oberholtzer wrote:
On Mon, 2009-09-07 at 21:09 +0200, Carlos E. R. wrote:
sa-learn --spam --showdots --dir /pathto/spam
(don't forget to feed it both spam and ham mail, the results are much better)
I just noticed that my SPAM info in the message (put there by SA) is:
tests=BAYES_99,HTML_IMAGE_ONLY_04, HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,MIME_HTML_ONLY,TVD_SPACE_RATIO autolearn=no
When you see the "BAYES_" there, then it does bayes testing. It is one of the few places where you can know.
The autolearn item is odd. I am guessing this is not referring to sa-learn.
Right :-)
If not that, how would SA do autolearn? If it does not get it right the first time, short of telling it when it was wrong, how would it learn?
Or, does this mean that whatever it has learned in sa-learn will not be used? Is learning one thing, and using it another? If so, software gets more and more like people every day!
Simple: it means that it has (not) added that email to the bayes database; as spam if it was marked such, or as not spam if not. I.e., it learns on each email it scans as it is received.

However, if it doesn't detect mail correctly, and that email is added to the database, then the bayes tests are further biased wrong each time. I prefer to disable autolearn completely and do manual learning instead.

--
Cheers, Carlos E. R.
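[Editor's sketch] To see how often autolearning actually fires, one can pull the autolearn= value out of the X-Spam-Status header of stored messages. As far as I know, "no" means the score did not cross either autolearn threshold, while "spam" or "ham" mean the message was fed to Bayes automatically; treat that reading as an assumption and check the Mail::SpamAssassin::Conf man page. The function name autolearn_status and the Maildir path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# autolearn_status: print the autolearn= value found in the headers of one
# message read on stdin (e.g. "no", "ham", "spam"); prints nothing if absent.
# (Hypothetical helper name.)
autolearn_status() {
    # Quit at the first blank line so the message body is never searched;
    # X-Spam-Status is often folded, so every header line is checked.
    sed -n -e '/^$/q' -e 's/.*autolearn=\([a-z_]*\).*/\1/p' | head -n 1
}

# Usage: tally the values across a folder (hypothetical path):
#   for f in "$HOME"/Maildir/cur/*; do
#       autolearn_status < "$f"
#   done | sort | uniq -c
```

Setting bayes_auto_learn to 0 in the SpamAssassin configuration, as quoted from the man page earlier in the thread, is the switch for turning automatic learning off.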
On Thu, 2009-09-10 at 00:14 +0200, Carlos E. R. wrote:
On Wednesday, 2009-09-09 at 23:02 +0200, Roger Oberholtzer wrote:
On Mon, 2009-09-07 at 21:09 +0200, Carlos E. R. wrote:
sa-learn --spam --showdots --dir /pathto/spam
(don't forget to feed it both spam and ham mail, the results are much better)
I have been monitoring my spam, but I do not see any false positives. It could be that I first sort mail from known sources, and the rest gets sent to SA. If I did SA first, perhaps some of my known senders would be falsely classified as spam.

--
Roger Oberholtzer
Roger Oberholtzer said the following on 09/10/2009 07:36 AM:
I have been monitoring my spam, but I do not see any false positives. It could be that I first sort mail from known sources, and the rest gets sent to SA.
That is good! It also reduces the load on SA. Of course you might want to feed "sa-learn --ham" your "good" inbox ;-)

--
"The capacity to learn is a gift; The ability to learn is a skill; The willingness to learn is a choice." - Brian Herbert
On Thursday 10 September 2009, Roger Oberholtzer wrote:
On Thu, 2009-09-10 at 00:14 +0200, Carlos E. R. wrote:
On Wednesday, 2009-09-09 at 23:02 +0200, Roger Oberholtzer wrote:
On Mon, 2009-09-07 at 21:09 +0200, Carlos E. R. wrote:
sa-learn --spam --showdots --dir /pathto/spam
(don't forget to feed it both spam and ham mail, the results are much better)
I have been monitoring my spam, but I do not see any false positives. It could be that I first sort mail from known sources, and the rest gets sent to SA. If I did SA first, perhaps some of my known senders would be falsely classified as spam.
You can add known senders to your user_prefs list in the .spamassassin folder. I've got a couple that will get marked spam, so this is the way to let spamassassin do its thing first, and then sort the mail.

Mike

--
Powered by SuSE 11.0 Kernel 2.6.25 KDE 3.5 Kmail 1.9 1:51pm up 7 days 4:11, 2 users, load average: 0.10, 0.13, 0.09
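[Editor's sketch] Mike does not say which directive he uses; presumably it is whitelist_from, which takes an address or a glob such as *@example.com. Assuming that, a small helper can append entries to user_prefs without creating duplicates. The function name whitelist_sender and the example addresses are made up for illustration.

```shell
#!/bin/sh
# whitelist_sender ADDRESS [PREFS_FILE]
# Append a "whitelist_from ADDRESS" line to the user preferences file
# unless that exact line is already present.
# (Hypothetical helper; assumes whitelist_from is the directive meant.)
whitelist_sender() {
    addr=$1
    prefs=${2:-$HOME/.spamassassin/user_prefs}
    line="whitelist_from $addr"

    # Make sure the directory and the file exist before searching.
    mkdir -p "$(dirname "$prefs")" || return 1
    touch "$prefs" || return 1

    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$prefs"; then
        printf '%s\n' "$line" >> "$prefs"
    fi
}

# Usage (hypothetical addresses):
#   whitelist_sender 'friend@example.org'
#   whitelist_sender '*@example.com'
```

Since a From address is easy to forge, it is worth reading the whitelist section of the Mail::SpamAssassin::Conf man page before relying on this for anything important.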
Anton Aylward wrote:
Roger Oberholtzer said the following on 09/07/2009 10:04 AM:
[...]
All works great. Spam Assassin detects lots of messages as being spam.
My question is how Spam Assassin can learn about messages that were missed as being SPAM?
RTFM. See 'sa-learn'
Since this is done outside of my mail reader (evolution), I can't just tell the reader to tell that it is SPAM and that the filter should learn. At least I would not imagine this would be the case.
Yes you can.
1. Create a folder SPAM, and subfolders "isSpam", "false-positive" and "false-negative".
2. Have procmail put the spam it detects in "isSpam".
3. Manually put items that go into "isSpam" that aren't spam into "false-positive".
4. Manually put items that are in your inbox that *ARE* spam into "false-negative".
5. Have CRON run 'sa-learn' with '--spam' on "false-negative" and then empty it.
6. Have CRON run 'sa-learn' with '--ham' on "false-positive" and then empty it.
7. Clean out "isSpam" periodically, either by CRON or manually.
You are telling the automatically run learning mechanism of SpamAssassin ('sa-learn') what are examples of spam that it missed ("false-negative") and what it thought was spam that wasn't ("false-positive").
In practice, I would not rely 100% on SpamAssassin. I run blacklist and whitelist filters and sanitizer from procmail *before* SpamAssassin:
    INCLUDERC=$LIB/whitelist.rc
    INCLUDERC=$LIB/mailinglists.rc # Another kind of whitelist
    # INCLUDERC=$LIB/blacklist.rc
    INCLUDERC=$LIB/attachment.rc
    INCLUDERC=$LIB/likelyspam.rc
    INCLUDERC=$LIB/fonts.rc
This reduces the load on SpamAssassin.
You can find examples of the white/black/mailing/font filters in the many examples of how to use procmail by googling.
These are all very good instructions.

But watch out: if AMAVIS is involved it will all be for naught. AMAVIS uses its own site-wide spamassassin Bayes database, rather than user databases, and the sa-learn commands above have to be pointed at that database instead of the user's.

Secondly, if you use IMAP on your machine (if not, WHY not?) the directories you need to scan are not in the user's home directory but rather in the IMAP areas, usually /var/spool/imap/user if using Cyrus.

I plan to drop AMAVIS altogether in my next major install, as this site-wide spamassassin nonsense is totally wrong.
On Monday, 2009-09-07 at 12:26 -0700, John Andersen wrote: ...
I plan to drop AMAVIS altogether in my next major install, as this site-wide spamassassin nonsense is totally wrong.
What I do is disable spam checking in amavis, and leave the antivirus part. Then I use spamassassin instead.

--
Cheers, Carlos E. R.
Carlos E. R. wrote:
On Monday, 2009-09-07 at 12:26 -0700, John Andersen wrote:
...
I plan to drop AMAVIS altogether in my next major install, as this site-wide spamassassin nonsense is totally wrong.
What I do is disable spam checking in amavis, and leave the antivirus part. Then I use spamassassin instead.
-- Cheers, Carlos E. R.
That works.

The other down side of Amavis is that it doubles the work load on Postfix, by passing all mail thru postfix twice, once thru postfix to Amavis, then back from amavis thru postfix again to the user.

I was planning to see if I could reduce this by just having virus scanning in my procmail recipe, but quite frankly I haven't had time to get that far.
On Monday, 2009-09-07 at 13:47 -0700, John Andersen wrote:
What I do is disable spam checking in amavis, and leave the antivirus part. Then I use spamassassin instead.
That works.
The other down side of Amavis is that it doubles the work load on Postfix, by passing all mail thru postfix twice, once thru postfix to Amavis, then back from amavis thru postfix again to the user.
Yes, but that is negligible compared to the CPU usage of amavis :-p
I was planning to see if I could reduce this by just having virus scanning in my procmail recipe, but quite frankly I haven't had time to get that far.
Yes, that's possible. But remember that amavis does antivirus and malware checking, too, whereas spamassassin doesn't.

--
Cheers, Carlos E. R.
Carlos E. R. wrote:
I was planning to see if I could reduce this by just having virus scanning in my procmail recipe, but quite frankly I haven't had time to get that far.
Yes, that's possible. But remember that amavis does antivirus and malware checking, too, whereas spamassassin doesn't.
But Procmail could invoke av scanners, no?

May not be the best place, since users have some control over their own procmail.
* John Andersen <jsamyth@gmail.com> [09-07-09 18:05]:
But Procmail could invoke av scanners, no?
May not be the best place, since users have some control over their own procmail.
Only after the fact. User cannot control the system procmail, only his own recipes in his own workspace, after system's procmail has handed off the mail or done a DROPPRIVS.

--
Patrick Shanahan
On Monday, 2009-09-07 at 15:03 -0700, John Andersen wrote:
Carlos E. R. wrote:
I was planning to see if I could reduce this by just having virus scanning in my procmail recipe, but quite frankly I haven't had time to get that far.
Yes, that's possible. But remember that amavis does antivirus and malware checking, too, whereas spamassassin doesn't.
But Procmail could invoke av scanners, no?
Yes... but two "buts". One, don't do it if you have users, who might disable the AV checking. Two, not all AV programs do direct mail checking. SA first separates the parts of the emails, saving the attached files somewhere, and then calls the AV on those files. And it can call several AV programs to check the same file (no AV detects all malware). That's a bit of coding.
May not be the best place, since users have some control over their own procmail.
Right. -- Cheers, Carlos E. R.
Carlos E. R. wrote:
Two, not all the AV do direct mail checking. SA first separates the parts of the emails,
I assume you meant Amavis does this, not SA. But I see your point here.
On Monday, 2009-09-07 at 16:38 -0700, John Andersen wrote:
Carlos E. R. wrote:
Two, not all the AV do direct mail checking. SA first separates the parts of the emails,
I assume you meant Amavis does this, not SA. But I see your point here.
Er... yes, right :-) -- Cheers, Carlos E. R.
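The splitting step discussed above (separate the parts of the mail, save them as files, then run one or more scanners over each file) could be sketched roughly as follows. This is only an illustration using Python's standard `email` module, not how Amavis itself is implemented. The names `save_attachments`, `scan_message` and `clamscan_hit` are made up for this example, and the `clamscan` call assumes ClamAV is installed.

```python
import email
import os
import subprocess
import tempfile
from email import policy


def save_attachments(raw_message: bytes, target_dir: str) -> list[str]:
    """Write every non-multipart part of the message to target_dir; return the paths."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    paths = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        # Generated name: never build a path from the sender-supplied filename.
        path = os.path.join(target_dir, f"part-{index:03d}")
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)
    return paths


def scan_message(raw_message: bytes, scanners) -> list[tuple[str, str]]:
    """Run every scanner over every saved part; return (scanner name, part name) hits.

    `scanners` maps a label to a callable taking a file path and returning
    True when the file is considered infected (hypothetical interface).
    """
    hits = []
    with tempfile.TemporaryDirectory() as workdir:
        for path in save_attachments(raw_message, workdir):
            for name, is_infected in scanners.items():
                if is_infected(path):
                    hits.append((name, os.path.basename(path)))
    return hits


def clamscan_hit(path: str) -> bool:
    """Example scanner: clamscan exits with status 1 when it finds a virus."""
    result = subprocess.run(["clamscan", "--no-summary", path],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 1
```

Several scanners can be passed in one dictionary, e.g. `scan_message(raw, {"clamav": clamscan_hit})`, which mirrors the point that no single AV detects all malware.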
John Andersen wrote:
Carlos E. R. wrote:
On Monday, 2009-09-07 at 12:26 -0700, John Andersen wrote:
...
I plan to drop AMAVIS altogether in my next major install, as this site wide spamassassin nonsense is totally wrong.
What I do is disable spam checking in amavis, and leave the antivirus part. Then I use spamassassin instead.
That works.
The other downside of Amavis is that it doubles the work load on Postfix, by passing all mail thru postfix twice, once thru postfix to Amavis, then back from amavis thru postfix again to the user.
I was planning to see if I could reduce this by just having virus scanning in my procmail recipe, but quite frankly I haven't had time to get that far.
Frankly, you don't win much by skipping the last Postfix hop, and you lose a lot of transparency instead. Pure SMTP transports don't use many resources. I prefer the queue management of Postfix compared to scripts and procmail. You can test this by configuring Amavisd-new to skip all tests.
John Andersen wrote:
Secondly, if you use imap on your machine (if not, WHY not?)
To avoid the space management and network bandwidth issues? At least that's why I've stuck to POP3 so far. /Per -- Per Jessen, Zürich (14.6°C)
On Mon, 2009-09-07 at 16:04 +0200, Roger Oberholtzer wrote:
After all the excellent information in the thread, I see that my solution was not needed. I use evolution. If you mark a message as junk, guess what evolution does: it runs sa-learn with it (in a background thread, so work continues without interruption). I peeked at the process list, and there it was (only while evolution said it was processing the junk mail, after which it disappeared).
I am glad I learned what I did so I understand the mechanism. But I think it will end up being purely educational. At least as long as I use evolution and do not mind flagging missed SPAM only in that reader.
Once again, thanks to all for the help.
-- Roger Oberholtzer
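For anyone reading this thread who does not use evolution, the same mechanism can be driven by hand or from cron: move the missed spam into a Maildir folder over IMAP, then point sa-learn at that folder. Below is a minimal sketch, assuming the folder lives at `~/Maildir/.SPAM` (an assumed path, adjust to your own layout). The helper names `sa_learn_command` and `learn` are made up for this example; `--spam` and `--ham` are the standard sa-learn options.

```python
import subprocess
from pathlib import Path


def sa_learn_command(maildir_folder: str, kind: str = "spam") -> list[str]:
    """Build the sa-learn command line for one Maildir folder.

    kind is "spam" for missed spam or "ham" for mail wrongly flagged as spam.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    folder = Path(maildir_folder).expanduser()
    # A Maildir folder keeps messages in cur/ (seen) and new/ (not yet seen).
    return ["sa-learn", f"--{kind}", str(folder / "cur"), str(folder / "new")]


def learn(maildir_folder: str, kind: str = "spam") -> int:
    """Run sa-learn over the folder and return its exit status."""
    return subprocess.run(sa_learn_command(maildir_folder, kind)).returncode
```

Calling `learn("~/Maildir/.SPAM")` as the mailbox owner trains on whatever is in that folder at the time. It should be run as the same user whose Bayes database the filter consults.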
participants (11)
- Adam Tauno Williams
- Anton Aylward
- Carlos E. R.
- Cristian Rodríguez
- Dave Howorth
- John Andersen
- Mike
- Patrick Shanahan
- Per Jessen
- Roger Oberholtzer
- Sandy Drobic