Hi,

It seems that the latest spate of spam is loaded with seemingly arbitrary words to defeat the Bayesian processes of SpamAssassin, such as

"thayer earsplitting fatal scatterbrain landfill circumstance configuration dowling system smokescreen incompletion oscillatory burgeon boorish mallow repudiate elysee bermuda cohort upbring unix bitnet depressive enigma"

I guess that they're doing this to increase the message size with un-spamlike words, to decrease the ratio of spam-like words to non-spam-like words. Does this sound right?

Has anyone heard of this and/or have a sense of how to defeat this strategy with the currently available configurable settings in SA? Spastic? Anything?

TIA
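As a rough illustration of the guess above, here is a minimal sketch of how padding can dilute a Bayesian score. It assumes a simple naive-Bayes style combination of per-token probabilities, which is not necessarily how SpamAssassin combines them, and every probability value is made up. It also assumes the filler words have already been seen mostly in legitimate mail; words the filter has never seen would carry little weight either way.

```python
from math import prod  # Python 3.8+


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities in a naive-Bayes style.

    Assumption: tokens are treated as independent and spam/ham priors are equal.
    """
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical per-token spam probabilities (illustrative values only).
spammy_tokens = [0.99, 0.95, 0.90]

# Hypothetical filler words that the filter has mostly seen in legitimate mail.
filler_tokens = [0.20] * 6

plain = combined_spam_probability(spammy_tokens)
padded = combined_spam_probability(spammy_tokens + filler_tokens)

print(f"without filler: {plain:.4f}")
print(f"with filler:    {padded:.4f}")
```

With these made-up numbers the padded message scores lower than the unpadded one, which is the effect the spammer would be after.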
On Wed December 31 2003 09:51 am, Nick Selby wrote:
Hi, It seems that the latest spate of spam is loaded with seemingly arbitrary words to defeat the Bayesian processes of SpamAssassin, such as
"thayer earsplitting fatal scatterbrain landfill circumstance configuration dowling system smokescreen incompletion oscillatory burgeon boorish mallow repudiate elysee bermuda cohort upbring unix bitnet depressive enigma"
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right? Anyone heard of this and/or have a sense of how to defeat this strategy with current available configurable settings on SA? Spastic? Anything?
TIA
Well thank you!! I never could figure out why they were dumping all that crap into their messages. That's a good explanation. And yes, it is a regular occurrence here for the spam that does make it through.

--
Bruce S. Marshall  bmarsh@bmarsh.com  Bellaire, MI  12/31/03 11:14
"The best way to win an argument is to begin by being right." --Jill Ruckelshaus
Hi Bruce / Nick,

On Wed, 31 Dec 2003 11:15:05 -0500 UTC (12/31/2003, 10:15 AM -0600 UTC my time), Bruce Marshall wrote:

B> On Wed December 31 2003 09:51 am, Nick Selby wrote:
It seems that the latest spate of spam is loaded with seemingly arbitrary words to defeat the Bayesian processes of SpamAssassin, such as
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Anyone ... have a sense of how to defeat this strategy with current available configurable settings on SA? Spastic? Anything?
Yes, you can turn on RBL checking in your current SA, if it is turned off. Using Razor and SpamCop's RBL will give you the most current, up-to-date checks (IP address blocks, in the case of the RBL) as a front-line defense.

B> Well thank you!! I never could figure out why they were dumping all that
B> crap into their messages. That's a good explanation. And yes, it is a
B> regular occurrence here for the spam that does make it through.

Agreed, this is the latest adaptation made by spammers to bypass Bayesian filtering.

--
Gary
A snooze button is a poor substitute for no alarm clock at all.
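For readers unfamiliar with how an RBL (DNS blocklist) check works, here is a minimal sketch. SpamAssassin performs these lookups itself when RBL checking is enabled, so this is only an illustration of the mechanism, not of SA's code. The function names are hypothetical; `bl.spamcop.net` is SpamCop's blocklist zone.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    The octets are reversed and the zone is appended,
    so 192.0.2.1 becomes 1.2.0.192.<zone>.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="bl.spamcop.net"):
    """Return True if the zone answers with an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # "Not listed" (NXDOMAIN) and lookup failures both end up here.
        return False
```

Calling `is_listed("192.0.2.1")` needs network access and a resolver that the blocklist operator will answer; `dnsbl_query_name` alone is pure string handling.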
On Wed, 2003-12-31 at 07:33, Gary wrote:
Hi Bruce / Nick,
On Wed, 31 Dec 2003 11:15:05 -0500 UTC (12/31/2003, 10:15 AM -0600 UTC my time), Bruce Marshall wrote:
B> On Wed December 31 2003 09:51 am, Nick Selby wrote:
It seems that the latest spate of spam is loaded with seemingly arbitrary words to defeat the Bayesian processes of SpamAssassin, such as
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Anyone ... have a sense of how to defeat this strategy with current available configurable settings on SA? Spastic? Anything?
Yes, you can turn on RBL checking in your current SA, if it is turned off. Using Razor and spamcop's RBLs will give you the most current, up-to-date, IP address blocks, as a front-line defense.
B> Well thank you!! I never could figure out why they were dumping all that
B> crap into their messages. That's a good explanation. And yes, it is a
B> regular occurrence here for the spam that does make it through.
Agreed, this is the latest adaptation made by spammers to bypass Bayesian filtering.
Look at this - http://djbdns.skwireless.net

rbldns is part of djbdns and is the best way to run your own RBL. Add this to SA as well as the others.

Have fun!
Dee
On Wednesday 31 December 2003 09:54 am, W.D.McKinney wrote:
On Wed, 2003-12-31 at 07:33, Gary wrote:
Hi Bruce / Nick,
On Wed, 31 Dec 2003 11:15:05 -0500 UTC (12/31/2003, 10:15 AM -0600 UTC my time), Bruce Marshall wrote:
B> On Wed December 31 2003 09:51 am, Nick Selby wrote:
It seems that the latest spate of spam is loaded with seemingly arbitrary words to defeat the Bayesian processes of SpamAssassin, such as
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Anyone ... have a sense of how to defeat this strategy with current available configurable settings on SA? Spastic? Anything?
I use a Python program called SpamBayes; it works well against this sort of email.
On Wednesday 31 December 2003 14:51 pm, Nick Selby wrote: <SNIP>
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Yes, that sounds quite plausible. What they are also doing is skewing the ratio of content to function words, in a grammatical sense. The ratio is relatively constant for a given language (for English, approx. 25-35% function words, like "it", "is", "that", ...), so a list of purely content words would likely be easy to identify, since it would score 0% function words.
Anyone heard of this and/or have a sense of how to defeat this strategy with current available configurable settings on SA? Spastic? Anything?
Not sure how to implement it with current setups, but you would only need to count the occurrences of about 50 specified words and compare that to the total word count.

Dylan
TIA
-- Sweet moderation Heart of this nation Desert us not We are between the wars - Billy Bragg
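The counting idea in Dylan's message can be sketched in a few lines. This is only an illustration under stated assumptions: the word list is a hypothetical sample of about 50 common English function words, the thresholds `min_ratio` and `min_words` are made-up values that would need tuning on real mail, and nothing here is a SpamAssassin rule or API.

```python
import re

# Hypothetical sample of about 50 common English function words.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in",
    "on", "at", "by", "for", "with", "from", "as", "is", "are", "was",
    "were", "be", "been", "it", "its", "this", "that", "these", "those", "i",
    "you", "he", "she", "we", "they", "not", "no", "do", "does", "did",
    "have", "has", "had", "will", "would", "can", "could",
})


def split_words(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_ratio(text):
    """Fraction of the words in text that are in FUNCTION_WORDS."""
    words = split_words(text)
    if not words:
        return 0.0
    return sum(1 for w in words if w in FUNCTION_WORDS) / len(words)


def looks_like_word_salad(text, min_ratio=0.10, min_words=15):
    """Flag longer passages with almost no function words (thresholds are assumptions)."""
    return len(split_words(text)) >= min_words and function_word_ratio(text) < min_ratio


salad = ("thayer earsplitting fatal scatterbrain landfill circumstance "
         "configuration dowling system smokescreen incompletion oscillatory "
         "burgeon boorish mallow repudiate elysee bermuda cohort upbring "
         "unix bitnet depressive enigma")
prose = ("I guess that they are doing this to increase the size of the "
         "message with words that are not like spam")

print(looks_like_word_salad(salad), looks_like_word_salad(prose))
```

In practice such a check would have to run on a section of the message rather than the whole body, since the random words are usually appended to otherwise ordinary text.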
On Wed, 31 Dec 2003 16:32:26 +0000, Dylan wrote:
On Wednesday 31 December 2003 14:51 pm, Nick Selby wrote: <SNIP>
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Yes, that sounds quite plausible. What they are also doing is skewing the ratio of content to function words, in a grammatical sense. The ratio is relatively constant for a given language (for English, approx. 25-35% function words, like "it", "is", "that", ...), so a list of purely content words would likely be easy to identify, since it would score 0% function words.
Have a look at this for more advanced Bayesian filters; it takes a word's neighborhood into account too. Smarter tokenization systems may help as well.
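One simple way to take neighborhood into account is to feed the filter word pairs as well as single words. The sketch below is an assumption about what such a tokenizer could look like, not the method of the filter referred to above; the function name is hypothetical.

```python
import re


def tokens_with_neighbours(text):
    """Return single words followed by adjacent word pairs (bigrams)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


print(tokens_with_neighbours("Buy cheap pills"))
```

Adjacent pairs of randomly chosen dictionary words are unlikely to have appeared in anyone's legitimate mail, so they would contribute little "non-spam" evidence even when the individual words do.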
On Wednesday 2003-12-31 at 16:32 -0000, Dylan wrote:
I guess that they're doing this to increase the message size with un-spamlike words to decrease the ratio of spam-like words to non-spam-like words? Does this sound right?
Yes, that sounds quite plausible. What they are also doing is skewing the ratio of content to function words, in a grammatical sense. The ratio is relatively constant for a given language (for English, approx. 25-35% function words, like "it", "is", "that", ...), so a list of purely content words would likely be easy to identify, since it would score 0% function words.
I have updated to SA 2.61, and I think that it catches more of those than the version that came with SuSE 8.2 did. It has increased the points given to the Bayesian filter:

5.4 BAYES_99

--
Cheers,
Carlos Robinson
participants (8)
- Bruce Marshall
- Carlos E. R.
- Dylan
- Gary
- Ivan Sergio Borgonovo
- Jerome Lyles
- Nick Selby
- W.D.McKinney