You may input a full SpamAssassin Rules File (.cf) or just a single rule.
Input your rule or rules file in the Source Rules File text field below.
The script will create obfuscated versions of body and header (Subject only) rules.
If you are trying just a single rule, be sure to enter a valid, complete rule.
Include the type (eg: body or header).
Include the rule name (eg: MY_MONEY_SUBJ).
If you are trying a header rule, include the "Subject =~" part (currently the script supports only Subject for header rules).
Include the rule regexp between slashes (eg: "/money/").
Valid example 1: "header MY_MONEY_SUBJ Subject =~ /\bmake money\b/i"
Valid example 2: "body MY_VIAGRA /\bviagra\b/i"
Putting word boundaries around the source word seems prudent ("\b").
You may also include a describe and/or score declaration.
Do not pre-obfuscate your rules (eg: "/m.?0.?n.?[e3].?y/").
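The requirements listed above for a single rule can be sketched as a small checker. This is only an illustration in Python, not the script's own parser: the names RULE_RE and check_rule are hypothetical, and treating ".?" as a sign of pre-obfuscation is an assumed heuristic.

```python
import re

# Hypothetical checker for the single-rule format described above; this is
# not the script's own parser, just a sketch of the stated requirements.
RULE_RE = re.compile(
    r"""^(?P<type>body|header)\s+             # rule type
        (?P<name>[A-Za-z0-9_]+)\s+            # rule name
        (?:(?P<header>\w+)\s*=~\s*)?          # header name, for header rules
        /(?P<regex>.+)/(?P<flags>[a-z]*)\s*$  # regexp between slashes
    """,
    re.VERBOSE,
)

def check_rule(line):
    """Return a list of problems found in one rule line (empty if none)."""
    m = RULE_RE.match(line.strip())
    if m is None:
        return ["not a complete body/header rule"]
    problems = []
    if m["type"] == "header" and m["header"] != "Subject":
        problems.append('header rules need "Subject =~"')
    if m["type"] == "body" and m["header"] is not None:
        problems.append("body rules take no header name")
    if r"\b" not in m["regex"]:
        problems.append(r"consider word boundaries (\b)")
    if ".?" in m["regex"]:  # assumed heuristic for pre-obfuscated input
        problems.append("rule looks pre-obfuscated")
    return problems

print(check_rule(r"header MY_MONEY_SUBJ Subject =~ /\bmake money\b/i"))
print(check_rule(r"body MY_MONEY /m.?0.?n.?[e3].?y/"))
```

Both valid examples above pass with no problems reported, while the pre-obfuscated example is flagged.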
Obfuscated words of length 3 or less will generate false hits on binary data, such as images or attached documents. I recommend turning off gap detection in this case. In a later version, I may change the gap detection for 2 and 3 letter words to be more restrictive.
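To see why short words are risky, the following Python sketch counts hits of a gap-tolerant pattern in pseudo-random bytes standing in for an image or attachment. The gap style used here (one optional arbitrary byte between letters) and the helper names are assumptions for illustration; the patterns the script generates are more elaborate.

```python
import random
import re

def gap_pattern(word):
    """Allow one optional arbitrary byte between letters (assumed gap style)."""
    letters = [re.escape(ch) for ch in word]
    return re.compile(".?".join(letters).encode("ascii"), re.IGNORECASE)

def count_hits(pattern, data):
    """Count non-overlapping matches of a bytes pattern in data."""
    return sum(1 for _ in pattern.finditer(data))

# Pseudo-random bytes standing in for an image or attached document.
rng = random.Random(0)
size = 1_000_000
blob = rng.getrandbits(8 * size).to_bytes(size, "little")

short_hits = count_hits(gap_pattern("rx"), blob)
long_hits = count_hits(gap_pattern("viagra"), blob)
print("2-letter word:", short_hits, "hits; 6-letter word:", long_hits, "hits")
```

Each additional letter makes an accidental match far less likely, which is why the problem is limited to very short words.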
Feedback
Comments? Questions? Problems with the script, or this page? Leave feedback here
Date: 2006-09-21
Author: Olivier
Subject: Reverse character range with Unicode
Hi,
I just noticed that when using Unicode (not using -u), the generated rules contain character ranges that are in reverse order:
body LOCAL_OBFU_CSIM_VIAGRA /(?!\bviagra\b)(?:\b[vu]|\B(?:\\\/|\xCE\xBD))[\x01-\x2F\x3A-\x40\x5B-\x60\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|(?:\xC4[\xB0-\xAF]|\xC4[\xAF-\xAE]|...
See the [\xB0-\xAF], then the [\xAF-\xAE], and so on. (I truncated the rule.)
So SA will complain that
[2677] warn: config: invalid regexp for rule LOCAL_OBFU_CSIM_VIAGRA: /(?!\bviagra\b)(?:\b[vu]|\B(?:\\\/|\xCE\xBD))[\x01-\x2F\x3A-\x40\x5B-\x60\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|(?:\xC4[\xB0-\xAF]|\xC4[\xAF-\xAE]|...
Invalid [] range "\xB0-\xAF" in regex; marked by <-- HERE in m/(?i)(?!\bviagra\b)(?:\b[vu]|\B(?:\\\/|\xCE\xBD))[\x01-\x2F\x3A-\x40\x5B-\x60\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|(?:\xC4[\xB0-\xAF <-- HERE ]|\xC4[\xAF-\xAE]|...
When using -u (no Unicode), the rule is shorter and there is no such character range trouble.
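Reversed ranges like the ones reported above can be spotted before the rules reach SpamAssassin. The Python sketch below is a naive scanner, not part of the script: the names HEX_RANGE and reversed_ranges are hypothetical, and it only looks at "\xHH-\xHH" pairs in the rule text without checking that they sit inside a bracketed class.

```python
import re

# Matches "\xHH-\xHH" pairs as they appear in the generated rule text.
HEX_RANGE = re.compile(r"\\x([0-9A-Fa-f]{2})-\\x([0-9A-Fa-f]{2})")

def reversed_ranges(rule_text):
    """Return the hex ranges whose start byte is greater than the end byte."""
    bad = []
    for m in HEX_RANGE.finditer(rule_text):
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start > end:
            bad.append(m.group(0))
    return bad

# Fragment copied from the truncated rule in the report above.
fragment = r"[\xCC-\xCF\xEC-\xEF\xA6]|(?:\xC4[\xB0-\xAF]|\xC4[\xAF-\xAE]|"
print(reversed_ranges(fragment))
```

On this fragment the scanner reports the two ranges named in the post and leaves the correctly ordered ones alone.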
Date: 2007-01-31
Author: andrea
Subject: new kind of spam
How would you fight against this?: "Good day,
Via_grra $1, 80 Cia_aliss $3, 00 Levi_trra $3, 35
http://www.progenyid.*com ( Important ! Remove "*" )
-- But for all the notice anyone took, he might just as well not have answered at all. Im tired! he bellowed finally, after nearly half an hour. No,"
Consider that VIAGRA can be written in multiple ways, like VIdsfsfA_AxxxGRA, and it's pretty much impossible to write a rule for so many combinations. The link often changes, and the last part looks like a random excerpt from a book. Any ideas? Thanks.
Date: 2007-02-06
Author: Chris Thielen
Subject: Re: new kind of spam
> How would you fight against this?:
> "Good day,
>
> Via_grra $1, 80
> Cia_aliss $3, 00
> Levi_trra $3, 35
Hi Andrea,
Catching obfuscations like this is something of a cat-and-mouse game. There is a balance to be made between false positives and false negatives on obfuscated text. In this particular case, I suggest setting the multi-gap width to 2 or 3 (-m 3), and setting duplicate chars to 2 or 3 (-d 3). You may need to also enable simple gap (-s).
Turning all these options on will increase the leniency of the matches, but has more potential to cause false positives. It also won't be able to capture the obfuscation examples you have mentioned at the bottom, but again there is a balance that must be made.
SpamAssassin is designed to use a variety of heuristics, not just one or two. I highly recommend getting a working Bayes system and enabling the network tests (URIBL, DNSBL, etc).
Chris
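The trade-off described in the reply above can be illustrated with a small Python sketch. It assumes that "multi-gap width" means up to N non-alphanumeric characters between letters and that "duplicate chars" means each letter may repeat up to N times; the function lenient_pattern is hypothetical and is not the pattern builder the script uses.

```python
import re

def lenient_pattern(word, max_gap=3, max_dup=3):
    """Allow repeated letters and short non-alphanumeric gaps between letters."""
    gap = r"[\W_]{0,%d}" % max_gap
    letters = [re.escape(ch) + "{1,%d}" % max_dup for ch in word]
    return re.compile(gap.join(letters), re.IGNORECASE)

viagra = lenient_pattern("viagra")
levitra = lenient_pattern("levitra")
cialis = lenient_pattern("cialis")

for pattern, text in [
    (viagra, "Via_grra $1, 80"),
    (levitra, "Levi_trra $3, 35"),
    (cialis, "Cia_aliss $3, 00"),   # the "a" is repeated across the gap
    (viagra, "VIdsfsfA_AxxxGRA"),   # letters inserted between the real ones
]:
    print(text, "->", "hit" if pattern.search(text) else "miss")
```

Under these assumptions the first two samples hit and the last two miss, which matches the point that more lenient settings still cannot catch every variant.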
Date: 2007-02-14
Author: andrea
Subject: Re: new kind of spam
> > How would you fight against this?:
> > "Good day,
> >
> > Via_grra $1, 80
> > Cia_aliss $3, 00
> > Levi_trra $3, 35
>
> Hi Andrea,
>
> Catching obfuscations like this is something of a cat-and-mouse game. There is a balance to be made between false positives and false negatives on obfuscated text. In this particular case, I suggest setting the multi-gap width to 2 or 3 (-m 3), and setting duplicate chars to 2 or 3 (-d 3). You may need to also enable simple gap (-s).
>
> Turning all these options on will increase the leniency of the matches, but has more potential to cause false positives. It also won't be able to capture the obfuscation examples you have mentioned at the bottom, but again there is a balance that must be made.
>
> SpamAssassin is designed to use a variety of heuristics, not just one or two. I highly recommend getting a working bayes system and enabling the network tests (URIBL, DNSBL, etc).
>
> Chris
Thanks for your answer. Actually, I focused on the "*" to catch those mails because it seems to be repeated, and I wrote this rule:
body COTUS_ASTERISCO /[\w\s]{0,5}\x22\W\x22[\w\s]{0,5}/i
It does a good job, but it also blocks some non-spam mails. I usually check my rules with RegexBuddy, and when I test the false-positive mails against COTUS_ASTERISCO, it doesn't return anything. It looks like I'm writing rules in a different language than the one SpamAssassin uses. In addition, I can't find a reliable way to test rules on Windows (I can't use the -lint option). Your tester seems good, but it doesn't let me test multiple lines. Thanks for your time.
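One possible source of the false positives can be seen by running the posted regexp over a few sample lines. The sketch below uses Python's re module, whose syntax for this particular expression should behave much like Perl's, though that equivalence is an assumption; the sample lines other than the spam line are invented for illustration.

```python
import re

# The regexp from the rule above, copied as posted.
COTUS_ASTERISCO = re.compile(r'[\w\s]{0,5}\x22\W\x22[\w\s]{0,5}', re.IGNORECASE)

samples = [
    'http://www.progenyid.*com ( Important ! Remove "*" )',  # the spam line
    'Reply "yes" "no" or "maybe"',  # invented: two quoted words side by side
    'Nothing quoted here',          # invented: no double quotes at all
]
for text in samples:
    m = COTUS_ASTERISCO.search(text)
    print("hit " if m else "miss", "<-", text)
```

Because both {0,5} groups may match nothing, the rule hits whenever a single non-word character sits between two double quotes. \W also matches a space, so a closing quote followed by a space and an opening quote is enough to trigger it.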
Date: 2008-04-04
Author: Benjamin
Subject: error on page
Hi,
Your script sounds very interesting, but I can't make this site work: http://sandgnat.com/cmos/cmos.jsp
I get the following error all the time:
Line: 55 Char: 4 Error: Invalid range in character set Code: 0 URL: http://sandgnat.com/cmos/cmos.jsp? ....