Below are some rules and other settings I added to my user_prefs file, the per-user configuration file of the excellent mail-classifying Perl program SpamAssassin. Note that I call it a classifying program, not a filter or a spam-blocking program: SpamAssassin adds information to an email about its class, spam or ham, together with the score of the email. This information can then be used to move email to separate mail folders.
Every time you modify the user_prefs file, run spamassassin in lint mode, e.g.:
spamassassin --lint
SpamAssassin can also be installed on a computer running a Microsoft Windows (Win32) operating system. Michael Bell has written a HOW-TO on porting SpamAssassin.
I give each rule the prefix LR_, meaning local rule, to distinguish it from the rules that come with SpamAssassin. Each rule consists of three parts: a Perl regular expression defining the actual rule, a description, and a score.
It should be noted that the default score of each rule in SpamAssassin is the result of a long run of a genetic algorithm, which assigns the scores in such a way that the number of false positives and false negatives is minimal for a huge set of spam and non-spam (ham). Adding rules and/or changing score values can disturb this fine balance and result in more classification errors. Hence the score value I use in each rule is an example only, and you should always verify each mail reported as spam before deletion. Don't throw away mail classified as spam without looking at it.
Perl programmers should take care not to put a terminating ; after the pattern match by accident.
header LR_SUBJECT_VERY_LONG Subject =~ /.{500}/
describe LR_SUBJECT_VERY_LONG Subject contains a lot of characters
score LR_SUBJECT_VERY_LONG 1.5
This rule is matched if a subject contains at least 500 characters.
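The warning about the terminating ; can be illustrated with this rule. The fragment below is only a sketch: the second line is commented out on purpose and shows the Perl habit to avoid, since the semicolon is not part of the rule syntax. Running spamassassin --lint after each change is the way to check that a line is accepted.

```
# Correct: the line ends right after the closing / of the pattern.
header LR_SUBJECT_VERY_LONG Subject =~ /.{500}/

# Wrong (Perl habit), shown commented out: note the trailing semicolon.
# header LR_SUBJECT_VERY_LONG Subject =~ /.{500}/;
```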
header LR_SUBJECT_JOHN Subject =~ /^john/i
describe LR_SUBJECT_JOHN Subject: starts with John
score LR_SUBJECT_JOHN 1
This rule is matched if a subject starts with john. The case does not matter thanks to the i option after the closing / in the pattern match.
A lot of spammers start their subject with the part in front of the @ taken from your email address. I have several rules like the one above, including "postmaster", "hostmaster" and even "ostmaster". I suspect the latter is caused by a flaw in a bot looking for email addresses.
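As a sketch, two such rules could look like the fragment below. The rule names and scores are made up for illustration and follow the pattern of LR_SUBJECT_JOHN; replace the words with the local parts of your own addresses.

```
# Hypothetical rule names and example scores.
header   LR_SUBJECT_POSTMASTER Subject =~ /^postmaster/i
describe LR_SUBJECT_POSTMASTER Subject: starts with postmaster
score    LR_SUBJECT_POSTMASTER 1

header   LR_SUBJECT_OSTMASTER Subject =~ /^ostmaster/i
describe LR_SUBJECT_OSTMASTER Subject: starts with ostmaster
score    LR_SUBJECT_OSTMASTER 1
```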
header LR_SUBJECT_RND Subject =~ /%RND_UC_CHAR\[2-8\]/
describe LR_SUBJECT_RND Subject contains %RND_UC_CHAR[2-8]
score LR_SUBJECT_RND 2
I see this one appear now and then. Probably the result of a badly configured spam sending program or a spammer not reading the manual.
# Example: Subject: (857)Scientificaly proven to work(699)
header LR_SUBJECT_NUMBER_FUN Subject =~ /^\(\d+\).+\(\d+\)$/
describe LR_SUBJECT_NUMBER_FUN Subject starts and ends with a number between ()
score LR_SUBJECT_NUMBER_FUN 3
header LR_VIRUS_IT_1 Subject =~ /^Attenzione: Catturato un Virus$/
describe LR_VIRUS_IT_1 Virus warning
score LR_VIRUS_IT_1 6
Very annoying those automatically generated warnings to innocent bystanders. I am keeping a list of clueless senders of those messages.
uri LR_URI_NUMERIC_ENDING m|^https?://.+?\d{4,}$|i
describe LR_URI_NUMERIC_ENDING Ends in a number of at least 4 digits
score LR_URI_NUMERIC_ENDING 1
A URI ending in four or more digits matches this rule. Links like that are often used for non-working "unsubscribe" links or in tracking systems.
I used the m|...| form instead of /.../ to make the rule more readable. Otherwise each forward slash in the pattern has to be escaped with a backslash.
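For comparison, here is a sketch of the same pattern written with /.../ delimiters. It is an alternative to the rule above, not an addition, so keep only one of the two forms in user_prefs.

```
# Same pattern as LR_URI_NUMERIC_ENDING, but with / as delimiter:
# each forward slash inside the pattern is escaped as \/
uri LR_URI_NUMERIC_ENDING /^https?:\/\/.+?\d{4,}$/i
```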
uri LR_URI_LC_SUB m|^https?://lc\.|i
describe LR_URI_LC_SUB Has lc as subdomain
score LR_URI_LC_SUB 1
The lc subdomain is used now and then in links by spammers. Note that the . must be escaped; otherwise it means "any character".
uri LR_URI_AMBER m|^https?://\S+?=[-_\+a-z0-9\.]+\@castleamber\.com|i
describe LR_URI_AMBER Uses @castleamber.com in URI
score LR_URI_AMBER 2
A lot of unsubscribe links contain your email address. I own the domain castleamber.com. Create a rule for your own domain. Note that the @ and the . in the domain name must be escaped by putting a \ in front.
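A minimal sketch of the same rule for another domain follows. Here example.com is only a placeholder for your own domain, and the rule name LR_URI_MYDOMAIN is made up for this example.

```
# example.com is a placeholder; the rule name and score are illustrative.
uri      LR_URI_MYDOMAIN m|^https?://\S+?=[-_\+a-z0-9\.]+\@example\.com|i
describe LR_URI_MYDOMAIN Uses @example.com in URI
score    LR_URI_MYDOMAIN 2
```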
uri LR_US_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.us(?:\/|$)/i
describe LR_US_TLD Contains a URL in the US top-level domain
score LR_US_TLD 1.5
Since there is a 'biz' rule I decided to add a 'us' rule as well. A similar rule can be made for the 'info' tld.
Fred Mobach (Systemhouse Mobach bv) warned me that there are non-profit organisations and government related organisations that use the 'us' tld so be careful when you add this rule. Bottom line: always check email marked as spam before you delete it.
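The 'info' variant could be sketched as below, simply by swapping the tld in the pattern of LR_US_TLD. The rule name LR_INFO_TLD and its score are assumptions for illustration, and the same caution about legitimate senders applies.

```
# Hypothetical rule name; the score is an example only.
uri      LR_INFO_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.info(?:\/|$)/i
describe LR_INFO_TLD Contains a URL in the info top-level domain
score    LR_INFO_TLD 1
```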
header __LR_RE_CAPS Subject =~ /^Re: [A-Z]{2,8}, \S+ \S+ \S+$/
header __LR_MPOP_WEBMAIL X-Mailer =~ /^mPOP Web-Mail 2\.19$/
body __LR_LONG_WORDS_5 /([a-z]{6,} ){5,}/
meta LR_EHOSTZZ (__LR_RE_CAPS && __LR_MPOP_WEBMAIL && __LR_LONG_WORDS_5)
describe LR_EHOSTZZ e-hostzz spam and look-a-likes
score LR_EHOSTZZ 6
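This is a new set of rules for the so-called e-hostzz spam and its look-alikes. The three tests whose names start with a double underscore are sub-rules: they have no score of their own and only count when the meta rule finds that all three match.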
I have several email addresses using castleamber.com as domain. Some are no longer in use and some are even invented by spammers. Hence I created the following list. Also some spammers mail spam to one address and CC the same mail to others within the same domain. If one of the blacklisted addresses is used the mail gets a very high score. So it seems useful to create a few bogus addresses to be used as spam traps.
blacklist_to yse@castleamber.com
blacklist_to google@castleamber.com
blacklist_to master@castleamber.com
blacklist_to ostmaster@castleamber.com
blacklist_to aster@castleamber.com
blacklist_to ohn@castleamber.com
blacklist_to uses@castleamber.com
blacklist_to contains@castleamber.com
The last two were harvested from this page, where I used "contains" as a section title and "uses" in the section itself. The email address harvesting bot somehow dropped the space in both cases.
Some domain names have very obvious names. The following is an excerpt taken from my user_prefs. You can download a list that I update daily.
blacklist_from *@advertisingbymail.com
blacklist_from *@bestbuyuu.com
blacklist_from *@getquickernews.com
blacklist_from *@getquicknews.com
blacklist_from *@myperfectstuff.com
blacklist_from *@yourgreateststuff.com
blacklist_from *@greatnewsnow.com
blacklist_from *@mygreateststuff.com
blacklist_from *@myquicknews.com
blacklist_from *@getwebrx.com
blacklist_from *@purchze3.com
blacklist_from *@thiergreatstuff.com
blacklist_from *.greatoffersbymail.com
Note that the * acts as a wildcard that can be put in front of a . too. For example, greatoffersbymail.com uses several different subdomains to send spam from, but all of them end in .greatoffersbymail.com.
I keep a list of spam-sending domains to be used for blacklisting. You can download the list in the 'blacklist_from' format as well.
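The two wildcard forms can be put side by side in a short sketch. Here example.net is a placeholder domain, not one taken from my list.

```
# Any sender with an address directly at example.net:
blacklist_from *@example.net

# Any sender at a subdomain, such as user@mail.example.net:
blacklist_from *.example.net
```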