John Bokma Fighting spam
freelance Perl programmer

Classifying spam

Spam or Ham

I use SpamAssassin to classify my email as spam or non-spam (ham). The next step is to move ham and spam into separate mail folders or, even better, to use several folders for spam and ham, depending on the score SpamAssassin assigns.

SpamAssassin adds several headers that make choosing the right mailbox easy. For example, the following headers come from a ham message processed by SpamAssassin:

X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on madhuri
X-Spam-Level: ***
X-Spam-Status: No, hits=3.8 required=6.0 tests=HTML_30_40=0.837,
    HTML_MESSAGE=1,LR_AT_CASTLEAMBER=2 autolearn=no version=2.60

Note that the number of asterisks (*) in the X-Spam-Level header equals the hits value (the score) rounded down: a score of 3.8 gives three asterisks.
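To make the relation between score and asterisks concrete, here is a small Python sketch that builds an X-Spam-Level style string from a score. It is an illustration of the rule described above, not SpamAssassin's own code; the function name spam_level is hypothetical, and returning an empty string for negative scores is an assumption of this sketch.

```python
import math


def spam_level(hits: float, char: str = "*") -> str:
    """Build an X-Spam-Level style string: one character per whole point.

    Hypothetical helper; negative scores give an empty string (assumption).
    """
    count = max(0, math.floor(hits))  # score rounded down
    return char * count


print(spam_level(3.8))        # ***
print(len(spam_level(13.5)))  # 13, as in the spam example
```
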

When an email is classified as spam, SpamAssassin adds the following headers:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on madhuri
X-Spam-Level: *************
X-Spam-Status: Yes, hits=13.5 required=6.0 tests=HTML_40_50=0.87,
    HTML_MESSAGE=1,LR_US_TLD=1,MIME_HTML_ONLY=0.666,
    USER_IN_BLACKLIST_TO=10 autolearn=no version=2.60
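Because these are ordinary mail headers, any program that can read headers can act on them. As a sketch, the Python standard library's email module can read X-Spam-Flag and X-Spam-Level from a raw message. The message below is built from the spam example above; its Subject line and body are placeholders, and the helper names is_spam and star_count are hypothetical.

```python
from email import message_from_string

# Raw message using the spam headers shown above; Subject and body are
# placeholders added so the text parses as a complete message.
RAW_SPAM = f"""\
X-Spam-Flag: YES
X-Spam-Level: {'*' * 13}
X-Spam-Status: Yes, hits=13.5 required=6.0 tests=HTML_40_50=0.87,
    HTML_MESSAGE=1,LR_US_TLD=1,MIME_HTML_ONLY=0.666,
    USER_IN_BLACKLIST_TO=10 autolearn=no version=2.60
Subject: placeholder

body
"""


def is_spam(msg) -> bool:
    # In the examples above, only the spam message carries X-Spam-Flag: YES
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def star_count(msg) -> int:
    # Number of asterisks in X-Spam-Level, i.e. the score rounded down
    return msg.get("X-Spam-Level", "").count("*")


spam = message_from_string(RAW_SPAM)
print(is_spam(spam), star_count(spam))  # True 13
```
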

Note that both examples show the score of each rule that fired. This is not the default setting. You can enable it, for example in the user_prefs file, as follows:

add_header all Status "_YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTSSCORES_ autolearn=_AUTOLEARN_ version=_VERSION_"

This must be entered as one long line. I changed the value of "tests" to _TESTSSCORES_ so that I can read the score of each rule, even when an email is classified as ham. Remember to always run spamassassin with the --lint option after you change a configuration file.
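With _TESTSSCORES_ in the template, the per-rule scores can also be read by a script. The following Python sketch parses an X-Spam-Status value in the format produced by the add_header line above and reports the rule that contributed most. It assumes that exact field order (in particular that tests= is followed by autolearn=); the function name parse_status is hypothetical, and other templates or SpamAssassin versions may need a different pattern.

```python
import re

# The two X-Spam-Status values shown above, folding included.
HAM_STATUS = ("No, hits=3.8 required=6.0 tests=HTML_30_40=0.837,\n"
              "    HTML_MESSAGE=1,LR_AT_CASTLEAMBER=2 autolearn=no version=2.60")
SPAM_STATUS = ("Yes, hits=13.5 required=6.0 tests=HTML_40_50=0.87,\n"
               "    HTML_MESSAGE=1,LR_US_TLD=1,MIME_HTML_ONLY=0.666,\n"
               "    USER_IN_BLACKLIST_TO=10 autolearn=no version=2.60")


def parse_status(value: str) -> dict:
    """Parse an X-Spam-Status value written by the add_header template above.

    Assumes the template's field order: tests= followed by autolearn=.
    """
    flat = " ".join(value.split())          # unfold continuation lines
    verdict, _, rest = flat.partition(",")  # "Yes" or "No"
    hits = float(re.search(r"hits=(-?[\d.]+)", rest).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", rest).group(1))

    tests = {}
    match = re.search(r"tests=(.*?)\s+autolearn=", rest)
    if match:
        for item in match.group(1).split(","):
            name, _, score = item.strip().partition("=")
            # Entries without "=score" are skipped rather than guessed at.
            if name and score:
                tests[name] = float(score)

    return {"spam": verdict == "Yes", "hits": hits,
            "required": required, "tests": tests}


for status in (HAM_STATUS, SPAM_STATUS):
    info = parse_status(status)
    top = max(info["tests"], key=info["tests"].get)  # highest-scoring rule
    print(info["spam"], info["hits"], top, info["tests"][top])
# False 3.8 LR_AT_CASTLEAMBER 2.0
# True 13.5 USER_IN_BLACKLIST_TO 10.0
```

The per-rule scores add up to roughly the hits value (0.837 + 1 + 2 = 3.837 for the ham example), which is a quick sanity check on the parse.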

I use Thunderbird as my mail client, so below I describe how to create message filters in it. Most email programs filter email in a similar way.

Open the "Message Filters" dialog by selecting "Message Filters..." in the Tools menu. Press the "New..." button. The first drop-down menu shows "Subject". Select the "Customize..." option and add the header X-Spam-Level. The version of Thunderbird I am using didn't let me finish the rule after that, so I canceled the filter rules dialog and pressed "New..." again.

In the second drop-down menu, "contains" is the default and is the right choice for this rule. In the third field, enter a run of asterisks, for example 10 of them. The rule then matches any message whose X-Spam-Level header contains at least 10 asterisks, that is, any message with a score of 10 or more. Select the "Move to folder" action and create a new folder, for example "Serious Spam". Finally, give the filter a name, for example "Serious Spam".
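Why does a "contains" test work as a "score at least N" test? Because the header value is nothing but a run of asterisks, it contains N asterisks exactly when it is at least N long. The Python sketch below models that condition; it is not how Thunderbird implements filters, and the helper names contains_rule and level_for are hypothetical.

```python
def contains_rule(level_header: str, stars: int) -> bool:
    """Model of a filter condition: X-Spam-Level contains `stars` asterisks."""
    return ("*" * stars) in level_header


def level_for(score: float) -> str:
    # X-Spam-Level as described above: one asterisk per whole point
    return "*" * max(0, int(score))


for score in (3.8, 9.9, 10.0, 13.5):
    print(score, contains_rule(level_for(score), 10))
# 3.8 False
# 9.9 False
# 10.0 True
# 13.5 True
```
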

Create several rules, each with fewer asterisks. You can classify your ham with these rules as well. For example, if the required score is 5, a message with four asterisks is close to being spam. You can use such a rule to move email to a folder that you don't read often.

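Once you have several rules, it is very important to get their order right. The rules are evaluated from top to bottom, so the most specific rule, the one requiring the most asterisks, should come first. Otherwise a message with 13 asterisks would also match a four-asterisk rule placed above it and end up in the wrong folder.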
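The effect of rule order can be sketched in a few lines of Python. This models the filter list as first match wins, assuming, as the ordering advice above implies, that once a rule moves a message the later rules no longer act on it. The thresholds and the folder names other than "Serious Spam" are hypothetical examples, and choose_folder is not a Thunderbird API.

```python
# Hypothetical filter list, most specific (most asterisks) first.
FILTERS = [
    (10, "Serious Spam"),
    (6, "Spam"),
    (4, "Almost Spam"),
]


def choose_folder(level_header: str, filters=FILTERS, default: str = "Inbox") -> str:
    """Return the folder of the first filter whose "contains" test matches."""
    for stars, folder in filters:           # evaluated top down
        if ("*" * stars) in level_header:  # same test as a "contains" rule
            return folder                   # first match wins
    return default


print(choose_folder("*" * 13))                           # Serious Spam
print(choose_folder("*" * 13, list(reversed(FILTERS))))  # Almost Spam
```

The second call shows the problem with the wrong order: with the four-asterisk rule on top, a message scoring 13 is caught by the least specific rule first.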
