I recently switched from SpamAssassin to DSpam. SpamAssassin worked very well for me, but it is slow and uses most of my CPU while it runs. I like to be able to do other things while my mail is being fetched.

DSpam's website has some good documentation for DSpam and the single user, but it is all old, and not terribly helpful for my setup. I'll explain my setup, and what I did to make it work.

Credit: Some of this comes from the DSpam page; specifically, the mbox SpamAssassin conversion is from a post on DSpam for SA Users.

My Setup

I have two primary machines, my laptop and my desktop. Both run Debian Testing (etch). My home directories are synchronized with unison, and I only use one machine at a time. Generally I want my experience to be exactly the same on both machines. SpamAssassin stores its Bayesian db in your home directory, so I can keep my spam filters updated and working when I'm on either machine. Since I store my mail locally, I want to be sure I'm training the same version of DSpam with the same data, so they're up to date with each other.

Setting up DSpam

We're going to want DSpam to store its db in your home directory. This should work in a multi-user environment as well, but I have not tested it. To configure:

./configure --sysconfdir=/etc --enable-homedir --with-dspam-group=dspam \
 --with-dspam-home-group=dspam

You'll need to have made the dspam group and put your user in it. After configuring, do the standard make and make install.
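If you're not sure whether the group is in place, here's a rough sketch of how I'd check. It assumes a Linux system with the usual groupadd and usermod tools. The user_in_group helper is just a name I made up for illustration, and username is a placeholder.

```shell
# Run these as root (placeholder username; adjust for your system):
#   groupadd dspam
#   usermod -a -G dspam username
# The new group only applies to sessions started after the change.

# Hypothetical helper: succeeds if user $1 belongs to group $2.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example:
#   user_in_group username dspam || echo "username is not in the dspam group yet"
```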

Now you need to get the database created. The easiest way is to run dspam_stats username, replacing username with your username. This should be done as your normal user. Afterwards you should have a directory ~/.dspam.

Edit your /etc/dspam.conf to have the following lines:

Trust username
Preference "signatureLocation=headers"

Trust will give your user account access to do most DSpam things without being root. I set the DSpam signature location to be in my headers. This is only good if you plan to fix errors by piping the entire message (headers and all) to a local dspam process. If you plan to retrain in any other fashion, and your mail client doesn't handle preserving headers well, don't change the signature location.
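A quick way to confirm the Trust line made it into the config is sketched below. The has_trust_line helper is hypothetical (my own name, not part of DSpam), and it assumes the config uses the simple "Trust username" form shown above.

```shell
# Hypothetical helper: succeeds if config file $2 contains a
# "Trust <user>" line for user $1.
has_trust_line() {
    awk -v u="$1" '$1 == "Trust" && $2 == u { found = 1 } END { exit !found }' "$2"
}

# Example:
#   has_trust_line username /etc/dspam.conf || echo "add: Trust username"
```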

Training DSpam from SpamAssassin Messages

When SpamAssassin scans a message it adds headers, and if it finds spam it might attach the original message to a warning message. We don't want DSpam to consider SpamAssassin tokens as part of a spam/not-spam decision, so we need to clean up our messages. There are two methods, depending on whether you're using mbox or maildir.

mbox

You'll want to make a cleaned version of each mbox you want DSpam to consider for training purposes. The command to run for a spam box is:

formail -s spamassassin -d < spam.inbox > cleaned.spam.inbox
dspam_corpus --addspam username cleaned.spam.inbox

If the mbox is not a spam box then:

formail -s spamassassin -d < other.inbox > cleaned.other.inbox
dspam_corpus username cleaned.other.inbox

This will clean your mbox and then train DSpam correctly. dspam_stats -H username should report corpus training correctly.
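Before training, it doesn't hurt to check that the cleaned mbox still holds as many messages as the original. Here is a minimal sketch, assuming a conventional mbox where every message starts with a "From " separator line and body lines that begin with "From " have been escaped by whatever wrote the mbox. Both helper names are hypothetical.

```shell
# Hypothetical helper: count messages in an mbox by counting "From " separator lines.
mbox_count() {
    grep -c '^From ' "$1"
}

# Hypothetical helper: succeeds if both mboxes hold the same number of messages.
same_message_count() {
    [ "$(mbox_count "$1")" -eq "$(mbox_count "$2")" ]
}

# Example:
#   same_message_count spam.inbox cleaned.spam.inbox || echo "message count changed"
```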

maildir

Maildir is a bit trickier. You'll want to make a temporary set of maildirs for each box you plan to train. For example:

mkdir -p /tmp/spam/{cur,new}
mkdir -p /tmp/inbox/{cur,new}

Once you've done that, you'll get to convert over the messages. Use the following bash command to pull that off:

for i in ~/maildir/spam/new/*; \
 do (spamassassin -d < "$i" > "/tmp/spam/new/$(basename "$i")"); \
 done

Change "~/maildir/spam/new/" and "/tmp/spam/new/" as needed. This command takes about a second per message, so I'd go watch TV or something.
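If you have several boxes to convert, the same idea can be wrapped in a function that handles both cur and new. This is only a sketch: clean_maildir and the SA_FILTER variable are names I made up, and the filter defaults to the spamassassin -d command used above.

```shell
# Hypothetical helper: copy every message from maildir $1 into maildir $2,
# passing each one through a filter command on the way.
# SA_FILTER is an assumed variable; it defaults to "spamassassin -d".
clean_maildir() {
    src=$1
    dst=$2
    for sub in cur new; do
        mkdir -p "$dst/$sub"
        for msg in "$src/$sub"/*; do
            [ -f "$msg" ] || continue   # skip the unexpanded glob in an empty directory
            ${SA_FILTER:-spamassassin -d} < "$msg" > "$dst/$sub/$(basename "$msg")"
        done
    done
}

# Example:
#   clean_maildir ~/maildir/spam /tmp/spam
```

Quoting the file names matters here, since maildir names in cur usually carry flag suffixes such as ":2,S".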

Now that you've converted your messages, you need to train DSpam on them. Once again we'll use a nice bash for loop:

for i in /tmp/spam/new/*; \
 do (cat "$i" | dspam --user username --class=spam \
  --source=corpus --mode=teft --feature=chained,noise); \
 done

Or in the case of non-spam training:

for i in /tmp/inbox/new/*; \
 do (cat "$i" | dspam --user username --class=innocent \
  --source=corpus --mode=teft --feature=chained,noise); \
 done

After this, dspam_stats -H username should show a good number of corpusfed messages.
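The two training loops differ only in the directory and the class, so they could be folded into one function. A minimal sketch, assuming the same dspam flags as the loops above; train_dir is a hypothetical name, and it stops at the first message dspam rejects.

```shell
# Hypothetical helper: feed every message file in directory $1 to dspam as
# class $2 ("spam" or "innocent") for user $3.
train_dir() {
    dir=$1
    class=$2
    user=$3
    count=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        dspam --user "$user" --class="$class" \
            --source=corpus --mode=teft --feature=chained,noise < "$msg" || return 1
        count=$((count + 1))
    done
    echo "$count messages fed as $class"
}

# Example:
#   train_dir /tmp/spam/new spam username
#   train_dir /tmp/inbox/new innocent username
```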

Procmail Configuration

Procmail configuration is simple and straightforward:

:0fw
| dspam --user spr --stdout --deliver=innocent,spam

:0
* ^X-DSPAM-Result: spam
$HOME/Mail/spam/

Adjust as needed.
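The second recipe depends on the X-DSPAM-Result header that the first one adds. To see by hand what a given message gets tagged as, something like the sketch below should work. dspam_result is a hypothetical helper of mine that only looks at the header block, and message.eml and username are placeholders.

```shell
# Hypothetical helper: print the value of the first X-DSPAM-Result header
# found in the header block of the message on stdin.
dspam_result() {
    awk '
        /^$/ { exit }                                        # headers end at the first blank line
        tolower($1) == "x-dspam-result:" { print $2; exit }  # header names are case-insensitive
    '
}

# Example (same dspam flags as the procmail recipe):
#   dspam --user username --stdout --deliver=innocent,spam < message.eml | dspam_result
```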

Mutt Configuration

I added the following to my .muttrc so I could easily re-train when I've had errors:

# DSpam management
macro   index   "\cx"   "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=spam --source=error\n<delete-message><enter-command>set wait_key=yes\n" 'Spam Message'
macro   pager   "\cx"   "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=spam --source=error\n<delete-message><enter-command>set wait_key=yes\n" 'Spam Message'
macro   index   "\ca"   "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=innocent --source=error\n<enter-command>set wait_key=yes\n<save-message>=" 'Non-Spam Message'
macro   pager   "\ca"   "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=innocent --source=error\n<enter-command>set wait_key=yes\n<save-message>=" 'Non-Spam Message'

So now when I hit ctrl-x, a message is re-trained as spam and deleted. When I hit ctrl-a, a message is re-trained as innocent and then I'm prompted to save it to a mailbox.
