[TriLUG] spam filtering

Benjamin Reed ranger at befunk.com
Sun Aug 17 11:53:47 EDT 2003


For those who are interested, I've implemented a new spam-filtering
system that is *rocking*.  Since I set it up yesterday morning, I've
had zero false positives and it's caught every single piece of spam
I've gotten (out of about 700 e-mails received in that time).

It's based on the configuration described at this site:
   http://www.cs.wisc.edu/~chalpin/project/spam.html

...but with a lot of changes.

The way it works is this:

1. Mail comes in and is delivered to procmail.
2. Things that I've manually marked for deletion go to /dev/null.
3. I pass the mail off to Corey's white/blacklist checker, which
    first checks tmda for white/grey/blacklisting, and then passes
    the mail to bogofilter and spamassassin.  The important thing
    to note here is that whenever something matches tmda's white
    or black list, the checker also feeds that mail to bogofilter
    as a training example, so bogofilter keeps getting better
    without any manual work.
4. I can now do my normal procmail filtering, putting things into
    the right folders for mailing lists and such.
5. Anything that's left gets passed to TMDA (the sentry-style mail
    filter).  If it passes that, it's delivered to a "fallthrough"
    folder so that I know to write a new rule to handle that type
    of mail properly (either add the user to the white/greylist,
    or filter the list, or whatever.)
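
A rough sketch of the ordering in the steps above, written as a plain
shell function rather than as real procmail recipes.  It only shows
which check wins when; it is not the actual config.  The argument
names and the "spam" and "tmda-pending" destinations are made up for
illustration (the post doesn't say where caught spam is filed).

```shell
#!/bin/sh
# route_message KILLED VERDICT FOLDER TMDA
#   KILLED   "yes" if a manual kill rule matched            (step 2)
#   VERDICT  "spam" or "ham" from the list/bogofilter/
#            spamassassin check                             (step 3)
#   FOLDER   folder picked by ordinary filtering rules,
#            or empty if no rule matched                    (step 4)
#   TMDA     "pass" or "hold", TMDA's decision              (step 5)
# Prints the destination.  "spam" and "tmda-pending" are
# hypothetical names, not taken from the original setup.
route_message() {
    killed=$1
    verdict=$2
    folder=$3
    tmda=$4

    # Step 2: manually killed mail is discarded outright.
    if [ "$killed" = yes ]; then echo /dev/null; return; fi

    # Step 3: anything judged spam stops here.
    if [ "$verdict" = spam ]; then echo spam; return; fi

    # Step 4: normal per-list / per-sender filing.
    if [ -n "$folder" ]; then echo "$folder"; return; fi

    # Step 5: leftovers go through TMDA; survivors land in
    # "fallthrough" as a reminder to write a proper rule.
    if [ "$tmda" = pass ]; then
        echo fallthrough
    else
        echo tmda-pending
    fi
}
```

For example, "route_message no ham '' pass" prints "fallthrough",
which is the case that calls for a new rule.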

So far, I've had *nothing* fall through to TMDA delivery, which is
good.  I want to avoid bothering people if at all possible.

The other thing I've done is set up some cron jobs to run Corey's
spam/ham script automatically, based on a drop folder.  I.e., I use
IMAP, and if a piece of spam slips through (a false negative), all I
need to do is drag it to my "blacklist" IMAP folder and the cron job
will find it, tell bogofilter that it's spam, and remove it.
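
A minimal sketch of what such a drop-folder job could look like.  This
is not the script from the site above; it assumes the IMAP folder is
stored as a Maildir (with new/ and cur/ subdirectories) and the
function name is invented.  The training command is passed in as
arguments; "bogofilter -s" registers a message read from stdin as
spam.

```shell
#!/bin/sh
# train_dropfolder DIR CMD [ARGS...]
# Feed every message file in the Maildir DIR to CMD on stdin, and
# delete the file only if CMD exits successfully.
train_dropfolder() {
    dir=$1
    shift
    for msg in "$dir"/new/* "$dir"/cur/*; do
        # An unmatched glob stays literal; skip anything that is
        # not a regular file.
        [ -f "$msg" ] || continue
        if "$@" < "$msg"; then
            rm -f -- "$msg"
        else
            echo "training failed, keeping $msg" >&2
        fi
    done
}

# Assumed usage from a cron job (the folder path is hypothetical):
#   train_dropfolder "$HOME/Maildir/.blacklist" bogofilter -s
```

Keeping the message when the command fails means a broken bogofilter
install won't silently eat the mail you dropped in the folder.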

The only setup you need to do is have a good corpus of spam (and of
legitimate mail) to feed to bogofilter to start with, so it knows
what's good and what's bad.  It's also a good idea to whitelist as
much as possible to avoid the TMDA confirmation stuff.  If you know
your IMAP folders are clean, you can go into your mail directory and
do:

   cat * | formail -s formail -trz -xto |  sort -fu > white

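...to make your whitelist.  "formail -s" splits the input into
individual messages and runs the second formail on each one; -r
builds a reply header, -t trusts the sender's own headers for the
return address, -z trims whitespace, and -x prints the reply's To
field, which is the address the message came from.  "sort -fu" then
drops duplicates, ignoring case.  The result is one line per sender
found in your mail.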

If anyone's interested in my scripts and configs, I can put them
up somewhere.  So far I'm stoked; I keep watching my procmail logs
and grinning as every single mail goes to the right place.

-- 
We put a lot of thought into our defaults.  We like them.  If we
didn't, we would have made something else be the default.  So keep
your cotton-pickin' hands off our defaults.  Don't touch.  Consider
them mandatory.  "Mandatory defaults" has a nice ring to it.



