Strategies for filtering spam

This section introduces both the blacklist and whitelist strategies before outlining the Bunny's approach.

Blacklist strategy

Traditionally, anti-spam systems are based on maintaining a so-called 'black list' that contains e-mail addresses, domains, and network subnets of known spammers and/or a 'profile' of message headers and message body texts that define what a piece of spam looks like. In short:

"allow everything that is not explicitly denied"

This approach is fundamentally flawed in that it always lags one step behind: spam must exist before it can be put on a black list, so the list can at best match the evolution of spammer techniques but never outpace it. In addition, the chance of accidental 'false positives' (legitimate messages classified as spam) is fairly high with this approach. Effective and reliable spam control requires something that doesn't rely on heuristics that spammers can work around.
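The lag can be illustrated with a minimal Python sketch of a default-allow check. The addresses, the domain, and the function name blacklist_allows are made up for illustration and do not come from any real black list or from the Bunny itself.

```python
# Hypothetical black list entries; none of these come from a real list.
BLACKLISTED_ADDRESSES = {"offers@spam.example"}
BLACKLISTED_DOMAINS = {"bulk.example"}


def blacklist_allows(sender: str) -> bool:
    """Default-allow: deliver unless the sender is explicitly denied."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender not in BLACKLISTED_ADDRESSES and domain not in BLACKLISTED_DOMAINS


# A listed spammer is stopped, but the same spammer using a fresh,
# not-yet-listed address passes straight through.
print(blacklist_allows("offers@spam.example"))   # False
print(blacklist_allows("offers@spam2.example"))  # True
```

The second call shows the weakness: until someone adds the new address to the list, the filter lets it in.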

Whitelist strategy

A white list centric strategy works exactly the other way around: The anti-spam system maintains a so-called 'white list' of known, trusted contacts whose messages are allowed directly into your mailbox. Messages from unknown senders are held in a pending queue until the sender responds to a confirmation request sent by the anti-spam system. Such a confirmation verifies that the original message is legitimate, and the message is then delivered to the inbox. In addition, the sender is added to the white list to ensure they don't have to confirm future messages. In short:

"deny everything that is not explicitly allowed"

This methodology has the advantage of being very selective about what it allows in, while at the same time permitting legitimate but previously unknown senders to reach you. The disadvantage is that it requires each new sender to confirm once.
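The flow described above (white list, pending queue, one-time confirmation) might be sketched in Python as follows. This is an illustration under stated assumptions, not the Bunny's implementation: the class WhitelistFilter, its methods, and the token-based confirmation are hypothetical, and actually mailing the confirmation request is left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class WhitelistFilter:
    """Hypothetical default-deny filter with a confirmation step."""
    whitelist: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # token -> (sender, message)
    inbox: list = field(default_factory=list)

    def receive(self, sender, message):
        """Deliver mail from known senders; hold everything else."""
        sender = sender.strip().lower()
        if sender in self.whitelist:
            self.inbox.append(message)
            return None
        # Unknown sender: hold the message. The token would be mailed
        # to the sender as part of a confirmation request.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return token

    def confirm(self, token):
        """Release a held message and white-list its sender."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already used token
        sender, message = entry
        self.inbox.append(message)
        self.whitelist.add(sender)
        return True


f = WhitelistFilter(whitelist={"friend@example.org"})
f.receive("friend@example.org", "lunch?")             # delivered directly
token = f.receive("stranger@example.net", "hello")    # held in the queue
f.confirm(token)                                      # released, sender listed
f.receive("stranger@example.net", "follow-up")        # delivered directly
print(f.inbox)  # ['lunch?', 'hello', 'follow-up']
```

A bulk mailer that never reads replies never confirms, so its messages stay in the pending queue and never reach the inbox.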

The white list strategy is based upon the following assumptions about the current Internet infrastructure:

  1. E-mail addresses cannot be kept secret from spammers.
  2. Content-based filters cannot distinguish spam from legitimate mail with sufficient accuracy.
  3. To maintain economies of scale, bulk-mailing is generally:
    • An impersonal process in which the recipient is not distinguished.
    • A one-way communication channel, from spammer to victim.
  4. Spam will not cease until it becomes prohibitively expensive for spammers to operate.

With a white list strategy, unrestricted access to your mailbox can no longer be assumed, and that is a premise on which spammers rely heavily. See also Nancy McGough's article on the white list approach or reverse spam filtering, as she dubs it.

The Bunny's approach

The Bunny combines a 'white list' for known/trusted senders with a confirmation system that enables unknown but legitimate senders to be added to your white list and, with that, be allowed into your inbox. A 'black list' for undesired senders is checked ahead of the white list, but you will (intentionally) need to maintain that list manually; the Bunny only reads it.

The current implementation comprises a simple filter (the "bunny" script) combined with some fairly straightforward procmail rules ("rc.bunny"). As usual, the "Keep It Simple, Stupid" (KISS) principle applies. :-)
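As a rough illustration of the order of checks, here is a Python sketch. It is not the actual "bunny" script or the "rc.bunny" rules: the function names, the example addresses, and the 'reject' outcome for black-listed senders are assumptions made for the example.

```python
def classify(sender, whitelist, blacklist):
    """Return 'reject', 'deliver' or 'hold' for a sender address."""
    sender = sender.strip().lower()
    if sender in blacklist:      # manually maintained, checked first
        return "reject"
    if sender in whitelist:      # known, trusted contact
        return "deliver"
    return "hold"                # unknown: queue and ask for confirmation


def confirm(sender, whitelist):
    """A confirmed sender joins the white list; the black list is never written."""
    whitelist.add(sender.strip().lower())


blacklist = frozenset({"pest@example.com"})   # read-only to the filter
whitelist = {"friend@example.org"}

print(classify("pest@example.com", whitelist, blacklist))      # reject
print(classify("friend@example.org", whitelist, blacklist))    # deliver
print(classify("stranger@example.net", whitelist, blacklist))  # hold
confirm("stranger@example.net", whitelist)
print(classify("stranger@example.net", whitelist, blacklist))  # deliver
```

Using a frozenset for the black list mirrors the design choice that the filter only reads that list and never modifies it.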

Figure 2: The Bunny's approach to filtering spam
