Using statistics to identify spam, and the problem of open communication systems

Filed in by Kirk Averett | September 13, 2005 12:20 pm

A Good Idea?

A few days ago I read a great, persuasive essay by Jonathan Zdziarski of the dspam project about statistical spam identification.  It forced me to rethink some of my assumptions about the right way to identify spam.  Jonathan argues compellingly that the best automatic identification possible will come from a learning computer program trained by humans to recognize good and bad messages.  The program looks at email the way a human might and has the potential to identify better than 99% of spam messages, a fantastic percentage when battling smart humans who have an intense financial motivation to get spam through.  I think we’re going to run some statistical software alongside our existing anti-spam system and see just how well it might work for our customers.
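To make the idea of a "learning program trained by humans" concrete, here is a minimal sketch of a naive Bayes word-frequency filter.  It is an illustration only, not dspam's actual algorithm: the class name TinySpamFilter, the tokenizer, and the add-one smoothing are all assumptions made for this example.

```python
import math
import re
from collections import Counter


class TinySpamFilter:
    """Toy naive Bayes filter (hypothetical; not how dspam is implemented)."""

    def __init__(self):
        # Per-label word counts and message counts, filled in by train().
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters, digits, $ and '.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        # A human supplies the label: "spam" or "ham" (good mail).
        self.token_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this label is in the training mail.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_tokens = sum(self.token_counts[label].values())
            for token in self.tokenize(text):
                count = self.token_counts[label][token]
                # Add-one smoothing so unseen words never give log(0).
                score += math.log((count + 1) / (total_tokens + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability of spam, clamping the
        # difference so math.exp cannot overflow on long messages.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Training is just a matter of calling train() with messages a person has already sorted; spam_probability() then returns a number between 0 and 1 for a new message.  A real filter would also look at headers, word pairs, and per-user history, which this sketch ignores.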

Dumb Idea

Marcus Ranum is the author of another interesting essay[1] called “The Six Dumbest Ideas in Computer Security”.  The #1 dumb idea on his list is “Default Permit”.  Marcus explains that allowing communication between computers by default is a very bad idea.  Instead, he says, block everything by default, then go back and allow only what must be allowed.  I think he’s right in a lot of cases, but it leads to a problem for business email.
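The difference between the two policies is easy to see in a small sketch.  The function names and the example addresses below are made up for illustration; they are not taken from Marcus's essay.

```python
def default_permit(sender, blocked):
    """Accept mail from anyone who is not explicitly blocked."""
    return sender not in blocked


def default_deny(sender, allowed):
    """Accept mail only from senders who are explicitly allowed."""
    return sender in allowed


# Hypothetical lists for illustration.
blocked = {"spammer@example.net"}
allowed = {"partner@example.com"}

# A brand-new customer who appears on neither list.
new_customer = "customer@example.org"
permit_result = default_permit(new_customer, blocked)  # True: mail gets in
deny_result = default_deny(new_customer, allowed)      # False: mail is stopped
```

Under default permit, the unknown sender gets through, and so does any spammer nobody has blocked yet.  Under default deny, the spammer is stopped, but so is the new customer.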

Not Trusting Customers = Trouble

If you take that approach with email, you end up using systems that form a wall around your company as protection from the bad emailers of the world.  When a customer needs to reach you via email, they have to scale that wall at least one time to get a message to you: they fill out an extra form, wait a few minutes and click a link when asked, or at the very least prove that they’re human.

After the outside email user has scaled the wall one time, they generally don’t need to scale it again unless they change email addresses, which is not an everyday thing but also not uncommon.
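The wall described above can be sketched as a simple challenge-response gate.  This is a rough illustration under assumed names (ChallengeGate, receive, confirm), not a description of any particular product: mail from an unverified address is held until the sender proves themselves once, and after that their mail goes straight through.

```python
import secrets


class ChallengeGate:
    """Toy challenge-response gate (hypothetical names, for illustration)."""

    def __init__(self):
        self.verified = set()  # addresses that have already scaled the wall
        self.pending = {}      # challenge token -> (sender, held message)
        self.inbox = []        # messages actually delivered

    def receive(self, sender, message):
        """Deliver mail from verified senders; hold and challenge the rest.

        Returns None if the message was delivered, or the challenge token
        that would be sent back to the sender in a confirmation link.
        """
        sender = sender.lower()
        if sender in self.verified:
            self.inbox.append((sender, message))
            return None
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        return token

    def confirm(self, token):
        """Called when the sender clicks the link: release the held mail."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        sender, message = entry
        self.verified.add(sender)
        self.inbox.append((sender, message))
        return True
```

Note that the gate only remembers addresses.  A customer who writes from a new address is a stranger again and has to go through the challenge a second time.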

Putting up barriers to conducting business tends to harm a business.  Why else would businesses spend as much as they do on lobbyists?  At least in the U.S., it would appear that the very high cost of paying lobbyists is still somewhat lower than the cost of jumping over certain regulatory walls.  Requiring a potential customer to wait longer or do more than they expect reduces their willingness to do business, and the impact is small but accumulates.  Not every user is put off by the wall, but some will be confused or annoyed by it because they expected to walk right into your store and talk with someone.

Open Doors

We’re back to where we started: if you’re going to accept email from the world by default, you need a way to let good mail in and keep bad mail out that is about as smart as you are and doesn’t take any of your time.  I think our current system of spam scoring, blacklists, safelists, etc. is very effective, and users with advanced needs have some ability to tweak things.

But if there’s any way to make it even more accurate, you’d better believe we’ll use it!  We’ll tell you how our tests went in a few months.

-Kirk

Endnotes:
  1. interesting essay: http://www.ranum.com/security/computer_security/editorials/dumb/

Source URL: http://www.rackspace.com/blog/using-statistics-to-identify-spam-and-the-problem-of-open-communication-systems/