Critical Failure of Crowd Sourcing Email Spam Filtering

Summary

Email services such as Yahoo, Gmail, and Microsoft Outlook rely partly on their users to identify spam. These user reports help the automated filters become more effective. Even so, these hybrid systems (human + automation) have shortcomings.

Spam Variations

Traditional unsolicited emails (or spam) are simply an annoyance. They try to get people to buy something or support some cause. Many of these can be easily detected because millions of copies of an identical message are sent out, and because certain senders can be identified as sources of spam.

An early attempt to get around spam filters kept the spam message at the top of the message body and appended a random snippet of text from a novel, or a string of random characters, below it. Because each copy was slightly different, spam filters saw these as unique emails rather than bulk emails. Other measures had to be developed to identify these more complicated spam campaigns.
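
To make the padding trick concrete, here is a minimal Python sketch. It is not how any particular provider works; the function names, the sample pitch, and the 0.5 threshold are all made up for illustration. It shows that hashing the whole message treats two padded copies as unrelated, while comparing overlapping three-word runs (one possible "other measure") still sees them as near-duplicates.

```python
import hashlib
import random
import string


def exact_fingerprint(body: str) -> str:
    """Hash the whole message; changing any character changes the hash."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def shingles(body: str, size: int = 3) -> set:
    """Return the set of overlapping runs of `size` consecutive words."""
    words = body.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def random_padding(rng: random.Random, words: int = 5) -> str:
    """Made-up filler standing in for the random text spammers appended."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        for _ in range(words)
    )


rng = random.Random(42)  # fixed seed so the demo is repeatable
pitch = "Buy cheap watches today at our online store and save big on every order"
copy_a = pitch + " " + random_padding(rng)
copy_b = pitch + " " + random_padding(rng)

# The exact hashes differ, so each copy looks like a one-off email.
print(exact_fingerprint(copy_a) == exact_fingerprint(copy_b))
# The shared pitch still makes the two copies score as near-duplicates.
print(similarity(copy_a, copy_b) > 0.5)
```

The longer the random padding is relative to the pitch, the lower the similarity score drops, which is one reason a single fixed threshold is not enough on its own.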

Some spam is obvious, such as emails that claim to be from Wells Fargo Bank but have a return address and website links that aren’t Wells Fargo’s, sent to millions of people. Most systems are smart enough to filter those out.
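
The mismatch described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, `bank.example` is a placeholder for whatever organization the email claims to come from, and real filters weigh many more signals than these two.

```python
from typing import List
from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def warning_signs(claimed_domain: str, from_address: str, links: List[str]) -> List[str]:
    """List the ways an email fails to match the domain it claims to be from."""
    signs = []
    sender_host = from_address.rsplit("@", 1)[-1]
    if not host_matches(sender_host, claimed_domain):
        signs.append("sender address is at " + sender_host)
    for url in links:
        host = urlparse(url).hostname or ""
        if not host_matches(host, claimed_domain):
            signs.append("link points to " + (host or "an unknown host"))
    return signs


# A message whose sender and link both sit on the claimed domain raises nothing.
print(warning_signs("bank.example", "alerts@bank.example",
                    ["https://www.bank.example/login"]))

# A look-alike sender and an off-site link each raise a warning sign.
print(warning_signs("bank.example", "alerts@bank-secure.example",
                    ["http://login.other.example/bank.example"]))
```

Note that the second link mentions the claimed domain in its path, which is a common trick; only the host part of the address is compared.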

There’s another kind of spam known as spear phishing. This occurs in smaller batches, or individual emails, crafted to be more effective at tricking you into clicking something or divulging your private information. Some of these are more automated, utilizing databases of information about people. Other attempts may be manually crafted by individuals. It’s these kinds of campaigns that may be difficult or impossible for a traditional email spam filter system to detect. This is where hybrid models of email spam filtering are useful.

How Crowd Sourcing Email Filters Work

With crowd sourcing, email service providers add a link, sometimes near the Reply/Forward button, that lets you notify them of emails that are spam or phishing. Spam messages are perceived as an annoyance, whereas phishing messages are potentially more dangerous because they might result in account information being divulged. The two are also handled differently, which is why Google (Gmail) provides separate options for reporting them.

In theory, the information provided by individuals helps build a smarter and more effective system utilizing human input combined with automated systems.

How Crowd Sourcing Email Filters Fail

The problem with crowd sourcing is that, like any democratic process, it can sometimes fail. Sometimes the majority of people are just wrong, maybe due to a lack of accurate information, awareness, or education/training. If people aren’t trained to use the spam and phishing response buttons properly, and don’t really know how to identify emails correctly, then the system fails.

If you’re a Gmail user, take a look at your Spam folder sometime. You might be surprised by what you find. Along with the expected spam, you may notice some important, genuine emails from well-established and trusted businesses and organizations.

The reason is that those organizations send out bulk emails to their members. These are mostly ‘opt-in’ emails, meaning that people at one point, maybe a year ago, clicked a box indicating they wanted correspondence. Or, in some cases, the checkbox was already selected by default and they didn’t uncheck it.

What’s happening millions of times per day is this…

  1. People receive a legitimate email from Bank of America, United Way, Old Navy — or some other provider that they at one point in the past agreed to receive email from.
  2. They agreed to receive these emails, and for a year they haven’t done anything about them.
  3. One day they get upset about receiving the emails, so instead of taking three seconds to click the unsubscribe link, they click the spam or phishing notification button.
  4. Another 10 million people do the same thing that day with similar emails.
  5. Eventually, the system determines from people’s feedback that emails from the ‘Save the Earth Foundation’ (or whatever legitimate source) are spam. So, it begins sending all messages from that organization or business into the spam folder.
  6. That’s why, when you go to your spam folder, you’ll find some emails there from legitimate businesses and organizations.
  7. Gmail has a mechanism in place to correct for this problem. In your spam folder there is a button labeled Not Spam. When you click this, you’re helping the system learn that these aren’t spam.
  8. Unfortunately, at this point, the millions of people who either don’t know how to unsubscribe, are apathetic, or are too lazy to do so outnumber the people who know how to use the system.
  9. This is partly not the fault of the masses, because most of the systems implemented today aren’t intuitive, aren’t usability tested, and don’t have proper documentation, tutorials, or user guides.

This is why, even when you keep clicking the Not Spam button on legitimate emails you want, future ones continue to end up in your spam folder.
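
The imbalance can be illustrated with a toy majority-vote model in Python. This is an assumption-laden sketch, not a description of how Gmail or any other provider actually weighs reports: the `CrowdFilter` class, the 0.5 threshold, the sender address, and the report counts are all invented for the example.

```python
from collections import Counter


class CrowdFilter:
    """Toy model: a sender is flagged once most reports about it say spam."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.spam = Counter()      # spam reports per sender
        self.not_spam = Counter()  # Not Spam clicks per sender

    def report_spam(self, sender: str, count: int = 1) -> None:
        self.spam[sender] += count

    def report_not_spam(self, sender: str, count: int = 1) -> None:
        self.not_spam[sender] += count

    def is_flagged(self, sender: str) -> bool:
        total = self.spam[sender] + self.not_spam[sender]
        if total == 0:
            return False  # no feedback yet, so nothing to act on
        return self.spam[sender] / total > self.threshold


crowd = CrowdFilter()
sender = "news@savetheearth.example"  # hypothetical legitimate sender

# Many subscribers hit the spam button instead of unsubscribing.
crowd.report_spam(sender, 10_000)
# A much smaller group rescues the email with Not Spam.
crowd.report_not_spam(sender, 200)
print(crowd.is_flagged(sender))

# Even after hundreds more Not Spam clicks, the majority still wins.
crowd.report_not_spam(sender, 300)
print(crowd.is_flagged(sender))
```

In this model the Not Spam clicks would have to outnumber the spam reports before the sender is cleared, which is the situation the list above describes.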

What You Can Do

Here are some suggestions of what you can do about the predicament described above.

  1. Check your spam folder weekly, and use the Not Spam button where appropriate. This may help correct the system and stop false positives (emails mistakenly considered to be spam).
  2. When you receive undesirable emails from legitimate organizations, use the unsubscribe link. First make sure the email is actually legitimate by checking what email address it came from. Also, hover your mouse over any links in the email (such as the unsubscribe link) and make sure the address they go to is the sender’s website (or a known source like Mailchimp or Constant Contact).
  3. Make sure that desirable senders are in your address book.
  4. Create rules that override the spam filters, such as “emails from this sender should always go to my inbox.” Such rules are configurable in Microsoft Outlook and other email clients, and some online email services (like Gmail) also support them. They are sometimes called filters, and they give you an automated way to have your email sorted or managed.
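
Conceptually, an override rule like the one in the last suggestion is checked before the spam filter’s verdict is applied. The Python sketch below shows only that ordering; the `choose_folder` function, the folder names, and the sender address are hypothetical, and real email clients configure this through their settings rather than code.

```python
from typing import Set


def choose_folder(sender: str, filter_says_spam: bool, always_inbox: Set[str]) -> str:
    """Apply the user's "always go to my inbox" rule before the spam verdict."""
    if sender.lower() in always_inbox:
        return "Inbox"  # the user's rule wins over the crowd's verdict
    return "Spam" if filter_says_spam else "Inbox"


# Senders the user has explicitly trusted (stored in lowercase).
trusted = {"news@savetheearth.example"}

# The filter thinks this sender is spam, but the rule overrides it.
print(choose_folder("News@SaveTheEarth.example", True, trusted))
# A sender without a rule is left to the spam filter.
print(choose_folder("offers@unknown.example", True, trusted))
```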
