3rd post in the Fraud series – email spam!
The first spam message was sent in 1978 to 600 people. Today, approximately 100 billion spam messages are sent every day. You might have encountered one within the last 12 hours. Spam not only chokes bandwidth and mailboxes, it also affects businesses.
To fight spam, email providers such as Yahoo, Google, and Microsoft deploy spam filters. There are two key methods of spam detection:
- Rule based detection (Host filters, Client filters, White Lists, Real time black hole lists)
- Probabilistic detection (Bayesian Filters)
Rule Based Detection
This is generally the first level of detection, where emails are validated against certain rules, such as:
- Email Header analysis
- Email Subject analysis
- Blacklisted URL check
- Email body text analysis
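The four checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blacklist, the word lists, and the function name `rule_hits` are all hypothetical, and a production filter would use far larger, regularly updated rule sets.

```python
import re
from email import message_from_string

# Hypothetical rule data, chosen only for illustration.
BLACKLISTED_DOMAINS = {"spam-example.test"}
SUSPICIOUS_SUBJECT_WORDS = {"free", "winner", "urgent"}
SUSPICIOUS_BODY_PHRASES = ("click here", "act now")

# Captures the host part of http/https links.
URL_HOST_RE = re.compile(r"https?://([^/\s:]+)", re.IGNORECASE)


def rule_hits(raw_email):
    """Return the names of the rules that a raw email message triggers."""
    msg = message_from_string(raw_email)
    payload = msg.get_payload()
    # Only plain, non-multipart bodies are inspected in this sketch.
    body = payload if isinstance(payload, str) else ""
    hits = []

    # Header analysis: a message without a From header is suspicious.
    if not msg.get("From"):
        hits.append("missing_from_header")

    # Subject analysis: look for words on the watch list.
    subject = (msg.get("Subject") or "").lower()
    if set(re.findall(r"[a-z]+", subject)) & SUSPICIOUS_SUBJECT_WORDS:
        hits.append("suspicious_subject")

    # Blacklisted URL check: compare each linked host with the blacklist.
    hosts = {host.lower() for host in URL_HOST_RE.findall(body)}
    if hosts & BLACKLISTED_DOMAINS:
        hits.append("blacklisted_url")

    # Body text analysis: look for known spam phrases.
    lowered = body.lower()
    if any(phrase in lowered for phrase in SUSPICIOUS_BODY_PHRASES):
        hits.append("suspicious_body")

    return hits
```

A filter built this way could reject a message outright when any rule fires, or pass the list of hits on to a later scoring stage.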
Probabilistic detection (Bayesian Filters)
The Bayesian filter is a wonderful method for fighting spam. The beauty of this technique is that it adapts to spammers’ behavior, making it difficult for them to beat the system. This filter is used in almost all server-side and client-side mail filtering products.
Bayesian filters use Bayes’ theorem to calculate the statistical probability that a message is spam. In its simplest form, the probability of spam will be:
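A common single-word form of Bayes' theorem for spam filtering is sketched below. The symbols are notation assumed here: S means the message is spam, H means it is legitimate, and W means a given word appears in the message.

```latex
P(S \mid W) = \frac{P(W \mid S)\,P(S)}{P(W \mid S)\,P(S) + P(W \mid H)\,P(H)}
```

With equal priors, P(S) = P(H) = 0.5, this reduces to P(W | S) / (P(W | S) + P(W | H)), which is the per-word score often called spamicity.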
The Bayesian filter is ‘taught’ to identify spam (based on keywords, phrases and analysis of messages previously marked as spam). Once trained, the filter assigns each word a spamicity, and the spamicities of the words in a message are combined into an overall spam probability. Filters these days also use a sliding window to score phrases rather than single words. Like any other method, Bayesian filtering is vulnerable: it can be ‘poisoned’ by spammers who pad their messages with legitimate-looking words to skew the learned probabilities.
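The training and scoring steps can be illustrated with a small naive Bayes sketch. It is not any particular product's implementation: the class name `BayesFilter`, the add-one smoothing, the equal spam/legitimate priors, and the combination of word scores through summed log-odds are all assumptions made for the example. Setting `window=2` adds two-word phrases as a simple stand-in for the sliding window idea.

```python
import math
import re
from collections import Counter


def tokens(text, window=1):
    """Lowercase words, plus phrases of up to `window` consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    out = list(words)
    for n in range(2, window + 1):
        out += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return out


class BayesFilter:
    def __init__(self, window=1):
        self.window = window
        self.spam_counts = Counter()  # spam messages containing each token
        self.ham_counts = Counter()   # legitimate messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record which tokens appear in a message already labelled by a user."""
        seen = set(tokens(text, self.window))
        if is_spam:
            self.spam_counts.update(seen)
            self.n_spam += 1
        else:
            self.ham_counts.update(seen)
            self.n_ham += 1

    def spamicity(self, token):
        """P(spam | token), assuming equal priors and add-one smoothing."""
        p_token_spam = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        p_token_ham = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        return p_token_spam / (p_token_spam + p_token_ham)

    def spam_probability(self, text):
        """Combine per-token spamicities by summing their log-odds."""
        log_odds = 0.0
        for token in set(tokens(text, self.window)):
            p = self.spamicity(token)
            log_odds += math.log(p) - math.log(1 - p)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


f = BayesFilter()
f.train("win free money now", True)
f.train("free prize click now", True)
f.train("meeting notes for monday", False)
f.train("lunch on monday", False)
print(f.spam_probability("free money"))      # above 0.5
print(f.spam_probability("monday meeting"))  # below 0.5
```

Because every call to `train` shifts the counts, the filter keeps adapting as users label new messages. The same mechanism is what makes poisoning possible if the training data is manipulated.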
To fight spam, governments are also establishing anti-spam laws. The Canadian government passed sweeping legislation in 2010, known as Canada’s Anti-Spam Legislation (CASL), which comes into force on July 1, 2014.