Spam Filtering and SpamAssassin
Nov. 5, 2008 by kwarren
Spam has become a fact of life for us all. Children now ponder why a meat product would be named after junk e-mail. Addressing the issue of spam, also known as unsolicited commercial e-mail (UCE), is a complex task. Ideally, we would block all spam without ever interfering with legitimate e-mail, all with minimal effort. In reality, however, spam fighting is a constant cat-and-mouse game between e-mail administrators and spammers seeking to circumvent the latest anti-spam technologies. This TOM endeavors to clarify Wesleyan’s approach to controlling spam and provide guidance on how to use our anti-spam tools most effectively.
Wikipedia offers this definition of spam:
Spamming is the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages . . .
Spamming remains economically viable because advertisers have no operating costs beyond the management of their mailing lists, and it is difficult to hold senders accountable for their mass mailings. Because the barrier to entry is so low, spammers are numerous, and the volume of unsolicited mail has become very high. The costs, such as lost productivity and fraud, are borne by the public and by Internet service providers, which have been forced to add extra capacity to cope with the deluge. Spamming is widely reviled, and has been the subject of legislation in many jurisdictions.
(http://en.wikipedia.org/wiki/E-mail_spam)
When configuring our anti-spam tools, ITS frequently hears from users who are concerned that our system may reject legitimate e-mail. Our anti-spam system gives users significant control over how aggressively spam is filtered. At the conservative setting (which is the default), the risk of rejecting a legitimate e-mail is extremely small.
So how does filtering work? Every mail message that comes through our mail servers is evaluated by our spam filtering system, called SpamAssassin. Most spam messages have certain identifying characteristics. SpamAssassin looks for the presence of these and assigns a score. Some characteristics are scored higher than others. For example, a message with “Viagra” in the subject line is likely to receive a relatively high score on that criterion alone. Other criteria might be an abundance of non-alphabetic characters or the prevalence of capital letters in the body of a message. Each criterion has a value associated with it, and each message ends up with a total score based on these criteria. We have found that this approach of blending criteria to assess spam probability is highly reliable.
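To make the idea of blended criteria concrete, here is a minimal sketch in Python. It is not SpamAssassin's actual rule set: the three checks mirror the examples in the paragraph above, while the function name, the point values, and the cutoff ratios are all assumptions chosen for illustration.

```python
# Illustrative only: these weights and cutoffs are hypothetical,
# not SpamAssassin's real rules or scores.
KEYWORD_POINTS = 4.0      # "Viagra" in the subject line
NON_ALPHA_POINTS = 2.0    # abundance of non-alphabetic characters in the body
CAPITALS_POINTS = 1.5     # prevalence of capital letters in the body


def spam_score(subject: str, body: str) -> float:
    """Add up the points for every criterion the message matches."""
    score = 0.0

    # Criterion 1: a telltale keyword in the subject line.
    if "viagra" in subject.lower():
        score += KEYWORD_POINTS

    # Criterion 2: more than 30% of the visible characters are non-alphabetic.
    visible = [c for c in body if not c.isspace()]
    if visible:
        non_alpha = sum(1 for c in visible if not c.isalpha())
        if non_alpha / len(visible) > 0.3:
            score += NON_ALPHA_POINTS

    # Criterion 3: more than half of the letters are capitals.
    letters = [c for c in body if c.isalpha()]
    if letters:
        capitals = sum(1 for c in letters if c.isupper())
        if capitals / len(letters) > 0.5:
            score += CAPITALS_POINTS

    return score


print(spam_score("Cheap Viagra", "BUY NOW!!! $$$ ###"))
print(spam_score("Lunch on Friday", "Are you free at noon?"))
```

The first message matches all three criteria and the second matches none, so no single criterion has to decide the outcome by itself.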
Users can tell the system how to handle messages based on the total spam score. The system can perform three actions on suspected spam: tag, move, or delete. Tag means that the system adds a note to the subject line to inform the user that the message is suspected to be spam. Move tags the message and moves it to a junk e-mail folder. Delete means the message is removed from the system.
Until recently, Wesleyan users had the option of declining spam filtering entirely. For security reasons, we now require minimal spam filtering for all accounts, but users still have significant control over how aggressively to act on suspected spam.
The default setting tags messages at a score of 9, moves them to a junk folder at a score of 15, and deletes them at a score of 20+. These settings have been tested by users in the community and ITS and have been successful. In spam terms, a score of 20 or higher is virtually guaranteed to be spam. If more proactive filtering is desired, users can select from two additional presets which will cause the system to tag, move, and delete messages at lower spam scores.
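The default thresholds can be read as a simple decision rule. The sketch below assumes each threshold is inclusive (a score of exactly 9 is tagged, and so on); the function name and the "deliver" label for messages below every threshold are hypothetical.

```python
def action_for(score: float, tag_at: float = 9, move_at: float = 15,
               delete_at: float = 20) -> str:
    """Return the action for a spam score, checking the strictest threshold first."""
    if score >= delete_at:
        return "delete"
    if score >= move_at:
        return "move"      # move also tags the message
    if score >= tag_at:
        return "tag"
    return "deliver"       # below every threshold: delivered untouched


for score in (3, 9, 15, 20):
    print(score, action_for(score))
```

A more aggressive preset would pass lower values for `tag_at`, `move_at`, and `delete_at`; the actual values used by the two additional presets are not given here.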
Users on Exchange will see their “moved” e-mail in their junk e-mail folder. Cyrus users will see a junk e-mail folder in their Web Mail where they can look at the messages. Eudora users will need to log in to Web Mail to see the Junk Email folder.
To look at individual spam settings, users can go to EPortfolio, Tools & Links, SpamAssassin Configuration (http://quicklink.wesleyan.edu/SPAM). From here you can set your level of filtering as well as add addresses to the whitelist or blacklist. A whitelist is a list of known good addresses that SpamAssassin will always allow. A blacklist is a list of addresses whose mail is always treated as junk.
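One way to picture how the two lists interact with scoring is as a check that runs before the spam score is consulted. This is a sketch under stated assumptions, not a description of how SpamAssassin implements its lists: the function name is hypothetical, addresses are compared case-insensitively, and the whitelist is assumed to win if an address appears on both lists.

```python
from typing import Iterable, Optional


def sender_override(sender: str, whitelist: Iterable[str],
                    blacklist: Iterable[str]) -> Optional[str]:
    """Return "allow" or "junk" when a list matches, else None (use the score)."""
    address = sender.strip().lower()
    if address in {a.strip().lower() for a in whitelist}:
        return "allow"     # known good address: always allowed
    if address in {a.strip().lower() for a in blacklist}:
        return "junk"      # known bad address: always junk
    return None            # no match: fall back to normal score-based filtering


print(sender_override("Friend@Example.org", ["friend@example.org"], []))
print(sender_override("offers@example.net", [], ["offers@example.net"]))
print(sender_override("stranger@example.com", [], []))
```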
Within your e-mail client (Outlook, Eudora, etc.), you can also set Junk Mail settings. These can work in conjunction with your SpamAssassin settings, although you may often find that it is not necessary to have both.
In Outlook, Junk Email settings are under Actions > Junk Email > Junk Email Options.
For more information:
http://tech.yahoo.com/gd/changing-the-junk-e-mail-settings-in-outlook-2007/200314
For setting local spam settings in Eudora:
PC
http://www.yale.edu/its/email/howdoi/eudora/junkmail-filtering.html
Mac
http://www.it.iastate.edu/pub/gag341/gag341.html
A word of caution: local junk e-mail settings are more likely to tag legitimate e-mail and are easier to misconfigure. To combat that and prevent false positives, users may choose to allow all messages from a domain (such as Wesleyan.edu). In doing so, they may be opening themselves to more spam, since spam messages usually disguise their source. The best course of action is for users to evaluate their flow of e-mail and determine whether anything more than the SpamAssassin settings is needed.
Spam filtering is far more of an art than a science. Applications have to learn and re-learn as the sources of spam modify their methods to bypass new filtering technologies. A minimum level of spam filtering is both wise and necessary to protect the Wesleyan user community from being inundated with unwanted, and often harmful, messages.
