How to train the SpamAssassin Bayesian Classifier

If you followed our recent tutorial on how to configure SpamAssassin on a cPanel server and the filter is not performing quite as well as expected, you might need to train the SpamAssassin Bayesian classifier.

The Bayesian filter can learn which emails are good and which are bad based on where you move them: a good folder (i.e. not spam) or a bad folder (i.e. spam). You then configure a cron job that runs once a day (training can be resource intensive, so schedule it for off-peak hours) and executes a command on your server to teach SpamAssassin whether to mark similar emails as spam or not. You can read more about the sa-learn feature here.

Setting up the SA-Learn Bayesian Classifier System

  • Make sure you configure your mail clients to use IMAP.
  • Set up two folders in your mail client. One should be where you move spam mails to, such as “Spam” or “Junk”. The other should be where you move mails that were wrongly marked as spam, such as “NotSpam”.
  • Set up cron jobs to make SpamAssassin process those mails as spam or not spam as follows:
sa-learn -p ~/.spamassassin/user_prefs --spam ~/mail/{cur,new}
sa-learn -p ~/.spamassassin/user_prefs --ham ~/mail/{cur,new}

In these two examples “spam” is the location of mails that should be marked as spam, and “notspam” is the location of the mails that should not have been marked as spam.

“” should be replaced with your actual domain.

“Youremail” should be the first part of your email address (the part before the @), so if your email was jonathan[@], you would enter “jonathan”.

If you wish to set this up for all emails on your account, you can replace “youremail” with an asterisk “*”.

We highly recommend running the cron jobs just once per day at an off-peak time (e.g. 6 a.m. GMT) so that your web host is not concerned about the system load.
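The two sa-learn commands and the once-a-day schedule can be combined into one small wrapper script that cron calls. This is only a sketch under assumptions: the script name `train-bayes.sh`, the `SPAM_DIR`, `HAM_DIR`, `PREFS` and `SA_LEARN` variables, and the default folder paths are all hypothetical and need to be changed to match your own mail folders. Cron often runs with a shorter PATH than a login shell, so `SA_LEARN` lets you supply an absolute path if plain `sa-learn` is not found.

```shell
#!/bin/sh
# train-bayes.sh: hypothetical wrapper around the two sa-learn commands.
# Example crontab entry (once a day at 06:00 server time):
#   0 6 * * * /bin/sh "$HOME/train-bayes.sh"

# Assumed locations; replace them with your real Maildir folders.
SPAM_DIR="${SPAM_DIR:-$HOME/mail/.Spam}"
HAM_DIR="${HAM_DIR:-$HOME/mail/.NotSpam}"
PREFS="${PREFS:-$HOME/.spamassassin/user_prefs}"
# Set SA_LEARN to an absolute path if cron cannot find sa-learn on its PATH.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_folder TYPE DIR: feed DIR/cur and DIR/new to sa-learn,
# where TYPE is "spam" or "ham".
learn_folder() {
    lf_type="$1"
    lf_dir="$2"
    for lf_sub in cur new; do
        # Skip sub-folders that do not exist instead of failing.
        if [ -d "$lf_dir/$lf_sub" ]; then
            "$SA_LEARN" -p "$PREFS" "--$lf_type" "$lf_dir/$lf_sub"
        fi
    done
}

# train_all: learn the spam folder first, then the not-spam folder.
train_all() {
    learn_folder spam "$SPAM_DIR"
    learn_folder ham "$HAM_DIR"
}

train_all
```

Keeping both commands in one script means only a single cron entry has to be scheduled, and the folder paths live in one place if they change later.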

  • Now that you have configured the cron job, you just need to move the mails to the relevant folders, noting that either webmail or IMAP must be used.
  • The final step is to generate the user_prefs file. Go into the SpamAssassin configuration settings, “Configure Apache SpamAssassin” (see this tutorial), enter the required score and click Save. When you click Save, the file is generated automatically. You can check by logging into the file manager, navigating to the .spamassassin folder and making sure the file is there.
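If you have shell access, the same check can be done from the command line instead of the file manager. The following is a minimal sketch; the `check_prefs` function name and the `PREFS` variable are made up for illustration, and the default path assumes the file sits in the .spamassassin folder of your home directory.

```shell
#!/bin/sh
# Assumed default location of the generated preferences file.
PREFS="${PREFS:-$HOME/.spamassassin/user_prefs}"

# check_prefs FILE: report whether FILE exists and is not empty.
check_prefs() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

check_prefs "$PREFS" || echo "Save the SpamAssassin settings in cPanel to generate it."
```

Once the cron job has run at least once, `sa-learn --dump magic` should print the Bayes database counters, including how many spam and ham messages have been learned so far.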

  1. Javier, May 18, 2016 at 11:50 pm

    Hi, I tried your steps, but when the cron job runs, I get the following:

    Error message:
    /bin/bash: sa-learn: command not found

    I also tried this command:

    /usr/local/bin/sa-learn -p ~/.spamassassin/user_prefs --spam ~/mail/{cur,new}

    Error message:
    /bin/bash: /usr/local/bin/sa-learn: No such file or directory

    I have a dedicated server at Inmotion. I am running the cron jobs at cPanel level, but I am a rookie in Linux.

    • Jonathan Griffin, May 19, 2016 at 12:20 am

      See if you can find the location using:

      which sa-learn

      What does that say?
