I get tons of email, and quite a bit of it is SPAM these days. To combat this I use MailScanner with Postfix, ClamAV, and SpamAssassin. I also set up special mailboxes on all email accounts so that users can classify mail as either SPAM or, if necessary, HAM. Once a week or so a process runs to learn from the SPAM folders. I thought that process was working quite well. It turns out I made a simple goof that kept my SpamAssassin Bayesian filter from being able to read my Bayesian database.
Yet the learning process itself worked flawlessly for both SPAM and HAM. Why HAM? Because occasionally I have to go through all the caught SPAM and unlearn, for my users, any message that was wrongly caught as SPAM. That part also seemed to work quite well, but I was still constantly flooded with the same old SPAM messages. So I needed to dive deeper.
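The weekly learning job can be pictured as one `sa-learn` call per classified folder. Below is a minimal Python sketch, assuming the job simply shells out to `sa-learn --spam` and `sa-learn --ham`; the function names and folder paths are hypothetical, not the actual scripts from this setup. With SpamAssassin's default settings, `sa-learn` reads and writes the Bayes database under the invoking user's `~/.spamassassin`, so a job run as root fills `/root/.spamassassin`.

```python
import subprocess
from typing import Iterable


def learn_commands(spam_dirs: Iterable[str], ham_dirs: Iterable[str]) -> list[list[str]]:
    """Build one sa-learn invocation per folder of user-classified mail."""
    cmds = [["sa-learn", "--spam", d] for d in spam_dirs]
    cmds += [["sa-learn", "--ham", d] for d in ham_dirs]
    return cmds


def run_learning(spam_dirs: Iterable[str], ham_dirs: Iterable[str], dry_run: bool = True) -> None:
    """Run (or, by default, just print) the learning commands."""
    for cmd in learn_commands(spam_dirs, ham_dirs):
        if dry_run:
            print(" ".join(cmd))  # show what would run without touching the database
        else:
            # check=True raises CalledProcessError if sa-learn exits non-zero
            subprocess.run(cmd, check=True)
```

For example, `run_learning(["/home/alice/Maildir/.SPAM/cur"], ["/home/alice/Maildir/.HAM/cur"])` (hypothetical paths) prints the two commands; pass `dry_run=False` to actually execute them.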
The problem was that MailScanner, Postfix, and SpamAssassin all run as the user “postfix”, while the Bayesian learning process runs as root and therefore stores its Bayesian databases in root's home directory, owned by root. Alas, I had a permission problem, and none of the tools told me so.
The fix was to move my Bayesian database from the /root/.spamassassin directory to the /etc/MailScanner/bayes directory and change the owner of those files to “postfix”. I then created a symbolic link at /root/.spamassassin pointing to /etc/MailScanner/bayes, which allowed my existing Bayesian learning scripts to keep working unchanged. All that remained was a simple change to the SpamAssassin configuration for MailScanner, so that it uses the new location, and a restart of MailScanner.
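The move, chown and symlink steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it runs as root, the new directory does not exist yet, and `relocate_bayes` is a hypothetical helper name. The SpamAssassin configuration change (presumably pointing `bayes_path` at the new directory) is not covered.

```python
import os
import shutil
from pathlib import Path


def relocate_bayes(old_dir, new_dir, uid: int, gid: int) -> None:
    """Move a Bayes directory, give it to uid/gid, and leave a symlink behind."""
    old, new = Path(old_dir), Path(new_dir)
    if old.is_symlink():
        raise ValueError(f"{old} is already a symlink; nothing to move")
    if new.exists():
        # shutil.move would nest old *inside* an existing directory, so refuse.
        raise FileExistsError(f"{new} already exists; refusing to merge into it")
    new.parent.mkdir(parents=True, exist_ok=True)
    # Move the whole directory (a plain rename when both paths share a filesystem).
    shutil.move(str(old), str(new))
    # Hand the directory and everything in it to the scanning user.
    os.chown(new, uid, gid)
    for root, dirs, files in os.walk(new):
        for name in dirs + files:
            os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
    # Leave an absolute symlink at the old location so root's learning jobs still find it.
    old.symlink_to(new.resolve(), target_is_directory=True)
```

On the real system the ids would come from `pwd.getpwnam("postfix")`, passing its `pw_uid` and `pw_gid`. One caveat worth checking: files that a root-run `sa-learn` later creates in that directory may again be owned by root.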
With that, the problem is solved, and email I have marked as SPAM is finally being treated as such. For such a simple issue, I wonder why SpamAssassin never complained that it could not reach the Bayesian databases. For something this serious, the error should have been reported somewhere.