Spamassassin: Train Bayes Recognition

To train the Bayes recognition of Spamassassin you need a large number of spam mails that you can feed to it. If you only run a personal mail server, you normally will not have that much spam at hand.

However, you can download spam mails from the Untroubled.org Spam Archive.

Just download the archives you want (I recommend using only newer archives, e.g. from the last 2 years) and then run Spamassassin's “sa-learn” command on them. I did it like this:

cd /tmp
mkdir spam
cd spam

# Download the Archives
wget http://untroubled.org/spam/2018.7z
wget http://untroubled.org/spam/2019-01.7z
# ... <as many archives as you want> ...

# Unpack the archives (you need to have 7z installed!) and
# delete the 7z files afterwards
for i in *.7z ; do 7z x "$i" ; done
rm /tmp/spam/*.7z

# Train your spam database
/usr/bin/sa-learn --username amavis --dbpath /var/lib/amavis/.spamassassin --spam /tmp/spam/*

# Remove all spam archive files again
rm -rf /tmp/spam
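
After training, it can be useful to check how many messages the Bayes database has actually learned. The following is a minimal sketch, not part of my original procedure: it assumes that "sa-learn --dump magic" prints its counters as lines ending in "non-token data: nspam" and "non-token data: nham" with the count in the third column. The helper name bayes_counts is made up for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "sa-learn --dump magic" style output on stdin
# and prints the number of learned spam and ham messages.
# Assumption: the counter is the third whitespace-separated column.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { spam = $3 }
        /non-token data: nham$/  { ham  = $3 }
        END { printf "spam=%d ham=%d\n", spam, ham }
    '
}

# Example call (needs Spamassassin installed and read access to the database):
# sa-learn --dbpath /var/lib/amavis/.spamassassin --dump magic | bayes_counts
```

If the spam counter went up after the sa-learn run, the training worked. Keep in mind that with default settings Spamassassin only starts using Bayes once it has learned at least 200 spam and 200 ham mails, so you will probably also need to feed it some legitimate mail with "--ham".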
