Add regional lists to SpamAssassin #2468

henningwerner · 2025-01-05T08:30:24Z

Is it possible to add regional source lists to SpamAssassin?

Every day about 20 German-language emails land in my inbox without being flagged as spam.

Moving them to the Spam folder doesn't seem to train the spam engine.

Any ideas to improve filtering further?

sptcguy · 2025-01-08T17:15:03Z

Unfortunately (and someone correct me if I'm wrong), any config modifications to the mail stack have to be made manually with MiaB. I've found MiaB's default SpamAssassin settings to be rather ineffective; they need quite a bit of tuning to work properly.

Can you post the X-Spam-Status and X-Spam-Report headers from one of the emails that SA is failing to classify as spam?
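If you end up comparing many of these headers, a small parser saves some squinting. The minimal Python sketch below turns a folded X-Spam-Status value into its verdict, score, threshold, and list of rules that fired. It assumes the header layout shown in this thread (SpamAssassin 3.4.6); the SpamStatus and parse_spam_status names are purely illustrative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SpamStatus:
    is_spam: bool      # "Yes"/"No" verdict at the start of the header
    score: float       # total score SpamAssassin computed
    required: float    # required_score threshold that was in effect
    tests: list[str]   # names of the rules that fired


def parse_spam_status(value: str) -> SpamStatus:
    """Parse the value of an X-Spam-Status header (folded lines allowed)."""
    # Unfold continuation lines, then drop the whitespace that folding
    # leaves after commas so the tests=... list becomes a single token.
    flat = re.sub(r"\s+", " ", value).strip()
    flat = re.sub(r",\s+", ",", flat)
    verdict = flat.split(",", 1)[0]
    score = float(re.search(r"\bscore=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"\brequired=(-?[\d.]+)", flat).group(1))
    tests_match = re.search(r"\btests=(\S+)", flat)
    tests = tests_match.group(1).split(",") if tests_match else []
    return SpamStatus(verdict == "Yes", score, required, tests)
```

To use it on a stored message, the standard library can pull the header out first, e.g. email.message_from_string(raw)["X-Spam-Status"].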

henningwerner · 2025-01-08T19:13:47Z

Here are some examples:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE autolearn=no
    autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.5 BAYES_99 BODY: Bayes spam probability: 99-100%
    * [score: 1.0000]
    * 0.2 BAYES_999 BODY: Bayes spam probability:
    * 99.9-100%
    * [score: 1.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.4

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ****
X-Spam-Status: No, score=4.6 required=5.0 tests=BAYES_99,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,PYZOR_CHECK,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE
    autolearn=no autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.5 BAYES_99 BODY: Bayes spam probability: 99-100%
    * [score: 0.9971]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 1.4 PYZOR_CHECK Listed in the Pyzor system
    * (https://pyzor.readthedocs.io/en/latest/)
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 4.6

The emails I receive look very similar; only the subject and body differ.

sptcguy · 2025-01-08T19:38:08Z

OK, you'll notice that the BAYES_99 and BAYES_999 (BODY) flags were triggered when SA scanned the email. This means your Bayes classifier has in fact learned these messages and thinks they're spam. You just need to configure SA to treat the Bayes classification less conservatively.
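To see why a BAYES_99 hit alone didn't push the first message over the line, add up the per-rule scores from its X-Spam-Report. The minimal Python sketch below does that for the first report posted above. It assumes the one-decimal scores SA prints in the report; those are rounded, so on other messages the sum can be off from X-Spam-Score by a rounding step.

```python
from __future__ import annotations

import re

# Excerpt of the first X-Spam-Report above. Continuation lines such as
# "* [score: 1.0000]" or "* 99.9-100%" are skipped by the regex below.
REPORT = """
    * 3.5 BAYES_99 BODY: Bayes spam probability: 99-100%
    * [score: 1.0000]
    * 0.2 BAYES_999 BODY: Bayes spam probability:
    * 99.9-100%
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * 0.0 T_REMOTE_IMAGE Message contains an external image
"""

# "* <score> <RULE_NAME> ..." at the start of a report line.
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+\.\d+)\s+([A-Z][A-Z0-9_]*)\b", re.MULTILINE)


def rule_scores(report: str) -> dict[str, float]:
    """Map each rule name in the report to the score it contributed."""
    return {rule: float(score) for score, rule in RULE_LINE.findall(report)}


scores = rule_scores(REPORT)
bayes = scores["BAYES_99"] + scores["BAYES_999"]
other = sum(v for k, v in scores.items() if not k.startswith("BAYES_"))
total = round(sum(scores.values()), 1)
print(f"Bayes: {bayes:.1f}, other rules: {other:.1f}, total: {total}")
# Bayes: 3.7, other rules: -0.3, total: 3.4
```

With the default required_score of 5.0, even a 99-100% Bayes verdict only contributes 3.7 points here, which is why the shortcircuit (or a lower threshold) is needed to act on it.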

The first step you can take is to enable a BAYES_99 shortcircuit. This tells SA to mark an email as spam automatically when the classifier puts the probability that its content is spam at 99% or higher, and then to stop processing any further rules.

Here's how to enable it:

1. Open /etc/spamassassin/local.cf.
2. Locate the following commented-out line: # shortcircuit BAYES_99 spam
3. Uncomment it.
4. Uncomment the line # use_bayes 1 as well, for good measure and as a sanity check.
5. Restart SpamAssassin and the proxy daemon: sudo systemctl restart spamassassin spampd
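If you'd rather script these steps (for example, to re-run them later), here is a minimal Python sketch that uncomments a given directive in local.cf. The function names and the idea of rewriting the file in place are assumptions for illustration, not something MiaB provides; back up local.cf before trying it, and note the shortcircuit line only takes effect if the Shortcircuit plugin is loaded, since it sits inside an ifplugin block.

```python
import re
from pathlib import Path


def uncomment_directive(config: str, directive: str) -> str:
    """Uncomment every '# <directive>' line in config.

    Whitespace between words may differ (local.cf aligns its columns), and
    lines that are already active are left alone, so repeated runs are harmless.
    """
    words = [re.escape(w) for w in directive.split()]
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(" + r"[ \t]+".join(words) + r")[ \t]*$",
        re.MULTILINE,
    )
    # Keep the directive text (with its original spacing) and drop the '#'.
    return pattern.sub(lambda m: m.group(1), config)


def enable_bayes_shortcircuit(path: Path = Path("/etc/spamassassin/local.cf")) -> None:
    """Apply steps 2-4 to local.cf; restart spamassassin and spampd afterwards."""
    text = path.read_text()
    for directive in ("use_bayes 1", "shortcircuit BAYES_99 spam"):
        text = uncomment_directive(text, directive)
    path.write_text(text)
```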

Be sure to monitor your Spam directory closely for false positives. For this to work well, your Bayesian classifier needs adequate training on both spam and ham.

henningwerner · 2025-01-08T19:57:43Z

I should repeat these steps after every MiaB update, right?

I'll give it a try; thanks for the instructions.

sptcguy · 2025-01-08T20:07:13Z

> I should repeat these steps after every MiaB update, right?

Yep! I have a post-update script that reinstates all my tweaks and custom rules.
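The script mentioned here isn't included in this thread, so the Python sketch below is only a hypothetical stand-in showing how small such a thing can be. It re-applies single-valued settings to local.cf idempotently: it rewrites any active line for the key, otherwise the first commented-out example, otherwise it appends. TWEAKS holds placeholder values. Directives that legitimately appear on several lines, such as shortcircuit, need line-specific handling instead.

```python
from pathlib import Path
import re

# Hypothetical settings to re-apply after each MiaB update (placeholders).
TWEAKS = {
    "use_bayes": "1",
    "bayes_auto_learn": "1",
    "required_score": "3.0",
}


def set_option(config: str, key: str, value: str) -> str:
    """Set a single-valued option, preferring existing lines over appending."""
    lines = config.splitlines()
    new_line = f"{key} {value}"
    active = re.compile(rf"^\s*{re.escape(key)}\s")
    commented = re.compile(rf"^\s*#\s*{re.escape(key)}\s")
    # Rewrite every active line so no later duplicate can override us.
    hits = [i for i, line in enumerate(lines) if active.match(line)]
    if not hits:
        # Fall back to the first commented-out example, e.g. "# required_score 5.0".
        hits = [i for i, line in enumerate(lines) if commented.match(line)][:1]
    if hits:
        for i in hits:
            lines[i] = new_line
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def apply_tweaks(path: Path = Path("/etc/spamassassin/local.cf")) -> bool:
    """Rewrite local.cf only if something changed; returns True in that case."""
    original = path.read_text()
    text = original
    for key, value in TWEAKS.items():
        text = set_option(text, key, value)
    if text != original:
        path.write_text(text)
    return text != original
```

Because a second run finds the active lines it wrote the first time, running it after every update (followed by a restart of spamassassin and spampd) should leave an already-tweaked file untouched.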

henningwerner · 2025-01-08T20:17:56Z

> > I should repeat these steps after every MiaB update, right?
>
> Yep! I have a post-update script that reinstates all my tweaks and custom rules.

Could you share it as a Gist, please?

Here's another example:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level:
X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,
    SPF_PASS autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Report:
    * -1.9 BAYES_00 BODY: Bayes spam probability: 0-1%
    * [score: 0.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at
    * https://www.dnswl.org/, no trust
    * [69.72.43.12 listed in list.dnswl.org]
    * 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
X-Spam-Score: -2.2

I don't understand why this terrible spam is not recognized.

sptcguy · 2025-01-08T20:43:30Z

If you notice the same or similar emails making it through and the classifier is not recognizing them as spam (BAYES_50, BAYES_99, BAYES_999), the first thing to check is if you still have any of the culprit mails buried in your inbox. If there are too many still in your inbox, there's a chance that the classifier has "learned" them as ham.

If this is the case, make sure everything that the classifier is failing to catch gets moved to your spam directory. Then monitor for a gradual improvement. I say gradual because it'll take a few emails before SA starts flagging them as BAYES_99.
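For reference, which BAYES_* rule fires depends only on the classifier's probability (the [score: ...] value in the report). The Python sketch below maps a probability to rule names. The BAYES_00, BAYES_50, BAYES_95, BAYES_99 and BAYES_999 bands match the report descriptions in this thread; the intermediate bands and the exact boundary handling are assumptions based on SpamAssassin's stock Bayes rules, so check the rule files installed on your box.

```python
from __future__ import annotations

# (rule, lower bound, upper bound) on the Bayes spam probability.
# Bands for 00/50/95/99/999 match the reports in this thread; the others
# are assumed from SpamAssassin's stock Bayes rules.
BAYES_BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),
]


def bayes_rules(probability: float) -> list[str]:
    """Names of the BAYES_* rules whose band contains the probability.

    Boundary handling (lower bound exclusive except at 0, upper bound
    inclusive) is an assumption. BAYES_99 and BAYES_999 overlap, which is
    why both fire on messages scored 1.0000.
    """
    return [
        name
        for name, low, high in BAYES_BANDS
        if (low == 0 or probability > low) and probability <= high
    ]


for p in (0.0, 0.5621, 0.9595, 0.9971, 1.0):
    print(p, bayes_rules(p))
```

Only BAYES_99 is tied to the shortcircuit in local.cf, so a message that lands in BAYES_95 or BAYES_50 still goes through normal scoring; that is why a few more training moves are needed before similar mails get caught.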

If your inbox is already clear of any lingering/buried spam then we can move onto verifying your directory mapping and trying some score overrides.

I'm assuming, based on the screenshot you posted, that the email originates from a .ru TLD?

henningwerner · 2025-01-08T20:47:09Z

Previously I didn't move the emails to Spam; I deleted them. Could this also have a negative impact?

From: Court Surety <[email protected]>

The mail was sent through the Mailgun infrastructure.

sptcguy · 2025-01-08T20:51:47Z

> Previously I didn't move the emails to Spam; I deleted them. Could this also have a negative impact?

Yes. MiaB configures Dovecot to train the Bayesian classifier automatically when you move an email to the Spam directory or click your client's Spam button. For SA to learn new spam, it's imperative that you move it to the Spam directory.

Do you still have the spam in your trash? If so, move it all to your Spam directory.

henningwerner · 2025-01-09T20:14:45Z

Today I received two more emails in the same format as the one in my screenshot above (third message).

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE autolearn=no
    autolearn_force=no version=3.4.6
X-Spam-Report:
    * 0.2 BAYES_999 BODY: Bayes spam probability:
    * 99.9-100%
    * [score: 1.0000]
    * 3.5 BAYES_99 BODY: Bayes spam probability: 99-100%
    * [score: 1.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.4

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: **
X-Spam-Status: No, score=2.8 required=5.0 tests=BAYES_95,DMARC_PASS,
    HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,
    T_REMOTE_IMAGE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.0 BAYES_95 BODY: Bayes spam probability: 95-99%
    * [score: 0.9595]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 2.8

My local.cf

# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# Only a small subset of options are listed below
#
###########################################################################

#    A 'contact address' users should contact for more info. (replaces
#    _CONTACTADDRESS_ in the report template)
# report_contact [email protected]


#   Add *****SPAM***** to the Subject header of spam e-mails
#
# rewrite_header Subject *****SPAM*****


#   Save spam messages as a message/rfc822 MIME attachment instead of
#   modifying the original message (0: off, 2: use text/plain instead)
#
# report_safe 1
report_safe 0


#   Set which networks or hosts are considered 'trusted' by your mail
#   server (i.e. not spammers)
#
# trusted_networks 212.17.35.


#   Set file-locking method (flock is not safe over NFS, but is faster)
#
# lock_method flock


#   Set the threshold at which a message is considered spam (default: 5.0)
#
# required_score 5.0


#   Use Bayesian classifier (default: 1)
#
use_bayes 1


#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1


#   Set headers which may provide inappropriate cues to the Bayesian
#   classifier
#
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status


#   Whether to decode non- UTF-8 and non-ASCII textual parts and recode
#   them to UTF-8 before the text is given over to rules processing.
#
# normalize_charset 1

#   Textual body scan limit    (default: 50000)
#
#   Amount of data per email text/* mimepart, that will be run through body
#   rules.  This enables safer and faster scanning of large messages,
#   perhaps having very large textual attachments.  There should be no need
#   to change this well tested default.
#
# body_part_scan_size 50000

#   Textual rawbody data scan limit    (default: 500000)
#
#   Amount of data per email text/* mimepart, that will be run through
#   rawbody rules.
#
# rawbody_part_scan_size 500000

#   Some shortcircuiting, if the plugin is enabled
#
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
#
#   default: strongly-whitelisted mails are *really* whitelisted now, if the
#   shortcircuiting plugin is active, causing early exit to save CPU load.
#   Uncomment to turn this on
#
#   SpamAssassin tries hard not to launch DNS queries before priority -100.
#   If you want to shortcircuit without launching unneeded queries, make
#   sure such rule priority is below -100. These examples are already:
#
# shortcircuit USER_IN_WHITELIST       on
# shortcircuit USER_IN_DEF_WHITELIST   on
# shortcircuit USER_IN_ALL_SPAM_TO     on
# shortcircuit SUBJECT_IN_WHITELIST    on

#   the opposite; blacklisted mails can also save CPU
#
# shortcircuit USER_IN_BLACKLIST       on
# shortcircuit USER_IN_BLACKLIST_TO    on
# shortcircuit SUBJECT_IN_BLACKLIST    on

#   if you have taken the time to correctly specify your "trusted_networks",
#   this is another good way to save CPU
#
# shortcircuit ALL_TRUSTED             on

#   and a well-trained bayes DB can save running rules, too
#
shortcircuit BAYES_99                spam
# shortcircuit BAYES_00                ham

endif # Mail::SpamAssassin::Plugin::Shortcircuit
pyzor_options --homedir /etc/spamassassin/pyzor
add_header all Report _REPORT_
add_header all Score _SCORE_
bayes_path /home/user-data/mail/spamassassin/bayes
bayes_file_mode 0666

Edit: I had already emptied the trash yesterday evening.

sptcguy · 2025-01-09T21:31:14Z

Those emails should have gone straight to spam based on the flags. Just to verify: you did restart both spamassassin and spampd after saving local.cf, correct?

henningwerner · 2025-01-10T07:55:57Z

Yes, after saving local.cf I restarted both with your command: sudo systemctl restart spamassassin spampd

sptcguy · 2025-01-10T17:00:11Z

I just checked the mail server I manage at work, which runs an older version of MiaB, and it appears that in some earlier versions the setup process disables the Shortcircuit plugin.

Check whether the loadplugin Mail::SpamAssassin::Plugin::Shortcircuit line in /etc/spamassassin/v320.pre is commented out: grep -i shortcircuit /etc/spamassassin/v320.pre

If it is, uncomment it, save the file, and restart SpamAssassin and spampd: sudo systemctl restart spamassassin spampd
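Why this matters: in the local.cf posted above, the shortcircuit line sits inside an ifplugin Mail::SpamAssassin::Plugin::Shortcircuit ... endif block, and SpamAssassin ignores everything in that block when the plugin isn't loaded, so the setting is silently skipped. Here is a minimal Python sketch of the same check as the grep, assuming the loadplugin line may live in any *.pre file under /etc/spamassassin; the function names are illustrative.

```python
import re
from pathlib import Path

SHORTCIRCUIT = "Mail::SpamAssassin::Plugin::Shortcircuit"


def plugin_loaded(pre_text: str, plugin: str) -> bool:
    """True if pre_text has an active (uncommented) loadplugin line for plugin."""
    pattern = re.compile(
        rf"^[ \t]*loadplugin[ \t]+{re.escape(plugin)}\b", re.MULTILINE
    )
    return pattern.search(pre_text) is not None


def shortcircuit_loaded(conf_dir: Path = Path("/etc/spamassassin")) -> bool:
    """Check every *.pre file, since any of them may load the plugin."""
    return any(
        plugin_loaded(p.read_text(), SHORTCIRCUIT) for p in conf_dir.glob("*.pre")
    )
```

Judging by the headers in this thread, another quick check is the X-Spam-Status header itself: once the plugin is active it carries a shortcircuit=... field, as in the header posted on Jan 14.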

henningwerner · 2025-01-11T09:35:54Z

You are right! I'll try it out and let you know tomorrow whether the emails are moved correctly to the Spam folder.
So I don't need to do anything after every MiaB update, since the Shortcircuit plugin was only disabled in earlier versions, right?

henningwerner · 2025-01-12T10:27:23Z

@sptcguy looks much better, thanks for your effort!

sptcguy · 2025-01-13T15:56:49Z

Glad it's working better! I'll try to make my post-update script a little more universal and share it.

henningwerner · 2025-01-14T09:00:21Z

Today two emails got past the SA filter. Does it make sense to reduce the threshold value from 5 to 3, for example? Or should I train the filter further?

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_50,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_IMAGE_ONLY_28,HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE,
    URIBL_BLACK shortcircuit=no autolearn=no autolearn_force=no
    version=3.4.6
X-Spam-Report:
    * 0.8 BAYES_50 BODY: Bayes spam probability: 40-60%
    * [score: 0.5621]
    * -0.1 SPF_PASS SPF check passed
    * -0.1 DMARC_PASS DMARC check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 1.4 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of
    * words
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * 1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    * [URIs: handel360.shop]
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.6

The emails look similar to the one I posted in my third message.

sptcguy · 2025-01-14T19:20:29Z

> Does it make sense to reduce the threshold value from 5 to 3, for example?

Yeah, you can certainly do that. 3.0 is typically what I default to on any mail server I manage.
Keep in mind, though, that since you previously weren't moving spam into the Spam directory, your classifier still needs to learn what to block. You'll notice the BAYES_50 flag (a 40-60% spam probability). This means the classifier is starting to learn but needs to see a few more similar emails before it reaches BAYES_99 and the shortcircuit kicks in. Here's how you can adjust the required score:

1. In /etc/spamassassin/local.cf, uncomment required_score and change it to 3.0.
2. Also uncomment bayes_auto_learn 1, just to be sure the classifier automatically learns from messages whose scores cross the auto-learn thresholds.
3. Restart SpamAssassin and spampd: sudo systemctl restart spamassassin spampd
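To get a feel for what the lower threshold would have changed, this small Python sketch re-checks the X-Spam-Score values of the six messages posted in this thread against 5.0 and 3.0. It assumes a message counts as spam when its score is at least required_score, and it uses the one-decimal scores from the headers, so borderline cases could differ slightly.

```python
from __future__ import annotations

# X-Spam-Score values of the messages posted in this thread, in posting order.
POSTED_SCORES = [3.4, 4.6, -2.2, 3.4, 2.8, 3.6]


def flagged(scores: list[float], required_score: float) -> list[bool]:
    """Which messages would be tagged as spam at the given threshold."""
    return [score >= required_score for score in scores]


for threshold in (5.0, 3.0):
    hits = flagged(POSTED_SCORES, threshold)
    print(f"required_score {threshold}: {sum(hits)} of {len(hits)} flagged")
# required_score 5.0: 0 of 6 flagged
# required_score 3.0: 4 of 6 flagged
```

The two that would still slip through are the BAYES_00 message (-2.2), which the classifier had auto-learned as ham, and the BAYES_95 one (2.8), so training the classifier still matters alongside the lower threshold.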

Also note that if you lower the threshold to 3, you may run into false positives caused by misconfigured SPF on the sender's side. In an ideal world those would go to spam automatically, but you may have senders who are legitimate yet don't have DKIM, SPF, etc. set up properly.

I'd first see whether moving the spam messages that reach your inbox into Spam makes a difference. (Note: you may have to do this two or three times for a given type of message before SA learns to block it.)
