Add regional lists to SpamAssassin #2468

Open
henningwerner opened this issue Jan 5, 2025 · 18 comments
@henningwerner

henningwerner commented Jan 5, 2025

Is it possible to add regional source lists to SpamAssassin?

Every day about 20 mails in German land in my inbox without being flagged as spam.

Moving them to the Spam folder doesn't seem to train the spam engine.

Any ideas to improve filtering further?

@sptcguy

sptcguy commented Jan 8, 2025

Unfortunately (and someone correct me if I'm wrong), any config modifications to the mail stack have to be done manually with MiaB. I've found MiaB's default SpamAssassin settings rather ineffective; they need quite a bit of tuning to work properly.

Can you post the X-Spam-Status and X-Spam-Report headers from one of the emails that SA is failing to classify as spam?
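If you'd rather not copy the headers by hand, here is a minimal Python sketch (standard library only) that pulls the SpamAssassin headers out of a message saved as raw source (for example via your client's "show original" or "save as .eml") and reads the verdict from X-Spam-Status. The function names and the file name are hypothetical, and the regex assumes the `Yes|No, score=... required=...` layout seen in the headers posted in this thread.

```python
import re
from email import message_from_string

# Headers that SpamAssassin (as configured in MiaB) adds to scanned mail.
SA_HEADERS = ("X-Spam-Status", "X-Spam-Report", "X-Spam-Score")


def spam_headers(raw_message: str) -> dict[str, str]:
    """Return whichever SpamAssassin headers are present in a raw message."""
    msg = message_from_string(raw_message)
    return {name: msg[name] for name in SA_HEADERS if msg[name] is not None}


def verdict(status: str) -> tuple[bool, float, float]:
    """Parse 'No, score=3.4 required=5.0 ...' into (is_spam, score, required)."""
    m = re.match(r"\s*(Yes|No),\s*score=(-?[\d.]+)\s+required=(-?[\d.]+)", status)
    if m is None:
        raise ValueError(f"unrecognised X-Spam-Status: {status!r}")
    return m.group(1) == "Yes", float(m.group(2)), float(m.group(3))


# Example (hypothetical file name):
# with open("spam.eml", encoding="utf-8", errors="replace") as f:
#     headers = spam_headers(f.read())
# print(headers["X-Spam-Report"])
# print(verdict(headers["X-Spam-Status"]))
```

The folded continuation lines of X-Spam-Status are kept as-is; only the leading `Yes/No, score=..., required=...` part is parsed.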

@henningwerner
Author

henningwerner commented Jan 8, 2025

Here are some examples:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE autolearn=no
    autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.5 BAYES_99 BODY: Spam probability according to Bayes test: 99-100%
    * [score: 1.0000]
    * 0.2 BAYES_999 BODY: Spam probability according to Bayes test:
    * 99.9-100%
    * [score: 1.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.4

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ****
X-Spam-Status: No, score=4.6 required=5.0 tests=BAYES_99,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,PYZOR_CHECK,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE
    autolearn=no autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.5 BAYES_99 BODY: Spam probability according to Bayes test: 99-100%
    * [score: 0.9971]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 1.4 PYZOR_CHECK Listed in the Pyzor system
    * (https://pyzor.readthedocs.io/en/latest/)
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 4.6

[Screenshot]

The received mails look very similar; only the subject and body differ.

@sptcguy

sptcguy commented Jan 8, 2025

OK, notice that the BAYES_99 and BAYES_999 body rules were triggered when SA scanned the email. This means your Bayes classifier has in fact learned these mails and considers them spam. You just need to configure SA to treat the Bayes classification less conservatively.
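To see why a 99% Bayes verdict alone doesn't get the first mail flagged, here is the arithmetic as a short Python sketch. It simply adds the per-rule points copied from the first X-Spam-Report above and compares the total with required=5.0, assuming (as the headers suggest) that a message counts as spam only once its total reaches required_score. The report shows points rounded to one decimal, so for other messages the sum may differ slightly from the real score.

```python
# Per-rule points copied from the first X-Spam-Report in this thread.
first_report = {
    "BAYES_99": 3.5, "BAYES_999": 0.2,
    "DMARC_PASS": -0.1, "SPF_PASS": -0.1, "SPF_HELO_PASS": -0.0,
    "HTML_FONT_LOW_CONTRAST": 0.0, "HTML_MESSAGE": 0.0,
    "DKIM_SIGNED": 0.1, "DKIM_VALID": -0.1, "DKIM_VALID_AU": -0.1,
    "T_REMOTE_IMAGE": 0.0,
}
REQUIRED_SCORE = 5.0  # required=5.0 from X-Spam-Status


def total_score(rule_scores: dict[str, float]) -> float:
    """Sum the displayed rule points, rounded like the report (one decimal)."""
    return round(sum(rule_scores.values()), 1)


score = total_score(first_report)
print(score, score >= REQUIRED_SCORE)  # 3.4 False
```

The Bayes rules contribute 3.7 points, and the small DKIM/SPF/DMARC credits pull the total down to 3.4, well short of 5.0.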

The first step you can take is to enable a BAYES_99 shortcircuit. This tells SA to mark an email as spam immediately if the classifier puts the probability that the content is spam at 99% or higher, and then to stop processing any further rules.

Here's how to enable it:

  1. Open /etc/spamassassin/local.cf
  2. Locate the following commented-out line: # shortcircuit BAYES_99 spam
  3. Uncomment it.
  4. Uncomment the line # use_bayes 1 for good measure (as a sanity check).
  5. Restart SpamAssassin and the spampd proxy daemon: sudo systemctl restart spamassassin spampd
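The manual edit in steps 2 to 4 can also be scripted. Below is a minimal Python sketch of a hypothetical helper (not part of MiaB or SpamAssassin) that uncomments specific directives in local.cf text. It compares lines with whitespace collapsed, so the column-aligned `# shortcircuit BAYES_99                spam` line in the stock file is still found. Lines that are already active are left untouched, so running it twice is harmless. You still need to restart spamassassin and spampd afterwards.

```python
def _normalise(text: str) -> str:
    """Collapse runs of whitespace so aligned config columns compare equal."""
    return " ".join(text.split())


def uncomment_directives(config_text: str, directives: list[str]) -> str:
    """Uncomment lines such as '# shortcircuit BAYES_99 spam' in local.cf text."""
    wanted = {_normalise(d) for d in directives}
    out = []
    for line in config_text.splitlines(keepends=True):
        body = line.lstrip()
        if body.startswith("#") and _normalise(body.lstrip("#")) in wanted:
            indent = line[: len(line) - len(body)]
            # Drop the '#' marker(s) and the blanks after them; keep the rest,
            # including the original column alignment and the newline.
            line = indent + body.lstrip("#").lstrip(" \t")
        out.append(line)
    return "".join(out)


# Example (hypothetical usage, run as root):
# path = "/etc/spamassassin/local.cf"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(uncomment_directives(text, ["use_bayes 1", "shortcircuit BAYES_99 spam"]))
```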

Be sure to monitor your Spam folder closely for false positives. For this to work effectively, your Bayesian classifier needs to be adequately trained on both spam and ham.
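One way to see how much training the classifier has had is `sa-learn --dump magic`, which prints the number of spam and ham messages learned on its `nspam` and `nham` lines. The small Python sketch below parses that output, assuming the usual layout where the count is the third column. SpamAssassin's defaults for bayes_min_spam_num and bayes_min_ham_num (both 200) are the minimum before any BAYES_* rule fires; reaching them is a floor, not a guarantee of good results. Run sa-learn as a user that can read the database set by bayes_path in local.cf; the function names are hypothetical.

```python
import re

MIN_SPAM = 200  # SpamAssassin default for bayes_min_spam_num
MIN_HAM = 200   # SpamAssassin default for bayes_min_ham_num

# Matches e.g. "0.000          0       1234          0  non-token data: nspam"
_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$")


def bayes_counts(dump_magic: str) -> dict[str, int]:
    """Extract the learned spam/ham counts from `sa-learn --dump magic` output."""
    counts = {}
    for line in dump_magic.splitlines():
        m = _LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def bayes_active(counts: dict[str, int]) -> bool:
    """True once both counts reach the minimums needed for BAYES_* rules."""
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM


# Example:
# import subprocess
# out = subprocess.run(["sa-learn", "--dump", "magic"],
#                      capture_output=True, text=True).stdout
# print(bayes_counts(out), bayes_active(bayes_counts(out)))
```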

@henningwerner
Author

Should I repeat these steps after every MiaB update?

I'll give it a try, thanks for the instructions.

@sptcguy

sptcguy commented Jan 8, 2025

> Should I repeat these steps after every MiaB update?

Yep! I have a post-update script that reinstates all my tweaks and custom rules.
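The script itself isn't shown in this thread, so the following is only a hypothetical Python sketch of the idea: keep your tweaks in one marked block and (re)write that block into local.cf after each update, replacing it if it is already there so repeated runs don't pile up duplicates. The marker comments and the tweak list are placeholders taken from this thread; the shortcircuit line is wrapped in `ifplugin` like the stock example, so the config still parses if the Shortcircuit plugin isn't loaded. This assumes an update may revert local.cf.

```python
import subprocess

BEGIN = "# BEGIN local post-update tweaks"   # hypothetical marker lines
END = "# END local post-update tweaks"

# Placeholder tweaks from this thread; adjust to your own.
TWEAKS = (
    "use_bayes 1\n"
    "ifplugin Mail::SpamAssassin::Plugin::Shortcircuit\n"
    "shortcircuit BAYES_99 spam\n"
    "endif\n"
)


def apply_managed_block(config_text: str, body: str = TWEAKS) -> str:
    """Insert or replace the marked block so repeated runs stay idempotent."""
    block = f"{BEGIN}\n{body}{END}\n"
    start = config_text.find(BEGIN)
    if start == -1:
        if config_text and not config_text.endswith("\n"):
            config_text += "\n"
        return config_text + block
    end = config_text.find(END, start)
    if end == -1:
        raise ValueError("found BEGIN marker without END marker")
    newline = config_text.find("\n", end)
    rest = "" if newline == -1 else config_text[newline + 1:]
    return config_text[:start] + block + rest


def restart_services() -> None:
    """Restart the scanner and the proxy daemon so the new config is loaded."""
    subprocess.run(["systemctl", "restart", "spamassassin", "spampd"], check=True)
```

Usage would be: read /etc/spamassassin/local.cf, write back `apply_managed_block(text)`, then call `restart_services()` as root.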

@henningwerner
Author

> Should I repeat these steps after every MiaB update?
>
> Yep! I have a post-update script that reinstates all my tweaks and custom rules.

Could you share it as a Gist, please?

Here is another example:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level:
X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,
    SPF_PASS autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Report:
    * -1.9 BAYES_00 BODY: Spam probability according to Bayes test: 0-1%
    * [score: 0.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at
    * https://www.dnswl.org/, no trust
    * [69.72.43.12 listed in list.dnswl.org]
    * 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
X-Spam-Score: -2.2

[Screenshot]

I don't understand why this terrible spam is not recognized.

@sptcguy

sptcguy commented Jan 8, 2025

If you notice the same or similar emails making it through and the classifier is not recognizing them as spam (BAYES_50, BAYES_99, BAYES_999), the first thing to check is whether any of the culprit mails are still buried in your inbox. If too many are still there, there's a chance the classifier has "learned" them as ham.

If this is the case, make sure everything the classifier fails to catch gets moved to your Spam folder, then watch for a gradual improvement. I say gradual because it'll take a few emails before SA starts flagging them as BAYES_99.
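For reference, the `[score: ...]` value under each BAYES_* line is the classifier's spam probability, and which rule fires depends on the band it falls into. The short Python sketch below maps a probability to the bands that appear in the reports in this thread (BAYES_00 0-1%, BAYES_50 40-60%, BAYES_95 95-99%, BAYES_99 99-100%, BAYES_999 99.9-100%). SpamAssassin defines further bands in between that are left out here, and the boundary handling (lower bound inclusive) is a simplification for illustration.

```python
# (rule, lower bound, upper bound), as described in the X-Spam-Report lines above.
BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),
]


def bayes_rules(probability: float) -> list[str]:
    """Return the BAYES_* rules (from BANDS) that a given probability would hit."""
    hits = []
    for name, low, high in BANDS:
        # Lower bound inclusive, upper bound exclusive, except that 1.0
        # still counts for the bands that end at 100%.
        if low <= probability < high or (high == 1.0 and probability == 1.0):
            hits.append(name)
    return hits


for p in (1.0, 0.9971, 0.9595, 0.5621, 0.0):  # scores seen in this thread
    print(p, bayes_rules(p))
```

The results match the rules listed in the corresponding reports: a 0.9971 score hits only BAYES_99, while 1.0000 hits both BAYES_99 and BAYES_999.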

If your inbox is already clear of any lingering/buried spam then we can move onto verifying your directory mapping and trying some score overrides.

Based on the screenshot you posted, I'm assuming the email originates from a .ru TLD?

@henningwerner
Author

henningwerner commented Jan 8, 2025

Previously I didn't move the mails to Spam; I deleted them. Could this also have had a negative impact?

From: Court Surety <[email protected]>

The mail was sent through the Mailgun infrastructure.

@sptcguy

sptcguy commented Jan 8, 2025

> Previously I didn't move the mails to Spam; I deleted them. Could this also have had a negative impact?

Yes. MiaB configures Dovecot to automatically train the Bayesian classifier when you move an email to the Spam folder or click your client's spam button. Deleting a message doesn't train anything, so for SA to learn new spam it's essential that you move it to the Spam folder.

Do you still have the spam mail in your trash? If so, move it all to your Spam folder.

@henningwerner
Author

henningwerner commented Jan 9, 2025

Today I received two more mails in the same format as the one I screenshotted above (third message).

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE autolearn=no
    autolearn_force=no version=3.4.6
X-Spam-Report:
    * 0.2 BAYES_999 BODY: Spam probability according to Bayes test:
    * 99.9-100%
    * [score: 1.0000]
    * 3.5 BAYES_99 BODY: Spam probability according to Bayes test: 99-100%
    * [score: 1.0000]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.4

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: **
X-Spam-Status: No, score=2.8 required=5.0 tests=BAYES_95,DMARC_PASS,
    HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,
    T_REMOTE_IMAGE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Report:
    * 3.0 BAYES_95 BODY: Spam probability according to Bayes test: 95-99%
    * [score: 0.9595]
    * -0.1 DMARC_PASS DMARC check passed
    * -0.1 SPF_PASS SPF check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 2.8

My local.cf:

# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# Only a small subset of options are listed below
#
###########################################################################

#    A 'contact address' users should contact for more info. (replaces
#    _CONTACTADDRESS_ in the report template)
# report_contact [email protected]


#   Add *****SPAM***** to the Subject header of spam e-mails
#
# rewrite_header Subject *****SPAM*****


#   Save spam messages as a message/rfc822 MIME attachment instead of
#   modifying the original message (0: off, 2: use text/plain instead)
#
# report_safe 1
report_safe 0


#   Set which networks or hosts are considered 'trusted' by your mail
#   server (i.e. not spammers)
#
# trusted_networks 212.17.35.


#   Set file-locking method (flock is not safe over NFS, but is faster)
#
# lock_method flock


#   Set the threshold at which a message is considered spam (default: 5.0)
#
# required_score 5.0


#   Use Bayesian classifier (default: 1)
#
use_bayes 1


#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1


#   Set headers which may provide inappropriate cues to the Bayesian
#   classifier
#
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status


#   Whether to decode non- UTF-8 and non-ASCII textual parts and recode
#   them to UTF-8 before the text is given over to rules processing.
#
# normalize_charset 1

#   Textual body scan limit    (default: 50000)
#
#   Amount of data per email text/* mimepart, that will be run through body
#   rules.  This enables safer and faster scanning of large messages,
#   perhaps having very large textual attachments.  There should be no need
#   to change this well tested default.
#
# body_part_scan_size 50000

#   Textual rawbody data scan limit    (default: 500000)
#
#   Amount of data per email text/* mimepart, that will be run through
#   rawbody rules.
#
# rawbody_part_scan_size 500000

#   Some shortcircuiting, if the plugin is enabled
#
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
#
#   default: strongly-whitelisted mails are *really* whitelisted now, if the
#   shortcircuiting plugin is active, causing early exit to save CPU load.
#   Uncomment to turn this on
#
#   SpamAssassin tries hard not to launch DNS queries before priority -100.
#   If you want to shortcircuit without launching unneeded queries, make
#   sure such rule priority is below -100. These examples are already:
#
# shortcircuit USER_IN_WHITELIST       on
# shortcircuit USER_IN_DEF_WHITELIST   on
# shortcircuit USER_IN_ALL_SPAM_TO     on
# shortcircuit SUBJECT_IN_WHITELIST    on

#   the opposite; blacklisted mails can also save CPU
#
# shortcircuit USER_IN_BLACKLIST       on
# shortcircuit USER_IN_BLACKLIST_TO    on
# shortcircuit SUBJECT_IN_BLACKLIST    on

#   if you have taken the time to correctly specify your "trusted_networks",
#   this is another good way to save CPU
#
# shortcircuit ALL_TRUSTED             on

#   and a well-trained bayes DB can save running rules, too
#
shortcircuit BAYES_99                spam
# shortcircuit BAYES_00                ham

endif # Mail::SpamAssassin::Plugin::Shortcircuit
pyzor_options --homedir /etc/spamassassin/pyzor
add_header all Report _REPORT_
add_header all Score _SCORE_
bayes_path /home/user-data/mail/spamassassin/bayes
bayes_file_mode 0666

Edit: I had already emptied the trash yesterday evening.

@sptcguy

sptcguy commented Jan 9, 2025

Based on the flags, those emails should have gone straight to spam. Just to verify: you did restart both spamassassin and spampd after saving local.cf, correct?

@henningwerner
Author

Yes, after saving local.cf I restarted both with your command: sudo systemctl restart spamassassin spampd

@sptcguy

sptcguy commented Jan 10, 2025

I just checked the mail server I manage at work, which runs an older version of MiaB, and it appears that in some earlier versions the setup process disables the Shortcircuit plugin.

Check whether the loadplugin Mail::SpamAssassin::Plugin::Shortcircuit line in /etc/spamassassin/v320.pre is commented out: grep -i shortcircuit /etc/spamassassin/v320.pre

If it is, uncomment it, save the file, and restart SpamAssassin and spampd: sudo systemctl restart spamassassin spampd
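If you want to check every .pre file at once (on a given version the loadplugin line may not be in v320.pre), here is a rough Python equivalent of that grep. It reports whether the Shortcircuit loadplugin line is active, commented out, or absent in each file. The function names are hypothetical, and it only reads files, so uncommenting is still up to you.

```python
import re
from pathlib import Path

PLUGIN = "Mail::SpamAssassin::Plugin::Shortcircuit"


def plugin_state(pre_text: str, plugin: str = PLUGIN) -> str:
    """Return 'enabled', 'commented' or 'absent' for the plugin's loadplugin line."""
    # [ \t]* (not \s*) so a lone '#' line can't swallow the next line's newline.
    pattern = re.compile(
        r"^[ \t]*(#[ \t]*)?loadplugin[ \t]+" + re.escape(plugin) + r"\b",
        re.MULTILINE,
    )
    states = {"commented" if m.group(1) else "enabled" for m in pattern.finditer(pre_text)}
    if "enabled" in states:
        return "enabled"
    return "commented" if states else "absent"


def scan_pre_files(config_dir: str = "/etc/spamassassin") -> dict[str, str]:
    """Check every *.pre file in the SpamAssassin config directory."""
    return {p.name: plugin_state(p.read_text())
            for p in sorted(Path(config_dir).glob("*.pre"))}


# Example: print(scan_pre_files())
```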

@henningwerner
Author

You're right! I'll try it out and let you know tomorrow whether the mails are being moved correctly to the Spam folder.
So I don't need to do anything after every MiaB update, since the Shortcircuit plugin was only disabled in earlier versions, right?

@henningwerner
Author

@sptcguy It looks much better, thanks for your effort!

@sptcguy

sptcguy commented Jan 13, 2025

Glad it's working better! I'll try to make my post-update script a little more universal and then share it.

@henningwerner
Author

henningwerner commented Jan 14, 2025

Today two mails got past the SA filter. Does it make sense to lower the threshold from 5 to 3, for example? Or should I train the filter further?

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box
X-Spam-Level: ***
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_50,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,HTML_FONT_LOW_CONTRAST,
    HTML_IMAGE_ONLY_28,HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_REMOTE_IMAGE,
    URIBL_BLACK shortcircuit=no autolearn=no autolearn_force=no
    version=3.4.6
X-Spam-Report:
    * 0.8 BAYES_50 BODY: Spam probability according to Bayes test: 40-60%
    * [score: 0.5621]
    * -0.1 SPF_PASS SPF check passed
    * -0.1 DMARC_PASS DMARC check passed
    * -0.0 SPF_HELO_PASS SPF: HELO name matches the SPF record
    * 0.0 HTML_MESSAGE BODY: Message contains HTML
    * 1.4 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of
    * words
    * 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to the
    * background color
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    * 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    * valid
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
    * author's domain
    * 1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    * [URIs: handel360.shop]
    * 0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Score: 3.6

The mails look similar to the one I posted in the third message.

@sptcguy

sptcguy commented Jan 14, 2025

> Does it make sense to reduce the threshold value from 5 to 3, for example?

Yeah, you can certainly do that. 3.0 is typically what I default to on any mail server I manage.
Keep in mind, though, that since you previously weren't moving spam into the Spam folder, your classifier still needs to learn what to block. Notice the BAYES_50 flag: the classifier rates the message at only 40-60% spam probability, so it needs to see a few more similar emails before it reaches the BAYES_99 shortcircuit threshold. Here's how to adjust the required score:

  1. In /etc/spamassassin/local.cf, uncomment required_score and change it to 3.0.
  2. Also uncomment bayes_auto_learn 1, just to be sure the classifier automatically learns from messages whose scores cross the autolearn thresholds.
  3. Restart SpamAssassin and spampd: sudo systemctl restart spamassassin spampd
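To get a feel for what the lower threshold would change, here is a quick Python check against the X-Spam-Score values of the unflagged mails posted in this thread. It ignores the shortcircuit, which would already catch the BAYES_99 ones once the plugin is loaded, and assumes a message is flagged when its score reaches required_score.

```python
# X-Spam-Score values of the unflagged mails posted in this thread, in order.
observed = [3.4, 4.6, -2.2, 3.4, 2.8, 3.6]


def flagged(scores: list[float], required: float) -> list[float]:
    """Scores that reach required_score and would therefore be marked as spam."""
    return [s for s in scores if s >= required]


print(flagged(observed, 5.0))  # []
print(flagged(observed, 3.0))  # [3.4, 4.6, 3.4, 3.6]
```

At 3.0 four of the six mails would have been caught, but the 2.8 (BAYES_95) and -2.2 (BAYES_00) ones would still get through, which is why training the classifier matters as much as the threshold.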

Also note that if you lower the score to 3, you may run into false positives caused by misconfigured SPF on the sender's side. In an ideal world we would automatically send those to spam, but some legitimate senders don't have DKIM/SPF etc. set up properly.

I'd first see whether moving the spam that reaches your inbox into Spam makes a difference. (Note: you may have to do this 2 or 3 times for a given type of message before SA learns to block it.)
