More than one match found for domains regex #129

ItayBenAvi · 2025-02-05T16:04:01Z

Describe the bug
mail-parser fails to run its regex on domains that end with ".id".
My scraper fails on these domains specifically.

To Reproduce
Change the domain to end with .id.

Expected behavior
The ID is extracted correctly, and only once.

Environment:

OS: Linux
Docker: yes
mail-parser version: 4.1.2

vmeyet · 2025-02-20T13:46:18Z

We've got the same issue/behavior (also on v4.1.2).

Parsing reports an error if any of the mail's "Received" headers contains a domain ending in .id.

Here is a simple reproducible example:

The email, containing a domain ending in .id:

Received: from web.myhost.id
	by smtp.domain.com (Proxmox) with ESMTPS id SOMEIDHERE
	for <[email protected]>; Wed, 19 Feb 2025 15:00:00 +0700 (WIB)
From: "Someone" <[email protected]>
To: <[email protected]>
Subject: OK
Message-ID: <[email protected]>
Date: Wed, 19 Feb 2025 12:00:53 +0000
Content-Type: multipart/mixed; boundary="--_BOUND"
MIME-Version: 1.0

----_BOUND
Content-Type: text/plain; name="hello_world.txt"
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQK

----_BOUND--

The code

import pathlib
from mailparser import parse_from_string

with pathlib.Path('/path/to/file/test.txt').open() as file:
    parse_from_string(file.read())  # logs the error shown below

# More than one match found for [^\w](?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*[(]?envelope-sender|\s+from|\s+by|\s+with(?! cipher)|\s+for|\s+via|;)) in from web.myhost.id by smtp.domain.com Proxmox with ESMTPS id SOMEIDHERE for <[email protected]>; Wed, 19 Feb 2025 15:00:00 +0700 WIB
# More than one match found for [^\w](?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*[(]?envelope-sender|\s+from|\s+by|\s+with(?! cipher)|\s+for|\s+via|;)) in from web.myhost.id by smtp.domain.com Proxmox with ESMTPS id SOMEIDHERE for <[email protected]>; Wed, 19 Feb 2025 15:00:00 +0700 WIB

Replacing .id with a different TLD removes the error.
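To illustrate why the pattern matches twice, here is a minimal sketch that uses only the standard `re` module and the pattern copied from the log message above. It does not call mail-parser itself. The helper `find_ids`, the template string, and the `user@example.com` address are hypothetical stand-ins, and the sketch assumes the pattern is applied to the flattened "Received" string as printed in the log. The `[^\w]` at the start of the pattern accepts the dot in `myhost.id`, so `.id by ...` looks like a second `id` clause.

```python
import re

# Pattern copied verbatim from the error log above.
ID_PATTERN = re.compile(
    r"[^\w](?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*[(]?envelope-sender"
    r"|\s+from|\s+by|\s+with(?! cipher)|\s+for|\s+via|;))"
)

# Flattened "Received" value as printed in the log; the recipient address
# is a placeholder (assumption), and {tld} is filled in below.
RECEIVED_TEMPLATE = (
    "from web.myhost.{tld} by smtp.domain.com Proxmox with ESMTPS "
    "id SOMEIDHERE for <user@example.com>; Wed, 19 Feb 2025 15:00:00 +0700 WIB"
)


def find_ids(received: str) -> list[str]:
    """Return every value captured by the 'id' group (hypothetical helper)."""
    return [match.group("id") for match in ID_PATTERN.finditer(received)]


# With a .id domain, the dot before "id" satisfies [^\w], so the hostname's
# TLD is picked up as a spurious first match.
print(find_ids(RECEIVED_TEMPLATE.format(tld="id")))

# With another TLD, only the real "id SOMEIDHERE" clause matches.
print(find_ids(RECEIVED_TEMPLATE.format(tld="com")))
```

The first call finds two candidates and the second finds one, which is consistent with the "More than one match found" message appearing only for the .id domain.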

vmeyet mentioned this issue Feb 20, 2025

Support for .id and .by top level domains in parse_received #131

Open
