Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong html_top for forwarded email w/o a message at the top #14

Open
andreip opened this issue Dec 21, 2016 · 2 comments
Open

wrong html_top for forwarded email w/o a message at the top #14

andreip opened this issue Dec 21, 2016 · 2 comments
Labels

Comments

@andreip
Copy link

andreip commented Dec 21, 2016

First, thanks for the great library!

I'm getting into a case where the unwrap_html includes html_top field and the value of it is pretty much all the html when I provide it with a forward message that doesn't contain any message above the ----Forward message---- pattern. In comparison, the unwrap function doesn't include the text_top if you give it the plain text version of the same email. See some tests you can add and see this pass/fail in actual repo.

A test example that would pass

Adding a similar test to UnwrapTestCase.test_gmail_forward, but without the "Hello" message at the beginning:

class UnwrapTestCase(TestCase):
    ...

    def test_gmail_forward_no_message(self):
        # No more Hello message at the beginning, unlike test_gmail_forward
        self.assertEqual(unwrap("""

---------- Forwarded message ----------
From: Someone <[email protected]>
Date: Fri, Apr 26, 2013 at 8:13 PM
Subject: Weekend Spanish classes
To: [email protected]

Spanish Classes
Learn Spanish
"""), {
            'type': 'forward',
            'from': 'Someone <[email protected]>',
            'date': 'Fri, Apr 26, 2013 at 8:13 PM',
            'subject': 'Weekend Spanish classes',
            'to': '[email protected]',
            'text': 'Spanish Classes\nLearn Spanish',
        })

and this passes just fine, since unwrap doesn't include text_top in the output when there's no message at the top.

A test example that would fail

Adding a similar test to HTMLUnwrapTestCase.test_gmail_forward, but without the text at the top again:

class HTMLUnwrapTestCase(TestCase):
    ...

    def test_gmail_forward_no_message(self):
        # diff to the html from test_gmail_forward ; missing:
        # <             test
        # <             <div><br></div>
        # <             <div>blah</div>
        html = '''
<html>
    <head></head>
    <body>
        <div dir="ltr">
            <div><br><div class="gmail_quote">---------- Forwarded message ----------<br>
                From: <b class="gmail_sendername">Foo Bar</b> <span dir="ltr">&lt;<a href="mailto:[email protected]">[email protected]</a>&gt;</span><br>
                Date: Thu, Mar 24, 2016 at 5:17 PM<br>
                Subject: The Subject<br>
                To: John Doe &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;<br><br><br>
                <div dir="ltr">Some text<div><br></div><div><br></div></div></div><br>
            </div>
        </div>
    </body>
</html>'''

        self.assertEqual(unwrap_html(html), {
            'type': 'forward',
            'subject': 'The Subject',
            'date': 'Thu, Mar 24, 2016 at 5:17 PM',
            'from': 'Foo Bar <[email protected]>',
            'to': 'John Doe <[email protected]>',
            'html': '<html><head></head>\n    <body><div dir="ltr"><div><div class="gmail_quote"><div dir="ltr">Some text</div></div></div></div></body></html>',
         })

and this fails, since unwrap_html also includes a html_top field in the output. And the value of the html_top is basically all the email in this case, so not useful:

u'<html><head></head>\n    <body><div dir="ltr"><div><div class="gmail_quote">---------- Forwarded message ----------<br>\n                From: <b class="gmail_sendername">Foo Bar</b> <span dir="ltr">&lt;<a href="mailto:[email protected]">[email protected]</a>&gt;</span><br>\n                Date: Thu, Mar 24, 2016 at 5:17 PM<br>\n                Subject: The Subject<br>\n                To: John Doe &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;<br><br><br>\n                <div dir="ltr">Some text<div><br></div><div><br></div></div></div><br>\n            </div>\n        </div>\n    </body>\n</html>

Do you think that it should behave the way I'm suggesting, so not include html_top here? In that case I can look into it and provide a PR.

Something unrelated to this but I realized I haven't tried making it have an html_bottom too, but might not make sense in forward case, only in reply case, right? Since how would you tell if that's the bottom or it's part of the forwarded email itself?

@thomasst
Copy link
Member

Thanks for reporting this, that's a bug!

If you want to investigate and submit a PR, feel free to do so. Otherwise, I'll take a look at it when I get a chance.

@thomasst
Copy link
Member

thomasst commented Jun 7, 2018

While working on this it's also worth taking a look at #22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants