Add support for parsing RTF email messages #16

Open
fadeyev opened this issue Oct 20, 2019 · 5 comments

Comments

@fadeyev

fadeyev commented Oct 20, 2019

As discussed in #15, there are Outlook .msg files that have only an RTF body, because they were created from RTF directly rather than from HTML (you can create such an email in Outlook by selecting the FORMAT TEXT tab -> Format section -> Rich Text when composing a new message). The current parser does not render such emails as anything close to readable.

To support this, we need a generic RTF parser that can parse an arbitrary RTF file and then convert it to HTML. It should handle all RTF formatting, such as \pard\plain \f0\b, and convert it to HTML tags (like <div>, <span>, etc.) and style attributes (like font-size, font-family, etc.).
We could probably combine the current parser with the generic one written by kschroeer/rtf-html-java.
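To make the proposal concrete, here is a minimal sketch of what such a generic converter does at its core: tokenize RTF into groups, control words and text, keep the character formatting on a stack that is saved at `{` and restored at `}`, and emit `<span style="...">` runs plus one `<div>` per `\par`. The class `MiniRtfToHtml` and its `convert` method are hypothetical names, not the API of bbottema/rtf-to-html or kschroeer/rtf-html-java. It only understands `\b`, `\i`, `\fs`, `\plain`, `\par` and the `\\`, `\{`, `\}`, `\'hh` escapes, skips the font table, color table, stylesheet, info and `\*` destinations, and assumes `\'hh` bytes are Latin-1 instead of honoring the code page the file declares.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: maps a few RTF character-formatting control words to
 * <span style="..."> and \par to <div>. Not the rtf-to-html or rtf-html-java API.
 */
public class MiniRtfToHtml {

    /** Character formatting in effect; a copy is saved whenever a "{" group opens. */
    private static final class Format {
        boolean bold, italic;
        int halfPoints = 24; // RTF font sizes are in half-points; \fs24 = 12pt

        Format copy() {
            Format f = new Format();
            f.bold = bold; f.italic = italic; f.halfPoints = halfPoints;
            return f;
        }

        String css() {
            StringBuilder sb = new StringBuilder();
            if (bold) sb.append("font-weight:bold;");
            if (italic) sb.append("font-style:italic;");
            if (halfPoints != 24) {
                String pt = halfPoints % 2 == 0 ? String.valueOf(halfPoints / 2) : (halfPoints / 2) + ".5";
                sb.append("font-size:").append(pt).append("pt;");
            }
            return sb.toString();
        }
    }

    public static String convert(String rtf) {
        StringBuilder html = new StringBuilder("<div>");
        StringBuilder run = new StringBuilder();       // text waiting to be emitted with the current format
        Deque<Format> saved = new ArrayDeque<>();
        Format fmt = new Format();
        int depth = 0, skipUntil = -1;                 // skipUntil >= 0 while inside an ignored destination
        int i = 0, n = rtf.length();
        while (i < n) {
            char c = rtf.charAt(i);
            if (skipUntil >= 0) {                      // skipping e.g. {\fonttbl ...} or {\* ...}
                if (c == '\\') { i += 2; continue; }   // step over escapes so \{ and \} are not counted
                if (c == '{') depth++;
                if (c == '}' && --depth < skipUntil) { skipUntil = -1; fmt = saved.pop(); }
                i++;
                continue;
            }
            if (c == '{') { saved.push(fmt.copy()); depth++; i++; continue; }
            if (c == '}') { flush(html, run, fmt); if (!saved.isEmpty()) fmt = saved.pop(); depth--; i++; continue; }
            if (c == '\r' || c == '\n') { i++; continue; } // raw line breaks carry no meaning in RTF
            if (c != '\\') { run.append(c); i++; continue; }
            if (++i >= n) break;
            char d = rtf.charAt(i);
            if (!Character.isLetter(d)) {              // control symbol: \{ \} \\ \'hh \* ...
                if (d == '\\' || d == '{' || d == '}') run.append(d);
                else if (d == '*') skipUntil = depth;
                else if (d == '\'' && i + 2 < n) {     // assumption: Latin-1; real files declare a code page
                    run.append((char) Integer.parseInt(rtf.substring(i + 1, i + 3), 16));
                    i += 2;
                }
                i++;
                continue;
            }
            int start = i;                             // control word: letters, then an optional signed number
            while (i < n && Character.isLetter(rtf.charAt(i))) i++;
            String word = rtf.substring(start, i);
            int numStart = i;
            if (i < n && rtf.charAt(i) == '-') i++;
            while (i < n && Character.isDigit(rtf.charAt(i))) i++;
            String num = rtf.substring(numStart, i);
            Integer param = num.isEmpty() || num.equals("-") ? null : Integer.valueOf(num);
            if (i < n && rtf.charAt(i) == ' ') i++;    // one space after a control word is its delimiter
            switch (word) {
                case "b": flush(html, run, fmt); fmt.bold = param == null || param != 0; break;
                case "i": flush(html, run, fmt); fmt.italic = param == null || param != 0; break;
                case "fs": flush(html, run, fmt); if (param != null) fmt.halfPoints = param; break;
                case "plain": flush(html, run, fmt); fmt = new Format(); break;
                case "par": flush(html, run, fmt); html.append("</div><div>"); break;
                case "fonttbl": case "colortbl": case "stylesheet": case "info": skipUntil = depth; break;
                default: break;                        // \rtf1, \ansi, \f0, \pard, ... are ignored here
            }
        }
        flush(html, run, fmt);
        return html.append("</div>").toString();
    }

    /** Emits pending text, HTML-escaped, wrapped in a span only if it carries formatting. */
    private static void flush(StringBuilder html, StringBuilder run, Format fmt) {
        if (run.length() == 0) return;
        String text = run.toString().replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String css = fmt.css();
        if (css.isEmpty()) html.append(text);
        else html.append("<span style=\"").append(css).append("\">").append(text).append("</span>");
        run.setLength(0);
    }
}
```

For instance, the input `{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Hello \b bold\b0  and {\i italic}\par next}` should come out as `<div>Hello <span style="font-weight:bold;">bold</span> and <span style="font-style:italic;">italic</span></div><div>next</div>`. A complete converter would also need to resolve `\f` indices through the font table into `font-family`, map colors, and handle Unicode (`\u`) and lists.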

@bbottema
Owner

Perfect. The change probably belongs in bbottema/rtf-to-html, though.

@fadeyev
Author

fadeyev commented Oct 20, 2019

Ah, my bad, sorry - you can move the request to that project if you like.

@bbottema
Owner

It's fine like this, no problem.

@bbottema
Owner

I've had a talk with @kschroeer, and he is willing to have his code merged with this code base into one cohesive solution. He did stress that he wants to make sure the solution does not depend on any other libraries, to keep it as lightweight as possible, something I totally agree with.

Swing could be an optional dependency for people who really want to use that option, and I would also like to keep it available for completeness' sake.

Finally, the result should be as you state in your opening: take kschroeer/rtf-html-java as a base, add the specifics of the RFC-compliant converter, and define defaults for non-RTF-HTML elements.

@Faelean

Faelean commented Jan 28, 2020

When viewing these two RTF emails:

https://github.com/Sicos1977/MSGReader/blob/master/MsgReaderTests/SampleFiles/RtfSampleEmail.msg
https://github.com/Sicos1977/MSGReader/blob/master/MsgReaderTests/SampleFiles/RtfSampleEmailWithAttachment.msg

I get the following as the textHTML (screenshot from the second one, since the first contains far too much text):

[Screenshot of the resulting textHTML]

Is this related to this enhancement or a separate issue?
