Add support for parsing RTF email messages #16
Comments
Perfect. The change is probably actually on bbottema/rtf-to-html.
Ah, my bad, sorry - you can move the request to that project if you like.
It's fine like this, no problem.
I've had a talk with @kschroeer and he is willing to have his code merged with this code base into one cohesive solution. He did stress that he wants the solution not to be tied to any other libraries, to keep it as lightweight as possible, something I totally agree with. Swing could be an optional dependency if people would really like to play with that option, and I myself would like to keep it available for completeness' sake. Finally, the result should be as you state in your opening: take kschroeer/rtf-html-java as a base and add the specifics of the RFC-compliant converter, while defining defaults for non-RTF-HTML elements.
When viewing these two RTF mails https://github.com/Sicos1977/MSGReader/blob/master/MsgReaderTests/SampleFiles/RtfSampleEmail.msg I get the following as the textHTML (screenshot from the second one, as the first contains way too much text): Is this related to this enhancement or a separate issue?
As discussed in #15, there are Outlook msg files that have only an RTF body, created directly as RTF rather than from HTML (you can create such an email in Outlook by selecting the FORMAT TEXT tab -> Format section -> Rich Text when creating a new message). The current parser doesn't turn such emails into anything even close to readable.
To support this we need a generic RTF parser that can parse any RTF file and convert it to HTML. It should handle all RTF formatting like `\pard\plain \f0\b` and convert it to HTML tags (like `<div>`, `<span>`, etc.) and style attributes (like `font-size`, `font-family`, etc.).

Probably we can combine the current parser with the generic one written by kschroeer/rtf-html-java.
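
To make the request more concrete, here is a minimal sketch of the kind of mapping I mean. It is not the code from kschroeer/rtf-html-java or from this project; the class name `RtfSketch` and the method `toHtml` are hypothetical. It assumes a body-only RTF fragment, understands just `\b`, `\i`, `\fs`, `\plain` and `\par`, and ignores groups, font tables, and `\'hh` escapes, all of which a real generic parser would have to handle.

```java
// Hypothetical sketch: maps a tiny subset of RTF control words to HTML.
final class RtfSketch {

    private RtfSketch() {}

    static String toHtml(String rtf) {
        StringBuilder html = new StringBuilder("<div>");
        StringBuilder run = new StringBuilder(); // text collected under the current formatting
        boolean bold = false;
        boolean italic = false;
        int halfPoints = 0; // \fsN is given in half-points; 0 means "not set"
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}' || c == '\r' || c == '\n') {
                i++; // assumption: groups and raw line breaks are skipped in this sketch
            } else if (c != '\\') {
                run.append(c);
                i++;
            } else if (i + 1 < rtf.length() && !isAsciiLetter(rtf.charAt(i + 1))) {
                // Control symbol: keep escaped \ { } as literal text, drop the rest.
                char sym = rtf.charAt(i + 1);
                if (sym == '\\' || sym == '{' || sym == '}') run.append(sym);
                i += 2;
            } else {
                // Control word: letters, optional numeric parameter, optional delimiter space.
                int start = ++i;
                while (i < rtf.length() && isAsciiLetter(rtf.charAt(i))) i++;
                String word = rtf.substring(start, i);
                int numStart = i;
                if (i + 1 < rtf.length() && rtf.charAt(i) == '-'
                        && Character.isDigit(rtf.charAt(i + 1))) i++;
                while (i < rtf.length() && Character.isDigit(rtf.charAt(i))) i++;
                String param = rtf.substring(numStart, i);
                if (i < rtf.length() && rtf.charAt(i) == ' ') i++;

                // Emit the text gathered so far before the formatting changes.
                flush(html, run, bold, italic, halfPoints);
                switch (word) {
                    case "b": bold = !param.equals("0"); break;
                    case "i": italic = !param.equals("0"); break;
                    case "fs": if (!param.isEmpty()) halfPoints = Integer.parseInt(param); break;
                    case "plain": bold = false; italic = false; halfPoints = 0; break;
                    case "par": html.append("</div><div>"); break;
                    default: break; // unknown control words (\pard, \f0, ...) are ignored here
                }
            }
        }
        flush(html, run, bold, italic, halfPoints);
        String out = html.toString();
        // Drop a trailing empty paragraph left behind by a final \par.
        return out.endsWith("<div>") ? out.substring(0, out.length() - 5) : out + "</div>";
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static void flush(StringBuilder html, StringBuilder run,
                              boolean bold, boolean italic, int halfPoints) {
        if (run.length() == 0) return;
        StringBuilder style = new StringBuilder();
        if (bold) style.append("font-weight:bold;");
        if (italic) style.append("font-style:italic;");
        if (halfPoints > 0) style.append("font-size:").append(halfPoints / 2.0).append("pt;");
        String text = run.toString()
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        if (style.length() == 0) {
            html.append(text);
        } else {
            html.append("<span style=\"").append(style).append("\">")
                .append(text).append("</span>");
        }
        run.setLength(0);
    }
}
```

With this sketch, `RtfSketch.toHtml("\\pard\\plain \\f0\\b Hello\\b0  world\\par")` wraps "Hello" in a bold `<span>` inside a single `<div>`. Resolving `\f0` to a `font-family` would additionally need the font table from the RTF header, which is one reason a proper generic parser is required.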