Skip to content

MarkKremer/EmailReplyParser

This branch is 1 commit ahead of willdurand/EmailReplyParser:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

e7e1c0b · Sep 20, 2022
Jan 30, 2022
Sep 20, 2022
Sep 20, 2022
Jan 30, 2022
Nov 26, 2013
Nov 16, 2011
Jan 30, 2022
Jan 30, 2022
Apr 12, 2021

Repository files navigation

EmailReplyParser

GitHub Actions Total Downloads Latest Stable Version

EmailReplyParser is a PHP library for parsing plain text email content, based on GitHub's email_reply_parser library written in Ruby.

Installation

The recommended way to install EmailReplyParser is through Composer:

composer require willdurand/email-reply-parser

Usage

Instantiate an EmailParser object and parse your email:

<?php

use EmailReplyParser\Parser\EmailParser;

$email = (new EmailParser())->parse($emailContent);

You get an Email object that contains a set of Fragment objects. The Email class exposes two methods:

  • getFragments(): returns all fragments;
  • getVisibleText(): returns a string which represents the content considered as "visible".

The Fragment represents a part of the full email content, and has the following API:

<?php

$fragment = current($email->getFragments());

$fragment->getContent();

$fragment->isSignature();

$fragment->isQuoted();

$fragment->isHidden();

$fragment->isEmpty();

Alternatively, you can rely on the EmailReplyParser to either parse an email or get its visible content in a single line of code:

$email = \EmailReplyParser\EmailReplyParser::read($emailContent);

$visibleText = \EmailReplyParser\EmailReplyParser::parseReply($emailContent);

Known Issues

Quoted Headers

Quoted headers aren't picked up if there's an extra line break:

On <date>, <author> wrote:

> blah

Also, they're not picked up if the email client breaks it up into multiple lines. GMail breaks up any lines over 80 characters for you.

On <date>, <author>
wrote:
> blah

The above On ....wrote: can be cleaned up with the following regex:

$fragment_without_date_author = preg_replace(
  '/\nOn(.*?)wrote:(.*?)$/si',
  "",
  $fragment->getContent()
);

Note though that we're search for "on" and "wrote". Therefore, it won't work with other languages.

Possible solution: Remove "[email protected]" lines...

Weird Signatures

Lines starting with - or _ sometimes mark the beginning of signatures:

Hello

--
Rick

Not everyone follows this convention:

Hello

Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub

**********************DISCLAIMER***********************************
* Note: blah blah blah                                            *
**********************DISCLAIMER***********************************

Strange Quoting

Apparently, prefixing lines with > isn't universal either:

Hello

--
Rick

________________________________________
From: Bob [[email protected]]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick

Unit Tests

Setup the test suite using Composer:

$ composer install

Run it using PHPUnit:

$ ./vendor/bin/simple-phpunit

Contributing

See CONTRIBUTING file.

Credits

  • GitHub
  • William Durand

License

EmailReplyParser is released under the MIT License. See the bundled LICENSE file for details.

About

PHP library for parsing plain text email content.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • PHP 100.0%