crisp-oss / email-forward-parser

🐛 Parses forwarded emails and extracts original content.

Home Page: https://www.npmjs.com/package/email-forward-parser

License: MIT License

JavaScript 100.00%
email mail parser forward

email-forward-parser's Introduction

Email Forward Parser

Parses forwarded emails and extracts original content.

This library supports most common email clients and locales.

😘 Maintainer: @eliottvincent

Who uses it?

Crisp

👋 Do you use this library and want to be listed here? Contact us.

Features

This library is used at Crisp every day, on around 1 million inbound emails.

  • Supported clients: Apple Mail, Gmail, Outlook Live / 365, Outlook 2013, Outlook 2019, New Outlook 2019, Yahoo Mail, Thunderbird, Missive, HubSpot, IONOS by 1 & 1, MailMate
  • Supported locales: Croatian, Czech, Danish, Dutch, English, French, Finnish, German, Hungarian, Italian, Japanese, Norwegian, Polish, Portuguese (Brazil), Portuguese (Portugal), Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian

Usage

const EmailForwardParser = require("email-forward-parser");

const result = new EmailForwardParser().read(MY_EMAIL_STRING);

console.log(result.forwarded);
// true

API

Parse a forwarded email

read(body, subject) checks whether an email was forwarded or not, and parses its original content (From, To, Cc, Subject, Date and Body):

  • body must be a string representing the email body (as returned by mailparser, for example)
  • subject must be a string representing the email subject. This parameter is optional, but recommended to improve the detection for some email clients (especially New Outlook 2019)

const EmailForwardParser = require("email-forward-parser");

const result = new EmailForwardParser().read(MY_EMAIL_STRING, MY_SUBJECT_STRING);

console.log(result);
// {
//   forwarded: true,
//
//   message: "Praesent suscipit egestas hendrerit.",
//
//   email: {
//     body: "Aenean quis diam urna.",
//
//     from: {
//       address: "[email protected]",
//       name: "John Doe"
//     },
//     to: [{
//       address: "[email protected]",
//       name: "Bessie Berry"
//     }],
//     cc: [{
//       address: "[email protected]",
//       name: "Walter Sheltan"
//     }],
//
//     subject: "Integer consequat non purus",
//     date: "25 October 2021 at 11:17:21 EEST"
//   }
// }
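
As a hedged sketch of how this result might be consumed, the helpers below check forwarded before touching email, then format the sender and recipients. formatContact and describeForward are hypothetical names, not part of the library, and the sample object with its example.com addresses is made up; the only assumption is the result shape shown above.

```javascript
// Hypothetical helpers, not part of email-forward-parser.
// They rely only on the result shape documented above.

// Formats a { name, address } pair as "Name <address>".
function formatContact(contact) {
  if (!contact) return "";
  return contact.name ? `${contact.name} <${contact.address}>` : contact.address;
}

// Returns a one-line summary, or a fallback when nothing was forwarded.
function describeForward(result) {
  if (!result || !result.forwarded) return "Not a forwarded email";
  const email = result.email;
  const to = (email.to || []).map(formatContact).join(", ");
  return `"${email.subject}" from ${formatContact(email.from)} to ${to}`;
}

// Made-up sample mirroring the documented shape.
const sample = {
  forwarded: true,
  message: "Praesent suscipit egestas hendrerit.",
  email: {
    body: "Aenean quis diam urna.",
    from: { address: "john@example.com", name: "John Doe" },
    to: [{ address: "bessie@example.com", name: "Bessie Berry" }],
    cc: [],
    subject: "Integer consequat non purus",
    date: "25 October 2021 at 11:17:21 EEST"
  }
};

console.log(describeForward(sample));
```

Guarding on forwarded first keeps the caller from depending on what email contains when no forward is detected.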

How does it work?

Email forwarding (i.e. manually forwarding a copy of an email by clicking the "Forward" button in your email client) is not standardized by any RFC, which means that email clients are free to format the forwarded email however they want.

There is no magic bullet to handle such disparities. The only viable solution is to rely on regular expressions (a lot!), to account for each email client's specificities:

| Client | Detectable via subject | Detectable via separator | Subject localized | Separator localized | All original information available | Original information localized | Other specificities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Apple Mail | Yes | Yes | Yes | Yes | Yes | Yes | -- |
| Gmail | Yes | Yes | No | No | Yes | Only some parts | -- |
| Outlook Live / 365 | Yes | Yes | Yes | No | Yes | No | -- |
| Outlook 2013 | Yes | No | ? | -- | ? | ? | -- |
| Outlook 2019 | Yes | Yes | No | Yes | No | Yes | The From and Date parts (the only original information available) are embedded in the separator, rather than in the body itself |
| New Outlook 2019 | Yes | No | Yes | -- | Yes | Yes | -- |
| Yahoo Mail | Yes | Yes | No | Yes | Yes | Yes | The original information fields are all stuck together, without line breaks |
| Thunderbird | Yes | Yes | No | Yes | Yes | Yes | -- |
| Missive | Yes | Yes | No | No | Yes | No | -- |
| HubSpot | Yes | Yes | Yes | Yes | Yes | Yes | -- |
| IONOS by 1 & 1 | ? | Yes | ? | ? | Yes | ? | -- |
| MailMate | Yes | Yes | ? | ? | Yes | ? | -- |
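
To make the regex-based approach concrete, here is a minimal sketch of separator detection. It is not the library's actual code: the two patterns are illustrative assumptions modeled on separator lines quoted in the issues further down this page (a dashed "Forwarded message" line and a rule made of underscores), and findSeparator is a hypothetical name. The real library needs many more expressions to cover the clients and locales listed above.

```javascript
// Illustrative patterns only (assumptions, not the library's expressions).
const SEPARATOR_PATTERNS = [
  { label: "dashed 'Forwarded message' line", pattern: /^-{5,} ?Forwarded message ?-{5,}[ \t]*$/m },
  { label: "underscore rule", pattern: /^[ \t]*_{10,}[ \t]*$/m }
];

// Returns the text before and after the first matching separator,
// or null when no known separator is present.
function findSeparator(body) {
  for (const { label, pattern } of SEPARATOR_PATTERNS) {
    const match = pattern.exec(body);
    if (match) {
      return {
        label,
        message: body.slice(0, match.index).trim(),
        rest: body.slice(match.index + match[0].length).trim()
      };
    }
  }
  return null;
}

const found = findSeparator(
  "FYI\n\n---------- Forwarded message ---------\nFrom: Sender <sender@example.com>\n\nHello"
);
console.log(found);
```

Keeping each pattern next to a label makes it easier to see which client-specific rule fired when a new format shows up.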

Contributing

Feel free to fork this project and submit fixes. We may adapt your code to fit the codebase.

You can run unit tests using:

npm test

License

email-forward-parser is released under the MIT License. See the bundled LICENSE file for details.

email-forward-parser's People

Contributors

baptistejamin, eliottvincent, gwatts, sc0ttes, valeriansaliou

email-forward-parser's Issues

Mails with blocks added after underscore are not correctly managed

Hi,

Your lib is great! Thank you!

Nevertheless, I have an issue when I parse a forwarded message that ends with an automatically inserted block preceded by a line of multiple "_" characters.

A reproducer:

I transfer you that mail.

De : Jorge BARANGUAN <[email protected]>
Envoyé : jeudi 6 avril 2023 16:17
À : Jorge BARANGUAN <[email protected]>
Objet : ***URGENT** 9673155358 nos réf


MY body email...
  ________________________________
  This email (including any attachments) is intended for the designated recipient(s) only, and may be confidential, non-public, proprietary, and/or protected by the attorney-client or other privilege. Unauthorized reading, distribution, copying or other use of this communication is prohibited and may be unlawful. Receipt by anyone other than the intended recipient(s) should not be deemed a waiver of any privilege or protection. If you are not the intended recipient or if you believe that you have received this email in error, please notify the sender immediately and delete all copies from your computer system without reading, saving, printing, forwarding or using it in any manner. Although it has been checked for viruses and other malicious software (\"malware\"), we do not warrant, represent or guarantee in any way that this communication is free of malware or potentially damaging defects. All liability for any actual or alleged loss, damage, or injury arising out of or resulting in any way from the receipt, opening or use of this email is expressly disclaimed.

When performing new EmailForwardParser().read(mailBody, "***URGENT** 9673155358 nos réf"), the lib detects the part after the ____ (This email (including any attachments) is intended for the designated recipient(s) only...) as the forwarded email, hence I cannot extract the from/to information.

Do you think that it could be fixed by removing these groups of _ characters before parsing?
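
The preprocessing idea raised in this question could be sketched as follows. This is an assumption-laden illustration, not a confirmed fix: stripUnderscoreRules is a hypothetical helper, and because some clients appear to rely on an underscore rule as their forward separator, removing every such line may hide genuine forwards from the parser.

```javascript
// Hypothetical pre-processing step, not part of email-forward-parser.
// Drops lines that consist only of underscores (optionally indented).
function stripUnderscoreRules(body) {
  return body
    .split(/\r?\n/)
    .filter((line) => !/^\s*_{5,}\s*$/.test(line))
    .join("\n");
}

const cleaned = stripUnderscoreRules(
  "MY body email...\n  ________________________________\n  This email is intended for the designated recipient(s) only."
);
console.log(cleaned);
```

The cleaned string would then be passed to read() in place of the raw body.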

Add types for the library

Hi! This is great, I was starting to write a very crude version and found this great library. Is it possible to add type declarations for TypeScript, and/or would you accept such a contribution to the codebase?

Cannot find module './build/Release/re2.node'

Good day to everyone!
I ran the following in a JS file with node file-ex.js:

const EmailForwardParser = require("./email-forward-parser");
const result = new EmailForwardParser().read(MY_EMAIL_STRING, MY_SUBJECT_STRING);
console.log(result);

And got the error below:

Error: Cannot find module './build/Release/re2.node'
Require stack:
- C:\Users\dmitry\Desktop\email-forward-parser\node_modules\re2\re2.js
- C:\Users\dmitry\Desktop\email-forward-parser\lib\parser.js
- C:\Users\dmitry\Desktop\email-forward-parser\lib\index.js
- C:\Users\dmitry\Desktop\file-ex.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:931:15)
    at Function.Module._load (internal/modules/cjs/loader.js:774:27)
    at Module.require (internal/modules/cjs/loader.js:1003:19)
    at require (internal/modules/cjs/helpers.js:107:18)
    at Object.<anonymous> (C:\Users\dmitry\Desktop\email-forward-parser\node_modules\re2\re2.js:3:13)
    at Module._compile (internal/modules/cjs/loader.js:1114:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1143:10)
    at Module.load (internal/modules/cjs/loader.js:979:32)
    at Function.Module._load (internal/modules/cjs/loader.js:819:12)
    at Module.require (internal/modules/cjs/loader.js:1003:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    'C:\Users\dmitry\Desktop\email-forward-parser\node_modules\re2\re2.js',
    'C:\Users\dmitry\Desktop\email-forward-parser\lib\parser.js',
    'C:\Users\dmitry\Desktop\email-forward-parser\lib\index.js',
    'C:\Users\dmitry\Desktop\file-ex.js'
  ]
}

Outlook Desktop does not parse the forwarded email

Hi

We have this email that comes forwarded from Outlook Desktop, and the library does not parse it.

This is the email as sent over SMTP (one peculiarity is that the body is base64-encoded).

--_000_MWHPR12MB18567616136F0AE4AF3AF9BFCF529MWHPR12MB1856namp_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

CgpGcm9tOiBKb2UgTW9udGFuYSA8am9lbW9udGFuYUBlbWFpbC5jb20+ClNlbnQ6IFR1ZXNkYXksIEphbnVhcnkgMTEsIDIwMjIgMzozNCBQTQpUbzogTWFyeSBNb250YW5hIDxtYXJ5bW9udGFuYUBlbWFpbC5jb20+ClN1YmplY3Q6IFByZXNzaW5nIG1hdHRlcgoKSGkKCkJ5ZQo=

Decoded is:



From: Joe Montana <joemontana@email.com>
Sent: Tuesday, January 11, 2022 3:34 PM
To: Mary Montana <marymontana@email.com>
Subject: Pressing matter

Hi

Bye
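
For reference, the decoding step can be reproduced with Node's built-in Buffer. This is only a sketch of that one step: in a real pipeline, a MIME parser such as mailparser (mentioned in the README above) would normally handle the transfer encoding before the body reaches the library.

```javascript
// The base64 payload quoted in this issue, decoded with Node's Buffer.
const encoded =
  "CgpGcm9tOiBKb2UgTW9udGFuYSA8am9lbW9udGFuYUBlbWFpbC5jb20+ClNlbnQ6IFR1ZXNkYXksIEphbnVhcnkgMTEsIDIwMjIgMzozNCBQTQpUbzogTWFyeSBNb250YW5hIDxtYXJ5bW9udGFuYUBlbWFpbC5jb20+ClN1YmplY3Q6IFByZXNzaW5nIG1hdHRlcgoKSGkKCkJ5ZQo=";

// The MIME part declares charset="utf-8", so decode the bytes as UTF-8.
const decoded = Buffer.from(encoded, "base64").toString("utf8");

console.log(decoded);
```

Note that the decoded text starts with two empty lines and has no separator line before the From header.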

NOTE: I had to fake the email and the base64, so in theory it does work.

After some tests and playing around with the text, I was able to work around this by adding a __ separator at the beginning, which tricks the library into parsing it.

This is how it would work, but obviously we would like to avoid these minor hacks.

____________________________


From: Joe Montana <joemontana@email.com>
Sent: Tuesday, January 11, 2022 3:34 PM
To: Mary Montana <marymontana@email.com>
Subject: Pressing matter

Hi

Bye

Can a fix be possible (so we don't have to add the workaround)?

Thanks!
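
The workaround described in this issue could be automated roughly as follows. This is a sketch under stated assumptions, not a recommended fix: ensureLeadingSeparator is a hypothetical helper, the separator length is arbitrary, and the header check only recognizes English "From:" followed by "Sent:" or "Date:" at the very start of the body.

```javascript
// Hypothetical workaround helper, not part of email-forward-parser.
// The length of the underscore rule is an arbitrary assumption.
const SEPARATOR = "_".repeat(32);

// Prepends an underscore rule when the body starts directly with
// English-style forward headers and has no separator of its own.
function ensureLeadingSeparator(body) {
  const trimmed = body.replace(/^\s+/, "");
  const startsWithHeaders = /^From:.*\r?\n(Sent|Date):/.test(trimmed);
  if (!startsWithHeaders) return body;
  return `${SEPARATOR}\n\n${trimmed}`;
}

const patched = ensureLeadingSeparator(
  "\n\nFrom: Joe Montana <joe@example.com>\nSent: Tuesday, January 11, 2022 3:34 PM\nSubject: Pressing matter\n\nHi\n"
);
console.log(patched);
```

Bodies that do not begin with such headers, including ones that already start with a separator, are returned unchanged.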

Failed to parse non-indented forwarded emails

We've recently seen many forwarded emails that did not have any indentation (we are not sure which vendor/client application produces them).

But the fact is that the email cannot be parsed and we cannot extract the contents.

Any recommendation for this? Do you consider this is a bug?

Multiple re2 installations, should be a peer dependency

Hello

We ran into an error because we are using this library and url-regex-safe at the same time, and both depend on the node-re2 library.

So when we run jest, we hit a malloc issue. See this repo we created https://github.com/blastradius-ai/re2-malloc-error.

The solution would be to make re2 a peer dependency, as url-regex-safe has done in its latest version (https://github.com/spamscanner/url-regex-safe/releases/tag/v3.0.0).

That way we would have to install re2 ourselves, and only a single instance would be created.

Hope this can be done, or if you want we can create a PR for that.

Thanks

Would it be possible to get the layered forwarded emails?

This is more a feasibility question.

If I have this email:

E2
  E1
    E0

Currently we get E0 information. This is good.

Would it be feasible to get E2 and E1 separately? Like in

const allFw = new EmailForwardParser().readAll(emailBodyAsText, emailSubject);

allFw[0] == E0
allFw[1] == E1
allFw[2] == E2

(backwards would also be ok)

I know the lib can't do this today, but the question is: would it be sound and feasible?

Thanks!
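
One way the requested behaviour might be layered on top of the existing API is to call read() repeatedly on the body it returns. The sketch below is hypothetical: readAll does not exist in the library, and it assumes that read() returns the outermost forward and leaves any nested forward inside email.body, which this page does not confirm. The reader is passed in as a function so the loop can be tried without the library installed.

```javascript
// Hypothetical helper; readAll is NOT part of email-forward-parser.
// `read` is any function with the shape (body, subject) => result,
// e.g. (b, s) => new EmailForwardParser().read(b, s).
function readAll(read, body, subject, maxDepth = 10) {
  const layers = [];
  let currentBody = body;
  let currentSubject = subject;

  // maxDepth guards against a reader that never stops reporting forwards.
  for (let depth = 0; depth < maxDepth; depth++) {
    const result = read(currentBody, currentSubject);
    if (!result || !result.forwarded) break;

    layers.push(result);

    // Assumption: a nested forward, if any, is left inside email.body.
    currentBody = result.email.body;
    currentSubject = result.email.subject;
  }

  return layers;
}
```

Under that assumption the layers come back outermost first, so the array would need to be reversed to match the indexing proposed above.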

GMail Plain Text Wrapping

Thanks for providing this library, it's excellent.

I wonder if you've ever seen the issue we're running into: when we pass the library plain-text email content that has come from Gmail, the lines are wrapped at 78 characters. This means that the Cc list doesn't get parsed correctly and we end up with incomplete recipient lists. An example of how those emails look is shown below.

---------- Forwarded message ---------
From: Sender <[email protected]>
Date: Fri, 25 Feb 2022 at 18:08
Subject: Test Email
To: Recipient <[email protected]>
Cc: Recipient 1 <[email protected]>, Recipient 2 <
[email protected]>, Recipient 3 <[email protected]>


Email Start Here...
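
A possible pre-processing step for this wrapping, sketched under assumptions: unwrapAddressLines is a hypothetical helper, and it only repairs the specific case shown above, where the line break falls right after the opening "<" of an address. Other wrap positions (for example after a comma) are left untouched.

```javascript
// Hypothetical pre-processing step, not part of email-forward-parser.
// Rejoins an address that was wrapped right after its opening "<".
function unwrapAddressLines(text) {
  // The lookahead requires "something@something>" on the next line,
  // so an unrelated "<" at the end of a line is not affected.
  return text.replace(/<\r?\n[ \t]*(?=[^\s<>@]+@[^\s<>@]+>)/g, "<");
}

const wrapped =
  "Cc: Recipient 1 <r1@example.com>, Recipient 2 <\nr2@example.com>, Recipient 3 <r3@example.com>";

console.log(unwrapAddressLines(wrapped));
```

The addresses in the sample are made up; the unwrapped text would then be handed to read() as usual.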

Running this package with node18 in aws lambda results in GLIBC errors

I'm getting the error below when attempting to run node18 with this package installed.

ERROR	0 Error: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /var/task/node_modules/email-forward-parser/node_modules/re2/build/Release/re2.node)
    at Module._extensions..node (node:internal/modules/cjs/loader:1243:18)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/var/task/node_modules/email-forward-parser/node_modules/re2/re2.js:3:13)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12) {
  code: 'ERR_DLOPEN_FAILED'
}

I think this could be remedied by updating the re2 library to the latest version, so that it supports new targets.

Specifically, the 1.16.0 version does not seem to have targets for node18, while the 1.18.0 latest build does.

Would it be possible to add this update? Would you prefer I make a PR for this?

error while importing package

I'm trying to use this package but my builds fail with the following error:

ERROR in ../../email-forward-parser/node_modules/re2/re2.js 3:12-42
Module not found: Error: Can't resolve './build/Release/re2' in './node_modules/email-forward-parser/node_modules/re2'
 @ ../../email-forward-parser/lib/parser.js 4:34-48
 @ ../../email-forward-parser/lib/index.js 4:13-32

webpack compiled with 1 error

Environment: darwin, node 14.19.1

Python version?

Hey, this is more of a question than an issue. Is this package available as a Python package? I'd be very keen to see one, as I'm currently developing an email alias service that runs on Python :D

Outsource regexes?

In regard to #10, I think it would be very beneficial if the regexes were moved out into a single repo focused only on them. This would allow developers to use and update them more easily (which would also help you! :D).

To integrate them into other apps, one could download them, for example on a nightly basis via CI/CD, and include them in their own package.

What do you think? I'd be very happy about it! :)
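
As a rough illustration of what shared regexes could look like, the patterns can be stored as plain data (source and flags) and compiled by each consumer. Everything here is an assumption made for the sake of the example: the sharedPatterns layout, the compilePatterns helper and the single sample pattern are hypothetical and do not reflect how the library stores its expressions.

```javascript
// Hypothetical data format: patterns kept as strings so they can live
// in a JSON file shared between projects and languages.
const sharedPatterns = {
  separator: [
    { source: "^-{5,} ?Forwarded message ?-{5,}[ \\t]*$", flags: "m" }
  ]
};

// Compiles an array of { source, flags } definitions into RegExp objects.
function compilePatterns(definitions) {
  return definitions.map(({ source, flags }) => new RegExp(source, flags));
}

const separators = compilePatterns(sharedPatterns.separator);

console.log(separators[0].test("---------- Forwarded message ---------"));
```

Regex syntax differs between engines, so patterns shared this way would still need checking in each target language.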
