qqilihq / partial-emlx-converter

Convert .emlx and .partial.emlx files created by Apple’s Mail.app to .eml

License: MIT License

TypeScript 90.58% JavaScript 8.40% Shell 1.03%
eml eml-files emlx partialemlx

partial-emlx-converter's Introduction

📧 .emlx and .partial.emlx to .eml converter

This script converts .emlx and .partial.emlx files written by Apple’s Mail.app into fully self-contained, standalone .eml files which can be imported and opened by a wide variety of email applications (Mail.app, Thunderbird, …).

Apple uses these formats for internal storage (see ~/Library/Mail/Vx), and under normal circumstances you will not come into contact with these files. Unfortunately, one of my IMAP mailboxes went out of service and I was not able to copy all the messages to a different account with Mail.app, even though all mails and attachments were there (see here for the story).

That’s why I created this script.

Installation

With Homebrew

This is the easiest way if you’re not a Node.js developer. Install the script and all dependencies with Homebrew:

$ brew install qqilihq/partial-emlx-converter/partial-emlx-converter

I would like to make this script available in the Homebrew core repository as well, but for this the project needs more ⭐️ and 🍴 — please help!

With NPM/Yarn

Use a current version of Node.js (currently built and tested with v10.15.3 LTS) and run the following command to install the script globally with npm:

$ npm install --global partial-emlx-converter

Usage

Run the script with at least two arguments: (1) Path to the directory which contains the .emlx and .partial.emlx files, (2) path to the existing directory where the results should be written to.

$ partial-emlx-converter /path/to/input /path/to/result

Optionally, you can specify --ignoreErrors as a third argument. With this flag, the conversion is not aborted when a file causes an error (see the log output for details in this case).

Build

Use a current version of Node.js (currently built and tested with v10.15.3 LTS). Install the dependencies, run the tests, and compile the TypeScript code with yarn or npm:

$ yarn
$ yarn test
$ yarn build

Releasing to NPM

Commit all changes and run the following:

$ npm login
$ npm version <update_type>
$ npm publish

… where <update_type> is one of patch, minor, or major. This will update the package.json, and create a tagged Git commit with the version number.

After releasing a new version, remember to update the Homebrew formula here.

About the file formats

Disclaimer: I figured out the following by reverse engineering. I cannot give any guarantee about its correctness. If you feel that something should be corrected, please let me know.

.emlx and .partial.emlx are similar to .eml, with the following peculiarities:

.emlx

These files start with a line which contains the length of the actual .eml payload:

2945
Return-Path: <[email protected]>
X-Original-To: [email protected]

The number 2945 denotes that the actual .eml payload is 2945 bytes long, starting from the second line.

At the end, these files contain an XML property list epilogue, which holds some Mail.app-specific metadata. Using the length given at the beginning of the file, this epilogue can be stripped away easily and an .eml file can be created.
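The stripping step can be sketched as follows. This is a minimal illustration, not the converter's actual code: the function name splitEmlx and the EmlxParts type are made up for this example, and it assumes that the length on the first line counts bytes rather than decoded characters.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical result type: the plain message and Mail.app's trailing plist.
export interface EmlxParts {
  payload: Buffer; // the .eml message
  epilogue: Buffer; // the XML property list that follows it
}

// Sketch: split an .emlx buffer using the length stated on its first line.
// Assumption: the length counts bytes, starting right after the first "\n".
export function splitEmlx(emlx: Buffer): EmlxParts {
  const firstLineEnd = emlx.indexOf(0x0a); // index of the first "\n"
  if (firstLineEnd < 0) {
    throw new Error('Missing length line');
  }
  const lengthText = emlx.subarray(0, firstLineEnd).toString('ascii').trim();
  const length = Number.parseInt(lengthText, 10);
  if (!Number.isInteger(length) || length < 0) {
    throw new Error(`Invalid length line: '${lengthText}'`);
  }
  const start = firstLineEnd + 1;
  const end = start + length;
  if (end > emlx.length) {
    throw new Error('Declared length exceeds file size');
  }
  return { payload: emlx.subarray(start, end), epilogue: emlx.subarray(end) };
}
```

Working on a Buffer instead of a decoded string keeps the offsets valid for messages that contain non-ASCII bytes.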

Edit: Later, I found these additional sources, which basically confirm my findings:

.partial.emlx

Mail.app uses this format to save emails which contain attachments. Attachments are saved as separate, regular files relative to the .partial.emlx file. As far as I know, Apple does this to support Spotlight indexing.

Mail.app’s internal file structure looks as follows (nested into two further levels of directories named with the digits 0 to 9):

Attachments/
  1234/
    1.2/
      image001.jpg
    2/
      file.zip
  …
Messages/
  1234.partial.emlx
  …

1234 is the email’s ID. The Attachments directory contains the raw attachment files, whereas Messages contains the messages stripped of their attachments (and .emlx files, for messages which did not contain any attached files in the first place).

The subdirectories 1.2 and 2 in the example above are numbered according to their positions within the corresponding email’s multipart hierarchy.

To convert a .partial.emlx file into an .eml file, the separated attachments need to be re-integrated into the file.
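As a rough sketch of the lookup, the directory for one attachment can be derived from the message file's path and the part's position. The function name attachmentDir and the partPath parameter are hypothetical, and the sketch assumes that Attachments is always a sibling of Messages, as in the layout shown above.

```typescript
import { posix as path } from 'node:path';

// Sketch: locate the directory that holds the attachment of one MIME part.
// Assumptions: "Attachments" sits next to "Messages", and `partPath` lists the
// 1-based positions of the part within the multipart hierarchy, e.g. [1, 2].
export function attachmentDir(messageFile: string, partPath: number[]): string {
  const messagesDir = path.dirname(messageFile);
  // "1234.partial.emlx" -> "1234"
  const id = path.basename(messageFile).replace(/\..*$/, '');
  return path.join(messagesDir, '..', 'Attachments', id, partPath.join('.'));
}

console.log(attachmentDir('Data/1/2/Messages/1234.partial.emlx', [1, 2]));
// prints "Data/1/2/Attachments/1234/1.2"
```

Resolving the path relative to the message file avoids any assumption about how many levels of digit directories lie above Messages.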

Credits

Without the following modules I would probably still be working on this script (or have given up along the way). Thank you for saving me so much time!

Besides that, here are some resources which I found very helpful during development:

Contributing

Pull requests are very welcome. Feel free to discuss bugs or new features by opening a new issue. In case you submit any bug fixes, please provide corresponding test cases and make sure that existing tests do not break.


Copyright (c) 2018 – 2023 Philipp Katz

partial-emlx-converter's People

Contributors

dependabot[bot], qqilihq, slokhorst, yuergen

partial-emlx-converter's Issues

Different boundary strings crash

While trying to convert a Mail/V3 inbox:

partial-emlx-converter INBOX.mbox/E511469D-DF47-4B2F-A77E-B6611C8680B2/ eml/
Converting [========--------------------------------] 20% 511.1s Data/1/6/Messages/61439.emlx(node:72447) UnhandledPromiseRejectionWarning: Error: Different boundary strings (expected '----=_NextPart_7ae48436ccb4c946256817a6c56cb01c', got: '----=_NextPart_7ae48436ccb4c946256817a6c56cb01c_alt')
    at parts.forEach.part (/usr/local/lib/node_modules/partial-emlx-converter/dist/converter.js:87:19)
    at Array.forEach (<anonymous>)
    at writeBody (/usr/local/lib/node_modules/partial-emlx-converter/dist/converter.js:80:11)
    at /usr/local/lib/node_modules/partial-emlx-converter/dist/converter.js:61:13
    at Generator.next (<anonymous>)
    at fulfilled (/usr/local/lib/node_modules/partial-emlx-converter/dist/converter.js:5:58)
(node:72447) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:72447) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I have included Data/1/6/Messages/61439.emlx (with some data replaced with REDACTED, I haven't touched the part names/headers). Note that it doesn't contain anything like _NextPart_7ae48436ccb4c946256817a6c56cb01c or the _alt variant. Please let me know if there's some other file I can include to help diagnose the problem.

Edit: I think this is the relevant emlx instead: 11507.emlx
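The two boundary strings in the error differ only by the _alt suffix, which is typical for a nested multipart part whose boundary extends the outer one. Whether that is the actual cause here is not confirmed. The sketch below only illustrates how a delimiter line can be compared against a boundary as a whole line, so that one boundary is never mistaken for a longer one; the name classifyLine is made up for this example.

```typescript
// Sketch: classify a line of a multipart body against one boundary string.
// A delimiter line is "--" + boundary, a closing delimiter adds a trailing "--";
// only trailing whitespace is ignored, so "X" never matches "X_alt".
export type DelimiterKind = 'open' | 'close' | 'none';

export function classifyLine(line: string, boundary: string): DelimiterKind {
  const trimmed = line.replace(/[ \t\r\n]+$/, '');
  if (trimmed === `--${boundary}`) {
    return 'open';
  }
  if (trimmed === `--${boundary}--`) {
    return 'close';
  }
  return 'none';
}
```

With this comparison, a line such as --X_alt is reported as 'none' for the boundary X and can then be matched against the inner part's own boundary.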

Attachment hunt may fail on large folders?

I believe I'm seeing a problem where attachments seem to fail for a folder structure with more than 100,000 emails, but not for ones with fewer.

My Node skills are awful, but I'm wondering if this is related to how the search for the attachment folders works, e.g., maybe it's looking in Data/1/1/Attachments while it should be looking in Data/1/1/1/Attachments or something along those lines.

Other conversions were successful, and I'm most grateful!

Save mails with Windows Line Ending

EMLs that are saved with UNIX line endings (\n) instead of Windows line endings (\r\n) are not displayed properly in Outlook, which makes them seem inconsistent or faulty after converting.

All other mail applications can read EML files with Windows line endings just fine, so this seems like the more compatible way to save them. Alternatively this could also be a command line flag.
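A minimal sketch of such a normalisation, assuming the message is available as a string (the helper name toCrlf is hypothetical):

```typescript
// Sketch: normalise every line ending to CRLF ("\r\n").
// Existing CRLF sequences are kept as they are; a lone "\n" gains a "\r".
export function toCrlf(text: string): string {
  return text.replace(/\r?\n/g, '\r\n');
}

console.log(JSON.stringify(toCrlf('a\nb\r\nc'))); // prints "a\r\nb\r\nc"
```

For messages with 8BIT content, one option is to decode the file with Node's 'latin1' encoding before calling the helper and to encode the result with 'latin1' again, because that encoding maps every byte to exactly one code unit and back.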

Conversion starts but fails after a while with node V8.10 ubuntu

Hello, while converting I hit this error:

Processing Messages/10752.emlx (node:644) UnhandledPromiseRejectionWarning: Error: Different boundary strings
    at parts.forEach.part (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:76:19)
    at Array.forEach (<anonymous>)
    at writeBody (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:69:11)
    at parts.forEach.part (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:88:13)
    at Array.forEach (<anonymous>)
    at writeBody (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:69:11)
    at emlformat.parse (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:52:17)
    at Object.emlformat.parse (/home/qube/Téléchargements/partial-emlx-converter/node_modules/eml-format/lib/eml-format.js:554:5)
    at Promise (/home/qube/Téléchargements/partial-emlx-converter/build/src/converter.js:45:19)
    at new Promise (<anonymous>)
(node:644) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:644) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
qube@ipv4:~/Téléchargements/partial-emlx-converter/build/src$

I hope it helps, thank you.

Conversion stops when attachment cannot be found

It seems that Apple Mail loses a lot of attachments when mail has been stored for years. When I try to convert my mailboxes, I get lots of errors like these:

Processing 728F079D-C331-494A-844E-885BC16E5661/_Neu.mbox/E401EF12-8C7D-40D6-B4A7-784D2410BE37/Data/8/8/Messages/88886.partial.emlx
(node:4391) UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open '/home/juergen/mail/V5/728F079D-C331-494A-844E-885BC16E5661/_Neu.mbox/E401EF12-8C7D-40D6-B4A7-784D2410BE37/Data/8/8/Attachments/88886/1/Technical English.zip'
    at Object.fs.openSync (fs.js:646:18)
    at Object.fs.readFileSync (fs.js:551:33)
    at transformRec (/home/juergen/github/partial-emlx-converter/build/src/converter.js:116:29)
    at parts.forEach (/home/juergen/github/partial-emlx-converter/build/src/converter.js:98:36)
    at Array.forEach (<anonymous>)
    at transform (/home/juergen/github/partial-emlx-converter/build/src/converter.js:98:11)
    at emlformat.parse (/home/juergen/github/partial-emlx-converter/build/src/converter.js:49:17)
    at Object.emlformat.parse (/home/juergen/github/partial-emlx-converter/node_modules/eml-format/lib/eml-format.js:554:5)
    at Promise (/home/juergen/github/partial-emlx-converter/build/src/converter.js:45:19)
    at new Promise (<anonymous>)
(node:4391) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:4391) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I suggest skipping these mails during conversion and maybe storing them in a folder named rejects. I am doing that manually now.
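The suggested behaviour could look roughly like the sketch below. It is only an illustration under assumptions: convertOne is a hypothetical stand-in for whatever routine converts a single file, and the returned rejected list is where a caller could decide to copy the failed messages into a rejects folder.

```typescript
// Hypothetical summary of one batch run.
export interface BatchResult {
  converted: string[];
  rejected: { file: string; reason: string }[];
}

// Sketch: convert files one by one and collect failures instead of aborting.
// `convertOne` is a placeholder for the real per-file conversion; it is
// expected to throw when, for example, an attachment file is missing.
export function convertAll(files: string[], convertOne: (file: string) => void): BatchResult {
  const result: BatchResult = { converted: [], rejected: [] };
  for (const file of files) {
    try {
      convertOne(file);
      result.converted.push(file);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      result.rejected.push({ file, reason });
    }
  }
  return result;
}
```

Catching the error per file keeps one missing attachment from stopping the whole run, and the collected reasons can be written to the log afterwards.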

I have no idea about node.js, so I will not be able to provide code for this. :(

Aside from that partial-emlx-converter is a great tool. Thank you for your effort!

Missing line break after boundary string lets conversion fail

For some messages I am converting I get errors like this one:

(node:8919) UnhandledPromiseRejectionWarning: Error: Different boundary strings (expected '===============7980816086339553567==', got: '===============7980816086339553567==-')
    at parts.forEach.part (/home/juergen/github/partial-emlx-converter/build/src/converter.js:76:19)
    at Array.forEach (<anonymous>)
    at writeBody (/home/juergen/github/partial-emlx-converter/build/src/converter.js:69:11)
    at emlformat.parse (/home/juergen/github/partial-emlx-converter/build/src/converter.js:52:17)
    at Object.emlformat.parse (/home/juergen/github/partial-emlx-converter/node_modules/eml-format/lib/eml-format.js:554:5)
    at Promise (/home/juergen/github/partial-emlx-converter/build/src/converter.js:45:19)
    at new Promise (<anonymous>)
    at processEmlx (/home/juergen/github/partial-emlx-converter/build/src/converter.js:44:12)
    at /home/juergen/github/partial-emlx-converter/build/src/converter.js:30:38
    at Generator.next (<anonymous>)
(node:8919) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:8919) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

The following excerpt is from the emlx file:

--===============7980816086339553567==-<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>

Other mails look like this:

--===============7104596152329226118==--
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">

The missing line break seems to be the culprit.

Add paypal/donate link to the README

I scoured the web off and on for 6 months for a tool to save my huge backed-up email archive (whose IMAP is now offline) from death at the hands of Apple's nonstandard format. Just when I was about to give up (and attempt to hack together something myself and probably fail) I found your partial-emlx-converter. PLEASE let us buy you a drink!

Error interrupted conversion

When I ran the script, it converted exactly 10 messages, and then I got this error, which unfortunately terminated the process.

(node:84936) UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open '../archive.mbox/8C963429-A03A-442E-BFCA-76C71193F741/Data/0/1/Attachments/10010/2.2/map_6e1b1a73-4f66-484b-bad9-56004041cc80'

It'd be great if there were a way to just skip attachments it can't find. I'm trying to recover messages from a Time Machine backup. I'd be perfectly content if it just left out all attachments.

TypeError: Cannot read property 'readFile' of undefined

I was hoping this would work. I moved to a new computer, and the import (or Catalina, or something else) broke: many of my old archived mailboxes now show only blank message windows. I tried to rebuild them, which made things worse. I have a copy of the pre-migration mailboxes, but importing them in Mail only brings in a very small percentage.

Anyway, I am getting the error 'TypeError: Cannot read property 'readFile' of undefined' for each .emlx file in the folder when I try this on a 10.15.2 system.

Non-Unicode encoding of message to convert

First, a lot of thanks for your question at https://apple.stackexchange.com/questions/312942/recovering-emails-from-defunct-imap-account, showing the problem I was experiencing today was real, and for this solution. This works great, and was a life-saver. A suggestion maybe: in the installation instructions, it might be worth mentioning, for the non-programmer, that you need to cd to the uncompressed directory partial-emlx-converter-master first, and use sudo with npm.

I ran into an issue with one message, which had been forwarded to me twice on the same day, once by a Mac user using Mail and once by a Windows user using Thunderbird. The message included accented (French) characters. The version forwarded by the Mac user looked perfectly fine after applying your script; it turns out it was UTF-8, namely the forwarded content started with

--Apple-Mail=_F89F0554-9AC4-4DD0-8186-2C5ED0B9DB79
Content-Transfer-Encoding: QUOTED-PRINTABLE
Content-Type: TEXT/PLAIN;
charset=utf-8

and was followed by UTF-8 text. The version forwarded by the Windows user had all accents mixed up after applying the script (see the attached screenshot, first version left, second version right); it was Windows Latin 1, namely the forwarded content started with

--------------9195395EF01B76400F9EFD5B
Content-Transfer-Encoding: 8BIT
Content-Type: TEXT/PLAIN;
charset=windows-1252;
format=flowed

and was followed by Windows Latin 1 text. My impression is that the script read the message source as UTF-8 instead of Windows Latin 1.

The problem was finally solved by opening both the original .partial.emlx file and the converted .eml file in a text editor, making sure both were opened as Windows Latin 1, then pasting the message body from the original file to the converted file, saving the result as Windows Latin 1.
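The effect described above can be reproduced in a few lines. The sketch below is not taken from the converter; it only demonstrates why bytes from an 8BIT windows-1252 part do not survive a round trip through a UTF-8 string, whereas Node's 'latin1' encoding (or working on Buffers throughout) keeps them intact.

```typescript
import { Buffer } from 'node:buffer';

// "café" as windows-1252 bytes; 0xE9 is "é" there, but invalid on its own in UTF-8.
export const original = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD, so the byte is lost.
export const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// 'latin1' maps every byte to one code unit and back, so the bytes survive.
export const viaLatin1 = Buffer.from(original.toString('latin1'), 'latin1');

console.log(viaUtf8.equals(original)); // false
console.log(viaLatin1.equals(original)); // true
```

Note that 'latin1' is used here only as a byte-preserving container; it does not interpret the text as windows-1252.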

Capture d’écran 2020-11-20 à 17 34 40

Attachments were not converted

Great! My partial.emlx files were successfully converted to eml files, but none of the converted eml files contained any attachments, although the source emlx files had attachments (cf. attached data source).
I explored your code and found that "data.body" (after parsing the eml) was not an array but a string that still included the multipart boundary strings, which means the multipart eml sources were not parsed correctly. As a result, only the contents of the original emlx files were appended, without any attachments. Do you have any thoughts on this?

Environment: MacOS X 10.13.4, Node v6.11.0, Apple Mail 11.3
Data source: data.zip

Many thanks in advance!

TypeError: this.stream.clearLine is not a function

I'm getting the following error when I run it:

TypeError: this.stream.clearLine is not a function
    at ProgressBar.interrupt (/usr/local/Cellar/partial-emlx-converter/3.0.2/libexec/lib/node_modules/partial-emlx-converter/node_modules/progress/lib/node-progress.js:210:15)
    at processEmlxs (/usr/local/Cellar/partial-emlx-converter/3.0.2/libexec/lib/node_modules/partial-emlx-converter/dist/converter.js:26:17)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
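One plausible cause, which the report does not confirm, is that the output stream is not an interactive terminal (for example when output is redirected to a file), because clearLine exists only on terminal streams in Node.js. A guard along these lines could be checked before using cursor control; the helper name supportsClearLine and the MaybeTty type are made up for this sketch.

```typescript
// Minimal structural type covering the two properties the guard looks at.
export interface MaybeTty {
  isTTY?: boolean;
  clearLine?: unknown;
}

// Sketch: report whether a stream can be used for in-place progress output.
export function supportsClearLine(stream: MaybeTty): boolean {
  return stream.isTTY === true && typeof stream.clearLine === 'function';
}
```

A caller could evaluate supportsClearLine(process.stderr) once at startup and fall back to plain line-by-line logging when it returns false.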
