The email from slightlyloony

What is EmailService?

EmailService is a general-purpose email service provider module, handling both outbound and inbound emails. It does not provide an email service with SMTP, POP3, IMAP4, and so on; it relies on third-party providers such as GMail or Amazon SES for those. It is not a standalone program, but rather an embedded service provider with a programmatic interface that your code can call to send or receive email.

Ways that automatically transmitted emails are used

Probably the first thing that comes to mind with automated email is the marketing emails (spam!) that we all hate getting. Generally speaking, these bulk emails are identical emails that are sent out to many email addresses. These are by far the simplest type of automated email: when fewer than a few hundred destination email addresses are involved, this can be as simple as a single email being sent to a list of email addresses (any combination of To, CC, and BCC).

Automated emails aren’t necessarily bulk emails, however; they can be emails that are specific to a single email address. For example, Amazon sends me confirmations of the orders that I’ve placed with them. Those emails go only (I hope!) to me, and the content is specific to me. The industry term for this sort of email is transactional email, because most often they are part of a transaction of some kind. However, the key element that distinguishes transactional email from bulk email is that the former is customized (or, if you will, personalized), and the latter is generic: identical for every recipient.

It’s worth noting that there really isn’t a bright line between the two kinds of automated emails. For example, I’m working toward providing automated weather report emails to our local community. I’ll likely end up with a few hundred subscribers. Some of them will get simple text reports, some will get simple graphical reports (with images of thermometers, rain gauges, etc.), some will get detailed reports that include historical information, and some will get still other kinds of reports. Perhaps I’ll end up with a half dozen or so kinds of reports. So does that mean I have a half dozen different bulk emails going out daily? Or does it mean that I have transactional email that happens to have only a half dozen variations, instead of one per person? It doesn’t really matter except for one technical detail: you’ll likely use different capabilities within EmailService if you approach it as several kinds of bulk mail versus transactional email with a few variations.

While EmailService certainly supports bulk mail, most of what it does is aimed at supporting transactional email — the more complex of the two approaches.

Ways that automated processing of received emails is used

Automated processing of received emails (APRE) is relatively uncommon for consumer-focused applications, though I have used a few over the years. Usually they’re quite simple, for instance asking you to reply to an email with a key word in the subject line of your reply. An automated system then picks up the reply and takes some action based on it. I’m planning to use APRE for more in my own systems, mainly for remote administration. EmailService has simple support for APRE, mainly consisting of the ability to monitor multiple inboxes and forward received emails.

A bit about email addresses and identities

I tend to think of email addresses as belonging to a person, one individual — but that is not always the case. I’ve run into quite a few systems that use email addresses as identifiers, but this turns out to be quite problematic. Here are some common circumstances that will illuminate why:

People change their email addresses. I used to have a Yahoo email address, then I had a generic gmail address, and now I have a gmail address that’s on my own domain. How confusing!
People may have multiple email addresses (I have 11 at the moment, and have had as many as 30).
The email address may be for an email group, not an individual. Many email servers support email groups. Each member of the group has their own inbox, but if an email is received at the group’s collective address, a copy of that email goes to each member of the group (and the sender has no way of knowing who actually gets that email).
The email address may be for a machine or system, and not a person at all. There are many systems that use email for inter-process communications, especially where connectivity is spotty. Some also use it as a way of getting input from users, and such a system is one of the main reasons I’m writing this package.
The email address may be shared. I know several couples who have but a single email address, and they actually have a single inbox that both of them use. Shared email addresses are especially common in organizations, as an alternative to an email group.
The email address may be for an organization, not an individual. For instance, many companies have generic email addresses like [email protected] or [email protected]. These addresses may actually be for a group within the organization, they may be shared inboxes, or they may go to an individual (and that individual may change over time).

The upshot of this is that it really isn’t a good idea to think of an email address being some sort of unique identifier of a person. The notions of identity and of email addresses are certainly connected, just not one-to-one — it really is a many-to-many relationship.

Why does this matter to you? Well, if you’re sending emails with personalized information, it means your system really should not use an email address as a unique identifier of a person (or organization, or machine). For all the reasons above, your system should separate the notions of identities and email addresses, but maintain many-to-many relationships between them.
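To make the many-to-many idea concrete, here’s a minimal sketch of a directory that stores identities and email addresses separately and links them in both directions. The class name, the method names, and the use of plain strings as identity IDs are all my own assumptions for illustration; none of this is part of EmailService.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical directory that keeps identities and email addresses as separate things. */
final class IdentityDirectory {
    // identity id -> every address that reaches that identity
    private final Map<String, Set<String>> addressesByIdentity = new HashMap<>();
    // address -> every identity reachable through that address
    private final Map<String, Set<String>> identitiesByAddress = new HashMap<>();

    /** Records that the given identity can be reached at the given address. */
    void link(String identityId, String address) {
        addressesByIdentity.computeIfAbsent(identityId, k -> new LinkedHashSet<>()).add(address);
        identitiesByAddress.computeIfAbsent(address, k -> new LinkedHashSet<>()).add(identityId);
    }

    /** Removes one link, for instance when someone abandons an old address. */
    void unlink(String identityId, String address) {
        Set<String> addresses = addressesByIdentity.get(identityId);
        if (addresses != null) {
            addresses.remove(address);
        }
        Set<String> identities = identitiesByAddress.get(address);
        if (identities != null) {
            identities.remove(identityId);
        }
    }

    /** All addresses currently linked to one identity (possibly none). */
    Set<String> addressesOf(String identityId) {
        Set<String> found = addressesByIdentity.get(identityId);
        return found == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(found);
    }

    /** All identities that share one address, such as the members of a group inbox. */
    Set<String> identitiesAt(String address) {
        Set<String> found = identitiesByAddress.get(address);
        return found == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(found);
    }
}
```

With this shape, a person changing addresses is an unlink followed by a link, and a shared inbox is simply one address linked to several identities.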

Plain email versus HTML email

Almost all modern email clients — whether on a computer, phone, or tablet — are capable of rendering HTML mail, complete with colors, different fonts, images and photographs, etc. Many people have never even seen a plain text email! However, email clients do exist that are not capable of rendering HTML mail. Most of these clients are used from the command line on systems like Linux, but not all of them. I ran into someone a few weeks ago whose Windows laptop was running the Claws email client — with no capability for displaying HTML mail. He liked Claws because it was zippy.

So how do you decide whether to use HTML mail or plain text email? The simplest answer is to leverage a standard part of email technology to send both. That way you’ll make everyone happy!

A more challenging issue, really, is around the features you leverage within the HTML of an HTML email. Unfortunately there is no standard for this, and the various email clients vary wildly. If you’re a developer, I’m sure you won’t be surprised to hear that Microsoft’s products are, in general, the most "out there" of them all. There are dozens and dozens of web sites devoted to all the nasty details of designing HTML emails. After reading through way too many of them, I’ll boil their advice down to three things:

Use tables for layout.
Stick with the simpler, older HTML constructs.
Don’t expect JavaScript to work.

Some background on how email actually works

When I first started working on this project, I had a mental model for email that turned out to be far, far too simple. This section is a sort of primer on modern email, with an emphasis on the things that were new or surprising or unobvious to me. Writing it all down helps embed it in my own mind, and may be useful for you, too.

Email sending protocols

SMTP (Simple Mail Transfer Protocol) seems to be the only protocol in general use for sending email. In that sense, it’s pretty darned close to an actual standard, though there are annoying variants that one has to deal with in code.

Email receiving protocols

POP3 (Post Office Protocol version 3) and IMAP (Internet Message Access Protocol) are the only two in common use. POP3 is far older and less capable (for instance, it knows nothing about folders), and probably really shouldn’t be used anymore. In EmailService I’m using IMAP.

Email protocols in Java

There are many projects out in the open-source world that implement some combination of SMTP, POP3, and IMAP, but there is one that absolutely dominates: JavaMail, now actively developed at the Eclipse Foundation, which took over in 2017 and renamed it Jakarta Mail. I’ve chosen Jakarta Mail (version 2.0.1) as the protocol implementation for EmailService.

Email providers

Many organizations host their own email servers, especially (and most unfortunately) with Microsoft Exchange (hit with quite the hacker problem in 2021). Most individuals and smaller companies use third-party email providers, the dominant one of which is gmail. Any of these providers can be used with EmailService so long as Jakarta Mail can be configured to support it, and so far I’ve not found any email providers that Jakarta Mail couldn’t support.

The components of an email

There are some basic components of email that we all are familiar with. Well, that we all think we’re familiar with, because we see them in our email inbox:

Addresses: The To, CC, and BCC that together determine who receives the email.
Subject: The brief little one-liner that is intended to give the recipient some clue about what the email concerns.
Body: The actual content of the email.
Attachments: Files that are transmitted along with the email.

That’s not really how emails are put together, however. In reality, there are just two parts to an email: the headers, and the body (aka message):

The Headers are a series of fields that convey all sorts of information, including parts of the email itself (the addressees, the subject, etc.) and a sort of log of the email’s passage through the Internet.
The Body (Message) contains the contents of the email as you see it when you open the email. This can be just plain text, but more often has MIME-encoded content including HTML, inline images, attached files, etc.

Anatomy of an Email Address

Most of us are by now quite familiar with these two valid formats when an email address is contained in a string:

mailbox@domain (like [email protected])
Display Name<mailbox@domain> (like Tom Dilatush<[email protected]>)

It may surprise you to know that there are actually quite a few other details. Wikipedia has a good article about them.

Some things that surprised me:

The mailbox part is technically case-sensitive — so [email protected] and [email protected] should be two independent email addresses. I’m not sure I’ve ever seen that in the wild, and apparently both server and client support is spotty, but there it is.
A surprising variety of characters are totally legal in a mailbox name. So, for example, the mailbox name {Agent#91} is totally ok.
Even more characters are allowed if you quote the mailbox name with double quotes. For instance, the mailbox name "I..might..be..crazy!!!" is fine, as is "This is a valid mailbox name!@#$%^".
You can also escape individual characters with a backslash (the documents call that "quoted characters"): This\ mailbox\ name\ is\ fine.
The mailbox part cannot be more than 64 characters long.
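Here’s a rough sketch of how the two common formats and a couple of the surprises above might be handled in code. It is deliberately simplified: it splits on the last "@" so that quoted mailbox names containing "@" survive, it enforces the 64-character mailbox limit, and it compares mailboxes case-sensitively. Comparing domains case-insensitively is my own addition (domain names are not case-sensitive). The class name SimpleAddress is hypothetical, and this is nowhere near a full address parser.

```java
/** A deliberately simplified parser for "mailbox@domain" and "Display Name<mailbox@domain>". */
final class SimpleAddress {
    final String displayName; // empty when no display name was given
    final String mailbox;     // everything before the last '@'
    final String domain;      // everything after the last '@'

    private SimpleAddress(String displayName, String mailbox, String domain) {
        this.displayName = displayName;
        this.mailbox = mailbox;
        this.domain = domain;
    }

    static SimpleAddress parse(String text) {
        String s = text.trim();
        String display = "";
        String address = s;
        int open = s.lastIndexOf('<');
        if (open >= 0 && s.endsWith(">")) {
            // "Display Name<mailbox@domain>" form
            display = s.substring(0, open).trim();
            address = s.substring(open + 1, s.length() - 1).trim();
        }
        // Use the LAST '@' so a quoted mailbox such as "odd@name"@example.com still splits correctly.
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            throw new IllegalArgumentException("Expected mailbox@domain: " + text);
        }
        String mailbox = address.substring(0, at);
        if (mailbox.length() > 64) {
            throw new IllegalArgumentException("Mailbox is longer than 64 characters: " + text);
        }
        return new SimpleAddress(display, mailbox, address.substring(at + 1));
    }

    /** Mailboxes are compared case-sensitively; domains are compared ignoring case. */
    boolean sameAddressAs(SimpleAddress other) {
        return mailbox.equals(other.mailbox) && domain.equalsIgnoreCase(other.domain);
    }
}
```

Note that this sketch does not validate which characters appear in the mailbox, and it does not interpret quotes or backslash escapes; it only keeps them intact.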

To, CC, and BCC

These three header fields together specify what email addresses should receive the email. All three email address groups (To, CC, and BCC) can accept a number of email addresses, including zero (although if all three have no addresses your email isn’t going anywhere!). The format is simple enough: just a comma-separated list of email addresses formatted as described in Anatomy of an Email Address. The maximum number of addressees in each group is dependent on the SMTP provider, and there doesn’t seem to be any convention to this, much less a standard. In every case I’ve seen personally, you can have well over 100 addresses in each of those address groups. Some vendors (I’m looking at you, gmail!) limit the total number in all three, rather than having a limit in each address group.

Most likely you’re familiar with the behavior of To, CC, and BCC — but just in case you’re not, here’s a summary.

To: All the email addresses on the "To" list are visible to everyone who receives the email. On "Reply All", every email address on the "To" list receives a copy of the reply.
CC: All the email addresses on the "CC" list are visible to everyone who receives the email. On "Reply All", every email address on the "CC" list receives a copy of the reply. Note that the behavior of email addresses on the "CC" list is identical to that of the email addresses on the "To" list; the "CC" list exists as a cue to the human reader. If his or her email address is on the "To" list, that’s a cue that this email is directed toward them, and action may be expected. If the email address is on the "CC" list, that’s a cue that he or she is being sent a copy of the email just for their interest, and no action is expected.
BCC: All of the email addresses on the "BCC" list are invisible to everyone who receives the email. On "Reply All", no email address on the "BCC" list receives a copy of the reply. The "BCC" list primarily allows addressees to receive a copy of the email without the knowledge of the "To" or "CC" email addressees. Sometimes it’s also used to protect people from the horrifying consequences of "Reply All" email storms.
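The "Reply All" behavior summarized above can be sketched in a few lines. This is only an illustration of the rule, not how any particular email client implements it. The class and method names are hypothetical, and leaving the replier off their own reply is my assumption about typical client behavior.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of who receives a "Reply All". */
final class ReplyAll {
    /**
     * Returns the addresses that receive a "Reply All" written by replier:
     * the original sender plus everyone on To and CC, minus the replier.
     */
    static Set<String> recipients(String from, List<String> to, List<String> cc,
                                  List<String> bcc, String replier) {
        Set<String> result = new LinkedHashSet<>();
        result.add(from);
        result.addAll(to);
        result.addAll(cc);
        // The bcc list is deliberately ignored: those addresses never appear in the
        // email the replier received, so the replier's client cannot include them.
        result.remove(replier);
        return result;
    }
}
```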

Subject

This is the simplest component of an email, but even it has its complications! While there is no maximum length that I could find, there is a practical limit: most inboxes on computer clients only show the first 50 or 60 characters of the subject line (generally truncating with an ellipsis, like This is my WAY too long…). Many mobile email clients, when used on a phone, only show 20 or 25 characters. These limits mean that short subject lines are definitely better, and that the information in them should be front-loaded, so that if some of the subject line is truncated, the poor user can still figure out what the mail is about.
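If you want to check how a subject line will survive truncation, a tiny helper like the one below is enough. It is a sketch that mimics the truncate-with-ellipsis behavior described above; real clients truncate by rendered width rather than by character count, and the class name SubjectPreview is made up.

```java
/** Hypothetical helper that approximates how an inbox list truncates a subject line. */
final class SubjectPreview {
    /** Returns the subject as it might appear when only maxChars characters fit. */
    static String preview(String subject, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be at least 1");
        }
        if (subject.length() <= maxChars) {
            return subject;
        }
        // Keep room for the ellipsis character and drop any space left dangling before it.
        return subject.substring(0, maxChars - 1).trim() + "\u2026";
    }
}
```

Calling preview with 50 and then with 20 on the same subject gives a quick feel for whether the important words are front-loaded enough.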

HTML is not specifically disallowed in the subject line, but I’ve never found an email client that would actually render it. The RFCs that control email format still specify the subject line as ASCII, but you can control the character encoding of an email, including on the subject line. EmailService defaults to UTF-8, but you can change that if you want something different.
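The way a non-ASCII subject gets squeezed into an ASCII header is the "encoded-word" form defined by RFC 2047: =?charset?encoding?encoded text?=. The sketch below builds one by hand using Base64 and UTF-8, just to show what ends up on the wire. A mail library normally does this for you, the class name is hypothetical, and this version ignores the RFC’s 75-character limit per encoded word, so a long subject would need to be split into several.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration of an RFC 2047 encoded-word for a Subject header. */
final class EncodedSubject {
    /** Returns the subject unchanged if it is printable ASCII, otherwise one Base64 encoded-word. */
    static String encode(String subject) {
        boolean printableAscii = subject.chars().allMatch(c -> c >= 32 && c < 127);
        if (printableAscii) {
            return subject;
        }
        String base64 = Base64.getEncoder()
                .encodeToString(subject.getBytes(StandardCharsets.UTF_8));
        // "B" selects Base64; the other option in RFC 2047 is "Q" (a quoted-printable variant).
        return "=?UTF-8?B?" + base64 + "?=";
    }
}
```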

Body (Message)

The body of a modern email can be just ridiculously complex. This complexity was all enabled by a standard called MIME (Multipurpose Internet Mail Extensions). Prior to the advent of MIME, email bodies were just plain ASCII text — ah, those were such simple days! MIME basically standardizes a way of encoding things other than ASCII-encoded text into plain ASCII-encoded text — things such as HTML, images, audio, video, attachments, and much more.

The body of a modern email can also be very simple: just a string of ASCII characters. That’s how email started out, and that sort of simple email actually still works just fine. However, the result is not the fancy thing we’re all used to in our email these days.

Much more typical today — and most likely, the kind of email you’d like to send — is an email composed in HTML, perhaps with photographs, graphics, and possibly even videos. You’d still want it to work for a recipient who had an email client that couldn’t handle HTML, though. You can do all of this by using MIME, which is fully supported by EmailService and Jakarta Mail (which EmailService depends on). Suppose, for instance, that you wanted to create an email that used HTML to format the body, with two inline images (more on that later), and a plain text message for those recipients who couldn’t read HTML email. To do this, you’d create a tree of MIME nodes that looked like this:

               (a) multipart: alternative
                   |                    |
       (b) content: text/plain      (c) multipart: related
                                        |      |       |
                                        |      |       +-- (f) content: image/jpeg
                                        |      +-- (e) content: image/png
                                        +-- (d) content: text/html

You can see there are two kinds of nodes: "multipart" nodes that are simply nodes that contain other nodes, and "content" nodes (that are always leaf nodes) that carry some kind of content. Taking it one piece at a time:

(a) A multipart whose children are alternative "views" of the email. In this case, a plain text view and an HTML view. Email clients are supposed to prefer the last alternative that they’re capable of displaying.
(b) The plain text content, which should only be displayed in email clients that cannot display HTML.
(c) A multipart whose children are all related to each other; in this case, they’re all pieces of the HTML email.
(d) The HTML document, which among other things has <img/> tags that refer to the two images (following), via specially formed URLs.
(e) One of the images referred to by the HTML document.
(f) The other image referred to by the HTML document.

Here’s what the MIME document looks like for the MIME tree outlined above, except that the encoded image data is elided to keep this to a reasonable length:

MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_1_1911152052.1618240895083"

------=_Part_1_1911152052.1618240895083
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Who cares what I say in here?
------=_Part_1_1911152052.1618240895083
Content-Type: multipart/related; boundary="----=_Part_0_1644231115.1618240895077"

------=_Part_0_1644231115.1618240895077
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<html>
<p>
<img width="20" src="cid:0"/>
<img width="30" src="cid:1"/>
<img width="40" src="cid:0"/>
Look at the pretty image below!
</p>
</html>

------=_Part_0_1644231115.1618240895077
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-ID: <0>
Content-Disposition: inline

------=_Part_0_1644231115.1618240895077
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-ID: <1>
Content-Disposition: inline

------=_Part_0_1644231115.1618240895077--
------=_Part_1_1911152052.1618240895083--

A few things worth noting in this:

The lines that begin with ------=_Part_ are boundary markers, which sit between the pieces of a multipart node. There are three boundary markers (the ones containing _Part_1_) for the alternative multipart: the first introduces the plain text alternative, the second introduces the HTML alternative, and the third (with a trailing --) closes the multipart. There are four boundary markers (the ones containing _Part_0_) within the HTML alternative, marking the boundaries of the three related leaf nodes: the HTML document and the two images. This all looks like gobbledegook at first glance, but it’s actually not hard to read or understand.
At the start of each MIME piece there are MIME headers. These are all the lines after the boundary, but before the blank line. These are how the type of each MIME piece is encoded.
The <img/> HTML tags have source URLs of the form "cid:<number>". The "cid:" prefix is how the special URLs that refer to related items are formed, and the number matches the Content-ID header of the related item.
Note how the HTML alternative comes after the plain text alternative. This order tells the email client to render the HTML alternative if it can, but to fall back to the plain text alternative if it cannot.
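To show that the boundary mechanics really are simple, here’s a small sketch that writes a MIME tree out by hand with nothing but a StringBuilder. It is purely illustrative: EmailService relies on Jakarta Mail for this, and the MimeNode class, the fixed boundary strings, and the placeholder image data below are all my own inventions. Notice that a multipart with n children always produces n + 1 boundary markers, which is where the three and four in the example above come from.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical MIME tree node: either a leaf with content or a multipart with children. */
final class MimeNode {
    private final String contentType;        // e.g. "text/plain; charset=UTF-8" or "multipart/related"
    private final String body;               // leaf content; null for a multipart
    private final String boundary;           // boundary string; null for a leaf
    private final List<String> extraHeaders; // additional MIME headers for a leaf
    private final List<MimeNode> children;   // child nodes of a multipart

    private MimeNode(String contentType, String body, String boundary,
                     List<String> extraHeaders, List<MimeNode> children) {
        this.contentType = contentType;
        this.body = body;
        this.boundary = boundary;
        this.extraHeaders = extraHeaders;
        this.children = children;
    }

    static MimeNode leaf(String contentType, String body, String... extraHeaders) {
        return new MimeNode(contentType, body, null,
                Arrays.asList(extraHeaders), Arrays.<MimeNode>asList());
    }

    static MimeNode multipart(String subtype, String boundary, MimeNode... children) {
        return new MimeNode("multipart/" + subtype, null, boundary,
                Arrays.<String>asList(), Arrays.asList(children));
    }

    /** Writes this node's MIME headers, a blank line, and then its body. */
    void writeTo(StringBuilder out) {
        if (boundary == null) {
            out.append("Content-Type: ").append(contentType).append("\r\n");
            for (String header : extraHeaders) {
                out.append(header).append("\r\n");
            }
            out.append("\r\n").append(body).append("\r\n");
            return;
        }
        out.append("Content-Type: ").append(contentType)
           .append("; boundary=\"").append(boundary).append("\"\r\n\r\n");
        for (MimeNode child : children) {
            out.append("--").append(boundary).append("\r\n"); // marker that opens each child
            child.writeTo(out);
        }
        out.append("--").append(boundary).append("--\r\n");   // marker that closes the multipart
    }

    public static void main(String[] args) {
        MimeNode tree = multipart("alternative", "outer",
                leaf("text/plain; charset=UTF-8", "Plain text fallback"),
                multipart("related", "inner",
                        leaf("text/html; charset=UTF-8", "<html><p><img src=\"cid:0\"/></p></html>"),
                        // Placeholder bytes, not a real image.
                        leaf("image/png", "AAAA", "Content-Transfer-Encoding: base64",
                                "Content-ID: <0>", "Content-Disposition: inline")));
        StringBuilder out = new StringBuilder("MIME-Version: 1.0\r\n");
        tree.writeTo(out);
        System.out.print(out);
    }
}
```

The plain text leaf is added before the related multipart on purpose, so the HTML alternative comes last and is the one a capable client prefers.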

Inline vs. Attachments

In the example above, the images were inlined - but what does that actually mean? It means that the images will be treated as part of the HTML document, and will display within the document as determined by the <img/> tags and other layout directives.

The alternative is to attach the image, which EmailService can also do, and which can sometimes have advantages. Some email clients will display image attachments below the body of the email. Others will make the user download the attachment and then open it with another application. That may not sound wonderful, but if you’re sending emails to people using email clients that cannot display HTML mail, it may actually be a useful thing.

Headers

There’s one more piece of an email, one that’s invisible to a normal user reading an email on an email client, but that’s very important to how email works: that’s the email headers. Here’s an example taken from an actual email (with some private information changed):

Delivered-To: [email protected]
Received: by 2002:ab3:1617:0:0:0:0:0 with SMTP id b23csp4814988lta;
        Tue, 30 Mar 2021 14:10:04 -0700 (PDT)
X-Received: by 2002:a9d:7e8d:: with SMTP id m13mr28412924otp.54.1617138603888;
        Tue, 30 Mar 2021 14:10:03 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1617138603; cv=none;
        d=google.com; s=arc-20160816;
        b=LGXw4xpUCRqyqWE/z9KjgO5YBsk3unCauikBjNC92mJnQNMtC3CkBlkf6cGb/bv34E
         OO8b+t2l7ZQGUFr1Ri9jb5NyfBxVmOJu58u+OR3h2eKM6GLL8Q+3rvqkBXXGB5fsAaKe
         8SkqVzt9XYgMwxmaQqDs9s63LCKXxE50qkCZgKfk4WsT5z0TBCkq6qi6InI17uSb3qdP
         qZlqTrG4DxDj0crOCm7wsRAU/JKdUAPLuUC9CMe0+okkearSbvLbhbmpETqd2cETTb6W
         OlW3ub+YPPDprObTtGnZ3DgL6HJGAriF3wZyJQUm+rPyM6PmZbJg7jBbEdz/8HKNSHuN
         naEQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=content-transfer-encoding:mime-version:subject:message-id:to:from
         :date:dkim-signature;
        bh=aNLzqLLf9B0IoZxm+ZlN89BgZuIGKLPQqRumylaJ5Uc=;
        b=TuuOMEbi8RuKlw2a3yv9KUoQfXFL/jaX8h4R/nmzVnax09d5Kve8Zmk6ZFlSxuaD45
         dKe49we2vHp7JCNVIJl/0ZMGxwH/0vL00FbnwI4/uaTuep/aXHbVSszeDrCAKGFSwdRl
         WhjfG9AC4LU7N8++3Yher9BlytH3dS8V5/TQ1PQPHZHFvtf179lF7hQS6GGEAfBFpQT7
         j8XAXCmybbUJguta6aC6f9XCq038pwy2xm9m9ez5FJawUAEEt4txvhb7Wua2jcq2g63h
         5hXKGjmGjODwLdMPy/dASTDEZeguX950y3kf/4D/ZRILWTV/REQhaRIfbQR9JIrzzcv2
         5DUg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass [email protected] header.s=20150623 header.b=N1A1lPUr;
       spf=neutral (google.com: 209.85.220.41 is neither permitted nor denied by best guess record for domain of [email protected]) [email protected]
Return-Path: <[email protected]>
Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41])
        by mx.google.com with SMTPS id a17sor18773otr.38.2021.03.30.14.10.03
        for <[email protected]>
        (Google Transport Security);
        Tue, 30 Mar 2021 14:10:03 -0700 (PDT)
Received-SPF: neutral (google.com: 209.85.220.41 is neither permitted nor denied by best guess record for domain of [email protected]) client-ip=209.85.220.41;
Authentication-Results: mx.google.com;
       dkim=pass [email protected] header.s=20150623 header.b=N1A1lPUr;
       spf=neutral (google.com: 209.85.220.41 is neither permitted nor denied by best guess record for domain of [email protected]) [email protected]
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=dilatush-com.20150623.gappssmtp.com; s=20150623;
        h=date:from:to:message-id:subject:mime-version
         :content-transfer-encoding;
        bh=aNLzqLLf9B0IoZxm+ZlN89BgZuIGKLPQqRumylaJ5Uc=;
        b=N1A1lPUrH7hF1ihsF3f617cR9lBZUzfAoi4wncvCHRKVTDsD2pSA+FBXAZ83F2c7kD
         RjT6T6EgpjIwyyxudb+hPhUQjoCa8lfwTDIu27tNp49NFEQp3zsm9GRvw5SGVHz4JeT2
         N7SlxGiJsVXZqjy64DgUgCR9VHNxCQK9S0CFY3erI4haWPUhuPbe8q6KAfZS/2vwyJyc
         wUA6IE0bsziacsK8oz3epG6p+N8XgfhXkqvSuigRXlhxcQEp8GK6pjzxv6jJcH+4LIOL
         qgCaJM0NRp6uP+9EBJEtU4CC61A7JdnE0ID4N5J+ECvenud+ZORGRnopE+OeWkLksYN3
         nDOg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:date:from:to:message-id:subject:mime-version
         :content-transfer-encoding;
        bh=aNLzqLLf9B0IoZxm+ZlN89BgZuIGKLPQqRumylaJ5Uc=;
        b=CXwG4EwsA+h1ePlWDXJ5AOOUPWAf6rCkHboIhS/ozSvnIX+b1r5Pf80wNe9h7B0sOX
         iSP+g5CfgDn4unTNw1DK17Xra6l3PHpr6PxuuMDOlR6DpkGs86MhE4GDxGmhRyiJzGVf
         S72QTiuSDFXcTRnmJLCJx/CFEZqJbJhyUb45XF8lvD6bBik+ZwDboLKWplDgUWevGX6S
         idjLvcgGzs8gaYQvDGh5LmF60SVXfEdIFFSr/1NSlIOTGrcA7Ah7fFb2CrjYoltyVw2G
         2ft3a5cyTDcLN3I5U2phVN9OE8u9IeDBuDBlPBNKQlA+CacrjpwpqoYgg6ULP5GKxSYM
         t48g==
X-Gm-Message-State: AOAM531HlBFNlJwnM3v/HBFHHBkkCMjnyIg6c3HnsXPPNPRub6g7/iGw
	voTxfL3vibDnVUOeew9EKfynigzWXq433sCW
X-Google-Smtp-Source: ABdhPJzjEAunQnlgITeEtpfcDttOkSoMDk5Q1CfIJuKRdf83QNlYFehTeT6ML9LcVmKSN61gbl7Qpg==
X-Received: by 2002:a9d:65c6:: with SMTP id z6mr28043870oth.232.1617138603203;
        Tue, 30 Mar 2021 14:10:03 -0700 (PDT)
Return-Path: <[email protected]>
Received: from 10.3.254.57 (c-71-199-18-153.hsd1.ut.comcast.net. [71.199.18.153])
        by smtp.gmail.com with ESMTPSA id l191sm43088oih.16.2021.03.30.14.10.02
        for <[email protected]>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 30 Mar 2021 14:10:02 -0700 (PDT)
Date: Tue, 30 Mar 2021 15:10:02 -0600 (MDT)
From: Burger Empire <[email protected]>
To: [email protected]
Message-ID: <1422222071.0.1617138602673@[10.3.254.57]>
Subject: Test

A lot of these header fields are interesting; you can read about their meaning here (or do a little searching for the name of the header — the part terminated by a colon(":")). The headers are in reverse chronological order: the most recent headers appear first in the text.

What I found most interesting was the way you can read the history of how an email was forwarded from one email server to another (headers are added at each hop). For our purposes here, though, there are only a few that matter:

Date: The date and time that the message was written, which by default means when it was first transmitted to an SMTP server.
From: What email address the email was sent from. This email address is where replies from the client would go to.
To: The email address this email is being sent to.
Message-ID: A unique identifier for this email. This can be very useful for correlating a reply to the email that provoked it.
Subject: The subject line of the email.
Return-Path: The email address that bounced (undeliverable) emails are sent to. The domain for this should be the same as the from address — otherwise email providers may tag you as a spammer.
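Reading headers programmatically is mostly a matter of handling continuation lines: as the example shows, a long header field is folded across several lines, and every continuation line starts with a space or tab. Here’s a minimal sketch that unfolds a header block and lets you pull out fields by name. The class and method names are hypothetical, and a real mail library handles many more corner cases than this does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser that unfolds an email header block into name/value fields. */
final class HeaderParser {
    /** One header field: the name before the first colon, and the unfolded value after it. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Parses header lines up to the first blank line, joining folded continuation lines. */
    static List<Field> parse(String headerBlock) {
        List<String> unfolded = new ArrayList<>();
        for (String line : headerBlock.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line separates the headers from the body
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation && !unfolded.isEmpty()) {
                int last = unfolded.size() - 1;
                unfolded.set(last, unfolded.get(last) + " " + line.trim());
            } else {
                unfolded.add(line);
            }
        }
        List<Field> fields = new ArrayList<>();
        for (String line : unfolded) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.add(new Field(line.substring(0, colon), line.substring(colon + 1).trim()));
            }
        }
        return fields;
    }

    /** Returns the values of every field with the given name (ignoring case), in document order. */
    static List<String> valuesOf(List<Field> fields, String name) {
        List<String> values = new ArrayList<>();
        for (Field field : fields) {
            if (field.name.equalsIgnoreCase(name)) {
                values.add(field.value);
            }
        }
        return values;
    }
}
```

Because the most recent headers appear first, walking the "Received" values from last to first traces the email’s path from the sender toward the recipient.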

How EmailService works

Email providers

EmailService is configured with a list of email providers, with at least one provider being required for it to function. For each provider, the configuration includes:

The internal name of the provider, which acts as the "handle" by which the provider can be referenced.
The Session properties for the provider (this includes credentials).
The capabilities of the provider (SMTP, POP3, IMAP)
The priority for the provider, indicating whether this provider is more or less preferable than other providers.
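As a rough picture of what a provider entry and provider selection might look like, here’s a sketch that models the four items above and picks the most preferable provider with a needed capability. Everything here is an assumption for illustration: the class names, the fields, and in particular the convention that a larger priority number means "more preferable". The actual EmailService configuration may look quite different.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Properties;
import java.util.Set;

/** Hypothetical model of provider configuration and selection. */
final class ProviderChooser {
    enum Capability { SMTP, POP3, IMAP }

    static final class Provider {
        final String name;                  // internal "handle" for the provider
        final Properties sessionProperties; // session properties, including credentials
        final Set<Capability> capabilities; // what this provider can be used for
        final int priority;                 // ASSUMPTION: larger means more preferable

        Provider(String name, Properties sessionProperties,
                 Set<Capability> capabilities, int priority) {
            this.name = name;
            this.sessionProperties = sessionProperties;
            this.capabilities = capabilities;
            this.priority = priority;
        }
    }

    /** Returns the highest-priority provider that has the needed capability, if there is one. */
    static Optional<Provider> choose(List<Provider> providers, Capability needed) {
        return providers.stream()
                .filter(p -> p.capabilities.contains(needed))
                .max(Comparator.comparingInt((Provider p) -> p.priority));
    }
}
```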

Sending emails

When you want to send email through EmailService, you create an EmailSpec that contains the following:

A list of [0..n] EmailProperties instances, each of which has:
- A name, unique for each instance.
- Any number of named string properties.
A list of EmailAddressee instances, each of which specifies:
- The EmailAddress to send the email to.
- The list (To, CC, or BCC) that the email address should appear on.
- The name of the EmailProperties to use. This may be absent, in which case no email properties are used. If the name is present, it refers to the EmailProperties instance with that name.
The EmailDocument, defined in detail in The Email Document.

You might note that nothing above specifies images or attached files. That’s because those items are potentially large enough that they can use a problematic amount of memory and network resources. Instead, the EmailDocument contains the information required to let EmailService stream the data from a file, a web site, or from the email document itself. There’s much more detail on these mechanisms in The Email Document section.

The combination of EmailProperties and EmailAddressee, together with automatic modification of the EmailDocument, allows for great flexibility, handling all these cases:

Simple bulk email, where a list of email addresses all receive the identical document. In this case, there may be no EmailProperties at all, and the EmailAddressee instances just specify the email address and what list they are to appear on.
Pure transactional email, where a list of email addresses each receive an email with different information. In this case, there will be a unique EmailProperties instance associated with each EmailAddressee, and the properties in EmailProperties will be used to modify the EmailDocument as appropriate for each individual recipient. Alternatively there could be no EmailProperties and just a single EmailAddressee, with a custom EmailDocument created for that single addressee.
Variant email (a term I made up), where a list of email addresses each receive one of some number of variations of an email. For instance, a list of people might get daily weather reports, but the weather reports might come in several variations: one just text, one with graphical gauges reporting current conditions, and another more elaborate that includes historical data in graphs. In this case there will be a unique EmailProperties for each variation, and each EmailAddressee will specify the EmailProperties with the variation to use for that email address.
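The relationship between addressees and named property collections can be sketched as below. To be clear, this is not the real EmailService API: the nested classes merely borrow the names used above, and the fields, constructors, and methods are my guesses at a plausible shape. The point is only to show how one structure covers bulk (no properties), transactional (one collection per addressee), and variant email (a few shared collections).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an email spec; NOT the actual EmailService classes. */
final class EmailSpecSketch {
    enum AddressList { TO, CC, BCC }

    static final class EmailProperties {
        final String name;
        final Map<String, String> values;

        EmailProperties(String name, Map<String, String> values) {
            this.name = name;
            this.values = values;
        }
    }

    static final class EmailAddressee {
        final String address;
        final AddressList list;
        final String propertiesName; // null means "no properties", as in simple bulk email

        EmailAddressee(String address, AddressList list, String propertiesName) {
            this.address = address;
            this.list = list;
            this.propertiesName = propertiesName;
        }
    }

    private final Map<String, EmailProperties> propertiesByName = new LinkedHashMap<>();
    private final List<EmailAddressee> addressees = new ArrayList<>();

    /** Adds a named property collection; names must be unique within the spec. */
    void addProperties(EmailProperties properties) {
        if (propertiesByName.putIfAbsent(properties.name, properties) != null) {
            throw new IllegalArgumentException("Duplicate EmailProperties name: " + properties.name);
        }
    }

    /** Adds an addressee; a named collection it refers to must already exist. */
    void addAddressee(EmailAddressee addressee) {
        if (addressee.propertiesName != null
                && !propertiesByName.containsKey(addressee.propertiesName)) {
            throw new IllegalArgumentException("Unknown EmailProperties: " + addressee.propertiesName);
        }
        addressees.add(addressee);
    }

    /** Returns the properties that apply to one addressee (empty for plain bulk email). */
    Map<String, String> propertiesFor(EmailAddressee addressee) {
        if (addressee.propertiesName == null) {
            return Collections.emptyMap();
        }
        return propertiesByName.get(addressee.propertiesName).values;
    }

    List<EmailAddressee> addressees() {
        return Collections.unmodifiableList(addressees);
    }
}
```

For the weather report example, you would add one collection per report variation and have each subscriber’s addressee name the variation they signed up for.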

Email Properties

As discussed in the preceding section, there are any number of named collections of named email properties associated with any email being sent. The following rules apply to the names of both the collections (the EmailProperties instances) and the individual properties within them:

The names must start with a letter (upper or lower case), and otherwise must be composed of letters, numbers, and underscores. No other characters are allowed.
The names must be at least one character long, but no longer than 32 characters.
The names must be unique within their scope. Specifically, every collection of email properties must have a unique name, and every property within a collection must have a unique name.
The values of all properties are strings, but they may be string representations of numbers, and treated as such in a Test Expression.
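The first two naming rules translate directly into a regular expression: one letter, followed by up to 31 more letters, digits, or underscores. Here’s a small sketch of such a check. I’m assuming "letter" means an ASCII letter, and the class name is hypothetical; this is not EmailService’s own validation code.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the naming rules of property collections and properties. */
final class PropertyNames {
    // One leading ASCII letter, then 0 to 31 letters, digits, or underscores (1 to 32 characters total).
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,31}");

    /** Returns true if the name satisfies the character and length rules. */
    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }
}
```

The uniqueness rule can’t be checked one name at a time; keeping the names as keys of a map, and rejecting duplicates on insert, takes care of it.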

Content Sources

When EmailService needs to read content (for example, for inlined images), it does so from a named content source that has been configured. Each content source has the following attributes:

Name: must be unique across all content sources.
Type: one of the following:
- File
- Web
Location: a string with the location of the content source.
- For file sources, the full path to the base directory. Specific content files have paths relative to the base directory. For instance, /content/contracts might be a location.
- For web sources, the URL to the base web location. Specific contents have paths relative to that base web location, or are queries to that web location, or both. For example, https://accounts.burger.com/contracts might be a location.
Mode: one of the following:
- READ_ONLY (or GET): A read-only content source, usable for sending attachments but not for receiving them. After a resource is read from this content source, the resource remains and could be read again.
- READ_AUTO (or GET followed by DELETE): An automatically deleting read-only content source, usable for sending attachments but not for receiving them. After a resource is read from this content source, it is automagically deleted.
- READ_WRITE (or GET and PUT): A read/write content source, usable for both sending and receiving attachments.
- WRITE_ONLY (or PUT): A write-only content source, usable for receiving attachments, but not for sending them.
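The four modes boil down to three yes/no questions: can the source be read (for sending), can it be written (for receiving), and is a resource deleted after it is read? A sketch of that as a Java enum follows. The constant names come from the list above, but the enum itself, its fields, and its method names are my own illustration and not necessarily how EmailService models this.

```java
/** Hypothetical enum capturing what each content source mode permits. */
enum ContentSourceMode {
    READ_ONLY(true, false, false),   // GET
    READ_AUTO(true, false, true),    // GET followed by DELETE
    READ_WRITE(true, true, false),   // GET and PUT
    WRITE_ONLY(false, true, false);  // PUT

    private final boolean readable;
    private final boolean writable;
    private final boolean deleteAfterRead;

    ContentSourceMode(boolean readable, boolean writable, boolean deleteAfterRead) {
        this.readable = readable;
        this.writable = writable;
        this.deleteAfterRead = deleteAfterRead;
    }

    /** True if content can be read from this source, for example when sending attachments. */
    boolean usableForSending() {
        return readable;
    }

    /** True if content can be written to this source, for example when receiving attachments. */
    boolean usableForReceiving() {
        return writable;
    }

    /** True if a resource is removed from the source once it has been read. */
    boolean deletesAfterRead() {
        return deleteAfterRead;
    }
}
```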

The Email Document

The EmailDocument is a string containing a modified form of HTML with a few custom tags and some other special capabilities, all of which are described in detail in the sections following.

Includes

The <include-file/> tag probably does exactly what you’d expect it to do. It has only one attribute: src="URL", and that attribute specifies the URL (which may be one of the Special URLs) from which the text (encoded in UTF-8) to be included is read. Here’s a simple example, first showing the document as composed — before the <include-file/> tag is processed:

<html><body><include-file src="https://bog.standard.com/includes/abc.html"/></body></html>

When the tag is processed, EmailService does a GET on https://bog.standard.com/includes/abc.html. Suppose that returns the following:

<p>
   Today's topic is "How to tell a giraffe from a mouse."
</p>

After EmailService processes the <include-file/> tag, the resulting EmailDocument looks like this:

<html><body><p>
   Today's topic is "How to tell a giraffe from a mouse."
</p></body></html>

The <include-file/> tag may appear anywhere in the EmailDocument, even in places where a tag wouldn’t ordinarily be valid. For instance:

<plain-text<include-file src="http://bogus.com/weird.html"/>

That include will actually work. As long as the EmailDocument is valid once the included text has been inserted, it will be processed normally.

Property value substitutions are allowed within the source URL of an <include-file/> tag, like this:

<include-file src="https://invoice.bigcompany.com/invoice?client=:::client_id:::"/>

If the client_id property of a particular EmailAddressee had the value 88278823, then after property value substitution (but before the URL was referenced), the line would look like this:

<include-file src="https://invoice.bigcompany.com/invoice?client=88278823"/>

Note that the <include-file/> tag is (and must be) self-closing (ends with />). It can appear any number of times within the EmailDocument, and it can be nested (that is, an included document fragment may itself have <include-file/> tags). EmailService will process all <include-file/> tags, including nested tags, before any other processing is done.
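
Here is a rough sketch of how nested include expansion could work. It is not EmailService's actual implementation: the IncludeSketch class, the regular expression, the depth limit, and the fetch function (which stands in for the real GET) are all assumptions made for illustration. It also skips the property value substitution that EmailService performs on the src URL before reading it.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of <include-file/> expansion; not EmailService's actual code. */
final class IncludeSketch {

    // Matches a self-closing include tag and captures its src attribute.
    private static final Pattern INCLUDE =
            Pattern.compile("<include-file\\s+src=\"([^\"]*)\"\\s*/>");

    // Assumed guard against includes that (directly or indirectly) include themselves.
    private static final int MAX_DEPTH = 10;

    /** Replaces every include tag, including nested ones, with the fetched text. */
    static String expand(String document, Function<String, String> fetch) {
        return expand(document, fetch, 0);
    }

    private static String expand(String document, Function<String, String> fetch, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Includes nested too deeply");
        }
        Matcher matcher = INCLUDE.matcher(document);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            out.append(document, last, matcher.start());      // text before the tag
            String included = fetch.apply(matcher.group(1));  // read the fragment
            out.append(expand(included, fetch, depth + 1));   // expand nested includes
            last = matcher.end();
        }
        out.append(document, last, document.length());        // text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        Function<String, String> fetch = url ->
                "<p>\n   Today's topic is \"How to tell a giraffe from a mouse.\"\n</p>";
        System.out.println(expand(
                "<html><body><include-file src=\"https://bog.standard.com/includes/abc.html\"/></body></html>",
                fetch));
    }
}
```

Because the tag is matched as plain text rather than parsed as HTML, this approach is indifferent to where the tag sits in the document, which is the behavior described above.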

Property Value Substitutions

If the EmailDocument contains any strings in the form :::property name::: (we’ll call that a PSUB), that string will be replaced with the value of the EmailProperty for the EmailAddressee being processed. For example, if the EmailDocument contained a sentence like this:

   Hi, :::friendly first name:::!  Thank you for shopping with us!

And for a given user there is an EmailProperty named "friendly first name" with a value of "Tom", then after the property value substitutions are processed that sentence will look like this:

   Hi, Tom!  Thank you for shopping with us!

Text substitutions like these are pretty obvious, but property value substitutions have another very useful purpose: customizing URLs. For example, suppose you are sending an HTML mail to your customers, and you want to include a link to their account. Your HTML email might contain an anchor tag like this:

<a href="https://burger.com/account/summary.html?acct=:::acct:::">Click here to see your account.</a>

Then each EmailAddressee will have an EmailProperty named acct that contains that addressee’s account number (let’s say KDL99123 for our example). After the property value substitutions are processed, that anchor tag would look like this:

<a href="https://burger.com/account/summary.html?acct=KDL99123">Click here to see your account.</a>

PSUBs can appear anywhere in an EmailDocument, and any number of them may be used (including PSUBs with identical property names). PSUBs may even be nested (that is, the value of a PSUB may itself include PSUBs). PSUBs are the second thing processed when an EmailDocument is processed, right after includes (though include URLs may contain PSUBs - see Includes).
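
As a minimal sketch of the idea (not EmailService's actual code), PSUB processing could be done with repeated regular expression passes until no PSUBs remain, which also takes care of nested PSUBs. The PsubSketch class, the pass limit, and the choice to throw when a property is missing are all assumptions made for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of property value substitution; not EmailService's actual code. */
final class PsubSketch {

    // A PSUB is a property name surrounded by triple colons, like :::acct:::.
    private static final Pattern PSUB = Pattern.compile(":::(.+?):::");

    // Assumed guard against properties whose values refer to each other in a loop.
    private static final int MAX_PASSES = 10;

    /** Replaces every PSUB with the matching property value, repeating for nested PSUBs. */
    static String substitute(String document, Map<String, String> properties) {
        String current = document;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            Matcher matcher = PSUB.matcher(current);
            StringBuilder out = new StringBuilder();
            boolean found = false;
            while (matcher.find()) {
                found = true;
                String value = properties.get(matcher.group(1));
                if (value == null) {
                    // Assumption: a missing property is treated as an error.
                    throw new IllegalArgumentException("No property named: " + matcher.group(1));
                }
                // quoteReplacement keeps '$' and '\' in the value from being interpreted.
                matcher.appendReplacement(out, Matcher.quoteReplacement(value));
            }
            if (!found) {
                return current;  // nothing left to substitute
            }
            matcher.appendTail(out);
            current = out.toString();
        }
        throw new IllegalStateException("PSUBs nested too deeply");
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of("friendly first name", "Tom", "acct", "KDL99123");
        System.out.println(substitute(
                "Hi, :::friendly first name:::!  Thank you for shopping with us!", properties));
        System.out.println(substitute(
                "<a href=\"https://burger.com/account/summary.html?acct=:::acct:::\">Click here</a>",
                properties));
    }
}
```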

Conditional Tags

Conditional tags allow blocks of an EmailDocument to be included or excluded based on the result of a test expression based on one or more EmailProperties. Here’s a simple example to illustrate the idea with an if/else/endif construct, using the EmailProperty named "status":

<plain-text>
   <es-if test="status = 'paid'">
      Thank you for your prompt payment!
   <es-else/>
      Pay up right now, you bum!
   </es-if>
</plain-text>

What’s above is the EmailDocument as composed. Note that the <es-if> tag must be closed with the usual </es-if> tag — but the optional <es-else/> tag is self-closed. After EmailService processes the conditional tags, if the status property for a given user had the value "paid", the EmailDocument would look like this:

<plain-text>
      Thank you for your prompt payment!
</plain-text>

See Test Expression for (much) more detail about how those test expressions work.

The switch/case/endswitch construct works very similarly. Before conditionals processing:

<html><body><p>
   <es-switch>
         Greetings!
      <es-case test="status = 'paid'"/>
         Thank you for your prompt payment!
      <es-case test="status = 'balance due'"/>
         We would appreciate your prompt payment on the balance due.
      <es-case test="status = 'overdue'"/>
         Pay up right now, you bum!
      <es-case default/>
         Uh, sorry, but we have no idea what to say.
   </es-switch>
</p></body></html>

The test expressions in the <es-case test="…"/> tags are evaluated in the order that they appear in the EmailDocument. When one of them tests true, then the following block (terminated either by the next <es-case test="…"/> tag or the </es-switch> tag) is included, and all the other blocks following other <es-case test="…"/> tags are excluded.

If the value of "status" was "balance due", the EmailDocument after conditionals processing would look like this:

<html><body><p>
         Greetings!
         We would appreciate your prompt payment on the balance due.
</p></body></html>

Some notes:

The <es-case default/> tag defines the case that would be included if no other case test expressions evaluated to true.
The text (if any) between the <es-switch> tag and the first <es-case test="…"/> tag is always included.
The <es-case test="…"/> tags are (and must be) self-closing (ending with />).
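
The selection rules for a switch can be sketched in a few lines of Java. This is only an illustration of the rules above, not EmailService's implementation: the SwitchSketch class and its Case record are hypothetical, and each test expression is represented by a plain Java Predicate instead of being parsed from the test attribute.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of switch/case selection; not EmailService's actual code. */
final class SwitchSketch {

    /** One case: a test over the addressee's properties and the block that follows it. */
    record Case(Predicate<Map<String, String>> test, String block) {}

    /**
     * Returns the text a switch resolves to: the preamble (always included), then the
     * block of the first case that tests true, or the default block if none does.
     * A null defaultBlock means the switch has no default case.
     */
    static String resolve(String preamble, List<Case> cases, String defaultBlock,
                          Map<String, String> properties) {
        StringBuilder out = new StringBuilder(preamble);
        for (Case candidate : cases) {           // evaluated in document order
            if (candidate.test().test(properties)) {
                return out.append(candidate.block()).toString();  // first match wins
            }
        }
        if (defaultBlock != null) {
            out.append(defaultBlock);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Case> cases = List.of(
                new Case(p -> "paid".equals(p.get("status")),
                        "Thank you for your prompt payment!"),
                new Case(p -> "balance due".equals(p.get("status")),
                        "We would appreciate your prompt payment on the balance due."),
                new Case(p -> "overdue".equals(p.get("status")),
                        "Pay up right now, you bum!"));
        System.out.println(resolve("Greetings! ", cases,
                "Uh, sorry, but we have no idea what to say.",
                Map.of("status", "balance due")));
    }
}
```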

Conditional tags (both if and switch) may be nested, like this:

<plain-text>
   <es-if test="status = 'paid'">
      <es-if test="whale = 'yes'">
        Bless you, oh magnificent customer, for your prompt and large payment!!!
      <es-else/>
        Thank you for your prompt payment!
      </es-if>
   <es-else/>
      Pay up right now, you bum!
   </es-if>
</plain-text>

Specifically, the following nestings are allowed (and others disallowed):

An if/else/endif may be nested inside an if/else or an else/endif.
A switch/case/endswitch may be nested inside an if/else or an else/endif.
An if/else/endif may be nested inside a switch/case, case/case or a case/endswitch.
A switch/case/endswitch may be nested inside a switch/case, case/case or a case/endswitch.

Conditional tags may appear almost anywhere in the EmailDocument, and any number of them may be present. If the property they reference does not exist for a particular EmailAddressee, then EmailService will not send an email (but will return an error for that addressee). Conditional tags are the third step in processing an EmailDocument, immediately after property substitutions.

Test Expression

Test expressions are used in Conditional Tags, and they must evaluate to a boolean value (true or false) that is used to control a conditional tag. The following elements may appear in a test expression:

Email property names: any unquoted string that starts with a letter (upper or lower case) is treated as a property name. When the expression is evaluated, the property’s value is substituted for its name. Most of the time the expression evaluator can infer whether the property’s value should be handled as a string value or as a numeric value. On those occasions when it cannot make the inference, it defaults to handling the value as a string value - but if the property name is suffixed with a hash (#), then it will be handled as a numeric value.
String literals: Any string surrounded by single quotes (like burger in I loved that 'burger'!) is a string literal. If you need to use a single quote inside of a string literal, preface it with a backslash, like this: 'I loved that \'burger\'!'. String literals may include any characters other than a newline.
Numeric literals: Any unquoted sequence of digits is a numeric literal. The sequence may include a single decimal point. A negative number may be indicated by any one of a leading minus sign, a trailing minus sign, or surrounding parentheses. A numeric literal may not include any whitespace. Some examples of valid numeric literals: 12, .002, 53., -99.3, 79.300-, (44.32). The string 1 300.32 is not valid, as there’s a space. Similarly, a34 and 44b are not valid because there are non-numeric characters.
Parentheses: As you’d expect, if you put parentheses around a subexpression, that indicates a high precedence for the subexpression. Without parentheses, the expression evaluator evaluates strictly left-to-right - so use parentheses liberally!
Square brackets: Any expression can include a single instance of an integer (positive or negative) inside square brackets. The integer sets the precision of numeric results by indicating the power of ten to round to. For instance, a -2 means round to the nearest 10^-2, or 0.01. Similarly, a 3 indicates rounding to the nearest 10^3, or 1000.
Operators: These are discussed in detail in the table below.

Operator	Explanation
+	The left side and the right side must either both be strings, or both be numeric. In the case of strings, the strings are concatenated. For instance, if the value of the property `alpha` is 'yogurt', then the result of `alpha + 'bet'` is `'yogurtbet'`. In the case of numeric values, the values are added. For example, if the value of the property `amount` is 124, then the result of `amount + 45.2` is 169.2.
-	The left side and the right side must both be numeric values, and the result is the left side minus the right side. For instance, if the value of `payment` is 245.45 then the result of `245.45 - payment` is zero.
*	The left side and the right side must both be numeric values, and the result is the product of the left side and the right side. For instance, if the value of `interest` is 0.04 and the value of `balance` is 34.50, then the result of `balance * (1 + interest)` is 35.88. Note the use of parentheses to force `1 + interest` to evaluate first.
/	The left side and the right side must both be numeric values, and the result is the quotient of the left side divided by the right side. The square bracket precision notation is particularly useful with division, as the result can easily have many decimals. For example, if the value of the property `total` is 7362.34, then the result of `total / 12[-2]` is 613.53.
=	Evaluates to `true` if the value of the left side is equal to the value of the right side. For instance, `status = 'paid'` would evaluate to `true` if the value of the `status` property was `'paid'`.
!=	Evaluates to `true` if the value of the left side is not equal to the value of the right side. For instance, `due != 0` would evaluate to `true` if the value of the `due` property was `3.45` (or any other non-zero value).
>	Evaluates to `true` if the value of the left side is greater than the value of the right side. For numbers, that means if the left side is a number larger than the right side, the result is `true`. An example: `balance > 100` would evaluate to `true` if the value of `balance` is more than 100. For strings, it’s a bit trickier: the left side is compared to the right side with Java’s `Collator.compare()`, and the result of that is used to determine which side is greater (meaning, after in sort order). For instance, `name > 'Smith'` would evaluate to `true` if the value of `name` would come after `Smith` in collation order. The collation order may vary with the language or locale.
<	Evaluates to `true` if the value of the left side is less than the value of the right side. All the comments about numeric and string values for `>` apply for this operator as well.
>=	Evaluates to `true` if the value of the left side is greater than or equal to the value of the right side. All the comments about numeric and string values for `>` apply for this operator as well.
<=	Evaluates to `true` if the value of the left side is less than or equal to the value of the right side. All the comments about numeric and string values for `>` apply for this operator as well.
&	Evaluates to `true` if both the subexpression on the left side and the subexpression on the right side evaluate to `true`. Note that this operator must have a subexpression on both sides, not a string or numeric value. For example: `(balance > 0) & (status != 'current')`. Here `balance > 0` is a subexpression that evaluates to either `true` or `false`, as is `status != 'current'`. The parentheses around each subexpression are required so that the subexpressions will evaluate before the `&` operator is evaluated. The end result: if a particular EmailAddressee had a balance owed, and the account status was not current, then this expression would evaluate to `true`.
|	Evaluates to `true` if either the subexpression on the left side or the subexpression on the right side evaluates to `true`. The comments about `&` also apply for this operator.
^	Evaluates to `true` if either the subexpression on the left side or the subexpression on the right side evaluates to `true`, but not both (the "exclusive or" function). The comments about `&` also apply for this operator.
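
The numeric literal rules and the square bracket rounding are easy to get subtly wrong, so here is a small sketch of both using BigDecimal. It is an illustration only, not the expression evaluator's actual code: the NumericSketch class and its method names are hypothetical, and the HALF_UP rounding mode is an assumption (the text above doesn't say how ties are rounded).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.regex.Pattern;

/** Hypothetical sketch of numeric literal parsing and rounding; not EmailService's code. */
final class NumericSketch {

    // Digits with at most one decimal point, and at least one digit somewhere.
    private static final Pattern UNSIGNED = Pattern.compile("\\d+\\.?\\d*|\\.\\d+");

    /** Parses literals like 12, .002, 53., -99.3, 79.300- and (44.32). */
    static BigDecimal parseLiteral(String text) {
        String body = text;
        boolean negative = false;
        if (body.length() > 2 && body.startsWith("(") && body.endsWith(")")) {
            negative = true;                                   // surrounding parentheses
            body = body.substring(1, body.length() - 1);
        } else if (body.startsWith("-")) {
            negative = true;                                   // leading minus sign
            body = body.substring(1);
        } else if (body.endsWith("-")) {
            negative = true;                                   // trailing minus sign
            body = body.substring(0, body.length() - 1);
        }
        if (!UNSIGNED.matcher(body).matches()) {
            throw new NumberFormatException("Not a numeric literal: " + text);
        }
        BigDecimal value = new BigDecimal(body);
        return negative ? value.negate() : value;
    }

    /** Rounds to the nearest 10^power; -2 means nearest 0.01, 3 means nearest 1000. */
    static BigDecimal roundToPowerOfTen(BigDecimal value, int power) {
        // A BigDecimal scale of n keeps n decimal places, so the scale is -power.
        // HALF_UP is an assumption; the documentation doesn't name a rounding mode.
        return value.setScale(-power, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // The `total / 12[-2]` example from the table, with total = 7362.34.
        BigDecimal quotient = parseLiteral("7362.34")
                .divide(parseLiteral("12"), 10, RoundingMode.HALF_UP);
        System.out.println(roundToPowerOfTen(quotient, -2).toPlainString());
        System.out.println(parseLiteral("(44.32)").toPlainString());
    }
}
```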

Plain Text Section

The <plain-text> tag is very straightforward. An example:

<plain-text>
    Here is the plain text message I want displayed in my email for recipients that don't have HTML clients.
</plain-text>

That’s really all there is to it. This tag may appear only once in an EmailDocument, and only at the top level.

HTML Section

The <html> tag encloses the block of conventional HTML (optionally including the special tags for Inlined elements like images, audio, and video). An example:

<html>
  <body>
    <p style="color:red;">This is a very <i>simple</i> example of what I might put in my HTML email.
    </p>
    <p>It's pretty boring, admittedly.
    </p>
  </body>
</html>

There’s plenty of help on the web for composing HTML email. A few examples here, here, and here. This tag may appear only once in an EmailDocument, and only at the top level.

Attachments

The <attached-file/> tag tells EmailService to attach a file to the email, with the specified file name, getting the data for the file from the specified source. Note that the data is read by EmailService, not the email recipient. That means the data source must be accessible to EmailService, but it doesn’t have to be accessible to the email recipient. An example:

<attached-file name="burger.doc" src="https://internal.burger.com/secret_burger_formula.doc"/>
<attached-file name="bun.doc" src="file:///home/bilbo/bunrecipe.doc"/>

This tag may appear any number of times (including zero), but only at the top level. Note that it is (and must be) a self-closing tag (ends with />). The example shows the sources as conventional URLs, but the Special URLs may be used as well.

Inlined elements

The HTML section can include <img>, <audio>, and <video> tags just like a web page, and these will work the same way (but be aware that many email clients do not yet support the <audio> and <video> tags). When you include these tags, the data will come from the URL you specify as the source, which usually is a publicly accessible web site. For most users, that means there will be a bit of a delay between the time they open the email and the time the image appears (or the audio can be heard, or the video played). Often, especially for images, this delay is undesirable.

You can avoid this delay by including the source data in the email itself; this is called inlining. The way you do that with EmailService is by using one of the special tags <inline-img>, <inline-audio>, or <inline-video>. You compose your HTML using these tags exactly as you would the non-inlined versions, except that the source URL (which may be one of the Special URLs) will be read by EmailService and the data from the source attached to the email. Once the inline data has been read, EmailService will convert the tag to the non-inlined version, keeping all the attributes except the src attribute, which will be converted to a special reference to the inlined data.

All that sounds terribly complicated, but it’s actually pretty simple. Here’s an example as composed in the EmailDocument:

<html><body>
  <inline-img src="file:///home/tom/dog.jpg" width="10"/>
</body></html>

And this is what it looks like after EmailService has read the data and converted the tag. In other words, this is what’s actually transmitted to the email’s recipient:

<html><body>
  <img src="cid:1" width="10"/>
</body></html>

That funny-looking src="cid:1" is a special form of URL that email clients understand — this one tells the email client to get the image data from the attachment named "1", which is where EmailService put the data that it read from file:///home/tom/dog.jpg. Note that the width="10" attribute was kept in the tag sent to the email recipient.

The special inline tags may appear anywhere inside the HTML section where a non-inlined tag of the same type could be used.
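
The tag conversion described above can be sketched with a regular expression. This is a simplified stand-in, not how EmailService actually does it (an HTML parser such as JSoup would be more robust): the InlineSketch class is hypothetical, numbering the attachments 1, 2, 3 and so on is an assumption extrapolated from the cid:1 example, and the step that actually reads the data and attaches it to the email is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of inline tag conversion; not EmailService's actual code. */
final class InlineSketch {

    // Captures: tag type, attributes before src, the src URL, attributes after src.
    private static final Pattern INLINE = Pattern.compile(
            "<inline-(img|audio|video)\\b([^>]*?)\\bsrc=\"([^\"]*)\"([^>]*)>");

    /** The converted HTML plus the source URLs to read, in attachment order. */
    record Result(String html, List<String> sources) {}

    /** Rewrites inline tags to plain tags whose src refers to an attachment by cid. */
    static Result convert(String html) {
        List<String> sources = new ArrayList<>();
        Matcher matcher = INLINE.matcher(html);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            sources.add(matcher.group(3));           // remember what must be attached
            String tag = "<" + matcher.group(1) + matcher.group(2)
                    + "src=\"cid:" + sources.size() + "\""   // assumed numbering: 1, 2, 3...
                    + matcher.group(4) + ">";
            matcher.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        matcher.appendTail(out);
        // Rename any closing tags, such as </inline-video>, to match.
        String converted = out.toString()
                .replaceAll("</inline-(img|audio|video)>", "</$1>");
        return new Result(converted, sources);
    }

    public static void main(String[] args) {
        Result result = convert(
                "<html><body>\n  <inline-img src=\"file:///home/tom/dog.jpg\" width=\"10\"/>\n</body></html>");
        System.out.println(result.html());
        System.out.println(result.sources());
    }
}
```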

Special URLs

For any of the special EmailService tags (for includes, inlined elements, and attachments), the src URL can be either a standard URL (like https://, http://, or file://) or it can be the special content source URL that has this form:

cs://<content source name>/<relative path or query>

For example, suppose you’ve configured EmailService with a content source named aardvark, with a type of Web, and a location of https://receipts.paradise.com. You’re sending an email to an EmailAddressee with EmailProperties containing a property named "receipt_id" with a value of "98887200272352". Your EmailDocument might contain the following:

<inline-img src="cs://aardvark/receipt?id=:::receipt_id:::"/>

After all the processing is completed, EmailService will read the data from the URL https://receipts.paradise.com/receipt?id=98887200272352, and generate an image tag like <img src="cid:1"/> to send in the email.
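
Resolving a cs:// URL amounts to looking up the content source by name and appending the rest of the URL to its location. Here is a minimal sketch of that mapping for web content sources. The SpecialUrlSketch class and its resolve() method are hypothetical, and it assumes that property value substitutions have already been processed.

```java
import java.util.Map;

/** Hypothetical sketch of cs:// URL resolution; not EmailService's actual code. */
final class SpecialUrlSketch {

    private static final String SCHEME = "cs://";

    /**
     * Turns cs://<content source name>/<relative path or query> into a full URL,
     * using a map from content source name to its configured location.
     * Standard URLs (https://, http://, file://) are returned unchanged.
     */
    static String resolve(String url, Map<String, String> locationsByName) {
        if (!url.startsWith(SCHEME)) {
            return url;
        }
        String rest = url.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        String name = slash < 0 ? rest : rest.substring(0, slash);
        String relative = slash < 0 ? "" : rest.substring(slash + 1);

        String base = locationsByName.get(name);
        if (base == null) {
            throw new IllegalArgumentException("Unknown content source: " + name);
        }
        if (relative.isEmpty()) {
            return base;
        }
        // Join with exactly one slash between the base location and the relative part.
        return base.endsWith("/") ? base + relative : base + "/" + relative;
    }

    public static void main(String[] args) {
        Map<String, String> sources = Map.of("aardvark", "https://receipts.paradise.com");
        System.out.println(resolve("cs://aardvark/receipt?id=98887200272352", sources));
    }
}
```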

Receiving emails

Why does the world need Email?

I’m not sure the world actually does need EmailService, but I sure did! I wanted to integrate email with some of my own applications that do these things:

Send transactional emails. For instance, personalized weather reports, system status to administrators, etc. For these emails, I need more than just the email address — I need some way to get the personalized content.
Send bulk emails. For instance, daily weather reports.

Then once I started to actually implement these functions, I realized that I want my email to work with multiple third-party providers — for reliability, lower cost, and sometimes for features.

Dependencies

EmailService has several dependencies:

Util is a utilities module the author also wrote, freely available from here.
JSON is the bog-standard Java JSON module, freely available from here.
Jakarta Mail is the bog-standard Java email API, freely available from here. Its dependency, the Jakarta Activation package, is available here.
JSoup is an open-source Java HTML parser, available here.

Why is Email’s code so awful?

The author is a retired software and hardware engineer who did this just for fun, and who (so far, anyway) has no code reviewers to upbraid him. Please feel free to fill in this gap! You may contact the author at [email protected].

How is Email licensed?

Email is licensed with the quite permissive MIT license:

Created: November 16, 2020
Author: Tom Dilatush ([email protected])
Github: https://github.com/SlightlyLoony/Email
License: MIT

Copyright 2020, 2021 by Tom Dilatush (aka "SlightlyLoony")

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
