Electronic Mail Basics

Electronic mail (email) allows a person to compose a message and to send it to another person, usually on a remote system. Most email software also provides facilities for reading, saving, printing and replying to email.

Until very recently, electronic mail was the single biggest generator of traffic volume on the Internet.

Email is delivered as follows:

Key concepts:

RFC821 (and subsequent) SMTP protocol
RFC822 (and subsequent) message format
SMTP Relay, mailbox delivery.
POP for mailbox access

RFC 822 Message Format

RFC822 (and several subsequent RFCs) defines the structure of an email message on the Internet. It has become the generic standard for email messages.

It defines a message format containing two parts:

A set of headers, some of which are mandatory and others optional. Each header has a fixed format: a keyword which starts at the beginning of a line, followed by a colon character, followed by a space and a value. Some headers include:
```
From: pscott@ironbark.bendigo.latrobe.edu.au
To: hjc@ironbark
Reply-To: p.scott@latrobe.edu.au
Subject: Problems in Indy lab?
```
A body, which may contain any plain ASCII text. The body follows the headers and is separated from them by a blank line. Because the first blank line marks the end of the headers, no line in the body can be mistaken for a header line. More recent standards than RFC822 (MIME, described later) extend the range of content which can be sent by email as enclosures or attachments.
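As a small illustration of the header/body split, the following sketch parses a message with Python's standard `email` package. The header values are the ones shown above; the one-line body is sample text chosen for the example.

```python
from email import message_from_string

# Headers first, then a blank line, then the body.
raw = (
    "From: pscott@ironbark.bendigo.latrobe.edu.au\n"
    "To: hjc@ironbark\n"
    "Reply-To: p.scott@latrobe.edu.au\n"
    "Subject: Problems in Indy lab?\n"
    "\n"
    "Is there a problem?\n"
)

msg = message_from_string(raw)

# Header lookup is by keyword and is case-insensitive.
subject = msg["subject"]

# Everything after the first blank line is the body.
body = msg.get_payload()

print(subject)
print(body)
```

Note that the parser needs no knowledge of which headers are mandatory; it only relies on the `keyword: value` layout and the blank separator line.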

The SMTP Protocol

The Simple Mail Transfer Protocol defined in RFC821 specifies how mail is delivered from one system to another. It is a relatively straightforward protocol.

Initially, an email client (usually the delivery agent software on the originating machine) establishes a TCP connection to the SMTP server (at port 25) on the destination machine.

The server responds with an informative message beginning with the 3-digit code 220. The client then sends a HELO command identifying the domain name of the system it is running on.

The client software then transmits one (or more) mail messages to the server. Each message is preceded by a MAIL FROM: command and one or more RCPT TO: commands. The responses to these commands always begin with a 3-digit number followed by a human readable message. Then the text of the message itself (including its headers) is transmitted using a DATA command.

Finally, a QUIT message from the client tells the server to close the TCP connection. An example of this is given on the next slide.

An SMTP Session

NB: Lines which begin with a 3-digit code are sent by the server; all other lines are sent by the client.

```
220 redgum.bendigo.latrobe.edu.au ESMTP Sendmail 8.7.3/8.7.3; Mon, 25 Mar 1996 8:16:33 +1000 (EST)
helo bindi.bendigo.latrobe.edu.au
250 redgum.bendigo.latrobe.edu.au Hello bindi.bendigo.latrobe.edu.au, pleased to meet you
Mail from: pscott@bindi.bendigo.latrobe.edu.au
250 pscott@bindi.bendigo.latrobe.edu.au... Sender ok
Rcpt to: hjc@ironbark.bendigo.latrobe.edu.au
250 Recipient ok
Data
354 Enter mail, end with "." on a line by itself
Subject: Problem with ironbark?
From: pscott@bindi.bendigo.latrobe.edu.au
Reply-to: p.scott@latrobe.edu.au

Is there a problem?

Regards
.
250 IAA13317 Message accepted for delivery
quit
221 redgum.bendigo.latrobe.edu.au closing connection
```
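The client side of such a session can be sketched as a function which produces the lines to send, in order. This is an illustration only: the function names are invented for this example, no network connection is made, and only single-line server replies are handled. The addresses are written in angle brackets as RFC821 specifies (the session above omits them), and a message line beginning with "." is sent with an extra "." so that it cannot be mistaken for the end-of-data marker.

```python
def smtp_client_lines(helo_domain, sender, recipients, message_text):
    """Return the lines a client sends to deliver one message (hypothetical helper)."""
    lines = [f"HELO {helo_domain}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in message_text.splitlines():
        # Double a leading "." so the line is not read as end-of-data.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")      # a lone "." ends the message text
    lines.append("QUIT")
    return lines


def parse_reply(line):
    """Split a single-line server reply into (3-digit code, human readable text)."""
    return int(line[:3]), line[4:]


session = smtp_client_lines(
    "bindi.bendigo.latrobe.edu.au",
    "pscott@bindi.bendigo.latrobe.edu.au",
    ["hjc@ironbark.bendigo.latrobe.edu.au"],
    "Subject: Problem with ironbark?\n\nIs there a problem?\n\nRegards\n",
)
code, text = parse_reply("250 Recipient ok")
```

A real client would wait for (and check) the server's reply after each command before sending the next one; Python's standard `smtplib` module does this.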

Other Aspects

There are many subtleties involved in electronic mail. These include:

Email is (usually) accepted for delivery on an SMTP relay host, usually located in the same organisation as the sender. Such a system is sometimes also called an SMTP gateway. In the event that the destination system is not available (eg, is down or unreachable), the relay host "spools" the message, and attempts to deliver it at regular intervals.
An email address usually defines a mailbox, not a person (but see below). A mailbox is simply a file, in mailbox format, on the destination host. It is not necessarily true that a destination mailbox resides on the actual host to which a message was sent...
An email address need not refer to a mailbox at all. Other kinds of destination include:
- mail forwarders
- mail aliases
- mailing lists
- automated mail systems, ie the destination is a process.
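The idea that an address need not be a mailbox can be sketched with a lookup table which is expanded until only mailboxes remain. Everything here is hypothetical: the `ALIASES` table, the `example.org` addresses and the `resolve` function are invented for illustration and do not describe how any particular mail system stores its aliases.

```python
# Hypothetical table: an address on the left is delivered to the addresses on the right.
ALIASES = {
    "staff@example.org": ["alice@example.org", "bob@example.org"],  # mailing list
    "bob@example.org": ["bob@mail.example.org"],                    # forwarder
    "postmaster@example.org": ["alice@example.org"],                # alias
}


def resolve(address, aliases, seen=None):
    """Expand an address into the set of mailboxes it finally delivers to."""
    seen = set() if seen is None else seen
    if address in seen:          # guard against forwarding loops
        return set()
    seen.add(address)
    targets = aliases.get(address)
    if targets is None:          # not in the table: treat it as a mailbox
        return {address}
    mailboxes = set()
    for target in targets:
        mailboxes |= resolve(target, aliases, seen)
    return mailboxes


final = resolve("staff@example.org", ALIASES)
```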

Email Attachments

The Multipurpose Internet Mail Extensions (MIME -- RFC1521 and RFC1522) specification extends the RFC822 message format to allow the mail message body to contain attachments or enclosures. This allows SMTP to be used to send files of arbitrary type.

The MIME specification adds several new header types to RFC822. In the most common usage, the following are added to the basic message header:

```
MIME-Version: 1.0
Content-Type: Multipart/Mixed; Boundary=NextBit
```

The message is then structured into one or more "message parts", using the "Boundary" string as a separator. The following shows an audio attachment to an ordinary text message. Note that non-ASCII data is usually encoded into an ASCII representation.

```
--NextBit
Content-Type: text/plain

Ordinary email message in plain ASCII text

--NextBit
Content-Type: audio/basic
Content-Transfer-Encoding: base64

...ASCII encoded data for the audio message
--NextBit--
```
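A message with this structure can be built with Python's standard `email` package, as in the sketch below. The subject line, the file name `note.au` and the attachment bytes are placeholders invented for the example; the library chooses its own boundary string rather than "NextBit".

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "pscott@bindi.bendigo.latrobe.edu.au"
msg["To"] = "hjc@ironbark.bendigo.latrobe.edu.au"
msg["Subject"] = "Audio note"

# First part: the ordinary text of the message.
msg.set_content("Ordinary email message in plain ASCII text\n")

# Second part: arbitrary binary data (placeholder bytes, not real audio).
audio_bytes = bytes(range(256))
msg.add_attachment(audio_bytes, maintype="audio", subtype="basic",
                   filename="note.au")

parts = list(msg.iter_parts())

# The text which would be handed to SMTP after the DATA command.
wire_text = msg.as_string()
```

Adding the attachment converts the message to `multipart/mixed`, and the binary part is Base64 encoded automatically.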

MIME Types and Encodings

The Content-Type: header in MIME specifies a "MIME type" for the data which follows. The MIME type is used to open a suitable application program to display the attached data. Some standard MIME types include:

`text/plain`	ASCII text
`text/html`	HTML text
`image/gif`	GIF image
`video/mpeg`	MPEG video
`application/postscript`	PostScript document
`application/octet-stream`	Arbitrary data
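On the sending side, a mail program has to decide which MIME type to announce for an attachment. One common approach, sketched here with Python's standard `mimetypes` module, is to guess from the file name and fall back to `application/octet-stream` when nothing matches. The function name and file names are invented for the example.

```python
import mimetypes


def content_type_for(filename):
    """Guess a MIME type from a file name, defaulting to arbitrary data."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(content_type_for("picture.gif"))
print(content_type_for("page.html"))
print(content_type_for("blob.zzzunknown"))
```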

For non-ASCII (8 bit) data, common encodings include "quoted-printable" and "Base64".

In Base64 encoding, the message is subdivided into groups of 3 bytes (24 bits) in length. These 24 bits are then subdivided into 4 groups of 6 bits each. Each 6 bit group is represented as one of 64 printable ASCII characters, from the 95 printable characters in ASCII. Finally, each of the printable characters is sent as an 8 bit byte. Thus, 24 bits of data are sent as 32 bits of ASCII data in the encoded message.
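The grouping described above can be worked through for a single 3-byte group. The sketch below is a hand-written illustration (the name `encode_group` is invented here), checked against Python's standard `base64` module; it deliberately ignores input whose length is not a multiple of 3, which real encoders pad with "=" characters.

```python
import base64

# The 64 printable characters used by Base64, in index order 0..63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes):
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")                       # one 24-bit number
    sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]  # four 6-bit groups
    return "".join(ALPHABET[s] for s in sextets)


encoded = encode_group(b"Man")
print(encoded)  # TWFu
```

Three input bytes become four output characters, which is the 24-bit to 32-bit expansion mentioned above.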

The Post Office Protocol (POP)

SMTP is really only useful to deliver mail to multiuser hosts which are permanently available and connected to the network. It is not normally used to deliver mail directly to, for example, a user's PC or Mac desktop system. However, people who use these types of system still want to use email without the complexity of learning host-based mail user agent software (mutt, elm, pine, etc).

The Post Office Protocol (POP3) is designed for this situation: mail is delivered using SMTP to a mailbox on, eg, a Unix host, and the recipient later (at their convenience) downloads the contents of the mailbox to their own system.

A POP client (such as Eudora or Netscape Mail) establishes a TCP connection (on port 110) to a server process on the (eg) Unix system where the mailbox resides. The user is authenticated (username/password), and the contents of her mailbox are downloaded for processing on her PC or Mac.
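The steps a POP client performs can be sketched with Python's standard `poplib` module. The function names and the host name `mail.example.org` are placeholders invented for this example. The download logic takes an already opened connection so that it can be read (and tried out) separately from the network code.

```python
import poplib


def download_mailbox(conn, username, password):
    """Authenticate, fetch every message in the mailbox, then disconnect."""
    conn.user(username)
    conn.pass_(password)
    count, _size = conn.stat()                 # number of messages, total bytes
    messages = []
    for number in range(1, count + 1):         # POP numbers messages from 1
        _response, lines, _octets = conn.retr(number)
        messages.append(b"\r\n".join(lines))   # lines arrive without line endings
    conn.quit()
    return messages


def connect_and_download(host, username, password):
    """Open a TCP connection to the POP server on port 110 and download."""
    conn = poplib.POP3(host, 110)
    return download_mailbox(conn, username, password)


# Example call (placeholder host and credentials, not executed here):
# mail = connect_and_download("mail.example.org", "hjc", "secret")
```

This sketch leaves the messages on the server; a client which wants to empty the mailbox would also mark each message for deletion before quitting.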

POP is universally used where a user has "dial up" Internet access from a commercial Internet Service Provider - the user's mailbox is maintained by the ISP.

This lecture is also available in PostScript format. The tutorial for this lecture is Tutorial #04.

Phil Scott
