Hypertext Transfer Protocol (HTTP)

In Lecture #2, the World Wide Web was used to illustrate the idea of a layered communications architecture. In that lecture, the basic ideas of the original version (0.9, circa 1992) of HTTP were introduced.

To revise, in HTTP 0.9 the GET operation was used to obtain HTML "pages" (documents) from a server, eg:

GET /Index.html
    <HTML>
    <HEAD>
    <TITLE>La Trobe University</TITLE>
    </HEAD>
    <BODY>
    <H1>La Trobe University</H1>
    ..........etc

HTTP 0.9 actually defined several other operations. However, since HTTP 1.0 (RFC 1945) is now commonly used, we shall defer discussion of them.

HTTP 1.0

HTTP 1.0 uses, like its predecessor and like many other Internet application protocols, plain ASCII printable text for all protocol messages. The basic GET operation looks like:

GET /Index.html HTTP/1.0<newline><newline>

The protocol specifier included with the command indicates to the server process that the client (browser) understands HTTP 1.0. The response from the server is a MIME-compatible document (see later):

GET /Index.html HTTP/1.0

    HTTP/1.0 200 OK
    Server: Netscape-Communications/1.1
    Date: Monday, 24-Mar-97 00:08:16 GMT
    Last-modified: Tuesday, 18-Mar-97 13:44:09 GMT
    Content-length: 6779
    Content-type: text/html

    <HTML>
    <!--for more information, contact Webmaster@latrobe.edu.au-->
    <HEAD>
    ........(etc)
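
As a minimal sketch, the request shown above can be assembled as raw bytes. The function name build_get_request is purely illustrative. Note that where these notes write <newline>, RFC 1945 specifies the two-character sequence CRLF as the line terminator.

```python
def build_get_request(path: str) -> bytes:
    """Return a minimal HTTP/1.0 GET request for `path` as raw bytes."""
    # The request line is followed by an empty line, which tells the
    # server that the request is complete.
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


request = build_get_request("/Index.html")
print(request)
```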

HTTP 1.0 Methods

The HTTP 1.0 protocol is specified in terms of methods, rather than simple commands. The available methods are:

GET: request to read a generalised object. The object can be a Web "page" (HTML document), an image, a sound sample or a wide range of other types.
HEAD: request to read the header of a Web page. This can contain much useful information about the content of the page, without the need to load the entire page.
PUT: request to store a Web page.
POST: append to a named resource (eg, a Web page).
DELETE: delete the specified Web page.
LINK: connect two existing resources.
UNLINK: break an existing connection between two resources.

You can discover lots more about HTTP 1.0 at: http://www.w3.org/pub/WWW/Protocols/Specs.html

More on the HTTP 1.0 GET Method

GET is the primary workhorse method of HTTP 1.0. It extends the earlier HTTP 0.9 command by adding extra information about the browser type and environment. For example, the following is an actual page request captured from the local LAN:

GET /Index.html HTTP/1.0
Host: www.latrobe.edu.au
Accept_Language: en-AU
User-Agent: Cyberdog/1.2
Accept: */*
Accept: image/gif
Accept: image/jpeg

Note that these are in the same format as RFC 822 headers. Many other request headers are permissible with the GET method in HTTP 1.0. For example:

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
From: webmaster@w3.org
Referer: http://www.latrobe.edu.au/Index.html
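
The following sketch shows one way a client might attach such headers to a GET request, here a conditional GET using If-Modified-Since. The function name conditional_get and its parameters are illustrative assumptions, not part of any standard API; the date is formatted with the standard library's email.utils.formatdate.

```python
import calendar
from email.utils import formatdate


def conditional_get(path, host, since_utc):
    """Build an HTTP/1.0 GET that asks for `path` only if it has changed.

    `since_utc` is a (year, month, day, hour, minute, second) tuple in UTC.
    """
    timestamp = calendar.timegm(since_utc)
    headers = [
        ("Host", host),
        # usegmt=True gives the "Sat, 29 Oct 1994 19:43:31 GMT" style date.
        ("If-Modified-Since", formatdate(timestamp, usegmt=True)),
    ]
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Each line ends with CRLF; an empty line terminates the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


req = conditional_get("/Index.html", "www.latrobe.edu.au",
                      (1994, 10, 29, 19, 43, 31))
print(req.decode("ascii"))
```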

MIME Encoding for the Web

Documents returned by an HTTP 1.0 GET request method are MIME compatible. An example was given on an earlier slide.

Responses start with a status line indicating which version of HTTP the server is running, together with a result code and an optional message.

This is followed by a series of optional object headers; the most important of these are "Content-Type", which describes the type of the object being returned, and "Content-Length", which indicates the length. The headers are terminated by an empty line.

The server now sends any requested data. After the data have been sent, the server drops the connection.

Note that the MIME header "Content-Transfer-Encoding" is not needed in HTTP because the protocol is designed to handle "8-bit" data. That is, any data at all can be sent after the blank line which signifies the end of the response headers.
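
The response layout described above (status line, headers, empty line, data) can be taken apart as in the following sketch. The function name parse_response and the sample response are illustrative assumptions; a real client would also have to cope with malformed input.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.0 response into its parts."""
    # Everything before the first empty line is the status line and headers;
    # everything after it is the object data, which is left as bytes.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, result code, optional message.
    version, code, *message = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return version, int(code), (message[0] if message else ""), headers, body


sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-type: text/html\r\n"
          b"Content-length: 13\r\n"
          b"\r\n"
          b"<HTML></HTML>")
version, code, message, headers, body = parse_response(sample)
print(version, code, message, headers["content-type"], len(body))
```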

HTML

Although it is not "core" knowledge in this unit, we should digress to discuss HTML.

HTML is a markup language. This means that the structure of the document (or Web page) is (or should be) described using embedded formatting codes. In HTML, these codes are delimited by the characters "<" and ">". If either of these characters must appear as part of the actual data, they are written as &lt; and &gt; respectively. It is possible to create perfectly useful Web documents using only a small fraction of the possible HTML codes. For example, your lecture notes and tutes in this unit are written using (for the most part) just the following markups:

<h1>Heading text</h1>: for page and section headings
<p> and <br>: for text spacing.
<em>text</em>: for emphasis
<kbd>text</kbd>: to indicate typewritten text
<hr>: to insert a horizontal line
<ul>, <ol> and <dl>: markups for lists
<img ...>: to include in-line images
<a ...>: to implement hyperlinks.
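
As a small illustration of writing "<" and ">" as data rather than as markup delimiters, the following sketch uses Python's standard html.escape function, which replaces them with &lt; and &gt;. The sample text is made up for the example.

```python
import html

text = "if a < b then b > a"
# quote=False leaves quotation marks alone; "<", ">" and "&" are escaped.
escaped = html.escape(text, quote=False)
print(escaped)  # if a &lt; b then b &gt; a
```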

HTTP 1.0 Performance Issues

HTTP 1.0 has been criticised for poor performance and lack of scalability. There are several aspects to this:

HTTP 1.0 opens a new TCP connection for every single transaction. For example, if a Web page contains 10 <img ...> images, HTTP 1.0 must open a total of 11 TCP connections - one for the original page, and 10 for the images. Problems which arise from this include:
- There can be a moderately long overhead in initial connection establishment due to round-trip delays.
- TCP initiates connections using the so-called "slow start" algorithm. This is necessary for proper operation, but is very inefficient for short transfers.
- TCP is required to maintain "state" information about closed connections for 240 seconds, to ensure that stray packets from old connections won't be interpreted by a later connection. When a server is handling a large number of connections, this can require huge buffer space, and is very inefficient.
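
To get a feel for the numbers, the following back-of-envelope sketch counts the connections per page and estimates how many closed connections a server must remember at any moment. The load of 10 page requests per second is an assumed figure chosen for illustration only; the 240 seconds comes from the text above.

```python
TIME_WAIT_SECONDS = 240  # how long TCP remembers a closed connection


def connections_per_page(inline_images: int) -> int:
    """One connection for the page itself plus one per inline image."""
    return 1 + inline_images


def remembered_connections(pages_per_second: float, inline_images: int) -> float:
    """Steady-state count of closed connections still held in memory."""
    rate = pages_per_second * connections_per_page(inline_images)
    return rate * TIME_WAIT_SECONDS


print(connections_per_page(10))        # 11
print(remembered_connections(10, 10))  # 26400
```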

Because of these aspects, HTTP 1.0 is gradually being replaced by HTTP 1.1.

HTTP 1.1

HTTP 1.1 is now in widespread use. Some of its enhanced features include:

Persistent connections are now supported. A client can open a single TCP connection and can pipeline HTTP requests; that is, it can send several requests before earlier requests have been satisfied. This gives a huge boost to performance.
Where proxy servers are in use (with or without the use of proxy caches), HTTP/1.1 methods will normally send the full URL of the desired document in the request. For example:
```
GET http://www.latrobe.edu.au/index.html HTTP/1.1
```
The protocol has extensive support for control of caching. Current practice is that documents can be cached (ie, a copy is kept in case another client requests the same document within a short time) at a proxy server. With large numbers of copies of documents, it is imperative that mechanisms exist to ensure that clients do not download outdated information.
Several new methods are defined, and issues of encoding, authentication and much more are addressed. Lastly, it is even backward compatible with HTTP 0.9 and HTTP 1.0.
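
The idea of pipelining can be sketched as follows: several requests are written back to back, ready to be sent on one TCP connection without waiting for the responses. The function name pipeline_requests and the image paths are hypothetical. HTTP 1.1 requires a Host header on every request.

```python
def pipeline_requests(host, paths):
    """Concatenate several HTTP/1.1 GET requests for one connection."""
    parts = []
    for path in paths:
        # Each request is complete in itself and ends with an empty line,
        # so the server can tell where one stops and the next begins.
        parts.append(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n")
    return "".join(parts).encode("ascii")


# Hypothetical page with two inline images.
data = pipeline_requests("www.latrobe.edu.au",
                         ["/index.html", "/logo.gif", "/photo.jpg"])
print(data.decode("ascii"))
```

The server is expected to return the responses in the same order as the requests were sent.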

This lecture is also available in PostScript format. The tutorial for this lecture is Tutorial #05.

Phil Scott