
Lecture 7: Applications #3.2: HTTP

HyperText Transfer Protocol, v1.0

The original (0.9) version of HTTP was not in use for very long, being soon replaced by version 1.0. In its most basic form, a v1.0 GET request looks like:

GET /index.html HTTP/1.0<newline><newline>

The response from the server consists of a status line, then a number of plain text headers, followed by a blank line and then the requested data object. It's clearly a very similar format to an RFC822 email message:

GET /index.html HTTP/1.0

    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/3.5.1C
    Date: Tue, 20 Mar 2001 11:48:39 GMT
    Content-type: text/html
    Last-modified: Fri, 16 Mar 2001 02:22:52 GMT
    Content-length: 11378

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <head>
    ........(etc)

A Tour of the HTTP/1.0 Response Headers

HTTP/1.0 200 OK: An ordinary plain text status line -- note the "200-series" status.
Server: Netscape-Enterprise/3.5.1C
Date: Tue, 20 Mar 2001 11:48:39 GMT
Last-modified: Fri, 16 Mar 2001 02:22:52 GMT: Various entertaining bits of information. The "Last-modified:" header can be quite useful, see the HTTP/1.0 "HEAD" request, later.
Content-length: 11378
Content-type: text/html: These two headers follow (approximately) the MIME convention for identifying the size and type of the data contained in the "body" of the response -- in this case, plain text which should be interpreted as HTML by the browser. Note that the MIME email header "Content-Transfer-Encoding:" (used in MIME-encoded email messages) is not normally used in HTTP, because the protocol is designed to handle "8-bit" data. That is, any data at all can be sent after the blank line which signifies the end of the response headers.
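The layout described above (status line, headers, blank line, body) can be pulled apart with a few lines of Python. This is only a sketch: the function name parse_response is made up for illustration, and it assumes lines end in CRLF, which is what travels on the wire where these notes write <newline>.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.0 response into its parts (illustrative helper)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, numeric status code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: Netscape-Enterprise/3.5.1C\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"], len(body))
```

The body is left as raw bytes, since anything at all may follow the blank line.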

More on the `GET` Request

HTTP/1.0 permits the GET request (and other HTTP request types, see later) to additionally send a series of optional Request Headers along with the request. For example, here's a typical request to ironbark, snarfed from the local network:

GET /index.html HTTP/1.0
User-Agent: Mozilla/3.0 (X11; IRIX 5.3 IP12)
Host: ironbark.bendigo.latrobe.edu.au
Accept: image/gif, image/jpeg, */*
Referer: http://ironbark.bendigo.latrobe.edu.au/index.html

The request headers are terminated with a blank line -- hence the need for two newlines, as seen in the first slide of today's lecture. It's also possible for the request to contain a "message body", just like a response message -- we defer discussion of this until later in the unit.
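To make the role of the terminating blank line concrete, here is a minimal sketch of assembling such a request as bytes. The helper name build_get_request is hypothetical, and CRLF line endings are assumed in place of <newline>.

```python
def build_get_request(path: str, headers: dict) -> bytes:
    """Assemble an HTTP/1.0 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # One CRLF ends the last line, a second produces the blank line
    # that tells the server the request headers are finished.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "/index.html",
    {
        "User-Agent": "Mozilla/3.0 (X11; IRIX 5.3 IP12)",
        "Host": "ironbark.bendigo.latrobe.edu.au",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
print(request.decode("ascii"))
```

With an empty header dictionary the result collapses to the basic form, a request line followed by two line endings.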

Perhaps the most interesting optional request header is "If-modified-since:", which takes an HTTP standard GMT time/date string as its value. If the requested page has not, in fact, been modified since the specified time, it won't be returned -- instead, a "304 Not Modified" response is sent. This is called a Conditional-GET and is very useful in support of caching, of which more later.
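The server-side decision behind a Conditional-GET can be sketched as below. This is an illustration rather than how any particular server does it: the function name conditional_get_status is made up, and both dates are assumed to be well-formed GMT strings.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: str,
                           if_modified_since: Optional[str]) -> int:
    """Return the status code a server might choose (illustrative helper)."""
    if if_modified_since is None:
        # An ordinary GET: always send the object.
        return 200
    changed_at = parsedate_to_datetime(last_modified)
    client_copy_at = parsedate_to_datetime(if_modified_since)
    # Not modified since the client's copy was fetched: no body is needed.
    return 304 if changed_at <= client_copy_at else 200


print(conditional_get_status("Fri, 16 Mar 2001 02:22:52 GMT",
                             "Tue, 20 Mar 2001 11:48:39 GMT"))
```

A real server would also have to cope with unparseable dates, typically by ignoring the header and sending the full object.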

Other HTTP/1.0 Request Types

The HTTP/1.0 protocol is formally specified in terms of "methods," rather than simple commands. The available methods are:

GET: We've already seen this "request to read a generalised object". The object can be a Web "page" (HTML document), an image, a sound sample or a wide range of other types.
HEAD: A request to return the response header only, without the content. This can contain much useful information about the requested entity, without the need to actually load it -- eg, how big it is.
POST: Originally defined as a request to "append to a named resource" (eg, a Web page), this method is extensively used in CGI-based systems, see later.
PUT: Request to store a Web page. Has only ever been used experimentally.
DELETE: Delete the Web page specified. I'm unaware of this having ever been used, so we can ignore it.
LINK: Connect two existing resources. Likewise, never used.
UNLINK: Breaks an existing connection between two resources. Not used.
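The difference between GET and HEAD is easy to see in a toy responder. Everything here is hypothetical (the respond function, the in-memory documents dictionary, the fixed text/html type); it only illustrates that HEAD returns the same headers as GET with the body left off.

```python
def respond(method: str, path: str, documents: dict) -> bytes:
    """Answer a GET or HEAD request from an in-memory store (toy example)."""
    if method not in ("GET", "HEAD"):
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    if path not in documents:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    body = documents[path]
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-type: text/html\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # HEAD describes the object (eg, its size) without sending it.
    return head if method == "HEAD" else head + body


documents = {"/index.html": b"<html></html>"}
print(respond("HEAD", "/index.html", documents).decode("ascii"))
```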

HTTP/1.0 Performance Issues

HTTP/1.0 has been criticised for poor performance and lack of scalability. There are several aspects to this:

HTTP/1.0 opens a new TCP connection for every single transaction. For example, if a Web page contains 10 <img ...> images, HTTP/1.0 must open a total of 11 TCP connections -- one for the original page, and 10 for the images. Problems which arise from this include:
- There can be a moderately long overhead in initial connection establishment due to round-trip delays.
- TCP initiates connections using the so-called "slow start" algorithm. This is necessary for proper operation, but is very inefficient for short transfers -- TCP typically takes 10 to 20 KB of transferred data to get "up to full speed".
Both of these can cause HTTP/1.0 browsers to seem really slow.
TCP is required to maintain "state" information about closed connections for 240 seconds, to ensure that stray packets from old connections won't be interpreted as valid data by a later connection. When a server is handling a large number of connections, this can require huge buffer space, and is very inefficient.
HTTP/1.0 has very limited support for caching.
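A rough feel for the slow-start cost comes from a deliberately simplified model: the sender starts with a window of one segment and doubles it every round trip. The 1460-byte segment size and the initial window of one are assumptions, and the model ignores packet loss, delayed acknowledgements, receiver limits and the connection handshake.

```python
def slow_start_round_trips(total_bytes: int,
                           segment_size: int = 1460,
                           initial_window: int = 1) -> int:
    """Count round trips needed to send total_bytes (simplified model)."""
    segments_left = -(-total_bytes // segment_size)  # ceiling division
    window = initial_window
    round_trips = 0
    while segments_left > 0:
        segments_left -= window  # send one window's worth of segments
        window *= 2              # the window doubles each round trip
        round_trips += 1
    return round_trips


# The 11378-byte page from the earlier example, under these assumptions.
print(slow_start_round_trips(11378))
```

Under this model every one of the 11 separate connections pays the ramp-up again from a window of one, which is the inefficiency the list above describes.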

Because of these aspects, HTTP/1.0 is gradually being replaced by HTTP/1.1.

HTTP/1.1 Basics

HTTP/1.1 (RFC 2616) is now in widespread use. It extends the older protocol in a number of areas, notably persistent connections/pipelining and support for caching.

To implement persistent connections, HTTP/1.1 introduced a new request (and also response) header called "Connection:". This can take two values: "close" (which means that this is not a persistent connection) and "keep-alive", which means that the TCP connection is held until either side sends a "Connection: close" header, indicating that it wishes to terminate.

The browser can utilise a persistent connection by sending multiple requests over the connection without stopping and waiting for each of them to be satisfied before sending the next -- the responses are "in the pipeline". Similarly, the server can send its responses one after another. This is possible because each request can be unambiguously identified, as can the responses, using the "Content-length:" headers. The huge wins here, obviously, are that there's no delay opening multiple TCP connections, and the slow-start algorithm has time to get up to full speed.
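To show why "Content-length:" is enough to separate back-to-back responses, here is a sketch that splits a byte stream holding several of them. The function name split_pipelined_responses is hypothetical, and the sketch assumes every response carries a "Content-length:" header and uses CRLF line endings.

```python
def split_pipelined_responses(stream: bytes) -> list:
    """Split concatenated responses into (headers, body) pairs (sketch)."""
    responses = []
    while stream:
        head, separator, rest = stream.partition(b"\r\n\r\n")
        if not separator:
            raise ValueError("incomplete response headers")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip().decode("ascii"))
        if len(rest) < length:
            raise ValueError("incomplete response body")
        # The body is exactly `length` bytes; the next response follows it.
        responses.append((head.decode("iso-8859-1"), rest[:length]))
        stream = rest[length:]
    return responses


stream = (
    b"HTTP/1.1 200 OK\r\nContent-length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-length: 3\r\n\r\nabc"
)
for headers, body in split_pipelined_responses(stream):
    print(headers.split("\r\n")[0], body)
```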

Support for Caching in HTTP/1.1

The World Wide Web has been spectacularly successful -- so successful that a huge proportion of Internet traffic is HTTP, ie Web pages and related objects (eg, images). Caching is a technique whereby copies of popular objects are kept in strategic locations, and supplied in lieu of the originals, saving huge amounts of traffic on the "backbone networks".

The Conditional-GET operation seen earlier allows support for caching at the browser level -- that is, the browser can keep a local copy of an object and check if it's up to date before displaying it.

HTTP/1.1 adds support for proxy server caches. A proxy server is an HTTP server which fetches Web objects (pages, images, etc) on behalf of its clients. In a request sent to a proxy, the requested object is specified as a full URL, so the first line of a GET request now looks like:

GET http://www.bendigo.latrobe.edu.au/index.html HTTP/1.1
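A proxy receiving a request line like the one above has to work out which server to contact and what path to ask it for. A minimal sketch using Python's standard urllib.parse module follows; the function name proxy_target is made up, and port 80 is assumed when the URL gives none.

```python
from urllib.parse import urlsplit


def proxy_target(request_line: str):
    """Extract (host, port, path) from a full-URL request line (sketch)."""
    method, url, version = request_line.split(" ")
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # assumed default port for http
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


print(proxy_target("GET http://www.bendigo.latrobe.edu.au/index.html HTTP/1.1"))
```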

HTTP Proxy Server
[Figure: system diagram of an HTTP proxy server]

Implementing Caching

It's obvious that the proxy server can check its local cache to see if a requested object has recently been fetched. It is slightly more subtle to discover if it's actually the same object. HTTP/1.1 adds some new response headers to ensure that caching works correctly:

Expires: Tue, 20 Mar 2001 02:22:52 GMT
Pragma: no-cache: An object can be marked as having a limited lifetime, and once the specified date/time has passed it must be re-fetched from the originating server. Also, an object can be flagged as "un-cachable". These were both present in HTTP/1.0.
Etag: "8802-2c72-3ab178fc": This is the "Entity Tag", and is used to discover, with somewhat greater certainty, if the object (or entity) in the local cache is exactly the same object (ie, isn't different in any way) as the object stored on the remote server. The client can use an If-None-Match: "8802-2c72-3ab178fc" header with a GET request to specify the version of the object which it already has. This is a significant improvement over the HTTP/1.0 "Conditional-GET". Note that HTTP/1.1 has a large number of other operations which can be used with Entity Tags.
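The two checks described above can be sketched as follows: a cache deciding whether its copy has expired, and a server comparing entity tags. Both function names are hypothetical, and the tag comparison assumes "If-None-Match:" carries a single tag, whereas the real header may also hold a list of tags or "*".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_fresh(expires: str, now: datetime) -> bool:
    """True while the cached copy may be served without re-fetching."""
    return now < parsedate_to_datetime(expires)


def revalidate(current_etag: str, if_none_match: Optional[str]) -> int:
    """Status code a server might return for a GET with If-None-Match."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304  # the client's copy is identical: send no body
    return 200      # no tag supplied, or the object has changed


now = datetime(2001, 3, 19, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh("Tue, 20 Mar 2001 02:22:52 GMT", now))
print(revalidate('"8802-2c72-3ab178fc"', '"8802-2c72-3ab178fc"'))
```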

You can discover lots more about HTTP at: http://www.w3.org/pub/WWW/Protocols/Specs.html

The tutorial for this lecture is Tutorial #07.