Hypertext Transfer Protocol (HTTP)
In Lecture #2, the World Wide Web was used to
illustrate the idea of a layered communications architecture. That
lecture introduced the basic ideas of the original version (0.9, circa 1992) of HTTP.
To revise: in HTTP 0.9 the GET operation was used to obtain
HTML "pages" from a server, eg:
GET /Index.html
<HTML>
<HEAD>
<TITLE>La Trobe University</TITLE>
</HEAD>
<BODY>
<H1>La Trobe University</H1>
..........etc
HTTP 0.9 actually defined several other operations. However, since HTTP
1.0 (RFC 1945) is now commonly used, we shall defer
discussion of them.
HTTP 1.0
HTTP 1.0 uses, like its predecessor and like many other Internet
application protocols, plain ASCII printable text for all protocol messages.
The most basic GET operation looks like:
GET /Index.html HTTP/1.0<newline><newline>
The protocol specifier included with the command indicates to the server
process that the client (browser) understands HTTP 1.0.
The response from the server is a MIME-compatible document
(see later):
GET /Index.html HTTP/1.0
HTTP/1.0 200 OK
Server: Netscape-Communications/1.1
Date: Monday, 24-Mar-97 00:08:16 GMT
Last-modified: Tuesday, 18-Mar-97 13:44:09 GMT
Content-length: 6779
Content-type: text/html
<HTML>
<!--for more information, contact Webmaster@latrobe.edu.au-->
<HEAD>
........(etc)
HTTP 1.0 Methods
The HTTP 1.0 protocol is specified in terms of methods, rather
than simple commands. The available methods are:
- GET: request to read a generalised object. The object can
be a Web "page" (HTML document), an image, a sound sample or a wide range of other
types.
- HEAD: request to read the header of a Web page. This can contain
much useful information about the content of the page, without the need to
load the entire page.
- PUT: request to store a Web page.
- POST: append to a named resource (eg, a Web page).
- DELETE: delete the Web page specified.
- LINK: connect two existing resources.
- UNLINK: break an existing connection between two resources.
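The difference between GET and HEAD can be seen with a small experiment. The following is a minimal sketch only, using Python's standard http.server and http.client modules against a throwaway server on the local machine. The names PAGE, DemoHandler, fetch and demo are invented for this illustration, and note that http.client itself sends HTTP/1.1 requests even though the demo server answers with HTTP/1.0.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content, used only for this demonstration.
PAGE = b"<HTML><HEAD><TITLE>Demo</TITLE></HEAD><BODY>Hello</BODY></HTML>"


class DemoHandler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler replies with HTTP/1.0 unless told otherwise.

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)      # headers followed by the object itself

    def do_HEAD(self):
        self._send_headers()        # headers only, no body

    def log_message(self, *args):
        pass                        # keep the demonstration quiet


def fetch(method, port, path="/index.html"):
    """Issue one request and return (status, headers, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path)
    resp = conn.getresponse()
    body = resp.read()
    headers = dict(resp.getheaders())
    conn.close()
    return resp.status, headers, body


def demo():
    """Start a local server, then fetch the same page with HEAD and GET."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns two (status, headers, body) tuples. Both carry the same Content-Length header, but only the GET result has a non-empty body.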
You can discover lots more about HTTP 1.0 at:
http://www.w3.org/pub/WWW/Protocols/Specs.html
More on the HTTP 1.0 GET Method
GET is the primary workhorse method of HTTP 1.0. It extends the
earlier HTTP 0.9 command by adding extra information about the browser
type and environment. For example, the following is an actual page request (slightly
edited) captured from the local LAN:
GET /index.html HTTP/1.0
User-Agent: Mozilla/3.0 (X11; IRIX 5.3 IP6)
Host: ironbark.bendigo.latrobe.edu.au
Accept: image/gif, image/jpeg, */*
Note that these GET request headers are in the same format as RFC 822 headers. Many
other request headers are permissible in HTTP 1.0. Some examples include:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
From: webmaster@w3.org
Referer: http://www.latrobe.edu.au/Index.html
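To make the request format concrete, here is a minimal sketch in Python of how a client might assemble such a request. The function name build_get_request is invented for this illustration. The sketch assumes the CRLF line terminator that the HTTP specification uses on the wire.

```python
def build_get_request(path, headers=None, version="HTTP/1.0"):
    """Return the bytes of a GET request: request line, headers, empty line."""
    lines = [f"GET {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")   # RFC 822 style "Name: value"
    # Every line ends in CRLF; one extra CRLF (an empty line) ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# The captured request shown earlier, rebuilt from its parts.
request = build_get_request(
    "/index.html",
    {
        "User-Agent": "Mozilla/3.0 (X11; IRIX 5.3 IP6)",
        "Host": "ironbark.bendigo.latrobe.edu.au",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
print(request.decode("ascii"))
```

With no headers at all, the function produces the most basic form of the request: the request line followed by an empty line.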
MIME Encoding for the Web
Documents returned by an HTTP 1.0 GET request method are MIME
compatible.
An example was given on an earlier slide.
Responses start with a status line indicating which version of HTTP the
server is running, together with a result code and an optional message.
This is followed by a series of optional object headers; the most
important of these are "Content-Type", which describes the type of the
object being returned, and "Content-Length", which indicates its
length in bytes. The headers are terminated by an empty line.
The server now sends any requested data. After the data have been sent,
the server drops the connection.
Note that the MIME header "Content-Transfer-Encoding" is not needed in HTTP
because the protocol is designed to handle "8-bit" data. That is, any data
at all can be sent after the blank line which signifies the end of the
response headers.
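The response layout described above (status line, headers, empty line, then arbitrary data) can be pulled apart in a few lines. The following Python sketch is illustrative only: the function name parse_response is invented here, and it assumes the complete response is already in memory and uses CRLF line endings.

```python
def parse_response(raw: bytes):
    """Split a raw response into (version, code, message, headers, body)."""
    # The first empty line separates the headers from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP version, result code, optional message.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    message = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The body is left as bytes: it may hold any 8-bit data at all.
    return version, code, message, headers, body


example = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: application/octet-stream\r\n"
    b"Content-length: 4\r\n"
    b"\r\n"
    b"\x00\xff\x80\x7f"
)
version, code, message, headers, body = parse_response(example)
```

Header names are folded to lower case because they are case-insensitive. The body is never decoded, which is exactly why no transfer encoding is needed.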
HTTP 1.0 Performance Issues
HTTP 1.0 has been criticised for poor performance and lack of scalability. There are several
aspects to this:
- HTTP 1.0 opens a new TCP connection for every single transaction. For
example, if a Web page contains 10 <img ...> images, HTTP 1.0 must open a
total of 11 TCP connections - one for the original page, and 10 for the
images. Problems which arise from this include:
- There can be a moderately long overhead in initial connection
establishment due to round-trip delays.
- TCP initiates connections using the so-called "slow start"
algorithm. This is necessary for proper operation, but is very
inefficient for short transfers.
- TCP is required to maintain "state" information about closed
connections for 240 seconds, to ensure that stray packets from old
connections won't be interpreted by a later connection. When a server is
handling a large number of connections, this can require huge buffer
space, and is very inefficient.
Because of these aspects, HTTP 1.0 is gradually being replaced by
HTTP 1.1.
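A rough calculation shows the scale of the problem. The Python sketch below is a back-of-envelope model only. The function names are invented for this illustration. It assumes that connections are opened one after another (so each one pays at least one round trip to set up) and that the server is the side that keeps the closed-connection state, since the server is the one that drops the connection. Only the 240 second figure and the 10-image page come from the text above.

```python
TIME_WAIT_SECONDS = 240   # retention period for closed connections, as quoted above


def connections_per_page(images):
    """One connection for the page itself plus one for each image."""
    return 1 + images


def setup_overhead_ms(images, round_trip_ms):
    """Connection set-up delay, assuming connections are opened sequentially."""
    return connections_per_page(images) * round_trip_ms


def closed_connection_records(pages_per_second, images_per_page):
    """Closed connections the server must still remember at a steady load."""
    rate = pages_per_second * connections_per_page(images_per_page)
    return rate * TIME_WAIT_SECONDS


print(connections_per_page(10))
print(setup_overhead_ms(10, 100))
print(closed_connection_records(10, 10))
```

With the assumed figures (a 100 ms round trip and 10 page requests per second), the 10-image page costs 11 connections and 1100 ms of set-up delay, and the server is left holding state for 26400 closed connections.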
HTTP/1.1
HTTP 1.1 is now in widespread use. Some of its enhanced features include:
- Persistent connections are now supported. A client can open a
single TCP connection, and can pipeline HTTP requests: that is, it can
send several requests before earlier requests have been satisfied.
This gives a huge boost to performance.
- Where proxy servers are in use (with or without the use of
proxy caches), HTTP/1.1 methods will normally send the full URL
of the desired document in the request. For example:
GET http://www.latrobe.edu.au/index.html HTTP/1.1
- The protocol has extensive support for control of caching.
Current practice is that documents can be cached (ie, a copy is
kept in case another client requests the same document within a
short time) at a proxy server. With large numbers of copies of
documents, it is imperative that mechanisms exist to ensure that
clients do not download outdated information.
- Several new methods are defined, and issues of encoding, authentication
and lots more are addressed. Lastly, it's even backward compatible
with HTTP 0.9 and HTTP 1.0.
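Persistent connections can be observed directly. The following Python sketch is an illustration under stated assumptions, not a reference implementation: it starts a throwaway HTTP 1.1 server on the local machine, and the names KeepAliveHandler and client_ports are invented here. Each reply carries the client's TCP source port, so if every reply reports the same port, all requests travelled over one connection. Note that http.client sends its requests one at a time, so this shows connection reuse only, not pipelining.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # makes the server keep connections open

    def do_GET(self):
        # Reply with the client's TCP source port so reuse is visible.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this reply ends,
        # since the connection is no longer closed after each object.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                        # keep the demonstration quiet


def client_ports(paths):
    """Fetch every path over ONE connection; return the port seen each time."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            ports.append(conn.getresponse().read())
        return ports
    finally:
        conn.close()                # closing lets the server finish the session
        server.shutdown()
        server.server_close()
```

Calling client_ports with a page and its images, for example ["/index.html", "/a.gif", "/b.gif"], returns the same port number three times. Under HTTP 1.0 each of these objects would have needed a connection of its own.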
Check out this Web page for some useful information on HTTP.
This lecture is also available in PostScript format.
The tutorial for this lecture is Tutorial #06.
Phil Scott