Hypertext Transfer Protocol (HTTP)
In Lecture #2, the World Wide Web was used to illustrate the idea of a
layered communications architecture. In that lecture, the basic ideas of
the original version (0.9, circa 1992) of HTTP were introduced.
To revise, in HTTP 0.9 the GET operation was used to obtain HTML "pages"
(documents) from a server, e.g.:
GET /Index.html
<HTML>
<HEAD>
<TITLE>La Trobe University</TITLE>
</HEAD>
<BODY>
<H1>La Trobe University</H1>
..........etc
HTTP 0.9 actually defined several other operations. However, since HTTP
1.0 (RFC 1945) is now commonly used, we shall defer discussion of them.
HTTP 1.0
HTTP 1.0 uses, like its predecessor and like many other Internet
application protocols, plain ASCII printable text for all protocol
messages.
The basic GET operation looks like:
GET /Index.html HTTP/1.0<newline><newline>
The protocol specifier included with the command indicates to the server
process that the client (browser) understands HTTP 1.0.
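As a rough illustration, the request above can be assembled and sent over a
plain TCP socket. This is a minimal sketch in Python, not part of the
original lecture material: the function names build_get_request and fetch
are invented for this example, and it assumes each <newline> is sent as the
CR LF pair that Internet text protocols normally use.

```python
import socket


def build_get_request(path: str) -> bytes:
    """Return an HTTP 1.0 GET request: the request line, then an empty line."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server has closed
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.latrobe.edu.au", "/Index.html") would return the raw
response bytes, status line and headers included, provided that host still
answers plain HTTP on port 80.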
The response from the server is a MIME-compatible document
(see later):
GET /Index.html HTTP/1.0

HTTP/1.0 200 OK
Server: Netscape-Communications/1.1
Date: Monday, 24-Mar-97 00:08:16 GMT
Last-modified: Tuesday, 18-Mar-97 13:44:09 GMT
Content-length: 6779
Content-type: text/html

<HTML>
<!--for more information, contact Webmaster@latrobe.edu.au-->
<HEAD>
........(etc)
HTTP 1.0 Methods
The HTTP 1.0 protocol is specified in terms of methods, rather than simple
commands. The available methods are:
- GET: request to read a generalised object. The object can be a Web
  "page" (HTML document), an image, a sound sample or a wide range of
  other types.
- HEAD: request to read the header of a Web page. This can contain much
  useful information about the content of the page, without the need to
  load the entire page.
- PUT: request to store a Web page.
- POST: append to a named resource (e.g., a Web page).
- DELETE: delete the Web page specified.
- LINK: connect two existing resources.
- UNLINK: break an existing connection between two resources.
You can discover lots more about HTTP 1.0 at:
http://www.w3.org/pub/WWW/Protocols/Specs.html
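To make the differences between the methods listed above more concrete,
the following toy model applies requests to an in-memory dictionary of
resources. It is only a sketch under simplifying assumptions: the handle
function and the store dictionary are invented for this example, the
status codes chosen are one plausible mapping rather than required
behaviour, and LINK and UNLINK are simply reported as not implemented.

```python
def handle(store: dict, method: str, path: str, body: bytes = b""):
    """Apply one request to an in-memory store; return (status, body)."""
    if method == "GET":
        return (200, store[path]) if path in store else (404, b"")
    if method == "HEAD":
        # Same status as GET, but the object itself is never returned.
        return (200, b"") if path in store else (404, b"")
    if method == "PUT":
        status = 200 if path in store else 201  # 201 = Created
        store[path] = body
        return (status, b"")
    if method == "POST":
        if path not in store:
            return (404, b"")
        store[path] += body  # append to the named resource
        return (200, b"")
    if method == "DELETE":
        if path not in store:
            return (404, b"")
        del store[path]
        return (204, b"")  # 204 = No Content
    return (501, b"")  # LINK, UNLINK and anything else: Not Implemented


store = {}
handle(store, "PUT", "/a.html", b"<p>hi")
handle(store, "POST", "/a.html", b"</p>")
```

After these two calls, a GET for /a.html returns the stored bytes with
the appended text, while a HEAD for the same path returns the same status
with an empty body.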
More on the HTTP 1.0 GET Method
GET is the primary workhorse method of HTTP 1.0. It extends the
earlier HTTP 0.9 command by adding extra information about the browser
type and environment. For example, the following is an actual page request
captured from the local LAN:
GET /Index.html HTTP/1.0
Host: www.latrobe.edu.au
Accept_Language: en-AU
User-Agent: Cyberdog/1.2
Accept: */*
Accept: image/gif
Accept: image/jpeg
Note that these are in the same format as RFC 822 headers. There are many
other GET request headers permissible in HTTP 1.0. For example:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
From: webmaster@w3.org
Referer: http://www.latrobe.edu.au/Index.html
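A request carrying headers such as these can be put together with a few
lines of Python. This is a hedged sketch: build_request is a helper
invented for this example, the header values are copied from the lines
above, and the timestamp is simply the Unix time corresponding to the
date shown. The standard library function email.utils.formatdate is used
to produce the RFC 822 style date.

```python
from email.utils import formatdate


def build_request(path, headers):
    """Build an HTTP 1.0 GET request with RFC 822 style header lines."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # An empty line marks the end of the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# 783459811 seconds after the Unix epoch is 29 Oct 1994 19:43:31 GMT.
since = formatdate(783459811, usegmt=True)

request = build_request("/Index.html", [
    ("If-Modified-Since", since),
    ("From", "webmaster@w3.org"),
    ("Referer", "http://www.latrobe.edu.au/Index.html"),
])
```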
MIME Encoding for the Web
Documents returned by an HTTP 1.0 GET request method are MIME compatible.
An example was given on an earlier slide.
Responses start with a status line indicating which version of HTTP the
server is running, together with a result code and an optional message.
This is followed by a series of optional object headers; the most
important of these are "Content-Type", which describes the type of the
object being returned, and "Content-Length", which indicates the
length. The headers are terminated by an empty line.
The server now sends any requested data. After the data have been sent,
the server drops the connection.
Note that the MIME header "Content-Transfer-Encoding" is not needed in
HTTP because the protocol is designed to handle "8-bit" data. That is, any
data at all can be sent after the blank line which signifies the end of
the response headers.
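The response layout just described (status line, headers, empty line,
data) can be taken apart as follows. This is a minimal sketch rather than
a robust parser: parse_response is a name invented for this example, it
assumes CR LF line endings, and it ignores complications such as repeated
or folded headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP 1.0 response into its parts.

    Returns (version, code, message, headers, body).
    """
    # The first empty line separates the headers from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("latin-1").split("\r\n")

    # Status line: version, result code, optional message.
    version, code, message = (status_line.split(" ", 2) + [""])[:3]

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), message, headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-length: 6\r\n"
       b"Content-type: text/html\r\n"
       b"\r\n"
       b"<HTML>")
version, code, message, headers, body = parse_response(raw)
```

Because the body is kept as bytes and never decoded, it can carry
arbitrary 8-bit data, which is the property the note above relies on.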
HTML
Although it is not "core" knowledge in this unit, we should digress to
discuss HTML.
HTML is a markup language. This means that the structure of the document
(or Web page) is (or should be) described using embedded formatting codes.
In HTML, these codes are delimited by the characters "<" and ">". If
either of these characters must appear as part of the actual data, it is
written as &lt; or &gt; respectively.
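When pages are generated by a program, this substitution is usually left
to a library routine. As a small illustration (not part of the original
notes), Python's standard html module does it as shown below; the sample
text is made up for the example.

```python
import html

text = "if a < b and b > c"

# quote=False leaves quotation marks alone; "&" itself would become "&amp;".
escaped = html.escape(text, quote=False)
print(escaped)  # prints: if a &lt; b and b &gt; c

# html.unescape reverses the substitution.
restored = html.unescape(escaped)
```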
It is possible to create perfectly useful Web documents using only a small
fraction of the possible HTML codes. For example, your lecture notes and
tutes in this unit are written using (for the most part) just the
following markups:
- <h1>Heading text</h1>: for page and section headings
- <p> and <br>: for text spacing
- <em>text</em>: for emphasis
- <kbd>text</kbd>: to indicate typewritten text
- <hr>: to insert a horizontal line
- <ul>, <ol> and <dl>: markups for lists
- <img ...>: to include in-line images
- <a ...>: to implement hyperlinks
HTTP 1.0 Performance Issues
HTTP 1.0 has been criticised for poor performance and lack of
scalability. There are several aspects to this:
- HTTP 1.0 opens a new TCP connection for every single transaction. For
  example, if a Web page contains 10 <img ...> images, HTTP 1.0 must open
  a total of 11 TCP connections: one for the original page, and 10 for the
  images. Problems which arise from this include:
  - There can be a moderately long overhead in initial connection
    establishment due to round-trip delays.
  - TCP initiates connections using the so-called "slow start"
    algorithm. This is necessary for proper operation, but is very
    inefficient for short transfers.
  - TCP is required to maintain "state" information about closed
    connections for 240 seconds, to ensure that stray packets from old
    connections won't be interpreted by a later connection. When a server
    is handling a large number of connections, this can require huge
    buffer space, and is very inefficient.
Because of these aspects, HTTP 1.0 is gradually being replaced by
HTTP 1.1.
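A back-of-envelope calculation gives a feel for the size of these costs.
The figures below are assumptions chosen purely for illustration (a 100 ms
round-trip time, connections opened one after another, and a server
closing 100 connections per second); only the 11 connections and the 240
second holding time come from the discussion above. The function names
are likewise invented for this sketch.

```python
def handshake_overhead(objects: int, rtt_seconds: float) -> float:
    """Time spent purely on TCP connection set-up when every object needs
    its own connection and each set-up costs one round trip."""
    return objects * rtt_seconds


def held_connections(closed_per_second: float, hold_seconds: float = 240.0) -> float:
    """Steady-state number of closed connections whose state is still held:
    arrival rate multiplied by holding time."""
    return closed_per_second * hold_seconds


overhead = handshake_overhead(11, 0.1)  # 1 page + 10 images, assumed 100 ms RTT
entries = held_connections(100)         # assumed 100 connections closed per second

print(f"set-up overhead: about {overhead:.1f} s")
print(f"closed connections still tracked: {entries:.0f}")
```

Under these assumptions roughly a second is spent on connection set-up
before slow start is even considered, and the server is tracking state
for some 24000 connections that are no longer carrying data.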
HTTP/1.1
HTTP 1.1 is now in widespread use. Some of its enhanced features include:
This lecture is also available in PostScript
format.
The tutorial for this lecture is Tutorial #05.
Phil Scott