apache -- this is what we run on ironbark and redgum; however, there are several other popular server packages, especially those from Microsoft.
http://ironbark.bendigo.latrobe.edu.au/index.html

This specifies the application protocol (HTTP) used to fetch the object, the domain name where it is located and the local filename of the object on that host (/index.html). The "magic" string :// doesn't mean anything in particular except to signify that it's a URL...
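The three parts of the URL described above can be pulled apart mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the helper name url_parts is an assumption made for illustration only.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol, host, path) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # scheme   -> the application protocol used to fetch the object
    # hostname -> the domain name where the object is located
    # path     -> the local filename of the object on that host
    return parts.scheme, parts.hostname, parts.path


protocol, host, path = url_parts("http://ironbark.bendigo.latrobe.edu.au/index.html")
print(protocol)  # http
print(host)      # ironbark.bendigo.latrobe.edu.au
print(path)      # /index.html
```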
HTML is an example of a markup language -- documents are (in general) plain ASCII text files, with certain characters/symbols reserved to denote markup. Such languages have a long and venerable history in computing, eg starting with *roff, TeX, LaTeX and SGML, and subsequently XML.
In HTML, markup is delimited by "<" and ">" -- the "less than" and "greater than" characters, often (rather clumsily IMHO) called "angle brackets". If either of these characters must appear as part of the actual data, they are written as &lt; and &gt; respectively.
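As a small illustration of this escaping rule, Python's standard html module performs the substitution in both directions. This is only a sketch; the function name to_html_text is made up for the example.

```python
from html import escape, unescape


def to_html_text(data):
    """Escape the reserved markup characters in plain data (hypothetical helper)."""
    # quote=False leaves quotation marks alone; & is also escaped, to &amp;
    return escape(data, quote=False)


encoded = to_html_text("if a < b and b > c")
print(encoded)            # if a &lt; b and b &gt; c
print(unescape(encoded))  # if a < b and b > c
```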
<A HREF="...some URL...">link text</A>
structure. This was (and is) revolutionary!
<TABLE>
markup, style sheets, client-side scripting, etc), seamlessly
mingling text and graphics into what has become an entirely new
form of media.
If you're interested to see some very simple hand-crafted HTML, have a look at the document source for these lecture notes...
To revise, in HTTP/0.9 the GET operation was used to obtain HTML "pages" from a server, eg: the "home page" of ironbark at URL http://ironbark.bendigo.latrobe.edu.au/index.html

We first establish a reliable (TCP) connection to the server process waiting at port 80 (HTTP) on ironbark.bendigo.latrobe.edu.au. We then send the single line request (the GET line below) and receive in response the HTML text that follows it:
    GET /index.html
    <HTML>
    <HEAD>
    <TITLE>The Department of Information Technology at La Trobe University, Bendigo</TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <!-- ******** Department Header ***************-->
    <IMG SRC="/gifs/irbkname.short.gif" align="right" ALT="La Trobe University, Bendigo">
    <font size="+2">La Trobe University, Bendigo</font>
    ..........etc

HTTP/0.9 actually defined a few other operations besides GET. However, since HTTP/1.0 (RFC 1945) and HTTP/1.1 are now commonly used, we shall defer discussion of them.
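The HTTP/0.9 exchange described above (connect over TCP to port 80, send one request line, read until the server closes the connection) might be sketched as follows. The function names are hypothetical, the CRLF line terminator is an assumption, and many current servers no longer answer requests of this very old form, so treat this as an illustration rather than a working client.

```python
import socket


def build_simple_request(path):
    """Build the single-line request: method, path, end of line, nothing else."""
    return f"GET {path}\r\n".encode("ascii")


def fetch_simple(host, path, port=80, timeout=10.0):
    """Send the one-line request and collect whatever the server returns."""
    chunks = []
    # create_connection() sets up the reliable (TCP) connection to the server
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_simple_request(path))
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_simple_request("/index.html"))  # b'GET /index.html\r\n'
```

A call such as fetch_simple("ironbark.bendigo.latrobe.edu.au", "/index.html") would then return the raw bytes of the page, assuming the host is reachable and still accepts this style of request.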
An HTTP/1.0 GET request looks like:

    GET /index.html HTTP/1.0<newline><newline>

The response from the server consists of a status line, then a number of plain text headers, followed by a blank line and then the requested data object. It's clearly a very similar format to an RFC822 email message:

    GET /index.html HTTP/1.0

    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/3.5.1C
    Date: Sun, 16 Mar 2004 11:48:39 GMT
    Content-type: text/html
    Last-modified: Fri, 14 Mar 2004 02:22:52 GMT
    Content-length: 11378

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <head>
    ........(etc)
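Because the headers follow the RFC822 layout, an email parser can read them once the status line has been removed. Here is a minimal sketch using Python's standard email.parser module; parse_response_head is a hypothetical name, and the sample text is an abbreviated copy of the response shown above.

```python
from email.parser import Parser


def parse_response_head(head):
    """Split a response head (status line plus headers) into its parts."""
    status_line, _, header_block = head.partition("\r\n")
    # The status line has three fields: protocol version, numeric code, reason
    version, code, reason = status_line.split(" ", 2)
    # The remaining lines are "Name: value" pairs, exactly as in an email message
    headers = Parser().parsestr(header_block, headersonly=True)
    return version, int(code), reason, headers


head = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Netscape-Enterprise/3.5.1C\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 11378\r\n"
)
version, code, reason, headers = parse_response_head(head)
print(version, code, reason)      # HTTP/1.0 200 OK
print(headers["content-length"])  # 11378 (header names are case-insensitive)
```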
HTTP/1.0 200 OK
Server: Netscape-Enterprise/3.5.1C
Date: Sun, 16 Mar 2004 11:48:39 GMT
Last-modified: Fri, 14 Mar 2004 02:22:52 GMT
The "Last-modified:" header is very useful, see the HTTP/1.0 "Conditional-GET" and "HEAD" request types.
Content-length: 11378
Content-type: text/html
The "Content-Transfer-Encoding:" header (used in MIME-encoded email messages) is not normally used in HTTP because the protocol is designed to handle "8-bit" data. That is, any data at all can be sent after the blank line which signifies the end of the response headers.
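To illustrate the "8-bit" point, the sketch below splits a raw response at the blank line and checks the body against the Content-length header, using a body that contains every possible byte value. The function names and the sample response are assumptions made for the example.

```python
def split_response(raw):
    """Split raw response bytes into (list of header lines, body bytes)."""
    # The first blank line (CRLF CRLF) marks the end of the response headers
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.split(b"\r\n"), body


def content_length(header_lines):
    """Return the Content-length value as an int, or None if it is absent."""
    for line in header_lines[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    return None


payload = bytes(range(256))  # every 8-bit value, including NUL, CR and LF
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: application/octet-stream\r\n"
    b"Content-length: 256\r\n"
    b"\r\n"
) + payload

lines, body = split_response(raw)
print(content_length(lines) == len(body))  # True
print(body == payload)                     # True
```

Only the headers are treated as text here; the body is kept as bytes throughout, which is why no transfer encoding is needed.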
GET Request

HTTP/1.0 allows a GET request (and other HTTP request types, see later) to additionally send a series of optional Request Headers along with the request. For example, here's a typical request to ironbark, snarfed from the local network (with some cosmetic editing):
    GET /index.html HTTP/1.0
    Accept: image/gif, image/jpeg, */*
    Host: ironbark.bendigo.latrobe.edu.au
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)
    Referer: http://bindi.bendigo.latrobe.edu.au/index.html

The request headers are terminated with a blank line -- hence the need for two newlines, as seen in the first slide of today's lecture. It's also possible for the request to contain a "message body", just like a response message -- we defer discussion of this until later.
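The request format just described (request line, optional headers, terminating blank line) can be assembled in a few lines. This is a sketch under the assumption that lines end in CRLF; build_request is a hypothetical helper, not part of any library.

```python
def build_request(path, headers=None, version="HTTP/1.0"):
    """Assemble a GET request: request line, optional headers, blank line."""
    lines = [f"GET {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Joining with CRLF and appending two more gives the terminating blank line
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


request = build_request("/index.html", {
    "Accept": "image/gif, image/jpeg, */*",
    "Host": "ironbark.bendigo.latrobe.edu.au",
})
print(request.decode("latin-1"))
```

With no headers at all, build_request("/index.html") yields just the request line followed by two line terminators, which is the minimal HTTP/1.0 form.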
A conditional GET uses the request header "If-modified-since:", which takes an HTTP standard GMT time/date string as its value. For example, earlier we saw an HTTP response with the following header line:

    Last-modified: Fri, 14 Mar 2004 02:22:52 GMT

The browser can cache this object (keep a local copy in case it's requested again soon), and use the local copy instead of going out to the network, which could cause unnecessary delays. The HTTP request would then look like:

    GET /index.html HTTP/1.0
    If-modified-since: Fri, 14 Mar 2004 02:22:52 GMT
    User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:0.9.4)
    Host: ironbark.bendigo.latrobe.edu.au
    ....etc, as before

If the requested page has not, in fact, been modified since the specified time, it won't be returned -- instead, a "304 Not Modified" response is sent, without a response body -- just the headers. We return to the topic of caching in the next lecture.
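The server-side decision behind a conditional GET can be sketched as a comparison of two date strings. The example below uses Python's standard email.utils.parsedate_to_datetime to read the HTTP date format; conditional_status is a hypothetical function, and a real server would apply further rules not covered here.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return the status a server might send for a (conditional) GET."""
    if if_modified_since is None:
        return "200 OK"  # ordinary GET: always return the object
    modified = parsedate_to_datetime(last_modified)
    since = parsedate_to_datetime(if_modified_since)
    # Unchanged since the client's copy was fetched: headers only, no body
    return "304 Not Modified" if modified <= since else "200 OK"


stamp = "Fri, 14 Mar 2004 02:22:52 GMT"
print(conditional_status(stamp, stamp))                            # 304 Not Modified
print(conditional_status(stamp, "Thu, 01 Jan 2004 00:00:00 GMT"))  # 200 OK
print(conditional_status(stamp))                                   # 200 OK
```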