Forms in HTML

In HTML version 2, the idea of "forms" (and various related data structures) was introduced. Forms provided the basic technology for the recent explosion in "electronic storefronts" on the Web, as well as several other innovations.

A form in HTML is an area of a Web page which is used to gather input from a human user. The information which is gathered can then be returned to the page's owner using a SUBMIT action.

The form is, as expected, delimited by a <FORM> and </FORM> markup pair.

The <FORM> markup has two important attributes:

ACTION
specifies the action URL of this form. Typically this is the URL of an executable program; see the later discussion of CGI.
METHOD
specifies the way in which the ACTION URL is accessed. There are two methods, GET and POST.
Example:

<FORM
ACTION="http://ironbark.bendigo.latrobe.edu.au/cgi-bin/myprog" METHOD="GET">

Form Elements

Data is collected in a form by the use of INPUT tags. Each INPUT tag has an associated TYPE attribute.

For example:

<INPUT TYPE="TEXT">
This INPUT type can take several further attributes, eg:
<INPUT TYPE="TEXT"  NAME="Name" MAXLENGTH="64" SIZE="20">
In a browser, this would be presented as a (scrollable) text box, 20 characters wide but accepting up to 64 characters of input. The NAME attribute gives the name under which the field's value is returned to the server.

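There are several other INPUT types:

PASSWORD
like TEXT, but the characters typed are obscured (typically shown as asterisks).
CHECKBOX
a single on/off toggle; its name=value pair is sent only when it is checked.
RADIO
one of a group of buttons sharing the same NAME, of which only one can be selected at a time.
SUBMIT
a button which sends the form data to the ACTION URL.
RESET
a button which restores all fields to their initial values.
HIDDEN
a field which is not displayed, but whose VALUE is returned with the form.
IMAGE
a clickable image which submits the form, also returning the coordinates of the click.
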
Form Elements #2

There are two other markup tags used in forms:

SELECT
allows the user to select from an enumerated list of values. Each value is given by an OPTION markup tag, which can take a couple of extra attributes (eg, VALUE, giving the value to be returned, and SELECTED, marking an initially selected option).
TEXTAREA
presents a multi-line text field into which the user can type information. Its size is given by ROWS and COLS attributes, it takes a NAME attribute, and any text between <TEXTAREA> and </TEXTAREA> becomes its initial value.
When form information is returned to the HTTP server, it is encoded into a format called:
    application/x-www-form-urlencoded
In this format, space characters are replaced by "+", and non-alphanumeric characters (including "&", "=" and "+" themselves, and any non-printing characters) are given in hexadecimal format, thus: %HH, where the H characters are the hexadecimal digits of the byte. The fields of a form are encoded as name=value pairs, in the order they appear in the form, with each pair separated by the "&" character. Fields with null values may be omitted, and unselected CHECKBOXes and RADIO buttons are not sent.
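The encoding rules above can be sketched in Python as follows. This is an illustrative sketch, not the author's code: encode_field and encode_form are hypothetical names, and the sketch assumes text is first converted to UTF-8 bytes (browsers of the HTML 2 era typically used ISO-8859-1 instead). Only ASCII letters and digits pass through unchanged.

```python
from urllib.parse import urlencode


def encode_field(text):
    """Encode one name or value in application/x-www-form-urlencoded style."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 bytes
        ch = chr(byte)
        if ch == " ":
            out.append("+")               # space becomes "+"
        elif ch.isascii() and ch.isalnum():
            out.append(ch)                # ASCII letters and digits unchanged
        else:
            out.append("%%%02X" % byte)   # everything else becomes %HH
    return "".join(out)


def encode_form(fields):
    """Join (name, value) pairs as name=value, separated by "&"."""
    return "&".join(encode_field(n) + "=" + encode_field(v) for n, v in fields)


if __name__ == "__main__":
    pairs = [("family", "O'Brien Smith"), ("given", "Ann & Bob")]
    print(encode_form(pairs))
    # Cross-check against the standard library encoder.
    print(encode_form(pairs) == urlencode(pairs))
```

For these inputs the result matches Python's urllib.parse.urlencode. That function additionally leaves "-", "_", "." and "~" unescaped; decoders accept either form.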

Submission Methods

The two ways in which form data can be returned to the server are METHOD=GET and METHOD=POST.

GET
This method is preferred if the submission of the form is not going to have a lasting effect on the global state of the universe - that is, it does not have side effects. For example, it may query a database, returning the result as HTML. An HTTP GET request is issued to the ACTION URL specified in the <FORM> markup tag, with the urlencoded form information appended after a separating "?" character. With large forms this can generate very long URLs.
POST
This method should be used where processing of the form is intended to have side effects, eg, updating the contents of a database. In this case, an HTTP POST transaction is performed. The body of the transaction contains the urlencoded form data, as a single long line of text. The POST transaction is directed at the URL specified in the ACTION attribute of the <FORM> tag.
In "real life", GET and POST methods are used pretty much interchangeably, depending on the programmer's preference.
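To make the difference between the two methods concrete, here is a sketch that builds the raw request text a browser might send for each. The function build_request is hypothetical, real browsers send further headers, and the Host header is included because most servers today expect it. The host and path are borrowed from the earlier <FORM> example.

```python
from urllib.parse import urlencode


def build_request(method, host, path, fields):
    """Return the raw HTTP/1.0 request text for submitting `fields` (a sketch)."""
    data = urlencode(fields)  # application/x-www-form-urlencoded, pure ASCII
    if method == "GET":
        # GET: the data is appended to the ACTION URL after "?"; no body.
        return f"GET {path}?{data} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    if method == "POST":
        # POST: the data is the request body, a single line of text.
        # len(data) equals its length in bytes because it is pure ASCII.
        return (
            f"POST {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(data)}\r\n"
            "\r\n"
            f"{data}"
        )
    raise ValueError("method must be GET or POST")


if __name__ == "__main__":
    form = [("family", "Scott"), ("given", "Phil")]
    for m in ("GET", "POST"):
        print(build_request(m, "ironbark.bendigo.latrobe.edu.au", "/cgi-bin/myprog", form))
```

The same name=value string appears in both requests; only its location (URL or body) differs.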

Common Gateway Interface (CGI)

CGI defines the way in which form data is presented to an application program by the HTTP server.

When a user clicks the SUBMIT button on a form, the browser sends the form data to the ACTION URL; the HTTP server then starts up the CGI program named by that URL and makes the form data available to it.

The subtle difference between GET and POST is the way in which a CGI program receives the form data. If the method was GET, the urlencoded form data is found in the QUERY_STRING environment variable, while other environment variables (eg, REQUEST_METHOD and REMOTE_ADDR) supply some extra useful information about the request.

If the method was POST, the CGI program reads the form data from its standard input stream; the number of bytes to read is given by the CONTENT_LENGTH environment variable, and the extra information is obtained, as before, from environment variables.
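A CGI program written in Python might collect the form data from either source as sketched below. This is a minimal sketch under stated assumptions: read_form_data is a hypothetical helper, decoding is done with the standard library's urllib.parse.parse_qs (Python's older cgi module has been removed from recent versions), and a text-mode stream is read, which is adequate because urlencoded data is pure ASCII.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form_data(environ=None, stdin=None):
    """Return the submitted form as a dict mapping each name to a list of values."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: CONTENT_LENGTH says how much urlencoded data to read
        # from standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length)
    else:
        # GET: the urlencoded data arrives in the QUERY_STRING variable.
        data = environ.get("QUERY_STRING", "")
    # parse_qs splits on "&" and "=", turns "+" into a space and decodes %HH.
    return parse_qs(data)


if __name__ == "__main__":
    # Simulate the two cases with a fake environment and standard input.
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "family=Scott&given=Phil"}
    print(read_form_data(get_env, io.StringIO("")))
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "23"}
    print(read_form_data(post_env, io.StringIO("family=Scott&given=Phil")))
```

parse_qs returns lists because a name may legitimately occur more than once, for example from a SELECT that allows several choices.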

CGI programs can, as a rule, be written in any language (compiled or interpreted) supported on the system running the HTTP server.

On Unix servers, they are commonly written in Perl, C or as Bourne shell (/bin/sh) scripts.

A CGI program (almost) always writes a Web page to its standard output, preceded by a header such as "Content-type: text/html" and a blank line; the HTTP server returns this page to the browser.
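The shape of that output can be sketched in Python. make_response is a hypothetical helper; html.escape from the standard library is used so that text typed by the user is displayed literally rather than interpreted as markup.

```python
import html


def make_response(given, family):
    """Return CGI output: a header, a blank line, then the HTML page."""
    page = (
        "<HTML><HEAD><TITLE>This is what you sent me</TITLE></HEAD>\n"
        "<BODY>\n"
        # html.escape turns &, <, > and quotes into entities, so that
        # user input is displayed literally instead of as markup.
        f"Given name: <strong>{html.escape(given)}</strong>.<br>\n"
        f"Family name: <strong>{html.escape(family)}</strong>.<br>\n"
        "</BODY></HTML>\n"
    )
    return "Content-type: text/html\n\n" + page


if __name__ == "__main__":
    # The HTTP server passes whatever the program writes here to the browser.
    print(make_response("Phil", "Scott"), end="")
```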


Example CGI

Using the following HTML:
<html>
<head>
<TITLE>Test form for BITCNE Lecture #21 Example</TITLE>
</head>
<body bgcolor="#FFFFFF">

<h1>Test form for BITCNE Lecture #21 Example</h1>
<FORM action="/subjects/bitcne/cgi/L21CGI.cgi" method="GET">
Family Name: <INPUT type="text" name="family" size="20"><br>
Given Name: <INPUT type="text" name="given" size="20"><br>
<input type="submit" value="Get Information">
<input type="reset" value="Clear Form">
</FORM>
</BODY>
</HTML>
In a browser, this form is displayed as two labelled text boxes, "Family Name" and "Given Name", followed by a "Get Information" button (which submits the form) and a "Clear Form" button (which resets it).

This <FORM> is processed by the following CGI program (in Perl):

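#!/usr/local/bin/perl -w
# common.pl (not shown) supplies the parse_form_data subroutine.
require "common.pl";
# Contact address; presumably used by common.pl when reporting errors.
$webmaster = 'P.Scott@latrobe.edu.au';
# Decode the submitted form into the hash %simple_form.
&parse_form_data (*simple_form);

# The CGI header, followed by a blank line, must come first.
print "Content-type: text/html", "\n\n";
print "<HTML>";
print "<HEAD>";
print "<TITLE>This is what you sent me</TITLE></HEAD>";
print "<body bgcolor=#ffffff>";
print "<h1>This is what you sent me</h1>";

# Escape the user's input so that it cannot inject HTML markup.
$family = &html_escape($simple_form{'family'});
$given = &html_escape($simple_form{'given'});

print "Given name: <strong>", $given, "</strong>.<br>", "\n";
print "Family name: <strong>", $family, "</strong>.<br>", "\n";

print "</BODY>";
print "</HTML>";

exit(0);

# Replace the HTML special characters &, <, > and " with entities;
# a missing field is treated as an empty string.
sub html_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}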

Web Applets

One of the Big New Things in computer networks is the idea of Applets, programs which run inside the Web browser.

Applets can make Web pages dynamic. Forms and CGI programs allow some two-way interaction, but with applets this interaction can occur in real time. In principle, an applet can implement almost any kind of program - everything from simple animations to full-featured spreadsheets and word processors - all within the browser.

An applet is specified in a Web page with the APPLET markup tag, eg:

<APPLET CODE="spread.class" WIDTH=200 HEIGHT=100> </APPLET>
The browser reserves a rectangular area, 200 pixels wide by 100 pixels high, for the applet to execute in, then loads the applet's compiled code (spread.class) from the server and runs it.

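Because applets are fetched from remote servers and run on the user's own machine, it is extremely important that they execute securely. Ensuring this is the responsibility of the browser, which typically runs applets in a restricted environment (a "sandbox") that, for example, denies them access to local files.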


Java

Virtually all applets are coded in the Java language, developed by Sun Microsystems.

Java is an object-oriented language with characteristics derived from C, C++, Pascal and Smalltalk. It has some weak aspects, but is generally regarded as a very good general-purpose programming language.

Source code written in Java is compiled to a special form of binary executable file called bytecode. No (common) computer executes bytecode directly - it is the binary format for an abstract computer architecture, and is "executed" by a software bytecode interpreter.
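To illustrate what "executed by a software bytecode interpreter" means, here is a toy stack-machine interpreter in Python. The opcodes PUSH, ADD and MUL are invented for this sketch and are not real Java bytecode instructions, although the Java virtual machine is likewise a stack machine.

```python
def run(bytecode):
    """Execute a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)            # push a constant
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)          # replace top two values by their sum
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)          # replace top two values by their product
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()                   # the result is left on top of the stack


# (2 + 3) * 4, as a compiler targeting this machine might emit it.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # 20
```

Each instruction is decoded and dispatched in software on every execution, which is why interpretation is slow compared with running native machine code.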

A Java bytecode interpreter is built into most modern browsers. In addition, some browsers implement a Just In Time (JIT) compiler, which translates the bytecode into the native machine code of the processor on the fly; this gives much faster execution than interpretation, which is intrinsically slow.

Java programs are usually developed with a development kit (such as Sun's JDK), containing a bytecode compiler, standard libraries, a standalone bytecode interpreter, and a viewer for running applets outside a browser.

Finally, there is no reason why applets cannot be developed in other source languages once suitable bytecode compilers become available.


This lecture is also available in PostScript format. The tutorial for this lecture is Tutorial #20.
Phil Scott