[1] There are alternatives to this: we have already seen a technique (way back in the telnet lecture), generically called terminal emulation, whereby the sender of the data converts it to the specific format expected by the receiver before sending. The other approach is called (in some circles) receiver-makes-right: here the receiving software, knowing the source of the data, converts it to its own format before proceeding. This obviously fails if the source can't be determined!
In such a text-based protocol, the string "2529" is clearly an integer. Note, however, that even such a simple system has potential pitfalls: think of the textfile conventions of Unix systems, PCs and Macs vis-a-vis the telnet NVT "line-of-text" convention used in these protocols.

Protocol messages in these classic Internet applications are structured to conform to a grammar -- a set of syntax rules. The receiver of such a message has to parse it to discover its meaning. This can be compared to the process whereby (e.g.) a Java source file is compiled to a byte-code equivalent. The problem here is that writing a parser is (still) considered to be a difficult programming problem, and developers tend to avoid them where possible...
In contrast, an ASN.1/BER bytestream can be interpreted using (in principle, at least) a somewhat simpler pattern matcher. Such software is, in general, easier to write -- it can be written using a "Finite State Machine" model, or could even be as simple as a sequence of nested IF-statements. The downside is a protocol that can't be tested using "human-readable" messages. TANSTAAFL.
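The "sequence of nested IF-statements" style can be sketched as follows. This is a deliberately simplified decoder for a single BER-encoded INTEGER (universal tag 0x02, short-form length only) -- a sketch of the pattern-matching idea, not a real ASN.1/BER implementation.

```python
def decode_ber_integer(buf: bytes) -> int:
    """Decode one BER TLV carrying an INTEGER, using nothing more
    sophisticated than a few nested if-statements."""
    if len(buf) < 2:
        raise ValueError("truncated TLV")
    tag, length = buf[0], buf[1]
    if tag != 0x02:                      # universal tag 2 = INTEGER
        raise ValueError("not an INTEGER tag")
    if length & 0x80:                    # long-form lengths omitted here
        raise ValueError("long-form lengths not handled in this sketch")
    content = buf[2:2 + length]
    if len(content) != length:
        raise ValueError("truncated content")
    return int.from_bytes(content, "big", signed=True)
```

Note that the input is raw bytes, not human-readable text: `02 02 09 e1` decodes to the integer 2529, but a human staring at a packet trace cannot "just read" it the way they can read the string "2529".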
<h2>Header</h2>
is clearly a structural
markup, whereas
<b><i>important text</i></b>
is (generally speaking) simply an indication of how the author would like the text displayed.

HTML has evolved (via mechanisms such as Cascading Style Sheets (CSS)) into the far richer XML (eXtensible Markup Language). In XML, the details of both the meaning of the markup tags and the presentational aspects of the document have been separated from it: the document itself contains only semantic (or structural) information. Conceptually, we have the notion of "Document as Database".
XML can be considered a document-level canonical form. It has already been used extensively on the Web, both as an adjunct to HTML and as a replacement -- some modern browsers can already process XML documents. More importantly, it is becoming clear that more complex "Web Services" can, and will, be based on XML.
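The "Document as Database" idea can be sketched with Python's standard library XML parser. The document below is invented for illustration; the point is that every tag carries meaning, there is no presentational markup at all, and the receiver can query the document much as it would query a database.

```python
import xml.etree.ElementTree as ET

# A hypothetical purely structural XML document: tags describe what
# the data *is*, and say nothing about how it should be displayed.
doc = """\
<book>
  <title>Computer Networks</title>
  <price currency="USD">59.95</price>
</book>"""

root = ET.fromstring(doc)
title = root.findtext("title")              # query by element name...
price = float(root.find("price").text)      # ...and convert as needed
currency = root.find("price").get("currency")
```

How the book should be rendered (fonts, layout) is left entirely to a separate stylesheet, which is exactly the separation the notes describe.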
An RPC application is built (compiled) as usual, but with external (remote) procedures replaced by stub procedures. The RPC system arranges for the stub procedure to transparently send network messages to the remote procedure, and to receive returned values. Thus development of networked applications is, in theory at least, no harder than development for a single machine. The Unix RPC system (originally developed at Sun Microsystems) uses a canonical form called XDR (the eXternal Data Representation data-encoding system) for sending data across the network. It is quite a complex specification: we will examine how one data type -- the integer -- is handled.
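The stub idea can be sketched as follows. This is a simulation, not a real RPC system: the procedure name `add` is invented, and the network round-trip is replaced by a direct call, so only the marshalling and unmarshalling steps are shown.

```python
import struct

# The "remote" procedure -- in a real system this would run on
# another machine, behind the RPC runtime.
def add(a: int, b: int) -> int:
    return a + b

def add_stub(a: int, b: int) -> int:
    """Client-side stub: marshal the arguments into a network message,
    'send' it, and unmarshal the reply.  The caller is unaware that
    any of this is happening -- it looks like a local call."""
    request = struct.pack("!ii", a, b)        # marshal args, big-endian
    x, y = struct.unpack("!ii", request)      # server unmarshals...
    reply = struct.pack("!i", add(x, y))      # ...runs the real procedure...
    (result,) = struct.unpack("!i", reply)    # ...client unmarshals reply
    return result
```

From the application's point of view, `add_stub(2, 3)` is indistinguishable from an ordinary procedure call -- which is precisely the transparency RPC aims for.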
[2] "Sub-routine" is an historical generic term for a re-usable code-segment with formally specified parameter passing conventions. The term procedure was used for the same thing in Pascal, and function in C.
Take, for example, the integer 1003421 dec (000f4f9d hex). We assume that this integer is stored at address X in memory. In Little-Endian storage, the byte at the "address of" the integer has value 9d hex; in Big-Endian storage, the byte at the "address of" the integer is 00 hex.
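The two layouts can be checked directly with Python's `struct` module, which can pack an integer in either byte-order:

```python
import struct

n = 1003421                     # 0x000f4f9d

little = struct.pack("<I", n)   # little-endian layout (e.g. Intel)
big    = struct.pack(">I", n)   # big-endian layout

# The byte at offset 0 is the byte at the "address of" the integer:
assert little[0] == 0x9D        # low-order byte first
assert big[0]    == 0x00        # high-order byte first
```

The full byte sequences are `9d 4f 0f 00` (little-endian) and `00 0f 4f 9d` (big-endian): the same four bytes, stored in opposite orders.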
Software that wishes to send such an integer (as raw bytes) as a parameter to a remote procedure cannot simply read the bytes from memory and transmit them, because the remote machine might use a different byte-order. In XDR, the solution is to (transparently) convert integers from their native format to Big-Endian format for transmission, and to transparently convert them back to the appropriate native format at the other end. Hence two non-Intel (big-endian) machines will incur no "translation overhead", whereas two Intel machines communicating will be required to convert the byte-order at each end of the communication.
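A sketch of that convert-on-send, convert-on-receive pattern for the XDR integer (which is defined as a 4-byte, big-endian, two's-complement quantity on the wire):

```python
import struct

def xdr_encode_int(n: int) -> bytes:
    """Convert a native integer to its 4-byte big-endian XDR wire form.
    On a big-endian host this is effectively a no-op copy; on a
    little-endian (e.g. Intel) host it reverses the byte-order."""
    return struct.pack(">i", n)

def xdr_decode_int(data: bytes) -> int:
    """Convert a 4-byte XDR integer back to a native integer."""
    (n,) = struct.unpack(">i", data)
    return n

# Regardless of the host's own byte-order, the wire format is fixed:
wire = xdr_encode_int(1003421)
```

Because both directions go through these routines, the application code on each end simply sees native integers; the byte-order conversion is invisible.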
It will be readily seen that, as mentioned, XDR uses canonical forms for data transmission. More importantly, the required conversions occur within the RPC sub-system, so the programmer never needs to be aware of them. Their operation is transparent.
The advent of object-oriented languages -- C++ and Java -- changed the way in which programmers thought about RPC. Instead of executing a remote procedure/function, the conceptual model became that of "networked objects", and thus invocation of their object methods across the network. Three major "frameworks" emerged in this space.
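The shift from remote procedures to networked objects can be sketched with a client-side proxy. Everything here is invented for illustration (the `Account` class and the simulated dispatch); the point is that each method call on the proxy becomes a message naming the method and its arguments, rather than a call to a fixed procedure.

```python
class RemoteProxy:
    """Client-side proxy for a (hypothetical) networked object.

    Each method call on the proxy is turned into a message naming the
    method and its arguments; here the 'network' is simulated by a
    direct dispatch to the server-side object."""

    def __init__(self, server_object):
        self._server = server_object       # stands in for a connection

    def __getattr__(self, method_name):
        def invoke(*args):
            message = (method_name, args)  # "serialize" the invocation
            name, arguments = message      # server "deserializes" it...
            return getattr(self._server, name)(*arguments)
        return invoke

class Account:                             # the real (remote) object
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
        return self.balance
```

The client writes `proxy.deposit(50)` exactly as it would for a local object; the proxy hides the fact that the method body runs elsewhere -- the object-oriented analogue of the RPC stub.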
A separate project team, at Microsoft, decided to extend the basic idea of XML-based RPC to a much more elaborate protocol, calling it the "Simple Object Access Protocol (SOAP)". It has been submitted to W3C as a proposed standard. It can run over HTTP or SMTP (?), and allows arbitrary objects to be encoded (or serialized). SOAP has the backing of several influential companies (Microsoft, IBM, etc).
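The shape of a SOAP message can be sketched as follows. The envelope namespace below is the real SOAP 1.1 one, but the method name `GetPrice`, the `http://example.org/prices` namespace and the payload are all invented for illustration:

```python
import xml.etree.ElementTree as ET

# A hand-built SOAP 1.1-style request: a method call, serialized as
# XML, carried inside a standard <Envelope>/<Body> wrapper.
envelope = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetPrice xmlns="http://example.org/prices">
      <item>apple</item>
    </GetPrice>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(envelope)
body = root.find("{http://schemas.xmlsoap.org/soap/envelope/}Body")
call = body[0]                                    # the serialized call
item = call.findtext("{http://example.org/prices}item")
```

Because the whole message is ordinary XML, any XML-capable receiver can pull it apart -- and because it typically travels over HTTP, it passes through existing web infrastructure.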
The (recently invented) expression "Web Services", based on SOAP, describes a range of proposed "Business-to-Business" XML-based services running over HTTP (port 80). Perhaps the most significant aspect of SOAP-based Web Services is that both the protocol (usually HTTP) and the core language (XML) are public standards, and are well understood. Even more significantly, SOAP builds on the knowledge gained from a decade of "The Web", and from this perspective alone is likely to succeed.