
Document Formats -- XML

We have concentrated, so far, on protocol formats, but the format of the data (or document) itself is also interesting. For example, the (usually) ASCII HTML document is the basis of the World Wide Web. HTML is a curious mixture of structural (or semantic) markup and markup elements used for in-line presentational formatting. For example, <h2>Header</h2> is clearly structural markup, whereas <b><i>important text</i></b> is (generally speaking) simply an indication of how the author would like the text displayed.
 
The move away from in-line presentational markup, begun in HTML through mechanisms such as Cascading Style Sheets (CSS), is carried much further in the far richer XML (eXtensible Markup Language). In XML, both the definition of the markup tags (for example, in a Document Type Definition) and the presentational aspects of the document (for example, in a stylesheet) are kept separate from the document itself. The document contains only semantic (or structural) information. Conceptually, this gives us the notion of "Document as Database".
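To make the "Document as Database" idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `<inventory>`, `<item>`, `<name>` and `<price>` tags and their data are purely hypothetical. Every tag says what the data *is*, and nothing says how it should look, so the document can be queried much like a small database.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: purely semantic tags, no presentation.
INVENTORY = """\
<inventory>
  <item sku="A1">
    <name>Widget</name>
    <price>2.50</price>
  </item>
  <item sku="B7">
    <name>Gadget</name>
    <price>12.00</price>
  </item>
</inventory>
"""

def names_cheaper_than(xml_text: str, limit: float) -> list[str]:
    """Return the names of all items whose price is below `limit`."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("name")
        for item in root.findall("item")
        if float(item.findtext("price")) < limit
    ]

def name_of(xml_text: str, sku: str):
    """Look up an item by its sku attribute (a 'primary key')."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including [@attr='value'].
    item = root.find(f"item[@sku='{sku}']")
    return None if item is None else item.findtext("name")

print(names_cheaper_than(INVENTORY, 5.0))
print(name_of(INVENTORY, "B7"))
```

How the items are eventually displayed (as a table, a list, or not at all) is a separate decision, made outside the document.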
 
XML can be considered a document-level canonical form. It has already been used extensively on the Web, both as an adjunct to HTML and as a replacement for it. Some modern browsers can already process XML documents. More importantly, it is becoming clear that more complex "Web Services" can, and will, be based on XML.
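The "canonical form" role can be illustrated with a small round trip. Two programs that agree only on XML as the interchange form can each encode a record into it and decode it back, regardless of how they store the record internally. This is a sketch under stated assumptions, not a Web Services implementation: the `record` tag and field names are hypothetical, field names are assumed to be valid XML names, and "canonical" here means a common agreed form, not the W3C XML Canonicalization (C14N) algorithm.

```python
import xml.etree.ElementTree as ET

def encode(record: dict[str, str]) -> bytes:
    """Serialise a flat record into the shared XML form."""
    root = ET.Element("record")  # hypothetical root tag
    for field, value in record.items():
        # Each field becomes a child element; special characters such as
        # '&' and '<' are escaped automatically by ElementTree.
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="utf-8")

def decode(data: bytes) -> dict[str, str]:
    """Rebuild the record from the shared XML form."""
    root = ET.fromstring(data)
    # An empty element parses with text None, so map that back to "".
    return {child.tag: child.text or "" for child in root}

original = {"name": "Widget & Co", "price": "2.50"}
wire = encode(original)
print(wire)
print(decode(wire) == original)
```

The sender and receiver never share code or internal data structures; they share only the XML form, which is what makes it useful as a canonical representation.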
 
Lecture 24: Data Formats and Encoding -- A Philosophy Lecture Copyright © 2003 P.Scott, La Trobe University Bendigo.


