What is XML?

by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page

Dateline: 04/20/99

back in the good old days...
Let's go way back to the early 1980s. You have just purchased your shiny new Personal Computer from IBM and a Pascal compiler from Borland to go with it. The Pascal development package includes a plain text editor that you can use to create Pascal source code. The facilities of the editor make it easy to create plain text and save it in a file.

a full screen editor
You decide to use the editor to create a memorandum for a project that you and an associate are working on. You write the memorandum using the text editor, print a copy, and take it to your associate for review before distributing it.

the sales memo
The purpose of the memo is to convince the boss to purchase some additional equipment for the project. Your associate is more sales-oriented than you are. She takes out a red pencil and begins to mark up your plain text memorandum (note the use of the term "markup" here). When she has finished, you look at what she has written with her red pencil. She has suggested that some words in the memorandum should be in bold, some words should be underlined, and some words should be in italics.

what, no word processor?
But this is a problem. Your plain text editor from the Borland Turbo Pascal 1.0 IDE doesn't have bold, underline, or italics capability. After all, a Pascal compiler doesn't care about such things. And you don't have anything resembling a word processor.

you can print bold, italics, and underline
So, you go back to your office to ponder the situation. The text editor is the closest thing that you have to a word processor. However, you do have a couple of things in your favor. First, you discover that your printer has the ability to print characters in plain, bold, italics, or underlined.

Second, you are a pretty good programmer. You know how to read the text file from the disk and modify it on the way to the printer. In short, you know how to generate and insert the control codes that will cause the printer to print the characters in one of the four renderings listed earlier. What you lack is a scheme for telling your program when to cause the printer to change from one rendering to another.

you hit upon a scheme
You take note of the fact that nowhere in your memorandum are the left and right angle bracket characters used (< and >). You can use them as delimiters to tell your program to change the rendering of the other characters on the paper.

The scheme that you devise is to define some new character combinations that you insert into your plain text document. At the point where you want the printer to switch into bold mode, you enter the following text:
   
<bold>

At the point where you want the printer to switch from bold back to plain, you enter the following text:
   
</bold>

Then you write a program that will read the plain text document from the disk, with these new character combinations inserted, and transfer the characters to the printer.

rendering your text
Whenever the program encounters <bold>, it will not send those characters to the printer. Rather, it will send the proper code to cause the printer to switch into bold mode.

Similarly, whenever the program encounters </bold>, it will not send those characters to the printer either. It will send the proper code to cause the printer to switch from bold back to plain.

really ugly text
When you view some of the text on your screen inside the editor it might look like the following:
   The <bold>big fat gray fox</bold> jumped over the little fence.

However, when you view the text as printed on your printer, it might look like the following, with the words "big fat gray fox" printed in bold:
   The big fat gray fox jumped over the little fence.
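
To make the scheme concrete, here is a minimal sketch of the translation program in Python (a modern stand-in for the Pascal you would have used at the time). The control codes are an assumption modeled on Epson ESC/P-style escape sequences, so check your own printer manual. The italics and underline tag names are hypothetical extensions of the bold tag described above, and the function name render_for_printer is invented for illustration.

```python
# Assumed control codes, modeled on Epson ESC/P-style escape sequences.
# Your printer may use different codes; treat this table as illustrative.
ESC = "\x1b"
CONTROL_CODES = {
    "<bold>": ESC + "E",         # switch into bold mode
    "</bold>": ESC + "F",        # switch from bold back to plain
    "<italics>": ESC + "4",      # hypothetical tag: italics on
    "</italics>": ESC + "5",     # hypothetical tag: italics off
    "<underline>": ESC + "-1",   # hypothetical tag: underline on
    "</underline>": ESC + "-0",  # hypothetical tag: underline off
}


def render_for_printer(text: str) -> str:
    """Return text with each known tag replaced by its control code."""
    out = []
    i = 0
    while i < len(text):
        for tag, code in CONTROL_CODES.items():
            if text.startswith(tag, i):
                out.append(code)   # send the code, not the tag characters
                i += len(tag)
                break
        else:
            out.append(text[i])    # ordinary character: pass it through
            i += 1
    return "".join(out)


memo_line = "The <bold>big fat gray fox</bold> jumped over the little fence."
printer_stream = render_for_printer(memo_line)
```

The resulting printer_stream would then be written to the printer port. Angle brackets that are not part of a known tag pass through unchanged, which is one of the places where the scheme "needs a lot more work" before it is really useful.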

congratulations!
You have just invented your own markup language (I told you earlier to pay attention to the use of the term markup). Your invention may have been the predecessor of HTML, XML, and SGML. Of course it needs a lot more work before it will be really useful. Also, you need to invent some jargon to go with your invention.

so, what is xml anyway?
I am going to answer this question with a quote from one of the world's leading authorities on XML, Peter Flynn:

"XML is the `Extensible Markup Language' (extensible because it is not a fixed format like HTML). It is designed to enable the use of SGML on the World Wide Web."

According to Peter, "It's actually slightly misnamed: XML itself is not a single markup language: it's a metalanguage to let you design your own markup language. A regular markup language defines a way to describe information in a certain class of documents (e.g. HTML). XML lets you define your own customized markup languages for many classes of document. It can do this because it's written in SGML, the international standard metalanguage for markup languages."
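
As a small illustration of designing your own markup language, the sketch below defines a made-up vocabulary for memos and reads it with the XML parser in Python's standard library. The element names (memo, to, subject, body, emphasis) are invented for this example and are not part of any standard. The document is only checked for being well-formed, not validated against a formal definition of the vocabulary.

```python
import xml.etree.ElementTree as ET

# A memo written in a hypothetical, home-made markup vocabulary.
MEMO = """<memo>
  <to>The Boss</to>
  <subject>Equipment request</subject>
  <body>We need <emphasis>two</emphasis> more printers.</body>
</memo>"""

# Parse the text into a tree of elements.
root = ET.fromstring(MEMO)

# Pull out pieces of the memo by the names we chose for them.
subject = root.findtext("subject")
emphasized = [element.text for element in root.iter("emphasis")]

print(subject)
print(emphasized)
```

Notice that the tags here describe what each piece of the memo is, rather than how it should be printed. A separate program can then decide how to render each piece.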

xml versus html
Why do we need XML when we already have HTML? As the structured data and documents that we are called upon to deliver over a network become more complex, we continue to need more powerful delivery vehicles. For several years we have been patching HTML to provide that increased power. Even though we can, and probably will, continue to make patches to HTML, I believe that the next quantum leap in structured data delivery will come, not from patches to HTML, but rather from a shift to XML where justified.

Richard G. Baldwin

coming attractions...

Next week I will introduce you to some of the jargon that surrounds your new invention, and will also get a little more technical in my discussion.

In the meantime, if you feel that you are ready for the really hard-core stuff, you might want to jump ahead and take a look at Peter Flynn's FAQ.

Credits: These HTML pages were produced using the WYSIWYG features of Microsoft Word 97. The computer image used on this page was used with permission from the Microsoft Word 97 Clipart Gallery.

Copyright 2000, Richard G. Baldwin

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two. He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Java Programming Tutorials, which have gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
