by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page
Dateline: 07/11/99
This is the second article in a series involving SAX and Java. In this series, I will show you how to use SAX to convert an XML document into a set of Java objects, and how to convert a set of Java objects into an XML document.
You might wonder why anyone would want to do this. Well, I do it every day for the purpose of maintaining this site. I maintain all of the links in an XML database and have written several Java programs to manipulate and maintain that database. As of this writing, the database contains more than five hundred links and their associated comments.
As you are aware, links on the web come and go, so maintaining more than five hundred links spread out over a couple dozen HTML files could be a formidable task. However, by consolidating all of the links into a single XML document and using a Java program to automatically generate the required HTML files, I have converted an otherwise formidable task into one that is very manageable.
Whenever I need to modify or add to the database, rather than manipulating the XML file directly, I simply convert the XML file to a set of Java objects, manipulate those objects, and then convert the objects to a new XML file.
I wrapped up the previous article by telling you "My plan is to continue this discussion in the next article, showing you some of the Java code that can be used to convert the XML file into a set of Java objects using SAX."
So, without further discussion, let's continue down that path. You may find it helpful to review the previous article before diving into the details of this one.
This program uses the IBM parser (XML4J) along with the XML file from the previous article.
The program consists of the following four source files: Sax02.java, Sax02B.java, Sax02C.java, and Sax02D.java.
I will discuss all four files during this series of articles. Taken together, they illustrate the conversion of an XML file to a set of Java objects, and the conversion of those objects back to a new XML file. Along the way, the program displays the contents of the objects.
This program reads an XML file with a very specific format as described in the previous article. It creates a Java object for each record in the file. You will recall that the XML file contains the basis for a computerized exam, and each record represents an exam problem. The program stores the objects in a Vector container. Classes in the file named Sax02B.java are used for this purpose.
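To make the parsing step a little more concrete, here is a minimal sketch of how a SAX handler might build one object per problem element and collect the objects in a Vector. This is not the Sax02B class. It assumes the standard JAXP classes (SAXParserFactory and DefaultHandler) that ship with current JDKs rather than XML4J, it uses generics that did not exist in JDK 1.2, and the names ExamRecord, ExamHandler, and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.Vector;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the article's Sax02D container class.
class ExamRecord {
    int problemNumber;
    String question;
    String type;
    int numberChoices;
    String valid;
    String[] item = new String[5];
    String explanation;
}

// Hypothetical handler (an assumption, not the author's Sax02B).
class ExamHandler extends DefaultHandler {
    final Vector<ExamRecord> exam = new Vector<ExamRecord>();
    private final StringBuilder text = new StringBuilder();
    private ExamRecord current;
    private int itemCount;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        text.setLength(0); // discard text collected before this tag
        if (qName.equals("problem")) {
            current = new ExamRecord();
            itemCount = 0;
            current.problemNumber =
                Integer.parseInt(atts.getValue("problemNumber"));
        } else if (qName.equals("answer")) {
            current.type = atts.getValue("type");
            current.numberChoices =
                Integer.parseInt(atts.getValue("numberChoices"));
            current.valid = atts.getValue("valid");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = text.toString().trim();
        if (qName.equals("question")) {
            current.question = content;
        } else if (qName.equals("item")
                   && itemCount < current.item.length) {
            current.item[itemCount++] = content;
        } else if (qName.equals("explanation")) {
            current.explanation = content;
        } else if (qName.equals("problem")) {
            exam.add(current); // one object per exam problem
        }
    }

    // Parses XML text and returns the populated Vector.
    static Vector<ExamRecord> parse(String xml) throws Exception {
        ExamHandler handler = new ExamHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.exam;
    }
}

class SaxSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<exam><problem problemNumber=\"1\">"
            + "<question>Which of the following are grown in a "
            + "vegetable garden?</question>"
            + "<answer type=\"multiple\" numberChoices=\"4\" "
            + "valid=\"0,1,3\"><item>Carrots</item>"
            + "<item>Cabbage</item><item>Apples</item>"
            + "<item>Lettuce</item></answer>"
            + "<explanation>Carrots, cabbage, and lettuce are grown "
            + "in a vegetable garden. Apples are grown in an orchard."
            + "</explanation></problem></exam>";
        for (ExamRecord r : ExamHandler.parse(xml)) {
            System.out.println(r.problemNumber + ": " + r.question);
        }
    }
}
```

Like the program described here, this sketch makes no attempt at robustness: a missing attribute, for example, would cause Integer.parseInt to throw an exception.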
After generating the Vector object containing the XML records, the program displays each of the instance variables in each object whose reference is stored in the Vector. This is a very simple illustration of how the XML data can be processed after first having converted it into object format. In a real program, something much more significant than simply displaying the data (such as editing the data) would occur at this point.
Then the program generates an XML file containing a record for each element in the Vector, and writes it out to a disk file named junk.xml. Classes in the file named Sax02C.java are used for this purpose.
The conversion from objects to XML is very specific and fairly brute force. A more generalized approach using the Document Object Model will be illustrated later.
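To give a feel for what a brute-force conversion can look like, here is a minimal sketch that builds the text of one problem element by string concatenation, replacing angle brackets, ampersands, and quotation marks with entities along the way. It is not the Sax02C class. The names XmlWriterSketch, escape, and problemToXml are hypothetical, and deriving numberChoices from the length of the item array is an assumption of this sketch.

```java
class XmlWriterSketch {
    // Replaces the characters that the exam file represents as entities.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the XML text for a single exam problem.
    static String problemToXml(int problemNumber, String question,
                               String type, String valid,
                               String[] items, String explanation) {
        StringBuilder xml = new StringBuilder();
        xml.append("<problem problemNumber=\"")
           .append(problemNumber).append("\">\n");
        xml.append("<question>").append(escape(question))
           .append("</question>\n");
        xml.append("<answer type=\"").append(escape(type))
           .append("\" numberChoices=\"").append(items.length)
           .append("\" valid=\"").append(escape(valid))
           .append("\">\n");
        for (String item : items) {
            xml.append("<item>").append(escape(item))
               .append("</item>\n");
        }
        xml.append("</answer>\n");
        xml.append("<explanation>").append(escape(explanation))
           .append("</explanation>\n");
        xml.append("</problem>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(problemToXml(
            2, "Which one of the following requires XML entities?",
            "single", "1",
            new String[] {"Tom", "\"<Mary & Sue>\"", "Dick"},
            "Left and right angle brackets, ampersands, and quotation "
            + "marks must be represented in XML by entities"));
    }
}
```

The weakness of this approach is the one mentioned above: every element and attribute name is hard-coded, so any change to the XML format means changing the code.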
A class definition in the file named Sax02D.java is used by all of the classes that make up this program as a common class for representing an XML record as an object.
No particular effort was expended to make this program robust. In particular, if it encounters an XML file in the wrong format, it may throw an exception, or it may continue to run, but it probably will not work properly.
The program was tested using JDK 1.2 under Win95. It also requires the IBM XML4J parser, or some suitable substitute.
The program was tested with the XML file named Sax02.xml listed in the previous article. (Note that line breaks were manually inserted in that listing to force the text to fit in this narrow page format.)
The program produced the following output on the screen. (Line breaks were manually inserted here to force the text to fit in this format.)
Problem Number: 1
Question: Which of the following are grown in a vegetable garden?
Type: multiple
Number choices: 4
Valid: 0,1,3
Item number 0: Carrots
Item number 1: Cabbage
Item number 2: Apples
Item number 3: Lettuce
Explanation: Carrots, cabbage, and lettuce are grown in a vegetable garden. Apples are grown in an orchard.
Problem Number: 2
Question: Which one of the following requires XML entities?
Type: single
Number choices: 3
Valid: 1
Item number 0: Tom
Item number 1: "<Mary & Sue>"
Item number 2: Dick
Explanation: Left and right angle brackets, ampersands, and quotation marks must be represented in XML by entities
The program also produced an output file named junk.xml. Since the program did not modify the data after reading the original XML file and before creating the new XML file, this output should be a replica of the original XML file. The contents of junk.xml follow. (Line breaks were manually inserted here to force the text to fit in this format.)
<?xml version="1.0"?>
<exam>
  <problem problemNumber="1">
    <question>Which of the following are grown in a vegetable garden?</question>
    <answer type="multiple" numberChoices="4" valid="0,1,3">
      <item>Carrots</item>
      <item>Cabbage</item>
      <item>Apples</item>
      <item>Lettuce</item>
    </answer>
    <explanation>Carrots, cabbage, and lettuce are grown in a vegetable garden. Apples are grown in an orchard. </explanation>
  </problem>
  <problem problemNumber="2">
    <question>Which one of the following requires XML entities?</question>
    <answer type="single" numberChoices="3" valid="1">
      <item>Tom</item>
      <item>&quot;&lt;Mary &amp; Sue&gt;&quot;</item>
      <item>Dick</item>
    </answer>
    <explanation>Left and right angle brackets, ampersands, and quotation marks must be represented in XML by entities</explanation>
  </problem>
</exam>
The entire program consists of a driver file named Sax02.java and several helper files as listed earlier. I'm going to begin with the file containing the class definition used to instantiate objects to contain the XML data.
The class definition in this file provides a common object format for storage of the data from the XML file. The class is designed to contain an instance variable for each piece of data stored in a single exam problem in the XML file. This class has no methods. It is simply a container for the data extracted from the XML file.
This is a very specific class designed for a very specific XML format.
class Sax02D{
  int problemNumber;//an attribute of <problem>
  String question;//the content of <question>
  String type;//an attribute of <answer>
  int numberChoices;//an attribute of <answer>
  String valid;//an attribute of <answer>
  //Each populated element in the following
  // array contains the content of one
  // <item> element in the XML file
  // with an arbitrary limit of five such
  // elements.
  String[] item = new String[5];
  String explanation;//content of <explanation>
}//end Sax02D
Next I will discuss the driver file named Sax02.java by breaking it up into fragments.
The first fragment shows the beginning of the controlling class along with the declaration of three class variables.
class Sax02 {
  static Vector theExam = new Vector();
  static Thread mainThread = Thread.currentThread();
  static String XMLfile = "Sax02.xml";
The next fragment shows the beginning of the main() method. In this method, I spawn a new thread of type Sax02B that will parse the incoming XML file, creating an object for each problem specification in the exam, and storing a reference to each of those objects in the Vector mentioned above and referred to by theExam.
Then I invoke the start() method on the thread to start it running.
  public static void main (String args[]){
    try{
      Sax02B fetchItObj = new Sax02B(XMLfile,theExam,mainThread);
      fetchItObj.start();//start the thread
If the XML file is a long one, some time will pass before theExam has been populated and is ready for use.
This is a typical producer/consumer scenario for which there are several control solutions. In this case, the Sax02B thread is the producer and the main thread is the consumer.
Because of the simplicity of this particular situation, I chose simply to put the main thread to sleep and let it sleep until the thread that is parsing the XML file awakens it. That thread will interrupt the main thread when it finishes parsing the XML file, which will cause the main thread to wake up and process theExam.
Thus, the main thread will sleep until the parse is completed. It will wake up when interrupted by the parser thread and will then process the data in the Vector.
If parsing is not completed during the first 100000 milliseconds, the main thread will wake up on its own and then go right back to sleep. (However, that would be an awfully long time to complete the parse, so it might be better to throw an exception under those conditions.)
      try{
        while(true){
          Thread.sleep(100000);
        }//end while
      }catch(InterruptedException e){
        //Wake up and do something
      }//end catch
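The parenthetical remark about throwing an exception can be sketched as follows. This is only one possible approach, not code from the program: the main thread sleeps once for the full timeout, returns normally if it is interrupted, and throws an exception if the timeout expires first. The names WaitSketch and awaitInterrupt are hypothetical, the choice of IllegalStateException is an assumption, and the producer thread here is a stand-in that does no parsing.

```java
class WaitSketch {
    // Sleeps until interrupted. Throws if the timeout expires first.
    static void awaitInterrupt(long timeoutMillis) {
        try {
            Thread.sleep(timeoutMillis);
        } catch (InterruptedException e) {
            return; // the producer signalled that it is finished
        }
        throw new IllegalStateException(
            "Parse did not finish within " + timeoutMillis + " ms");
    }

    public static void main(String[] args) {
        final Thread mainThread = Thread.currentThread();
        // Stand-in for the parser thread: it does no real work and
        // simply interrupts the main thread when it is "done".
        Thread producer = new Thread(new Runnable() {
            public void run() {
                mainThread.interrupt();
            }
        });
        producer.start();
        awaitInterrupt(100000);
        System.out.println("Parse finished; the data is ready.");
    }
}
```

If the producer happens to call interrupt() before the main thread goes to sleep, the pending interrupt causes sleep() to throw InterruptedException immediately, so the signal is not lost.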
At this point, each of the exam problems in the XML file has been converted into a Java object. References to those objects have been stored in a Vector object named theExam. This Vector object can be used for any number of useful purposes, such as editing the contents of the individual objects, sorting the problems, administering the test, etc.
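As one illustration of such a use, here is a minimal sketch that sorts the problems in a Vector by problem number. The Problem class is a hypothetical, trimmed-down stand-in for Sax02D, the names SortSketch and sortByProblemNumber are also hypothetical, and the use of generics and Integer.compare assumes a JDK much newer than the JDK 1.2 used to test this program.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Vector;

// Hypothetical, trimmed-down stand-in for Sax02D.
class Problem {
    int problemNumber;
    String question;

    Problem(int problemNumber, String question) {
        this.problemNumber = problemNumber;
        this.question = question;
    }
}

class SortSketch {
    // Sorts the Vector in place, in ascending order of problemNumber.
    static void sortByProblemNumber(Vector<Problem> exam) {
        Collections.sort(exam, new Comparator<Problem>() {
            public int compare(Problem a, Problem b) {
                return Integer.compare(a.problemNumber,
                                       b.problemNumber);
            }
        });
    }

    public static void main(String[] args) {
        Vector<Problem> exam = new Vector<Problem>();
        exam.add(new Problem(2,
            "Which one of the following requires XML entities?"));
        exam.add(new Problem(1,
            "Which of the following are grown in a vegetable garden?"));
        sortByProblemNumber(exam);
        for (Problem p : exam) {
            System.out.println(p.problemNumber + ": " + p.question);
        }
    }
}
```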
In this sample program, I simply display all of the data in each object before converting the objects back to XML format and writing the XML data into a new disk file named junk.xml. This is accomplished in the next fragment.
Everything in this fragment is simple Java programming using the Enumeration interface, so I won't bother to provide an explanation. If this is new to you, you should review the appropriate lessons in my Java tutorials.
      Enumeration theEnum = theExam.elements();
      while(theEnum.hasMoreElements()){
        Sax02D theDataObj = (Sax02D)theEnum.nextElement();
        System.out.println(
            "Problem Number: " + theDataObj.problemNumber);
        System.out.println(
            "Question: " + theDataObj.question);
        System.out.println(
            "Type: " + theDataObj.type);
        System.out.println(
            "Number choices: " + theDataObj.numberChoices);
        System.out.println(
            "Valid: " + theDataObj.valid);
        //The XML file contains a
        // field that specifies the
        // number of choices that
        // will be presented for the
        // problem on a multiple-choice
        // test. That value
        // should specify the number
        // of data values in the
        // array. Use that value
        // to extract the data
        // values from the array.
        for(int cnt = 0; cnt<theDataObj.numberChoices; cnt++){
          System.out.println("Item number " + cnt + ": "
              + theDataObj.item[cnt]);
        }//end for loop
        System.out.println(
            "Explanation: " + theDataObj.explanation);
      }//end while(theEnum.hasMoreElements())
The next fragment instantiates an object of type Sax02C and invokes the writeTheFile() method on that object to convert the data stored in the Vector object to XML format and write it into an output file named junk.xml. I will discuss the particulars of the Sax02C class used to accomplish this in a subsequent article.
      Sax02C xmlWriter = new Sax02C(theExam,"junk.xml");
      xmlWriter.writeTheFile();
    }catch(Exception e){System.out.println(e);}
  }//end main
}//end class Sax02
That completes the main() method and also completes the class definition for Sax02. I will discuss the process of converting the XML file to a set of objects in the next article.
coming attractions...
My plan is to continue this discussion in the next article, showing you more of the Java code that can be used to convert the XML file into a set of Java objects using SAX.
Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But, that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.
Copyright 2000, Richard G. Baldwin