by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page
Dateline: 08/01/99
This is the third article in a series involving SAX and Java. In this series, I will show you how to use SAX to convert an XML document into a set of Java objects, and how to convert a set of Java objects into an XML document.
The article that I published immediately prior to this one explained how I use Java and XML to maintain this site, which currently contains more than 800 links to quality XML resources. This is a good example of why you might want to convert between Java objects and XML documents.
I wrapped up the previous article in this series by telling you, "My plan is to continue this discussion in the next article, showing you more of the Java code that can be used to convert the XML file into a set of Java objects using SAX."
Let's continue down that path. Some of this material is a little difficult, so you may find it helpful to review the first and second articles in this series before diving into the details of this one.
The classes in this file convert the XML file to a set of objects of type Sax02D stored in a Vector. Note that these are highly specialized classes designed to accommodate a very specific XML format.
The first fragment shows required import directives simply to illustrate that the program imports packages that are part of the IBM parser library and are not part of the standard Java API.
```java
import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;
import java.util.Vector;//standard Java API
```
The purpose of the class named Sax02B is to parse a specified XML file, creating an object for each problem specification in the file and storing references to those objects in a Vector passed in as a parameter named theExam.
Because Sax02B performs the parse in its own thread, the main thread should go to sleep to await completion of the parse. When the parse is complete, the reference passed in as the constructor parameter named mainThread is used to interrupt the sleeping main thread and wake it up.
The next fragment shows the beginning of the class definition including the declaration (and initialization) of some instance variables along with the constructor. All that the constructor does is to save the incoming parameters in the instance variables.
```java
class Sax02B extends Thread{
  String XMLfile;
  Vector theExam;
  Thread mainThread;
  static final String parserClass =
                     "com.ibm.xml.parsers.SAXParser";
  //-----------------------------------------//

  //Constructor
  Sax02B(String XMLfile,Vector theExam,
                              Thread mainThread){
    this.XMLfile = XMLfile;
    this.theExam = theExam;
    this.mainThread = mainThread;
  }//end constructor
```
Note that the code in the previous fragment defines and initializes a String reference named parserClass that identifies the class from which the parser will be instantiated. The particular string used here identifies the IBM parser.
As you will recall, the run() method is where the action takes place in a Java thread. The next fragment shows the beginning of the run() method for this thread.
The first statement inside the run() method uses a SAX factory method along with the identification of the parser vendor to create an object of type Parser. (More precisely, it creates an object of a class that implements the interface org.xml.sax.Parser.)
All SAX parsers must implement this interface. It allows applications to register handlers for different types of events and to initiate a parse from a URI (system identifier) or from an InputSource, which can wrap a byte stream or a character stream.
```java
  public void run(){
    try{
      Parser parser =
             ParserFactory.makeParser(parserClass);
```
The bottom line is that the makeParser() method of the ParserFactory class creates an instance (object) of a class that implements the Parser interface.
The class of that object is determined by a String containing the fully qualified name of the parser class supplied by the vendor of the SAX-based parser software.
This parser object can then be used to perform the routine processing of the XML file, generating a series of document events and potentially error events based on the information in the file.
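To make the idea of a factory that builds a parser from a class name more concrete, here is a minimal sketch of reflective instantiation. As I understand the SAX 1.0 helper, makeParser() essentially loads the named class and calls its no-argument constructor. The interface Parser and the class VendorParser below are hypothetical stand-ins, not the real org.xml.sax types.

```java
public class ReflectiveFactoryDemo {

    // Hypothetical stand-in for org.xml.sax.Parser.
    interface Parser {
        String vendor();
    }

    // Hypothetical stand-in for a vendor's parser class.
    public static class VendorParser implements Parser {
        public VendorParser() {}
        public String vendor() { return "ExampleVendor"; }
    }

    // Load a class by its fully qualified name, instantiate it with its
    // public no-argument constructor, and return it through the interface type.
    static Parser makeParser(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        Object obj = cls.getDeclaredConstructor().newInstance();
        return (Parser) obj;
    }

    public static void main(String[] args) throws Exception {
        // The binary name of a nested class includes the outer class name,
        // so obtain it from the Class object rather than hard-coding it.
        String name = VendorParser.class.getName();
        Parser p = makeParser(name);
        System.out.println(p.vendor()); // prints ExampleVendor
    }
}
```

The caller only ever sees the interface type, which is what lets a program switch parser vendors by changing a single String such as parserClass.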
The next fragment instantiates an object of the class EventHandler, which will handle both document events and error events, and saves its reference in a variable of type DocumentHandler. Note that DocumentHandler is an interface, not a class.
I will explain how this object performs its work in conjunction with a discussion of the EventHandler class later.
```java
      DocumentHandler handler =
                new EventHandler(theExam,mainThread);
```
The handler object instantiated above has the ability to handle both document events and error events. In one case, it listens for document events such as the start or end of an element. In the other case, it listens for events caused by errors in the XML data.
Document event methods and error event methods are declared in two different interfaces (DocumentHandler and ErrorHandler). The handler object instantiated above is of type EventHandler. A superclass of that class implements both interfaces, making it possible for an object of that type to listen for both types of events. However, because the reference is stored in a variable of type DocumentHandler, it must be cast to type ErrorHandler before it can be registered as an error handler on the parser object.
(Note that setDocumentHandler() and setErrorHandler() are listener registration methods and are not methods used to set properties as might be indicated by their names.)
```java
      parser.setDocumentHandler(handler);
      parser.setErrorHandler((ErrorHandler)handler);
```
The single executable statement in the next fragment is what makes it all happen. This statement executes the parse() method on the object of type Parser to make a pass through the XML document specified by the parameter.
While making the pass through the document, this method generates a variety of document events and error events as the various tags, attributes, and data values in that document are encountered.
That completes the definition of the run() method of the class named Sax02B.
But, don't go away. There is much more that we need to cover in order to understand this program.
```java
      parser.parse(XMLfile);
    }catch(Exception e){System.out.println(e);}
  }//end run
}//end class Sax02B
```
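The IBM parser class named in parserClass is not bundled with a current JDK. As a hedged sketch, assuming a modern JDK whose built-in parser still supports the deprecated SAX 1 API, the same sequence (obtain a Parser, register one handler object as both DocumentHandler and ErrorHandler, then parse) can be written with JAXP's SAXParser.getParser() in place of ParserFactory.makeParser(). The names Sax1ViaJaxp and CountingHandler are hypothetical, and the XML is parsed from a String rather than from a file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

@SuppressWarnings("deprecation") // the SAX 1 types are deprecated in current JDKs
public class Sax1ViaJaxp {

    // Minimal handler: counts the start tags it sees.
    static class CountingHandler extends HandlerBase {
        int elementCount = 0;

        @Override
        public void startElement(String name, AttributeList atts) {
            elementCount++;
        }
    }

    static int countElements(String xml) throws Exception {
        // Ask JAXP for a SAX parser and use its SAX 1 Parser view,
        // in place of ParserFactory.makeParser(parserClass).
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        DocumentHandler handler = new CountingHandler();
        parser.setDocumentHandler(handler);
        // HandlerBase also implements ErrorHandler, so this cast succeeds.
        parser.setErrorHandler((ErrorHandler) handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return ((CountingHandler) handler).elementCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<exam><problem/><problem/></exam>")); // prints 3
    }
}
```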
The methods of the class named EventHandler are listeners for document events and error events. The next fragment shows the beginning of that class, including the constructor and the declaration and initialization of some instance variables.
This class extends the class named HandlerBase. HandlerBase, which is the default base class for handlers, implements the default behavior for four different SAX interfaces: DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver.
The first two of these interfaces are of interest to us in this lesson. We will pursue the other two interfaces in subsequent lessons.
The constructor simply stores the incoming parameters in the corresponding instance variables.
Three of the instance variables are initialized to a value of false. These are status flags used later by the program to keep track of the type of element being processed at any particular point in time. The type of element involved in each case is indicated by the name of the variable.
```java
class EventHandler extends HandlerBase{
  Vector theExam;//store objects in this Vector
  //wake this thread upon completion
  Thread mainThread;
  //create objects of this type
  Sax02D theDataObj;
  boolean inQuestionElement = false;
  boolean inItemElement = false;
  boolean inExplanationElement = false;
  int itemNumber;
  //-----------------------------------------//

  //Constructor
  EventHandler(Vector theExam,Thread mainThread){
    this.theExam = theExam;
    this.mainThread = mainThread;
  }//end constructor
  //-----------------------------------------//
```
The EventHandler class overrides the event handling methods of the DocumentHandler interface and the ErrorHandler interface to provide the desired functionality for the program.
The next fragment shows the first two overridden event-handling methods.
The Parser object invokes these two overridden methods when the parse process encounters the beginning or the end of the XML document.
The default versions of these two methods return quietly doing nothing. Application writers can override the startDocument() method to take specific actions at the beginning of a document (such as creating an output file).
Similarly the application writer can override endDocument() to take specific action at the end of a document (such as closing a file).
Note that these methods don't receive any parameters.
In this program, there is nothing that needs to be done at the beginning of a document. Therefore, I could have accepted the default version of startDocument() without overriding it. However, I elected to override it to do nothing simply for completeness of the illustration.
However, there is something that needs to be done at the end of the document. In particular, the main thread needs to be awakened.
This is accomplished in the endDocument() method by invoking the interrupt() method on the reference to the main thread. This causes the sleep() method in the main thread to end early by throwing an InterruptedException. The exception is ignored in this case since the objective is simply to awaken the main thread so that it can continue its task and display the contents of the objects just created from the incoming XML file.
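The sleep-and-interrupt handshake between the two threads can be sketched on its own. This is a minimal sketch, assuming a hypothetical Worker thread that stands in for Sax02B and does no real parsing; it also sleeps for at most one minute rather than indefinitely.

```java
public class SleepInterruptDemo {

    // Hypothetical stand-in for Sax02B: does its work, then interrupts
    // the waiting thread, as endDocument() does in the article.
    static class Worker extends Thread {
        private final Thread mainThread;
        volatile boolean done = false;

        Worker(Thread mainThread) { this.mainThread = mainThread; }

        @Override
        public void run() {
            done = true;             // stands in for the parse
            mainThread.interrupt();  // wake the sleeping thread
        }
    }

    static boolean waitForWorker() {
        Worker worker = new Worker(Thread.currentThread());
        worker.start();
        try {
            // Sleep for up to a minute; the interrupt normally ends it much sooner.
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            // Expected: this is the wake-up signal, so it is ignored.
        }
        return worker.done;
    }

    public static void main(String[] args) {
        System.out.println(waitForWorker()); // prints true
    }
}
```

If the worker calls interrupt() before the main thread reaches sleep(), the signal is not lost: the interrupted status stays set, and sleep() throws InterruptedException immediately. One caveat is that if an exception aborts the parse, endDocument() might never run and the main thread would sleep for the full period; calling join() on the worker thread is a simpler way to wait for it to finish.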
```java
  //Handle event at beginning of document
  public void startDocument(){
    //Not required. Nothing to do here.
  }//end startDocument()
  //-----------------------------------------//

  //Handle event at end of document.
  public void endDocument(){
    //wake up the main thread
    mainThread.interrupt();
  }//end endDocument()
```
The next overridden handler method is invoked at the start of every element. For review, the beginning of an element might look like this in an XML document:
<poem PoemNumber="1" DummyAttribute="dummy value">
The name="value" pairs that follow the element name are commonly referred to as attributes. An element can contain zero, one, or more attributes. Each attribute has a name and a value.
In this case, the element named poem contains two attributes named PoemNumber and DummyAttribute (the name of the attribute is unrelated to the name of the element).
Each attribute also has a value, which is enclosed in double quotation marks. In this case, the values for the two attributes are "1" and "dummy value", respectively.
The event handler method that gets called when the parser encounters a new element is startElement(), as shown in the next fragment.
This method receives two parameters. The first parameter is a String containing the name of the element. The second parameter is a reference to an object of type AttributeList containing information about the attributes.
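The article's code looks up attribute values by name with getValue(String). The AttributeList interface also lets a handler walk through all of the attributes by index using getLength(), getName(int), and getValue(int). The sketch below assumes a current JDK, where the deprecated SAX 1 types still ship and JAXP's SAXParser.parse(InputSource, HandlerBase) stands in for the IBM parser; AttributeListDemo and AttributeLister are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

@SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1 types
public class AttributeListDemo {

    // Records "elementName: attrName=attrValue" for every attribute seen.
    static class AttributeLister extends HandlerBase {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String elementName, AttributeList atts) {
            for (int i = 0; i < atts.getLength(); i++) {
                seen.add(elementName + ": " + atts.getName(i) + "=" + atts.getValue(i));
            }
        }
    }

    static List<String> listAttributes(String xml) throws Exception {
        AttributeLister lister = new AttributeLister();
        // This JAXP method registers the HandlerBase object with the
        // underlying SAX 1 parser and then parses the input.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), lister);
        return lister.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<poem PoemNumber=\"1\" DummyAttribute=\"dummy value\"/>";
        for (String s : listAttributes(xml)) {
            System.out.println(s);
        }
    }
}
```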
Only the beginning portion of the startElement() method is shown in the following fragment, as the method is fairly large. Overall, this method identifies the type of element and takes the appropriate action for each type.
Some identifications result in no action being taken because no action is required for that element type. Code that takes no action was included simply to illustrate how to take a particular action for those element types if needed.
This method contains a series of if-else statements that determine the type of element and deal with it appropriately based on its type.
The beginning of the if-else chain is shown in the following fragment. In this case, a test is made to determine if the element is of type exam. If so, no action is taken because there is no specific action needed at the start of an element of type exam.
(The exam type could be ignored in the startElement() method.)
```java
  public void startElement(String elementName,
                 AttributeList atts) throws SAXException{
    if(elementName.equals("exam")){
      //Not required, nothing to do here.
    }//end if(elementName.equals("exam"))
```
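Several different actions are needed at the start of an element of type problem, as reflected in the following fragment. This fragment initializes the item counter, instantiates a new Sax02D object to hold the data for the problem, and stores the value of the problemNumber attribute in that object.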
```java
    else if(elementName.equals("problem")){
      itemNumber = 0;//initialize the item counter
      //instantiate a new data object
      theDataObj = new Sax02D();
      //begin populating the object with
      // attribute value
      theDataObj.problemNumber = Integer.parseInt(
                     atts.getValue("problemNumber"));
    }//end if(elementName.equals("problem"))
```
Three types of elements have text content that will need to be extracted and stored in the object. Those types are question, item, and explanation.
The method used to extract the text content from the XML file doesn't provide any indication of the element type to which the content belongs. Therefore, it is necessary for the program to determine and remember the type of element being processed before extracting the text content. This is accomplished by setting the three status flags discussed earlier to values of true or false.
The following fragment tests to determine if the element is of type question. If so, it sets the flag named inQuestionElement to a value of true. This flag will be consulted by the characters() method later when extracting text to determine where to put that text in the object being populated.
```java
    else if(elementName.equals("question")){
      //set flag that identifies the type
      // of element
      inQuestionElement = true;
    }//end if(elementName.equals("question"))
```
Elements of type answer contain one or more elements of type item. In addition, they contain attributes that are used to determine how to make use of the information contained in the elements of type item.
The next fragment tests to determine if the element is of type answer. If so, it extracts the values for the attributes named type, numberChoices, and valid, and saves that information in the appropriate fields of the object being populated.
```java
    else if(elementName.equals("answer")){
      //populate data object with attribute values
      theDataObj.type = atts.getValue("type");
      theDataObj.numberChoices = Integer.parseInt(
                     atts.getValue("numberChoices"));
      theDataObj.valid = atts.getValue("valid");
    }//end if(elementName.equals("answer"))
```
The next fragment tests to determine if the element is of type item or type explanation. If so, it sets the status flag identifying the element as one of those types.
The fragment also throws an exception if a match for the element type has not been found. Control should never reach that point unless the types of elements in the XML file are different from what this program was designed to process.
```java
    else if(elementName.equals("item")){
      //set flag that identifies the type
      // of element
      inItemElement = true;
    }//end if(elementName.equals("item"))
    else if(elementName.equals("explanation")){
      //set flag that identifies the type of
      // element
      inExplanationElement = true;
    }//end if(elementName.equals("explanation"))
    //should never reach here
    else throw new SAXException(
              "Invalid element name: " + elementName);
  }//end start element
```
That ends the method that is invoked at the start of each element. When control reaches this point, attribute values have been extracted and stored in the object being populated. Also, in some cases, a flag has been set identifying the type of element that triggered the event.
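Since Java 7, a switch statement on a String can replace a long if-else chain like the one in startElement(). The following is an alternative formulation, not the article's code: the hypothetical FlagHandler keeps only the flag-setting logic and omits the attribute handling for problem and answer elements.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1 types
public class SwitchDispatchDemo {

    // Hypothetical trimmed-down handler showing only the flag logic.
    static class FlagHandler extends HandlerBase {
        boolean inQuestionElement, inItemElement, inExplanationElement;

        @Override
        public void startElement(String elementName, AttributeList atts)
                throws SAXException {
            switch (elementName) {
                case "exam":
                case "problem":
                case "answer":
                    break; // attribute handling omitted in this sketch
                case "question":
                    inQuestionElement = true;
                    break;
                case "item":
                    inItemElement = true;
                    break;
                case "explanation":
                    inExplanationElement = true;
                    break;
                default: // unknown element type
                    throw new SAXException("Invalid element name: " + elementName);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FlagHandler h = new FlagHandler();
        h.startElement("question", null); // atts is not used for this type
        System.out.println(h.inQuestionElement); // prints true
        try {
            h.startElement("bogus", null);
        } catch (SAXException e) {
            System.out.println(e.getMessage()); // prints Invalid element name: bogus
        }
    }
}
```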
Because it doesn't need to deal with attributes, the overridden endElement() event handler is simpler. This method is invoked when the parser encounters an end tag for an element.
The method receives a single parameter that is the name of the element.
This method identifies the type of element and takes the appropriate action for each type. Some identifications result in no action being taken because no action is required for that element type. In those cases, I could have simply ignored the element type. This code was included simply to illustrate how to take a particular action for those element types if needed.
As in the overridden startElement() method, this method executes a series of if-else statements to identify the type of element and take the appropriate action.
The next fragment shows the processing for the end of the element of type exam. In this case, no action is required, so the code essentially does nothing.
```java
  public void endElement (String elementName)
                              throws SAXException{
    if(elementName.equals("exam")){
      //Not required. Nothing to do here.
    }//end if(elementName.equals("exam"))
```
The program extracts the information from the XML file and creates one object for each element of type problem contained in that file. Therefore, the end of an element of type problem signals the need to store the object that has just been created and populated in the object of type Vector.
This is accomplished by invoking the addElement() method on the Vector object named theExam as shown in the next fragment.
```java
    else if(elementName.equals("problem")){
      theExam.addElement(theDataObj);
    }//end if(elementName.equals("problem"))
```
The next fragment shows the processing applied when the end of an element of either the question or answer types is encountered.
In the first case, the status flag is cleared to indicate that an element of the type question is no longer being processed.
In the second case of type answer, no processing is required. This type could simply have been ignored in this method.
```java
    else if(elementName.equals("question")){
      inQuestionElement = false;
    }//end if(elementName.equals("question"))
    else if(elementName.equals("answer")){
      //Not required. Nothing to do here.
    }//end if(elementName.equals("answer"))
```
Code in the next fragment clears the flag that indicates that an element of type item is being processed when the end of an item element is encountered.
In addition, you will recall that an answer element can contain from one to five elements of type item. Subsequent processing needs to know which item is being processed. Therefore, when the end of an item element is encountered, the following code increments a counter that is keeping track of the item number.
```java
    else if(elementName.equals("item")){
      inItemElement = false;
      itemNumber += 1;
    }//end if(elementName.equals("item"))
```
The next fragment tests to determine if the element is of type explanation. If so, it clears the flag to indicate that an element of type explanation is no longer being processed.
In addition, the fragment throws an exception if a match for the element type was not found in the previous code within the endElement() method. This shouldn't happen unless the XML file contains an element type for which this program was not designed.
```java
    else if(elementName.equals("explanation")){
      //Set flag showing that an element of
      // this type is no longer being processed
      inExplanationElement = false;
    }//end if(elementName.equals("explanation"))
    //should never reach here
    else throw new SAXException(
              "Invalid element name: " + elementName);
  }//end endElement()
```
That ends the discussion of the endElement() event handler.
The content of an XML element is the text that appears between the beginning and ending tags. The next fragment shows the event handler that is invoked by the parser when the parser encounters content. The name of the content handler method is characters().
Note that the character data may arrive all at once or in several chunks. Therefore, it is necessary to concatenate the chunks when reconstructing the content.
In a nutshell, this method receives a character array along with a start index and a length that identify the portion of the array holding the content of an element. The overridden version of the method in this sample program first determines the type of element to which the content belongs, and then stores the content in the object being populated according to that type.
The next fragment shows the beginning of the characters() method and shows the first of several if-else statements used to determine the type of element being processed to determine where to store the character data.
The fragment also shows how the program tests to determine whether the chunk of characters that generated the event is the first chunk received for that particular element. If not, the new characters are concatenated onto the characters previously received and stored in the object for that element type.
```java
  public void characters(char[] ch,
                             int start,int length){
    if(inQuestionElement){
      //if processing question element
      if(theDataObj.question == null){
        //if first chunk, save first chunk in
        // the data object
        theDataObj.question =
                     new String(ch, start, length);
      }//end if(theDataObj.question == null)
      else{
        //Not first chunk. Concatenate this
        // chunk with previous data in the
        // data object.
        theDataObj.question +=
                     new String(ch, start, length);
      }//end else
    }//end if(inQuestionElement)
```
Processing of the content of elements of type item is complicated by the fact that each element of type answer can contain up to five elements of type item.
Recall that the number of the current item is maintained in the EventHandler instance variable named itemNumber. The value of this variable is used as the index into a five-element array in the Sax02D object, where the concatenated content string for that item is stored.
Otherwise, this code is essentially the same as the code in the previous fragment.
```java
    else if(inItemElement){
      //if processing item element
      if(itemNumber < 5){
        //hard code the limit for brevity
        if(theDataObj.item[itemNumber] == null){
          //This is first chunk. Store it in
          // data object
          theDataObj.item[itemNumber] =
                     new String(ch, start, length);
        }//end if(theDataObj.item[itemNumber]...
        else{//Not first chunk. Concatenate it.
          theDataObj.item[itemNumber] +=
                     new String(ch, start, length);
        }//end else
      }//end if(itemNumber < 5)
    }//end if(inItemElement)
```
The only other type of element that can contain content is type explanation. The following code fragment extracts the content for type explanation, concatenating the chunks if necessary, and stores it in the object being populated.
```java
    else if(inExplanationElement){
      if(theDataObj.explanation == null){
        //This is first chunk. Store it in
        // data object.
        theDataObj.explanation =
                     new String(ch, start, length);
      }//end if(theDataObj.explanation == null)
      else{//Not first chunk. Concatenate it
        theDataObj.explanation +=
                     new String(ch, start, length);
      }//end else
    }//end if(inExplanationElement)
  }//end characters()
```
That ends the discussion of the event handling method named characters().
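An alternative to testing for null and concatenating Strings inside characters() is to collect the chunks in a StringBuilder and store the finished text in endElement(). In the minimal sketch below, ProblemData is a hypothetical, simplified stand-in for Sax02D, and the items are kept in a List, which also removes the hard-coded limit of five items. The handler methods are called directly with a deliberately split character array to simulate a parser that delivers content in two chunks.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

@SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1 types
public class ChunkBufferDemo {

    // Hypothetical simplified stand-in for Sax02D.
    static class ProblemData {
        String question;
        final List<String> items = new ArrayList<>();
        String explanation;
    }

    static class BufferingHandler extends HandlerBase {
        final ProblemData data = new ProblemData();
        private StringBuilder text; // non-null only inside a text element

        @Override
        public void startElement(String name, AttributeList atts) {
            if (name.equals("question") || name.equals("item")
                    || name.equals("explanation")) {
                text = new StringBuilder(); // start a fresh buffer
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // append this chunk
            }
        }

        @Override
        public void endElement(String name) {
            if (text == null) return; // not inside a text element
            String content = text.toString();
            if (name.equals("question")) data.question = content;
            else if (name.equals("item")) data.items.add(content);
            else if (name.equals("explanation")) data.explanation = content;
            text = null;
        }
    }

    public static void main(String[] args) {
        BufferingHandler h = new BufferingHandler();
        char[] buf = "What is SAX?".toCharArray();
        h.startElement("question", null);
        h.characters(buf, 0, 5); // first chunk: "What "
        h.characters(buf, 5, 7); // second chunk: "is SAX?"
        h.endElement("question");
        System.out.println(h.data.question); // prints What is SAX?
    }
}
```

Because StringBuilder.append() modifies one buffer in place, this approach avoids creating a new String for every chunk, and no status flags are needed: a non-null buffer itself signals that a text element is being processed.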
That brings us to the methods that are declared in the interface named ErrorHandler. This interface, which is the basic interface for SAX error handlers, declares three handler methods: warning(), error(), and fatalError().
A SAX application that needs to implement customized error handling must implement this interface. Then it must register an object of the interface type with the SAX parser using the parser's setErrorHandler() method. The parser will then report all errors and warnings through this interface.
The code to accomplish this in this program is essentially the same as was explained in an earlier article on SAX. Therefore, I won't discuss that code further in this article.
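For readers who have not seen that earlier article, here is a hedged sketch of the general pattern rather than this program's actual code. It assumes a current JDK, with JAXP standing in for the IBM parser. HandlerBase already supplies defaults for warning(), error(), and fatalError() (the default fatalError() throws the exception), so the hypothetical ReportingHandler overrides all three to record what the parser reports, and rethrows in fatalError() to keep the default behavior of stopping the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

@SuppressWarnings("deprecation") // HandlerBase is a deprecated SAX 1 class
public class ErrorHandlerDemo {

    // Hypothetical handler that records every report from the parser.
    static class ReportingHandler extends HandlerBase {
        final List<String> reports = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            reports.add("warning at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            reports.add("error at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            reports.add("fatal at line " + e.getLineNumber() + ": " + e.getMessage());
            throw e; // keep HandlerBase's default behavior: stop the parse
        }
    }

    // Parses the string and returns whatever the handler recorded.
    static List<String> parse(String xml) {
        ReportingHandler handler = new ReportingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // A fatal error rethrown above, or a configuration or I/O
            // problem, ends up here; this sketch simply ignores it.
        }
        return handler.reports;
    }

    public static void main(String[] args) {
        // Mismatched end tag: not well-formed, so a fatal error is expected.
        for (String r : parse("<exam>\n<problem>\n</exam>")) {
            System.out.println(r);
        }
    }
}
```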
There is one major area left to cover -- the creation of an XML file based on the contents of the objects stored in the Vector.
Coming attractions...
My plan is to continue this discussion in the next article, showing you more of the Java code that can be used to create the XML file based on the contents of the objects stored in the Vector.
Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But, that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.
This HTML page was produced using the WYSIWYG features of Microsoft Word 97. The images on this page were used with permission from the Microsoft Word 97 Clipart Gallery.
010909
Copyright 2000, Richard G. Baldwin
-end-