Data Structures in Java: Part 4, Purpose of Implementations and Algorithms

Baldwin explains that the core collection interfaces in the Java Collections Framework allow collections to be manipulated without regard for how they are implemented.  The framework provides nine concrete implementations of the interfaces.  The framework also provides various algorithms for manipulating the data in the collections.

Published:  May 28, 2001
By Richard G. Baldwin

Java Programming, Lecture Notes # 1356


Preface

This is the fourth lesson in a miniseries on Java data structures and the Java Collections Framework.  The first lesson in the miniseries was entitled Data Structures in Java: Part 1, Getting Started.

The purpose of this miniseries is to help you learn the essential features of Object-Oriented data structures in Java using the Collections Framework.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at Baldwin's Java Programming Tutorials.

The index on my site provides links to the lessons at Gamelan.com.

Preview

At least three things are included in a collections framework: The previous lesson discussed the purpose of the interfaces.  This lesson will discuss the purpose of the implementations and the algorithms in the Collections Framework.

Introduction

In the previous lesson, entitled Data Structures in Java: Part 3, Purpose of Framework Interfaces, we learned that the framework provides nine concrete implementations of the interfaces in the framework.  These nine implementation classes are available for immediate instantiation to produce objects to satisfy your collection needs.

We also learned that the framework provides three incomplete implementations.  These classes are available for you to use as a starting point in defining your own implementations.  Default implementations of many of the interface methods are provided in the incomplete implementations.

Discussion and Sample Program

Purpose of implementations

The implementations in the Java Collections Framework are the concrete definitions of the classes that implement the core collection interfaces.  For example, as of JDK 1.3, the concrete implementations in the Java Collections Framework are provided by the following nine classes.

Available for immediate use

These classes are available for immediate use to instantiate collection objects.

As you can see, there are two classes that obviously fall into the set category, two that obviously fall into the list category, and three that obviously fall into the map category.  I will have more to say about the details of these classes in subsequent lessons.

This leaves two additional classes whose names don't readily divulge the category in which they belong.

Vector and Hashtable classes

The classes Vector and Hashtable were part of Java even before the Java Collections Framework became available.  The Vector class can be used to instantiate objects that fall in the general list category.

The Hashtable class can be used to instantiate objects that fall in the map category.

These two classes have been upgraded to make them compatible with the Collections Framework.

Abstract implementations

In addition to the concrete implementations listed above, the following three classes partially implement the interfaces, but are not intended for instantiation.  Rather, they are intended to be extended into new concrete classes that you define.

Therefore, by either using one of the three classes listed above as a starting point, or by starting from scratch and fully implementing one or more of the interfaces, you can provide new concrete implementations to augment the framework to include collections that meet your special needs.

Purpose of algorithms

Algorithms are methods (not necessarily exposed) that provide useful capabilities, such as searching and sorting.  For example, the Collection interface declares an exposed method named contains.

The contains method

Different classes, different implementations

You can safely invoke the contains method on any object instantiated from a class that properly implements the Collection interface, even if you don't know the actual type of the collection object.

The manner in which the search will be performed will probably differ from one concrete implementation of the interface to the next.  For example, a TreeSet object will perform the search very rapidly with a time cost of only log(n) where n is the number of elements.  On the other hand, for the same number of elements, because of a different underlying data structure, a search on an ArrayList object will probably require more time than a search on a TreeSet object.  As the number of elements increases, the difference in time cost between the two will also increase.

Consider the sample program shown in Listing 1.  This program compares the search speed of the ArrayList class and the TreeSet class.  A detailed discussion of the program follows Listing 1.
 
/*File SpeedTest01
Copyright 2001 R.G.Baldwin
**************************************/

import java.util.*;

public class SpeedTest01{
  public static void main(
                        String args[]){
    new Worker().doIt();
  }//end main()
}//end class SpeedTest01

class Worker{
  public void doIt(){
    int size = 500000;
    //Create a TreeSet object
    Collection aTree = new TreeSet();
    
    //Populate the TreeSet object with
    // random values.  The add() method
    // for a set rejects duplicates.
    Random rnGen = new Random();
    for(int ct = 0; ct < size; ct++){
      aTree.add(new Double(
                  rnGen.nextDouble()));
    }//end for loop
    
    //Create and populate an ArrayList
    // object with the same random
    // values
    Collection aList = 
                  new ArrayList(aTree);
                  
    //Extract a value near the center
    // of the ArrayList object to use
    // as a test case.
    Object testVal = 
        ((List)aList).get(size/2);
    
    //Search for the test value in each
    // of the collection objects.  
    // Measure and display the time 
    // required to perform the search
    // in each case.
    long start = new Date().getTime();
    boolean found = 
               aList.contains(testVal);
    long stop = new Date().getTime();
    System.out.println(
         found + " " + (stop - start));
    
    start = new Date().getTime();
    for(int x = 0; x < 10000; x++){
      found = aTree.contains(testVal);
    }//end for loop
    stop = new Date().getTime();
    System.out.println(
         found + " " + (stop - start));

  }//end doIt()
}// end class Worker

Listing 1

The program begins by instantiating a TreeSet object and populating it with approximately 500,000 elements as shown in Listing 2 below.  The values encapsulated in the objects referred to by the elements in the collection are produced by a random number generator.

Recall that the add method of a Set object rejects duplicate elements, so there may be fewer than 500,000 elements in the object after it is populated.
 
  public void doIt(){
    int size = 500000;

    Collection aTree = new TreeSet();

    Random rnGen = new Random();
    for(int ct = 0; ct < size; ct++){
      aTree.add(new Double(
                  rnGen.nextDouble()));
    }//end for loop

Listing 2

One of the capabilities of the Collection Framework is to create a new Collection object and populate it with the contents of an existing Collection object of a different (or the same) actual type.

The code in Listing 3 instantiates an ArrayList object and populates it with the contents of the existing TreeSet object.  As a result, we then have two different Collection objects of different actual types containing the same elements.
 
    Collection aList = 
                  new ArrayList(aTree);

Listing 3

The objective of this program is to compare the times required to search for and find an element in each of the collections.  Thus, we need a target element to search for.

The code in Listing 4 below extracts a value near the center of the ArrayList object using an index to find and extract the value.  This is a very fast operation on a List object.  This value is saved in testVal to be used later for test purposes.

Note that the reference to the ArrayList object was saved as type Collection in Listing 3 above.

Note also that it was necessary to cast that reference to type List in Listing 4 below in order to invoke the get method on the reference.  This is because the Collection interface does not declare a method named get.  Rather, the get method is added to the List interface to define a more specialized form of collection.
 
    Object testVal = 
        ((List)aList).get(size/2);

Listing 4

The code in Listing 5 below invokes the contains method to search for the test value in each of the collections.  It uses the system clock to measure the time required to find the element in each case.  I will assume that you understand how to use the Date class for this purpose, and won't provide a detailed explanation.
 
    long start = new Date().getTime();
    boolean found = 
               aList.contains(testVal);
    long stop = new Date().getTime();
    System.out.println(
         found + " " + (stop - start));
    
    start = new Date().getTime();
    for(int x = 0; x < 10000; x++){
      found = aTree.contains(testVal);
    }//end for loop
    stop = new Date().getTime();
    System.out.println(
         found + " " + (stop - start));

  }//end doIt()

Listing 5

The output from the program was:

true 171
true 30

The first line of output applies to the ArrayList object, and the second line applies to the TreeSet object.

As we would expect, the test value was successfully found in both cases; hence the display of true in both cases.

Time required to search ArrayList

The output indicates that approximately 171 milliseconds were required to find the test value in the ArrayList object.

Time required to search TreeSet object

On the other hand, the time required to find the test value in the TreeSet object was so small that it wasn't even measurable within the granularity of the system clock (other experiments have caused me to believe that the granularity of the system clock on this machine is at least ten milliseconds).  Hence, the original reported time required to find the test value in the TreeSet object was zero.

In order to get a measurable time value to search the TreeSet object, I had to wrap the invocation of the contains method in a for-loop and search for the same value 10,000 times in succession.  Thus, the time required to find the test value in the TreeSet object was approximately 0.0030 milliseconds as compared to 171 milliseconds for the ArrayList object.

(I'll let you do the arithmetic to see if this makes sense in terms of the expected time cost to search the two different types of objects.  Don't forget the extra overhead of the for-loop.)

Different implementations

This is a graphic demonstration that even though both objects can be treated as type Collection, and the contains method can be invoked on either object in a polymorphic manner, the actual implementations of the two objects and the implementations of the contains methods in those two objects are different.

Each type of collection has advantages and disadvantages, depending on your needs.

Polymorphic behavior applies

The important point is that if you receive a reference to the collection object as type Collection, you can invoke the contains method on that reference without regard to the underlying structure of the collection object.  This is because polymorphic behavior applies.

Very briefly, polymorphic behavior means that the actual method that is executed is the appropriate method for that type of object regardless of the type of the reference to the object.  This is one of the great advantages of using the Java Collections Framework and passing collection objects among methods as interface types.

Sorting algorithms

Some of the implementations of the Java Collection Framework maintain their elements in a random order, and other implementations maintain their elements in a sorted order.  Thus, the framework also provides sorting algorithms.  However, the sorting algorithms used to maintain the order of the collections are not exposed in the way that the search algorithm is exposed (via the contains() method). Rather, the sorting algorithms are implicit in those implementations that need them, and are absent from those implementations that don't need them.

Now for a little quiz

Let's see if you are still awake.  Select the words in one pair of parentheses in the following statement that causes the statement to be true.

The interfaces in the Collections Framework make it possible to manipulate the contents of collections in a manner that is (dependent on) (independent of) the underlying implementation of each collection.

And the answer is ...

The interfaces in the Collections Framework make it possible to manipulate the contents of collections in a manner that is independent of the underlying implementation of each collection.  That is the beauty of basing the framework on interfaces that declare polymorphic methods.

I placed this question here to drive home the point that the methods declared in the Collection interface can be invoked on collection objects in a polymorphic manner.

That is to say, as a user of an object instantiated from a class that properly implements the Collection interface, you can invoke the methods declared in that interface on a reference to the object and be confident that the actual method that is invoked will be the version that is appropriate for the class from which the object was instantiated.  This is polymorphic behavior.

In the event that you need to invoke a method that is not declared in the Collection interface (such as the get() method in Listing 4 above), you can pass the reference as one of the more specialized sub-interfaces of Collection, such as Set.

Benefits of using the Collections Framework

The Java Tutorial from Sun lists and explains the following benefits of using the Java Collections Framework.

For a detailed explanation of these benefits, I am simply going to refer you directly to this excellent online book.

Summary

So, let's recap some of what we have learned in this and the previous lessons.

The core collection interfaces in the Collections Framework are shown in Listing 6.
 
  • Collection
    • Set
      • SortedSet
    • List
  • Map
    • SortedMap


Listing 6

The basic purpose of the core collection interfaces in the Java Collections Framework is to allow collections to be manipulated without regard for how the collections are implemented, provided of course that the implementations comply with the contracts.

The framework provides the following nine concrete implementations (classes) of the interfaces shown in Listing 1:

For example, the classes TreeSet and ArrayList are concrete implementations of the Collection interface as shown in the above list.  (Actually, they are concrete implementations of sub-interfaces of Collection.  JDK 1.3 doesn't provide any direct implementations of Collection).

A collection object instantiated from the class TreeSet and a collection object instantiated from the class ArrayList can each be viewed as being of the interface type Collection.

Methods having the same signatures can be used to manipulate either collection with confidence that the behavior of the method will be appropriate for the actual type of collection involved.

The framework also provides the following incomplete implementations of the core interfaces:

The purpose of these implementations is to provide you with a starting point for defining your own concrete implementations for more specialized collections.

What's Next?

In the next lesson, I will dig a little deeper into the details of the Java Collections Framework.  I will begin the next lesson with a sample program that illustrates the basic purpose of the core collection interfaces, which is to allow collections to be manipulated without regard for how the collections are implemented, as long as the implementation meets the contract of the interface.


Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

baldwin.richard@iname.com

-end-