Richard G Baldwin (512) 223-4758, baldwin@austin.cc.tx.us, http://www2.austin.cc.tx.us/baldwin/

Network Programming - The URL Class and the URLEncoder Class

Java Programming, Lesson # 554, Revised 02/20/98.

Preface

Students in Prof. Baldwin's Advanced Java Programming classes at ACC are responsible for knowing and understanding all of the material in this lesson.

The material in this lesson is extremely important. However, there is simply too much material to be covered in detail during lecture periods. Therefore, students in Prof. Baldwin's Advanced Java Programming classes at ACC will be responsible for studying this material on their own, and bringing any questions regarding the material to class for discussion.

Introduction

URL is an acronym for Uniform Resource Locator. It is also the name of a class in Java which is the primary subject for this lesson.

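A URL is a pointer to a particular resource at a particular location on the Internet. A URL specifies the protocol used to access the server, the name of the server, the port on the server (optional), the path and name of a file on the server, and an anchor or reference within that file (optional).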

Sometimes the name of the file can be omitted, in which case an HTTP server will usually append the file name index.html to the specified path and try to load that file. For example, we will write a simple HTTP server in a subsequent lesson that will attempt to deliver a file named index.html if the name of the file is omitted from the URL.

In addition to specifying the name of the file of interest, it is also sometimes possible to specify an anchor or reference that has been established inside the file. An example of how to take advantage of this capability was provided in an earlier lesson.

The general syntax of a URL is:
 
protocol://hostname[:port]/path/filename#ref
The port is optional, and is not normally required if you are accessing a server that provides the required service on a standard port.
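
As a rough illustration of this syntax, the sketch below hands a URL string to the URL class and reads the parts back, once with every optional part present and once with the port and reference omitted. The host www.example.com, the port 8080, the path, and the reference name are made-up values, and the class name UrlSyntaxSketch and its describe() helper are hypothetical. Recent JDK releases flag the URL constructors as deprecated, but they still work.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlSyntaxSketch {
  // Hypothetical helper: returns the component parts of a URL
  // string as one labeled line of text.
  static String describe(String spec) {
    try {
      URL url = new URL(spec);
      return "protocol=" + url.getProtocol()
          + " host=" + url.getHost()
          + " port=" + url.getPort()
          + " file=" + url.getFile()
          + " ref=" + url.getRef();
    } catch (MalformedURLException e) {
      return "malformed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Every part present: protocol://hostname:port/path/filename#ref
    System.out.println(describe(
        "http://www.example.com:8080/docs/guide.html#intro"));
    // Port and reference omitted: getPort() reports -1 and
    // getRef() reports null.
    System.out.println(describe(
        "http://www.example.com/docs/guide.html"));
  }
}
```

Note that no connection is made here. The URL class only parses the string, so the sketch runs offline.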

Java provides two different ways to do network programming. The two ways are associated with several socket classes and several URL classes. The socket classes will be the subject of subsequent lessons. This lesson is concerned primarily with the URL class.

URL programming occurs at a higher level than socket programming, and in theory represents some very powerful ideas. The powerful ideas represented by the advanced features of the URL class require an understanding of the development of protocol handlers and content handlers.

In theory, you can open a connection to a resource on the web, specified as a URL object, and simply invoke the getContent() method on that URL object. The content of the resource will then be downloaded and will appear as an object on the client machine, even if it requires an application protocol that didn't exist when you wrote the program, and contains content that you didn't understand when you wrote the program.

This description may be a bit of an overstatement, but it is pretty close to the claims being made. This is a powerful idea, which may or may not bear fruit in the future.

A previous lesson discussed this concept in more detail, and also discussed the extent to which it has borne, or probably will bear, fruit. We won't repeat that discussion here. Suffice it to say that we may need to see more progress in the area of cooperation between the major players in the Java arena before these concepts bear much fruit for the general internet-using public.

On the other hand, it should be possible to use some of these concepts with specialized intranet application programs (rather than browsers and applets). If these advanced concepts are of interest, take a look at a good book on network programming, such as Java Network Programming by Elliotte Rusty Harold to learn how to write protocol handlers and content handlers.

In addition to supporting the advanced concepts mentioned above, the URL class also provides a relatively mundane alternative way to connect one computer to another and transfer data on a stream basis. This capability is generally redundant with the capability provided by sockets. This lesson is primarily based on this capability.

First Sample Program

This program exercises four of the constructors and six of the methods of the URL class.

The program also illustrates the use of the URLEncoder class to convert a string containing spaces and other such characters into x-www-form-urlencoded string format.

The program was tested using JDK 1.1.3 under Win95.

The output from the program is shown below. You should view this output while reviewing the code in the program.
 
Use simple string constructor for host URL
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/

Use simple string constructor for host plus file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol, host, and file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol host, and file
 and int for port
http www2.austin.cc.tx.us 80 /baldwin null
http://www2.austin.cc.tx.us:80/baldwin

Construct absolute URL from host URL and relative URL
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html

Now use URLEncoder to create 
 x-www-form-urlencoded String
http%3a%2f%2fspace+.tilde%7e.plus%2b.com

Code Fragments from First Sample Program

The following code fragment is a method named display() that I wrote to illustrate the use of some of the methods of the URL class, and also to serve the practical needs of displaying information contained in a URL object. This method receives a URL object as a parameter and displays its component parts separated by a space. Then it uses the overridden toString() method of the URL class to display the contents of the URL object as a single String object.

As you can see, there is a method available for extracting each of the parts of a URL that were discussed earlier, with one exception: the getFile() method returns the path and the file name combined rather than separately. The getRef() method returns the information referred to earlier as an anchor or reference.
 
  void display(URL url){//method to display parts of URL
    System.out.print(url.getProtocol() + " ");
    System.out.print(url.getHost() + " ");
    System.out.print(url.getPort() + " ");
    System.out.print(url.getFile() + " ");
    System.out.println(url.getRef());
    
    //Now display entire URL as a string.
    System.out.println(url.toString());
    System.out.println();
  }//end display
Now that we know what the display() method does, we can examine the code in the main() method of the class.

The first fragment illustrates the instantiation of a URL object using the version of the constructor that expects to receive the URL in string format. I have removed the exception handling code from these code fragments for brevity. You can view the exception handling code in the program listing in the next section.

All of the remaining code shown in these fragments is contained in the main() method of the class.

This fragment begins by instantiating an object of the controlling class that can be used to access the display() method. Then it instantiates a new URL object using the string-parameter version of the constructor and passes that object to the display() method. As described above, the display() method accesses the component parts of the URL object and displays them separated by a space. The output from this code fragment was:
 
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/
In this case, the -1 indicates that there was no port specification, and the null, which is the value returned by getRef(), indicates that there was no anchor or reference in the URL passed to the constructor for the URL object. The / is the value returned by getFile().
 
    Url002 obj = new Url002();
    System.out.println(
             "Use simple string constructor for host URL");
    obj.display(new URL("http://www2.austin.cc.tx.us"));
The above code fragment is followed by several other code fragments which simply construct the URL object using other versions of the constructor which require the URL information in different formats. I am going to skip that code and move down to a more interesting case as shown in the following code fragment.

This code fragment uses a URL constructor that requires two parameters: a URL object and a String object. Here is part of the description of this constructor as extracted directly from the documentation for JDK 1.1.3.
 
public URL(URL context, String spec) throws MalformedURLException 

Creates a URL by parsing the specification spec within a specified context. If the context argument is not null and the spec argument is a partial URL specification, then any of the strings missing components are inherited from the context argument.

Let me try to explain this constructor in my own words. You can use this constructor to build an absolute URL from a relative URL.

Assume, for example, that you have written your own method to display HTML files the way that they are displayed by a browser rather than simply as a text file. Such files often contain links to relative URLs. In such a case, the link would be provided simply as a path and file name under the assumption that the path and file can be found relative to the base URL containing the HTML file.

According to Java Network Programming by Elliotte Rusty Harold,

The following code fragment constructs a base URL object pointing to

"http://www2.austin.cc.tx.us/baldwin/hello.html"

It then uses the version of the constructor currently under discussion to combine that base URL object with a relative URL given by

"/baldwin/Index.html"

to produce the following output URL object.
 
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html
Hopefully this example will illustrate how the constructor can combine a base URL object with a relative URL to produce a new URL object that is an absolute pointer to the relative URL.
 
      URL baseURL = new URL(
          "http://www2.austin.cc.tx.us/baldwin/hello.html");

      obj.display(new URL(baseURL,"/baldwin/Index.html"));
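
The relative URL in the fragment above begins with a slash, so it replaces the entire path of the base URL. As a sketch of how the same constructor treats a few other forms of the spec argument, the following resolves three different strings against the same base. The class name RelativeUrlSketch, the resolve() helper, and the paths /images/logo.gif and www.example.com are hypothetical and are not part of the sample program.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlSketch {
  // Hypothetical helper: resolves spec against base using the
  // URL(URL context, String spec) constructor.
  static String resolve(String base, String spec) {
    try {
      return new URL(new URL(base), spec).toString();
    } catch (MalformedURLException e) {
      return "malformed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    String base = "http://www2.austin.cc.tx.us/baldwin/hello.html";

    // A bare file name replaces only the file name of the base.
    // prints http://www2.austin.cc.tx.us/baldwin/Index.html
    System.out.println(resolve(base, "Index.html"));

    // A leading slash replaces the whole path of the base.
    // prints http://www2.austin.cc.tx.us/images/logo.gif
    System.out.println(resolve(base, "/images/logo.gif"));

    // A complete URL takes nothing from the base.
    // prints http://www.example.com/other.html
    System.out.println(
        resolve(base, "http://www.example.com/other.html"));
  }
}
```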
There is one more issue that we need to examine: the URLEncoder class. This class is provided to help deal with problems arising from spaces, special characters, non-alphanumeric characters, etc., that some operating systems may allow in file names but which may not be allowed in a URL.

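If you need to place a string that has these problems into a URL, you should first use the encode() method of the URLEncoder class to convert it. Be aware, however, that encode() also converts the colon and slash characters that separate the parts of a URL, as the output of the sample program shows. For that reason, it is normally applied to the individual pieces of data that go into a URL rather than to a complete URL string.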

This class provides a static method named encode() that encodes a string representation of a URL into a format called "x-www-form-urlencoded" format according to the following rules. This method returns a String object.
 
To convert a String, each character is examined in turn: 

The ASCII characters 'a' through 'z', 'A' through 'Z', and '0' through '9' remain the same. The space character ' ' is converted into a plus sign '+'. All other characters are converted into the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the lower 8-bits of the character.
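
The three rules quoted above can be sketched by hand as follows. This is only a literal reading of the quoted rules, not the JDK's own code, and the class name FormEncodeSketch is hypothetical. One difference shows up right away: this sketch converts a period into %2e, while the program output shown earlier in this lesson leaves periods unchanged, so the actual URLEncoder class evidently passes at least that one additional character through.

```java
class FormEncodeSketch {
  // Encodes s by a literal reading of the three quoted rules.
  static String encode(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean alphanumeric = (c >= 'a' && c <= 'z')
          || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9');
      if (alphanumeric) {
        out.append(c);            // rule 1: letters and digits unchanged
      } else if (c == ' ') {
        out.append('+');          // rule 2: space becomes a plus sign
      } else {
        int low = c & 0xFF;       // rule 3: lower 8 bits as %xy
        out.append('%');
        out.append(Character.forDigit(low >> 4, 16));
        out.append(Character.forDigit(low & 0xF, 16));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // prints http%3a%2f%2fspace+%2etilde%7e%2eplus%2b%2ecom
    System.out.println(encode("http://space .tilde~.plus+.com"));
  }
}
```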

The following code fragment encodes the string given into the following URL string: http%3a%2f%2fspace+.tilde%7e.plus%2b.com.

What you see here is a representation of the URL where the special characters are represented by their hex value preceded by a percent character, except that a space character is represented by a plus character and a plus character is represented by its hexadecimal value, %2b.
 
      System.out.println(URLEncoder.encode(
                        "http://space .tilde~.plus+.com"));
Elliotte Rusty Harold provides a URLDecoder class in his Java Network Programming book that takes a URL string in the format shown above and converts it back to its String representation. He gives us permission to include his class in our code, and I started to include it in this tutorial. However, I decided that the writing of such a class would make a good exercise for the student, so I decided to leave it out of the tutorial and let you write it on your own.
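
If you take up that exercise, it may help to have something to check your results against. JDK releases later than the 1.1.3 release used to test this lesson include a java.net.URLDecoder class, so the sketch below assumes such a later JDK. The class name RoundTripSketch and its helper methods are hypothetical, and the choice of UTF-8 as the character encoding is an assumption made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class RoundTripSketch {
  // Hypothetical helper: decodes an x-www-form-urlencoded string.
  static String decode(String encoded) {
    try {
      return URLDecoder.decode(encoded, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 not available", e);
    }
  }

  // Hypothetical helper: encodes a string and decodes it again.
  static String roundTrip(String original) {
    try {
      return decode(URLEncoder.encode(original, "UTF-8"));
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 not available", e);
    }
  }

  public static void main(String[] args) {
    // Decode the string produced by the first sample program.
    System.out.println(
        decode("http%3a%2f%2fspace+.tilde%7e.plus%2b.com"));
    // Encode and decode again; the original string should return.
    System.out.println(roundTrip("http://space .tilde~.plus+.com"));
  }
}
```

Comparing the output of your own decoder with the output of decode() for the same input is one simple way to test it.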

Program Listing for First Sample Program

A complete listing of the program follows.
 
/*File  Url002.java Copyright 1998, R.G.Baldwin
Revised 01/19/98

This program exercises all four of the constructors and
six of the methods of the URL class.

The program also illustrates the use of the URLEncoder
class to convert a string containing spaces and other
such characters into x-www-form-urlencoded string format.

Tested using JDK 1.1.3 under Win95.

Output from the program is shown below.

Use simple string constructor for host URL
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/

Use simple string constructor for host plus file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol, host, and file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol host, and file
 and int for port
http www2.austin.cc.tx.us 80 /baldwin null
http://www2.austin.cc.tx.us:80/baldwin

Construct absolute URL from host URL and relative URL
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html

Now use URLEncoder to create 
 x-www-form-urlencoded String
http%3a%2f%2fspace+.tilde%7e.plus%2b.com
**********************************************************/

import java.net.*;

class  Url002{
  public static void main(String[] args){
    Url002 obj = new Url002();
    try{
      System.out.println(
             "Use simple string constructor for host URL");
      obj.display(new URL("http://www2.austin.cc.tx.us"));
      System.out.println("Use simple string constructor " +
                                     "for host plus file");
      obj.display(new URL(
                    "http://www2.austin.cc.tx.us/baldwin"));
      System.out.println(
               "Use strings for protocol, host, and file");
      obj.display(new URL(
                 "http","www2.austin.cc.tx.us","/baldwin"));
      System.out.println("Use strings for protocol " +
                      "host, and file\n and int for port");
      obj.display(new URL(
              "http","www2.austin.cc.tx.us",80,"/baldwin"));
      System.out.println("Construct absolute URL from " +
                              "host URL and relative URL");
      URL baseURL = new URL(
          "http://www2.austin.cc.tx.us/baldwin/hello.html");
      obj.display(new URL(baseURL,"/baldwin/Index.html"));

      System.out.println("Now use URLEncoder to create " +
                       "\n x-www-form-urlencoded String");
      System.out.println(URLEncoder.encode(
                        "http://space .tilde~.plus+.com"));
    }catch(MalformedURLException e){
      System.out.println(e);
    }//end catch
  }//end main
  //-----------------------------------------------------//
  
  void display(URL url){//method to display parts of URL
    System.out.print(url.getProtocol() + " ");
    System.out.print(url.getHost() + " ");
    System.out.print(url.getPort() + " ");
    System.out.print(url.getFile() + " ");
    System.out.println(url.getRef());
    
    //Now display entire URL as a string.
    System.out.println(url.toString());
    System.out.println();
  }//end display
}//end class  Url002
//=======================================================//

Second Sample Program

This program illustrates using a URL object to connect to a URL and reading a file from that URL as an input stream. This is not a significant use of the URL class. As we will see later, we can and will do the same thing using sockets.

Your computer must be online for this program to run properly. Otherwise, it will throw an exception of type UnknownHostException.

The program was tested using JDK 1.1.3 under Win95.

The output from the program is a display of the contents of the file named Test01.html in a raw text format. Thus, all of the HTML tags are visible.

As of 01/19/98, the output (with the insertion of manual line breaks) was as shown below. However, I deleted some of the lines in the middle of the listing for brevity. I may modify the contents of this file from time to time, so if you compile and run this program later, you may get different results.
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>

(Note that several lines were removed from this
listing for brevity.)

<P>This test file is used to test certain network 
programming applications.</P>

</BODY>
</HTML>  

Code Fragments from Second Sample Program

The entire program is contained in the main() method. I will ignore the exception handling code while discussing this program. You can view that code in the program listing in the next section.

As you saw in the previous example program, the URL class has several different constructors, each of which can create a new URL object on the basis of URL information provided as parameters to the constructor. The constructors differ in terms of how the URL information is provided.

The first code fragment illustrates the version of the constructor that accepts the URL as a string. Other versions require the individual components of the URL to be passed as individual parameters.

This code fragment will create a URL object that points to the file named Test01.html in the directory named baldwin on the server at Austin Community College where I teach.

The URL object will not contain a port specification because I didn't provide a port number. Later when we use one of the methods of the URL class along with this URL object to make a connection, the connection will, by default, be made to port 80 which is the standard port for servers which support the HTTP protocol.

In other words, when the port is not provided (the URL object contains a port number of -1), the connection method of the URL class will use the protocol portion of the URL to decide which port to connect to.
 
URL url = new URL(
         "http://www2.austin.cc.tx.us/baldwin/Test01.html");
Once you have a URL object, there are a number of things that you can do with it, some exciting, and some not so exciting.

One of the things you can do with it is to open input and output streams that will be connected to the server software that is monitoring the port of interest. This is not particularly exciting because it essentially duplicates a capability of the socket programming classes that we will discuss later.

This code fragment opens a connection to the URL described by this URL object and returns an input stream object for reading data from the connection. This is the point where the port number defaults on the basis of the protocol specification in the URL object.

Be aware that only the url.openStream() portion of this statement has to do with URL processing. The remainder of the statement has to do with the more complex topic of I/O stream processing using the new reader and writer classes of JDK 1.1.

My thanks for clarifying this stream syntax go to Deitel and Deitel and their excellent book, Java How to Program, Second Edition.
 
BufferedReader htmlPage = 
       new BufferedReader(new InputStreamReader(
                              url.openStream()));
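
The port defaulting that takes place when the connection is opened can be sketched as follows. The sketch assumes a JDK later than the 1.1.3 release used to test this lesson, because it relies on the getDefaultPort() method of the URL class, which was added in a later release. The class name DefaultPortSketch, the effectivePort() helper, and the port 8080 in the second URL are made up for the illustration. No connection is made, so the sketch runs offline.

```java
import java.net.MalformedURLException;
import java.net.URL;

class DefaultPortSketch {
  // Hypothetical helper: returns the explicit port if the URL has
  // one, otherwise the default port for the URL's protocol.
  // Returns -1 if the string is not a well-formed URL.
  static int effectivePort(String spec) {
    try {
      URL url = new URL(spec);
      return url.getPort() != -1 ? url.getPort()
                                 : url.getDefaultPort();
    } catch (MalformedURLException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    // No port in the URL, so the HTTP default applies: prints 80
    System.out.println(effectivePort(
        "http://www2.austin.cc.tx.us/baldwin/Test01.html"));
    // An explicit port overrides the default: prints 8080
    System.out.println(effectivePort(
        "http://www2.austin.cc.tx.us:8080/baldwin/Test01.html"));
  }
}
```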
The remaining code in this program is completely straightforward. Data is read from the stream one line at a time and displayed as it is read. When there is no more data to be read from the stream, the readLine() method returns null, the loop ends, and the program terminates.
 
      while((dataLine = htmlPage.readLine()) != null){
        System.out.println(dataLine);
      }//end while loop
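
Because openStream() hides the protocol, the same reading loop works for any URL that the JDK knows how to open. As a sketch that runs without a network connection, the following writes a small temporary file and reads it back through a file: URL. The class name UrlReadSketch, the helpers readLines() and sampleFileUrl(), and the file contents are hypothetical. The try-with-resources statements and UncheckedIOException assume Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class UrlReadSketch {
  // Reads every line available from the URL and closes the stream.
  static List<String> readLines(URL url) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader page = new BufferedReader(
             new InputStreamReader(url.openStream()))) {
      String dataLine;
      while ((dataLine = page.readLine()) != null) {
        lines.add(dataLine);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  // Hypothetical helper: writes a four-line temporary file and
  // returns a file: URL that points to it.
  static URL sampleFileUrl() {
    try {
      File file = File.createTempFile("Test01", ".html");
      file.deleteOnExit();
      try (FileWriter writer = new FileWriter(file)) {
        writer.write("<HTML>\n<BODY>\n</BODY>\n</HTML>\n");
      }
      return file.toURI().toURL();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Display the file one line at a time, as in the loop above.
    for (String line : readLines(sampleFileUrl())) {
      System.out.println(line);
    }
  }
}
```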
As I mentioned earlier, I omitted the exception handling code from this discussion. There are many opportunities for exceptions when doing network programming, so you might want to pay attention to that part of the code in the program listing that follows.
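
As one example of those opportunities, the URL constructor itself throws MalformedURLException when it cannot make sense of the string that it receives. The sketch below tries three strings; the class name MalformedUrlSketch, the isWellFormed() helper, and the protocol name bogus are hypothetical. Constructing a URL object does not contact the network, so UnknownHostException cannot occur here. It arises only later, when a connection is opened.

```java
import java.net.MalformedURLException;
import java.net.URL;

class MalformedUrlSketch {
  // Hypothetical helper: returns true if the URL constructor
  // accepts spec, or displays the exception and returns false.
  static boolean isWellFormed(String spec) {
    try {
      new URL(spec);
      return true;
    } catch (MalformedURLException e) {
      System.out.println(e);
      return false;
    }
  }

  public static void main(String[] args) {
    // Accepted: prints true
    System.out.println(isWellFormed(
        "http://www2.austin.cc.tx.us/baldwin/Test01.html"));
    // Rejected because there is no protocol: prints false
    System.out.println(isWellFormed(
        "www2.austin.cc.tx.us/baldwin/Test01.html"));
    // Rejected because the protocol is unknown: prints false
    System.out.println(isWellFormed(
        "bogus://www2.austin.cc.tx.us/baldwin/Test01.html"));
  }
}
```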

Program Listing for Second Sample Program

A complete listing of the program follows.
 
/*File Url003.java Copyright 1998, R.G.Baldwin
Revised 01/19/98

Illustrates connecting to a URL and reading a file from
that URL as an input stream.

Computer must be online for this program to run properly.
Otherwise, it will throw an exception of type 
UnknownHostException.

Tested using JDK 1.1.3 under Win95.

The output from the program is a display of the contents
of the file named Test01.html in a text format.  As of
01/19/98, the output (with the insertion of manual line
breaks) was:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
   <TITLE></TITLE>
   <META NAME="Author" CONTENT="">
   <META NAME="GENERATOR" CONTENT="Mozilla/3.01Gold 
   (Win95; I) [Netscape]">
</HEAD>
<BODY>

<P><B><I>Richard G Baldwin (512) 223-4758, 
<A HREF="mailto:baldwin@austin.cc.tx.us">
baldwin@austin.cc.tx.us</A>,
<A HREF="http://www2.austin.cc.tx.us/baldwin/">
http://www2.austin.cc.tx.us/baldwin/</A></I></B></P>

<H3 ALIGN=CENTER>
<A HREF="http://www2.austin.cc.tx.us/baldwin/">
Test File</A></H3>

<P>This test file is used to test certain network 
programming applications.</P>

</BODY>
</HTML>  
**********************************************************/

import java.net.*;
import java.io.*;

class   Url003{
  public static void main(String[] args){
    String dataLine;
    try{
      //Get a URL object
      URL url = new URL(
         "http://www2.austin.cc.tx.us/baldwin/Test01.html");
          
      //Open a connection to this URL and return an 
      // input stream for reading from the connection.
      BufferedReader htmlPage = 
                 new BufferedReader(new InputStreamReader(
                                        url.openStream()));
                     
      //Read and display file one line at a time.
      while((dataLine = htmlPage.readLine()) != null){
        System.out.println(dataLine);
      }//end while loop
    }//end try
    catch(UnknownHostException e){
      System.out.println(e);
      System.out.println(
                        "Must be online to run properly.");
    }//end catch
    catch(MalformedURLException e){System.out.println(e);}
    catch(IOException e){System.out.println(e);}

  }//end main
}//end class Url003
//=======================================================//
-end-