The material in this lesson is extremely important. However, there is simply too much material to be covered in detail during lecture periods. Therefore, students in Prof. Baldwin's Advanced Java Programming classes at ACC will be responsible for studying this material on their own, and bringing any questions regarding the material to class for discussion.
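A URL is a pointer to a particular resource at a particular location on the Internet. A URL specifies the following:
the protocol used to access the server (such as http)
the name of the server (the host)
the port on the server (optional)
the path and name of a specific file on the server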
In addition to specifying the name of the file of interest, it is also sometimes possible to specify an anchor or reference that has been established inside the file. An example of how to take advantage of this capability was provided in an earlier lesson.
The general syntax of a URL is:
protocol://hostname[:port]/path/filename#ref
Java provides two different ways to do network programming. The two ways are associated with several socket classes and several URL classes. The socket classes will be the subject of subsequent lessons. This lesson is concerned primarily with the URL class.
URL programming occurs at a higher level than socket programming, and in theory represents some very powerful ideas. The powerful ideas represented by the advanced features of the URL class require an understanding of the development of protocol handlers and content handlers.
In theory, you can open a connection to a resource on the web, specified as a URL object, and simply invoke the getContent() method on that URL object. The content of the resource will then be downloaded and will appear as an object on the client machine, even if it requires an application protocol that didn't exist when you wrote the program, and contains content that you didn't understand when you wrote the program.
This description may be a bit of an overstatement, but it is pretty close to the claims being made. This is a powerful idea, which may or may not bear fruit in the future.
A previous lesson discussed this concept in more detail, and also discussed the extent to which it has or probably will bear fruit. We won't repeat that discussion here. Suffice it to say that we may need to see more progress in the area of cooperation between the major players in the Java arena before these concepts bear much fruit for the general internet-using public.
On the other hand, it should be possible to use some of these concepts with specialized intranet application programs (rather than browsers and applets). If these advanced concepts are of interest, take a look at a good book on network programming, such as Java Network Programming by Elliotte Rusty Harold to learn how to write protocol handlers and content handlers.
In addition to supporting the advanced concepts mentioned above, the URL class also provides a relatively mundane alternative way to connect one computer to another and transfer data on a stream basis. This capability is generally redundant with the capability provided by sockets. This lesson is primarily based on this capability.
The program also illustrates the use of the URLEncoder class to convert a string containing spaces and other such characters into a x-www-form-urlencoded string format.
The program was tested using JDK 1.1.3 under Win95.
The output from the program is shown below. You should view this output
while reviewing the code in the program.
Use simple string constructor for host URL
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/

Use simple string constructor for host plus file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol, host, and file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol host, and file
 and int for port
http www2.austin.cc.tx.us 80 /baldwin null
http://www2.austin.cc.tx.us:80/baldwin

Construct absolute URL from host URL and relative URL
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html

Now use URLEncoder to create
 x-www-form-urlencoded String
http%3a%2f%2fspace+.tilde%7e.plus%2b.com
As you can see, there is a method available for extracting each of the
parts of a URL that were discussed earlier. There is one exception
to this statement. The getFile() method returns the path and
the file name combined. The getRef() method returns the information
referred to as an anchor or reference earlier.
void display(URL url){//method to display parts of URL
  System.out.print(url.getProtocol() + " ");
  System.out.print(url.getHost() + " ");
  System.out.print(url.getPort() + " ");
  System.out.print(url.getFile() + " ");
  System.out.println(url.getRef());
  //Now display entire URL as a string.
  System.out.println(url.toString());
  System.out.println();
}//end display
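As a minimal sketch of the same accessors applied to a URL in which every part is present, the following class parses a URL string and returns its parts in the order used by display(). The class name UrlPartsSketch, the port 8080, and the anchor named top are assumptions chosen only for illustration and are not part of the lesson's programs. No connection is made; the string is only parsed.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the lesson's programs.
class UrlPartsSketch {

  // Returns the parts of a URL in the order used by display():
  // protocol, host, port, file (path plus file name), and ref.
  static String[] parts(String spec) {
    try {
      URL url = new URL(spec);
      return new String[] {
        url.getProtocol(),
        url.getHost(),
        String.valueOf(url.getPort()),
        url.getFile(),
        url.getRef()
      };
    } catch (MalformedURLException e) {
      // Re-thrown unchecked so that callers need no try block.
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // The port and the anchor are assumed values for illustration.
    String[] p = parts(
        "http://www2.austin.cc.tx.us:8080/baldwin/Index.html#top");
    for (String part : p) {
      System.out.println(part);
    }
  }
}
```

Here getFile() yields the path and file name together, while the anchor is available only through getRef().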
The first fragment illustrates the instantiation of a URL object using the version of the constructor that expects to receive the URL in string format. I have removed the exception handling code from these code fragments for brevity. You can view the exception handling code in the program listing in the next section.
All of the remaining code shown in these fragments is contained in the main() method of the class.
This fragment begins by instantiating an object of the controlling class
that can be used to access the display() method. Then it instantiates
a new URL object using the string-parameter version of the constructor
and passes that object to the display() method. As described above,
the display() method accesses the component parts of the URL
object and displays them separated by a space. The output from this
code fragment was:
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/
Url002 obj = new Url002();
System.out.println(
         "Use simple string constructor for host URL");
obj.display(new URL("http://www2.austin.cc.tx.us"));
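The exception handling that was removed from the fragment can be sketched roughly as follows. The class name UrlCheckSketch and the method name problemWith are hypothetical. The sketch relies only on the fact that the string constructor throws MalformedURLException when the string has no protocol or names a protocol that is not known; the constructor does not contact the host.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the lesson's programs.
class UrlCheckSketch {

  // Returns null when the string parses as a URL. Otherwise returns
  // the message carried by the MalformedURLException.
  static String problemWith(String spec) {
    try {
      new URL(spec);
      return null;
    } catch (MalformedURLException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // A complete URL string parses without complaint.
    System.out.println(problemWith("http://www2.austin.cc.tx.us"));
    // Leaving off the protocol causes the constructor to throw.
    System.out.println(problemWith("www2.austin.cc.tx.us"));
  }
}
```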
This code fragment uses a URL constructor that requires two parameters:
a URL object and a String object. Here is part of the description
of this constructor as extracted directly from the documentation for JDK
1.1.3.
public URL(URL context, String spec) throws MalformedURLException
Creates a URL by parsing the specification spec within a specified context. If the context argument is not null and the spec argument is a partial URL specification, then any of the strings missing components are inherited from the context argument.
Assume, for example, that you have written your own method to display HTML files the way that they are displayed by a browser rather than simply as a text file. Such files often contain links given as relative URLs. In such a case, the link would be provided simply as a path and file name under the assumption that the path and file can be found relative to the base URL containing the HTML file.
According to Java Network Programming by Elliotte Rusty Harold,
"http://www2.austin.cc.tx.us/baldwin/hello.html"
It then uses the version of the constructor currently under discussion to combine that base URL object with a relative URL given by
"/baldwin/Index.html"
to produce the following output URL object.
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html
URL baseURL = new URL(
        "http://www2.austin.cc.tx.us/baldwin/hello.html");
obj.display(new URL(baseURL,"/baldwin/Index.html"));
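The fragment above uses a relative URL that begins with a slash, so the entire path of the base URL is replaced. As a hedged sketch of the other common case, the class below (RelativeUrlSketch and its resolve method are hypothetical names) also resolves a bare file name, which is interpreted relative to the directory that contains the base file.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the lesson's programs.
class RelativeUrlSketch {

  // Combines a base URL string with a relative URL string using the
  // two-parameter (URL, String) constructor discussed above.
  static String resolve(String base, String relative) {
    try {
      URL baseURL = new URL(base);
      return new URL(baseURL, relative).toString();
    } catch (MalformedURLException e) {
      // Re-thrown unchecked so that callers need no try block.
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String base = "http://www2.austin.cc.tx.us/baldwin/hello.html";
    // A relative URL starting with a slash replaces the whole path.
    System.out.println(resolve(base, "/baldwin/Index.html"));
    // A bare file name replaces only hello.html in the base URL.
    System.out.println(resolve(base, "Index.html"));
  }
}
```

Both calls should produce the same absolute URL because hello.html already resides in the baldwin directory.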
If you need to create a URL object from a string that contains spaces or other such characters, you should first use the encode() method of the URLEncoder class to convert it into a proper URL string.
This class provides a static method named encode() that encodes
a string representation of a URL into a format called "x-www-form-urlencoded"
format according to the following rules. This method returns a String
object.
To convert a String, each character is examined in turn:
The ASCII characters 'a' through 'z', 'A' through 'Z', and '0' through '9' remain the same.
The space character ' ' is converted into a plus sign '+'.
All other characters are converted into the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the lower 8-bits of the character.
What you see here is a representation of the URL where the special
characters are represented by their hex value preceded by a percent character,
except that a space character is represented by a plus character and a
plus character is represented by its hexadecimal value, %2b.
System.out.println(URLEncoder.encode(
                  "http://space .tilde~.plus+.com"));
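In JDK versions later than the one used for this lesson, the one-argument encode() method was deprecated in favor of an overload that also takes the name of a character encoding, and a companion URLDecoder class reverses the conversion. The following is a minimal sketch under those assumptions; the class name FormEncodingSketch and the choice of UTF-8 are mine and do not come from the lesson. The hexadecimal digits in the result may appear in upper case rather than the lower case shown in the 1998 output. In practice this encoding is usually applied to individual form values rather than to an entire URL string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the lesson's programs.
class FormEncodingSketch {

  // Converts text into x-www-form-urlencoded format using UTF-8.
  static String encode(String text) {
    try {
      return URLEncoder.encode(text, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  // Reverses encode(), turning '+' back into a space and each
  // %xy sequence back into the character it represents.
  static String decode(String encoded) {
    try {
      return URLDecoder.decode(encoded, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String original = "http://space .tilde~.plus+.com";
    String encoded = encode(original);
    System.out.println(encoded);
    System.out.println(decode(encoded));
  }
}
```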
/*File Url002.java Copyright 1998, R.G.Baldwin
Revised 01/19/98

This program exercises all four of the constructors and
six of the methods of the URL class.

The program also illustrates the use of the URLEncoder
class to convert a string containing spaces and other
such characters into x-www-form-urlencoded string format.

Tested using JDK 1.1.3 under Win95.

Output from the program is shown below.

Use simple string constructor for host URL
http www2.austin.cc.tx.us -1 / null
http://www2.austin.cc.tx.us/

Use simple string constructor for host plus file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol, host, and file
http www2.austin.cc.tx.us -1 /baldwin null
http://www2.austin.cc.tx.us/baldwin

Use strings for protocol host, and file
 and int for port
http www2.austin.cc.tx.us 80 /baldwin null
http://www2.austin.cc.tx.us:80/baldwin

Construct absolute URL from host URL and relative URL
http www2.austin.cc.tx.us -1 /baldwin/Index.html null
http://www2.austin.cc.tx.us/baldwin/Index.html

Now use URLEncoder to create
 x-www-form-urlencoded String
http%3a%2f%2fspace+.tilde%7e.plus%2b.com
**********************************************************/

import java.net.*;

class Url002{
  public static void main(String[] args){
    Url002 obj = new Url002();
    try{
      System.out.println(
               "Use simple string constructor for host URL");
      obj.display(new URL("http://www2.austin.cc.tx.us"));

      System.out.println("Use simple string constructor "
                                 + "for host plus file");
      obj.display(new URL(
                  "http://www2.austin.cc.tx.us/baldwin"));

      System.out.println(
                 "Use strings for protocol, host, and file");
      obj.display(new URL(
              "http","www2.austin.cc.tx.us","/baldwin"));

      System.out.println("Use strings for protocol "
              + "host, and file\n and int for port");
      obj.display(new URL(
            "http","www2.austin.cc.tx.us",80,"/baldwin"));

      System.out.println("Construct absolute URL from "
                          + "host URL and relative URL");
      URL baseURL = new URL(
          "http://www2.austin.cc.tx.us/baldwin/hello.html");
      obj.display(new URL(baseURL,"/baldwin/Index.html"));

      System.out.println("Now use URLEncoder to create "
                     + "\n x-www-form-urlencoded String");
      System.out.println(URLEncoder.encode(
                        "http://space .tilde~.plus+.com"));
    }catch(MalformedURLException e){
      System.out.println(e);
    }//end catch
  }//end main
  //-----------------------------------------------------//

  void display(URL url){//method to display parts of URL
    System.out.print(url.getProtocol() + " ");
    System.out.print(url.getHost() + " ");
    System.out.print(url.getPort() + " ");
    System.out.print(url.getFile() + " ");
    System.out.println(url.getRef());
    //Now display entire URL as a string.
    System.out.println(url.toString());
    System.out.println();
  }//end display
}//end class Url002
//=======================================================//
Your computer must be online for this program to run properly. Otherwise, it will throw an exception of type UnknownHostException.
The program was tested using JDK 1.1.3 under Win95.
The output from the program is a display of the contents of the file named Test01.html in a raw text format. Thus, all of the HTML tags are visible.
As of 01/19/98, the output (with the insertion of manual line breaks)
was as shown below. However I deleted some of the lines in the middle of
the listing for brevity. I may modify the contents of this file from time
to time, so if you compile and run this program later, you may get different
results.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
Note that several lines were removed from this listing for brevity.
<P>This test file is used to test certain network
programming applications.</P>
</BODY>
</HTML>
As you saw in the previous example program, the URL class has several different constructors, each of which can create a new URL object on the basis of URL information provided as parameters to the constructor. The constructors differ in terms of how the URL information is provided.
The first code fragment illustrates the version of the constructor that accepts the URL as a string. Other versions require the individual components of the URL to be passed as individual parameters.
This code fragment will create a URL object that points to the file named Test01.html in the directory named baldwin on the server at Austin Community College where I teach.
The URL object will not contain a port specification because I didn't provide a port number. Later when we use one of the methods of the URL class along with this URL object to make a connection, the connection will, by default, be made to port 80 which is the standard port for servers which support the HTTP protocol.
In other words, when the port is not provided (the URL object
contains a port number of -1), the connection method of the URL class
will use the protocol portion of the URL to decide which port
to connect to.
URL url = new URL(
       "http://www2.austin.cc.tx.us/baldwin/Test01.html");
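The relationship between the stored port number of -1 and the port that is actually used can be sketched as follows. This sketch assumes the getDefaultPort() method, which was added to the URL class in JDK versions later than the one used for this lesson. The class name DefaultPortSketch, the method name effectivePort, and the port 8080 are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of the lesson's programs.
class DefaultPortSketch {

  // Returns the port given in the URL string if there is one.
  // Otherwise returns the default port for the URL's protocol.
  static int effectivePort(String spec) {
    try {
      URL url = new URL(spec);
      int port = url.getPort(); // -1 when no port was specified
      return (port != -1) ? port : url.getDefaultPort();
    } catch (MalformedURLException e) {
      // Re-thrown unchecked so that callers need no try block.
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // No port in the string, so the http default applies.
    System.out.println(effectivePort(
        "http://www2.austin.cc.tx.us/baldwin/Test01.html"));
    // An explicit port (an assumed value) overrides the default.
    System.out.println(effectivePort(
        "http://www2.austin.cc.tx.us:8080/baldwin/Test01.html"));
  }
}
```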
One of the things you can do with a URL object is to open input and output streams that will be connected to the server software that is monitoring the port of interest. This is not particularly exciting because it essentially duplicates a capability of the socket programming classes that we will discuss later.
This code fragment opens a connection to the URL described by this URL object and returns an input stream object for reading data from the connection. This is the point where the port number defaults on the basis of the protocol specification in the URL object.
Be aware that only the url.openStream() portion of this statement has to do with URL processing. The remainder of the statement has to do with the more complex topic of I/O stream processing using the new reader and writer classes of JDK 1.1.
My thanks for clarifying this stream syntax go to Deitel and Deitel
and their excellent book, Java How to Program, Second Edition.
BufferedReader htmlPage =
            new BufferedReader(new InputStreamReader(
                                      url.openStream()));
while((dataLine = htmlPage.readLine()) != null){
  System.out.println(dataLine);
}//end while loop
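The read loop above never closes the stream. As a hedged sketch of one way to do that in later versions of Java (version 7 or later is assumed for the try-with-resources statement), the class below collects the lines in a list and closes the reader automatically. So that the sketch can run without being online, it reads through a file: URL that points to a temporary file rather than the http URL used in this lesson. The class name UrlReadSketch, the method names, and the temporary file are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the lesson's programs.
class UrlReadSketch {

  // Reads every line available from the URL. The reader, and the
  // stream beneath it, are closed when the try block exits.
  static List<String> readLines(URL url) throws IOException {
    List<String> lines = new ArrayList<String>();
    try (BufferedReader htmlPage = new BufferedReader(
             new InputStreamReader(url.openStream()))) {
      String dataLine;
      while ((dataLine = htmlPage.readLine()) != null) {
        lines.add(dataLine);
      }
    }
    return lines;
  }

  // Writes two lines to a temporary file and reads them back
  // through a file: URL, so no network connection is required.
  static List<String> demo() {
    try {
      File temp = File.createTempFile("UrlReadSketch", ".html");
      temp.deleteOnExit();
      try (PrintWriter out = new PrintWriter(new FileWriter(temp))) {
        out.println("<HTML>");
        out.println("</HTML>");
      }
      return readLines(temp.toURI().toURL());
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String line : demo()) {
      System.out.println(line);
    }
  }
}
```

The readLines() method makes no reference to the protocol, so the same method could be passed a URL object for an http resource when the computer is online.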
/*File Url003.java Copyright 1998, R.G.Baldwin
Revised 01/19/98

Illustrates connecting to a URL and reading a file from
that URL as an input stream.

Computer must be online for this program to run properly.
Otherwise, it will throw an exception of type
UnknownHostException.

Tested using JDK 1.1.3 under Win95.

The output from the program is a display of the contents
of the file named Test01.html in a text format.

As of 01/19/98, the output (with the insertion of manual
line breaks) was:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
   <TITLE></TITLE>
   <META NAME="Author" CONTENT="">
   <META NAME="GENERATOR" CONTENT="Mozilla/3.01Gold
     (Win95; I) [Netscape]">
</HEAD>
<BODY>
<P><B><I>Richard G Baldwin (512) 223-4758,
<A HREF="mailto:baldwin@austin.cc.tx.us">
baldwin@austin.cc.tx.us</A>,
<A HREF="http://www2.austin.cc.tx.us/baldwin/">
http://www2.austin.cc.tx.us/baldwin/</A></I></B></P>
<H3 ALIGN=CENTER>
<A HREF="http://www2.austin.cc.tx.us/baldwin/">
Test File</A></H3>
<P>This test file is used to test certain network
programming applications.</P>
</BODY>
</HTML>
**********************************************************/

import java.net.*;
import java.io.*;

class Url003{
  public static void main(String[] args){
    String dataLine;
    try{
      //Get a URL object
      URL url = new URL(
        "http://www2.austin.cc.tx.us/baldwin/Test01.html");

      //Open a connection to this URL and return an
      // input stream for reading from the connection.
      BufferedReader htmlPage =
            new BufferedReader(new InputStreamReader(
                                      url.openStream()));

      //Read and display file one line at a time.
      while((dataLine = htmlPage.readLine()) != null){
        System.out.println(dataLine);
      }//end while loop
    }//end try
    catch(UnknownHostException e){
      System.out.println(e);
      System.out.println(
                       "Must be online to run properly.");
    }//end catch
    catch(MalformedURLException e){System.out.println(e);}
    catch(IOException e){System.out.println(e);}
  }//end main
}//end class Url003
//=======================================================//