... in Java by Richard G Baldwin

Richard G Baldwin (512) 223-4758, baldwin@austin.cc.tx.us, http://www2.austin.cc.tx.us/baldwin/

Network Programming - General Information

Java Programming, Lesson # 550, Revised 02/20/98.

Preface
Introduction
Background Information

Communication Protocol
Network Layers
Clients and Servers
IP, TCP, and UDP

IP Addresses
Domain Names
What is Your IP Address
Ports
Firewalls
Proxy Servers
Standards and Protocols
URL

Socket Classes and URL Class

Socket Programming
URL Programming

Preface

Students in Prof. Baldwin's Advanced Java Programming classes at ACC are responsible for knowing and understanding all of the material in this lesson.

This lesson and the next several lessons will concentrate on network programming.

Introduction

One of the Java books that I have read recently makes the following analogy (or one very similar). Just because you may know how to speak conversational French doesn't mean that you know how to interpret an autopsy report written in French. In order to interpret the autopsy report, you must also know a good deal about the meaning of the medical terms used in such reports.

A similar situation exists for networking. It isn't very difficult to learn how to use the Java language to implement some network operations. However, in order to achieve much depth in this area, you probably also need to know something about the many other technical aspects of networking.

Many good books have been written on the technical details of networking, and you are referred to one or more of those books to gain an in-depth knowledge of networking. In particular, I would refer you to Java Network Programming by Elliotte Rusty Harold.

In addition there are many other books that contain excellent sections on network programming. I would recommend that you take a look at the following:

Exploring Java by Patrick Niemeyer & Joshua Peck

Just Java 1.1 and Beyond, Third Edition by Peter van der Linden

Java Primer Plus by Tyma, Torok, and Downing

Java How to Program by Deitel and Deitel

For the most part, this and the next few lessons will be restricted to how you can use the programming capabilities of Java to write and execute network programs and won't attempt to go into overall network programming in any depth. However, a minimal amount of background information will be required, so we will attempt to provide that background in this lesson. Subsequent lessons will use this background along with the network programming capabilities of Java to write some simple, but interesting networking programs.

Background Information

For our purposes, a network is a group of computers and other devices that are connected in some fashion so as to be able to exchange data.

Each of the devices on the network can be thought of as a node, and each node has a unique address. The manner in which addresses are assigned will vary from one type of network to another, but in all cases, the address of each device must be unique so as to distinguish it from the other devices.

Addresses are numeric quantities that are easy for computers to work with, but are not easy for humans to remember. Therefore, some networks also provide names that humans can more easily remember than numbers.

Modern networks transfer data using a concept known as packet switching. This means that the data are encapsulated into packets which are transferred from the source to the destination. At the destination, it is necessary to extract the data from one or more packets and use it to reconstruct the original message.
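The idea of splitting a message into packets and rebuilding it at the destination can be illustrated with a small sketch. This is not how IP itself is implemented; the Packet class, the sequence numbers, and the eight-byte payload size are all hypothetical choices made only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PacketDemo {
    /** Hypothetical packet: a sequence number plus a slice of the message bytes. */
    static final class Packet {
        final int sequence;
        final byte[] payload;

        Packet(int sequence, byte[] payload) {
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    /** Splits a message into packets carrying at most maxPayload bytes each. */
    static List<Packet> split(String message, int maxPayload) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        List<Packet> packets = new ArrayList<>();
        int sequence = 0;
        for (int offset = 0; offset < bytes.length; offset += maxPayload) {
            int end = Math.min(offset + maxPayload, bytes.length);
            packets.add(new Packet(sequence++, Arrays.copyOfRange(bytes, offset, end)));
        }
        return packets;
    }

    /** Rebuilds the message, sorting by sequence number in case packets arrived out of order. */
    static String reassemble(List<Packet> received) {
        List<Packet> ordered = new ArrayList<>(received);
        ordered.sort(Comparator.comparingInt(p -> p.sequence));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Packet p : ordered) {
            out.write(p.payload, 0, p.payload.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Packet> packets = split("Hello, packet switching!", 8);
        Collections.reverse(packets); // simulate out-of-order arrival
        System.out.println(packets.size() + " packets");
        System.out.println(reassemble(packets));
    }
}
```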

Communication Protocol

In order for two or more computers connected to a network to be able to exchange data in an orderly manner, they must adhere to a mutually acceptable communication protocol. The protocol defines the rules by which they communicate.

Teaching your children to say please and thank you involves teaching them something about a protocol. If they occasionally forget to say please, however, they will probably get the cookie anyway.

If a computer protocol requires the participating computers to say please, and they forget to say please, they probably won't get the cookie.

There are many protocols available. For example, the HTTP protocol defines how web browsers and servers communicate and the SMTP protocol defines how email is transferred (we will write programs that implement part of the HTTP and SMTP protocols).

Note here that I have been discussing application protocols that operate at the surface level. We will also be making mention of lower-level protocols that operate below the application level. Fortunately, as high-level Java programmers, we don't have to be too concerned about the lower-level protocols. We'll let the systems people worry about them.

Network Layers

Networks are logically separated into layers ranging from the Application Layer at the top to the Physical Layer at the bottom. The technical details of network layering are beyond the scope of this lesson. Fortunately, you will be able to write useful network programs using Java without understanding the details of network layering.

The Application Layer is the layer that delivers data to the user. The layers below that are involved with getting data from the Application Layer at one end of the conversation to the Application Layer at the other end. For the most part, we will be concerned only with the Application Layer.

Clients and Servers

In these lessons, we will be concerned with networked communications that involve client computers and server computers. How do we know which is which? For the purposes of our studies, it will be sufficient to say that the client always initiates the conversation, and the server waits and listens for a client to initiate a conversation.

IP, TCP, and UDP

We need to know something about the following acronyms:

IP

IP, which stands for Internet Protocol, is the protocol that will be involved below the Application Layer to move our data between a client and a server. Beyond knowing that it exists, we probably don't need to concern ourselves with the fact that IP is being used.

In fact, in some situations, some other protocol may be used to move our data between a client and a server. As long as it works, we really don't care too much.

In a nutshell, IP is a network protocol that moves packets of data from a source to a destination. As the name implies, this is the protocol normally used on the Internet.

TCP

It is sometimes important to be able to have confidence that all packets that make up a message arrive at the destination undamaged and in proper order.

The Transmission Control Protocol (TCP) was added to IP to give each end of a connection the ability to acknowledge receipt of IP packets and to request retransmission of lost packets. Also TCP makes it possible to put the packets back together at the destination in the same order that they were sent.

Therefore, you will often hear people using both acronyms in the same breath, as in TCP/IP. The two work together to provide a reliable method of encapsulating a message into data packets, sending the packets to a destination, and reconstructing the message from the packets at the destination.

UDP

Sometimes it may not be important that all the packets arrive at the destination or that they arrive in the proper order. Further, sometimes, you may not want to incur the time delays and overhead cost associated with those guarantees.

For example, if one computer is sending date and time information to another computer every 100 milliseconds, and the data in the packets is displayed on a digital clock as it is received, you might prefer that each packet make the trip as quickly as possible even if that means that occasionally a packet will be lost or damaged.

The User Datagram Protocol (UDP) is available to support this type of operation. UDP is often referred to as an unreliable protocol because there is no guarantee that a series of packets will arrive in the right order, or that they will arrive at all.
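As a rough sketch of what UDP communication looks like in Java, the following sends a single datagram from one DatagramSocket to another on the same machine. The class and method names are hypothetical, and the loopback address, the operating-system-chosen port, and the two-second timeout are assumptions made so that the example runs without a second computer. Note that nothing in the code acknowledges receipt; if the datagram were lost, the receiver would simply time out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpLoopbackDemo {
    /** Sends one datagram to a receiver on the loopback interface and returns what arrived. */
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // give up after two seconds

            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[512];
            DatagramPacket in = new DatagramPacket(buffer, buffer.length);
            receiver.receive(in); // throws SocketTimeoutException if nothing arrives
            return new String(in.getData(), in.getOffset(), in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("12:00:00"));
    }
}
```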

As Java programmers, we have the choice of TCP or UDP, and we need to know enough about the characteristics of each to be able to make informed choices between them.

IP Addresses

We don't really need to know very much about IP to be able to use it, but we do need to know about the addressing scheme used in IP.

Every computer attached to an IP network has a unique four-byte (32-bit) address.

Thirty-two bits are sufficient to define a large number of unique addresses, but the manner in which addresses are allocated is wasteful, and many of the addresses that have been allocated are not being used.

Efforts are underway to expand the number of possible unique addresses to a much larger number. The planned number is the number of unique addresses that can be represented with a 128-bit address. Elliotte Rusty Harold reports it to be 1.6043703E32 in his book entitled Java Network Programming, but a direct calculation of 2 raised to the 128th power gives approximately 3.4E38.
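The number of distinct values that an address of a given width can represent is 2 raised to the number of bits, and it is easy to calculate directly. A minimal sketch using BigInteger follows (the class and method names are hypothetical).

```java
import java.math.BigInteger;

class AddressSpace {
    /** Number of distinct addresses representable in the given number of bits. */
    static BigInteger addressCount(int bits) {
        return BigInteger.ONE.shiftLeft(bits); // 2 to the power of bits
    }

    public static void main(String[] args) {
        System.out.println("32-bit:  " + addressCount(32));  // 4294967296
        System.out.println("128-bit: " + addressCount(128)); // 340282366920938463463374607431768211456
    }
}
```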

For human consumption, we usually convert the value of each of the four bytes to an unsigned decimal value and display them connected by periods to make them easier to remember. For example, as near as I can tell, as of this writing, the IP address of www.javasoft.com is 204.160.241.98.
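The conversion from four bytes to the dotted form can be sketched as follows. The class and method names are hypothetical. The only subtle point is that Java bytes are signed, so each byte must be masked to obtain its unsigned value.

```java
class DottedQuad {
    /** Formats four address bytes as unsigned decimal values joined by periods. */
    static String format(byte[] address) {
        if (address.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(address[i] & 0xFF); // mask because Java bytes are signed
        }
        return sb.toString();
    }

    /** Packs the four bytes into one 32-bit quantity, held in a long so it stays unsigned. */
    static long toLong(byte[] address) {
        long value = 0;
        for (byte b : address) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] address = {(byte) 204, (byte) 160, (byte) 241, 98};
        System.out.println(format(address)); // 204.160.241.98
        System.out.println(toLong(address));
    }
}
```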

Domain Names

What do I mean by www.javasoft.com?

Even though we can do some tricks to make the numeric IP addresses easier to remember, humans don't do a very good job of remembering long strings of numbers. Humans remember words and names better. Therefore, most IP addresses have a corresponding name known as a domain name. The domain name for the IP address 204.160.241.98 is www.javasoft.com.

The Domain Name System (DNS) was developed to translate between IP addresses and domain names. Whenever you log your browser onto the Internet and attempt to connect to a server using its domain name, the browser first communicates with a DNS server to learn the corresponding numeric IP address. The numeric IP address (and not the domain name) is encapsulated into the data packets and used by the Internet Protocol to route those packets from the source to the destination.

We will learn how to use the Java InetAddress class to find the domain name corresponding to an IP address, and to find the IP address corresponding to a domain name.
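As a preview, here is a minimal sketch of both lookups using InetAddress.getByName(), getHostAddress(), and getHostName(). The wrapper class and method names are hypothetical, and the default of localhost is an assumption chosen so that the program can run without an Internet connection. What the reverse lookup returns depends on your resolver; if no name is found, getHostName() simply returns the address text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LookupDemo {
    /** Returns the dotted-decimal address for a host name, or null if it cannot be resolved. */
    static String addressOf(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Returns the name the resolver reports for an address, or null if the text is not usable. */
    static String nameOf(String address) {
        try {
            return InetAddress.getByName(address).getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        String address = addressOf(host);
        System.out.println(host + " -> " + address);
        if (address != null) {
            System.out.println(address + " -> " + nameOf(address));
        }
    }
}
```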

What is Your IP Address

Do you have an IP address and a domain name?

If (like me) you use a commercial Internet Service Provider (ISP), you really don't have a fixed IP address or a fixed domain name. Rather, the ISP has a block of IP addresses reserved. When you dial up the ISP and log onto the Internet, the ISP temporarily assigns an IP address to you for the duration of that connection. If you disconnect and reconnect, chances are good that you will get a different IP address for that second session.

Ports

Each server computer that you may connect to will be logically organized into ports. These are not physical ports in the sense of the printer port on the back of your computer. Rather, they are simply logical sub-addresses which you provide to the operating system on the server so that the operating system can cause the appropriate server software to "answer the call." We will write a simple server software package that will service several different ports on independent threads.

One of my Java books refers to the IP address as being analogous to the telephone number of a company and the port to be analogous to the employee's telephone extension within that company.

Theoretically, there are 65,535 available ports. Port numbers between 1 and 1023 are predefined to be used for certain standard services. For example, if you want to connect with server software that communicates using the HTTP protocol, you would normally connect to port 80 on the server of interest.

Similarly, if you want to connect to a port that will tell you the time, you should connect to port 13. If you want to connect to a port that will simply echo whatever you send to it (usually for test purposes), you should connect to port 7. We will write Java applications that connect to all of these ports.
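The port numbers mentioned in this lesson can be collected into a small lookup sketch. The class is hypothetical and deliberately lists only a handful of services. Port 25 is included because it is the standard port for SMTP, which comes up later in this lesson.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WellKnownPorts {
    static final int MAX_PORT = 65535; // port numbers are 16-bit values

    private static final Map<Integer, String> SERVICES = new LinkedHashMap<>();
    static {
        SERVICES.put(7, "echo");
        SERVICES.put(13, "daytime");
        SERVICES.put(25, "SMTP");
        SERVICES.put(80, "HTTP");
    }

    /** True if the number can be used as a port to connect to. */
    static boolean isValid(int port) {
        return port >= 1 && port <= MAX_PORT;
    }

    /** True if the port falls in the range predefined for standard services. */
    static boolean isWellKnown(int port) {
        return port >= 1 && port <= 1023;
    }

    /** Name of the service listed for the port, or "unknown" if this sketch has no entry. */
    static String serviceOn(int port) {
        return SERVICES.getOrDefault(port, "unknown");
    }

    public static void main(String[] args) {
        for (int port : new int[] {7, 13, 25, 80, 8080}) {
            System.out.println(port + ": " + serviceOn(port)
                    + (isWellKnown(port) ? " (predefined range)" : ""));
        }
    }
}
```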

In the interest of brevity, I am not going to attempt to provide a list of ports. However, you should be able to find all the information you might need about port numbers and the services they support by starting your favorite WWW search engine and searching for "well known ports".

Firewalls

You may have heard about firewalls. A firewall is the common name given to the equipment and associated software that is used to insulate the network inside of a company from the Internet at large outside the company. Typically, the firewall will restrict the degree to which computers inside the company can communicate with the Internet for security and other reasons.

Proxy Servers

You may also have heard about proxy servers. A proxy server acts as an interface between computers inside the company and the Internet at large.

Oftentimes the proxy server will have the ability to cache web pages for limited periods of time. For example, if ten people inside the company attempt to connect to the same Internet server and download the same web page within a (hopefully) short period of time, that page may be saved on the proxy server on the first attempt and then delivered to the next nine people without re-acquiring it from the outside web server. This can significantly improve delivery time and reduce network traffic into and out of the company. It can also result in the delivery of stale pages in some cases.

Standards and Protocols

At some point, you may be interested in obtaining technical information about Internet standards and protocol specifications. A good place to start looking for such information is http://ds.internic.net. Another good place to look is http://www.w3.org/pub/WWW/Protocols/.

These two URLs will probably provide you with enough reading material to keep you busy for a while, and will also probably provide links where you can obtain additional information.

URL

URL is an acronym for Uniform Resource Locator (it is also the name of a class in Java). A URL is a pointer to a particular resource at a particular location on the Internet. A URL specifies the following:

protocol used to access the server (such as http),
name of the server,
port on the server (optional),
path,
name of a specific file on the server (sometimes optional), and
anchor or reference point within the file (optional).

Sometimes the name of the file can be omitted, in which case an HTTP browser will usually append the file name index.html to the specified path and try to load that file. For example, as of this writing, you can connect to my home page on the HTTP server at Austin Community College using either of the following URLs.

http://www2.austin.cc.tx.us/baldwin/index.html

http://www2.austin.cc.tx.us/baldwin/

In addition to specifying the name of the file of interest, it is also sometimes possible to specify an anchor or reference that has been established inside the file.

For example, as of this writing, the file named index.html on my web page at the college contains several anchors inside the file. One of those anchors is identified as KnockKnock.

If you would like to cause your browser to download the file named index.html and then go directly to the anchor where the "KnockKnock" applet is located in the file, point your browser to the following URL:

http://www2.austin.cc.tx.us/baldwin/index.html#KnockKnock

(Please be aware that I occasionally make changes to the material on my web page, so you may get different behavior if you perform the above experiments at some time in the future.)

The general syntax of a URL is:

protocol://hostname[:port]/path/filename#ref

The port is optional, and is not normally required if you are accessing a server that provides the required service on a standard port. The browser (or other software being used to connect) should know which port supports the specified protocol and should connect to that port by default.

You could fill in the optional port number and use the following URL to access the KnockKnock reference on my page on port 80 (if you want to do some extra typing).

http://www2.austin.cc.tx.us:80/baldwin/index.html#KnockKnock

However, if you were to change the 80 to a 25, you would not be able to connect and successfully communicate with the server because the server does not support the HTTP protocol on port 25.
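The Java URL class can take a URL apart into the pieces described above. The following is a minimal sketch; the wrapper class and the effectivePort() helper are hypothetical. It builds the URL through java.net.URI because the URL(String) constructor is deprecated in recent JDK releases. Note that getPort() returns -1 when the URL does not specify a port, and getDefaultPort() reports the standard port for the protocol.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlParts {
    /** Parses the text into a URL, rejecting text that is not a usable URL. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Port that would actually be used: the explicit one, or the default for the protocol. */
    static int effectivePort(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    public static void main(String[] args) {
        URL url = parse("http://www2.austin.cc.tx.us/baldwin/index.html#KnockKnock");
        System.out.println("protocol: " + url.getProtocol());
        System.out.println("host:     " + url.getHost());
        System.out.println("port:     " + url.getPort() + " (effective " + effectivePort(url) + ")");
        System.out.println("path:     " + url.getPath());
        System.out.println("ref:      " + url.getRef());
    }
}
```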

Socket Classes and URL Class

Java provides two different approaches for doing network programming, at least insofar as the web is concerned. The two approaches are associated with

the Socket, DatagramSocket, and ServerSocket classes, and
the URL, URLEncoder, and URLConnection classes.

Socket Programming

Socket programming primarily makes use of two socket classes named Socket and DatagramSocket along with the ServerSocket class. The first two socket classes represent TCP and UDP communications respectively.

Generally, the two socket classes are used to implement both clients and servers, while the ServerSocket class is only used to implement servers. We will see numerous examples of socket programming in this series of lessons.

Socket programming provides a low-level approach by which you can connect two computers for the exchange of data. One of those is generally considered to be the client while the other is considered to be the server.

Although the distinction between client and server is becoming less clear each day, there is one fundamental distinction that is inherent in the Java programming language. The client initiates conversations with servers. Servers block and wait for a client to initiate a conversation.
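The two roles can be seen in a small sketch in which both ends run in the same program. The class and method names are hypothetical, and the loopback address, the operating-system-chosen port, and the single-line echo behavior are assumptions made to keep the example short. The server side blocks in accept() until the client side initiates the conversation by constructing a Socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class EchoDemo {
    /** Starts a one-shot echo server on a free loopback port, sends one line, returns the reply. */
    static String echoOnce(String line) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Server role: wait for a client, then send back the one line it sends.
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();

            // Client role: initiate the conversation, send a line, and read the reply.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                client.setSoTimeout(2000); // do not wait forever for the reply
                out.println(line);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("please"));
    }
}
```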

The governing application-level protocol will determine what happens after the connection is made and the conversation has begun. The fact that the two computers can connect doesn't necessarily mean that they can communicate. In order to communicate, they must implement some mutually acceptable application protocol.

For example, the fact that I can dial a telephone number for a telephone located in France doesn't mean that I can communicate with the person who answers the phone. I don't know how to speak the French language. Unless the person who answers the phone speaks English, very little communication is likely to take place.

Socket programming has been around for quite a while in the Unix world. Java simply makes it easier by encapsulating much of the complexity of socket programming into classes, and allowing you to approach the task on an object-oriented basis.

According to some authors, some of the generality and capability that Unix socket programmers have enjoyed has been lost in the encapsulation process.

Basically, socket programming makes it possible for you to cause data to flow in a full-duplex mode between a client and a server. This data flow can be viewed in almost exactly the same way that we view data flow to and from a disk: as a stream of bytes.

As with most stream data processing, the system is responsible for moving the bytes from the source to the destination. It is the responsibility of the programmer to assign meaning to those bytes.

Assigning meaning takes on a special significance for socket programming. In particular, as mentioned above, it is the responsibility of the programmer to implement a mutually acceptable communication protocol at the application level to cause the data to flow in an orderly manner.

An application protocol is a set of rules by which the programs in the two computers can carry on a conversation and transfer data in the process. For example, we will write a program using the SMTP mail protocol to send an email message to someone.

We will also write a program that implements a very abbreviated form of the HTTP protocol to download web pages from a server and display them.

We will also write a program that functions as an (abbreviated) HTTP server to deliver web pages to a client and also supports the echo protocol for both TCP and UDP programming.

Each of these programs will involve adherence to a fairly simple protocol (at least the part that we implement will be fairly simple).
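To give a feel for how little text a simple protocol exchange can involve, here is a sketch of the client's half of a minimal HTTP/1.0 request and of reading the status code from the first line of the reply. The class and method names are hypothetical, and the programs in later lessons may be organized quite differently.

```java
class HttpSketch {
    /** Builds the text of a minimal HTTP/1.0 GET request for the given path. */
    static String getRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n"; // a blank line ends the request headers
    }

    /** Extracts the numeric status code from a status line such as "HTTP/1.0 200 OK". */
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.print(getRequest("www2.austin.cc.tx.us", "/baldwin/index.html"));
        System.out.println(statusCode("HTTP/1.0 200 OK")); // 200
    }
}
```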

In addition, we will write a program that obtains the date and time from another computer. In this case, the protocol will be about as simple as it can possibly be: the client will simply make the connection and listen for a string containing the date and time. This will be sort of like dialing the local time service, except that we won't have to listen to an advertisement before getting the time.
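A client of this kind might be sketched as follows. The readTime() method connects and reads one line, which is the whole protocol from the client's point of view. Because a public server on port 13 may not be reachable from your machine, the sketch also includes a hypothetical local stand-in server that sends a single line and hangs up; the class name, method names, and timeout are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Date;

class DaytimeDemo {
    /** Connects to host:port and returns the first line that the server sends. */
    static String readTime(String host, int port) {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII))) {
            socket.setSoTimeout(5000); // do not wait forever for the server to speak
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs a one-shot stand-in server on a free loopback port and reads its reply. */
    static String readFromLocalStandIn(String reply) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.US_ASCII), true)) {
                    out.println(reply); // send the line, then close the connection
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();
            return readTime(server.getInetAddress().getHostAddress(), server.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(readTime(args[0], 13)); // a real daytime server, if reachable
        } else {
            System.out.println(readFromLocalStandIn(new Date().toString()));
        }
    }
}
```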

The bottom line is that with socket programming, it is easy to write code that will cause a stream of bytes to flow in both directions between a client and a server. This is no more difficult than causing a stream of bytes to flow in both directions between memory and a file on a disk.

However, getting the bytes to flow is the easy part. Beyond that, you must do all of the programming to implement an application protocol that is understood by both the client and the server.

URL Programming

URL programming occurs at a higher level than socket programming, and in theory represents a very powerful idea.

In theory, by using the URL class, you can open a connection to a resource on the web, specified by a URL object, and simply invoke the getContent() method on that URL object. The content of the resource will then be magically downloaded and will appear as an object on the client machine, even if it requires an application protocol that didn't exist when you wrote the program, and contains content that you didn't understand when you wrote the program.

This description may be a bit of an overstatement, but it is pretty close to the claims being made. This is a powerful idea, which may or may not bear fruit in the future.

If fully implemented by browsers, the idea means that you can place new and unusual material on a web site along with special content handlers and protocol handlers. Then a cooperating browser will use those special handlers to move that material from the web site to the client and interpret its content once it gets there without a requirement to install software (such as plug-ins) on the client computer on a permanent basis.

Unfortunately, this is what Peter van der Linden has to say about this topic in his excellent book entitled Just Java 1.1 and Beyond, Third Edition (emphasis added by Baldwin).

"If a browser doesn't recognize a media type, it should be able to download the code to process it from the same place it got the file. If they ever get this working, it will be ... a good thing."

Will they ever get it working? I don't know. If it depends on cooperation among all the major players, including the major browser vendors - probably not. Therefore, I don't plan to spend much time on the topic of protocol and content handlers until I see some evidence that it is working to such an extent that it is practically useful.

That is not to say that you couldn't use the capability right now if you were developing an intranet and wanted the clients to have access to new and unusual content. It would be necessary for you to provide the appropriate protocol and content handlers, and it would probably be necessary for the clients to run Java applications written by you instead of standard browsers to access the data.

Also, the URL class provides an alternative way to connect one computer to another and transfer data on a stream basis, so we will see some examples of retrieving data from a server by obtaining a URL connection, and then opening and servicing I/O streams between the client and the server. We will see some sample programs that make use of this technique, but we will also see that it is redundant with the socket programming approach.
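The stream technique can be sketched with the openStream() method of the URL class. So that the sketch runs without a network connection, it reads from a file: URL that points at a temporary file standing in for a web page; that stand-in, along with the class and method names, is an assumption made for illustration. The same readAll() method would accept an http: URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlStreamDemo {
    /** Reads everything that the URL's input stream delivers and returns it as text. */
    static String readAll(URL url) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Writes the content to a temporary file and returns a file: URL that refers to it. */
    static URL tempResource(String content) {
        try {
            Path file = Files.createTempFile("lesson550", ".html");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URL url = tempResource("<html>hello</html>\n");
        System.out.println(url);
        System.out.print(readAll(url));
    }
}
```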

-end-