Saturday, March 10, 2007

The Java Classloader #2

Now that we understand the importance of dynamic linking, assume that you are running a dynamically linked C++ program which needs to use a class that is in an external library. The runtime locates the external library using the PATH environment variable.

Java programs do not use the PATH environment variable to locate classes, they use the CLASSPATH variable instead. CLASSPATH contains a list of directories, zip files or jar files. You are probably thinking, why Java does not use PATH? Why did it invent another environment variable, the CLASSPATH? One reason why Java did ot choose to use the PATH variable, is because PATH already contains entries needed by other executables. Using PATH may slow down the system, because it will have to process extra entries. OK, so if that convinces you, let's have a look at what the CLASSPATH contains. The CLASSPATH contains a list of directories, which contain class files that will be needed by the Java runtime (Note: Core Java classes like the String class are found automatically by the runtime. We do not need to add entries in the CLASSPATH to locate them). Zip files and jar files are also allowed because if you think about it, they are nothing but archived and compressed directories.

Consider a scenario where we are running a Java based Student Registration software. During execution some class tries to instantiate the Student class. Since a Java program is not distributed as one large executable file, the JVM needs to locate and load the Student class. The JVM iterates through all the entries in the CLASSPATH and searches for the class file in every entry. If the class cannot be found, the JVM throws a ClassNotFoundException. If a classfile exists in a directory or jar file not listed in the CLASSPATH, then as far as the JVM is concerned, it does not exist. The responsibility of locating and loading classes is assigned to the Java Classloader.

There are a few holes in this approach. Can you find them? What if an external library we are using also has a Student class. Which class will the Classloader load? Our Student class or the Student class from the external library? Is there an unambiguous way to determine the right class? Well, we asked the JVM to load a class called Student, it has no way of knowing which one of the multiple Student classes in the CLASSPATH is the right class. Clearly we must have a unique attribute that differentiates all the Student classes. This unique attribute is called a namespace which is represented as a package name in Java. Every Java class must exist in a namespace which is specified using the package keyword. Even classes that do not belong to any specific package, belong to the default package.

The code below shows how we can create a Student class which belongs to the 'edu.scit.studentreg' package.

package edu.scit.studentreg;

public class Student {

}

The use of packages differentiates this class from another class which is also called Student but belongs to the namespace 'com.oracle.studentreg'.

package com.oracle.studentreg;

public class Student {

}

Now when we want to instantiate a Student class we will instantiate it by using the fully qualified name of the class.

edu.scit.studentreg.Student s1 = new edu.scit.studentreg.Student();

or

com.oracle.studentreg.Student s2 = new com.oracle.studentreg.Student();

Because we have used the fully qualified class name, the Classloader knows exactly which Student class we are reffering to. But wait... a Student class is created in a file called Student.class. We do not use the package name in the name of the file. How does the Classloader know which class file is the right one? Even though the class file contains the package name, searching for Student.class in every directory and subdirectory in the CLASSPATH, to determine if it contains the right package name is a very time consuming process. We need a better way of organizing class files, so the JVM can locate them quickly. The convention used is to match the directory heirarchy in which a class file is put with the package name of the class. Let us understand this concept with an example we have used above. Where should we put the class edu.scit.studentreg.Student? First of all we determine a base directory in which we will put this and many other classes. If we decide to put our classes in c:\scit\classes, then we have to follow the convention relative to c:\scit\classes. We must create a directory called 'edu' (in c:\scit\classes) then we create a directory called 'scit' in 'edu' and a directory called 'studentreg' in 'scit'. The file Student.class is placed inside the 'studentreg' directory. What do you think should go in the CLASSPATH? c:\scit\classes or c:\scit\classes\edu\scit\studentreg? We put c:\scit\classes.

When the Classloader needs to locate the class edu.scit.studentreg.Student, it will look at the first entry in the classpath. Suppose it is c:\scit\classes. The JVM will now try to locate the class by zeroing in to the appropriate location based on the package name of the class. First it looks for a directory edu in c:\scit\classes, if it finds the directory then it looks for scit inside edu and studentreg inside scit. Once in the studentreg directory it looks for a file called Student.class. If found the file is loaded, othewise the next entry in the classpath is searched. If the file is not found in any of the directories specified in the classpath, then a ClassNotFoundExceptin is thrown.

This is how the Classloader locates and loads class files.

Even if you have understood how the Classloader loactes class files, I would like to suggest that you practice a couple of times of understand the concept better. To best understand these concepts, I strongly recommend that you work with Notepad and the command line.

  1. Create any class and associate it with the package example.code , and compile the class. Ensure that the Java as well as the class file exist in a proper directory heirarchy. Add the appropriate directory to the CLASSPATH and run the program from a totally different directory.
  2. Change the location of the class file and run the program.
  3. Remove the directory from the CLASSPATH, and run the program from the parent directory of the directory which you had put in your CLASSPATH. Does the program run? If not, try removing all CLASSPATH entries and run the program again.

Please note, while removing CLASSPATH entries, do not remove them from the environment of the operating system. Each command prompt gets the CLASSPATH that is set for that user, however, we can modify the CLASSPATH of only that command prompt without affecting the OS's environment.

In the next post, we will look at some more nuances of the Classloader.



Notes: This text was originally posted on my earlier blog at http://www.adaptivelearningonline.net

No comments: