The Java Virtual Machine (JVM) is an abstract computing machine with its own instruction set (ISA) and its own memory areas, such as the heap and stacks. It manages memory and runs Java code or applications in a run-time environment. The JVM executes compiled Java bytecode, either by interpreting it or by converting it into machine code for the host. It is described by a specification that formally states what a JVM implementation must provide; each implementation runs on the host operating system and requests resources from it.
What is JVM?
A specification: The Java virtual machine is an abstract computer defined by a specification. The specification does not prescribe the garbage-collection algorithm or any internal optimization of the Java virtual machine instructions (such as how they are translated into machine code). These details are left open so that implementers are not unnecessarily constrained. A Java application can run on any concrete implementation of this abstract specification. Because the JVM is a specification, it can have different implementations for different needs, as long as each implementation adheres to the spec.
An implementation: A JVM implementation is shipped as part of a JRE (Java Runtime Environment), which provides the environment for executing code by implementing the JVM specification. The JRE consists of Java binaries and the class libraries needed to execute programs. For example, one of Oracle’s JVMs is named HotSpot; another, inherited from BEA Systems, is JRockit. Clean-room implementations include OpenJ9, Kaffe and Skelmir’s CEE-J. Because Oracle owns the Java trademark, it may allow its use to certify implementation suites as fully compatible with Oracle’s specification.
Runtime instance: A runtime instance of the Java virtual machine has one purpose: to run one Java application. The instance is born when the application starts and dies when the application terminates.
What does it do?
The JVM performs the following operations:
- Loads code – Performed by the class loader
- Verifies code – Performed by the bytecode verifier
- Executes code – Performed by the runtime interpreter
- Provides runtime environment – JRE
JVM provides definitions for the:
- Memory area
- Class file format
- Register set
- Garbage-collected heap
- Fatal error reporting etc.
JVM Architecture
ClassLoader
The classloader is a JVM subsystem that loads class files. When a .java source file is compiled, it is converted into bytecode and stored as a .class file. When a class is first used in the program, the classloader loads its .class file into main memory. The class that contains the main() method is the first application class to be loaded.
The class loading process has three phases: loading, linking, and initialization.
1) Loading
Loading takes the binary representation (the bytecode) of a class or interface with a particular name and creates the class or interface from it.
The three built-in class loaders available in Java are:
- Bootstrap ClassLoader: It is the first classloader and the parent (not the superclass) of the Extension classloader. It loads the core classes of Java Standard Edition, for example the java.lang, java.net, java.util, java.io and java.sql package classes. Up to Java 8 these classes are packaged in the rt.jar file. From Java 9 onward they are stored as modules in the runtime image, and some of them (such as java.sql) are loaded by the Platform classloader instead.
- Extension ClassLoader: It is the immediate child classloader of Bootstrap and the parent classloader of the System classloader. It loads the jar files located inside the $JAVA_HOME/jre/lib/ext directory. From Java 9 onward it is replaced by the Platform classloader.
- System/Application ClassLoader: It is the immediate child classloader of the Extension classloader and loads class files from the classpath. By default, the classpath is set to the current directory; it can be changed with the “-cp” or “-classpath” switch.
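The parent chain described above can be inspected at run time. The following is a minimal sketch; the class name ClassLoaderDemo and the helper describe are made up for illustration. The bootstrap loader is represented by null in the Java API, and the loader class names printed depend on the Java version in use.

```java
class ClassLoaderDemo {

    // Returns a readable name; the API represents the bootstrap loader as null.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes such as String are loaded by the bootstrap loader.
        System.out.println("String loaded by: " + describe(String.class.getClassLoader()));

        // Walk from the loader of this class up through its parents.
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println(describe(loader));
            loader = loader.getParent();
        }
        // The chain always ends at the bootstrap loader.
        System.out.println("bootstrap");
    }
}
```

Running it with a different “-cp” value does not change the chain, only the locations the application classloader searches.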
2) Linking
After a class is loaded into memory, it undergoes linking, in which the class or interface is combined with the other elements and dependencies of the program.
Linking includes the following steps:
- Verification: In this phase, the structural correctness of the .class file is checked against a set of constraints or rules. If verification fails, the JVM throws a VerifyError. A related failure occurs when code built for Java 11 is executed on a system that has Java 8 installed: the class file version is too new, and the JVM rejects the class with an UnsupportedClassVersionError.
- Preparation: In this phase, the JVM allocates memory for the static fields of a class or interface and initializes them with default values. For instance, assume that you have declared the following variable in your class:
private static final boolean enabled = true;
During the preparation phase, the JVM allocates memory for the variable enabled and sets it to the default value for a boolean, which is false.
- Resolution: In this phase, symbolic references in the runtime constant pool are replaced with direct references. For instance, references to other classes, or to constants defined in other classes, are resolved in this phase and replaced with their actual references.
3) Initialization
Initialization is the process of executing the initialization method of the class or interface (known as <clinit>). This method runs the static initializer blocks and assigns the declared values to the static variables, in the order in which they appear in the source. Instance constructors are not part of it; they run later, when objects are created. This is the final stage of class loading.
For instance, when we declared the following code earlier:
private static final boolean enabled = true;
During the preparation phase, the variable enabled was set to its default value of false. In the initialization phase, it is assigned its actual value of true.
Note: As the JVM is multi-threaded, multiple threads may try to initialize the same class at the same time. The JVM synchronizes class initialization so that it runs only once, but code inside static initializers must still handle shared state safely, and static initializers that wait on each other across threads can deadlock.
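The timing of initialization can be observed with a static block. The sketch below is illustrative only: InitOrderDemo, Config and LOG are hypothetical names, and the field is deliberately not final, because the compiler inlines a compile-time constant and reading it would not trigger initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records the order in which things happen.
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        // Assigned during initialization, before the static block below runs.
        static boolean enabled = true;

        static {
            LOG.add("Config initialized, enabled=" + enabled);
        }
    }

    static List<String> run() {
        LOG.add("before first use");
        boolean value = Config.enabled; // first active use triggers initialization
        LOG.add("after first use: " + value);
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log shows that Config is initialized lazily, at its first use, rather than when the program starts.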
Runtime Data Area
The six components of Runtime Data Area are as follows:
1) Class(Method) Area
The Method Area is created when the JVM starts up and is shared by all threads. It stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors. The JVM specification does not mandate whether this area is garbage collected, so implementations may choose not to collect it. The specification also does not fix its size, so it may or may not expand as the application needs.
2) Run-Time Constant Pool
The JVM maintains a per-class/per-type data structure that acts as the symbol table while linking the loaded classes. Each runtime constant pool is allocated from the method area.
The JVM throws an OutOfMemoryError if the memory available in the method area is not sufficient.
For instance, assume that you have the following class definition:
public class School {
    private String name;
    private int id;

    public School(String name, int id) {
        this.name = name;
        this.id = id;
    }
}
In this code example, the field-level data (name and id) and the constructor details are loaded into the method area. There is only one method area per JVM, and it is created at virtual machine start-up.
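The per-class data that the JVM keeps can be observed through reflection. The following is a rough sketch, not a view into the method area itself: SchoolMetadataDemo and its helpers are hypothetical names, and where the JVM actually stores this metadata is implementation-specific.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SchoolMetadataDemo {

    // Same shape as the School class in the text, nested to keep the sketch in one file.
    static class School {
        private String name;
        private int id;

        public School(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    // Lists "type name" for each declared field, sorted because reflection order is unspecified.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add(field.getType().getSimpleName() + " " + field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Number of parameters of the single declared constructor.
    static int constructorParameterCount(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(School.class));
        System.out.println(constructorParameterCount(School.class));
    }
}
```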
3) Heap
The heap is the runtime data area from which all objects and arrays are allocated; it is shared among all threads. It is created when the JVM starts and destroyed when the JVM shuts down. The amount of heap the JVM requests from the OS can be controlled using certain flags. As the heap plays an important role in performance, care has to be taken to request neither too little nor too much memory. The garbage collector manages this space and continually removes dead objects to free it up.
For example, assume that you are declaring:
Student student = new Student();
In this code example, an instance of Student is created and allocated on the heap.
There is only one heap area per JVM, and it is created at virtual machine start-up.
Note: The method and heap areas are shared by all threads, so the data stored in them is not inherently thread-safe.
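The current heap limits can be read from inside a program with the standard Runtime API. This is a small sketch with a made-up class name, HeapInfoDemo. On HotSpot-style JVMs the initial and maximum heap sizes are typically set with the -Xms and -Xmx options; other implementations may use different flags.

```java
class HeapInfoDemo {

    // Heap currently in use: what the JVM has reserved minus what is still free.
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        System.out.println("max heap (bytes):      " + runtime.maxMemory());
        System.out.println("reserved heap (bytes): " + runtime.totalMemory());
        System.out.println("used heap (bytes):     " + usedHeapBytes());
    }
}
```

Running it twice, for example once with -Xmx64m and once with -Xmx512m, shows how the flag changes the reported maximum.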
4) Stack
The Java stack holds frames, which store local variables and partial results and play a part in method invocation and return. Each thread has a private JVM stack, created at the same time as the thread, that stores parameters, local variables and return addresses during method calls. If a thread requires more stack space than is permitted, a StackOverflowError is thrown. If the stack is dynamically expandable, an OutOfMemoryError can still occur when not enough memory is available for the expansion. A new frame is created each time a method is invoked and is destroyed when that method invocation completes.
The Stack Frame is divided into three parts:
- Local Variables – Each frame contains an array of variables known as its local variables. The method's parameters and local variables, with their values, are stored here. The length of this array is determined at compile time.
- Operand Stack – Each frame contains a last-in-first-out (LIFO) stack known as its operand stack. Any intermediate operations are performed in this runtime workspace. The maximum depth of this stack is determined at compile time.
- Frame Data – The symbols corresponding to the method are stored here. In case of exceptions, the catch block information is also stored.
For example, assume that you have the given code:
double calculateNormalisedMark(List<Answer> answer) {
    double mark = getMark(answer);
    return normalizeMark(mark);
}

// minmark and maxmark are assumed to be fields of the enclosing class
double normalizeMark(double mark) {
    return (mark - minmark) / (maxmark - minmark);
}
In this code example, the Local Variables array contains variables such as answer and mark. The Operand Stack holds the operands and intermediate results of the subtraction and division.
Note: The Stack Area is not shared between threads, so it is inherently thread-safe.
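The one-frame-per-invocation rule can be made visible by recursing until the stack is exhausted. The sketch below uses a hypothetical class, StackDepthDemo, and catches the error only for demonstration. The depth reached varies between runs and machines; on HotSpot-style JVMs it depends on the thread stack size, which is typically set with -Xss.

```java
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the stack overflowed.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The frames have been unwound by the time we get here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```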
5) Program Counter Register
The PC (program counter) register is local to each thread and contains the address of the JVM instruction that the thread is currently executing. It acts like a pointer to the current instruction in the program's sequence of instructions. If the thread is executing a native method, the value of the PC register is undefined.
6) Native Method Stack
When a thread calls a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. The native method stack holds the state of the native methods invoked by the thread. Depending on the native method interface, a native method can access the runtime data areas of the virtual machine, but it can also do anything else it wants. Using native methods requires integrating native program code into the Java application.
Execution Engine
The execution engine is the JVM component that executes the bytecode that the class loader has placed in the runtime data areas. Once the class loader has loaded the respective classes, the JVM begins executing their code. Executing code also involves managing access to system resources.
The bytecode needs to be converted into machine language instructions before the program can run. The execution engine uses an interpreter or a JIT compiler for this.
The execution engine contains three main components:
- A virtual processor
- Interpreter: The interpreter reads and executes the loaded bytecode instructions one at a time. This instruction-by-instruction execution makes the interpreter comparatively slow. Another disadvantage is that a method called multiple times must be interpreted anew on every call.
- Just-In-Time (JIT) compiler: The JIT compiler addresses this disadvantage. Here, “compiler” refers to a translator from the instruction set of a Java virtual machine (JVM) to the instruction set of a specific CPU. The execution engine starts by interpreting the bytecode; when it finds code that is executed repeatedly, the JIT compiler compiles that code to native machine code at run time. Performance improves because the native machine code is then used directly for repeated method calls. Some implementations can also store compiled code, so that programs whose code has not changed need not be recompiled across sessions or instances.
The JIT Compiler has the following components:
- Intermediate Code Generator – It generates intermediate code
- Code Optimizer – It optimizes the intermediate code for better performance
- Target Code Generator – It converts intermediate code to native machine code
- Profiler – It finds the hotspots (code that is executed repeatedly)
To understand the difference between interpreter and JIT compiler, assume that you have the code as:
int sum = 10;
for(int i = 1 ; i <= 10; i++) {
sum += i;
}
System.out.println(sum);
An interpreter fetches the value of sum from memory on each iteration of the loop, adds the value of i to it, and writes it back to memory. This is costly and time consuming because memory is accessed on every iteration.
The JIT compiler, in contrast, recognizes the hot spot in this example and optimizes it. It can keep a local copy of sum in a CPU register and keep adding the value of i to it in the loop, writing the value of sum back to memory only when the loop is complete.
Note: Compiling code with the JIT compiler takes more time than interpreting it instruction by instruction. For code that runs only once, interpretation is the better choice.
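The warm-up effect can be observed informally by timing the same method in repeated batches. This is a rough sketch rather than a benchmark: JitWarmupDemo, sumTo and timeBatch are hypothetical names, the timings vary from run to run, and no particular numbers should be expected. On HotSpot, the -Xint option forces interpreter-only execution and -XX:+PrintCompilation logs which methods get compiled, which makes the comparison easier to see.

```java
class JitWarmupDemo {

    // Same loop as in the text, generalized to an upper bound n.
    static int sumTo(int n) {
        int sum = 10;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Times a batch of calls; later batches usually run faster once the code is compiled.
    static long timeBatch(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int c = 0; c < calls; c++) {
            sink += sumTo(1_000);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(sink); // uses the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int batch = 1; batch <= 5; batch++) {
            System.out.println("batch " + batch + ": " + timeBatch(100_000) + " ns");
        }
    }
}
```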
How does the execution engine manage system resources?
System resources can be divided into two main categories: memory and everything else.
One of the responsibilities of JVM is to dispose of unused memory, and garbage collection is the mechanism that does that disposal. The JVM also allocates and maintains the referential structure that the developer takes for granted. For example, the JVM’s execution engine is responsible for taking something like the new keyword in Java and turning it into an OS-specific request for memory allocation.
Beyond memory, the execution engine manages resources for file system access and network I/O. This is no mean task, as the JVM is interoperable across operating systems. The execution engine must be responsive to each OS environment and to each application’s resource needs. That is how the JVM is able to handle these crucial demands.
Garbage Collector
Garbage collection is the process of automatically reclaiming unused memory at run time by collecting unreferenced objects from the heap area and destroying them. The GC (Garbage Collector) carries out this process.
The process is carried out in two phases:
- Mark – The GC identifies the unused objects in memory
- Sweep – The GC removes the objects identified during the previous phase.
The JVM runs garbage collection automatically whenever it is needed, so it does not need to be handled separately. A collection can be requested by calling System.gc(), but the call is only a hint and execution is not guaranteed.
HotSpot-based JVMs include, among others, the following three garbage collectors:
- Serial GC – It is the simplest implementation of GC and is designed for small applications running in single-threaded environments. It uses a single thread for garbage collection. When it runs, it initiates a “stop the world” event in which the entire application is paused. The JVM argument for the Serial Garbage Collector is -XX:+UseSerialGC
- Parallel GC – This was the default GC in HotSpot up to Java 8 and is also known as the Throughput Collector. Multiple threads are used for garbage collection, but it still pauses the application while running. The JVM argument for the Parallel Garbage Collector is -XX:+UseParallelGC
- Garbage First (G1) GC – G1GC is designed for multi-threaded applications that have a large heap available (more than 4GB) and has been the default GC in HotSpot since Java 9. It partitions the heap into a set of equal-size regions and uses multiple threads to scan them. G1GC identifies the regions with the most garbage and collects those first. The JVM argument for the G1 Garbage Collector is -XX:+UseG1GC
Note: There is also a garbage collector called Concurrent Mark Sweep (CMS) GC. However, it was deprecated in Java 9 and removed in Java 14.
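Which collector a running JVM actually uses can be checked through the standard management API. The sketch below uses a hypothetical class name, GcInfoDemo. The collector names it prints are implementation-specific, so no particular output should be assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfoDemo {

    // Names of the collectors active in this JVM, as reported by the JVM itself.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
    }
}
```

Starting the program with a different collector argument, such as -XX:+UseSerialGC or -XX:+UseG1GC, changes the names that are listed.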
Java Native Interface
Java Native Interface (JNI) is a foreign function interface programming framework that lets Java code communicate with native applications (programs specific to a hardware and operating system platform) and with libraries written in other languages such as C, C++ and assembly. JNI offers a set of standard interface functions, which Java uses, for example, to send output to the console or to interact with OS libraries.
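Writing a JNI method requires a compiled native library, but existing native methods can be spotted with reflection alone. The following is a small sketch with a hypothetical class name, NativeMethodDemo. It relies on System.currentTimeMillis() being declared native, which holds for OpenJDK-based class libraries but is an implementation detail rather than a guarantee.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodDemo {

    // True if the named no-argument method of the class is declared with the native modifier.
    static boolean isNative(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // no such no-argument method
        }
    }

    public static void main(String[] args) {
        // Implemented in native code in OpenJDK-based runtimes.
        System.out.println("System.currentTimeMillis native? "
                + isNative(System.class, "currentTimeMillis"));
        // Implemented in plain Java.
        System.out.println("String.length native? "
                + isNative(String.class, "length"));
    }
}
```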
Common JVM Errors
- ClassNotFoundException – This occurs when the Class Loader is trying to load classes using Class.forName(), ClassLoader.loadClass() or ClassLoader.findSystemClass() but the definition for the class with the specified name is not found.
- NoClassDefFoundError – This occurs when the class was compiled successfully, but the ClassLoader cannot locate its class file at runtime.
- OutOfMemoryError – This occurs when the JVM cannot allocate an object because it is out of memory and the garbage collector could not make any more memory available.
- StackOverflowError – This occurs when a thread runs out of stack space while the JVM is creating new stack frames for it.
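The first of these errors is easy to reproduce. The sketch below asks the class loader for a class by name and reports whether the lookup fails; ClassLookupDemo is a hypothetical class name, and com.example.DoesNotExist is a made-up name that is assumed to be absent from the classpath.

```java
class ClassLookupDemo {

    // True if loading the named class fails with ClassNotFoundException.
    static boolean isMissing(String className) {
        try {
            Class.forName(className);
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String missing? " + isMissing("java.lang.String"));
        // Hypothetical class name, assumed not to exist on the classpath.
        System.out.println("com.example.DoesNotExist missing? "
                + isMissing("com.example.DoesNotExist"));
    }
}
```

ClassNotFoundException is a checked exception raised by explicit lookups like this one, whereas NoClassDefFoundError is an error raised when the JVM itself fails to find a class that was present at compile time.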