JVM Architecture
Technical Overview
The Java Virtual Machine is a software-based abstract machine that provides a runtime environment for executing Java bytecode. It is defined by the JVM Specification, a formal document published by Oracle, but the specification intentionally leaves implementation details open to allow differentiated runtimes. HotSpot (Oracle/OpenJDK), OpenJ9 (Eclipse/IBM), GraalVM, and Azul Zing/Zulu are all conforming JVM implementations whose internals (garbage collectors, JIT compilers, object layout) can differ substantially.
The core promise of the JVM is bytecode portability: a .class file compiled once runs on any conforming JVM regardless of the underlying ISA or operating system. The price is an indirection layer between source semantics and hardware execution.
Prerequisites
- Familiarity with compiled vs interpreted language models
- Basic understanding of process memory layout (stack, heap, code segments)
- Understanding of class-based object-oriented programming
- Basic knowledge of operating system process management
JVM Architecture Diagram
+---------------------------------------------------------------+
|                      Java Source (.java)                      |
|                               |                               |
|                        javac compiler                         |
|                               |                               |
|                       Bytecode (.class)                       |
+---------------------------------------------------------------+
|                          JVM Process                          |
|                                                               |
|  +-----------------+    +--------------------------------+    |
|  | Class Loader    |    | Runtime Data Areas             |    |
|  | Subsystem       |    |                                |    |
|  |                 |    | +----------+   +-----------+   |    |
|  | Bootstrap CL    |    | | Heap     |   | Method    |   |    |
|  | Extension CL    |--->| | (Objects)|   | Area /    |   |    |
|  | Application CL  |    | |          |   | Metaspace |   |    |
|  |                 |    | +----------+   +-----------+   |    |
|  | Loading         |    |                                |    |
|  | Linking         |    | +----------+   +-----------+   |    |
|  | Initialization  |    | | JVM Stack|   | PC Regs   |   |    |
|  +-----------------+    | | (per thr)|   | (per thr) |   |    |
|                         | +----------+   +-----------+   |    |
|  +-----------------+    |                                |    |
|  | Execution Engine|    | +----------+                   |    |
|  |                 |    | | Native   |                   |    |
|  | Interpreter     |    | | Method   |                   |    |
|  | JIT Compiler    |    | | Stack    |                   |    |
|  | GC              |    | +----------+                   |    |
|  +-----------------+    +--------------------------------+    |
|                                                               |
|  +-----------------+                                          |
|  | JNI / Native    |<-- Calls into OS / native libraries      |
|  | Interface       |                                          |
|  +-----------------+                                          |
+---------------------------------------------------------------+
Core Content
JVM Specification vs Implementation
The JVM Specification defines:
- The class file format (.class) exactly: magic number, version, constant pool layout, method descriptor encoding
- The bytecode instruction set (roughly 200 opcodes in Java SE 21)
- The execution semantics of each instruction
- Required runtime behavior (exceptions, class initialization order)
The memory model (Java Memory Model: happens-before ordering, volatile semantics) is equally binding on implementations, but it is specified in Chapter 17 of the Java Language Specification rather than in the JVM Specification.
The specification does not define:
- How the heap is laid out internally
- Which GC algorithm to use
- Whether to JIT-compile code and when
- Thread scheduling details
- Internal data structure sizes
HotSpot and OpenJ9 make different choices in many of these unspecified areas, which is why their performance characteristics can differ noticeably under load.
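To make the "method descriptor encoding" item concrete, the following sketch asks the standard java.lang.invoke.MethodType API to render descriptors in the class file notation. The class name DescriptorDemo and the helper descriptorOf are illustrative names, not part of any JDK API.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    /** Builds the JVM method descriptor for the given return and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // String.valueOf(int): one int parameter (I), returns java.lang.String
        System.out.println(descriptorOf(String.class, int.class));
        // main(String[]): one String array parameter, returns void (V)
        System.out.println(descriptorOf(void.class, String[].class));
    }
}
```

Running main prints (I)Ljava/lang/String; followed by ([Ljava/lang/String;)V, which are the same strings a class file stores in its constant pool for these two signatures.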
Class Loader Subsystem
Class loading is a three-phase process: loading (read the binary data for the class), linking (verify, prepare, resolve), and initialization (run static initializers).
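The separation between loading and initialization can be observed from plain Java. The sketch below is illustrative only: InitPhases, Lazy, and EVENTS are hypothetical names, and it relies on the three-argument Class.forName, whose boolean parameter controls whether the class is initialized.

```java
import java.util.ArrayList;
import java.util.List;

class InitPhases {
    static final List<String> EVENTS = new ArrayList<>();

    /** A class whose static initializer records the moment initialization happens. */
    static class Lazy {
        static { EVENTS.add("static initializer ran"); }
    }

    static List<String> run() throws ClassNotFoundException {
        String name = InitPhases.class.getName() + "$Lazy";   // binary name of the nested class
        ClassLoader loader = InitPhases.class.getClassLoader();

        Class.forName(name, false, loader);   // loaded, but initialization is not requested
        EVENTS.add("loaded");

        Class.forName(name, true, loader);    // initialization requested: the static block runs
        EVENTS.add("initialized");
        return EVENTS;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(run());
    }
}
```

On a fresh JVM the recorded order is loaded, static initializer ran, initialized: the static block runs only once initialization is requested, not when the class is merely loaded.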
The parent delegation model is central to classloader security. When a classloader is asked to load a class, it first delegates to its parent. Only if the parent cannot find the class does the child loader attempt to load it. The hierarchy:
Bootstrap ClassLoader (loads rt.jar / java.base — C++ code in JVM)
|
Extension ClassLoader (loads lib/ext — Java code)
|
Application ClassLoader (loads -classpath — Java code)
|
Custom ClassLoader (OSGi, Tomcat, etc. — defined by the application)
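This prevents a malicious class on the application classpath from shadowing java.lang.String. The bootstrap loader, written in native code, has no parent; it is the root. The hierarchy above reflects Java 8 and earlier: since Java 9, the Extension ClassLoader has been replaced by the Platform ClassLoader, and rt.jar and lib/ext by the modular runtime image.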
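The delegation chain can be inspected at runtime by following getParent() from a class's defining loader. This is a small sketch with illustrative names (LoaderChain, chainOf); the bootstrap loader is represented by null in the Java API, so the walk ends there.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    /** Returns the loader class names from the defining loader of 'type' up to the bootstrap loader. */
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();   // null means "defined by the bootstrap loader"
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String      -> " + chainOf(String.class));
        System.out.println("LoaderChain -> " + chainOf(LoaderChain.class));
    }
}
```

For java.lang.String the chain contains only the bootstrap entry. For an application class the exact loader names depend on the JDK version and on how the program was launched, so they are not listed here.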
Bytecode verification occurs during the linking/verification phase. The verifier performs data-flow analysis on the bytecode to prove type safety without executing the code — it checks that stack operand types match instruction expectations, that branches target valid instructions, and that local variable types are consistent. This provides the safety guarantee that prevents type confusion attacks via crafted bytecode.
Runtime Data Areas
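Heap: Shared across all threads. Holds all object instances and arrays. In HotSpot's generational collectors it is divided into a young generation (Eden + two Survivor spaces) and an old generation; region-based collectors such as G1 keep these generations as logical sets of regions rather than contiguous areas. Managed by the GC subsystem. Size controlled by -Xms (initial) and -Xmx (maximum).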
Method Area / Metaspace: Holds per-class data: runtime constant pool, field/method descriptors, and method bytecode. In HotSpot on Java 8+, the Method Area is implemented as Metaspace, which lives in native memory (outside the Java heap) and grows dynamically. Controlled by -XX:MaxMetaspaceSize. JIT-compiled machine code is not stored here; HotSpot keeps it in a separate native region, the code cache (-XX:ReservedCodeCacheSize).
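The split between heap and non-heap areas is visible through the standard java.lang.management API. The sketch below groups the JVM's memory pools by type; MemoryAreas and poolNames are illustrative names, and the pool names that come back are implementation-specific (on HotSpot, Metaspace is typically reported as a non-heap pool).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryAreas {
    /** Lists the names of all memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

Running it under different collectors (for example with -XX:+UseG1GC and then -XX:+UseZGC) shows how the heap pools change while the non-heap pools stay largely the same.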
JVM Stack: One stack per thread. Each method invocation creates a stack frame containing: local variable array, operand stack, and a reference to the runtime constant pool. Stack depth controlled by -Xss (typically 512KB–1MB per thread). StackOverflowError is thrown when this limit is exceeded.
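A rough way to see the per-thread stack limit is to recurse until StackOverflowError is thrown and count the frames. This is only a sketch: StackDepth and its methods are illustrative names, the depth reached varies with the JVM, JIT state, and frame size, and the stackSize argument of the Thread constructor is documented as a hint that a JVM may ignore.

```java
class StackDepth {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();   // each call pushes a new frame until the stack is exhausted
    }

    /** Recurses until the current thread's stack overflows and returns the depth reached. */
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound to this frame, so the thread can continue normally.
        }
        return depth;
    }

    /** Runs the measurement on a new thread created with the requested stack size (a hint). */
    static int measureWithStackSize(long stackBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread probe = new Thread(null, () -> result[0] = measure(), "stack-probe", stackBytes);
        probe.start();
        probe.join();   // join() makes the write to result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("depth with 256 KB hint: " + measureWithStackSize(256 * 1024));
        System.out.println("depth with 4 MB hint:   " + measureWithStackSize(4L * 1024 * 1024));
    }
}
```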
Program Counter Register: One per thread. Holds the address of the currently executing JVM instruction. Undefined for native methods.
Native Method Stack: Stores native method invocations (JNI calls). Analogous to the JVM stack but for C/C++ code invoked via JNI.
Class File Format
Every .class file begins with the magic number 0xCAFEBABE, followed by a minor and major version number. The structure:
ClassFile {
    u4 magic;                       // 0xCAFEBABE
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[];        // pool of strings, numbers, type refs
    u2 access_flags;                // public, final, abstract, interface, etc.
    u2 this_class;                  // index into constant pool
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[];
    u2 fields_count;
    field_info fields[];
    u2 methods_count;
    method_info methods[];          // each has Code attribute with bytecode
    u2 attributes_count;
    attribute_info attributes[];    // SourceFile, LineNumberTable, etc.
}
The constant pool is a key structural element: it is a table of symbolic references (class names, method names, descriptors, string literals, numeric constants) that the bytecode references by index. This is how java.lang.String.valueOf(int) appears in bytecode as a CONSTANT_Methodref pointing to entries in the pool.
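The fixed-size fields at the start of the structure can be read with a DataInputStream, since class files store multi-byte values in big-endian order. The sketch below parses only the first four fields; ClassHeader is an illustrative name, and main assumes the runtime exposes Object.class as a resource, which holds for standard JDKs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count from a class file stream. */
    static ClassHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();                  // u4 magic
        int minor = data.readUnsignedShort();        // u2 minor_version
        int major = data.readUnsignedShort();        // u2 major_version
        int cpCount = data.readUnsignedShort();      // u2 constant_pool_count (entries are indexed 1..count-1)
        return new ClassHeader(magic, minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            ClassHeader h = read(in);
            System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                    h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
        }
    }
}
```

The printed major version identifies the Java release the class was compiled for (for example, 52 corresponds to Java 8 and 65 to Java 21).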
JVM Startup Sequence
- JVM native code initializes internal data structures
- Bootstrap classloader initializes (loads java.lang.*, java.util.*, etc.)
- Extension and application classloaders are created (as Java objects)
- The main class specified on the command line is loaded
- The main class is linked and initialized (static blocks run)
- The public static void main(String[] args) method is invoked
- The JVM runs until all non-daemon threads complete or System.exit() is called
Shutdown hooks are threads registered via Runtime.getRuntime().addShutdownHook(Thread t). They run when the JVM begins normal shutdown. Used for resource cleanup: flushing buffers, releasing file locks, deregistering from service meshes. They do not run on Runtime.halt() or SIGKILL.
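A minimal sketch of registering a hook, using only Runtime.addShutdownHook and Runtime.removeShutdownHook; the class name ShutdownHooks, the helper register, and the hook's message are illustrative.

```java
class ShutdownHooks {
    /** Registers a cleanup action and returns the hook thread so it can be removed later. */
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        register(() -> System.out.println("hook: flushing buffers"));
        System.out.println("main: returning");
        // The hook runs after main returns, once no non-daemon threads remain.
    }
}
```

Keeping the returned Thread allows the hook to be deregistered with Runtime.getRuntime().removeShutdownHook(hook), which is useful when the resource it guards has already been released.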
JVM Flags and Tuning
# Heap sizing
-Xms512m # initial heap size
-Xmx4g # maximum heap size
-Xss256k # per-thread stack size
# GC selection (enable only one collector per JVM)
-XX:+UseG1GC
-XX:+UseZGC
-XX:+UseShenandoahGC
# GC logging (Java 9+)
-Xlog:gc*:file=gc.log:time,uptime
# JIT diagnostics
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
# Metaspace
-XX:MaxMetaspaceSize=256m
-XX:MetaspaceSize=64m # initial metaspace commit
# Debugging
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/java
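To confirm which of these flags a running JVM actually received, the standard management beans can be queried from inside the process. The sketch below is illustrative (FlagReport and its method names are not a JDK API); collector bean names vary by JVM and by the GC selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class FlagReport {
    /** JVM options passed on the command line (for example -Xmx4g), excluding program arguments. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in bytes; reflects -Xmx when it is set. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Names of the active garbage collector beans, which indicate the selected GC. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Max heap (MB): " + maxHeapBytes() / (1024 * 1024));
        System.out.println("Collectors:    " + collectorNames());
    }
}
```

Launching it as, for example, java -Xmx512m -XX:+UseG1GC FlagReport.java should echo both options in the first line of output.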
GraalVM Native Image
GraalVM Native Image performs ahead-of-time (AOT) compilation of a Java program to a standalone native executable. The compilation process:
- Points-to analysis (reachability analysis): determines which classes, methods, and fields are reachable at runtime
- Heap snapshotting: initializes selected classes at build time; their state is embedded in the image as a pre-built heap
- Ahead-of-time compilation of the reachable code to native machine code, by default using the Graal compiler
The result is an executable with millisecond startup time (no JVM warmup), small RSS footprint, and no JIT overhead. The tradeoff: no runtime JIT optimization (peak throughput is lower than a warmed-up HotSpot), and dynamic class loading is heavily restricted. Reflection, serialization, JNI, and dynamic proxies require explicit configuration (reflect-config.json, resource-config.json).
Historical Context
Java was announced publicly in 1995, and the JVM shipped with JDK 1.0 in January 1996. The original HotSpot JVM, developed at Longview Technologies (acquired by Sun in 1997), introduced adaptive compilation: the insight that profiling-guided JIT could outperform static AOT compilation by focusing optimization energy on hot code. OpenJ9 originated as IBM's J9 JVM, open-sourced in 2017. GraalVM was created at Oracle Labs; its first production release came in 2019.
Java 7 was the last version to use PermGen (permanent generation) for the method area; Java 8 (2014) replaced it with Metaspace to eliminate java.lang.OutOfMemoryError: PermGen space from class-heavy applications like OSGi containers.
Production Examples
Heap pressure diagnosis:
# Check live objects and GC overhead
jstat -gcutil <pid> 1000 10
# Heap dump analysis
jmap -dump:format=b,file=heap.hprof <pid>
# Then analyze with Eclipse MAT or VisualVM
ClassLoader leak: In application servers (Tomcat, JBoss), each redeployment creates a new classloader. If a long-lived reference (a static field, a thread-local, a registered JDBC driver) still points to any object or class loaded by the old classloader, that entire classloader and all of its loaded classes cannot be garbage-collected. This is the classic Metaspace leak pattern.
Monitoring classloader activity:
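-XX:+TraceClassLoading # prints each class load event (Java 8)
-XX:+TraceClassUnloading # Java 8
-Xlog:class+load=info # Java 9+ equivalent (unified logging)
-Xlog:class+unload=info # Java 9+ equivalent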
Debugging Notes
- jstack <pid> prints all thread stack traces; use for deadlock diagnosis
- jmap -histo <pid> prints live object histogram by class; identifies leak suspects
- jcmd <pid> VM.native_memory shows native memory breakdown when NMT is enabled (-XX:NativeMemoryTracking=summary)
- OOM errors have distinct types: Java heap space (heap full), Metaspace (class metadata full), unable to create new native thread (OS thread limit), Direct buffer memory (off-heap NIO exceeded)
- -verbose:class is the low-level version of class loading trace; useful when classloader delegation is behaving unexpectedly
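The deadlock check that jstack performs from outside the process is also available in-process through ThreadMXBean. The following is a sketch under that assumption; DeadlockCheck and deadlockedThreadNames are illustrative names, and the fallback branch covers JVMs that do not track ownable synchronizers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class DeadlockCheck {
    /** Returns a description of each deadlocked thread, or an empty list when none are found. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()          // monitors and java.util.concurrent locks
                : threads.findMonitorDeadlockedThreads();  // object monitors only
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;   // both methods return null when there is no deadlock
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {   // a thread may have exited between the two calls
                names.add(info.getThreadName() + " waiting on " + info.getLockName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```

A check like this could be run periodically by a health endpoint, with the caveat that the detection call itself has a cost on JVMs with many threads.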
Security Implications
- The bytecode verifier is a critical security boundary — without it, type confusion would allow arbitrary memory access in the JVM process
- The parent delegation model prevents classpath hijacking of core platform classes
- Deserialization vulnerabilities (CVE-2015-4852, Apache Commons Collections) exploit Java's object deserialization to achieve RCE: an attacker-controlled stream chooses which classes on the classpath are instantiated, and chaining the side effects of their deserialization methods (a gadget chain) ends in arbitrary code execution
- The Security Manager (deprecated for removal in Java 17, permanently disabled in Java 24) provided sandboxing for applets and plugin environments
- Native Image eliminates entire attack surfaces: no dynamic class loading, no reflection without explicit declaration, reduced attack surface for deserialization gadget chains
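One common mitigation for the deserialization risk described above is a serialization filter (java.io.ObjectInputFilter, available since Java 9). The sketch below is an assumption-laden illustration rather than a complete defense: FilteredDeserialization and Gadget are hypothetical names, and the pattern "java.base/*;!*" allows only classes from the java.base module and rejects everything else.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class FilteredDeserialization {
    /** Stand-in for an application class that should never be accepted from a stream (hypothetical). */
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Deserializes with an allow-list filter: java.base classes pass, all others are rejected. */
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(ObjectInputFilter.Config.createFilter("java.base/*;!*"));
            return in.readObject();
        }
    }

    /** Returns true when the filter rejects the serialized form of 'value'. */
    static boolean isRejected(Object value) throws IOException, ClassNotFoundException {
        try {
            deserialize(serialize(value));
            return false;
        } catch (InvalidClassException e) {
            return true;   // thrown when the filter reports REJECTED
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = new ArrayList<>(List.of("a", "b"));
        System.out.println("ArrayList rejected? " + isRejected(names));
        System.out.println("Gadget rejected?    " + isRejected(new Gadget()));
    }
}
```

The list is accepted and the Gadget instance is rejected before any of its deserialization logic runs. A real allow-list would name the specific application classes expected on the wire instead of a whole module.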
Performance Implications
- JVM startup cost (class loading, JIT compilation reaching steady state) is 100ms–10s for typical applications
- JNI calls have measurable overhead due to crossing the JNI boundary (type marshaling, GC root pinning)
- Metaspace is in native memory: on systems with aggressive virtual memory limits (ulimit -v), Metaspace growth can cause OOM at the OS level before the JVM heap fills
- Thread stack size × thread count = significant memory. A server with 1000 threads at 1MB each uses 1GB in stacks alone before any heap allocation
Failure Modes
- OutOfMemoryError: Java heap space - heap exhausted; examine GC logs, heap histogram
- OutOfMemoryError: Metaspace - too many classes loaded (classloader leak in app server)
- StackOverflowError - unbounded recursion; raising -Xss only postpones the failure, so fix the recursion instead
- OutOfMemoryError: unable to create new native thread - OS thread limit; reduce thread pool sizes or use virtual threads (Java 21+)
- JVM crash (hs_err_pid<pid>.log) - typically JNI bug, native library crash, or hardware fault; analyze the crash log's register state and stack trace
Modern Usage
Java 21 finalized Virtual Threads (Project Loom), which are scheduled by the Java runtime rather than the operating system: millions of lightweight threads are multiplexed onto a small pool of carrier OS threads. When a virtual thread blocks on I/O, the runtime unmounts it from its carrier and remounts it on a carrier only once it is runnable again. This removes the thread-per-request scalability limit without requiring async/reactive programming styles.
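A short sketch of the thread-per-task style with virtual threads, assuming Java 21 or later. VirtualThreadDemo and runBlockingTasks are illustrative names; the sleep stands in for blocking I/O.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadDemo {
    /** Runs 'taskCount' blocking tasks, one virtual thread each, and returns how many completed. */
    static int runBlockingTasks(int taskCount) throws Exception {
        // close() at the end of the try block waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(10));   // blocks the virtual thread, not its carrier
                    return 1;
                }));
            }
            int completed = 0;
            for (Future<Integer> future : futures) {
                completed += future.get();
            }
            return completed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed tasks: " + runBlockingTasks(10_000));
    }
}
```

The same code written against a platform-thread-per-task executor would need one OS thread per task, each with its own stack reservation, which is the cost the paragraph above refers to.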
GraalVM's Truffle framework allows implementing language runtimes (Python, Ruby, R, JavaScript) as AST interpreters that get JIT-compiled by the Graal JIT compiler through partial evaluation — effectively turning interpreter overhead into compiled code.
Future Directions
- Project Valhalla: value types in the JVM — flat, identity-free objects that can be stored inline in arrays without pointer indirection, eliminating pointer-chasing overhead for numeric and record types
- Project Lilliput: shrink HotSpot object headers from 96–128 bits to 64 bits, reducing heap overhead for object-heavy workloads by 10–20%
- CRaC (Coordinated Restore at Checkpoint): JVM-level checkpoint/restore (via CRIU) to snapshot a warmed-up JVM and restore it with near-instant startup, combining Native Image startup speed with JIT peak throughput
Exercises
- Write a custom classloader that loads a .class file from an encrypted blob (decrypt at load time). Verify that it participates correctly in parent delegation for java.lang.* classes.
- Use javap -verbose to inspect a compiled class file. Identify the constant pool entries referenced by a method invocation bytecode (invokevirtual).
- Trigger a Metaspace leak by repeatedly defining new classes in a loop using a fresh classloader each iteration. Monitor with -Xlog:gc* and jstat -gcmetacapacity.
- Register a shutdown hook that detects whether the JVM is exiting cleanly vs via Runtime.halt(). Observe its behavior under kill -9 (SIGKILL): hooks do not run.
- Benchmark a JNI-heavy workload vs a pure-Java equivalent. Quantify JNI boundary crossing overhead at varying call frequencies.
References
- Tim Lindholm et al., The Java Virtual Machine Specification, Java SE 21 Edition. Oracle, 2023.
- Cliff Click & Michael Paleczny, "A Simple Graph-Based Intermediate Representation." ACM SIGPLAN Workshop on Intermediate Representations, 1995.
- Christian Wimmer & Michael Franz, "Linear Scan Register Allocation on SSA Form." CGO 2010.
- OpenJDK HotSpot Runtime source: src/hotspot/share/, particularly runtime/, memory/, classfile/
- GraalVM Native Image documentation: https://www.graalvm.org/latest/reference-manual/native-image/
- Project Loom: https://openjdk.org/projects/loom/