Garbage Collection
Technical Overview
Garbage collection is automatic memory management: the runtime reclaims heap memory occupied by objects that are no longer reachable from any live reference. The alternative — manual memory management — requires the programmer to explicitly allocate and free each object, introducing a class of bugs (use-after-free, double-free, memory leaks) that consistently accounts for 60–70% of security vulnerabilities in C/C++ codebases (Microsoft Security Response Center data). GC trades this bug class for runtime overhead and unpredictable pause times.
Modern GC algorithms have evolved from simple stop-the-world mark-sweep (Java 1.0) through generational collectors (Java 1.2) to today's concurrent, region-based, low-latency collectors (ZGC, Shenandoah) that achieve sub-millisecond pause times even on terabyte heaps.
Prerequisites
- Understanding of heap memory management and pointer semantics
- Basic graph theory (reachability analysis is graph traversal)
- Familiarity with CPU cache behavior and memory bandwidth
- JVM architecture background (see 01-jvm-architecture.md)
Generational Heap Diagram
+---------------------------------------------------------------+
| JVM Heap |
| |
| Young Generation Old Generation |
| +------------------------------------------+ +---------+ |
| | Eden Space | | | |
| | (new allocations — bump pointer) | | Old | |
| | | | Gen | |
| | +--------------+ +--------------+ | | | |
| | | Survivor S0 | | Survivor S1 | | | Tenured | |
| | | (from space) | | (to space) | | | objects | |
| | +--------------+ +--------------+ | | | |
| +------------------------------------------+ +---------+ |
| |
| Minor GC: Eden + S0 -> live objects copied to S1, age++ |
| Major GC / Full GC: Old Gen collected (expensive) |
| |
| Objects surviving N minor GCs (default N=15) are promoted |
| to Old Gen (tenuring threshold: -XX:MaxTenuringThreshold) |
+---------------------------------------------------------------+
Core Content
Why Garbage Collection
Manual memory management in C is error-prone by construction:
Use-after-free (UAF): Code reads or writes memory after it has been freed. The freed memory may have been reallocated to another object. Result: data corruption or type confusion. This class of bug is exploitable — an attacker can control what data occupies the freed slot.
Double-free: Calling free(ptr) twice on the same pointer. The heap allocator's internal metadata (free-list pointers embedded in the freed chunk) can be corrupted into an arbitrary-write primitive.
Memory leak: Allocated memory that is never freed. In long-running servers, leaks cause gradual RSS growth until the process is killed by the OOM killer. In short-lived programs, leaks are harmless (OS reclaims all memory on process exit).
Buffer overflow: Adjacent in that it involves manual size tracking; writing past the end of an allocated buffer is distinct from incorrect lifetime management but occurs in the same class of languages.
GC eliminates UAF and double-free entirely. Leaks still occur in GC'd languages — they manifest as object references that are retained longer than necessary (e.g., a static collection that accumulates objects indefinitely). These are semantic leaks, not memory corruption bugs.
GC Fundamentals: Reachability and Roots
An object is live if it is reachable via a chain of references from a root. Roots are:
- Local variables and operand stack slots in all active thread stack frames
- Static fields of loaded classes
- JNI global references (C code holding Java objects)
- References from system classes (String interning table, finalizer queue)
The GC's job: identify all live objects (reachability traversal), and reclaim everything else.
Mark-Sweep
Phase 1 — Mark: Starting from roots, traverse all reference edges. Mark each reachable object (set a bit in the object header or a separate bitmap).
Phase 2 — Sweep: Linearly scan the heap. Any object without the mark bit set is garbage; its memory is added to the free list.
Problem: Fragmentation. After many mark-sweep cycles, the heap becomes a checkerboard of live objects and free holes. Allocation of a large object fails even if total free memory is sufficient because no contiguous free region is large enough. The allocator must search the free list for a suitable hole (first-fit, best-fit, next-fit strategies).
Before GC: [A][B][C][D][E][F][G]
live dead live dead live dead live
After sweep: [A][ ][C][ ][E][ ][G]
(fragmented free list)
Mark-Compact
After marking, instead of simply sweeping, the collector slides all live objects toward the start of the heap, filling gaps. All pointers to moved objects are updated.
Benefits: No fragmentation. Allocation is again a simple bump pointer (increment a pointer, no free-list search). Cost: Three passes over the heap minimum (mark, compute new locations, update references, move objects). Slow for large heaps. Causes long stop-the-world pauses.
Copying GC (Semi-Space)
Split the heap into two equal halves: from-space and to-space. Allocation uses a bump pointer in from-space — extremely fast (compare pointer, branch, increment pointer). When from-space is exhausted:
- Trace all live objects reachable from roots
- Copy each live object into to-space (compact as you go)
- Update all pointers to refer to the new locations in to-space
- Swap the roles of from-space and to-space
Result: No fragmentation, fast allocation. Cost: Half the heap is unusable (wasted as the "other" semi-space). Live data is copied, which is proportional to live set size, not heap size. If the live set is small, copying GC is very fast.
Generational Hypothesis
Empirical observation across many workloads: most objects die young. Iterator objects, temporary strings, boxed numbers, short-lived closures — allocated in one method, dead by the next. A small fraction of objects are long-lived: caches, global configuration, session state.
Generational GC exploits this by maintaining separate generations: - Young generation (nursery): New objects allocated here. Collected frequently (minor GC). Because most objects are dead, the amount of live data to copy is small — minor GC is fast. - Old generation (tenured): Objects that survive several minor GCs are promoted here. Collected rarely (major GC / full GC). Old-gen collection is expensive because it processes more data.
The challenge: old-gen objects can reference young-gen objects. The GC must track old-to-young references to treat them as additional roots during minor GC, without scanning the entire old gen. This is solved with a card table (coarse-grained dirty bit for 512-byte card regions) or remembered set (per-region set of inbound references).
G1GC (Garbage First)
G1GC, the default GC since Java 9, is designed for large heaps (4GB–64GB) with predictable pause time targets.
Region-based heap: The heap is divided into equal-sized regions (1MB–32MB, power of 2). Regions are not fixed to a generation — any region can be young-gen, old-gen, or humongous (large objects spanning multiple contiguous regions). This allows G1 to adapt the young/old ratio dynamically.
Concurrent marking: G1 runs a concurrent marking cycle (overlapping with application threads) to identify the liveness of old-gen regions. This uses a tri-color invariant with a SATB (Snapshot-At-The-Beginning) write barrier.
Predicted pause time: G1 builds a model of how long it takes to collect each region (based on live data density and collection cost). During each GC pause, it collects the set of regions whose collection cost fits within the pause time target (-XX:MaxGCPauseMillis=200). It prefers to collect regions with the most garbage first — hence "Garbage First."
Mixed GC: After a concurrent marking cycle identifies old-gen regions with high garbage fraction, subsequent GC pauses are "mixed" — they collect the young gen plus a portion of old-gen regions.
G1 Heap Layout:
+---+---+---+---+---+---+---+---+---+---+---+---+
| Y | Y | O | O | F | Y | H | H | O | F | Y | O |
+---+---+---+---+---+---+---+---+---+---+---+---+
Y = Young O = Old F = Free H = Humongous
Each box = one region (e.g., 4MB)
ZGC
ZGC (Z Garbage Collector), available production-ready in Java 15+, achieves sub-millisecond GC pauses (typically <0.5ms) for heaps up to 16TB.
Colored pointers: ZGC stores metadata bits in the object pointer itself (using the upper bits of a 64-bit address, which are unused on current hardware). These bits encode: the object's finalizable status, remapped status (has it been moved to its new location), and marked status. Load barriers read these bits on every pointer load.
Load barrier: On every Java reference load (obj.field), ZGC inserts a small code sequence to check whether the pointer's metadata bits indicate that a concurrent relocation or marking is needed. If so, the barrier handles it. This is a per-object-reference cost, not a per-GC-cycle cost. The barrier is optimized to a handful of instructions in JIT-compiled code.
Concurrent relocation: ZGC can move objects while the application threads run — without stopping them. It uses the load barrier to forward old pointers to new locations on the fly.
Sub-ms pauses: ZGC's stop-the-world pauses are limited to: root scanning (a few milliseconds, depends on thread count and JNI global count) and a short synchronization phase. All marking, relocation, and remapping are concurrent.
Cost: ZGC uses more CPU (barrier overhead ~5–15%, load barrier is on every reference load, not just stores as in G1's write barrier) and more memory (forwarding tables, colored pointer metadata overhead) compared to G1.
Shenandoah GC
Shenandoah (Red Hat, OpenJDK) also targets sub-millisecond pauses using concurrent evacuation, but via a different mechanism: a Brooks pointer (or forwarding pointer) embedded in each object header. When Shenandoah relocates an object, it leaves a forwarding pointer at the old location pointing to the new location. A read/write barrier on every access checks if the object has a forwarding pointer and follows it.
Shenandoah is available in OpenJDK and is particularly favored in GraalVM and Red Hat builds. It achieves comparable latency to ZGC with slightly different throughput characteristics.
GC Tuning Flags
# G1GC tuning
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200 # pause target (ms)
-XX:G1HeapRegionSize=8m # region size
-XX:G1NewSizePercent=20 # min young gen %
-XX:G1MaxNewSizePercent=60 # max young gen %
-XX:InitiatingHeapOccupancyPercent=45 # start concurrent marking
# ZGC
-XX:+UseZGC
-XX:+ZGenerational # generational ZGC (Java 21+)
-XX:SoftMaxHeapSize=28g # soft heap limit for ZGC
-XX:ZUncommitDelay=300 # uncommit unused memory after 5min
# GC logging
-Xlog:gc*:file=/var/log/gc.log:time,uptime,level
-Xlog:safepoint:stdout # safepoint pause logging
GC Log Analysis
Key metrics to monitor from GC logs:
- GC pause duration: Time the application was stopped. P99 matters more than average.
- GC frequency: Minor GC every 100ms is usually fine; every 10ms indicates allocation pressure.
- Heap occupancy before/after GC: If old gen never drops much, you have a leak or under-sized heap.
- Time spent in GC:
jstat -gcutil <pid>reportsGCT(total GC time) andYGC/FGCcounts. - Promotion rate: Rate at which objects graduate from young to old gen. High promotion rate stresses the old gen collector.
Tools: GCEasy.io, GCViewer, the JVM's built-in -Xlog:gc* output.
Historical Context
John McCarthy's LISP (1960) introduced the first garbage collector using mark-sweep. Generational garbage collection was proposed by David Ungar in his 1984 paper "Generation Scavenging" and was a foundational innovation for making GC practical in production. The Baker real-time copying collector (1978) introduced incremental GC. Java 1.0 shipped with a simple serial stop-the-world mark-sweep collector. Java 2 introduced generational collection. The CMS collector (Concurrent Mark Sweep, Java 1.4.1–14) was the first production concurrent old-gen collector in HotSpot. G1 replaced CMS as the default in Java 9.
Production Examples
# Identify GC pressure live
jstat -gcutil $(pgrep java) 1000
# Output:
# S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
# 0.00 98.54 89.33 62.45 94.01 90.61 1462 8.534 2 0.221 8.755
# Find objects contributing to old gen growth
jcmd <pid> GC.heap_info
jmap -histo:live <pid> | head -30
# Profile allocation sites (Java Flight Recorder)
java -XX:StartFlightRecording=duration=60s,filename=rec.jfr,\
settings=profile MyApp
# Analyze with JDK Mission Control
Debugging Notes
- High
FGCcount injstatoutput signals a problem (full GC should be rare) GCLockerpauses (JNI critical section preventing GC) appear in GC logs asGCLocker Initiated GC— reduce JNI critical section duration- Humongous object allocation in G1 bypasses the young gen; objects larger than 50% of a region size go directly to old gen, triggering concurrent marking. Large arrays or Strings can cause this.
- ZGC with colored pointers requires that object references not be corrupted; if native code (JNI) stores a Java reference stripped of its tag bits, the load barrier may malfunction
Security Implications
- GC finalizers (
finalize()method, deprecated in Java 9, removed in Java 18 default behavior) enabled resurrection attacks — infinalize(), the object can storethisin a global reference, escaping collection. Usejava.lang.ref.Cleanerinstead. - GC behavior is a timing oracle: the time between two GC cycles and their duration can leak information about heap allocation patterns. Relevant in speculative execution and timing side-channel research.
- Unsafe direct memory access (
sun.misc.Unsafe.allocateMemory) bypasses GC entirely — leaks and corruption are possible. Used in DirectByteBuffer, off-heap caches (Chronicle Map, Ignite, Hazelcast).
Performance Implications
- Allocation rate (bytes/sec) is the primary GC driver. Reducing allocation is the most impactful GC optimization.
- Object pooling is a common but dangerous optimization: pools defeat generational GC by keeping objects alive across many minor GCs, promoting them to old gen, and increasing old-gen collection pressure.
- False sharing in card tables: concurrent threads writing to adjacent objects that fall in the same 512-byte card will both dirty that card, causing GC to scan the region unnecessarily.
- Large live sets increase GC time proportionally (mark time scales with live set, not heap size for concurrent collectors).
Failure Modes
- GC thrash (OutOfMemoryError): Application spends >98% of time in GC, recovering <2% heap per cycle. The JVM throws
OutOfMemoryError: Java heap space. Diagnosis: GC log shows near-zero allocation growth between GCs; heap histogram identifies the dominant live objects. - Metaspace OOM: Classloader leak; Metaspace grows until
-XX:MaxMetaspaceSizeis reached. - Direct memory OOM: Off-heap
ByteBufferor Unsafe allocation exhausts system memory without hitting Java heap limit. - Promotion failure (G1): Young gen collection cannot promote surviving objects to old gen because old gen is full. Results in a full GC.
-XX:G1ReservePercentand larger heap are the fixes.
Modern Usage
Java 21's Generational ZGC (-XX:+UseZGC -XX:+ZGenerational) adds generational support to ZGC — tracking young and old generations within ZGC's concurrent framework. This dramatically reduces ZGC's CPU overhead for allocation-heavy workloads (the most common case) while maintaining sub-millisecond pause times.
Virtual threads (Loom) change GC pressure patterns: virtual threads are stack-less when parked (their stack is serialized to heap as a continuation). Millions of parked continuations in the heap are live GC roots of a new kind, requiring careful GC scheduling.
Future Directions
- Epsilon GC (
-XX:+UseEpsilonGC): No-op GC — never collects. For benchmarking (measure true allocation cost without GC interference) and extremely short-lived apps that don't need reclamation. - Adaptive region sizing in G1: Dynamic adjustment of region size based on allocation patterns
- ML-driven GC tuning: Research into using online learning to predict optimal GC pause targets and heap sizing based on workload patterns, reducing the need for manual tuning
- Hardware GC offload: IBM has explored GPGPU-based GC marking; research prototypes exist for hardware-assisted concurrent marking using CPU hardware counters
Exercises
- Write a Java program that allocates objects at a controlled rate. Use
-Xlog:gc*to observe minor GC frequency. Increase the allocation rate until you trigger old-gen promotion and eventual full GC. Find the allocation rate threshold. - Implement a mark-sweep GC for a toy language (use a linked list of fixed-size cells). Add compaction. Measure fragmentation (average allocation search time) before and after compaction.
- Simulate a generational GC scenario: create a long-lived
ArrayListand repeatedly add then remove short-lived objects. Observe with G1GC how promotion rate affects old-gen growth vs. a scenario without the long-lived collection. - Compare ZGC vs G1GC pause times on a latency-sensitive HTTP workload (use JMH with a Blackhole consumer). Run each GC under 80% heap occupancy. Plot the P50/P99/P999 pause distributions.
- Write a finalizer-based object resurrection and demonstrate why
finalize()is unreliable for resource cleanup: show that the object can be GC'd with its finalizer never called (create heap pressure, callSystem.runFinalization()and observe non-determinism).
References
- Richard Jones, Antony Hosking, Eliot Moss. The Garbage Collection Handbook. CRC Press, 2011. The definitive reference.
- David Ungar, "Generation Scavenging: A Non-Disruptive High Performance Storage Reclamation Algorithm." ACM SIGPLAN Notices, 1984.
- David Detlefs et al., "Garbage-First Garbage Collection." ISMM 2004.
- Per Liden & Stefan Karlsson, "The Z Garbage Collector: An Scalable Low-Latency Garbage Collector." OpenJDK JEP 333/376 design notes.
- Aleksey Shipilev, "The Black Magic of (Java) Method Dispatch." — extensive blog on JVM internals including GC interaction.
- OpenJDK GC source:
src/hotspot/share/gc/—g1/,z/,shenandoah/