LLVM Architecture

Technical Overview

LLVM is a collection of modular compiler infrastructure libraries. Its central design principle is a clean separation between language front-ends, an optimization middle-end (operating on a common IR), and machine back-ends. Any language front-end that emits LLVM IR gets all LLVM's optimizations and all its target backends for free. Any new target backend immediately supports all languages that compile to LLVM IR.

LLVM IR is not assembly: it is a typed, SSA-form intermediate representation with explicit control flow graphs and a largely target-independent instruction set (a module still records a target triple and data layout, and front-ends bake ABI decisions into the IR they emit). An LLVM IR file (hello.ll) reads much like pseudo-assembly, round-trips between its textual (.ll) and bitcode (.bc) forms, and is the common currency exchanged between every component in the toolchain.

Prerequisites

  • Compiler pipeline fundamentals (see 01-compilation-pipeline.md)
  • SSA form basics: each variable defined exactly once, phi nodes at control flow merges
  • Basic understanding of register machines and ISAs
  • Familiarity with compiling C/C++ with clang

LLVM Architecture Diagram

                         LLVM Architecture

  Language Front-Ends          Middle-End            Back-Ends
  +----------------+
  | Clang          |
  | (C/C++/ObjC)   |
  +-------+--------+
          |
  +-------+--------+    +---------------+    +----------------+
  | Rust (rustc)   +--->+               +--->+ x86 / x86-64   |
  +----------------+    |  LLVM IR      |    +----------------+
  +----------------+    |  (SSA form)   |    +----------------+
  | Swift (swiftc) +--->+               +--->+ AArch64 / ARM  |
  +----------------+    |  +----------+ |    +----------------+
  +----------------+    |  | Optimizer| |    +----------------+
  | Julia          +--->+  | Passes   | +--->+ RISC-V         |
  +----------------+    |  | (opt)    | |    +----------------+
  +----------------+    |  +----------+ |    +----------------+
  | Kotlin/Native  +--->+               +--->+ WebAssembly    |
  +----------------+    +---------------+    +----------------+
  +----------------+                         +----------------+
  | MLIR dialects  |    +---------------+    | NVPTX (CUDA)   |
  +----------------+    | LLVM JIT      |    +----------------+
                        | (MCJIT / ORC) |    +----------------+
                        +---------------+    | SPIR-V (GPU)   |
                                             +----------------+

Core Content

Historical Context

LLVM began in 2000 as a research project by Chris Lattner and his advisor Vikram Adve at the University of Illinois at Urbana-Champaign (UIUC). The original paper, "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation" (CGO 2004), introduced the idea of keeping one persistent IR across compile time, link time, and run time, which enables profile-based re-optimization of deployed programs ("lifelong" refers to the software's deployment lifetime).

Apple hired Chris Lattner in 2005 to use LLVM as the foundation for Apple's compiler infrastructure. Apple wanted to move away from the then-dominant GCC, whose GPL licensing (GCC moved to GPL v3 in 2007) and monolithic architecture made it hard to embed in Xcode's IDE features. LLVM's library architecture allowed Apple to reuse the same compiler code for IDE syntax highlighting, error recovery, code completion, and background compilation, all of which were impractical to build on GCC's architecture.

Clang, the C/C++/Objective-C front-end, was developed at Apple and open-sourced in July 2007. By 2010, Clang could compile itself and other large C++ codebases, typically compiled faster than the GCC of that era, and produced clearer error messages and diagnostics.

Today (2025), LLVM is the compiler backend for: Clang (C/C++/ObjC), Rust (rustc), Swift (swiftc), Julia (JIT), Kotlin/Native, Emscripten (C/C++ to WebAssembly), CUDA (through the NVPTX backend), and dozens of research and domain-specific language compilers.

LLVM IR: Static Single Assignment

LLVM IR is the beating heart of LLVM. Key properties:

SSA form: Each variable (identified by a %name or a numbered %N) is assigned exactly once in the text. SSA enables precise dataflow analysis: to find all uses of a value, follow its def-use chain. To find where a value comes from, follow the use-def chain. SSA values themselves cannot alias one another; aliasing only arises for memory reached through pointers.
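
To make the def-use idea concrete, here is a minimal sketch in Python. The tuple-based instruction format and the build_def_use helper are invented for illustration and are not LLVM's data structures (LLVM keeps use lists directly on each Value).

```python
# Each toy instruction is (destination, opcode, operands).
# Hypothetical format for illustration only.
INSTRS = [
    ("cmp", "icmp", ["a", "b"]),           # %cmp = icmp sgt %a, %b
    ("sel", "select", ["cmp", "a", "b"]),  # %sel = select %cmp, %a, %b
    (None, "ret", ["sel"]),                # ret %sel
]

def build_def_use(instrs):
    """Return (defs, uses): value -> defining index, value -> using indices."""
    defs, uses = {}, {}
    for idx, (dest, _opcode, operands) in enumerate(instrs):
        if dest is not None:
            # SSA property: a value may be defined only once.
            if dest in defs:
                raise ValueError(f"SSA violation: {dest} defined twice")
            defs[dest] = idx
        for value in operands:
            uses.setdefault(value, []).append(idx)
    return defs, uses

DEFS, USES = build_def_use(INSTRS)
```

Because every value has exactly one definition, DEFS is a plain dictionary, and "all uses of %a" is a single lookup in USES.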

Typed: Every value has a type. i32 = 32-bit integer, i64 = 64-bit integer, double = IEEE 754 double, i8* = pointer to byte (pre-opaque-pointer era), ptr = opaque pointer (current default). Operations are type-checked at the IR level.

Explicit CFG: Functions are lists of basic blocks. Each basic block ends in a terminator instruction (ret, br, switch, invoke). The control flow graph is explicit — there are no implicit fall-throughs.

Phi nodes: At basic block entries where control flow merges, phi nodes select a value based on which predecessor block was entered:

define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %return_a, label %return_b

return_a:
  br label %done

return_b:
  br label %done

done:
  ; phi selects %a if we came from return_a, %b if from return_b
  %result = phi i32 [ %a, %return_a ], [ %b, %return_b ]
  ret i32 %result
}
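
The way a phi picks its operand can be sketched with a tiny interpreter for the @max function above. This is an illustrative model only: the dictionary layout (MAX_FN, "phis", "term") is an assumption made for this sketch, not how LLVM represents functions.

```python
# Toy model of @max: each block has phi nodes and one terminator.
MAX_FN = {
    "entry":    {"phis": {}, "term": ("condbr", "cmp", "return_a", "return_b")},
    "return_a": {"phis": {}, "term": ("br", "done")},
    "return_b": {"phis": {}, "term": ("br", "done")},
    "done": {
        # %result = phi i32 [ %a, %return_a ], [ %b, %return_b ]
        "phis": {"result": {"return_a": "a", "return_b": "b"}},
        "term": ("ret", "result"),
    },
}

def run_max(a, b):
    env = {"a": a, "b": b, "cmp": a > b}  # %cmp = icmp sgt i32 %a, %b
    prev, block = None, "entry"
    while True:
        body = MAX_FN[block]
        # A phi reads the operand paired with the block we arrived from.
        for name, incoming in body["phis"].items():
            env[name] = env[incoming[prev]]
        term = body["term"]
        if term[0] == "ret":
            return env[term[1]]
        if term[0] == "br":
            prev, block = block, term[1]
        else:  # conditional branch
            _, cond, then_bb, else_bb = term
            prev, block = block, (then_bb if env[cond] else else_bb)
```

Calling run_max(3, 7) walks entry, return_b, done, and the phi selects %b because control arrived from return_b.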

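Memory model: LLVM IR has alloca for stack allocation, and load and store for memory access. Most other instructions (arithmetic, comparisons, casts) are pure computations on SSA values with no side effects; calls, atomic operations, and a few other instructions can also read or write memory. This separation of memory from computation is critical for alias analysis and optimization.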

Full LLVM IR example (a simple function):

; ModuleID = 'hello.c'
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [12 x i8] c"Hello, %s!\0A\00"
@name = private unnamed_addr constant [6 x i8] c"world\00"

declare i32 @printf(ptr noundef, ...)

define dso_local i32 @main(i32 %argc, ptr %argv) {
entry:
  %call = call i32 (ptr, ...) @printf(ptr @.str, ptr @name)
  ret i32 0
}

LLVM Pass Infrastructure

LLVM's optimization and analysis passes are the modular components that operate on LLVM IR.

Analysis passes: Compute information about the IR without modifying it. Examples:

  • DominatorTreeAnalysis: Which basic blocks dominate which others (BB A dominates BB B if every path from entry to B goes through A)
  • LoopAnalysis: Identify natural loops in the CFG
  • AliasAnalysis: Determine whether two pointers may refer to the same memory location
  • ScalarEvolution: Compute mathematical representations of induction variable values
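
As a rough illustration of what a dominator analysis computes, the sketch below uses the textbook iterative dataflow formulation on the CFG of the @max example. LLVM's DominatorTree uses a more efficient algorithm; the dominators function and the dictionary CFG encoding here are assumptions made for this sketch.

```python
def dominators(cfg, entry):
    """cfg maps block -> list of successors. Returns block -> set of dominators."""
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].add(block)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for block in blocks - {entry}:
            if not preds[block]:
                continue  # unreachable block: leave as-is
            # A block is dominated by itself plus whatever dominates ALL preds.
            new = {block} | set.intersection(*(dom[p] for p in preds[block]))
            if new != dom[block]:
                dom[block] = new
                changed = True
    return dom

CFG = {
    "entry": ["return_a", "return_b"],
    "return_a": ["done"],
    "return_b": ["done"],
    "done": [],
}
DOM = dominators(CFG, "entry")
```

Here neither return_a nor return_b dominates done, because done can be reached through either one; only entry (and done itself) lies on every path.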

Transform passes: Modify the IR. They use the results of analysis passes.

  • InstCombine: Combines chains of simple instructions into simpler equivalents. Dozens of algebraic identities: x + 0 → x, x * 1 → x, (x - y == 0) → (x == y).
  • SCCP (Sparse Conditional Constant Propagation): Propagates constants through the CFG, pruning infeasible branches.
  • GVN (Global Value Numbering): Eliminates redundant computations.
  • LICM (Loop Invariant Code Motion): Hoists invariant computations out of loops.
  • LoopVectorize: Vectorizes loops using SIMD.
  • Inliner: Inline function calls based on a cost model.
  • DeadArgumentElimination: Remove unused function arguments.
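
The InstCombine identities listed above can be sketched as a small bottom-up rewriter over expression tuples. This is a toy under stated assumptions: the tuple encoding and the simplify function are invented here, and the real pass works on LLVM instructions with far more rules and legality checks.

```python
def simplify(expr):
    """Apply a few InstCombine-style identities bottom-up.

    Expressions are variable names (str), integer constants, or
    (op, lhs, rhs) tuples with op in {"add", "sub", "mul", "eq"}.
    """
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)  # simplify operands first

    if op == "add" and rhs == 0:
        return lhs                      # x + 0 -> x
    if op == "add" and lhs == 0:
        return rhs                      # 0 + x -> x
    if op == "mul" and rhs == 1:
        return lhs                      # x * 1 -> x
    if op == "mul" and lhs == 1:
        return rhs                      # 1 * x -> x
    if op == "eq" and rhs == 0 and isinstance(lhs, tuple) and lhs[0] == "sub":
        return ("eq", lhs[1], lhs[2])   # (x - y == 0) -> (x == y)
    return (op, lhs, rhs)
```

Because operands are simplified before the parent, one rewrite can expose another: ("add", ("mul", "x", 1), 0) first becomes ("add", "x", 0) and then "x".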

Pass Manager: Coordinates which passes run, in what order, and manages caching of analysis results. LLVM has two pass managers:

  • Legacy Pass Manager (the default for the optimization pipeline before LLVM 13): Module and function pass managers in which each pass declares its analysis dependencies up front. It is still used for the backend code generation pipeline.
  • New Pass Manager (default since LLVM 13): Redesigned with lazily computed and cached analyses, explicit invalidation, easier pipeline composition, and lower overhead. Pipelines can be written as text, for example opt -passes='function(instcombine,gvn)'. Before it became the default, Clang exposed it through -fexperimental-new-pass-manager.
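
The caching behavior can be illustrated with a toy analysis manager. Everything below (AnalysisManager, run_pipeline, the pass functions, and the list-of-strings "IR") is hypothetical and only mirrors the idea that passes report which analyses they preserved, so the manager knows what to invalidate.

```python
class AnalysisManager:
    """Computes analyses lazily and caches them until invalidated."""

    def __init__(self, analyses):
        self.analyses = analyses  # name -> function(ir) -> result
        self.cache = {}
        self.computations = 0     # how often an analysis actually ran

    def get(self, name, ir):
        if name not in self.cache:
            self.cache[name] = self.analyses[name](ir)
            self.computations += 1
        return self.cache[name]

    def invalidate(self, preserved):
        # Drop every cached result the last pass did not preserve.
        self.cache = {k: v for k, v in self.cache.items() if k in preserved}

def run_pipeline(ir, passes, am):
    for run_pass in passes:
        preserved = run_pass(ir, am)  # each pass returns preserved analyses
        am.invalidate(preserved)
    return ir

def count_insts(ir):
    return len(ir)

def report_pass(ir, am):
    am.get("inst_count", ir)
    return {"inst_count"}  # read-only pass: everything preserved

def strip_nops_pass(ir, am):
    before = am.get("inst_count", ir)
    ir[:] = [inst for inst in ir if inst != "nop"]
    return {"inst_count"} if len(ir) == before else set()

AM = AnalysisManager({"inst_count": count_insts})
IR = ["add", "nop", "mul", "nop", "ret"]
run_pipeline(IR, [report_pass, report_pass, strip_nops_pass, report_pass], AM)
```

In this run the analysis is computed only twice across four passes: once up front, and once more after strip_nops_pass changes the IR and reports nothing as preserved.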

Running specific passes:

# Run specific optimization passes
opt -S -passes='instcombine,gvn,licm' hello.ll -o optimized.ll

# Run the full O2 pipeline
opt -S -passes='default<O2>' hello.ll -o optimized.ll

# Print the passes the new pass manager runs at O2
clang -O2 -fdebug-pass-manager -c hello.c -o /dev/null 2>&1 | head -50

LLVM Backend

The LLVM backend converts optimized IR to target machine code through a sequence of lowering phases:

SelectionDAG: IR is first converted to a SelectionDAG (Directed Acyclic Graph) per basic block. The DAG represents the computations with explicit data dependencies. Target-specific lowering occurs here: calling conventions, memory access patterns, special intrinsics.

Instruction selection: DAG nodes are pattern-matched against the target's instruction definitions (written in TableGen format, a declarative language for describing instruction sets). Each pattern match selects one or more target instructions. The matcher is generated by TableGen and works greedily, preferring larger and more complex patterns, rather than computing an optimal tiling as a bottom-up rewriting system (BURS) would.

GlobalISel (Global Instruction Selection): A newer framework that operates on the whole function rather than on per-basic-block DAGs, which exposes more optimization opportunity. It is the default at -O0 on AArch64 and is available on several other targets, while SelectionDAG remains the default optimized path for most.
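
The pattern-matching idea can be sketched as a greedy matcher that tries the largest pattern first. The toy target below, with a fused madd instruction, and the select function are invented for this sketch; the real matcher is generated from TableGen patterns and works on DAGs rather than plain trees.

```python
def select(node, out):
    """Emit toy target instructions for an expression tree.

    node is a register name (str) or an (op, lhs, rhs) tuple.
    Returns the register holding the result; instructions go into out.
    """
    if isinstance(node, str):
        return node  # value already lives in a register

    op, lhs, rhs = node
    # Larger pattern first: (add (mul x, y), z) -> one fused madd.
    if op == "add" and isinstance(lhs, tuple) and lhs[0] == "mul":
        x = select(lhs[1], out)
        y = select(lhs[2], out)
        z = select(rhs, out)
        dst = f"v{len(out)}"
        out.append(f"{dst} = madd {x}, {y}, {z}")
        return dst

    # Fallback: one target instruction per node.
    left = select(lhs, out)
    right = select(rhs, out)
    dst = f"v{len(out)}"
    out.append(f"{dst} = {op} {left}, {right}")
    return dst
```

For ("add", ("mul", "a", "b"), "c") this emits a single madd instead of a mul followed by an add. The sketch does not try the commuted form, so ("add", "c", ("mul", "a", "b")) falls back to two instructions; a real matcher also handles commutativity.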

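Register allocation: After instruction selection, code uses virtual registers. Register allocation maps virtual registers to physical ones, spilling values to the stack when registers run out. LLVM implements several algorithms:

  • Greedy (default for optimized builds): A sophisticated allocator using live range splitting, spilling heuristics, and iterative refinement.
  • Fast: A simple local allocator used in unoptimized (-O0) builds.
  • Basic: A simple priority-based allocator, mainly useful as a baseline.
  • PBQP (Partitioned Boolean Quadratic Problem): Models allocation as a PBQP instance, which suits targets with irregular register constraints.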
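
To show what mapping virtual registers to physical ones involves, here is a sketch of linear scan allocation. Linear scan is a simpler, swapped-in technique chosen for brevity; it is not one of the LLVM allocators listed above. The interval format and register names are assumptions.

```python
def linear_scan(intervals, num_regs):
    """intervals: vreg -> (start, end), inclusive live range.

    Returns (assignment, spilled): vreg -> physical register, and the
    list of vregs that had to be spilled to the stack.
    """
    free = [f"r{i}" for i in range(num_regs)]
    active = []  # (end, vreg) for intervals currently holding a register
    assignment, spilled = {}, []

    by_start = sorted(intervals.items(), key=lambda kv: kv[1][0])
    for vreg, (start, end) in by_start:
        # Release registers whose interval ended before this one starts.
        for item in [a for a in active if a[0] < start]:
            active.remove(item)
            free.append(assignment[item[1]])

        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
            active.sort()
            continue

        # No register free: spill whichever interval ends last.
        last_end, last_vreg = active[-1]
        if last_end > end:
            assignment[vreg] = assignment.pop(last_vreg)
            spilled.append(last_vreg)
            active[-1] = (end, vreg)
            active.sort()
        else:
            spilled.append(vreg)
    return assignment, spilled

ASSIGNMENT, SPILLED = linear_scan(
    {"v0": (0, 10), "v1": (1, 3), "v2": (2, 5), "v3": (4, 6)}, num_regs=2
)
```

With two registers, the long-lived v0 is the one spilled, and v3 reuses the register that v1 frees. LLVM's greedy allocator goes further by splitting live ranges instead of spilling a whole interval.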

Instruction scheduling: Reorder instructions to minimize pipeline stalls and improve instruction-level parallelism. LLVM has two schedulers: pre-RA (before register allocation) and post-RA.

Machine code emission: The final stage emits ELF, Mach-O, or COFF object files. LLVM's MC (Machine Code) layer handles object file format generation, relocation encoding, and debug info (DWARF) emission.

Clang as LLVM Frontend

Clang is not just "a wrapper around LLVM": it is a complete compiler front-end including its own lexer, parser, semantic analyzer, and AST representation (clang::Decl, clang::Stmt, and clang::Type nodes owned by a clang::ASTContext). Clang's AST is more faithful to the source language than LLVM IR: it preserves source locations, template instantiation information, and source-level types. This makes Clang's AST the basis for clangd (language server), clang-tidy (linter), clang-format, and AST-based refactoring tools.

Clang's IRGen (IR Generation) phase lowers the Clang AST to LLVM IR. This phase handles C++ specific lowering: vtable construction and layout, exception table generation (for try/catch), RTTI (runtime type information), constructor/destructor call ordering.

Producing LLVM IR from C:

clang -emit-llvm -S -O0 hello.c -o hello.ll   # text IR
clang -emit-llvm -c -O2 hello.c -o hello.bc   # bitcode (-c is required)
llvm-dis hello.bc -o hello.ll                  # bitcode → text
llvm-as hello.ll -o hello.bc                   # text → bitcode

LLVM for JIT Compilation

LLVM provides two JIT frameworks:

MCJIT: The original, simpler JIT. Compiles a full LLVM Module to machine code in memory and exposes function pointers. Used by older Julia versions and LLDB's expression evaluator.

ORC JIT (On-Request Compilation): The modern, layered JIT framework. It can compile eagerly, and it also supports a lazy model in which functions are compiled on first call rather than upfront. ORC supports:

  • Lazy compilation (a stub stands in for the function and triggers compilation on first call)
  • Concurrent compilation (compile functions in parallel)
  • Symbol linking between JIT'd and native code
  • Debugging support (DWARF integration, GDB JIT interface)

ORC is used by Julia (since 1.7), PostgreSQL, clang-repl, and research JIT compilers.
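
The stub-based lazy compilation model can be sketched in a few lines. This is only an analogy: the LazyJIT class is hypothetical, and Python's built-in compile() to bytecode stands in for ORC compiling LLVM IR to machine code.

```python
class LazyJIT:
    """Registers functions as stubs that compile themselves on first call."""

    def __init__(self):
        self.symbols = {}       # name -> callable (stub or compiled body)
        self.compile_count = 0

    def add_lazy(self, name, source):
        def stub(*args):
            # First call: compile, patch the symbol table, then forward.
            code = compile(source, f"<jit:{name}>", "exec")
            namespace = {}
            exec(code, namespace)
            self.compile_count += 1
            self.symbols[name] = namespace[name]  # replace the stub
            return namespace[name](*args)

        self.symbols[name] = stub

    def call(self, name, *args):
        return self.symbols[name](*args)

JIT = LazyJIT()
JIT.add_lazy("square", "def square(x):\n    return x * x\n")
```

Nothing is compiled when add_lazy runs. The first JIT.call("square", 4) compiles the body and swaps it into the symbol table, so later calls go straight to the compiled function without touching the stub.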

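JIT usage in PostgreSQL (a real production use case): PostgreSQL 11+ can use LLVM's ORC JIT to compile expression evaluation and tuple deforming code for queries whose estimated cost exceeds jit_above_cost. A WHERE clause such as age > 18 AND country = 'US' is compiled to native code that compares the values directly, without the per-step overhead of the interpreter's dispatch loop. Speedups for compute-heavy queries can be substantial (the 2–10x range is workload-dependent), while short queries can become slower because compilation time dominates.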
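
The difference between interpreting an expression tree per row and compiling it once can be sketched as follows. This is an analogy, not PostgreSQL's implementation: Python closures stand in for generated native code, and the tuple encoding of the WHERE clause is invented for the example.

```python
import operator

OPS = {">": operator.gt, "=": operator.eq}

def interpret(expr, row):
    """Walk the tree for every row, dispatching on the node kind each time."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    if kind == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def compile_expr(expr):
    """Walk the tree once and return a closure specialized to it."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        return lambda row: row[name]
    if kind == "const":
        value = expr[1]
        return lambda row: value
    lhs, rhs = compile_expr(expr[1]), compile_expr(expr[2])
    if kind == "and":
        return lambda row: lhs(row) and rhs(row)
    op = OPS[kind]
    return lambda row: op(lhs(row), rhs(row))

# WHERE age > 18 AND country = 'US'
WHERE = ("and",
         (">", ("col", "age"), ("const", 18)),
         ("=", ("col", "country"), ("const", "US")))
PREDICATE = compile_expr(WHERE)
ROWS = [
    {"age": 25, "country": "US"},
    {"age": 17, "country": "US"},
    {"age": 40, "country": "DE"},
]
MATCHES = [row for row in ROWS if PREDICATE(row)]
```

The compiled predicate resolves the node kinds and operators once, at compile time, which is the same overhead a JIT removes from the per-row path.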

LLVM Backends

LLVM supports 20+ target backends as of 2025. Key ones:

x86 / x86-64: Most mature and heavily optimized backend. Supports all x86 SIMD extensions: SSE, SSE2, SSE4, AVX, AVX2, AVX-512. Clang targeting x86-64 is the reference implementation for most LLVM features.

AArch64 (ARM 64-bit): Used for Apple Silicon, AWS Graviton, Android, and server ARM. Apple's aggressive use of LLVM for all macOS/iOS tooling has made this backend very mature.

RISC-V: Rapidly growing. Much RISC-V compiler work happens in LLVM (alongside GCC), in part because the ISA's modular extensions map cleanly onto LLVM's target feature abstraction.

WebAssembly: The LLVM WebAssembly backend produces .wasm files. Emscripten uses Clang + LLVM's WASM backend to compile C/C++ to WebAssembly. WASM is stack-based, not register-based — the backend handles the translation.
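
The register-to-stack translation can be illustrated with a post-order walk over an expression tree. This is a simplified sketch: to_stack_code and run_stack are invented for the example, only add and mul are handled, and the real backend works on machine IR with a far more involved pass to place values on the operand stack.

```python
def to_stack_code(expr):
    """Emit WebAssembly-style stack code for a variable or (op, lhs, rhs)."""
    if isinstance(expr, str):
        return [("local.get", expr)]
    op, lhs, rhs = expr
    # Post-order: push both operands, then the operator consumes them.
    return to_stack_code(lhs) + to_stack_code(rhs) + [(f"i32.{op}",)]

def run_stack(code, local_vars):
    """Tiny stack-machine evaluator used to check the generated code."""
    stack = []
    for ins in code:
        if ins[0] == "local.get":
            stack.append(local_vars[ins[1]])
            continue
        b, a = stack.pop(), stack.pop()
        if ins[0] == "i32.add":
            result = a + b
        elif ins[0] == "i32.mul":
            result = a * b
        else:
            raise ValueError(f"unsupported instruction {ins[0]}")
        stack.append(result & 0xFFFFFFFF)  # wrap to 32 bits
    return stack.pop()

# a*a + b*b, as in the sum_squares example
SUM_SQUARES = ("add", ("mul", "a", "a"), ("mul", "b", "b"))
CODE = to_stack_code(SUM_SQUARES)
```

No instruction in CODE names a destination register: intermediate values live only on the operand stack, which is the key difference from the SSA form the backend starts with.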

NVPTX: Compiles LLVM IR to NVIDIA PTX assembly, which is then assembled by ptxas to GPU microcode. Clang CUDA support uses this. OpenCL and SYCL compilers targeting NVIDIA also use NVPTX.

SPIR-V: For Khronos group GPU APIs (OpenCL, Vulkan compute). LLVM can emit SPIR-V via a dedicated backend or via the separate SPIRV-LLVM-Translator project.

MLIR: Multi-Level IR

MLIR (Multi-Level Intermediate Representation) is an LLVM sub-project that provides a framework for defining custom IR dialects and transformations. Motivation: LLVM IR is a single abstraction level (roughly assembly-level). Domain-specific languages (ML frameworks, hardware synthesis languages, scientific computing) have higher-level structure (tensors, loops over multidimensional arrays, dataflow) that is destroyed when lowered to LLVM IR, preventing domain-specific optimization.

MLIR allows defining IR dialects at arbitrary abstraction levels:

  • affine dialect: Polyhedral loop transformations on affine loops
  • linalg dialect: Linear algebra operations on tensors
  • tensor dialect: Multi-dimensional tensor types
  • func dialect: Functions and calls (the LLVM IR analog)

Dialects can be progressively lowered: linalg → affine → scf (structured control flow) → llvm dialect (maps 1:1 to LLVM IR). Each lowering step can apply dialect-specific optimizations that would be impossible at the LLVM IR level.

MLIR is used by: TensorFlow/MLIR (Google), ONNX-MLIR, Flang (LLVM Fortran frontend), and CIRCT (hardware synthesis).

Production Examples

# View LLVM IR for a C function
cat > test.c << 'EOF'
int square(int x) { return x * x; }
int sum_squares(int a, int b) { return square(a) + square(b); }
EOF

# Without optimization — two separate function calls
clang -emit-llvm -S -O0 test.c -o test_O0.ll
cat test_O0.ll

# With optimization: square() is inlined, leaving two multiplies and an add
clang -emit-llvm -S -O2 test.c -o test_O2.ll
cat test_O2.ll

# Show optimization remarks (why did the compiler make these decisions?)
clang -O2 -Rpass=inline -Rpass-missed=inline test.c -c 2>&1

# Run the loop vectorizer with verbose diagnostics
# (test.c above has no loops; substitute a file that contains one)
clang -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize test.c -c 2>&1

Debugging Notes

  • opt -passes=verify checks that LLVM IR is well-formed (types are consistent, CFG is valid, SSA dominance holds); older releases spelled this opt -verify. Run it before and after custom passes to catch IR corruption.
  • LLVM pass debugging: -debug-only=instcombine prints debug output from a specific pass (requires LLVM built with LLVM_ENABLE_ASSERTIONS=ON)
  • llvm-mca (Machine Code Analyzer) simulates the CPU execution of a code snippet using LLVM's scheduling models — shows throughput, latency, and bottlenecks per instruction
  • Assertion failures in LLVM builds typically mean the IR is malformed; the assertion message names the violated invariant

Security Implications

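  • LLVM sanitizers: AddressSanitizer (ASan), MemorySanitizer (MSan), ThreadSanitizer (TSan), and UndefinedBehaviorSanitizer (UBSan) combine LLVM instrumentation passes with runtime libraries to detect memory errors, data races, and UB at runtime. They are mature and widely used in testing and fuzzing, but they are bug-finding tools rather than production hardening: the runtimes add significant overhead and are not designed to resist an attacker.
  • Safe Stack (-fsanitize=safe-stack): Splits the stack into a safe stack (return addresses, register spills, and locals that are only accessed in provably safe ways) and an unsafe stack (everything else, such as buffers whose address escapes). An overflow on the unsafe stack can no longer reach return addresses, which mitigates stack-smashing and ROP attacks.
  • Shadow Call Stack (-fsanitize=shadow-call-stack): Maintains a separate stack for return addresses, so overwriting a return address on the regular stack does not redirect control flow. Currently supported on AArch64 and RISC-V.
  • CFI (Control Flow Integrity) (-fsanitize=cfi): LLVM's CFI implementation inserts checks before indirect calls to verify the target is a valid function of the expected type. It requires LTO (-flto). It blocks many call-oriented code-reuse chains, although any function with a matching type remains a valid target.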

Performance Implications

  • LLVM compilation time scales with IR size and optimization level. -O0 is fast (~0.1s for small files); -O3 -flto on a large C++ TU can take minutes.
  • Unoptimized LLVM IR is usually several times larger than the source code (around 10x is a rough rule of thumb) because of temporaries, explicit casts, and expanded templates. After optimization it can shrink considerably, for example to about half the original IR size.
  • The new pass manager (default since Clang 13) has lower overhead than the legacy PM for large files due to better analysis caching.

Failure Modes

  • Miscompilation: The LLVM optimizer makes an incorrect transformation based on a buggy analysis or wrong assumption. Rare in released LLVM versions; tracked in LLVM's GitHub issue tracker, typically labeled as miscompilations.
  • IR verification failures: Custom passes that produce invalid IR (mismatched types, values used before they are defined, malformed phi nodes) cause crashes downstream. Always run opt -passes=verify after a custom pass.
  • Backend assertion failure: Target-specific code generation hits an unhandled case. Usually means a frontend emitted IR that the backend didn't expect for that target.

Modern Usage

Rust's rustc uses LLVM as its primary code generation backend (an alternative Cranelift backend is available experimentally, aimed at faster debug builds). The Rust compiler lowers its own MIR (Mid-level IR) to LLVM IR. Rust's ThinLTO mode builds on LLVM's ThinLTO, which recovers much of the benefit of full LTO with manageable link times.

The Zig programming language uses LLVM as its main optimizing backend, relying on LLVM's maturity and backend diversity, while also developing self-hosted backends to reduce that dependency.

Future Directions

  • MLIR → LLVM IR migration: More high-level optimization work is moving into MLIR dialects ahead of LLVM IR. The func, arith, and memref dialects in MLIR let compilers express and transform much of what previously required manipulating LLVM IR directly.
  • BOLT post-link optimizer: BOLT (Binary Optimization and Layout Tool) restructures the code layout of compiled binaries based on runtime profile data (PGO at the binary level). Meta reports speedups of several percent on large production binaries, even on top of compiler PGO and LTO.
  • GPU-first compilation: MLIR's GPU dialects (gpu, nvgpu) are becoming the preferred route for GPU code generation, with LLVM's NVPTX/AMDGPU backends as the final step.
  • LLVM in the browser: LLVM compiled to WebAssembly (via Emscripten) enables browser-based compilation tools. Compiler Explorer (godbolt.org) runs Clang/LLVM in a server process; near-future versions may run client-side.

Exercises

  1. Install LLVM and compile a C function at -O0 and -O2. Use llvm-diff to compare the IR. Identify which optimizations (from opt --print-passes) were responsible for each difference.
  2. Write a minimal LLVM pass (in C++) that counts the number of add instructions in each function and prints the count. Register it with the new pass manager. Run it on a test file.
  3. Use llvm-mca to analyze a tight inner loop (e.g., a SIMD-vectorized array sum). Identify the throughput bottleneck (memory, arithmetic, or instruction scheduling). Compare the analysis against measured performance via perf stat.
  4. Compile a function to WASM using clang --target=wasm32-wasi. Disassemble the .wasm binary with wasm-objdump -d. Observe how the LLVM WebAssembly backend maps SSA IR to WASM's stack machine.
  5. Extend the PostgreSQL JIT experiment: write a benchmark that runs an expression-heavy SQL query (e.g., SELECT sum(a*b + c*d) FROM t) with JIT enabled and disabled (SET jit = off). Compare throughput. Then use EXPLAIN (ANALYZE, VERBOSE) and read the JIT section of its output to see how many functions were JIT-compiled and how long compilation took.

References

  • Chris Lattner & Vikram Adve, "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation." CGO 2004.
  • LLVM Language Reference Manual: https://llvm.org/docs/LangRef.html
  • LLVM Programmer's Manual: https://llvm.org/docs/ProgrammersManual.html
  • LLVM Pass Framework: https://llvm.org/docs/NewPassManager.html
  • MLIR documentation: https://mlir.llvm.org/docs/
  • Chris Lattner, "The Architecture of Open Source Applications: LLVM." https://aosabook.org/en/llvm.html
  • Chris Lattner et al., "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation." CGO 2021.