
05 — WebAssembly

Overview

WebAssembly (WASM) is a binary instruction format for a stack-based virtual machine. It was designed as a portable compilation target for high-level languages, enabling deployment on the web at near-native performance. First released in 2017 as a cross-browser standard, WebAssembly has expanded beyond the browser into a general-purpose sandboxed runtime for serverless computing, edge execution, plugin systems, and embedded environments. It represents the convergence of two decades of failed attempts to bring non-JavaScript languages to the web and the lessons learned from every prior sandbox model.


Historical Context: The Long Road to WASM

The web needed a way to run computationally intensive code long before WebAssembly existed. Every approach prior to WASM had fundamental flaws:

Netscape Plugins (NPAPI, 1995–2016): Allowed native code (C/C++ DLLs) to run inside the browser via the Netscape Plugin API. Flash, Java Applets, Silverlight, QuickTime, and Acrobat all used NPAPI. The security model was essentially nonexistent: plugins ran with the user's full privileges, so a single Flash exploit could compromise everything the user's account could reach. Chrome disabled NPAPI by default in April 2015 and removed it in September 2015 (Chrome 45); Firefox dropped support for all NPAPI plugins except Flash in March 2017 (Firefox 52).

Google Native Client (NaCl, 2011–2019): A sandboxed runtime for native x86/ARM code inside Chrome. NaCl used static analysis and software fault isolation (SFI) to ensure native code could not escape its sandbox. The sandbox worked well but NaCl was Chrome-only, required architecture-specific compilation, and never achieved broad web adoption.

asm.js (2013): A strict subset of JavaScript designed to be compiled efficiently by JIT engines. Emscripten (a C/C++ to JavaScript compiler built on LLVM) targeted asm.js. Performance was impressive — within 2× of native — but the output was enormous text files, parsing was slow, and the approach relied on JIT behavior that couldn't be standardized.

WebAssembly unified and superseded all of these. The MVP shipped in Firefox and Chrome in March 2017 and in Safari and Edge later that year, and the core specification became a W3C Recommendation in December 2019. It ships as a compact binary format that decodes faster than JavaScript parses, runs in all major browsers without plugins, and has a well-defined sandbox.


WebAssembly Design Principles

Binary Format

WASM is not text. The .wasm file is a binary format with a defined encoding for all constructs. A WASM module starts with a 4-byte magic (\0asm) and a version number, followed by sections.

WASM Module Structure:
┌─────────────────────────────────────┐
│ Magic: 0x00 0x61 0x73 0x6D (\0asm) │
│ Version: 0x01 0x00 0x00 0x00       │
├─────────────────────────────────────┤
│ Type Section     │ function signatures │
├─────────────────────────────────────┤
│ Import Section   │ imported functions/memory/globals │
├─────────────────────────────────────┤
│ Function Section │ maps function index → type index │
├─────────────────────────────────────┤
│ Memory Section   │ initial/max page counts │
├─────────────────────────────────────┤
│ Export Section   │ exported function names/indices │
├─────────────────────────────────────┤
│ Code Section     │ function bodies (bytecode) │
├─────────────────────────────────────┤
│ Data Section     │ initial memory contents │
└─────────────────────────────────────┘
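The layout above can be checked by walking a module's bytes directly. The following is a minimal sketch, assuming a hand-assembled module that exports a single `add` function; the helper names `readU32` and `listSections` are made up for this example and are not part of any API.

```javascript
// Hand-assembled module exporting add(i32, i32) -> i32 (illustrative bytes).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                               // magic "\0asm"
  0x01, 0x00, 0x00, 0x00,                               // version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section ("add")
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code section
]);

// Section ids as defined by the binary format (0 = custom, 1 = type, ...).
const SECTION_NAMES = ["custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data", "datacount"];

// Decode an unsigned LEB128 integer starting at `pos`.
function readU32(buf, pos) {
  let result = 0, shift = 0, byte;
  do {
    byte = buf[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value: result >>> 0, next: pos };
}

// Check the 8-byte header, then hop from section to section using the size field.
function listSections(buf) {
  const magic = String.fromCharCode(...buf.subarray(1, 4));
  if (buf[0] !== 0 || magic !== "asm") throw new Error("not a WASM module");
  const version = new DataView(buf.buffer, buf.byteOffset).getUint32(4, true);
  const sections = [];
  let pos = 8;
  while (pos < buf.length) {
    const id = buf[pos++];
    const { value: size, next } = readU32(buf, pos);
    sections.push({ name: SECTION_NAMES[id] ?? `unknown(${id})`, size });
    pos = next + size;
  }
  return { version, sections };
}

console.log(listSections(bytes));
```

Each section is an id byte followed by a LEB128-encoded size, which is why a tool can skip sections it does not understand without decoding their contents.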

Stack-Based Virtual Machine

WASM is a stack machine: instructions implicitly pop operands from and push results onto a virtual value stack. There are no named registers (though modern compilers map WASM values to machine registers during compilation).

;; WebAssembly text format (WAT) for: (a + b) * 2
local.get $a     ;; push a
local.get $b     ;; push b
i32.add          ;; pop a, b; push a+b
i32.const 2      ;; push 2
i32.mul          ;; pop (a+b), 2; push result
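To make the push/pop behaviour concrete, here is a toy interpreter for just the four instructions used above. It is a sketch for illustration only: real engines compile to machine code rather than interpreting like this, and the `run` function and the tuple encoding of instructions are assumptions of this example.

```javascript
// Toy evaluator for a handful of i32 instructions, using an explicit value stack.
function run(program, locals) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "local.get": stack.push(locals[arg] | 0); break;
      case "i32.const": stack.push(arg | 0); break;
      case "i32.add": {
        const b = stack.pop(), a = stack.pop();
        stack.push((a + b) | 0);        // wrap to 32 bits
        break;
      }
      case "i32.mul": {
        const b = stack.pop(), a = stack.pop();
        stack.push(Math.imul(a, b));    // 32-bit wrapping multiply
        break;
      }
      default: throw new Error(`unsupported instruction: ${op}`);
    }
  }
  return stack.pop();
}

// (a + b) * 2, in the same order as the WAT listing.
const program = [
  ["local.get", "a"],
  ["local.get", "b"],
  ["i32.add"],
  ["i32.const", 2],
  ["i32.mul"],
];

console.log(run(program, { a: 3, b: 4 })); // 14
```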

Type System

WASM has four numeric types: i32, i64, f32, f64. WASM 2.0 added v128 (SIMD vector). Reference types (funcref, externref) enable interaction with host objects. There are no pointer types; memory access is via byte offsets into linear memory.

WASM is statically typed at the bytecode level — all function signatures are declared in the type section and all stack operations are type-checked at decode time. This is fundamentally different from JavaScript and enables fast validation (a module can be decoded and type-checked in a single linear pass).
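Validation can be observed from JavaScript with `WebAssembly.validate`. The sketch below builds two hand-assembled modules that differ in a single opcode: the function is declared to return an i32, and its body pushes either an i32 or an i64 constant. The helper name `moduleReturning` is an assumption of this example.

```javascript
// Build a module with one function of type () -> i32 whose body is `<const-op> 0`.
function moduleReturning(constOpcode) {
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
    0x03, 0x02, 0x01, 0x00,                         // function section
    0x0a, 0x06, 0x01, 0x04, 0x00, constOpcode, 0x00, 0x0b, // code section
  ]);
}

const wellTyped = moduleReturning(0x41); // i32.const 0
const illTyped = moduleReturning(0x42);  // i64.const 0, but the signature promises i32

console.log(WebAssembly.validate(wellTyped)); // true
console.log(WebAssembly.validate(illTyped));  // false

// Compiling the ill-typed module is rejected before any code can run.
let compileFailure = null;
try {
  new WebAssembly.Module(illTyped);
} catch (e) {
  compileFailure = e;
}
console.log(compileFailure instanceof WebAssembly.CompileError); // true
```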

Sandboxing Properties

WASM's security model derives from several design decisions:

  1. Linear memory isolation: WASM code can only access its own linear memory region. Every memory read/write is bounds-checked at run time against the current memory size (effective address = dynamic base + static offset). An out-of-bounds access traps (surfacing as a WebAssembly.RuntimeError in JavaScript hosts), so no access can reach memory outside the instance's linear memory.

  2. No undefined behavior: Unlike C/C++, integer overflow in WASM is defined (wrapping), integer division by zero traps, and uninitialized local variables are always zero-initialized.

  3. Structured control flow: WASM has no arbitrary goto. Control flow is expressed via structured blocks, loops, and br/br_if/br_table instructions that jump to labeled block boundaries. This makes code validation and analysis tractable.

  4. No syscalls: WASM cannot make system calls directly. All interaction with the host (browser or WASI runtime) goes through explicitly imported functions. The runtime controls exactly what capabilities are exposed.
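The "no undefined behavior" property from the list above can be observed directly. This is a small sketch assuming a hand-assembled module with one exported binary function; the helper `binaryOp` and the export name `op` are made up for the example.

```javascript
// Instantiate a module exporting op(i32, i32) -> i32 whose body is a single
// binary instruction selected by `opcode`.
function binaryOp(opcode) {
  const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
    0x03, 0x02, 0x01, 0x00,                               // function section
    0x07, 0x06, 0x01, 0x02, 0x6f, 0x70, 0x00, 0x00,       // export "op"
    0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, opcode, 0x0b, // code
  ]);
  return new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports.op;
}

const add = binaryOp(0x6a);  // i32.add
const divS = binaryOp(0x6d); // i32.div_s

// Overflow is defined: the result wraps modulo 2^32.
console.log(add(2147483647, 1)); // -2147483648

// Division by zero, and INT_MIN / -1, trap instead of being undefined.
function trapsOn(fn, a, b) {
  try {
    fn(a, b);
    return false;
  } catch (e) {
    return e instanceof WebAssembly.RuntimeError;
  }
}
console.log(trapsOn(divS, 1, 0));            // true
console.log(trapsOn(divS, -2147483648, -1)); // true
console.log(divS(7, 2));                     // 3
```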


Compilation Pipeline

Source Code (C, C++, Rust, Go, etc.)
         │
         ▼
 ┌────────────────────┐
 │  Compiler Frontend │  Clang / rustc / Go compiler
 │  (language-specific│  → LLVM IR (or equivalent)
 │   parsing, type    │
 │   checking)        │
 └──────┬─────────────┘
        │ LLVM IR
        ▼
 ┌────────────────────┐
 │  LLVM Backend      │  Optimization passes (vectorization,
 │  (for C/C++:       │  inlining, DCE, loop unrolling)
 │   Emscripten;      │  → WASM binary (.wasm file)
 │   for Rust:        │
 │   wasm-pack/cargo  │
 │   --target wasm32) │
 └──────┬─────────────┘
        │ .wasm binary
        ▼
 ┌────────────────────┐
 │  WASM Runtime      │  Browser: V8 (Liftoff + TurboFan),
 │  (compile to       │  SpiderMonkey, JavaScriptCore
 │   native code)     │  Non-browser: Wasmtime, Wasmer
 └──────┬─────────────┘
        │ x86-64 / ARM64 machine code
        ▼
   Execution (near-native speed)

Emscripten (C/C++)

Emscripten is a complete compiler toolchain: Clang frontend + LLVM backend targeting WASM, plus a runtime library that emulates POSIX APIs (file I/O via virtual filesystem, pthreads via SharedArrayBuffer + Atomics). Many C/C++ projects build by replacing the native compiler driver with emcc:

gcc main.c -o main           # native
emcc main.c -o main.js       # WASM + JS glue
emcc main.c -o main.wasm     # standalone WASM, no JS glue (uses WASI imports where possible)

wasm-pack (Rust)

Rust compiles directly to WASM with the wasm32-unknown-unknown target. The wasm-pack tool handles: compilation, wasm-bindgen code generation (for JavaScript bindings), npm packaging, and testing:

wasm-pack build --target web   # produces pkg/ directory with .wasm + .js

The wasm-bindgen macro generates JavaScript bindings for Rust types:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn add(a: u32, b: u32) -> u32 { a + b }

WASM in the Browser

When a browser loads a .wasm file:

  1. Streaming compilation: V8 compiles WASM while it downloads — the binary format is designed for single-pass streaming compilation. WebAssembly.instantiateStreaming() uses this.

  2. Liftoff (baseline): V8's WASM baseline compiler produces machine code quickly with minimal optimization. Code is ready to execute within milliseconds.

  3. TurboFan (optimizing): In the background, TurboFan applies heavier optimizations and replaces Liftoff code. Unlike JavaScript, WASM needs no type feedback, because all types are statically known, so TurboFan can optimize without speculation or deoptimization. Recent V8 versions tier up only the functions that turn out to be hot.

  4. Execution: The compiled machine code lives in the engine's executable code space, outside linear memory, and operates on the module's linear memory. All linear memory accesses are bounds-checked against the memory limits.

Browser WASM Execution:
  Download .wasm
       │
       ├──► Liftoff compile (fast)  ──► Execute immediately
       │
       └──► TurboFan compile (background) ──► Swap in optimized code
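A loader that prefers streaming compilation and falls back to compiling from bytes might look like the sketch below. The function name `loadWasm` and the URL in the trailing comment are hypothetical; `WebAssembly.instantiateStreaming` and `WebAssembly.instantiate` are the standard JavaScript API entry points.

```javascript
// Prefer streaming compilation when given a fetch Response; otherwise compile
// from bytes already in memory. Both paths resolve to { module, instance }.
async function loadWasm(source, imports = {}) {
  const isResponse = typeof Response !== "undefined" && source instanceof Response;
  if (isResponse && typeof WebAssembly.instantiateStreaming === "function") {
    // Compiles while the body is still downloading.
    // The server must send Content-Type: application/wasm.
    return WebAssembly.instantiateStreaming(source, imports);
  }
  const bytes = isResponse ? await source.arrayBuffer() : source;
  return WebAssembly.instantiate(bytes, imports);
}

// Demo with in-memory bytes: a hand-assembled module exporting add(i32, i32) -> i32.
const addBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

loadWasm(addBytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 3)); // 5
});

// In a browser (hypothetical URL):
//   const { instance } = await loadWasm(await fetch("module.wasm"));
```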

JavaScript ↔ WASM Interop

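WASM and JavaScript do not share a heap: WASM has its linear memory, and JS has the engine's garbage-collected heap (the V8 heap in Chrome). JavaScript can view linear memory through an ArrayBuffer, but passing complex data still requires copying it in or out: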

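// `instance` is the WebAssembly.Instance; `processData` is an export assumed for this example.
const { memory, processData } = instance.exports;

// Write data into WASM linear memory. `offset` must point at space the module
// has reserved (e.g. obtained from an allocator it exports).
// Re-create the view after any memory.grow(): growing detaches the old buffer.
const uint8 = new Uint8Array(memory.buffer);
uint8.set(myData, offset);

// Call WASM function with pointer and length:
const result = processData(offset, myData.length);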

This copy overhead limits WASM performance for workloads that exchange large data structures with JavaScript frequently.
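For a runnable version of this pattern, the sketch below uses a hand-assembled module that exports one page of memory and a `peek(ptr)` function returning the byte at `ptr`. The export names `mem` and `peek` and the offset 1024 are assumptions of this example; a real module would normally export an allocator to hand out offsets.

```javascript
// Module with one 64 KiB page of memory, exporting "mem" and peek(i32) -> i32.
// peek's body is: local.get 0; i32.load8_u; end
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // function section
  0x05, 0x03, 0x01, 0x00, 0x01,                   // memory: min 1 page, no max
  0x07, 0x0e, 0x02,                               // export section, 2 entries
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,             //   "mem"  -> memory 0
  0x04, 0x70, 0x65, 0x65, 0x6b, 0x00, 0x00,       //   "peek" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x2d, 0x00, 0x00, 0x0b, // code
]);

const { mem, peek } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

const myData = Uint8Array.from([10, 20, 30, 40]);
const offset = 1024; // arbitrary free location chosen for this demo

// Copy JS data into linear memory through a typed-array view.
new Uint8Array(mem.buffer).set(myData, offset);

// WASM code now sees the same bytes at the same offsets.
const seen = Array.from(myData, (_, i) => peek(offset + i));
console.log(seen); // [ 10, 20, 30, 40 ]
```

The copy in `set` is the cost the paragraph above refers to: the data exists once in the JS heap and once in linear memory.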


WASM Linear Memory Model

WASM memory is a contiguous, byte-addressable array. The initial size and optional maximum size are declared in the memory section in units of 64 KiB pages. JavaScript sees this memory as an ArrayBuffer (memory.buffer). A typical toolchain lays it out roughly as follows:

WASM Linear Memory (illustrative; 4 GiB max addressable with 32-bit indices; higher addresses at top):
┌──────────────────────────────────────────────────────────┐
│ (top)  │ Stack (grows down from near top)                │
│        │                                                  │
│        │  ↓                                              │
│        │                    (unused)                      │
│        │  ↑                                              │
│        │ Heap (malloc/free — via dlmalloc in Emscripten) │
│ 0x0400 │ Global data (.data / .bss)                      │
│ 0x0000 │ [Reserved / NULL guard page in some runtimes]   │
└──────────────────────────────────────────────────────────┘
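The page-based sizing can be explored with the `WebAssembly.Memory` constructor alone, without compiling a module. A minimal sketch, using an initial size of one page and a maximum of two (values chosen only for the demo):

```javascript
const PAGE_SIZE = 64 * 1024; // one WASM page = 64 KiB

const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const before = memory.buffer;
console.log(before.byteLength === PAGE_SIZE); // true

// grow() takes a page delta and returns the previous size in pages.
const previousPages = memory.grow(1);
console.log(previousPages);                               // 1
console.log(memory.buffer.byteLength === 2 * PAGE_SIZE);  // true

// Growing detaches the old ArrayBuffer, so views must be re-created.
console.log(before.byteLength); // 0

// Growing past the declared maximum fails with a RangeError.
let growError = null;
try {
  memory.grow(1);
} catch (e) {
  growError = e;
}
console.log(growError instanceof RangeError); // true
```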

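Bounds checking: every load/store (e.g. i32.load, i32.store) must satisfy addr + offset + access_size <= mem.size, computed without 32-bit wraparound; otherwise it traps. On 64-bit platforms such as x86-64, V8 can use signal-based bounds checking: it reserves a guard region of inaccessible virtual address space after WASM memory and lets the resulting fault (SIGSEGV on POSIX systems) serve as the trap, eliminating the explicit check instruction for most accesses.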
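The check can be written out as a one-line model and compared against a real engine. In the sketch below, `inBounds` is a hypothetical helper that models the rule, and the hand-assembled module exports a `peek(ptr)` function performing a one-byte load on a one-page memory; neither name comes from a real library.

```javascript
// Model of the rule: the whole access must fit inside the current memory size.
// JS numbers are doubles, so the sum cannot wrap around at 32 bits.
function inBounds(addr, offset, accessSize, memSize) {
  return (addr >>> 0) + offset + accessSize <= memSize;
}

// Module with one 64 KiB page and peek(i32) -> i32 doing i32.load8_u.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x05, 0x03, 0x01, 0x00, 0x01,
  0x07, 0x0e, 0x02,
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,
  0x04, 0x70, 0x65, 0x65, 0x6b, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x2d, 0x00, 0x00, 0x0b,
]);
const { peek } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// Returns true if the one-byte load at `addr` traps.
function traps(addr) {
  try {
    peek(addr);
    return false;
  } catch (e) {
    return e instanceof WebAssembly.RuntimeError;
  }
}

const MEM_SIZE = 65536;
for (const addr of [0, 65535, 65536, -1]) {
  // -1 is reinterpreted by WASM as the unsigned address 0xFFFFFFFF.
  console.log(addr, "model:", inBounds(addr, 0, 1, MEM_SIZE), "engine traps:", traps(addr));
}

// A 4-byte access starting 3 bytes before the end does not fit.
console.log(inBounds(65533, 0, 4, MEM_SIZE)); // false
```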


WASM vs JavaScript Performance

WASM is not universally faster than JavaScript. Performance depends on the workload:

Workload                                      | WASM advantage       | Reason
Numeric computation (codecs, crypto, physics) | 2–5× faster          | No GC, typed memory, SIMD
Image/video processing                        | 3–10× faster         | SIMD (v128), no boxing
DOM manipulation                              | Comparable or slower | Must cross WASM/JS boundary for every DOM call
Startup time                                  | Slower initially     | Compilation overhead vs JS parse
String-heavy code                             | Comparable           | String encoding/decoding at boundary
Game logic (value types)                      | 2–4× faster          | Dense typed arrays vs JS objects

The classic case: porting a compute-intensive C/C++ library (e.g., zlib, SQLite, OpenCV) to WASM via Emscripten yields near-native performance. Porting a DOM-heavy framework yields little improvement and a worse developer experience.

AutoCAD on the Web: Autodesk ported AutoCAD's C++ rendering engine (millions of lines of code) to WASM. The browser version runs at roughly 60–70% of native performance for rendering.


WASM Outside the Browser: WASI

WebAssembly System Interface (WASI) is a standardized set of syscall-like interfaces for WASM running outside the browser. WASI fills the role that POSIX fills for native programs, but with a capability-based design, providing:

  • File I/O (with capability-based access control: only explicitly granted directories).
  • Clocks and timers.
  • Random number generation.
  • Process environment.
  • Sockets (wasi-sockets, included in WASI 0.2, released in January 2024).

WASI runtimes (Wasmtime, Wasmer, WasmEdge) execute .wasm files compiled with WASI:

# Compile Rust with the WASI target (called wasm32-wasi in older Rust toolchains):
cargo build --target wasm32-wasip1

# Run with Wasmtime:
wasmtime target/wasm32-wasip1/debug/myapp.wasm
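The capability idea behind WASI (a guest can use only what the host explicitly hands it) can be seen in miniature with the JavaScript embedding API. The sketch below is not WASI itself: it uses a hand-assembled module that imports a single host function and calls it with the value 42. The import names `env` and `log` and the export name `run` are made up for this example.

```javascript
// Module that imports env.log (i32) -> () and exports run(), which calls log(42).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import module "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,                         //   field "log", func type 0
  0x03, 0x02, 0x01, 0x01,                                     // function section
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" -> func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // i32.const 42; call 0
]);
const wasmModule = new WebAssembly.Module(bytes);

// The host can inspect what a module asks for before granting anything.
console.log(WebAssembly.Module.imports(wasmModule));
// [ { module: 'env', name: 'log', kind: 'function' } ]

// Grant the capability: the guest can now reach the host, but only via log().
const received = [];
const granted = new WebAssembly.Instance(wasmModule, {
  env: { log: (value) => received.push(value) },
});
granted.exports.run();
console.log(received); // [ 42 ]

// Withhold it: the module cannot even be instantiated.
let denied = null;
try {
  new WebAssembly.Instance(wasmModule, { env: {} });
} catch (e) {
  denied = e;
}
console.log(denied instanceof WebAssembly.LinkError); // true
```

WASI runtimes apply the same principle at a larger scale: files, clocks, and sockets are reachable only through imports the runtime chooses to provide.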

Serverless and Edge Uses

Cloudflare Workers: Workers run in V8 isolates. A Worker is JavaScript that can also load and call WASM modules. Deployed globally to 300+ PoPs. Isolates start in ~5ms (vs ~50–500ms for container cold starts). The isolate and WASM sandboxes take the place of per-tenant containers or VMs, so there is no container overhead.

Fastly Compute (formerly Compute@Edge): Runs customer code directly as WASM modules on Wasmtime. WASM provides the isolation boundary instead of a VM or container. Enables per-request isolation: a fresh WASM instance per request, with no shared state risk.

wasmCloud: WASM as a universal application component model for distributed systems, using the WASM Component Model (WASI 0.2+) for composable services.


WASM Security Properties in Depth

WASM's security model is stronger than native code in several ways:

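  1. Protected control data: Return addresses, the call stack, and function tables live outside linear memory, so an overflow inside linear memory cannot overwrite them, and an access beyond the end of linear memory traps. A heap overflow can still overwrite a function pointer stored in linear memory, but in WASM that pointer is only a table index, and the call through it is type-checked (see the next item).

  2. Control flow integrity (CFI) for free: Indirect calls in WASM go through the function table with a run-time signature check. You cannot use a table entry of type (i32) -> i32 to call a function of type () -> (). Together with the protected call stack, this rules out classic ROP (Return-Oriented Programming) and limits other control-flow hijacking: an attacker can at most redirect an indirect call to another function with the same signature.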

  3. No executable stack: Linear memory is never executed. Code is in the code section, which is not addressable from linear memory. You cannot inject shellcode into WASM's memory and jump to it.
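The signature check on indirect calls can be demonstrated with a small hand-assembled module. In this sketch the table holds two functions: slot 0 has type () -> () and slot 1 has type (i32) -> i32 and returns its argument plus one. The exported dispatcher always performs call_indirect expecting (i32) -> i32 with the argument 41. The export name `call` and the variable name `dispatch` are choices made for this example.

```javascript
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // header
  0x01, 0x09, 0x02, 0x60, 0x01, 0x7f, 0x01, 0x7f, 0x60, 0x00, 0x00, // types: t0 (i32)->i32, t1 ()->()
  0x03, 0x04, 0x03, 0x01, 0x00, 0x00,             // functions: f0:t1, f1:t0, f2:t0
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02,             // table: funcref, min 2
  0x07, 0x08, 0x01, 0x04, 0x63, 0x61, 0x6c, 0x6c, 0x00, 0x02, // export "call" -> f2
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // elements: table[0..1] = f0, f1
  0x0a, 0x16, 0x03,                               // code section, 3 bodies
  0x02, 0x00, 0x0b,                               //   f0: (empty)
  0x07, 0x00, 0x20, 0x00, 0x41, 0x01, 0x6a, 0x0b, //   f1: local.get 0; i32.const 1; i32.add
  0x09, 0x00, 0x41, 0x29, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b, // f2: i32.const 41; local.get 0; call_indirect t0
]);

const dispatch = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports.call;

// Returns the result of dispatching through table slot `slot`, or "trap".
function tryDispatch(slot) {
  try {
    return dispatch(slot);
  } catch (e) {
    if (e instanceof WebAssembly.RuntimeError) return "trap";
    throw e;
  }
}

console.log(tryDispatch(1)); // 42     (signature matches)
console.log(tryDispatch(0)); // "trap" (table entry has the wrong signature)
console.log(tryDispatch(2)); // "trap" (index outside the table)
```

A corrupted table index can therefore only reach functions that are in the table and have exactly the expected signature.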

Remaining attack surfaces:

  • Bugs in the WASM runtime itself (V8, Wasmtime): A JIT bug or memory safety bug in the runtime can escape the WASM sandbox. Wasmtime is written in Rust to minimize this.
  • Logic bugs in WASM code: Buffer overflows within WASM linear memory can still corrupt application-level data structures.
  • Spectre in WASM: WASM's predictable, high-performance code, combined with high-resolution timers (for example a counter thread writing to a SharedArrayBuffer), makes it a practical vehicle for Spectre attacks in cross-origin contexts. Browsers therefore gate SharedArrayBuffer behind cross-origin isolation. See COOP/COEP.


Failure Modes

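Failure            | Symptom                                                                                                      | Cause
WASM trap          | WebAssembly.RuntimeError, e.g. unreachable, memory access out of bounds, divide by zero (wording varies by engine) | unreachable executed (abort/panic), OOB access, div-by-zero; stack overflow is reported by V8 as a RangeError
OOM                | RangeError: WebAssembly.Memory.grow: Maximum memory size exceeded                                            | WASM heap exhausted
Slow instantiation | Blocking page load                                                                                           | Synchronous new WebAssembly.Module() / new WebAssembly.Instance() on the main thread, or downloading the whole file before compiling (use WebAssembly.instantiateStreaming)
High memory use    | RAM exhaustion on mobile                                                                                     | JS heap + WASM linear memory are additive; linear memory can grow but never shrink
Import mismatch    | WebAssembly.LinkError, e.g. import object field 'X' is not a Function (wording varies by engine)             | JS/WASM interface mismatch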
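Several of these failures map to distinct JavaScript error classes, which makes them easy to tell apart in a catch block. The sketch below triggers three of them on purpose. The helper name `failureKind` and the export name `boom` are assumptions of this example.

```javascript
// Run `thunk` and report which class of failure it produced, if any.
function failureKind(thunk) {
  try {
    thunk();
    return "ok";
  } catch (e) {
    for (const kind of ["CompileError", "LinkError", "RuntimeError"]) {
      if (e instanceof WebAssembly[kind]) return kind;
    }
    return e.name; // e.g. "RangeError" for a failed memory.grow
  }
}

// Module exporting boom(), whose body is the single instruction `unreachable`.
const trapBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                         // type: () -> ()
  0x03, 0x02, 0x01, 0x00,                                     // function section
  0x07, 0x08, 0x01, 0x04, 0x62, 0x6f, 0x6f, 0x6d, 0x00, 0x00, // export "boom"
  0x0a, 0x05, 0x01, 0x03, 0x00, 0x00, 0x0b,                   // body: unreachable
]);
const { boom } = new WebAssembly.Instance(new WebAssembly.Module(trapBytes)).exports;

// 1. Trap at run time.
const trapKind = failureKind(() => boom());

// 2. Memory exhaustion: growing past the declared maximum.
const growKind = failureKind(() =>
  new WebAssembly.Memory({ initial: 1, maximum: 1 }).grow(1));

// 3. Malformed binary: magic bytes only, no version or sections.
const compileKind = failureKind(() =>
  new WebAssembly.Module(Uint8Array.of(0x00, 0x61, 0x73, 0x6d)));

console.log(trapKind, growKind, compileKind);
// RuntimeError RangeError CompileError
```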

Future Directions

Component Model (WASI 0.2+): Defines a higher-level ABI for WASM components: typed interfaces beyond bytes, composable modules, language-neutral. Enables WASM to become the "universal plugin format."

WASM GC: Allows languages with garbage collectors (Kotlin, Dart, OCaml) to target WASM without embedding their own GC. GC objects live in the host runtime's GC heap instead of WASM linear memory.

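Threading: WASM threads use SharedArrayBuffer and Atomics for cross-thread shared memory. Shared memory and atomic instructions ship in all major browsers (Chrome 74+, Firefox 79+), although SharedArrayBuffer on the web requires a cross-origin isolated page (COOP/COEP). The threads themselves are created by the host, for example as Web Workers.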
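The shared-memory building block can be tried without spawning any threads. This is a minimal single-threaded sketch: it creates a shared `WebAssembly.Memory` and updates it with `Atomics`. In a real program the same memory object would be posted to workers and imported by one module instance per thread. Browsers expose SharedArrayBuffer only on cross-origin isolated pages; Node.js has no such restriction.

```javascript
// A shared memory must declare a maximum size.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// Its buffer is a SharedArrayBuffer rather than a plain ArrayBuffer.
console.log(sharedMemory.buffer instanceof SharedArrayBuffer); // true

// Atomic read-modify-write on the first 32-bit cell.
const cells = new Int32Array(sharedMemory.buffer);
const valueBefore = Atomics.add(cells, 0, 1); // returns the old value
console.log(valueBefore, Atomics.load(cells, 0)); // 0 1
```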

Tail Calls: WASM tail call proposal enables proper tail recursion, making functional language compilation more practical.


Exercises

  1. Compile a simple C program (e.g., a prime sieve) to WASM using Emscripten. Compare the execution time in WASM vs native vs JavaScript.
  2. Compile a Rust library to WASM using wasm-pack. Call it from JavaScript. Measure the overhead of passing a 1MB ArrayBuffer across the boundary.
  3. Examine the .wasm binary of a compiled module using wasm2wat (from WABT). Identify the type section, import section, and a function body.
  4. Set up a Wasmtime environment. Compile a Rust WASI program that reads a file. Run it with restricted directory access (demonstrate capability-based security).
  5. Investigate the Spectre risk of SharedArrayBuffer in WASM. Implement a simple timing measurement (not a full exploit) that uses a worker thread incrementing a counter in shared memory as a high-resolution timer source.

References

  • WebAssembly Specification: https://webassembly.github.io/spec/
  • Haas, A. et al. "Bringing the Web up to Speed with WebAssembly." PLDI 2017.
  • Emscripten Documentation: https://emscripten.org/docs/
  • Rust WASM Book: https://rustwasm.github.io/book/
  • WASI Design Document: https://github.com/WebAssembly/WASI/blob/main/docs/WASI-overview.md
  • Cloudflare Workers WASM: https://developers.cloudflare.com/workers/platform/languages/webassembly/
  • Wasmtime Security Model: https://docs.wasmtime.dev/security.html