Node.js Event Loop

Technical Overview

Node.js is a JavaScript runtime built on three core components: the V8 JavaScript engine (execution and JIT), libuv (cross-platform asynchronous I/O, event loop, thread pool), and a set of Node.js C++ bindings that expose OS capabilities to JavaScript. The architecture enables high-throughput I/O-bound services on a single OS thread without the overhead of thread-per-request context switching, by using the operating system's native async I/O facilities (epoll on Linux, kqueue on macOS/BSD, IOCP on Windows).

The result is that a single Node.js process can handle tens of thousands of concurrent connections — a feat that would require tens of thousands of OS threads under a blocking I/O model, each consuming ~1MB of stack space and generating excessive context-switch overhead.

Prerequisites

  • Understanding of OS async I/O: epoll, kqueue, select
  • JavaScript event-driven programming (callbacks, Promises, async/await)
  • Basic understanding of thread pools and OS threads
  • Familiarity with V8 JIT (see 02-jit-compilation.md for V8 context)

Event Loop Phase Diagram

                  Node.js Event Loop (libuv)
                  +--------------------------+
                  |                          |
    +---------+   |  +--------------------+  |
    | timers  |<--+--| Phase 1: Timers    |  |
    | phase   |   |  | setTimeout()       |  |
    |         |   |  | setInterval()      |  |
    +---------+   |  +--------------------+  |
                  |           |              |
                  |  +--------------------+  |
                  |  | Phase 2: Pending   |  |
                  |  | Callbacks          |  |
                  |  | (deferred I/O errs)|  |
                  |  +--------------------+  |
                  |           |              |
                  |  +--------------------+  |
                  |  | Phase 3: Idle /    |  |
                  |  | Prepare (internal) |  |
                  |  +--------------------+  |
                  |           |              |
                  |  +--------------------+  |
                  |  | Phase 4: Poll      |  |  <-- epoll/kqueue wait
                  |  | I/O callbacks      |  |      for I/O events
                  |  | (network, file via |  |
                  |  | thread pool)       |  |
                  |  +--------------------+  |
                  |           |              |
                  |  +--------------------+  |
                  |  | Phase 5: Check     |  |
                  |  | setImmediate()     |  |
                  |  +--------------------+  |
                  |           |              |
                  |  +--------------------+  |
                  |  | Phase 6: Close     |  |
                  |  | Callbacks          |  |
                  |  | socket.destroy()   |  |
                  |  +--------------------+  |
                  |           |              |
                  |           v              |
                  | nextTick queue +         |
                  | microtask queue drained  |
                  | between each phase       |
                  +--------------------------+

Core Content

Node.js Architecture

V8 provides JavaScript execution. It compiles JavaScript to native machine code via the Ignition → TurboFan pipeline. In Node.js, the event loop calls back into V8 to execute JavaScript callbacks.

libuv (libuv.so) is the cross-platform async I/O library. It implements:

  • The event loop itself
  • The async file I/O thread pool (4 threads by default, configurable via UV_THREADPOOL_SIZE)
  • Timer management (setTimeout, setInterval)
  • DNS resolution (uses the thread pool for dns.lookup)
  • setImmediate scheduling

Node.js C++ bindings: C++ code (in src/ of the Node.js source) wraps OS APIs and V8 primitives. The bindings register JavaScript-callable functions via V8's C++ API and handle the marshaling between V8 values and native types.

Event Loop Phases in Detail

Phase 1 — Timers: Executes callbacks registered with setTimeout() and setInterval() whose threshold time has passed. Timer callbacks are not guaranteed to fire at exactly the specified delay — they fire on the next loop iteration after the delay has elapsed, subject to other work in the loop.
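
The point about timer accuracy can be observed directly. The sketch below schedules a short timer and then blocks the loop with a busy-wait; the helper name measureTimerDelay and the chosen durations are assumptions made for this demo, not part of any Node.js API.

```javascript
const { performance } = require('perf_hooks');

// Schedules a timer for `delayMs`, then blocks the thread for `blockMs`.
// Resolves with the time that actually passed before the timer callback ran.
function measureTimerDelay(delayMs, blockMs) {
    return new Promise((resolve) => {
        const start = performance.now();
        setTimeout(() => resolve(performance.now() - start), delayMs);

        // Synchronous busy-wait: the timers phase cannot run until this returns.
        while (performance.now() - start < blockMs) { /* spin */ }
    });
}

measureTimerDelay(10, 50).then((elapsed) => {
    console.log(`requested 10ms, fired after ${elapsed.toFixed(1)}ms`);
});
```

The callback cannot fire before the busy-wait ends, so the measured delay is at least the blocking time regardless of the requested 10 ms.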

Phase 2 — Pending Callbacks: Executes I/O callbacks that were deferred to the next loop iteration (error callbacks from the previous iteration, TCP errors, etc.).

Phase 3 — Idle / Prepare: Internal libuv phases, not used by userspace code.

Phase 4 — Poll: The core I/O phase. The event loop:

  1. Calculates how long to block in the OS I/O poll (epoll_wait / kevent / GetQueuedCompletionStatus)
  2. Blocks until I/O events arrive or the timeout expires (whichever comes first)
  3. Executes I/O completion callbacks for network sockets, resolved thread-pool tasks (file I/O), etc.
  4. Stays in the poll phase until the callback queue is exhausted or a system-dependent limit is reached

The poll phase is where the process sleeps when there is no work to do, consuming zero CPU. For a Node.js HTTP server with no active requests, the process is blocked here.

Phase 5 — Check: Executes setImmediate() callbacks. setImmediate fires after poll, meaning it runs after I/O callbacks in the current iteration but before timers in the next. This makes it useful for deferring work until after the current I/O handlers complete.

Phase 6 — Close Callbacks: Socket or handle close callbacks (socket.on('close', ...)).

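Microtask queues: both queues are drained whenever the JavaScript call stack empties. That happens at every phase transition, not just between full iterations, and since Node.js 11 also after each individual timer and setImmediate callback:

  • process.nextTick() queue: drains completely first
  • Promise microtasks (Promise.then): drain after the nextTick queue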

This means process.nextTick callbacks can starve the event loop if they recursively schedule more nextTick callbacks.
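
A small sketch can make the ordering concrete. It queues one callback of each kind from inside an I/O callback (so the loop is in the poll phase) and records the order in which they run. The function name observeOrder is hypothetical, and fs.stat('.') is used only as a cheap way to get into an I/O callback.

```javascript
const fs = require('fs');

// Queues one callback per scheduling mechanism from inside an I/O callback
// and resolves with the order in which they actually ran.
function observeOrder() {
    return new Promise((resolve) => {
        const order = [];
        fs.stat('.', () => {
            setTimeout(() => { order.push('setTimeout'); resolve(order); }, 0);
            setImmediate(() => order.push('setImmediate'));
            Promise.resolve().then(() => order.push('promise'));
            process.nextTick(() => order.push('nextTick'));
            order.push('sync');
        });
    });
}

observeOrder().then((order) => console.log(order.join(' -> ')));
// prints: sync -> nextTick -> promise -> setImmediate -> setTimeout
```

The nextTick and promise callbacks run as soon as the I/O callback returns; setImmediate runs in the check phase of the same iteration; the timer has to wait for the timers phase of the next iteration.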

libuv Thread Pool

Network I/O (TCP, UDP sockets) uses the OS's native non-blocking I/O (epoll/kqueue/IOCP) — no thread pool involved. The kernel tells libuv when a socket is readable/writable; the callback is scheduled in the poll phase.

Thread pool operations (blocking work offloaded to libuv's pool threads):

  • fs module operations (most asynchronous calls, such as fs.readFile, fs.stat, fs.write)
  • dns.lookup() (uses getaddrinfo(), which is a blocking C library call)
  • crypto module operations (pbkdf2, randomFill, scrypt)
  • User-defined native work submitted via napi_create_async_work or uv_queue_work

worker_threads do not use this pool: each Worker runs on its own dedicated OS thread.

Default thread pool size: 4 threads (UV_THREADPOOL_SIZE=4). This means at most 4 blocking file I/O, DNS, or crypto operations can proceed simultaneously. In applications with heavy fs or dns.lookup usage, this is frequently the bottleneck. Increase it by setting the environment variable before the process starts, for example UV_THREADPOOL_SIZE=64; the maximum is 1024 since libuv 1.30 (128 in earlier versions).

The thread pool interaction with the event loop:

  1. JavaScript calls fs.readFile(path, cb)
  2. Node.js C++ code submits a work item to the libuv thread pool
  3. One of the 4 pool threads executes the blocking open()/read() syscall
  4. When complete, the result is placed in the poll phase completion queue
  5. The event loop's poll phase picks it up and calls the JavaScript callback
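
The fixed pool size can be made visible with a rough experiment. The sketch below submits several crypto.pbkdf2 jobs at once and records when each finishes; with the default pool of 4, completion times tend to cluster in groups of four. The helper name timePbkdf2Batch and the iteration count are assumptions for the demo, and the exact timings depend on the machine.

```javascript
const crypto = require('crypto');

// Starts `count` pbkdf2 jobs at once and resolves with each job's completion
// time (in ms since the batch started). Every job occupies one pool thread.
function timePbkdf2Batch(count, iterations = 50000) {
    const start = process.hrtime.bigint();
    const jobs = Array.from({ length: count }, (_, job) =>
        new Promise((resolve, reject) => {
            crypto.pbkdf2('password', 'salt', iterations, 64, 'sha256', (err) => {
                if (err) return reject(err);
                const ms = Number(process.hrtime.bigint() - start) / 1e6;
                resolve({ job, ms });
            });
        })
    );
    return Promise.all(jobs);
}

timePbkdf2Batch(8).then((results) => {
    for (const { job, ms } of results) {
        console.log(`job ${job} finished after ${ms.toFixed(1)}ms`);
    }
});
```

Running the same script with a different UV_THREADPOOL_SIZE in the environment should change the size of the clusters.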

Single-Threaded Performance Model

The single-threaded model provides:

  • No lock contention: Only one JavaScript execution context runs at a time on the main thread, so ordinary JavaScript code cannot have data races and needs no mutexes for shared data structures.
  • No context switching overhead: A thread-per-connection model with 10,000 connections involves 10,000 OS threads. Each context switch costs roughly 5–10 µs, and under load context switching can consume a large share of CPU time (30–50% in bad cases). Node.js has one thread for all 10,000 connections.
  • Predictable execution order: The event loop's phase model makes callback ordering deterministic within a tick.

The cost: CPU-bound work blocks the event loop for all concurrent connections. A 100ms synchronous computation causes 100ms latency spikes for every other request.
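
One common mitigation, when the work cannot be moved off the main thread, is to split it into chunks and yield to the event loop between chunks. The sketch below illustrates the idea with a simple sum; sumInChunks and its default chunk size are hypothetical names and values chosen for the example.

```javascript
// Sums `values` in slices of `chunkSize`, yielding to the event loop with
// setImmediate() between slices so pending I/O callbacks can run.
function sumInChunks(values, chunkSize = 10000) {
    return new Promise((resolve) => {
        let total = 0;
        let index = 0;

        function runChunk() {
            const end = Math.min(index + chunkSize, values.length);
            for (; index < end; index++) {
                total += values[index];
            }
            if (index < values.length) {
                setImmediate(runChunk); // continue after the poll phase
            } else {
                resolve(total);
            }
        }

        runChunk();
    });
}

const numbers = Array.from({ length: 100 }, (_, i) => i + 1);
sumInChunks(numbers, 7).then((total) => console.log(total)); // prints 5050
```

This keeps per-iteration latency bounded by the cost of one chunk, at the price of a longer total run time. For heavy computation, a worker thread is usually the better fit.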

Blocking the Event Loop

The most critical Node.js performance anti-pattern:

// Assumes an Express-style `app` with a body parser and a naive recursive fib()
const crypto = require('crypto');
const fs = require('fs');

// WRONG: Blocks the event loop for all requests
app.post('/hash', (req, res) => {
    // pbkdf2Sync blocks for ~200ms per call
    const hash = crypto.pbkdf2Sync(req.body.password, 'salt', 100000, 64, 'sha256');
    res.send(hash.toString('hex'));
});

// CORRECT: Uses the thread pool
app.post('/hash', (req, res) => {
    crypto.pbkdf2(req.body.password, 'salt', 100000, 64, 'sha256', (err, hash) => {
        if (err) return res.status(500).send('hashing failed');
        res.send(hash.toString('hex'));
    });
});

// ALSO WRONG: Synchronous read and JSON parse of a large payload blocks the event loop
const huge = JSON.parse(fs.readFileSync('10mb.json', 'utf8')); // blocks!

// WRONG: Tight computation loop
app.get('/fib', (req, res) => {
    const n = parseInt(req.query.n, 10);
    res.send(String(fib(n))); // fib(45) = ~5 seconds
});

Detecting event loop lag:

// Measure event loop delay. Histogram values are reported in nanoseconds.
const { monitorEventLoopDelay } = require('perf_hooks');
const h = monitorEventLoopDelay({ resolution: 10 }); // sampling interval in ms
h.enable();
setInterval(() => {
    const p99Ms = h.percentile(99) / 1e6; // convert ns to ms
    console.log(`Event loop delay P99: ${p99Ms.toFixed(2)}ms`);
    h.reset();
}, 5000);

Worker Threads (Node.js 10.5+, stable since Node.js 12)

Worker threads provide true parallel JavaScript execution in Node.js using V8 Isolates — separate JavaScript heaps that can run in parallel OS threads.

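const { Worker, isMainThread, parentPort } = require('worker_threads');

// Naive recursive Fibonacci: deliberately CPU-heavy
function fib(n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

if (isMainThread) {
    const worker = new Worker(__filename);
    worker.on('message', result => {
        console.log('Result:', result);
        worker.terminate(); // let the process exit once the result has arrived
    });
    worker.on('error', err => console.error('Worker failed:', err));
    worker.postMessage({ n: 45 });
} else {
    parentPort.on('message', ({ n }) => {
        // This runs in a separate OS thread, won't block main thread
        parentPort.postMessage(fib(n));
    });
}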

Workers have separate V8 heaps — data is passed by structured clone (copy) or SharedArrayBuffer (shared memory). SharedArrayBuffer requires Atomics for synchronization.
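
To illustrate the shared-memory path, the sketch below has several workers increment one counter stored in a SharedArrayBuffer, using Atomics.add so that no increments are lost. The function name incrementInWorkers is hypothetical, and the worker body is passed as an inline string (eval: true) only to keep the example in a single file.

```javascript
const { Worker } = require('worker_threads');

// Starts `workerCount` workers that each add 1 to a shared counter
// `incrementsPerWorker` times. Resolves with the final counter value.
function incrementInWorkers(workerCount, incrementsPerWorker) {
    const shared = new SharedArrayBuffer(4); // room for one Int32
    const counter = new Int32Array(shared);

    const workerSource = `
        const { workerData } = require('worker_threads');
        const counter = new Int32Array(workerData.shared);
        for (let i = 0; i < workerData.n; i++) {
            Atomics.add(counter, 0, 1); // atomic read-modify-write
        }
    `;

    const runs = Array.from({ length: workerCount }, () =>
        new Promise((resolve, reject) => {
            const worker = new Worker(workerSource, {
                eval: true,
                workerData: { shared, n: incrementsPerWorker },
            });
            worker.on('error', reject);
            worker.on('exit', (code) =>
                code === 0 ? resolve() : reject(new Error(`worker exited with ${code}`)));
        })
    );

    return Promise.all(runs).then(() => Atomics.load(counter, 0));
}

incrementInWorkers(4, 10000).then((total) => console.log(total)); // prints 40000
```

The SharedArrayBuffer in workerData is shared rather than cloned, so all workers write to the same four bytes. Replacing Atomics.add with a plain counter[0]++ would make lost updates possible.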

Streams and Backpressure

Node.js streams are the canonical way to handle large data transfers. Backpressure prevents a fast producer from overwhelming a slow consumer:

const fs = require('fs');

const readable = fs.createReadStream('largefile.dat');
const writable = fs.createWriteStream('dest.dat');

// Option 1: pipe() handles backpressure automatically
readable.pipe(writable);

// Option 2 (use instead of pipe(), not in addition to it): manual backpressure
readable.on('data', chunk => {
    const ok = writable.write(chunk);
    if (!ok) {
        readable.pause(); // pause if write buffer is full
        writable.once('drain', () => readable.resume());
    }
});
readable.on('end', () => writable.end()); // close the destination when the source is done

writable.write() returns false when its internal buffer exceeds the high-water mark. A naive producer that ignores this return value and keeps calling write() makes the buffer grow without bound, consuming memory until the process crashes.
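
One limitation of pipe() is that it does not forward errors or clean up the other stream on failure. A possible alternative, sketched below, is pipeline() from stream/promises, which applies the same backpressure handling and rejects if either stream fails. The copyFile wrapper is a hypothetical helper written for this example.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Copies a file with backpressure handled by pipeline(). Unlike pipe(),
// an error in either stream rejects the promise and destroys both streams.
async function copyFile(sourcePath, destPath) {
    await pipeline(
        fs.createReadStream(sourcePath),
        fs.createWriteStream(destPath)
    );
}

// Usage (paths are placeholders):
// copyFile('largefile.dat', 'dest.dat').catch((err) => console.error('copy failed:', err));
```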

Cluster Module

The cluster module forks multiple Node.js processes (typically one per CPU core), each running a full copy of the application, sharing a TCP/UDP port:

const cluster = require('cluster');
const http = require('http');

if (cluster.isPrimary) {
    for (let i = 0; i < require('os').cpus().length; i++) {
        cluster.fork();
    }
} else {
    http.createServer((req, res) => {
        res.writeHead(200);
        res.end('Hello');
    }).listen(3000);
}

The primary process distributes incoming connections to workers using a round-robin distribution (default on all platforms except Windows). Workers are independent processes with separate V8 heaps — no shared memory. Use Redis or a shared database for cross-worker state.

Node.js Memory Model

Node.js memory has distinct regions:

  • V8 heap: Holds JavaScript objects, closures, strings. The old generation is limited by --max-old-space-size; the default depends on the Node.js version and available system memory (roughly 1.5GB on older 64-bit releases, higher on recent ones). Managed by V8's GC (Scavenger for young gen, Major GC for old gen).
  • External/ArrayBuffer memory: Off-V8-heap binary data (Buffer, TypedArray backed by native memory). Counted toward V8's external memory to influence GC timing, and reported as process.memoryUsage().external and process.memoryUsage().arrayBuffers.
  • C++ heap (libuv, binding code): Not counted in the V8 heap. Native allocations that are not tied to a JavaScript object show up only in RSS.
  • RSS vs heap used: process.memoryUsage().heapUsed is V8 live objects; RSS includes code, stack, libuv, and OS-level allocations.

Buffer.alloc allocates its backing store in native memory, outside the V8 heap. The Buffer itself is a Uint8Array view over an ArrayBuffer (not a SharedArrayBuffer); only the small wrapper object lives on the V8 heap.
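
The split between V8 heap and off-heap memory can be checked with process.memoryUsage(). The sketch below allocates a Buffer and compares the counters before and after; measureBufferGrowth is a hypothetical helper, and exact numbers will vary between Node.js versions and runs.

```javascript
// Allocates a Buffer of `bytes` and reports how process.memoryUsage() changed.
function measureBufferGrowth(bytes) {
    const before = process.memoryUsage();
    const buffer = Buffer.alloc(bytes); // backing store lives outside the V8 heap
    const after = process.memoryUsage();
    return {
        buffer, // returned so the allocation stays reachable
        arrayBuffersDelta: after.arrayBuffers - before.arrayBuffers,
        heapUsedDelta: after.heapUsed - before.heapUsed,
    };
}

const growth = measureBufferGrowth(32 * 1024 * 1024);
console.log({
    arrayBuffersDelta: growth.arrayBuffersDelta,
    heapUsedDelta: growth.heapUsedDelta,
});
```

The expectation is that arrayBuffersDelta is close to the requested size while heapUsedDelta stays small, since only the wrapper object is allocated on the V8 heap.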

Performance Profiling

# Built-in V8 profiler (sampling profiler)
node --prof app.js
node --prof-process isolate-*.log > profile.txt

# 0x flame graph (best option for event loop analysis)
npx 0x -- node app.js
# Generates an HTML flamegraph

# clinic.js suite
npx clinic doctor -- node app.js   # general diagnostics
npx clinic flame -- node app.js    # CPU flame graph
npx clinic bubbleprof -- node app.js # async operation profiling

Historical Context

Node.js was created by Ryan Dahl in 2009. The key insight was using the browser's JavaScript engine (V8, released by Google in 2008) server-side, coupled with an event loop for non-blocking I/O. Early releases built that loop on libev and libeio; libuv was created later, in 2011, specifically for Node.js (primarily by Ben Noordhuis and Bert Belder, together with Ryan Dahl) to provide a unified async I/O abstraction across Linux (epoll), macOS (kqueue), and Windows (IOCP). The "10,000 connections" benchmark that Ryan Dahl demonstrated at JSConf.eu 2009 shocked the backend community accustomed to Apache's thread-per-connection model. Worker threads were added in Node.js 10 (2018, initially as an experimental feature) to address the CPU-bound computation gap.

Production Examples

// Health endpoint that measures real event loop health
const { performance } = require('perf_hooks');

let lastMark = performance.now();
setInterval(() => {
    const now = performance.now();
    const lag = now - lastMark - 100; // should be ~0ms over 100ms interval
    if (lag > 50) {
        console.warn(`Event loop lag: ${lag.toFixed(1)}ms`);
        // Alert: something blocked the event loop
    }
    lastMark = now;
}, 100);

# Heap snapshot for memory leak analysis
node --inspect app.js
# In Chrome DevTools: Memory > Heap Snapshot

// Or programmatically, from inside the running process:
require('v8').writeHeapSnapshot('/tmp/heap.heapsnapshot');
// Analyze the resulting file in the Chrome DevTools Memory tab

Debugging Notes

  • --inspect / --inspect-brk enables the Chrome DevTools Protocol debugger; use chrome://inspect or VS Code's built-in Node debugger
  • node --expose-internals allows requiring internal/* modules for deep introspection
  • Memory leaks: heap snapshot comparison (before vs after suspected leak) shows retained objects and their GC roots
  • unhandledRejection events from unhandled Promise rejections are critical to monitor; since Node.js 15 the default mode is throw, so an unhandled rejection crashes the process. On older versions, set --unhandled-rejections=throw to get the same behavior (a crash is better than silent data corruption)
  • Diagnosing event loop blocking: instrument with the async_hooks module to trace async context propagation and find where callbacks are slow

Security Implications

  • Prototype pollution: Merging user input into {} objects can override Object.prototype, affecting all objects in the V8 heap; this can lead to privilege escalation or sandbox bypass. Use Object.create(null) for safe dictionaries. Lodash versions before 4.17.12 were vulnerable (CVE-2019-10744).
  • ReDoS (Regular Expression Denial of Service): A single-threaded event loop means a catastrophically backtracking regex blocks all connections. Use the safe-regex library or WASM-based regex engines with linear-time guarantees.
  • Path traversal via path.join: path.join('/uploads', req.params.file) does NOT prevent ../../etc/passwd. Use path.resolve and validate that the result starts with the expected directory prefix plus a path separator (otherwise /uploads-evil would pass a check for /uploads).
  • Child process injection: child_process.exec(userInput) passes the string to /bin/sh — shell injection is trivial. Use child_process.execFile with argument arrays.
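
The path traversal check described above might look like the following sketch. resolveInside is a hypothetical helper name; the check is purely lexical, so it does not protect against symlinks inside the base directory that point elsewhere.

```javascript
const path = require('path');

// Resolves `userPath` against `baseDir` and throws if the result escapes it.
function resolveInside(baseDir, userPath) {
    const base = path.resolve(baseDir);
    const target = path.resolve(base, userPath);
    // Compare against base + separator so "/uploads-evil" does not match "/uploads".
    if (target !== base && !target.startsWith(base + path.sep)) {
        throw new Error(`Path escapes ${base}: ${userPath}`);
    }
    return target;
}

console.log(resolveInside('/uploads', 'images/cat.png'));
try {
    resolveInside('/uploads', '../../etc/passwd');
} catch (err) {
    console.log('rejected:', err.message);
}
```

Absolute inputs such as /etc/passwd are rejected as well, because path.resolve lets a later absolute segment replace the base.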

Performance Implications

  • Event loop tick overhead: V8 function call overhead is ~1–10 ns; async callback dispatch adds ~1–5 µs of libuv/V8 overhead per async operation
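  • process.nextTick is cheaper than Promise.resolve().then() (microtask), which is cheaper than setImmediate(), which is cheaper than setTimeout(fn, 0). They also run roughly in this priority order, with one caveat: when called from the main module, the relative order of setImmediate() and setTimeout(fn, 0) is not deterministic; inside an I/O callback, setImmediate() always runs first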
  • JSON.parse/stringify blocking: parsing a 10MB JSON payload synchronously in a request handler blocks for ~50ms. Use streaming JSON parsers (e.g., stream-json) for large payloads.
  • Buffer.concat on many small buffers is O(n²) if done in a loop, because every call re-copies everything accumulated so far. Collect the chunks in an array and call Buffer.concat(chunks) once.
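
The Buffer.concat point is easiest to see side by side. Both functions below are illustrative helpers (the names are made up for this example) that produce the same bytes; only the amount of copying differs.

```javascript
// Preferred: collect chunks in an array and concatenate once at the end.
// Each byte is copied a single time.
function joinChunks(chunks) {
    return Buffer.concat(chunks);
}

// Anti-pattern for comparison: every iteration re-copies all bytes
// accumulated so far, which is quadratic in the total size.
function joinChunksQuadratic(chunks) {
    let result = Buffer.alloc(0);
    for (const chunk of chunks) {
        result = Buffer.concat([result, chunk]);
    }
    return result;
}

const chunks = ['ab', 'cd', 'ef'].map((s) => Buffer.from(s));
console.log(joinChunks(chunks).toString());          // prints abcdef
console.log(joinChunksQuadratic(chunks).toString()); // prints abcdef
```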

Failure Modes

  • EventEmitter memory leak warning: MaxListenersExceededWarning is emitted when more than 10 listeners (the default limit) are added for the same event on one emitter without being removed. This is a classic leak in connection pools that forget to remove listeners on cleanup
  • V8 heap OOM: FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory (recent versions word the first part differently, for example "Reached heap limit"). Increase --max-old-space-size or fix the memory leak
  • UV_THREADPOOL_SIZE exhaustion: All 4 (default) thread pool threads are occupied with slow fs operations; dns.lookup() and new fs calls queue indefinitely, causing timeouts. Increase UV_THREADPOOL_SIZE.
  • Zombie connections: Connections whose clients have disconnected but the server hasn't noticed (no keepalive timeout). They accumulate until the file descriptor limit is hit (EMFILE). Set server.keepAliveTimeout and server.headersTimeout.
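
For the EventEmitter leak in particular, the usual remedy is to remove every listener a helper registered once it is done. The sketch below shows one way to do that; waitForData and the 'data'/'error' event names are assumptions chosen for illustration.

```javascript
const { EventEmitter } = require('events');

// Waits for one 'data' or 'error' event and removes both listeners afterwards,
// so repeated calls on a long-lived emitter do not accumulate listeners.
function waitForData(emitter) {
    return new Promise((resolve, reject) => {
        const onData = (value) => { cleanup(); resolve(value); };
        const onError = (err) => { cleanup(); reject(err); };
        const cleanup = () => {
            emitter.off('data', onData);
            emitter.off('error', onError);
        };
        emitter.on('data', onData);
        emitter.on('error', onError);
    });
}

const emitter = new EventEmitter();
waitForData(emitter).then((value) => {
    console.log(value, emitter.listenerCount('data')); // prints: hello 0
});
emitter.emit('data', 'hello');
```

Using emitter.once() for each event alone would not be enough here: the listener for the event that did not fire would stay registered.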

Modern Usage

Node.js 22 (current LTS as of 2024) includes native fetch, WebStreams, and --experimental-strip-types for running TypeScript directly. The node:test module provides a built-in test runner. AsyncLocalStorage (Node.js 12+) provides context propagation across async boundaries — the Node.js equivalent of thread-local storage, built on async_hooks.
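
As a minimal sketch of how AsyncLocalStorage carries context across async boundaries, the example below attaches a request ID to a store and reads it back after an asynchronous hop. The names requestContext, log, and handleRequest are hypothetical.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const requestContext = new AsyncLocalStorage();

// Formats a message with the request ID of the current async context, if any.
function log(message) {
    const store = requestContext.getStore();
    const id = store ? store.requestId : 'no-request';
    return `[${id}] ${message}`;
}

// Runs the handler inside a store; everything awaited within sees the same store.
function handleRequest(requestId) {
    return requestContext.run({ requestId }, async () => {
        await new Promise((resolve) => setImmediate(resolve)); // async hop
        return log('done');
    });
}

Promise.all([handleRequest('a'), handleRequest('b')]).then((lines) => {
    console.log(lines.join('\n'));
});
// prints:
// [a] done
// [b] done
```

Each call to run() gets its own store, so concurrent requests do not see each other's IDs even though they interleave on one thread.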

Deno and Bun are alternative JavaScript runtimes competing with Node.js. Deno uses tokio (Rust async runtime) instead of libuv; like libuv, tokio still offloads blocking file I/O to a pool of worker threads. Bun uses JavaScriptCore (WebKit's engine) and custom bindings targeting startup performance.

Future Directions

  • Single-executable applications (SEA): Node.js 20+ supports packaging an application together with the Node.js binary into a single executable
  • WASI integration: Running WebAssembly code in Node.js via node:wasi for sandboxed native-speed modules
  • Async context propagation improvements: Reducing the overhead of AsyncLocalStorage on hot paths (currently ~5–10% overhead in some benchmarks)
  • HTTP/3 native support: libuv and Node.js core HTTP/3 (QUIC) support, removing the need for third-party quic modules

Exercises

  1. Write a Node.js HTTP server that deliberately blocks the event loop for 500ms on every 10th request (using a spin loop, not sleep). Use wrk or autocannon to load test it. Observe P50 vs P99 latency behavior and identify the blocking request in the flame graph.
  2. Set UV_THREADPOOL_SIZE=2 and write a server that makes 10 concurrent dns.lookup() calls per request. Load test and observe the latency cliff as the thread pool saturates. Increase to UV_THREADPOOL_SIZE=16 and re-measure.
  3. Demonstrate backpressure: create a readable stream that produces data faster than a writable stream can consume it (use writable._write with a 10ms delay). Show that without backpressure handling, memory grows; with pipe() or manual pause/resume, it stays bounded.
  4. Implement a worker thread pool for CPU-bound tasks. The main thread distributes fibonacci(n) tasks to N worker threads via message passing. Measure throughput vs N workers and compare to a single-threaded implementation.
  5. Use async_hooks to implement request-scoped logging — every log statement from code within a request handler automatically includes the request ID, without passing it explicitly through every function call.

References

  • Ryan Dahl, "Node.js" JSConf.eu 2009 presentation. https://www.youtube.com/watch?v=ztspvPYybIY
  • libuv documentation: https://docs.libuv.org/en/v1.x/design.html
  • Bert Belder, "Everything You Need to Know About Node.js Event Loop." JSConf Asia 2016. https://www.youtube.com/watch?v=PNa9OMajl9s
  • Deepal Jayasekara, "Node.js Event Loop Series." https://blog.insiderattack.net/event-loop-and-the-big-picture-nodejs-event-loop-part-1-1cb67a182810
  • Node.js documentation: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick
  • Clinic.js docs: https://clinicjs.org/documentation/