Async/Await

Technical Overview

Async/await is a language-level syntax for writing asynchronous code that looks like synchronous code. An async function can await an asynchronous operation — suspending the function's execution until the operation completes, without blocking the underlying thread. The calling thread is free to run other async tasks while waiting.

Under the hood, async/await in most languages is syntactic sugar over a future/promise-based state machine (stackless, as in Rust, C#, JavaScript). Go reaches similar semantics without keywords through a green thread/goroutine scheduler (stackful). The implementations differ substantially, but the programmer experience converges.
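To make the "suspend and resume" idea concrete, here is a minimal Python sketch that drives a coroutine by hand, with no event loop involved. The Suspend class and the function names are hypothetical, introduced only for illustration; real programs let the runtime do this stepping.

```python
class Suspend:
    """Hypothetical awaitable that suspends the awaiting coroutine exactly once."""

    def __await__(self):
        # A bare `yield` inside __await__ hands control back to whoever is
        # driving the coroutine; the value sent back in becomes the result
        # of the `await` expression.
        received = yield "suspended"
        return received


async def add_one_later():
    value = await Suspend()   # the function pauses here without blocking a thread
    return value + 1


def drive_by_hand():
    """Play the role of the event loop: resume the coroutine step by step."""
    coro = add_one_later()
    first = coro.send(None)        # run until the first suspension point
    try:
        coro.send(41)              # resume; `await Suspend()` evaluates to 41
    except StopIteration as done:  # the coroutine's return value arrives here
        return first, done.value
```

Calling drive_by_hand() shows the two halves of the function running at separate times, which is the property a scheduler relies on.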

Prerequisites

Blocking vs. non-blocking I/O concepts
Event loop architecture (epoll, kqueue, IOCP)
Promise/Future concept
Callback-based async programming (to appreciate what async/await replaces)
Basic coroutine understanding (03-fibers-and-coroutines.md)

Core Concepts

The Problem: Callback Hell

Before async/await, non-blocking I/O in JavaScript looked like this:

// Callback hell: reading a file, parsing it, making an API call
fs.readFile('config.json', 'utf8', function(err, configData) {
    if (err) return handleError(err);

    parseConfig(configData, function(err, config) {
        if (err) return handleError(err);

        fetchFromAPI(config.apiUrl, function(err, data) {
            if (err) return handleError(err);

            saveToDatabase(data, function(err, result) {
                if (err) return handleError(err);

                sendWebhook(result, function(err) {
                    if (err) return handleError(err);
                    console.log('Pipeline complete');
                    // 5 levels of nesting, error handling repeated everywhere
                    // This is "callback hell" / "pyramid of doom"
                });
            });
        });
    });
});

The core problem is inversion of control: you don't call the next step when ready; instead, you hand a callback to the I/O system and it calls you back. Error handling is scattered across every callback, cancellation has to be wired up by hand, and the execution order is hard to see from the code structure.

Futures and Promises

The intermediate step between callbacks and async/await was Futures (or Promises in JavaScript):

// Promise chain (better, but still non-sequential)
readFile('config.json')
  .then(configData => parseConfig(configData))
  .then(config => fetchFromAPI(config.apiUrl))
  .then(data => saveToDatabase(data))
  .then(result => sendWebhook(result))
  .then(() => console.log('Pipeline complete'))
  .catch(err => handleError(err));
// Error handling centralized, but still not sequential code flow

Async/Await: Sequential-Looking Async Code

// async/await: sequential-looking, async-behaving
async function pipeline() {
    try {
        const configData = await readFile('config.json');
        const config = await parseConfig(configData);
        const data = await fetchFromAPI(config.apiUrl);
        const result = await saveToDatabase(data);
        await sendWebhook(result);
        console.log('Pipeline complete');
    } catch (err) {
        handleError(err);
    }
}

This is the same async behavior — readFile doesn't block the event loop — but the code reads as sequential. Error handling is natural try/catch. The execution order is obvious.

Event Loop Architecture

The event loop is the scheduler that makes async/await work. Node.js uses libuv; Python uses asyncio's event loop; Rust uses Tokio's runtime.

Event Loop Architecture (Node.js / Python asyncio)
====================================================

Application Code (single thread)
  |
  +-- await readFile()    ← suspends current async function
  |     |
  |     +-- registers callback with event loop
  |     |   (file: read from disk, notify when ready)
  |     |
  |     +-- event loop continues other work:
  |           - other pending async tasks
  |           - already-resolved Promises
  |           - timer callbacks
  |
  +-- [background I/O: sockets via epoll/kqueue/IOCP; libuv runs file reads on a thread pool]
  |
  +-- file read completes
  |     |
  |     +-- completion is reported to the event loop (it wakes from epoll_wait or equivalent)
  |     +-- event loop resumes the awaiting async function
  |     +-- code continues after 'await readFile()'
  |
  [application code stays on one thread: no preemption and no data races, but tasks can still interleave at each await]

Event Loop Phases (Node.js libuv):
  1. timers         (setTimeout/setInterval callbacks)
  2. pending callbacks (I/O callbacks deferred to the next loop iteration)
  3. idle/prepare   (internal)
  4. poll           (retrieve new I/O events, execute I/O callbacks)
  5. check          (setImmediate callbacks)
  6. close callbacks
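
The scheduling idea in the diagram can be sketched as a toy event loop in Python. This is a deliberately reduced model under stated assumptions: YieldControl, worker, and run_loop are hypothetical names, and there are no timers or real I/O. A production loop would block in epoll/kqueue/IOCP when no task is ready instead of spinning through a queue.

```python
from collections import deque


class YieldControl:
    """Hypothetical awaitable: suspend once so the loop can run something else."""

    def __await__(self):
        yield


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        await YieldControl()      # cooperative suspension point


def run_loop(coros):
    """Toy single-threaded event loop: resume each ready task in FIFO order."""
    ready = deque(coros)
    while ready:
        task = ready.popleft()
        try:
            task.send(None)       # run the task until its next await
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # still pending; schedule it again


log = []
run_loop([worker("a", 2, log), worker("b", 3, log)])
```

The log ends up interleaved (a, b, a, b, b) even though only one thread is involved: each await is a point where the loop may switch tasks.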

async/await in Python (asyncio)

import asyncio
import aiohttp
import time

async def fetch_url(session, url):
    """Async HTTP fetch — suspends while waiting for network"""
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    """Fetch all URLs concurrently"""
    async with aiohttp.ClientSession() as session:
        # Create all fetch coroutines
        tasks = [fetch_url(session, url) for url in urls]
        # Wait for all concurrently
        results = await asyncio.gather(*tasks)
    return results

# Running:
urls = [f"http://example.com/api/{i}" for i in range(100)]

# Sync version: 100 sequential HTTP requests, ~100 * RTT
# (at 100 ms per request this would block for about 100 * 100 ms = 10 seconds)

# Async version: 100 concurrent HTTP requests, ~1 * RTT
start = time.time()
results = asyncio.run(fetch_all(urls))
elapsed = time.time() - start
# elapsed ≈ 0.1-0.5 seconds (concurrent) vs ~10 seconds (sequential)

Key Python asyncio concepts:
- async def: declares a coroutine function
- await: suspends the current coroutine and returns control to the event loop
- asyncio.gather(): runs multiple awaitables concurrently
- asyncio.run(): creates an event loop and runs a coroutine to completion
- Only one coroutine runs at a time (single-threaded event loop)
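
The difference between awaiting one call at a time and using asyncio.gather() can be measured without any network access. The sketch below substitutes asyncio.sleep for the HTTP request; fake_fetch, timed, and the 0.05 s delay are assumptions chosen for illustration.

```python
import asyncio
import time


async def fake_fetch(i, delay=0.05):
    """Stand-in for a network call: asyncio.sleep suspends without blocking."""
    await asyncio.sleep(delay)
    return i * 2


async def fetch_sequential(n):
    return [await fake_fetch(i) for i in range(n)]      # one at a time


async def fetch_concurrent(n):
    return await asyncio.gather(*(fake_fetch(i) for i in range(n)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(fetch_sequential(10))
con_result, con_time = timed(fetch_concurrent(10))
```

Both versions return the same list in the same order; only the elapsed time differs, roughly ten delays for the sequential version against one for the concurrent version.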

async/await in JavaScript/Node.js

JavaScript's async model is Promise-based. async/await is syntactic sugar:

// async function returns a Promise
async function fetchUser(id) {
    const response = await fetch(`/api/users/${id}`);
    if (!response.ok) {
        throw new Error(`HTTP error: ${response.status}`);
    }
    return response.json();  // returns Promise<User>
}

// Equivalent promise chain:
function fetchUserPromise(id) {
    return fetch(`/api/users/${id}`)
        .then(response => {
            if (!response.ok) throw new Error(`HTTP error: ${response.status}`);
            return response.json();
        });
}

// Sequential (slow: each request waits for the previous one):
async function fetchUsersSequential(ids) {
    const users = [];
    for (const id of ids) {
        users.push(await fetchUser(id));
    }
    return users;
}

// Concurrent (all requests are started before any is awaited):
async function fetchMultipleUsers(ids) {
    return Promise.all(ids.map(id => fetchUser(id)));
}

// Error handling:
async function safeOperation() {
    try {
        const result = await riskyOperation();
        return result;
    } catch (err) {
        console.error('Failed:', err);
        return null;
    } finally {
        // 'finally' works too: runs whether the try block returned or threw
        await cleanup();
    }
}

Node.js gotcha: await inside a loop creates sequential operations. Use Promise.all() for concurrent operations. This is a common performance bug in Node.js code.

async/await in Rust: Zero-Cost

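Rust's async model is fundamentally different: it's zero-cost in that calling and awaiting an async fn does not by itself allocate on the heap. The compiler transforms each async function into a state machine whose size is known at compile time, so nested futures are stored inline (spawning a task onto a runtime typically allocates once for the whole task):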

use tokio::time::{sleep, Duration};
use reqwest;

// Async function: compiles to a state machine
async fn fetch_url(url: &str) -> Result<String, reqwest::Error> {
    let response = reqwest::get(url).await?;  // await here
    let body = response.text().await?;          // await here
    Ok(body)
}

// The compiler generates approximately:
// enum FetchUrlState {
//     Start { url: String },
//     WaitingForGet { future: GetFuture },
//     WaitingForText { future: TextFuture },
//     Done,
// }
// impl Future for FetchUrlState { ... }

#[tokio::main]  // macro that sets up Tokio runtime
async fn main() {
    // Concurrent fetching:
    let (result1, result2) = tokio::join!(
        fetch_url("https://example.com"),
        fetch_url("https://api.example.org")
    );

    println!("{:?}", result1);
    println!("{:?}", result2);
}

// Real-world Rust async: handling timeouts
use tokio::time::timeout;

async fn fetch_with_timeout(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let result = timeout(
        Duration::from_secs(5),
        fetch_url(url)
    ).await??;  // first ? handles the timeout error, second ? handles the reqwest error
    Ok(result)
}
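
For comparison, the same timeout pattern can be sketched in Python with asyncio.wait_for. The slow_operation and with_timeout helpers are hypothetical stand-ins, not part of any library.

```python
import asyncio


async def slow_operation(delay):
    await asyncio.sleep(delay)
    return "done"


async def with_timeout(delay, limit):
    """Return the result, or None if the operation exceeds `limit` seconds."""
    try:
        return await asyncio.wait_for(slow_operation(delay), timeout=limit)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising
        return None
```

As in the Rust version, the timeout wraps the awaitable rather than the function body, so the callee needs no knowledge of the deadline.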

Rust's zero-cost async:
1. No heap allocation per async fn call (the state machine is stored inline)
2. The Future trait's poll() method drives the state machine
3. Waker is the mechanism to notify the executor when a future is ready
4. No garbage collector needed: futures are dropped when complete

// Rust Future trait (the low-level interface)
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// MyFuture, check_io_ready(), and register_waker() are illustrative
// placeholders for a type that wraps some I/O source.
// Every async operation implements Future:
impl Future for MyFuture {
    type Output = String;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Poll the underlying I/O:
        if self.check_io_ready() {
            Poll::Ready("result".to_string())  // I/O done
        } else {
            // Register waker: when I/O completes, the I/O source calls waker.wake()
            self.register_waker(cx.waker().clone());
            Poll::Pending  // not ready yet; the executor polls again after wake()
        }
    }
}
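
The poll/wake contract is independent of Rust and can be modeled in a few lines of Python. The following is a sketch of the protocol only, under simplifying assumptions: CountdownFuture and block_on are hypothetical, and the future wakes itself immediately, whereas a real future would hand the waker to an I/O source that calls it later.

```python
PENDING = object()   # sentinel playing the role of Poll::Pending


class CountdownFuture:
    """Hand-written state machine: ready after being polled `n` times."""

    def __init__(self, n):
        self.remaining = n

    def poll(self, wake):
        if self.remaining == 0:
            return "result"              # analogous to Poll::Ready(value)
        self.remaining -= 1
        wake()                           # ask the executor to poll again
        return PENDING


def block_on(future):
    """Minimal executor: poll only when the future has signalled a wake-up."""
    woken = True
    polls = 0

    def wake():
        nonlocal woken
        woken = True

    while True:
        if not woken:
            raise RuntimeError("future is pending but never called wake()")
        woken = False
        polls += 1
        result = future.poll(wake)
        if result is not PENDING:
            return result, polls
```

The RuntimeError branch illustrates the contract's key rule: a future that returns pending without arranging a wake-up will never be polled again.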

async/await in C++20

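#include <coroutine>
#include <iostream>
#include <string>
#include <asio.hpp>  // standalone Asio (Boost.Asio: <boost/asio.hpp>, namespace boost::asio)
#include <asio/experimental/awaitable_operators.hpp>

using namespace asio::experimental::awaitable_operators;  // enables && on awaitables

// Asio-based async HTTP client with C++20 coroutines
asio::awaitable<std::string> fetch(std::string host) {
    auto executor = co_await asio::this_coro::executor;
    asio::ip::tcp::resolver resolver(executor);

    auto endpoints = co_await resolver.async_resolve(host, "80", asio::use_awaitable);

    asio::ip::tcp::socket socket(executor);
    co_await asio::async_connect(socket, endpoints, asio::use_awaitable);

    // "Connection: close" makes the server close the socket after responding,
    // so reading until EOF terminates.
    std::string request =
        "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
    co_await asio::async_write(socket, asio::buffer(request), asio::use_awaitable);

    std::string response;
    asio::error_code ec;  // EOF is the expected way for this read to finish
    co_await asio::async_read(socket, asio::dynamic_buffer(response),
                               asio::redirect_error(asio::use_awaitable, ec));
    if (ec && ec != asio::error::eof) throw asio::system_error(ec);
    co_return response;
}

asio::awaitable<void> main_coro() {
    // operator&& runs both fetches concurrently and waits for both
    auto [r1, r2] = co_await (
        fetch("example.com") && fetch("api.example.org")
    );
    std::cout << r1 << "\n" << r2 << "\n";
}

int main() {
    asio::io_context io;
    asio::co_spawn(io, main_coro(), asio::detached);
    io.run();  // the event loop: runs until no work remains
}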

Go: No async/await Keywords

Go achieves the same concurrency benefits through goroutines + blocking syntax. There are no async/await keywords — the runtime handles the "async" part transparently:

// Go: blocking syntax, async runtime behavior
package main

import (
    "fmt"
    "io"
    "net/http"
    "sync"
)

func fetchURL(url string) (string, error) {
    // This LOOKS synchronous. The goroutine is parked here while it waits,
    // but the OS thread stays free to run other goroutines.
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    return string(body), err
}

func fetchConcurrently(urls []string) []string {
    results := make([]string, len(urls))
    var wg sync.WaitGroup

    for i, url := range urls {
        wg.Add(1)
        go func(idx int, u string) {
            defer wg.Done()
            result, _ := fetchURL(u)  // goroutine blocks, thread doesn't
            results[idx] = result
        }(i, url)
    }

    wg.Wait()
    return results
}

Go's approach avoids the "function coloring" problem: you don't need async keywords because all blocking I/O is automatically cooperative. The tradeoff: Go requires a more complex runtime that handles goroutine parking on system calls.
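
The "function coloring" problem that Go avoids is easy to see in Python, where a plain function cannot await. In this minimal sketch (get_value, sync_caller, and async_caller are hypothetical names), the synchronous caller gets back an un-run coroutine object and has to start an event loop to obtain the value.

```python
import asyncio
import inspect


async def get_value():
    await asyncio.sleep(0)
    return 42


def sync_caller():
    """A plain function cannot `await`; it must hand the coroutine to a loop."""
    pending = get_value()                 # nothing has run yet
    is_coro = inspect.iscoroutine(pending)
    return is_coro, asyncio.run(pending)  # run an event loop to get the result


async def async_caller():
    return await get_value()              # async callers can simply await
```

Every caller of get_value must therefore either be async itself or own an event loop, which is how the "color" spreads through a codebase.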

Async vs. Threads Performance

Async I/O vs. Thread-per-connection
=====================================

Thread-per-connection model:
  1 kernel thread = 1 active connection

  Concurrent connections: 10,000
  Threads: 10,000
  Memory: 10,000 × 8MB (stack) = 80 GB virtual, ~200 MB real
  Context switches: ~10,000 per second (significant overhead)
  CPU overhead: context switches and scheduler bookkeeping grow with thread count (blocked threads themselves consume no CPU)

Async I/O model (Node.js/asyncio):
  1 event loop thread = 10,000 concurrent connections

  Concurrent connections: 10,000
  Threads: 1 (or small pool)
  Memory: ~10,000 × 1KB (state per connection) = ~10 MB
  Context switches: minimal (no OS scheduling involved)
  CPU usage: high when work available, 0 when idle

Go goroutine model:
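  M:N scheduling: many goroutines multiplexed onto a small number of kernel threads

  Concurrent connections: 100,000
  Goroutines: 100,000
  Kernel threads running Go code: GOMAXPROCS (e.g., 8 on 8-core CPU); extra threads appear only for blocking syscalls
  Memory: 100,000 × 2KB (initial goroutine stack, grows on demand) ≈ 200 MB minimum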
  Context switches: goroutine-level only, not kernel-level

The C10K Problem and Its Solution

Dan Kegel's 1999 article "The C10K Problem" posed the question: how do you handle 10,000 concurrent connections on a single server? At the time, the standard approach (thread-per-connection with blocking I/O) failed above ~2,000 threads due to OS scheduling overhead.

Solutions that emerged:
1. Select/poll loop: Single thread, non-blocking I/O, select() on all fds (limited scalability)
2. epoll (Linux 2.6): O(1) event notification for any number of fds
3. Async/await in application code: Makes epoll-style programming natural
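
A readiness-notification loop of the kind listed above can be sketched with Python's standard selectors module, which picks epoll, kqueue, or select depending on the platform. The echo_once function is a hypothetical demo that uses socket.socketpair() in place of real network connections so that it runs anywhere.

```python
import selectors
import socket


def echo_once(messages):
    """Serve several connections from one thread using readiness notification."""
    sel = selectors.DefaultSelector()      # epoll/kqueue/select, chosen per platform
    clients = []
    for msg in messages:
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        client_side.sendall(msg)
        clients.append(client_side)

    remaining = len(messages)
    while remaining:
        # Blocks until at least one registered socket is readable
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(1024)         # will not block: the fd is readable
            conn.sendall(data.upper())
            sel.unregister(conn)
            conn.close()
            remaining -= 1

    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies
```

One thread services every connection, and it only touches a socket after the kernel reports that the socket is ready. Async/await runtimes hide this loop behind the await keyword.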

Node.js (2009) made the C10K solution the default: every I/O operation is async, and the event loop uses libuv (wrapping epoll/kqueue/IOCP). The programming model was callback-based at first; async/await, available since Node.js 7.6 (2017), later made it ergonomic.

The "C10M problem" (10 million connections) is the modern version, solved by kernel bypass (DPDK) and more efficient runtimes.

Historical Context

C# async/await (2012)

The C# team at Microsoft, including Mads Torgersen and Eric Lippert, brought async/await syntax to a mainstream language with C# 5.0 (2012), building on the asynchronous workflows introduced in F# (2007). The design choices they made (async modifier on functions, await as an expression, state machine transformation) became the template for most subsequent async/await implementations.

Key insight from the C# design: the transformation should be visible to the programmer (functions are colored async) to make the async boundary explicit, reducing surprise when debugging stack traces.

The Async Takeover of Python Web Frameworks

Django 3.0 (2019): added ASGI support; async views followed in 3.1 (2020)
FastAPI (2018): async-first, became one of the fastest Python web frameworks
Starlette/ASGI (2018): ASGI succeeded WSGI as the async server interface

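The performance argument, in rough and workload-dependent terms: FastAPI + uvicorn (ASGI server) can handle on the order of ~50,000 req/s on a single core for I/O-bound applications, while an equivalent Django deployment (WSGI, threaded) handles ~3,000-5,000 req/s on the same hardware. Treat these as order-of-magnitude illustrations, not benchmark results.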

Production Examples

Node.js at LinkedIn

LinkedIn's migration of its mobile backend from Ruby on Rails to Node.js (2011) is a canonical event-loop story. It predates async/await (the code was callback-based), but the architectural argument is the same. Serving a request involved multiple concurrent backend API calls:
- Rails: one process per request, blocked waiting for each backend
- Node.js: single-threaded event loop, all backend calls concurrent

Result: roughly 10x fewer servers required, faster response times. LinkedIn's reported figure: "We went from 30 servers down to 3."

Rust + Tokio at Cloudflare

Cloudflare's DNS resolver (1.1.1.1) and their edge proxy infrastructure use Rust + Tokio. Key metrics from their engineering blog:
- Memory: Rust async futures use ~10-50KB per connection vs. thread-based systems using 1-8MB per connection
- Throughput: single Tokio worker handles ~1 million connections on a server
- Latency: p99 DNS response time < 1ms globally

The zero-cost async model means Cloudflare can run on commodity hardware without specialized networking gear.

Debugging Notes

# Python asyncio debugging
import asyncio
import logging

logging.basicConfig(level=logging.DEBUG)

async def dump_tasks():
    # asyncio.all_tasks() must be called while the event loop is running
    for task in asyncio.all_tasks():
        print(task.get_name(), task.get_coro())

async def main():
    await dump_tasks()

# Enable asyncio debug mode (slow-callback warnings, never-awaited coroutine
# tracebacks). Setting PYTHONASYNCIODEBUG=1 has the same effect.
asyncio.run(main(), debug=True)

# asyncio slowness diagnostic (debug mode only):
# WARNING:asyncio:Executing <Task...> took 0.150 seconds
# A step ran for 150 ms without yielding, blocking the event loop
# (threshold: loop.slow_callback_duration, default 0.1 s)

// Node.js: detect event loop lag (blocking operations)
const { monitorEventLoopDelay } = require('perf_hooks');
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();
setInterval(() => {
    // min/max/mean/stddev of event loop delay in nanoseconds
    console.log(`Event loop delay: mean=${h.mean/1e6}ms max=${h.max/1e6}ms`);
    h.reset();
}, 5000);

// Common async bug: sequential awaits where concurrent would work
// SLOW (sequential):
async function slow() {
    const a = await operation1();  // waits for this
    const b = await operation2();  // THEN waits for this
    return [a, b];
}

// FAST (concurrent):
async function fast() {
    const [a, b] = await Promise.all([operation1(), operation2()]);
    return [a, b];
}

// Tokio debugging: console subscriber for async task inspection
// cargo add console-subscriber
// Build with RUSTFLAGS="--cfg tokio_unstable" (needed by tokio-console and task::Builder)
#[tokio::main]
async fn main() {
    console_subscriber::init();  // connects to tokio-console
    // Run: tokio-console (separate terminal)
    // Shows: active tasks, their states, blocked durations

    // In code: name tasks so they are identifiable in the console
    let _task = tokio::task::Builder::new()
        .name("my_important_task")
        .spawn(async { /* ... */ })
        .expect("failed to spawn task");  // spawn() returns io::Result<JoinHandle<_>>
}

Security Implications

Async Timing Attacks

In a single-threaded async runtime, timing of responses can leak information about private state. Because there's no preemption, a long-running operation in one coroutine doesn't get interrupted — its timing is fully observable by concurrent coroutines in the same runtime.

Mitigation: use constant-time operations for security-sensitive comparisons, and consider running security-critical code in separate threads/processes from the event loop.
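
A constant-time comparison is a one-liner in Python's standard library. The sketch below uses hmac.compare_digest; the token_matches wrapper is a hypothetical helper name.

```python
import hmac


def token_matches(supplied: str, expected: str) -> bool:
    """Compare secrets without an early exit at the first differing byte."""
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

An ordinary == comparison may return as soon as the first mismatching byte is found, so its running time can reveal how much of a guessed token is correct.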

Unhandled Promise Rejections

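In JavaScript, an unhandled Promise rejection was historically easy to miss (older Node.js versions ignored it or only printed a warning), while Node.js 15 and later terminate the process by default. Either behavior is harmful for security-relevant operations: an ignored rejection hides a failed authentication check or audit log write, and a crash turns it into an outage.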

// DANGEROUS: error silently ignored
async function checkAuth(token) {
    await validateToken(token);  // throws if invalid — but if not awaited...
}

checkAuth("bad_token");  // NOT awaited — exception disappears!
// Authentication failure is silently ignored

// FIX: always await async calls or handle rejection explicitly
await checkAuth("bad_token");  // propagates exception
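
Python has a closely related failure mode: calling an async function without await only creates a coroutine object, so the check never runs at all. The sketch below demonstrates it with hypothetical names (check_auth, handler_buggy, handler_fixed) and a made-up validation rule.

```python
import asyncio
import warnings


async def check_auth(token):
    await asyncio.sleep(0)
    if token != "good_token":        # hypothetical validation rule
        raise PermissionError("invalid token")


async def handler_buggy(token):
    check_auth(token)                # BUG: creates a coroutine object, never runs it
    return "access granted"


async def handler_fixed(token):
    await check_auth(token)          # the check runs and can raise
    return "access granted"


def try_handler(handler, token):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # "never awaited" warning
        try:
            return asyncio.run(handler(token))
        except PermissionError:
            return "access denied"
```

The buggy handler grants access to any token. The only hint Python gives is a "coroutine ... was never awaited" RuntimeWarning, which is suppressed here to keep the demo quiet and which linters can usually catch statically.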

Blocking the Event Loop = DoS

A CPU-intensive operation in a single-threaded async runtime blocks ALL async operations:

// VULNERABLE: blocking the event loop
app.get('/search', async (req, res) => {
    const query = req.query.q;
    // If query is crafted to cause catastrophic backtracking in this regex:
    const result = query.match(/^(a+)+$/);  // ReDoS vulnerability
    // This blocks the event loop for seconds/minutes on crafted input
    // ALL other requests are frozen during this time
    res.json(result);
});

This is why CPU-intensive work must be offloaded to worker threads in Node.js:

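const { Worker } = require('worker_threads');

// Runs ./heavy-computation.js (hypothetical worker script) off the main thread
function runInWorker(workerData) {
    return new Promise((resolve, reject) => {
        const worker = new Worker('./heavy-computation.js', { workerData });
        worker.once('message', resolve).once('error', reject);
    });
}
app.get('/compute', async (req, res) => {
    res.json(await runInWorker(req.query));
});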
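
The same offloading idea can be sketched in Python with loop.run_in_executor. This is an illustration under assumptions: blocking_work uses time.sleep as a stand-in for blocking code, and the heartbeat task exists only to make event loop stalls observable. For genuinely CPU-bound Python code a process pool is usually the better executor, because threads share the interpreter lock.

```python
import asyncio
import time


def blocking_work(seconds):
    """Stand-in for blocking code (time.sleep blocks the calling thread)."""
    time.sleep(seconds)
    return "computed"


async def heartbeat(counter, interval=0.01):
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(interval)


async def run_with_heartbeat(offload):
    counter = {"ticks": 0}
    beat = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)                       # let the heartbeat start
    if offload:
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, blocking_work, 0.2)
    else:
        result = blocking_work(0.2)              # blocks the whole event loop
    beat.cancel()
    return result, counter["ticks"]
```

Run inline, the blocking call freezes the heartbeat after its first tick; run in the executor, the heartbeat keeps ticking for the full duration, which is what other requests would experience in a server.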

Performance Implications

Async Overhead

Single async operation overhead (above synchronous equivalent):

Runtime               await overhead   Notes
Node.js (V8)          ~200-500 ns      V8 Promise microtask overhead
Python asyncio        ~1-5 µs          CPython overhead per await
Rust Tokio            ~10-50 ns        State machine, minimal overhead
C# .NET 6+            ~50-200 ns       Well-optimized async machinery
Go (goroutine park)   ~200-500 ns      Goroutine suspend/resume

For I/O-bound work (waiting milliseconds for the network), these overheads are negligible. For tight loops performing hundreds of thousands of awaits per second, the gap between Rust's ~50 ns and Python's ~5 µs (a 100x difference) becomes significant.
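
A quick back-of-the-envelope calculation shows when the overhead starts to matter. The sketch takes the upper-bound figures from the table above as given; the await rates are assumptions picked for illustration.

```python
# Upper-bound per-await overheads from the table above, in seconds.
OVERHEAD = {"Rust Tokio": 50e-9, "Python asyncio": 5e-6}


def overhead_fraction(runtime, awaits_per_second):
    """Fraction of each second of wall time spent on await machinery alone."""
    return OVERHEAD[runtime] * awaits_per_second


io_bound = overhead_fraction("Python asyncio", 1_000)        # 1,000 awaits/s
tight_py = overhead_fraction("Python asyncio", 100_000)      # 100,000 awaits/s
tight_rs = overhead_fraction("Rust Tokio", 100_000)
```

At 1,000 awaits per second Python spends about 0.5% of its time on await overhead. At 100,000 per second that rises to about 50%, while the Rust figure at the same rate is about 0.5%.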

Failure Modes and Real Incidents

The Node.js EventEmitter Memory Leak Pattern

In Node.js, streaming data with async/await can leak memory:

// LEAK: no backpressure and no cleanup when the client disconnects
async function streamData(req, res) {
    const readable = fs.createReadStream('large_file.txt');
    readable.on('data', chunk => res.write(chunk));  // ignores res.write() backpressure
    // If the client disconnects, nothing stops or destroys `readable`:
    // chunks keep piling up in memory and stream errors are never handled
    await new Promise(resolve => readable.on('end', resolve));
    res.end();
}

// FIX: use pipeline(), which handles backpressure, errors, and cleanup:
const fs = require('fs');
const { pipeline } = require('stream/promises');
async function streamDataFixed(req, res) {
    await pipeline(
        fs.createReadStream('large_file.txt'),
        res  // both streams are destroyed on completion or error
    );
}

FastAPI Blocking I/O in Async Route

# FastAPI bug: blocking I/O in async route blocks event loop
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # BUG: psycopg2 (sync) blocks the event loop for the whole round trip!
    conn = psycopg2.connect(DATABASE_URL)
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()

# FIX: use an async database driver (here asyncpg; get_async_db is a
# hypothetical dependency that yields an asyncpg connection)
@app.get("/users/{user_id}")
async def get_user(user_id: int, db=Depends(get_async_db)):
    row = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    return dict(row) if row else None

Production consequence: while the synchronous query runs, the event loop is blocked and every other request handled by that worker stalls until it completes. Mixing sync database drivers with async frameworks is a recurring cause of apparent "hang" behavior under load. If no async driver is available, declaring the route with plain def makes FastAPI run it in a thread pool instead.

Modern Usage

Node.js: async/await is the standard. All major frameworks (Express, Fastify, NestJS) support async handlers.
Python FastAPI: async-first web framework, among the highest-throughput Python web frameworks
Rust Tokio: standard for async systems programming in Rust (web, databases, networking)
C#/.NET: ASP.NET Core is fully async/await throughout
Swift: Structured concurrency (async let, TaskGroup) added in Swift 5.5

Future Directions

Async Iterators: Async generators (Python async for, JavaScript for await...of, Rust Stream trait) extend async/await to sequences of values. This is the async version of iterators.
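
A minimal Python sketch of an async generator and its consumer follows. The ticker and collect_squares names are hypothetical, and asyncio.sleep stands in for waiting on a real data source.

```python
import asyncio


async def ticker(n, delay=0.001):
    """Async generator: each value may require waiting before it is produced."""
    for i in range(n):
        await asyncio.sleep(delay)   # stand-in for waiting on I/O
        yield i


async def collect_squares(n):
    squares = []
    async for value in ticker(n):    # suspends between items instead of blocking
        squares.append(value * value)
    return squares
```

The consumer reads like an ordinary for loop, but between items the event loop is free to run other tasks.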

Structured Concurrency: Swift's TaskGroup, Java's StructuredTaskScope, Kotlin's coroutine scopes, and Python's asyncio.TaskGroup ensure that spawned async tasks are scoped to their parent, which prevents leaked tasks. This is likely to become standard practice.

Rust async traits: async fn in traits (stabilized in Rust 1.75, December 2023) allows async fn in trait definitions without boxing, a major step toward completing Rust's async ergonomics.

Exercises

Event Loop Visualization: Write a Node.js program that instruments setImmediate, setTimeout, and Promise.resolve() to show exactly what order callbacks execute. Create 10 of each, mix them, and trace the execution order. Verify against the Node.js event loop phase documentation.
Python async Performance: Benchmark sequential vs. asyncio.gather() for 100 network requests to a local test server. Instrument with asyncio.get_event_loop().time() to show the actual time distribution. Identify and fix a sequential-await bug in provided sample code.
Rust Future from Scratch: Implement a minimal Sleep future in Rust that uses std::thread::sleep in a background thread and wakes the executor via Waker. Run it with a custom minimal single-threaded executor (not Tokio). This forces understanding of Poll::Pending, Waker, and the poll contract.
Blocking the Node.js Event Loop: Write a Node.js Express server with a route that performs a CPU-intensive operation. Show the event loop blocking effect (all requests stall) using autocannon. Then fix it using worker_threads. Measure the throughput difference.
Async Error Handling Audit: Take a sample Node.js or Python application (any open-source project from GitHub). Audit all async functions for: unhandled promise rejections, missing await before async calls, and sequential awaits where concurrent would work. Report findings and propose fixes.

References

Kegel, D. "The C10K Problem." http://www.kegel.com/c10k.html, 1999. [The motivating problem]
Tobin-Hochstadt, S. "The State of JavaScript Promises." Blog post, 2015.
Toub, S. "Async/Await FAQ." Microsoft Blog, 2012. [C# design rationale]
Nystrom, B. "What Color is Your Function?" 2015. https://journal.stuffwithstuff.com/2015/02/26/
Matsakis, N. "Async/Await — The Power of Zero-Cost Abstractions." RustConf 2019.
Cloudflare Blog: "How we built 1.1.1.1." https://blog.cloudflare.com/
Python asyncio documentation: https://docs.python.org/3/library/asyncio.html
Tokio documentation: https://tokio.rs/tokio/tutorial
Node.js event loop documentation: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick