Async/Await
Technical Overview
Async/await is a language-level syntax for writing asynchronous code that looks like synchronous code. An async function can await an asynchronous operation — suspending the function's execution until the operation completes, without blocking the underlying thread. The calling thread is free to run other async tasks while waiting.
Under the hood, async/await is usually syntactic sugar over a stackless, future/promise-based state machine (as in Rust, C#, and JavaScript). Go reaches similar semantics without any keywords by a different route: a runtime scheduler that multiplexes goroutines (stackful green threads) onto OS threads. The implementations differ substantially, but the programmer experience converges.
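The "state machine" description can be made concrete by driving a coroutine by hand, without any event loop. The following is a minimal Python sketch; `suspend` and `pipeline` are hypothetical names introduced only for illustration. Each `send()` call resumes the function at its last suspension point, which is what a scheduler does on the program's behalf.

```python
import types

@types.coroutine
def suspend(label):
    # A bare awaitable: yielding hands control back to whoever is
    # driving the coroutine (normally the event loop).
    result = yield label
    return result

async def pipeline():
    a = await suspend("waiting for step 1")
    b = await suspend("waiting for step 2")
    return a + b

coro = pipeline()
trace = []
trace.append(coro.send(None))   # run until the first await
trace.append(coro.send(10))     # resume with 10, run until the second await
try:
    coro.send(32)               # resume with 32; the function returns
except StopIteration as done:
    final = done.value          # the return value travels in StopIteration

print(trace)   # ['waiting for step 1', 'waiting for step 2']
print(final)   # 42
```

Between the `send()` calls the coroutine holds its local variables but occupies no thread, which is the property the rest of this section relies on.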
Prerequisites
- Blocking vs. non-blocking I/O concepts
- Event loop architecture (epoll, kqueue, IOCP)
- Promise/Future concept
- Callback-based async programming (to appreciate what async/await replaces)
- Basic coroutine understanding (03-fibers-and-coroutines.md)
Core Concepts
The Problem: Callback Hell
Before async/await, non-blocking I/O in JavaScript looked like this:
// Callback hell: reading a file, parsing it, making an API call
fs.readFile('config.json', 'utf8', function(err, configData) {
if (err) return handleError(err);
parseConfig(configData, function(err, config) {
if (err) return handleError(err);
fetchFromAPI(config.apiUrl, function(err, data) {
if (err) return handleError(err);
saveToDatabase(data, function(err, result) {
if (err) return handleError(err);
sendWebhook(result, function(err) {
if (err) return handleError(err);
console.log('Pipeline complete');
// 5 levels of nesting, error handling repeated everywhere
// This is "callback hell" / "pyramid of doom"
});
});
});
});
});
The core problem is inversion of control: you don't call the next step when ready; instead, you hand a callback to the I/O system and it calls you back. Error handling is scattered, cancellation is hard to express, and the execution order is hard to read from the code structure.
Futures and Promises
The intermediate step between callbacks and async/await was Futures (or Promises in JavaScript):
// Promise chain (better, but still non-sequential)
readFile('config.json')
.then(configData => parseConfig(configData))
.then(config => fetchFromAPI(config.apiUrl))
.then(data => saveToDatabase(data))
.then(result => sendWebhook(result))
.then(() => console.log('Pipeline complete'))
.catch(err => handleError(err));
// Error handling centralized, but still not sequential code flow
Async/Await: Sequential-Looking Async Code
// async/await: sequential-looking, async-behaving
async function pipeline() {
try {
const configData = await readFile('config.json');
const config = await parseConfig(configData);
const data = await fetchFromAPI(config.apiUrl);
const result = await saveToDatabase(data);
await sendWebhook(result);
console.log('Pipeline complete');
} catch (err) {
handleError(err);
}
}
This is the same async behavior — readFile doesn't block the event loop — but the code reads as sequential. Error handling is natural try/catch. The execution order is obvious.
Event Loop Architecture
The event loop is the scheduler that makes async/await work. Node.js uses libuv; Python ships an event loop in asyncio; Rust's standard library includes no runtime, so programs bring one, most commonly Tokio.
Event Loop Architecture (Node.js / Python asyncio)
====================================================
Application Code (single thread)
|
+-- await readFile() ← suspends current async function
| |
| +-- registers callback with event loop
| | (file: read from disk, notify when ready)
| |
| +-- event loop continues other work:
| - other pending async tasks
| - already-resolved Promises
| - timer callbacks
|
+-- [I/O runs in the background: sockets via epoll/kqueue/IOCP; in Node.js, file reads run on libuv's thread pool]
|
+-- file read completes
| |
| +-- completion is reported to the event loop (readiness via epoll_wait, or the thread pool's completion queue)
| +-- event loop resumes the awaiting async function
| +-- code continues after 'await readFile()'
|
[single thread throughout: no preemption and no data races, although shared state can still change between awaits]
Event Loop Phases (Node.js libuv):
1. timers (setTimeout/setInterval callbacks)
2. pending callbacks (I/O callbacks deferred to the next loop iteration, e.g. some TCP errors)
3. idle/prepare (internal)
4. poll (retrieve new I/O events, execute I/O callbacks)
5. check (setImmediate callbacks)
6. close callbacks
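The loop structure described above can be sketched as a toy scheduler. This is an illustration under simplifying assumptions, not how libuv or asyncio are implemented: `ToyLoop` and `worker` are hypothetical names, tasks are plain generators that yield a sleep duration, and the only "I/O" is a timer heap.

```python
import heapq
import time
from collections import deque

class ToyLoop:
    """Hypothetical minimal scheduler: a ready queue plus a timer heap."""

    def __init__(self):
        self.ready = deque()
        self.timers = []   # entries are (wake_time, sequence_number, task)
        self.seq = 0
        self.log = []

    def spawn(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready or self.timers:
            if not self.ready:
                # Nothing runnable: wait for the earliest timer to expire.
                wake_time, _, task = heapq.heappop(self.timers)
                time.sleep(max(0.0, wake_time - time.monotonic()))
                self.ready.append(task)
                continue
            task = self.ready.popleft()
            try:
                delay = next(task)   # run the task until it yields a sleep request
            except StopIteration:
                continue             # task finished
            self.seq += 1
            heapq.heappush(self.timers, (time.monotonic() + delay, self.seq, task))

def worker(loop, name, delay):
    loop.log.append(f"{name} start")
    yield delay                      # the "await": hand control back to the loop
    loop.log.append(f"{name} done")

loop = ToyLoop()
loop.spawn(worker(loop, "slow", 0.05))
loop.spawn(worker(loop, "fast", 0.01))
loop.run()
print(loop.log)   # ['slow start', 'fast start', 'fast done', 'slow done']
```

Both workers start before either finishes, on a single thread, because each `yield` returns control to the loop instead of sleeping in place.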
async/await in Python (asyncio)
import asyncio
import aiohttp
import time
async def fetch_url(session, url):
    """Async HTTP fetch: suspends while waiting for the network."""
    async with session.get(url) as response:
        return await response.text()
async def fetch_all(urls):
    """Fetch all URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        # Create all fetch coroutines (nothing runs until they are awaited)
        tasks = [fetch_url(session, url) for url in urls]
        # Wait for all of them concurrently
        results = await asyncio.gather(*tasks)
        return results
# Running:
urls = [f"http://example.com/api/{i}" for i in range(100)]
# A sequential version would issue 100 requests one after another: ~100 * RTT
# (at 100 ms per request, about 10 seconds)
# Async version: 100 concurrent HTTP requests, ~1 * RTT
start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start
# elapsed is typically a fraction of a second (concurrent) vs ~10 seconds (sequential)
Key Python asyncio concepts:
- async def: declares a coroutine function
- await: suspends current coroutine, transfers control to event loop
- asyncio.gather(): runs multiple coroutines concurrently
- asyncio.run(): creates event loop, runs coroutine to completion
- Only one coroutine runs at a time (single-threaded event loop)
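The concepts listed above can be observed without a network. In this small sketch, `asyncio.sleep` stands in for I/O, and `worker` and `events` are hypothetical names. The recorded order shows that `await` hands control to the event loop, and that `asyncio.gather()` returns results in argument order rather than completion order.

```python
import asyncio

events = []

async def worker(name, delay):
    events.append(f"{name}: start")
    await asyncio.sleep(delay)   # suspends; the loop runs other coroutines
    events.append(f"{name}: end")
    return name

async def main():
    return await asyncio.gather(worker("a", 0.05), worker("b", 0.01))

results = asyncio.run(main())
print(results)   # ['a', 'b']
print(events)    # ['a: start', 'b: start', 'b: end', 'a: end']
```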
async/await in JavaScript/Node.js
JavaScript's async model is Promise-based. async/await is syntactic sugar:
// async function returns a Promise
async function fetchUser(id) {
const response = await fetch(`/api/users/${id}`);
if (!response.ok) {
throw new Error(`HTTP error: ${response.status}`);
}
return response.json(); // returns Promise<User>
}
// Equivalent promise chain:
function fetchUserPromise(id) {
return fetch(`/api/users/${id}`)
.then(response => {
if (!response.ok) throw new Error(`HTTP error: ${response.status}`);
return response.json();
});
}
// Concurrent fetching:
// Sequential (slow: each request waits for the previous one):
async function fetchUsersSequentially(ids) {
  const users = [];
  for (const id of ids) {
    users.push(await fetchUser(id));
  }
  return users;
}
// Concurrent (all requests start before any of them is awaited):
async function fetchMultipleUsers(ids) {
  return Promise.all(ids.map(id => fetchUser(id)));
}
// Error handling:
async function safeOperation() {
try {
const result = await riskyOperation();
return result;
} catch (err) {
console.error('Failed:', err);
return null;
}
// 'finally' works too:
finally {
await cleanup(); // runs even if thrown
}
}
Node.js gotcha: await inside a loop creates sequential operations. Use Promise.all() for concurrent operations. This is a common performance bug in Node.js code.
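The same pitfall exists in every async/await language. As a rough illustration in Python, the sketch below times a sequential loop of awaits against `asyncio.gather()`. `fake_request` is a hypothetical stand-in that sleeps 50 ms instead of calling a server, so the absolute timings are illustrative only.

```python
import asyncio
import time

async def fake_request(i):
    await asyncio.sleep(0.05)   # stand-in for one network round trip
    return i

async def sequential(n):
    # Each await finishes before the next request starts.
    return [await fake_request(i) for i in range(n)]

async def concurrent(n):
    # All requests are started, then awaited together.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))

async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

seq_result, seq_time = asyncio.run(timed(sequential(5)))
con_result, con_time = asyncio.run(timed(concurrent(5)))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With five requests, the sequential version takes roughly five sleeps and the concurrent version roughly one; both return the same list.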
async/await in Rust: Zero-Cost
Rust's async model is fundamentally different: it's zero-cost in that awaiting a future does not allocate memory on the heap for the stack frame. The compiler transforms async functions into state machines at compile time:
use tokio::time::{sleep, Duration};
use reqwest;
// Async function: compiles to a state machine
async fn fetch_url(url: &str) -> Result<String, reqwest::Error> {
let response = reqwest::get(url).await?; // await here
let body = response.text().await?; // await here
Ok(body)
}
// The compiler generates approximately:
// enum FetchUrlState {
// Start { url: String },
// WaitingForGet { future: GetFuture },
// WaitingForText { future: TextFuture },
// Done,
// }
// impl Future for FetchUrlState { ... }
#[tokio::main] // macro that sets up Tokio runtime
async fn main() {
// Concurrent fetching:
let (result1, result2) = tokio::join!(
fetch_url("https://example.com"),
fetch_url("https://api.example.org")
);
println!("{:?}", result1);
println!("{:?}", result2);
}
// Real-world Rust async: handling timeouts
use tokio::time::timeout;
async fn fetch_with_timeout(url: &str) -> Result<String, Box<dyn std::error::Error>> {
let result = timeout(
Duration::from_secs(5),
fetch_url(url)
).await??; // first ? propagates the timeout error, second ? the reqwest error
Ok(result)
}
Rust's zero-cost async:
1. No runtime allocation per async fn call (state machine is inline)
2. The Future trait's poll() method drives the state machine
3. Waker is the mechanism to notify the executor when a future is ready
4. No garbage collector needed — futures are dropped when complete
// Rust Future trait (the low-level interface)
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};
// Every async operation implements Future:
impl Future for MyFuture {
type Output = String;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// Poll the underlying I/O:
match self.check_io_ready() {
true => Poll::Ready("result".to_string()), // I/O done
false => {
// Register waker: when I/O completes, call waker.wake()
self.register_waker(cx.waker().clone());
Poll::Pending // not ready yet, come back later
}
}
}
}
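The poll/wake contract is easier to see with the runtime stripped away. The following is a language-neutral model written in Python, not Rust's actual API: `PENDING`, `CountdownFuture`, and `block_on` are hypothetical names, and the future calls its waker immediately instead of handing it to an I/O source.

```python
from collections import deque

PENDING = object()   # stand-in for Poll::Pending

class CountdownFuture:
    """Hypothetical leaf future: becomes ready after `n` pending polls."""

    def __init__(self, n):
        self.remaining = n
        self.polls = 0

    def poll(self, waker):
        self.polls += 1
        if self.remaining == 0:
            return "result"      # corresponds to Poll::Ready(value)
        self.remaining -= 1
        waker()                  # ask the executor to poll again later
        return PENDING

def block_on(future):
    """Single-future executor: polls only when the waker has fired."""
    queue = deque([future])      # one initial poll is always scheduled
    wake = lambda: queue.append(future)
    while queue:
        value = queue.popleft().poll(wake)
        if value is not PENDING:
            return value
    raise RuntimeError("future is pending and nothing will wake it")

fut = CountdownFuture(3)
value = block_on(fut)
print(value, fut.polls)   # result 4
```

The `RuntimeError` branch marks the central rule of the contract: a future that returns pending without arranging a wake-up is never polled again.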
async/await in C++20
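#include <coroutine>
#include <iostream>
#include <string>
#include <system_error>
#include <asio.hpp> // standalone Asio (Boost.Asio: <boost/asio.hpp>, namespace boost::asio)
#include <asio/experimental/awaitable_operators.hpp>
using namespace asio::experimental::awaitable_operators; // enables && on awaitables
// Asio-based HTTP/1.1 GET with C++20 coroutines (plain TCP, port 80)
asio::awaitable<std::string> fetch(std::string host) {
    auto executor = co_await asio::this_coro::executor;
    asio::ip::tcp::resolver resolver(executor);
    auto endpoints = co_await resolver.async_resolve(host, "80", asio::use_awaitable);
    asio::ip::tcp::socket socket(executor);
    co_await asio::async_connect(socket, endpoints, asio::use_awaitable);
    // "Connection: close" makes the server end the stream, so reading to EOF terminates
    std::string request = "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
    co_await asio::async_write(socket, asio::buffer(request), asio::use_awaitable);
    std::string response;
    asio::error_code ec;
    // Reading until EOF reports asio::error::eof, which is the expected outcome here
    co_await asio::async_read(socket, asio::dynamic_buffer(response),
                              asio::redirect_error(asio::use_awaitable, ec));
    if (ec && ec != asio::error::eof) throw std::system_error(ec);
    co_return response;
}
asio::awaitable<void> main_coro() {
    // Both fetches run concurrently; && completes when both have finished
    auto [r1, r2] = co_await (
        fetch("example.com") && fetch("api.example.org")
    );
    std::cout << r1 << "\n" << r2 << "\n";
}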
Go: No async/await Keywords
Go achieves the same concurrency benefits through goroutines + blocking syntax. There are no async/await keywords — the runtime handles the "async" part transparently:
// Go: blocking syntax, async runtime behavior
package main
import (
"fmt"
"io"
"net/http"
"sync"
)
func fetchURL(url string) (string, error) {
// This LOOKS synchronous. The goroutine is parked here while the request is in flight,
// and the OS thread is freed to run other goroutines.
resp, err := http.Get(url)
if err != nil {
return "", err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
return string(body), err
}
func fetchConcurrently(urls []string) []string {
results := make([]string, len(urls))
var wg sync.WaitGroup
for i, url := range urls {
wg.Add(1)
go func(idx int, u string) {
defer wg.Done()
result, _ := fetchURL(u) // goroutine blocks, thread doesn't
results[idx] = result
}(i, url)
}
wg.Wait()
return results
}
Go's approach avoids the "function coloring" problem: you don't need async keywords because all blocking I/O is automatically cooperative. The tradeoff: Go requires a more complex runtime that handles goroutine parking on system calls.
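For contrast, here is what function coloring looks like in a language that does have the keywords. In this Python sketch (all function names are hypothetical), a plain function that calls an async function gets back a coroutine object rather than a value; only another async function, or an event loop entry point such as `asyncio.run`, can obtain the result.

```python
import asyncio
import inspect

async def fetch_value():
    await asyncio.sleep(0)
    return 42

def sync_caller():
    # A plain function cannot await. Calling the async function only
    # creates a coroutine object; the body has not run.
    coro = fetch_value()
    is_coroutine = inspect.iscoroutine(coro)
    coro.close()   # closed explicitly to avoid a "never awaited" warning
    return is_coroutine

async def async_caller():
    return await fetch_value()

colored = sync_caller()
value = asyncio.run(async_caller())
print(colored, value)   # True 42
```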
Async vs. Threads Performance
Async I/O vs. Thread-per-connection
=====================================
Thread-per-connection model:
1 kernel thread = 1 active connection
Concurrent connections: 10,000
Threads: 10,000
Memory: 10,000 × 8MB (stack) = 80 GB virtual, ~200 MB real
Context switches: frequent under load, each costing microseconds of kernel scheduling work
CPU usage for I/O wait: sleeping threads use no CPU, but waking and scheduling thousands of them adds overhead
Async I/O model (Node.js/asyncio):
1 event loop thread = 10,000 concurrent connections
Concurrent connections: 10,000
Threads: 1 (or small pool)
Memory: ~10,000 × 1KB (state per connection) = ~10 MB
Context switches: minimal (no OS scheduling involved)
CPU usage: high when work available, 0 when idle
Go goroutine model:
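N:M scheduling: many goroutines multiplexed onto a small number of kernel threads
Concurrent connections: 100,000
Goroutines: 100,000
Kernel threads: up to GOMAXPROCS running Go code (e.g., 8 on an 8-core CPU), plus threads blocked in system calls
Memory: 100,000 × 2KB (initial goroutine stack, grows on demand) = ~200 MB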
Context switches: goroutine-level only, not kernel-level
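The memory lines in this comparison are simple multiplication, and the short calculation below makes the assumptions explicit. The per-unit sizes are inputs, not measurements: 8 MiB reserved per thread stack, 1 KiB of state per event-loop connection, and 2 KiB as the initial goroutine stack (Go's default since 1.4; stacks grow on demand, so real usage is higher). Binary units are used, so the results come out slightly below the round decimal figures.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def total_bytes(units, bytes_per_unit):
    return units * bytes_per_unit

thread_model = total_bytes(10_000, 8 * MIB)       # reserved (virtual) stack space
event_loop_model = total_bytes(10_000, 1 * KIB)   # per-connection state
goroutine_model = total_bytes(100_000, 2 * KIB)   # assumed 2 KiB initial stacks

print(f"threads:    {thread_model / GIB:.1f} GiB virtual")   # 78.1 GiB virtual
print(f"event loop: {event_loop_model / MIB:.1f} MiB")       # 9.8 MiB
print(f"goroutines: {goroutine_model / MIB:.1f} MiB")        # 195.3 MiB
```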
The C10K Problem and Its Solution
Dan Kegel's 1999 article "The C10K Problem" posed the question: how do you handle 10,000 concurrent connections on a single server? At the time, the standard approach (thread-per-connection with blocking I/O) failed above ~2,000 threads due to OS scheduling overhead.
Solutions that emerged:
1. Select/poll loop: Single thread, non-blocking I/O, select() on all fds (limited scalability)
2. epoll (Linux 2.6): O(1) event notification for any number of fds
3. Async/await in application code: Makes epoll-style programming natural
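The readiness-notification style that async/await hides looks roughly like the following when written by hand. This is a minimal sketch using Python's standard `selectors` module, which picks epoll or kqueue where available. Socket pairs stand in for client connections, and `on_readable` is a hypothetical callback name.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
received = []

def on_readable(conn):
    received.append(conn.recv(1024))
    sel.unregister(conn)
    conn.close()

# Two connected socket pairs stand in for two client connections.
pairs = [socket.socketpair() for _ in range(2)]
for i, (client, server) in enumerate(pairs):
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, on_readable)
    client.sendall(f"hello {i}".encode())
    client.close()

# One thread waits on every registered socket at once.
while len(received) < len(pairs):
    for key, _mask in sel.select(timeout=1.0):
        key.data(key.fileobj)       # dispatch to the registered callback

sel.close()
print(sorted(received))   # [b'hello 0', b'hello 1']
```

An async runtime performs this register, wait, and dispatch cycle internally; `await` is the point at which a coroutine is registered and later resumed.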
Node.js (2009) made the C10K solution the default: every I/O operation is async, and the event loop uses libuv (wrapping epoll/kqueue/IOCP). Promises and later async/await (ES2017, available in Node.js from version 7.6) made the programming model ergonomic.
The "C10M problem" (10 million connections) is the modern version, solved by kernel bypass (DPDK) and more efficient runtimes.
Historical Context
C# async/await (2012)
The C# team at Microsoft, including Mads Torgersen and Eric Lippert, shipped the first async/await syntax in a mainstream language (C# 5.0, 2012), building on F#'s earlier asynchronous workflows. The design choices they made (async modifier on functions, await as an expression, state machine transformation) became the template for most subsequent async/await implementations.
Key insight from the C# design: the transformation should be visible to the programmer (functions are colored async) to make the async boundary explicit, reducing surprise when debugging stack traces.
The Async Takeover of Python Web Frameworks
- Django 3.0 (2019): added ASGI support; async views followed in Django 3.1 (2020)
- FastAPI (2018): async-first, among the faster Python web frameworks
- Starlette/ASGI (2018): ASGI emerged as the async successor to WSGI as the server interface
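The performance argument: for I/O-bound applications, an ASGI stack such as FastAPI + uvicorn can sustain several times the request rate of a threaded WSGI deployment on the same hardware. Figures such as ~50,000 req/s versus ~3,000-5,000 req/s on a single core are sometimes quoted, but the ratio depends heavily on workload and configuration.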
Production Examples
Node.js at LinkedIn
LinkedIn's mobile backend migration from Ruby on Rails to Node.js (around 2011) is a frequently cited event-loop story. It predates async/await syntax in JavaScript, but the concurrency model is the same. Serving a request involved multiple backend API calls:
- Before: each request occupied a worker that blocked while waiting for each backend
- Node.js: single-threaded event loop, all backend calls concurrent
Result, as reported by LinkedIn engineers: roughly 10x fewer servers (from 30 down to 3) and faster response times.
Rust + Tokio at Cloudflare
Cloudflare's DNS resolver (1.1.1.1) and their edge proxy infrastructure use Rust + Tokio. Key metrics from their engineering blog:
- Memory: Rust async futures use ~10-50KB per connection vs. thread-based systems using 1-8MB per connection
- Throughput: a single server running Tokio workers handles ~1 million connections
- Latency: p99 DNS response time < 1ms globally
The zero-cost async model means Cloudflare can run on commodity hardware without specialized networking gear.
Debugging Notes
# Python asyncio debugging
import asyncio
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('asyncio').setLevel(logging.DEBUG)
async def dump_tasks():
    # all_tasks() must be called while the event loop is running
    for task in asyncio.all_tasks():
        print(task.get_name(), task.get_coro())
async def main():
    await dump_tasks()
# debug=True enables asyncio debug mode (slow-callback warnings, tracebacks for
# never-awaited coroutines); setting PYTHONASYNCIODEBUG=1 has the same effect
asyncio.run(main(), debug=True)
# asyncio slowness diagnostic (logged in debug mode):
# WARNING:asyncio:Executing <Task...> took 0.150 seconds
# This means a task ran for 150 ms without yielding, blocking the event loop
// Node.js: detect event loop lag (blocking operations)
const { monitorEventLoopDelay } = require('perf_hooks');
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();
setInterval(() => {
// min/max/mean/stddev of event loop delay in nanoseconds
console.log(`Event loop delay: mean=${h.mean/1e6}ms max=${h.max/1e6}ms`);
h.reset();
}, 5000);
// Common async bug: sequential awaits where concurrent would work
// SLOW (sequential):
async function slow() {
const a = await operation1(); // waits for this
const b = await operation2(); // THEN waits for this
return [a, b];
}
// FAST (concurrent):
async function fast() {
const [a, b] = await Promise.all([operation1(), operation2()]);
return [a, b];
}
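// Tokio debugging: console subscriber for async task inspection
// cargo add console-subscriber
// tokio-console and task::Builder both require building with
// RUSTFLAGS="--cfg tokio_unstable" and tokio's "tracing" feature
#[tokio::main]
async fn main() {
    console_subscriber::init(); // connects to tokio-console
    // Run: tokio-console (separate terminal)
    // Shows: active tasks, their states, blocked durations
    // In code: name tasks so they are identifiable in the console
    let task = tokio::task::Builder::new()
        .name("my_important_task")
        .spawn(async { /* ... */ })
        .expect("failed to spawn task"); // spawn returns io::Result<JoinHandle<_>>
    task.await.expect("task panicked");
}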
Security Implications
Async Timing Attacks
In a single-threaded async runtime, timing of responses can leak information about private state. Because there's no preemption, a long-running operation in one coroutine doesn't get interrupted — its timing is fully observable by concurrent coroutines in the same runtime.
Mitigation: use constant-time operations for security-sensitive comparisons, and consider running security-critical code in separate threads/processes from the event loop.
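As a small illustration of the first mitigation, Python's standard library provides `hmac.compare_digest`, which is designed to avoid the early exit of an ordinary `==` comparison. The function name `token_matches` and the token values are hypothetical.

```python
import hmac

def token_matches(supplied: str, expected: str) -> bool:
    # compare_digest avoids content-based short-circuiting, so the time
    # taken does not reveal where the first mismatching byte is.
    return hmac.compare_digest(supplied.encode(), expected.encode())

print(token_matches("s3cret-token", "s3cret-token"))   # True
print(token_matches("s3cret-tokex", "s3cret-token"))   # False
```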
Unhandled Promise Rejections
In JavaScript, an unhandled Promise rejection only logs a warning in older Node.js versions and terminates the process by default since Node.js 15. In the first case, errors from security-relevant operations (failed authentication checks, failed audit log writes) are effectively swallowed; in the second, they become an availability problem.
// DANGEROUS: error silently ignored
async function checkAuth(token) {
await validateToken(token); // throws if invalid — but if not awaited...
}
checkAuth("bad_token"); // NOT awaited — exception disappears!
// Authentication failure is silently ignored
// FIX: always await async calls or handle rejection explicitly
await checkAuth("bad_token"); // propagates exception
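Python has a close relative of this bug: calling an async function without `await` creates a coroutine object whose body never runs, so the check is skipped entirely. The sketch below uses hypothetical names (`check_auth`, `handler_buggy`, `handler_fixed`) and a toy rule that only the token "good" is valid.

```python
import asyncio

class AuthError(Exception):
    pass

async def check_auth(token):
    # Hypothetical check: only the literal token "good" is accepted.
    if token != "good":
        raise AuthError("invalid token")

async def handler_buggy(token):
    coro = check_auth(token)   # BUG: coroutine created but never awaited
    coro.close()               # closed here only to silence the runtime warning
    return "access granted"

async def handler_fixed(token):
    await check_auth(token)    # the check actually runs and can raise
    return "access granted"

buggy = asyncio.run(handler_buggy("bad"))
try:
    fixed = asyncio.run(handler_fixed("bad"))
except AuthError:
    fixed = "access denied"

print(buggy, "/", fixed)   # access granted / access denied
```

Without the explicit `close()`, CPython reports the mistake with a "coroutine ... was never awaited" RuntimeWarning, which is worth promoting to an error in test suites.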
Blocking the Event Loop = DoS
A CPU-intensive operation in a single-threaded async runtime blocks ALL async operations:
// VULNERABLE: blocking the event loop
app.get('/search', async (req, res) => {
const query = req.query.q;
// If query is crafted to cause catastrophic backtracking in this regex:
const result = query.match(/^(a+)+$/); // ReDoS vulnerability
// This blocks the event loop for seconds/minutes on crafted input
// ALL other requests are frozen during this time
res.json(result);
});
This is why CPU-intensive work must be offloaded to worker threads in Node.js:
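const { Worker } = require('worker_threads');
// Hypothetical helper: runs ./heavy-computation.js (assumed to post one result message) off the main thread
const runInWorker = (workerData) => new Promise((resolve, reject) => {
  const worker = new Worker('./heavy-computation.js', { workerData });
  worker.once('message', resolve).once('error', reject);
});
app.get('/compute', async (req, res) => {
  const result = await runInWorker(req.query);
  res.json(result);
});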
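The same effect can be demonstrated in Python's asyncio. In this sketch a heartbeat task records ticks while a blocking call runs either directly on the event loop or via `asyncio.to_thread` (Python 3.9+). `blocking_work` uses `time.sleep` as a stand-in for any call that does not yield. For CPU-bound pure-Python work, a process pool is the more usual choice, since threads share the interpreter lock.

```python
import asyncio
import time

def blocking_work():
    time.sleep(0.2)   # stand-in for a call that never yields to the loop
    return "done"

async def heartbeat(ticks):
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.02)

async def run(offload):
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)   # let the heartbeat record its first tick
    if offload:
        await asyncio.to_thread(blocking_work)   # loop stays responsive
    else:
        blocking_work()      # loop is frozen for the whole call
    hb.cancel()
    return len(ticks)

blocked_ticks = asyncio.run(run(offload=False))
offloaded_ticks = asyncio.run(run(offload=True))
print(f"ticks while blocked: {blocked_ticks}, while offloaded: {offloaded_ticks}")
```

When the work runs on the loop, the heartbeat gets no chance to run after its first tick; when it is offloaded, the ticks keep arriving at the normal rate.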
Performance Implications
Async Overhead
Single async operation overhead (above synchronous equivalent):
| Runtime | await overhead | Notes |
|---|---|---|
| Node.js (V8) | ~200-500 ns | V8 Promise microtask overhead |
| Python asyncio | ~1-5 µs | CPython overhead per await |
| Rust Tokio | ~10-50 ns | State machine, minimal overhead |
| C# .NET 6+ | ~50-200 ns | Well-optimized async machinery |
| Go (goroutine park) | ~200-500 ns | Goroutine suspend/resume |
For I/O-bound work (waiting milliseconds for the network), these overheads are negligible. They matter only in hot paths that await millions of times per second, where Rust's ~50 ns versus Python's ~5 µs is a 100x difference.
Failure Modes and Real Incidents
The Node.js EventEmitter Memory Leak Pattern
In Node.js, streaming data with async/await can leak memory:
// LEAK: request never released because event listener persists
async function streamData(req, res) {
const readable = fs.createReadStream('large_file.txt');
readable.on('data', chunk => res.write(chunk));
// If client disconnects: 'data' listener keeps readable's reference
// readable is never garbage collected
await new Promise(resolve => readable.on('end', resolve));
}
// FIX: use pipeline() which handles cleanup:
const { pipeline } = require('stream/promises');
async function streamDataFixed(req, res) {
await pipeline(
fs.createReadStream('large_file.txt'),
res // automatically cleans up on completion or error
);
}
FastAPI Blocking I/O in Async Route
# FastAPI bug: blocking I/O in async route blocks event loop
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # BUG: psycopg2 (sync) blocks the event loop!
    conn = psycopg2.connect(DATABASE_URL)
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        user = cur.fetchone()
    conn.close()
    return user
# FIX (replaces the route above): use an async database driver. asyncpg is shown;
# get_async_db is a hypothetical dependency that yields an asyncpg connection.
@app.get("/users/{user_id}")
async def get_user_fixed(user_id: int, conn=Depends(get_async_db)):
    row = await conn.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    return dict(row) if row else None
Production consequence: a single slow database query blocks ALL requests in the FastAPI event loop until it completes. Mixing sync database drivers with async frameworks is a well-known cause of apparent "hang" behavior under load.
Modern Usage
- Node.js: async/await is the standard. Major frameworks (Express, Fastify, NestJS) accept async handlers, though Express 4 does not catch rejections from them automatically.
- Python FastAPI: async-first web framework, among the highest-throughput Python web frameworks
- Rust Tokio: standard for async systems programming in Rust (web, databases, networking)
- C#/.NET: ASP.NET Core is fully async/await throughout
- Swift: Structured concurrency (async let, TaskGroup) added in Swift 5.5
Future Directions
Async Iterators: Async generators (Python async for, JavaScript for await...of, Rust Stream trait) extend async/await to sequences of values. This is the async version of iterators.
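A minimal Python illustration of an async generator and its consumer follows; `countdown` is a hypothetical example, and `asyncio.sleep(0)` stands in for waiting on a real data source between values.

```python
import asyncio

async def countdown(n):
    while n > 0:
        await asyncio.sleep(0)   # stand-in for waiting on the next item
        yield n
        n -= 1

async def main():
    # "async for" (here in comprehension form) awaits each value in turn.
    return [value async for value in countdown(3)]

values = asyncio.run(main())
print(values)   # [3, 2, 1]
```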
Structured Concurrency: Swift's TaskGroup, Java's StructuredTaskScope, Kotlin's structured concurrency, and Python's asyncio.TaskGroup ensure spawned async tasks are scoped to their parent, preventing leaks. The pattern is increasingly treated as the default way to spawn tasks.
Rust async traits: async fn in trait definitions, stabilized in Rust 1.75 (December 2023), allows async methods in traits without boxing for statically dispatched calls. Dynamic dispatch (dyn Trait) and Send bounds on the returned futures still need workarounds.
Exercises
- Event Loop Visualization: Write a Node.js program that instruments setImmediate, setTimeout, and Promise.resolve() to show exactly what order callbacks execute. Create 10 of each, mix them, and trace the execution order. Verify against the Node.js event loop phase documentation.
- Python async Performance: Benchmark sequential vs. asyncio.gather() for 100 network requests to a local test server. Instrument with asyncio.get_event_loop().time() to show the actual time distribution. Identify and fix a sequential-await bug in provided sample code.
- Rust Future from Scratch: Implement a minimal Sleep future in Rust that uses std::thread::sleep in a background thread and wakes the executor via Waker. Run it with a custom minimal single-threaded executor (not Tokio). This forces understanding of Poll::Pending, Waker, and the poll contract.
- Blocking the Node.js Event Loop: Write a Node.js Express server with a route that performs a CPU-intensive operation. Show the event loop blocking effect (all requests stall) using autocannon. Then fix it using worker_threads. Measure the throughput difference.
- Async Error Handling Audit: Take a sample Node.js or Python application (any open-source project from GitHub). Audit all async functions for: unhandled promise rejections, missing await before async calls, and sequential awaits where concurrent would work. Report findings and propose fixes.
References
- Kegel, D. "The C10K Problem." http://www.kegel.com/c10k.html, 1999. [The motivating problem]
- Tobin-Hochstadt, S. "The State of JavaScript Promises." Blog post, 2015.
- Toub, S. "Async/Await FAQ." Microsoft Blog, 2012. [C# async/await usage and rationale]
- Nystrom, B. "What Color is Your Function?" 2015. https://journal.stuffwithstuff.com/2015/02/26/
- Matsakis, N. "Async/Await — The Power of Zero-Cost Abstractions." RustConf 2019.
- Cloudflare Blog: "How we built 1.1.1.1." https://blog.cloudflare.com/
- Python asyncio documentation: https://docs.python.org/3/library/asyncio.html
- Tokio documentation: https://tokio.rs/tokio/tutorial
- Node.js event loop documentation: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick