07 — HTTP Versions

Technical Overview

HTTP has undergone three major version transitions since 1991, each addressing limitations of its predecessor. HTTP/1.0 established the request-response model. HTTP/1.1 added persistent connections and pipelining, but pipelined responses suffer head-of-line (HOL) blocking. HTTP/2 introduced binary framing and multiplexing over a single TCP connection, solving application-layer HOL blocking but retaining TCP-layer HOL blocking. HTTP/3 moves to QUIC, which removes transport-layer HOL blocking between streams and allows 0-RTT resumption for returning clients. Each version reflects both protocol engineering tradeoffs and the deployment realities of the web at that time.


Prerequisites

  • TCP connection lifecycle (see 01-tcp-state-machine.md)
  • QUIC protocol (see 04-quic-protocol.md)
  • TLS handshake (see 06-tls-internals.md)
  • Basic HTTP semantics: methods, headers, status codes

Core Content

HTTP/1.0: One Request Per Connection

HTTP/1.0 (1996) opened a new TCP connection for every request/response pair:

Client                              Server
  TCP connect (1 RTT)
  TLS handshake (1 RTT if HTTPS)
  GET /index.html                 → process
  ← 200 OK                          |
  TCP close                          |
  TCP connect (1 RTT again)          |
  GET /style.css                  → process
  ← 200 OK                          |
  TCP close
  TCP connect (1 RTT again)
  GET /logo.png                   → process
  ← 200 OK
  TCP close

For a page with 50 resources: 50 TCP connections × (1 RTT TCP + 1 RTT TLS + 1 RTT request) = 150 RTTs of latency plus 50× TCP slow start. On a 100ms RTT: 15 seconds just in round trips.

Connection overhead breakdown per resource:

  • TCP handshake (SYN, SYN-ACK): 1 RTT
  • TLS handshake: 1 RTT (TLS 1.3) or 2 RTT (TLS 1.2)
  • HTTP request: 1 RTT
  • Total: 3–4 RTTs minimum per resource
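The arithmetic above can be reproduced with a small calculation. This is a minimal sketch that assumes strictly serial fetches and ignores slow start and server processing time; the function name and parameters are illustrative, not part of any library.

```python
def http10_serial_latency_ms(resources: int, rtt_ms: int, tls_rtts: int = 1) -> int:
    """Round-trip latency for fetching resources one at a time over HTTP/1.0.

    Each resource pays 1 RTT for the TCP handshake, tls_rtts for TLS
    (1 for TLS 1.3, 2 for TLS 1.2), and 1 RTT for the request itself.
    """
    rtts_per_resource = 1 + tls_rtts + 1
    return resources * rtts_per_resource * rtt_ms


print(http10_serial_latency_ms(50, 100))              # 15000 (ms, TLS 1.3)
print(http10_serial_latency_ms(50, 100, tls_rtts=2))  # 20000 (ms, TLS 1.2)
```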


HTTP/1.1: Persistent Connections and Pipelining

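HTTP/1.1 (RFC 2068 in 1997, revised as RFC 2616 in 1999) introduced:

Persistent connections (keep-alive): reuse the TCP connection for multiple requests. In HTTP/1.1 persistence is the default; Connection: close signals the last request. The explicit Connection: keep-alive header is an HTTP/1.0-era opt-in that many clients still send.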

Client                              Server
  TCP connect (1 RTT)
  TLS (1 RTT)
  GET /index.html                  → process
  ← 200 OK
  GET /style.css      (same TCP connection, no new handshake)
  ← 200 OK
  GET /logo.png
  ← 200 OK
  [keep-alive timeout expires]
  TCP close

Pipelining: send multiple requests without waiting for responses. Requests can be transmitted back-to-back; the server responds in order.

Head-of-line (HOL) blocking in HTTP/1.1: responses must be returned in request order. If request 1 is slow to process, requests 2 and 3 wait — even if they're already done on the server:

Client            Server
  GET /slow    ─→
  GET /fast    ─→
  GET /tiny    ─→
               ← [processing /slow...]
  (blocked)    ← 200 /slow response  (takes 500ms)
               ← 200 /fast response  (was done in 5ms, held for 500ms)
               ← 200 /tiny response  (was done in 1ms, held for 500ms)
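
The effect in the diagram can be modeled in a few lines. This is a simplified sketch that only tracks server processing time (transfer time is ignored); the function name is hypothetical.

```python
from itertools import accumulate


def pipelined_delivery_ms(processing_ms: list) -> list:
    """Delivery time of each response when responses must go out in request order.

    A response cannot be sent before every earlier response is finished,
    so each delivery time is the running maximum of the processing times.
    """
    return list(accumulate(processing_ms, max))


# /slow, /fast, /tiny from the diagram above
print(pipelined_delivery_ms([500, 5, 1]))  # [500, 500, 500]
```

Reordering the same requests as [1, 5, 500] gives [1, 5, 500], which shows that the penalty depends entirely on where the slow request sits in the pipeline.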

Browser workaround: browsers open 6–8 parallel TCP connections per origin, effectively multiplexing requests across connections. This partially mitigates HOL blocking but multiplies connection overhead.

Connection: keep-alive            # explicit opt-in (HTTP/1.0 clients); implied in HTTP/1.1
Connection: close                 # override: close after response
Keep-Alive: timeout=65, max=100   # optional hints: idle timeout in seconds, max requests

Transfer-Encoding: chunked: allows streaming responses without known Content-Length — the server sends chunks with sizes, useful for server-side rendering and streaming APIs.


HTTP/2: Binary Framing and Multiplexing

HTTP/2 (RFC 7540, 2015) was developed from Google's SPDY protocol. Key changes:

Binary framing: HTTP/1.1 is text-based (human-readable headers, CRLF delimiters). HTTP/2 uses binary frames with type, flags, stream ID, and payload — more efficient to parse, less ambiguous.
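
To make the framing concrete, the sketch below packs and parses the fixed 9-byte frame header (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream ID). The helper names are illustrative, and only the header is handled, not the payload.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM", 0x4: "SETTINGS",
               0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY", 0x8: "WINDOW_UPDATE"}


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Build the 9-byte HTTP/2 frame header."""
    if not 0 <= length < 1 << 24 or not 0 <= stream_id < 1 << 31:
        raise ValueError("field out of range")
    # The length is 3 bytes, so pack it as 4 and drop the leading zero byte.
    return struct.pack(">I", length)[1:] + struct.pack(">BBI", frame_type, flags, stream_id)


def parse_frame_header(header: bytes) -> dict:
    """Decode a 9-byte frame header into its fields."""
    if len(header) != 9:
        raise ValueError("frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # mask the reserved bit
    return {"length": length, "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
            "flags": flags, "stream_id": stream_id}


# HEADERS frame, 16-byte payload, END_HEADERS flag (0x4), stream 1
hdr = pack_frame_header(length=16, frame_type=0x1, flags=0x4, stream_id=1)
print(hdr.hex())  # 000010010400000001
print(parse_frame_header(hdr))
```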

Streams and multiplexing: HTTP/2 introduces streams: numbered, bidirectional sequences of frames within a single TCP connection. Multiple streams exist concurrently; frames from different streams are interleaved:

Single TCP connection with 3 concurrent HTTP/2 streams:

Frame 1: [Stream 1 HEADERS: GET /index.html]
Frame 2: [Stream 3 HEADERS: GET /style.css]
Frame 3: [Stream 5 HEADERS: GET /logo.png]
Frame 4: [Stream 1 HEADERS: response headers]
Frame 5: [Stream 3 HEADERS: response headers]
Frame 6: [Stream 3 DATA: CSS content]
Frame 7: [Stream 5 HEADERS: response headers]
Frame 8: [Stream 5 DATA: PNG content]
Frame 9: [Stream 1 DATA: HTML content - last fragment]
...

Stream 3's response completes before Stream 1,
no waiting — delivered to application immediately.

This eliminates application-layer HOL blocking. A slow request no longer holds up fast requests.

HTTP/2 frame types:

Frame           Purpose
----------------------------------------------------------------
HEADERS         Request/response headers (compressed)
DATA            Request/response body
SETTINGS        Connection parameters
WINDOW_UPDATE   Flow control (per-stream and connection-level)
RST_STREAM      Cancel a stream
PUSH_PROMISE    Server push announcement
PING            Keepalive / RTT measurement
GOAWAY          Graceful connection shutdown

HPACK header compression: HTTP/2 uses HPACK (RFC 7541) to compress headers:

  • Static table: 61 common headers with predefined codes (:method: GET = 2, :status: 200 = 8)
  • Dynamic table: per-connection table of previously seen header fields; new headers are added and referenced by index
  • Huffman encoding: compress literal values with a fixed Huffman code

HPACK achieves 60–80% header compression vs HTTP/1.1 for typical web traffic — headers like User-Agent and Cookie that repeat across requests are sent once, then referenced by index.
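
The index-based encoding can be illustrated with HPACK's prefix-integer format (RFC 7541, section 5.1). The following is a minimal sketch covering only integer encoding and the "indexed header field" representation; the function names are illustrative, and Huffman coding and literal fields are left out.

```python
def encode_hpack_int(value: int, prefix_bits: int, first_byte_flags: int = 0) -> bytes:
    """Encode an integer with an N-bit prefix (RFC 7541, section 5.1)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([first_byte_flags | value])
    out = [first_byte_flags | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)  # low 7 bits, continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)


def encode_indexed_field(index: int) -> bytes:
    """Indexed Header Field: a 1 bit followed by a 7-bit-prefix table index."""
    return encode_hpack_int(index, 7, first_byte_flags=0x80)


print(encode_indexed_field(2).hex())     # 82 -> ":method: GET" in the static table
print(encode_indexed_field(8).hex())     # 88 -> ":status: 200" in the static table
print(list(encode_hpack_int(1337, 5)))   # [31, 154, 10], the worked example in RFC 7541
```

A header that matches a table entry costs a single byte on the wire, which is where most of the savings on repeated headers come from.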

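HTTP/2 flow control applies to DATA frames and operates at two levels:

  • Stream-level: each stream has its own window, which the receiver extends with WINDOW_UPDATE frames for that stream; it caps the bytes a sender may have outstanding on the stream
  • Connection-level: a shared window (extended with WINDOW_UPDATE on stream 0) caps total bytes in flight across all streams on the connection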

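Default initial window size: 65,535 bytes (too small for high-throughput, high-latency connections). Tune:

nginx: http2_body_preread_size (per-request body buffer, which nginx uses as the initial stream window it advertises); http2_recv_buffer_size and http2_chunk_size size I/O buffers rather than flow-control windows
go: golang.org/x/net/http2 Server{MaxUploadBufferPerStream, MaxUploadBufferPerConnection}; grpc-go exposes InitialWindowSize and InitialConnWindowSize options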
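
The two-level accounting can be sketched from the sender's point of view. This is a simplified model, not an implementation from any HTTP/2 library: the class and method names are hypothetical, and SETTINGS changes to the initial window are not handled.

```python
DEFAULT_WINDOW = 65_535  # initial window size for streams and for the connection


class FlowControl:
    """Tracks send windows for one connection and its streams."""

    def __init__(self, initial_stream_window: int = DEFAULT_WINDOW) -> None:
        self.initial_stream_window = initial_stream_window
        self.connection_window = DEFAULT_WINDOW
        self.stream_windows = {}

    def sendable(self, stream_id: int, wanted: int) -> int:
        """Bytes of DATA that may be sent now: limited by both windows."""
        stream_window = self.stream_windows.setdefault(stream_id, self.initial_stream_window)
        return max(0, min(wanted, stream_window, self.connection_window))

    def on_data_sent(self, stream_id: int, nbytes: int) -> None:
        """Sending DATA consumes credit from the stream and from the connection."""
        if nbytes > self.sendable(stream_id, nbytes):
            raise ValueError("flow-control window exceeded")
        self.stream_windows[stream_id] -= nbytes
        self.connection_window -= nbytes

    def on_window_update(self, stream_id: int, increment: int) -> None:
        """WINDOW_UPDATE on stream 0 credits the connection, otherwise one stream."""
        if stream_id == 0:
            self.connection_window += increment
        else:
            current = self.stream_windows.get(stream_id, self.initial_stream_window)
            self.stream_windows[stream_id] = current + increment


fc = FlowControl()
fc.on_data_sent(1, 60_000)
print(fc.sendable(3, 20_000))  # 5535: stream 3 is untouched, but the connection window is nearly spent
fc.on_window_update(0, 60_000)
print(fc.sendable(3, 20_000))  # 20000
```

The first result shows why the connection-level window matters: one busy stream can starve the others until the receiver sends a connection-level WINDOW_UPDATE.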

TCP HOL Blocking in HTTP/2

HTTP/2 eliminates application-layer HOL blocking but not TCP-layer HOL blocking:

HTTP/2 streams over TCP on a lossy network:

TCP segment carrying Stream 3 DATA (fast CSS) gets lost.

TCP receiver buffer:
  [Stream 1 HEADERS received]
  [Stream 3 HEADERS received]
  [Stream 1 DATA received]
  [TCP HOLE: Stream 3 DATA missing] ← TCP holds everything after this
  [Stream 5 DATA received but queued]
  [Stream 3 DATA more received but queued]
  [Stream 1 DATA more received but queued]

HTTP/2 cannot process ANY frames past the hole,
even though Streams 1 and 5's data arrived successfully.

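A single lost segment stalls every stream that has bytes queued behind it until the retransmission arrives, which costs at least one extra round trip. With many concurrent streams on a lossy path (for example 1% packet loss), each loss therefore tends to delay several unrelated streams at once. This is the main motivation for HTTP/3 over QUIC.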
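
The difference between a single ordered byte stream and per-stream ordering can be shown with a toy model. The segment list below is made up for illustration and mirrors the receive buffer above; real TCP and QUIC track byte ranges rather than whole segments.

```python
# (arrival sequence number, stream id) for segments on one connection
SEGMENTS = [(0, 1), (1, 3), (2, 1), (3, 3), (4, 5), (5, 3), (6, 1)]
LOST = {3}  # the segment carrying Stream 3 DATA is lost


def deliverable_over_tcp(segments, lost):
    """TCP exposes one ordered byte stream: nothing past the first hole is delivered."""
    delivered = []
    for seq, stream in segments:
        if seq in lost:
            break
        delivered.append((seq, stream))
    return delivered


def deliverable_per_stream(segments, lost):
    """QUIC-style delivery: ordering is enforced per stream, so a loss only
    holds back later data on the same stream."""
    blocked = set()
    delivered = []
    for seq, stream in segments:
        if seq in lost:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append((seq, stream))
    return delivered


print(deliverable_over_tcp(SEGMENTS, LOST))    # [(0, 1), (1, 3), (2, 1)]
print(deliverable_per_stream(SEGMENTS, LOST))  # [(0, 1), (1, 3), (2, 1), (4, 5), (6, 1)]
```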


HTTP/2 Multiplexing vs HTTP/1.1 Parallel Connections

HTTP/1.1 (6 parallel connections):         HTTP/2 (1 connection, 6 streams):
=========================================   ===================================

Conn1  ─ GET /a ─────────────────→          Stream1  ─ GET /a ──────────────→
Conn2  ─ GET /b ──────────────────→         Stream3  ─ GET /b ──────────────→
Conn3  ─ GET /c ───────────────────→        Stream5  ─ GET /c ─────────────→
Conn4  ─ GET /d ────────────────────→       Stream7  ─ GET /d ────────────→
Conn5  ─ GET /e ─────────────────────→      Stream9  ─ GET /e ───────────→
Conn6  ─ GET /f ──────────────────────→     Stream11 ─ GET /f ──────────→

6 × TCP handshake overhead                  1 × TCP handshake overhead
6 × TLS handshake overhead                  1 × TLS handshake overhead
6 × slow start (independent)                1 × slow start (shared cwnd)
6 × congestion windows                      1 × congestion window

H2 advantage: less overhead, fairer         H2 disadvantage: if connection
bandwidth sharing, better header            is lost/slow, all streams affected
compression

HPACK vs QPACK: Header Compression

HPACK (HTTP/2): maintains a dynamic header table whose state is synchronized between client and server. Because HTTP/2 is over TCP, header table updates are delivered reliably and in order. The sender can reference table entries immediately after adding them.
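
A rough sketch of the dynamic table helps show why both sides must apply updates in the same order. It follows the sizing and indexing rules of RFC 7541 (32 bytes of overhead per entry, newest entry at index 62), but the class itself is illustrative and assumes ASCII header names and values.

```python
from collections import deque

STATIC_TABLE_SIZE = 61
ENTRY_OVERHEAD = 32  # per-entry overhead defined in RFC 7541, section 4.1


class DynamicTable:
    """HPACK dynamic table: newest entry first, evicting the oldest when full."""

    def __init__(self, max_size: int = 4096) -> None:
        self.max_size = max_size
        self.size = 0
        self.entries = deque()

    def add(self, name: str, value: str) -> None:
        entry_size = len(name) + len(value) + ENTRY_OVERHEAD
        while self.entries and self.size + entry_size > self.max_size:
            old_name, old_value = self.entries.pop()  # evict the oldest entry
            self.size -= len(old_name) + len(old_value) + ENTRY_OVERHEAD
        if entry_size <= self.max_size:  # an oversized entry just empties the table
            self.entries.appendleft((name, value))
            self.size += entry_size

    def index_of(self, name: str, value: str):
        """HPACK index for an exact match, or None. Dynamic indices start at 62."""
        for offset, entry in enumerate(self.entries):
            if entry == (name, value):
                return STATIC_TABLE_SIZE + 1 + offset
        return None


table = DynamicTable()
table.add("user-agent", "curl/8.0")
print(table.index_of("user-agent", "curl/8.0"))  # 62
table.add("cookie", "id=1")
print(table.index_of("user-agent", "curl/8.0"))  # 63: every insert shifts older entries
```

Because each insert renumbers the existing entries, an encoder and decoder that apply inserts in different orders would resolve the same index to different headers. TCP's ordered delivery rules that out for HPACK.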

QPACK (HTTP/3, RFC 9204): designed for QUIC's out-of-order delivery. QUIC streams are independent; HEADERS on stream 5 may arrive before a dynamic table update carried on stream 3. QPACK solves this with:

  • Static table: 99 entries (larger than HPACK's 61)
  • Required Insert Count: headers that reference dynamic table entries include a "required insert count" — the decoder blocks only that stream until the referenced entries arrive on the encoder stream

QPACK maintains two unidirectional control streams (encoder stream and decoder stream) separate from HTTP streams, ensuring header table updates are applied in order.

QPACK dynamic table update is decoupled from HTTP streams:

QUIC Encoder Stream:   TABLE_UPDATE(new_entry) ──────────────────→
QUIC Stream 5:         HEADERS(required_count=1, ref entry 0) ──→
QUIC Stream 7:         HEADERS(no dynamic refs) ────────────────→

If Stream 5 arrives before encoder stream, QPACK
blocks only Stream 5 until encoder stream delivers entry.
Stream 7 proceeds immediately (no dynamic refs).
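
The blocking rule can be sketched as a small state machine. This is a simplified model with hypothetical names: real QPACK encodes the Required Insert Count in a wrapped form and also limits how many streams may be blocked at once, neither of which is modeled here.

```python
class QpackDecoderSketch:
    """Tracks dynamic-table inserts and which header blocks are still waiting."""

    def __init__(self) -> None:
        self.insert_count = 0  # entries received so far on the encoder stream
        self.blocked = {}      # stream id -> required insert count

    def on_header_block(self, stream_id: int, required_insert_count: int) -> bool:
        """Return True if the block can be decoded now, otherwise park the stream."""
        if required_insert_count <= self.insert_count:
            return True
        self.blocked[stream_id] = required_insert_count
        return False

    def on_encoder_insert(self) -> list:
        """Apply one table insert and return the streams it unblocks."""
        self.insert_count += 1
        ready = sorted(s for s, need in self.blocked.items() if need <= self.insert_count)
        for stream_id in ready:
            del self.blocked[stream_id]
        return ready


dec = QpackDecoderSketch()
print(dec.on_header_block(5, required_insert_count=1))  # False: entry not yet received
print(dec.on_header_block(7, required_insert_count=0))  # True: no dynamic references
print(dec.on_encoder_insert())                          # [5]
```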

Server Push: Deployment and Deprecation

HTTP/2 server push allows the server to proactively send resources the client will need before it asks:

Client         Server
  GET /index.html  ─→
               ← PUSH_PROMISE (stream 2: /style.css)
               ← PUSH_PROMISE (stream 4: /logo.png)
               ← HEADERS: 200 /index.html
               ← DATA: HTML content
               ← HEADERS: 200 /style.css (on stream 2)
               ← DATA: CSS content
               ← HEADERS: 200 /logo.png (on stream 4)
               ← DATA: PNG content

Browser can use pushed resources before it even parses the HTML.

Why server push failed in practice:

  1. Cache blindness: the server doesn't know what's in the browser cache. Pushing /style.css wastes bandwidth if it's already cached.
  2. Prioritization difficulty: pushed resources may displace higher-priority user-initiated requests in the buffer.
  3. Cookie handling complexity: pushed resources must match the same security policy as the main resource.
  4. Adoption: most web frameworks never implemented push correctly.

Chrome removed HTTP/2 server push in version 106 (2022). The HTTP/2 spec didn't remove it, but it's effectively deprecated in practice. The usual replacement is the 103 Early Hints status code (RFC 8297): an interim response that carries Link headers ahead of the final response on the same connection, without the complexity of server push.
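
On the wire, Early Hints is simply an extra response head sent before the final one. The sketch below serializes both in HTTP/1.1 text form for readability; the function name and the example paths are hypothetical, and over HTTP/2 or HTTP/3 the same fields would travel in HEADERS frames.

```python
def early_hints_then_final(preloads: list, body: bytes) -> bytes:
    """Serialize a 103 interim response followed by the final 200 response."""
    links = "".join(
        f"Link: <{path}>; rel=preload; as={kind}\r\n" for path, kind in preloads
    )
    # The interim response has headers only and ends with a blank line.
    interim = f"HTTP/1.1 103 Early Hints\r\n{links}\r\n".encode("ascii")
    final = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"{links}\r\n"
    ).encode("ascii") + body
    return interim + final


wire = early_hints_then_final([("/style.css", "style")], b"<html></html>")
print(wire.decode("ascii"))
```

The server can emit the 103 part as soon as it knows which assets the page needs, then send the 200 part once the page body has been generated.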


Performance Comparison: H1 vs H2 vs H3

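Illustrative comparison on a 100ms RTT, 1% loss network (new connection, TLS 1.3):
====================================================================================

Metric                   H1.1                 H2                   H3 (QUIC)
------------------------------------------------------------------------------------
Time to first byte       300ms                300ms                200ms (0-RTT: 100ms)
Page with 50 resources   ~2000ms              ~400ms               ~350ms
Packet loss effect       Per-conn HOL block   Per-conn HOL block   Per-stream only
HOL blocking             App + TCP            TCP only             None across streams
Connection overhead      High                 Low                  Low
Header overhead          High                 Low (HPACK)          Low (QPACK)
CPU overhead (server)    Low                  Medium               Higher

Time to first byte counts handshake round trips plus one round trip for the request, using the same accounting as the HTTP/1.0 section: TCP (1 RTT) + TLS 1.3 (1 RTT) + request (1 RTT) for H1.1 and H2, and a combined QUIC/TLS handshake (1 RTT) + request (1 RTT) for H3. With 0-RTT resumption the request travels in the first flight, leaving a single round trip. The page-load figures are rough estimates.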

H3 advantage is most pronounced on:

  • High-loss networks (mobile, satellite): no HOL blocking means individual stream loss doesn't block others
  • High-RTT networks: 0-RTT eliminates handshake latency for repeat visitors
  • Unreliable networks: connection migration maintains sessions through IP changes

H2 and H3 have similar performance on reliable, low-latency networks (same datacenter).


HTTP Caching

HTTP caching reduces origin load and improves latency:

Cache-Control directives:

Cache-Control: max-age=3600              # cache for 1 hour
Cache-Control: no-cache                  # revalidate before use
Cache-Control: no-store                  # never cache
Cache-Control: public                    # CDN may cache
Cache-Control: private                   # only browser cache
Cache-Control: stale-while-revalidate=60 # use stale, refresh in background
Cache-Control: immutable                 # skip revalidation while fresh (versioned assets)
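
How a cache might interpret these directives can be sketched as follows. This is a deliberately small model with hypothetical function names: it covers only max-age, no-cache, and no-store, and ignores s-maxage, Expires, and heuristic freshness.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(directives: Dict[str, Optional[str]], age_seconds: int) -> bool:
    """True if a stored response may be served without revalidation."""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)


cc = parse_cache_control("public, max-age=3600")
print(cc)                  # {'public': None, 'max-age': '3600'}
print(is_fresh(cc, 120))   # True
print(is_fresh(cc, 4000))  # False
```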

ETags: entity tags enable conditional requests. Server generates opaque ETag from content hash or version:

Server: ETag: "abc123"
Client: If-None-Match: "abc123"
Server: 304 Not Modified  (if content unchanged)
     or 200 OK with new body and new ETag
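
The exchange above can be sketched on the server side. This is a minimal illustration, assuming the ETag is derived from a truncated SHA-256 of the body (one common choice, not a requirement); the function names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a content hash."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    # If-None-Match may list several tags and uses weak comparison,
    # so a leading W/ is ignored when matching.
    candidates = [t.strip().removeprefix("W/") for t in (if_none_match or "").split(",")]
    if "*" in candidates or etag in candidates:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


status, headers, _ = respond(b"hello", None)
print(status, headers["ETag"])                 # 200 plus the generated tag
print(respond(b"hello", headers["ETag"])[0])   # 304
print(respond(b"changed", headers["ETag"])[0]) # 200
```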

Vary header: tells CDNs which request headers affect the response — critical for content negotiation:

Vary: Accept-Encoding    # cached gzip and non-gzip separately
Vary: Accept-Language    # cached per language
Vary: Origin             # CORS — cached per origin

Vary: * tells CDNs the response cannot be cached (different per every request header combination).
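
One way to picture Vary is as an extension of the cache key. The sketch below is a simplified model with a hypothetical function name; real caches also normalize header values (for example, collapsing equivalent Accept-Encoding strings), which is not shown.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str):
    """Build a cache key from the URL plus the request headers named in Vary.

    Returns None for "Vary: *", where no stored response can be matched
    to a later request without contacting the origin.
    """
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    varied = tuple((n, lowered.get(n, "")) for n in sorted(names))
    return (method, url, varied)


gzip_key = cache_key("GET", "/app.js", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
plain_key = cache_key("GET", "/app.js", {}, "Accept-Encoding")
print(gzip_key != plain_key)                        # True: stored separately
print(cache_key("GET", "/app.js", {}, "*") is None) # True: not reusable
```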


Historical Context

HTTP/1.0 was documented in RFC 1945 (1996), though the protocol was used from 1991. Tim Berners-Lee and others at CERN designed it to serve hypertext documents — the concept of performance optimization wasn't primary.

HTTP/1.1 (first published as RFC 2068 in 1997, revised as RFC 2616 in 1999) was written by Roy Fielding (also of REST fame) and co-authors, and reflects a decade of web deployment experience. The persistent-connection design remains sound. Pipelining was never widely used, because browser vendors were reluctant to enable it by default given HOL blocking and intermediaries that handled it incorrectly.

SPDY was Google's 2009 experiment that became the direct precursor to HTTP/2. Most of HTTP/2's design (binary frames, streams, header compression) comes directly from SPDY. Google deployed SPDY to Google.com in 2010; adoption was rapid because Chrome supported it natively.

HTTP/3's standardization (RFC 9114, June 2022) completed the QUIC ecosystem. By mid-2022, HTTP/3 was enabled by default at Cloudflare, Google, and Meta, and all major browsers had enabled it.


Production Examples

nginx HTTP/2 configuration:

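server {
    # On nginx 1.25.1+ the preferred form is "listen 443 ssl;" plus "http2 on;"
    listen 443 ssl http2;

    # HTTP/2 server push (largely deprecated; use Early Hints instead)
    # http2_push /style.css;

    # Buffer for request body data received before processing starts;
    # raise it for bulk upload APIs (default 64k)
    http2_body_preread_size 256k;
    # Note: http2_recv_buffer_size is an http-level directive that sizes the
    # per-worker input buffer, so it does not belong in a server block

    # Limit concurrent streams (default 128)
    http2_max_concurrent_streams 128;

    # Preload hint on the final response. A 103 Early Hints response is an
    # interim response sent before the final one, so "return 103;" cannot
    # produce it; it has to come from the application or from a server
    # feature that emits interim responses.
    location = / {
        add_header Link '</style.css>; rel=preload; as=style';
    }
}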

Measuring HTTP version in use:

# Check HTTP version
curl -v --http2 https://example.com 2>&1 | grep HTTP/
curl -v --http3 https://cloudflare.com 2>&1 | grep HTTP/

# Test H2 with nghttp
nghttp -nv https://example.com

# H3 test
quiche-client https://example.com

# Check server headers
curl -sI https://example.com | grep -i 'alt-svc\|x-protocol'

Debugging Notes

# Observe HTTP/2 multiplexing
nghttp -v https://example.com 2>&1 | grep -E 'stream|HEADERS|DATA'

# H2 stream statistics
curl --http2 -w "http_version=%{http_version}\n" https://example.com -o /dev/null -s

# Wireshark HTTP/2 dissection
# Filter: http2 or tcp.port == 443 (then decrypt with NSS keylog)

# Check for HTTP/2 errors (RST_STREAM is frame type 3, GOAWAY is frame type 7)
tshark -r capture.pcap -Y 'http2.type == 3 || http2.type == 7' -T fields -e http2.type -e http2.rst_stream.error -e http2.goaway.error
# Common error codes: ENHANCE_YOUR_CALM, REFUSED_STREAM, INTERNAL_ERROR

# Monitor H2 stream reset events on nginx
grep 'RST_STREAM' /var/log/nginx/error.log

# Check Alt-Svc advertisement (for H3 upgrade)
curl -sI https://example.com | grep alt-svc
# h3=":443"; ma=86400  → upgrade to H3 on port 443, cache for 24h

# Measure TTFB per HTTP version
for v in --http1.1 --http2 --http3; do
    echo -n "$v: "
    curl $v -s -w "TTFB=%{time_starttransfer}s\n" -o /dev/null https://example.com
done

Security Implications

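  • Compression oracles (CRIME, BREACH): if an attacker can inject chosen data adjacent to secrets in a compressed stream, they can infer the secrets through the compressed size. CRIME targeted TLS-level and SPDY header compression; HPACK was designed to resist it by matching only whole header fields, although the dynamic table can still reveal whether an entire guessed value is present. BREACH (HTTP body compression) has no protocol-level fix: disable gzip for responses that carry CSRF tokens or session identifiers in the body.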
  • HTTP/2 stream reset attacks (CVE-2023-44487, "HTTP/2 Rapid Reset"): a client sends a stream HEADERS and immediately sends RST_STREAM, causing the server to allocate and immediately free resources. Sending thousands per second created CPU exhaustion at CDN scale. Mitigation: rate-limit RST_STREAM per connection; nginx 1.25.3+ and other servers patched October 2023.
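  • HTTP/2 header size limits: large headers (cookies, JWTs) consume decoder memory and evict useful entries from the HPACK dynamic table, which hurts compression. Servers should cap header size (advertise SETTINGS_MAX_HEADER_LIST_SIZE and enforce a matching limit). A header block that cannot be decoded is a connection error of type COMPRESSION_ERROR.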
  • HTTP/3 UDP amplification: QUIC Initial packets must be ≥1200 bytes; server responses are limited to 3× client bytes until path validation. This prevents QUIC from amplifying attacks.
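
The RST_STREAM rate limiting mentioned for Rapid Reset can be sketched as a sliding-window counter per connection. This is an illustrative model, not the mitigation any particular server ships: the class name, thresholds, and the caller-supplied clock are all assumptions.

```python
from collections import deque


class ResetRateLimiter:
    """Sliding-window limit on RST_STREAM frames for one connection."""

    def __init__(self, max_resets: int, window_seconds: float) -> None:
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._times = deque()

    def allow(self, now: float) -> bool:
        """Record a reset at time `now`. Returns False once the limit is
        exceeded, at which point the caller would close the connection."""
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()  # forget resets that left the window
        self._times.append(now)
        return len(self._times) <= self.max_resets


limiter = ResetRateLimiter(max_resets=100, window_seconds=1.0)
verdicts = [limiter.allow(now=i * 0.001) for i in range(150)]  # 150 resets in 150 ms
print(verdicts.count(True), verdicts.count(False))  # 100 50
```

Passing the timestamp in rather than reading a clock inside the class keeps the sketch deterministic; a real server would use its monotonic clock.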

Performance Implications

Protocol   Best for                     Worst for
------------------------------------------------------------------------------
HTTP/1.1   Simple proxies, debugging    High-resource pages
HTTP/2     Most production HTTPS        Heavily lossy networks
HTTP/3     Mobile, lossy, high-RTT      Stable datacenter networks (marginal)

Practical decision: enable HTTP/2 by default. Enable HTTP/3 with graceful fallback for mobile-facing endpoints. Both can be active simultaneously:

listen 443 ssl http2;   # TCP: H1.1 + H2
listen 443 quic;        # UDP: H3
add_header Alt-Svc 'h3=":443"; ma=86400';
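
A client discovers HTTP/3 by reading this Alt-Svc header. The sketch below parses the common form of the value; the function name is hypothetical, and quoted commas and percent-encoded protocol IDs are not handled. The 24-hour default for a missing ma parameter follows RFC 7838.

```python
def parse_alt_svc(value: str) -> list:
    """Parse an Alt-Svc header value into a list of alternatives."""
    if value.strip() == "clear":
        return []
    alternatives = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(";") if p.strip()]
        if not parts:
            continue
        proto, _, authority = parts[0].partition("=")
        # An empty host means "same host as the origin".
        host, _, port = authority.strip('"').rpartition(":")
        params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        alternatives.append({
            "protocol": proto,
            "host": host or None,
            "port": int(port),
            "max_age": int(params.get("ma", 86400)),
        })
    return alternatives


print(parse_alt_svc('h3=":443"; ma=86400'))
# [{'protocol': 'h3', 'host': None, 'port': 443, 'max_age': 86400}]
```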

Failure Modes and Real Incidents

Incident: HTTP/2 Rapid Reset DDoS (October 2023)

Multiple major CDNs (Cloudflare, Google, AWS CloudFront) experienced record-breaking DDoS attacks using HTTP/2 stream reset. A relatively small botnet (~20,000 machines) generated 201 million requests/second against Cloudflare infrastructure, 3× the previous record. The attack exploited the server cost of processing HEADERS + RST_STREAM cycles. Patched with per-connection RST_STREAM rate limiting.

Incident: HTTP/2 connection coalescing causing authentication bypass (2019)

Browsers coalesce HTTP/2 connections when multiple origins resolve to the same IP and use the same TLS certificate. Site A and Site B on the same IP with a wildcard cert would share one HTTP/2 connection. A timing attack allowed Site A to read responses destined for Site B on the shared connection. Fixed in browser implementations with stricter coalescing checks.

Failure Mode: gRPC + H2 flow control deadlock

gRPC streams can deadlock when both sides have data to send simultaneously and both send/receive windows are exhausted. Fix: increase initial HTTP/2 window size or implement application-level flow control above the transport.


Modern Usage

  • HTTP/3 adoption (2025): ~30% of web traffic, all major browsers enabled, all major CDNs support it
  • gRPC over HTTP/2: gRPC uses HTTP/2 exclusively. For inter-service communication in Kubernetes, gRPC over H2 with mTLS is the dominant pattern (see Envoy, Istio)
  • HTTP/2 in APIs: REST APIs on HTTP/2 gain multiplexing benefits when clients issue concurrent requests (GraphQL subscriptions, batch APIs)
  • 103 Early Hints: replacing server push in practice — server sends 103 responses with Link: rel=preload headers before the full 200 response, allowing browsers to preload resources

Future Directions

  • HTTP/3 QPACK improvements: ongoing optimization of QPACK's blocking behavior for large header tables
  • WebTransport: a new API over HTTP/3 QUIC that provides bidirectional streams and datagrams for gaming, real-time communication — replacement for WebSockets + WebRTC data channels
  • HTTP/2 push replacement: Link: rel=103-hint and Signed Exchanges (SXG) are exploring push-like semantics without push's complexity
  • Structured Field Values (RFC 8941): standardizes header value syntax, enabling efficient HPACK/QPACK compression and machine parsing of complex header values

Exercises

  1. Use nghttp2 to trace an HTTP/2 connection to a real server (nghttp -nv https://www.google.com). Count the number of streams used, identify PUSH_PROMISE frames (if any), and measure the ratio of HEADERS vs DATA frames by byte volume.

  2. Implement a test that demonstrates HTTP/2 application-layer multiplexing advantage: serve 50 resources concurrently, with one resource taking 500ms. Compare P50 and P99 page load times for HTTP/1.1 (6 parallel connections) vs HTTP/2 (1 connection, 50 streams). Use tc netem with 0% loss first, then 2% loss.

  3. Measure the impact of HPACK dynamic table size on HTTP/2 header compression ratio. Set max_header_table_size to 0 (disable dynamic table), 4096 (default), and 65536. Compare the compressed header sizes for 1000 requests with typical web headers (User-Agent, Cookie, Accept-Encoding).

  4. Reproduce the Vary: Accept-Encoding caching behavior: configure nginx to serve gzip-compressed responses with Vary: Accept-Encoding. Use curl with and without Accept-Encoding: gzip to verify CDN caches them separately. Observe cache HIT/MISS behavior using response headers.

  5. Simulate the HTTP/2 Rapid Reset attack pattern in a safe local environment: write a script that opens an HTTP/2 connection and sends 1000 HEADERS+RST_STREAM pairs per second. Observe server CPU impact. Then implement a RST_STREAM rate limiter (one per 100ms) and measure its effect.


References

  • RFC 9110 — HTTP Semantics (2022 update)
  • RFC 9112 — HTTP/1.1
  • RFC 9113 — HTTP/2 (2022 update, supersedes 7540)
  • RFC 9114 — HTTP/3
  • RFC 9204 — QPACK: Field Compression for HTTP/3
  • RFC 7541 — HPACK: Header Compression for HTTP/2
  • RFC 8297 — 103 Early Hints
  • Belshe, M. et al. SPDY Protocol. Google Technical Report, 2012.
  • Grigorik, I. High-Performance Browser Networking. O'Reilly, 2013. (Available online at hpbn.co)
  • CVE-2023-44487 — HTTP/2 Rapid Reset Attack
  • Cloudflare blog. The HTTP/2 Rapid Reset Attack Exposed. October 2023.
  • nginx documentation: ngx_http_v2_module, ngx_http_v3_module
  • nghttp2 source and documentation: nghttp2.org