API Design and Gateways

Overview

APIs are the contracts between systems. A well-designed API is stable, predictable, and hard to misuse. A poorly designed API is a liability: it leaks implementation details, creates versioning nightmares, and generates support burden for years. API gateways sit in front of these APIs and provide cross-cutting concerns — authentication, rate limiting, routing, observability — that would otherwise be duplicated across every service.

This document covers the three dominant API styles (REST, GraphQL, gRPC), the role of the API gateway in microservice architectures, and the algorithmic underpinnings of rate limiting. Understanding these deeply is essential for both building APIs and designing the infrastructure around them.

Prerequisites

HTTP/1.1 and HTTP/2 protocol fundamentals
Understanding of TLS/HTTPS
Basic familiarity with JSON and Protocol Buffers
Distributed systems concepts: latency, consistency, idempotency
Authentication concepts: JWT, OAuth 2.0, API keys

Historical Context

REST (Representational State Transfer) was defined by Roy Fielding in his 2000 PhD dissertation as an architectural style for distributed hypermedia systems: a set of constraints, not a protocol or specification. Today's typical "REST API" is often only "REST-ish": it borrows HTTP verbs and status codes but ignores HATEOAS (Hypermedia as the Engine of Application State), which Fielding considered essential.

REST dominated API design throughout the 2000s-2010s, replacing SOAP/XML-RPC for most use cases due to simplicity and alignment with HTTP infrastructure.

GraphQL was developed internally at Facebook starting in 2012 to solve the mobile API problem: mobile apps with limited bandwidth could not afford to over-fetch data, but the server-driven REST model required many round trips. GraphQL was open-sourced in 2015 and rapidly adopted by GitHub (2016), Shopify, and Twitter.

gRPC was open-sourced by Google in 2015. It descends from Google's internal Stubby RPC framework, which handled billions of internal RPCs per second. gRPC's use of Protocol Buffers and HTTP/2 typically makes it considerably more efficient than REST/JSON for internal service communication; figures of 5-10x are often quoted, but the actual gain depends on payload shape and workload.

API gateways evolved from hardware load balancers (F5, Citrix NetScaler) in the 2000s, through the ESB (Enterprise Service Bus) era, to cloud-native software gateways. Kong (2015), AWS API Gateway (2015), and Envoy (2016, from Lyft) defined the modern API gateway.

REST API Design Principles

REST is built on six constraints:

  1. Client-server separation
  2. Statelessness (each request contains all needed information)
  3. Cacheability
  4. Uniform interface (resources, HTTP verbs, status codes)
  5. Layered system (proxies, caches transparent to client)
  6. Code on demand (optional: serve executable code)

Resources and HTTP Verbs

  Resources are nouns, not actions.

  BAD (RPC-style):
    POST /getUser?id=123
    POST /createOrder
    POST /updateUserEmail
    POST /deleteProduct?id=456

  GOOD (resource-oriented):
    GET    /users/123              → get user
    POST   /orders                 → create order
    PATCH  /users/123              → partial update (email only)
    DELETE /products/456           → delete product
    PUT    /users/123              → full replacement

  HTTP Verb Semantics:

  GET    — read, safe (no side effects), idempotent
  POST   — create or non-idempotent action
  PUT    — full replace, idempotent (same result if called N times)
  PATCH  — partial update (RFC 7396: merge patch; RFC 6902: JSON patch)
  DELETE — delete, idempotent (delete twice = same result: not found)
  HEAD   — like GET but no body (metadata only)
  OPTIONS — CORS preflight, capability discovery

  Idempotency:
    Idempotent: f(f(x)) = f(x) — same result on repeat calls
    PUT /users/123 with {name: "Alice"} → always results in name=Alice
    DELETE /users/123 → first call deletes, subsequent calls: 404 (or 204)
    POST /orders is NOT idempotent by default → use idempotency keys:
    POST /orders
    Idempotency-Key: <client-generated-UUID>
    (server deduplicates based on key)
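
The server-side deduplication can be sketched as follows. This is a minimal in-memory illustration, not a production design: OrderService and create_order are hypothetical names, the dictionaries stand in for a durable store, and returning 200 on a replay (rather than 201 again) is an assumption.

```python
import uuid


class OrderService:
    """Toy in-memory order store that deduplicates on an idempotency key."""

    def __init__(self):
        self._orders = {}       # order_id -> order payload
        self._seen_keys = {}    # idempotency key -> order_id

    def create_order(self, idempotency_key, payload):
        # A repeated key returns the original order instead of creating another.
        if idempotency_key in self._seen_keys:
            order_id = self._seen_keys[idempotency_key]
            return 200, {"id": order_id, **self._orders[order_id]}
        order_id = str(uuid.uuid4())
        self._orders[order_id] = dict(payload)
        self._seen_keys[idempotency_key] = order_id
        return 201, {"id": order_id, **self._orders[order_id]}


service = OrderService()
key = str(uuid.uuid4())     # generated once by the client, reused on every retry
first = service.create_order(key, {"items": ["prod-1"]})
retry = service.create_order(key, {"items": ["prod-1"]})
```

A real implementation would also expire stored keys and make the check-and-insert atomic, since two retries can arrive concurrently.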

HTTP Status Codes

  1xx: Informational
    100 Continue — server agrees to accept request body (for large POSTs)

  2xx: Success
    200 OK — general success (GET, PUT, PATCH)
    201 Created — resource created (POST); include Location header
    202 Accepted — async processing started (job submitted)
    204 No Content — success, no response body (DELETE, PUT with no return)

  3xx: Redirection
    301 Moved Permanently — URL changed; update bookmarks
    302 Found — temporary redirect
    304 Not Modified — conditional GET, client cache is still valid

  4xx: Client Error (caller's fault)
    400 Bad Request — malformed request, validation error
    401 Unauthorized — not authenticated (misleading name; means authn failed)
    403 Forbidden — authenticated but not authorized
    404 Not Found — resource doesn't exist
    405 Method Not Allowed — wrong verb (POST to a read-only endpoint)
    409 Conflict — state conflict (optimistic locking failure, duplicate)
    410 Gone — resource existed but was permanently deleted
    422 Unprocessable Entity — syntactically valid but semantically wrong
    429 Too Many Requests — rate limited; include Retry-After header

  5xx: Server Error (server's fault)
    500 Internal Server Error — generic server crash
    502 Bad Gateway — upstream service failed
    503 Service Unavailable — intentional throttling / maintenance
    504 Gateway Timeout — upstream took too long

Versioning

  Three common approaches:

  1. URI Path versioning (most common, most visible):
     /v1/users
     /v2/users
     + Simple, explicit, easy to route
     - "Permanent" v1 URLs create maintenance burden

  2. Header versioning:
     GET /users
     Accept: application/vnd.myapi.v2+json
     + Clean URLs
     - Harder to test in browser, less visible

  3. Query parameter versioning:
     GET /users?version=2
     + Simple
     - Pollutes query strings, accidental caching issues

  Versioning best practices:
  - Maintain N and N-1 (never N-2) simultaneously
  - Use /v1/ for all stable APIs; never /v0/ in production
  - Breaking changes: new major version
  - Non-breaking additions (new optional fields): no new version needed
  - Sunset old versions: announce 6-12 months in advance and advertise the
    dates with the Deprecation and Sunset (RFC 8594) response headers
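
How a server might resolve the requested version from either the URI path or the Accept header can be sketched like this. The vendor media type follows the application/vnd.myapi.v2+json example above; resolve_version, SUPPORTED_VERSIONS, and DEFAULT_VERSION are hypothetical names, and preferring the path over the header is an assumption.

```python
import re

# Vendor media type pattern, e.g. application/vnd.myapi.v2+json
_VERSION_RE = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")

SUPPORTED_VERSIONS = {1, 2}   # N-1 and N
DEFAULT_VERSION = 2           # used when the client states no version


def resolve_version(path, accept_header=""):
    """Return the requested API version, preferring the URI path over the header."""
    path_match = re.match(r"^/v(\d+)/", path)
    if path_match:
        version = int(path_match.group(1))
    else:
        header_match = _VERSION_RE.search(accept_header)
        version = int(header_match.group(1)) if header_match else DEFAULT_VERSION
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {version}")
    return version
```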

HATEOAS

Rarely implemented in practice but included in strict REST:

GET /orders/123 →
{
  "id": 123,
  "status": "pending",
  "_links": {
    "self":   {"href": "/orders/123"},
    "cancel": {"href": "/orders/123/cancel", "method": "POST"},
    "items":  {"href": "/orders/123/items"}
  }
}
Client discovers available actions from response — no out-of-band API docs needed.
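
A client that drives its behavior from the links could look roughly like this. It is a sketch that assumes the "_links" layout shown above, with "method" defaulting to GET when absent; available_actions is a hypothetical helper.

```python
order = {
    "id": 123,
    "status": "pending",
    "_links": {
        "self": {"href": "/orders/123"},
        "cancel": {"href": "/orders/123/cancel", "method": "POST"},
        "items": {"href": "/orders/123/items"},
    },
}


def available_actions(resource):
    """Map each link relation to an (HTTP method, href) pair; method defaults to GET."""
    return {
        rel: (link.get("method", "GET"), link["href"])
        for rel, link in resource.get("_links", {}).items()
    }


actions = available_actions(order)
can_cancel = "cancel" in actions   # e.g. show a Cancel button only when linked
```

When the order leaves the "pending" state, the server simply stops returning the "cancel" link and the client needs no extra state logic.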

GraphQL

GraphQL solves the over-fetching/under-fetching problem:

  REST over-fetching:

  GET /users/123
  Returns: {id, name, email, phone, address, preferences, 
            last_login, created_at, ...}  (50 fields)
  Mobile app needs: {name, avatar_url}  (2 fields)
  → Wasted bandwidth

  REST under-fetching (N+1 problem):

  GET /users/123 → {id, name, order_ids: [456, 789]}
  GET /orders/456 → {id, items: [...]}
  GET /orders/789 → {id, items: [...]}
  → 3 HTTP round trips for one screen

  GraphQL single query:

  POST /graphql
  {
    "query": "
      query UserWithOrders($userId: ID!) {
        user(id: $userId) {
          name
          avatarUrl
          orders(last: 5) {
            id
            total
            status
          }
        }
      }
    ",
    "variables": {"userId": "123"}
  }

  → ONE request, client specifies exactly what it needs
  → Server returns exactly those fields, no more
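
A JSON string cannot contain raw line breaks, so a client normally keeps the query as a multi-line string and lets a JSON serializer escape it. A minimal sketch of building the request body (build_graphql_body is a hypothetical helper; the query is the one shown above):

```python
import json

QUERY = """
query UserWithOrders($userId: ID!) {
  user(id: $userId) {
    name
    avatarUrl
    orders(last: 5) { id total status }
  }
}
"""


def build_graphql_body(query, variables):
    """Serialize a GraphQL-over-HTTP request body; newlines in the query get escaped."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_body(QUERY, {"userId": "123"})
```

The resulting string is what would be sent as the POST body with Content-Type: application/json.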

N+1 Problem and DataLoader

The N+1 problem: resolving a list of N items where each item requires an additional query:

  Schema:
  type Query {
    posts: [Post!]!
  }
  type Post {
    id: ID!
    title: String!
    author: User!    ← requires fetching user for each post
  }

  Naive resolver (N+1):

  posts resolver: SELECT * FROM posts → 100 posts
  author resolver (runs for EACH post):
    SELECT * FROM users WHERE id = ? → 100 separate queries

  Total: 1 + 100 = 101 queries

  DataLoader solution (batching):

  DataLoader collects all user IDs requested within one event loop tick,
  then issues a single batched query:

  author resolver: loader.load(post.authorId)
  → DataLoader batches: [userId1, userId2, ..., userId100]
  → single query: SELECT * FROM users WHERE id IN (...)

  Total: 1 + 1 = 2 queries

  DataLoader also caches: requesting same user twice → 1 query
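
The batching idea can be sketched in Python with asyncio. This is not the DataLoader library itself: BatchLoader, fetch_users, and resolve_authors are hypothetical names, the "query" is only recorded in a list, and error handling is omitted.

```python
import asyncio


class BatchLoader:
    """Minimal DataLoader-style batcher: one batch_fn call per event-loop tick."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # async fn: list of keys -> list of values
        self._cache = {}            # key -> Future (per-request cache)
        self._pending = []          # keys waiting for the next dispatch

    def load(self, key):
        if key not in self._cache:
            loop = asyncio.get_running_loop()
            self._cache[key] = loop.create_future()
            if not self._pending:
                # First new key of this tick: schedule a single dispatch.
                loop.call_soon(loop.create_task, self._dispatch())
            self._pending.append(key)
        return self._cache[key]

    async def _dispatch(self):
        keys, self._pending = self._pending, []
        values = await self._batch_fn(keys)
        for key, value in zip(keys, values):
            self._cache[key].set_result(value)


queries = []   # records every simulated SQL query


async def fetch_users(ids):
    queries.append(f"SELECT * FROM users WHERE id IN {tuple(ids)}")
    return [{"id": i, "name": f"user-{i}"} for i in ids]


async def resolve_authors(author_ids):
    loader = BatchLoader(fetch_users)   # one loader per request
    return await asyncio.gather(*(loader.load(i) for i in author_ids))


authors = asyncio.run(resolve_authors([1, 2, 1, 3]))
```

Four load calls, one of them a duplicate, produce a single recorded query for the three distinct IDs. Creating the loader per request keeps the cache from leaking data between users.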

GraphQL Considerations

  Subscriptions (real-time):
  subscription {
    orderStatusChanged(orderId: "123") {
      status
      updatedAt
    }
  }
  → WebSocket connection; server pushes updates

  Schema-first development:
  1. Write .graphql schema files (source of truth)
  2. Generate server resolvers (stubs from schema)
  3. Generate client types (type-safe queries)

  Security concerns:
  - Query depth attacks: deeply nested queries over list fields can multiply resolver calls exponentially with depth
    Fix: depth limiting (graphql-depth-limit)
  - Complexity attacks: very wide queries requesting many fields
    Fix: query complexity scoring + max complexity threshold
  - Introspection in production: disable to hide schema from attackers
    Fix: disable __schema and __type in production
  - Prefer persisted queries for first-party clients (client sends a hash,
    server looks up the stored query) so that arbitrary ad hoc queries are never executed
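
Depth limiting can be illustrated with a deliberately naive brace counter. Real validators work on the parsed query AST and account for fragments; query_depth, check_depth, and the MAX_DEPTH value are assumptions made for this sketch.

```python
def query_depth(query):
    """Return the maximum selection-set nesting depth by counting braces.

    Naive on purpose: it ignores braces inside string literals and does not
    expand fragments, which a validator working on the parsed AST would do.
    """
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


MAX_DEPTH = 5   # assumed limit; tune per schema


def check_depth(query, limit=MAX_DEPTH):
    """Raise ValueError when the query nests deeper than the limit."""
    depth = query_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth
```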

gRPC

gRPC uses Protocol Buffers (protobuf) for schema and HTTP/2 for transport:

  gRPC vs REST performance:

  REST/JSON:
  { "id": 123, "name": "Alice", "email": "alice@example.com" }
  → ~60 bytes (text, human-readable, no schema enforcement)

  gRPC/Protobuf:
  field 1 (id): varint 123    → 2 bytes
  field 2 (name): len+data    → 7 bytes
  field 3 (email): len+data   → 19 bytes
  → ~28 bytes (binary, ~50% smaller)
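
The byte counts can be checked by encoding the message by hand. The sketch below implements just enough of the protobuf wire format (varints and length-delimited strings) for this one message; encode_varint and encode_field are hypothetical helpers, not part of any protobuf library.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number, value):
    """Encode ints as wire type 0 (varint) and strings as wire type 2 (length-delimited)."""
    if isinstance(value, int):
        return encode_varint(field_number << 3 | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data


user = {"id": 123, "name": "Alice", "email": "alice@example.com"}
proto_bytes = (
    encode_field(1, user["id"])
    + encode_field(2, user["name"])
    + encode_field(3, user["email"])
)
json_bytes = json.dumps(user).encode("utf-8")

print(len(proto_bytes), len(json_bytes))   # 28 58
```

Each field costs one tag byte plus its payload (and one length byte for strings), which is where the 2 + 7 + 19 = 28 bytes come from.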

  Performance advantages:
  - Binary encoding: 50-80% smaller payload than JSON
  - HTTP/2 multiplexing: multiple concurrent streams on one connection
  - Streaming RPCs: server-push, client-push, bidirectional
  - Code generation: type-safe stubs in 10+ languages from .proto files

  .proto definition:

  syntax = "proto3";
  package users;

  service UserService {
    rpc GetUser (GetUserRequest) returns (User);
    rpc ListUsers (ListUsersRequest) returns (stream User);  // server streaming
    rpc BatchCreateUsers (stream CreateUserRequest)          // client streaming
        returns (BatchCreateResponse);
    rpc Chat (stream Message) returns (stream Message);     // bidirectional
  }

  message User {
    int64 id = 1;
    string name = 2;
    string email = 3;
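  }

  // The other request/response messages (GetUserRequest, ListUsersRequest,
  // CreateUserRequest, BatchCreateResponse, Message) are omitted here; they
  // must be defined before this file compiles.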

gRPC streaming:

  Unary RPC:          client sends 1 request → server sends 1 response
  Server streaming:   client sends 1 request → server streams N responses
  Client streaming:   client streams N requests → server sends 1 response
  Bidirectional:      both sides stream concurrently (chat, live sync)

When to use gRPC:

  USE gRPC for:
  - Internal service-to-service communication (microservices)
  - Performance-critical paths (low latency, high throughput)
  - Strongly typed contracts between teams
  - Streaming data (real-time feeds, large file transfers)
  - Polyglot environments (auto-generated clients in Go, Python, Java, etc.)

  AVOID gRPC for:
  - Public APIs (browsers require grpc-web proxy; not native HTTP/2 gRPC)
  - Simple CRUD APIs (REST is easier)
  - Debugging/explorability (binary protobuf is not human-readable)
  - Teams not comfortable with .proto compilation in build pipelines

API Gateway Architecture

The API gateway is the single entry point for all external traffic:

  API Gateway Request Flow:

  Client (Browser / Mobile / Partner)
        │
        │ HTTPS
        ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                      API GATEWAY                             │
  │                                                              │
  │  1. TLS Termination                                          │
  │     → Decrypt TLS, forward HTTP internally                  │
  │                                                              │
  │  2. Authentication                                           │
  │     → Validate JWT / API key / OAuth token                  │
  │     → Reject unauthenticated requests (401)                 │
  │                                                              │
  │  3. Authorization                                            │
  │     → Check token scope / RBAC                              │
  │     → Reject unauthorized requests (403)                    │
  │                                                              │
  │  4. Rate Limiting                                            │
  │     → Token bucket / sliding window per client              │
  │     → Reject over-limit requests (429)                      │
  │                                                              │
  │  5. Request Routing                                          │
  │     → /v1/users → users-service                             │
  │     → /v1/orders → orders-service                           │
  │     → /v2/products → products-service-v2 (canary: 10%)     │
  │                                                              │
  │  6. Protocol Translation                                     │
  │     → External REST → Internal gRPC                         │
  │     → SOAP → REST                                           │
  │                                                              │
  │  7. Request/Response Transformation                          │
  │     → Add/remove headers                                     │
  │     → Rename fields, filter sensitive data from response    │
  │                                                              │
  │  8. Observability Injection                                  │
  │     → Generate request ID (X-Request-ID)                    │
  │     → Add trace context (W3C traceparent header)            │
  │     → Record latency, status codes, upstream metrics        │
  └─────────────────────────────────────────────────────────────┘
        │
        │ HTTP/gRPC (internal)
        ▼
  ┌──────────┐  ┌──────────┐  ┌──────────┐
  │  users-  │  │ orders-  │  │ products-│
  │  service │  │  service │  │  service │
  └──────────┘  └──────────┘  └──────────┘
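
The ordering of the core stages (authentication, authorization, rate limiting, routing) can be sketched as a single function. Everything here is illustrative: the request dictionary, the "client_id" and "scopes" fields, and the within_limit callback are assumptions, and token validation itself is not shown.

```python
ROUTES = {
    "/v1/users": "users-service",
    "/v1/orders": "orders-service",
    "/v2/products": "products-service-v2",
}


def _matches(path, prefix):
    return path == prefix or path.startswith(prefix + "/")


def route(path):
    """Return the upstream for the longest matching path prefix, or None."""
    matches = [p for p in ROUTES if _matches(path, p)]
    return ROUTES[max(matches, key=len)] if matches else None


def handle(request, within_limit):
    """Run the gateway stages in order; return (status, upstream)."""
    client = request.get("client_id")     # set by a token validator (not shown)
    if client is None:
        return 401, None                  # 2. authentication
    if not any(_matches(request["path"], p) for p in request.get("scopes", ())):
        return 403, None                  # 3. authorization
    if not within_limit(client):
        return 429, None                  # 4. rate limiting
    upstream = route(request["path"])     # 5. routing
    return (200, upstream) if upstream else (404, None)
```

For example, handle({"path": "/v1/users/123", "client_id": "alice", "scopes": ["/v1/users"]}, lambda c: True) resolves to users-service.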

Rate Limiting Algorithms

Rate limiting protects backends from overload and prevents API abuse. The algorithm choice determines the behavior under burst and steady-state load:

Token Bucket

  Token Bucket:

  Bucket capacity: C tokens (burst size)
  Refill rate: R tokens/second

  State: bucket = C  (starts full)
         last_refill_time = now()

  On each request:
    elapsed = now() - last_refill_time
    bucket = min(C, bucket + elapsed × R)
    last_refill_time = now()

    if bucket >= 1:
      bucket -= 1
      ALLOW
    else:
      DENY (429)

  Example:
  C = 10 tokens (burst: allow 10 requests at once)
  R = 2 tokens/second (steady-state: 2 req/s)

  t=0:   bucket=10, send 10 requests at once → all ALLOWED (burst absorbed)
  t=0:   bucket=0
  t=1s:  bucket=2, send 2 requests → ALLOWED
  t=1s:  bucket=0
  t=6s:  bucket=10 (refilled to cap after 5s at 2 tokens/s; stays at 10 while idle)

  Properties:
  + Allows controlled bursts (good for APIs)
  + Simple implementation
  - Requires atomic CAS or Redis for distributed state
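
A single-process version of the pseudocode above, as a sketch. The injectable clock is an addition for testability; a distributed deployment would still need the atomic shared state noted in the properties.

```python
import time


class TokenBucket:
    """Single-process token bucket; `clock` is injectable so tests can control time."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # C: maximum burst size
        self.refill_rate = refill_rate    # R: tokens added per second
        self.clock = clock
        self.tokens = float(capacity)     # bucket starts full
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Replay the example: C = 10, R = 2 tokens/second, driven by a fake clock.
fake_now = [0.0]
bucket = TokenBucket(10, 2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(11)]     # the 11th request at t=0 is denied
fake_now[0] = 1.0
after_1s = [bucket.allow() for _ in range(3)]   # 2 tokens refilled, the 3rd is denied
```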

Leaky Bucket

  Leaky Bucket:

  Queue capacity: C requests
  Drain rate: R requests/second (constant output)

  Behavior: requests enter queue; processed at constant rate

  If queue full → DENY

  Example:
  C = 5 (queue size)
  R = 10 req/s (1 request every 100ms)

  Burst of 20 requests arrives:
    5 enter queue → 15 DENIED immediately
    Queue drains at 10/s: request served every 100ms

  Properties:
  + Absolute output rate guarantee (useful for upstream protection)
  + Smooths bursty traffic into steady stream
  - Requests beyond the queue capacity are denied immediately, so large bursts are mostly rejected
  - Adds latency (queue wait time)
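
The queue behavior can be simulated without real timers. In this sketch, drain computes when each queued request would be served instead of sleeping; LeakyBucket, offer, and drain are hypothetical names.

```python
from collections import deque


class LeakyBucket:
    """Leaky bucket as a bounded FIFO queue drained at a constant rate."""

    def __init__(self, capacity, drain_rate):
        self.capacity = capacity            # C: maximum queued requests
        self.interval = 1.0 / drain_rate    # seconds between served requests
        self.queue = deque()

    def offer(self, request):
        """Enqueue a request; return False (deny) when the queue is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, start_time=0.0):
        """Serve everything queued; return (request, service_time) pairs."""
        served = []
        t = start_time
        while self.queue:
            t += self.interval
            served.append((self.queue.popleft(), round(t, 3)))
        return served


bucket = LeakyBucket(capacity=5, drain_rate=10)
accepted = [bucket.offer(i) for i in range(20)]   # burst of 20 arriving at once
schedule = bucket.drain()
```

With C = 5 and R = 10 req/s, the first five requests are queued and served 100ms apart, and the other fifteen are denied.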

Fixed Window Counter

  Fixed Window:

  Windows: [0-60s], [60-120s], [120-180s], ...
  Limit per window: N requests

  Simple: count[window_key] += 1
          if count[window_key] > N: DENY

  Problem — boundary burst:

  Limit: 100 req/min

  t=59s: 100 requests → all allowed (fills window 1)
  t=61s: 100 requests → all allowed (fills window 2)

  In 2 seconds, 200 requests passed! Window boundary exploitable.

  Properties:
  + Simple, cheap (single Redis INCR per request)
  - Boundary burst vulnerability
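
The boundary burst is easy to reproduce with an in-memory counter. This sketch passes timestamps in explicitly and uses a dictionary where a shared deployment would use a store such as Redis; FixedWindowLimiter is a hypothetical name.

```python
from collections import defaultdict


class FixedWindowLimiter:
    def __init__(self, limit, window_size=60):
        self.limit = limit
        self.window_size = window_size
        self.counts = defaultdict(int)   # (client, window index) -> request count

    def allow(self, client, now):
        key = (client, int(now // self.window_size))
        self.counts[key] += 1
        return self.counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=100)
window_1 = sum(limiter.allow("alice", 59) for _ in range(100))   # end of window 1
window_2 = sum(limiter.allow("alice", 61) for _ in range(100))   # start of window 2
```

Both batches are fully allowed, so 200 requests pass within two seconds despite the 100 req/min limit.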

Sliding Window Log

  Sliding Window Log:

  Maintain sorted set of request timestamps for each client.
  On each request: remove entries older than window_size; count remainder.

  Exact correctness, no boundary burst.

  Problem: O(requests) storage per client — too expensive at scale.
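
A minimal in-memory sketch, using a deque per client in place of a sorted set. One design choice here is an assumption: only allowed requests are logged, so denied requests do not extend the penalty.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    def __init__(self, limit, window_size=60):
        self.limit = limit
        self.window_size = window_size
        self.logs = defaultdict(deque)   # client -> timestamps of allowed requests

    def allow(self, client, now):
        log = self.logs[client]
        # Drop timestamps that have fallen out of the trailing window.
        while log and log[0] <= now - self.window_size:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


limiter = SlidingWindowLog(limit=100)
at_59 = sum(limiter.allow("alice", 59) for _ in range(100))   # all allowed
at_61 = sum(limiter.allow("alice", 61) for _ in range(100))   # all denied
```

Unlike the fixed window, the burst at t=61s is rejected because the 100 requests from t=59s are still inside the trailing 60 seconds. The cost is one stored timestamp per allowed request.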

Sliding Window Counter

  Sliding Window Counter (best tradeoff):

  Approximate sliding window using two fixed windows:

  current_window: requests in current minute
  previous_window: requests in previous minute

  overlap_weight = 1 - (elapsed_in_current_window / window_size)

  estimated_count = current_window + previous_window × overlap_weight

  Example:
  Window size: 60s
  Elapsed in current window: 45s  →  overlap_weight = 1 - 0.75 = 0.25
  previous_window: 80 requests
  current_window: 20 requests

  estimated = 20 + 80 × 0.25 = 40 requests

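  Properties:
  + O(1) storage per client (two counters)
  + Typically very close to an exact sliding window (the estimate assumes
    requests in the previous window were evenly spread)
  + Eliminates boundary burst almost entirely
  - Slight approximation (accepts/rejects near the limit can be wrong when
    the previous window's traffic was bursty)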
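
The estimate is a one-line formula. The sketch below reproduces the worked example; estimated_count and allow are hypothetical names, and rejecting once the estimate reaches the limit is an assumption about the comparison.

```python
def estimated_count(current, previous, elapsed, window_size=60):
    """Weighted estimate of the number of requests in the trailing window."""
    overlap_weight = 1 - elapsed / window_size
    return current + previous * overlap_weight


def allow(current, previous, elapsed, limit, window_size=60):
    """Allow the next request while the estimate is still below the limit."""
    return estimated_count(current, previous, elapsed, window_size) < limit


print(estimated_count(current=20, previous=80, elapsed=45))   # 40.0
```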

API Gateway Comparison

  +-------------------+--------------+-----------------+---------------+-------------+
  | Feature           | Kong         | AWS API Gateway | Envoy         | Traefik     |
  +-------------------+--------------+-----------------+---------------+-------------+
  | Open source       | Yes (CE)     | No              | Yes           | Yes         |
  | Managed service   | Kong Konnect | Yes             | Via vendors   | No          |
  | Config format     | Admin API    | AWS CLI         | xDS API       | TOML/k8s    |
  | Plugin ecosystem  | Rich         | Lambda          | Filters       | Middleware  |
  | gRPC support      | Yes          | Not native      | Native        | Yes         |
  | Service mesh      | No           | No              | Yes (Istio)   | No          |
  | Rate limiting     | Plugin       | Built-in        | Filter        | Middleware  |
  | Auth              | Plugin       | Cognito         | ext_authz     | Middleware  |
  | Best for          | Mid-size     | AWS-only        | Mesh+GW       | Kubernetes  |
  |                   | orgs         | simple          | expert        | simple      |
  +-------------------+--------------+-----------------+---------------+-------------+

  Decision guide:

  Choose Envoy/Istio if:
  - You need a full service mesh (mTLS, circuit breaking between services)
  - You have Kubernetes and a dedicated platform team
  - You need fine-grained traffic management (header-based routing, fault injection)

  Choose Kong if:
  - You want a traditional API gateway with rich plugin ecosystem
  - You need DB-backed configuration with Admin API
  - Mixed Kubernetes + VM workloads

  Choose AWS API Gateway if:
  - AWS-only architecture
  - Lambda backends (direct integration)
  - Minimal operational overhead required

  Choose Traefik if:
  - Kubernetes-first, simple setup
  - Automatic service discovery from k8s Ingress/IngressClass
  - Small to medium teams without dedicated platform team

Debugging Notes

# Test API with curl (verbose to see headers):
curl -v -X POST https://api.example.com/v1/orders \
  -H "Authorization: Bearer <JWT>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{"items": [{"id": "prod-1", "qty": 2}]}'

# Inspect rate limit headers in response:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 45
# X-RateLimit-Reset: 1700000060
# Retry-After: 30  (when 429 returned)

# Test GraphQL endpoint:
curl -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ user(id: \"123\") { name email } }"}'

# gRPC with grpcurl:
grpcurl -plaintext localhost:50051 list   # list services
grpcurl -plaintext -d '{"id": "123"}' \
  localhost:50051 users.UserService/GetUser

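# Check Kong rate limiting state (Redis). Key names depend on the Kong version
# and plugin configuration, so treat the patterns below as illustrative.
# --scan iterates incrementally instead of blocking Redis the way KEYS does:
redis-cli --scan --pattern "kong_rate_limiting*" | head
redis-cli get "kong_rate_limiting:alice:minute"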

# Kong Admin API - check routes and plugins:
curl http://kong:8001/routes
curl http://kong:8001/plugins  # list all active plugins

Security Implications

API keys in query parameters (e.g., ?api_key=secret) appear in server logs and browser history. Always use Authorization header.
JWT secrets/private keys must be rotated regularly. Check the alg header parameter against an allowlist of expected algorithms and reject alg: none (which allows unsigned tokens). Validate the aud (audience) claim to prevent token reuse across services.
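Apply a coarse rate limit (for example, per IP address) before authentication checks to blunt brute-force credential attacks; per-client limits, which need a known identity, run after authentication. An unauthenticated endpoint that is slow to fail (for example, one that runs bcrypt) can still be abused at 10 req/s, because every rejected attempt consumes significant CPU.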
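API versioning for security: emergency security patches should not wait for the next major version. Patch all supported versions in place, using feature flags where a staged rollout is needed; if the fix unavoidably changes behavior, ship it anyway and announce the change to clients rather than making it silently.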
CORS headers at the API gateway must be carefully configured: Access-Control-Allow-Origin: * is safe only for public read APIs. Any endpoint that uses cookies or sends sensitive data requires specific origin allowlisting.

Performance Implications

The API gateway is a latency-adding component. Each plugin in the request chain adds 0.1-1ms. Keep critical hot paths to <3 plugins in the chain.
gRPC's HTTP/2 multiplexing removes most per-request connection overhead for internal services: hundreds of pooled HTTP/1.1 connections can be replaced with a small number of persistent HTTP/2 connections.
GraphQL adds query parsing overhead per request (unless using persisted queries). With persisted queries, the server only parses once and looks up by hash on subsequent requests.
Rate limiting in Redis adds 1-2ms per request (Redis round trip). Use local token bucket for non-critical APIs to eliminate Redis as a hot path.

Modern Usage

BFF (Backend for Frontend): Rather than one API gateway for all clients, each frontend (web, mobile, IoT) has its own BFF that aggregates and tailors backend data. Reduces over-fetching and allows client-specific versioning.
Gateway API (Kubernetes): The official Kubernetes successor to Ingress, with role-oriented configuration and richer routing (traffic splitting, header matching). Implementations include Envoy Gateway, NGINX Gateway Fabric, and Cilium.
AI API gateways: New category handling LLM API traffic: semantic caching (cache similar prompts), cost accounting per tenant, prompt injection detection.

Future Directions

Corsa / WASM plugins: API gateways executing WebAssembly plugins for sandboxed, polyglot extensibility without process-based plugin isolation overhead.
Federated GraphQL: Apollo Federation and GraphQL Mesh allow composing a single GraphQL graph from multiple microservices, enabling a unified API surface.
HTTP/3 (QUIC): Eliminates TCP head-of-line blocking in the API gateway layer, especially for mobile clients on lossy networks.

Exercises

Design the REST API for a simplified order management system (create order, list orders, get order, cancel order). Define status codes, request/response schemas, and idempotency handling for order creation.
Implement token bucket rate limiting with Redis. Test with concurrent clients that each send 50 requests simultaneously. Verify that exactly the allowed number pass through and the rest get 429.
Build a simple GraphQL server with an N+1 query problem (posts and authors). Verify the N+1 with logging. Fix it using DataLoader batching. Measure the reduction in database queries.
Configure Envoy as a reverse proxy with a rate limit filter backed by a local ratelimit service. Test the 429 behavior when the limit is exceeded.
Benchmark REST/JSON vs gRPC/Protobuf for the same API operation (get user by ID). Measure: payload size, latency under 1000 req/s load, connection count. Document the tradeoffs.

References

Roy Fielding, "Architectural Styles and the Design of Network-based Software Architectures" — PhD dissertation, 2000 (original REST definition)
"GraphQL: A data query language" — Facebook Engineering Blog (2015)
gRPC documentation: grpc.io/docs/
"Designing APIs for the Web" — Mark Nottingham, IETF Blog
"The Four Pillars of API Design" — Kin Lane (API Evangelist)
Kong documentation: docs.konghq.com
Envoy proxy documentation: envoyproxy.io/docs
"Production Ready GraphQL" — Marc-André Giroux (2020, book)
RFC 6902: JSON Patch
RFC 7807: Problem Details for HTTP APIs (standard error response format)
"Rate Limiting Algorithms" — Cloudflare Engineering Blog