Serverless and FaaS: Lambda Internals, Cold Starts, and Platform Comparison

Overview

Serverless computing is perhaps the most radical change in the operational model of software deployment. In the serverless model, developers provide code; the platform provides execution. No server provisioning, no OS patching, no capacity planning, no idle-resource cost. The billing model — per invocation and per millisecond of execution — eliminates the baseline infrastructure cost that every always-on deployment incurs.

AWS Lambda, the service that popularized FaaS when launched in 2014, runs on a remarkably sophisticated substrate. Each concurrent invocation executes in its own isolated MicroVM (Firecracker), managed by a fleet of worker hosts with carefully tuned lifecycle management. Understanding how Lambda actually works — why cold starts happen, how execution environments are reused, what the actual resource limits are — transforms your ability to design serverless systems that perform predictably.

Prerequisites

Understanding of Linux process model, namespaces, and cgroups
Familiarity with container runtimes and OCI image format
Basic understanding of event-driven programming patterns
Awareness of TCP/TLS connection overhead (for cold start analysis)
JVM startup mechanics (for JVM cold start analysis)

Historical Context

Serverless as a concept predates Lambda. PaaS platforms (Heroku, Google App Engine) had long abstracted server management. But Lambda's innovation was the granularity: not "deploy an application," but "deploy a function that executes in response to an event, in under a second, with millisecond billing, scaling instantly to thousands of concurrent executions."

Lambda launched in November 2014. Within three years, the term "serverless" had entered mainstream use and every major cloud provider had a competing FaaS offering. The CNCF Serverless Working Group formalized definitions in 2018. Cloudflare Workers (2017) introduced a different execution model (V8 isolates instead of MicroVMs) that gives up hardware-level isolation in exchange for dramatically reduced cold starts (<5ms).

Firecracker: The Lambda Foundation

AWS Lambda executes each concurrent invocation in a separate Firecracker MicroVM. Firecracker is an open-source VMM (Virtual Machine Monitor) written in Rust, developed by AWS specifically for multi-tenant FaaS workloads.

Key Firecracker properties:
- Boots a minimal Linux kernel in ~125ms (vs ~1-2 seconds for a full VM)
- Memory overhead per MicroVM: ~5MB (vs ~100MB+ for a full QEMU VM)
- Exposes only a minimal device set: virtio-net, virtio-blk, vsock, serial
- No BIOS emulation, no PCI bus emulation, no USB
- Enforces strict jailer (seccomp-bpf + cgroups) to limit MicroVM capabilities
- KVM-based: hardware virtualization, not software emulation

Each Lambda execution environment is one Firecracker MicroVM with:
- A specific amount of RAM (128MB to 10GB, customer-configured)
- vCPU allocation proportional to RAM (1 vCPU at 1769MB RAM, fractional below)
- /tmp filesystem (512MB default, configurable up to 10GB with ephemeral storage option)
- Immutable root filesystem (Lambda layer and function code)
- Network interface (ENI in Lambda's VPC, or in customer's VPC if configured)
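The memory-to-CPU relationship can be sketched as a small calculation. This is an approximation under stated assumptions: it treats the CPU share as linear in configured memory, anchored at 1 vCPU per 1769MB and capped at 6 vCPUs. The function name `approx_vcpus` is illustrative, and the real allocation is not guaranteed to be exactly linear.

```python
MB_PER_VCPU = 1769   # memory setting at which a function gets one full vCPU
MAX_VCPUS = 6        # assumed ceiling, reached near the 10GB maximum


def approx_vcpus(memory_mb: int) -> float:
    """Estimate the vCPU share for a memory setting (linear model, an assumption)."""
    if not 128 <= memory_mb <= 10240:
        raise ValueError("Lambda memory must be between 128 and 10240 MB")
    return min(memory_mb / MB_PER_VCPU, MAX_VCPUS)


if __name__ == "__main__":
    for mb in (128, 512, 1769, 3538, 10240):
        print(f"{mb:>5} MB -> ~{approx_vcpus(mb):.2f} vCPU")
```

A 128MB function lands at roughly 0.07 vCPU under this model, which is why CPU-bound code often runs disproportionately slowly at the smallest memory settings.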

Lambda Execution Lifecycle

[Event arrives] → Lambda service checks for available warm execution environment
                       │
           ┌───────────┴─────────────┐
           │ None available           │ Warm environment exists
           ▼                          ▼
      COLD START                WARM INVOCATION
      ──────────                ───────────────
      1. Download code          Skip to step 7
      2. Start Firecracker VM   (init already ran)
      3. Boot minimal kernel
      4. Start language runtime
         (Python: ~50ms)
         (Node.js: ~100ms)
         (JVM: 500ms-3s)
         (Go/Rust: <50ms)
      5. Run init code (top-level)
      6. INIT_REPORT logged
           │
           ▼
      HANDLER EXECUTION
      ─────────────────
      7. Invoke handler function      ←── both paths converge here
      8. Return response
      9. Lambda service receives response
           │
           ▼
      POST-INVOCATION
      ───────────────
      10. Environment frozen (CPU suspended)
      11. [Next event] → thawed, back to step 7 (warm)
      12. [~15 minutes idle] → environment destroyed

The freeze/thaw cycle is critical: between invocations, the execution environment exists but its CPU is not running. This is how Lambda serves warm invocations with almost no startup latency (thawing is far cheaper than booting) while charging nothing for CPU at idle.

Any state stored in global variables or the /tmp filesystem persists across warm invocations of the same execution environment. This is a common source of bugs: database connections opened at init time are reused (good — saves connection overhead), but state from a previous invocation that leaked into global scope affects the next (bad — hard-to-reproduce bugs).
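The good and bad sides of environment reuse can be shown with a minimal sketch. Calling the handler twice in one process stands in for two warm invocations on the same execution environment. `open_connection` is a hypothetical placeholder for an expensive client setup, not a real SDK call.

```python
import itertools

# Init phase: module-level code runs once per execution environment (cold start).
_connection_ids = itertools.count(1)


def open_connection() -> dict:
    """Hypothetical stand-in for an expensive database or SDK client setup."""
    return {"id": next(_connection_ids)}


CONNECTION = open_connection()   # intentionally shared across warm invocations
_seen_items = []                 # BUG: accidental state that outlives one invocation


def handler(event, context=None):
    # Leaky: appends to a module-level list, so earlier events bleed into later ones.
    _seen_items.append(event["item"])
    return {"connection": CONNECTION["id"], "items": list(_seen_items)}


def safe_handler(event, context=None):
    # Fixed: per-invocation state lives in a local variable.
    items = [event["item"]]
    return {"connection": CONNECTION["id"], "items": items}
```

Both handlers reuse the connection created at init, but only `handler` returns data from a previous event on the second call.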

Cold Start Analysis

Cold starts are the primary operational concern for latency-sensitive Lambda functions. A cold start occurs whenever Lambda must provision a new execution environment — at first invocation, when concurrency exceeds the current pool of warm environments, or after extended idle periods.

Cold Start Duration Breakdown (approximate):
──────────────────────────────────────────────────────
│ Firecracker boot     │ ~125ms                       │
│ Kernel init          │ ~50ms                        │
│ Runtime init         │ varies:                      │
│   Python 3.12        │   ~50-150ms                  │
│   Node.js 20         │   ~100-200ms                 │
│   Java 21 (JVM)      │   ~1000-3000ms               │
│   Go 1.21            │   ~30-80ms                   │
│   Rust (custom RT)   │   ~10-50ms                   │
│ Function init code   │ ~50ms-2s (your code)         │
│ VPC attachment (if)  │ ~1-10s (pre-2019 ENI model)  │
──────────────────────────────────────────────────────
Total (non-VPC, Go):   ~200-300ms
Total (VPC, Java):     ~3-15s (pre-2019 VPC networking)
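As a rough check on the totals, the per-phase ranges can be summed directly. The sketch below copies the approximate figures from the breakdown, excludes VPC attachment, and assumes a minimal 50ms of function init code by default. The dictionary keys and the function name are illustrative.

```python
from typing import Tuple

# (low_ms, high_ms) ranges copied from the approximate breakdown.
PLATFORM = {"firecracker_boot": (125, 125), "kernel_init": (50, 50)}
RUNTIME_INIT = {
    "python3.12": (50, 150),
    "nodejs20": (100, 200),
    "java21": (1000, 3000),
    "go1.21": (30, 80),
    "rust": (10, 50),
}


def cold_start_range(runtime: str, init_code_ms: Tuple[int, int] = (50, 50)) -> Tuple[int, int]:
    """Sum the low and high ends of each phase for one runtime (non-VPC)."""
    parts = [*PLATFORM.values(), RUNTIME_INIT[runtime], init_code_ms]
    return sum(low for low, _ in parts), sum(high for _, high in parts)


if __name__ == "__main__":
    for name in RUNTIME_INIT:
        low, high = cold_start_range(name)
        print(f"{name:<11} ~{low}-{high} ms")
```

With minimal init code, Go comes out at roughly 255-305ms, consistent with the non-VPC total. Passing a larger `init_code_ms` range shows how quickly user code can dominate the platform's share.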

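The JVM cold start problem is severe enough that AWS developed SnapStart specifically for Java Lambda. SnapStart is enabled in the function configuration and takes effect when a version is published: Lambda runs the init phase, takes a Firecracker snapshot of the initialized execution environment, and stores the encrypted snapshot for later restores. Code that must run just before the snapshot or just after a restore can register CRaC runtime hooks (beforeCheckpoint / afterRestore). On cold start, Firecracker restores from the snapshot rather than booting fresh. Cold start time drops from 1-3 seconds to roughly 150ms for Java functions.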

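VPC-attached Lambda cold starts were historically catastrophic (8-15 seconds for ENI provisioning) until AWS redesigned the VPC integration in 2019. The new model creates shared Hyperplane ENIs in the customer VPC when the function is created or its VPC configuration changes, instead of attaching an ENI per execution environment, which reduces the VPC cold start penalty to under 1 second.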

Cold Start Mitigations

Provisioned Concurrency

Pre-initialize a specified number of execution environments that are always warm. These environments are kept initialized (runtime started, init code run) and ready to respond immediately. Invocations served by provisioned environments skip the cold start; traffic beyond the provisioned level falls back to on-demand environments and can still cold start.

Cost: you pay for provisioned concurrency time even when idle (~$0.015/GB-hour for provisioned, vs $0.0000166667/GB-second for on-demand). For predictable traffic patterns, schedule Provisioned Concurrency changes with Application Auto Scaling.
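Whether Provisioned Concurrency pays off depends on how busy each environment is. The sketch below uses the two prices quoted above plus one assumption not stated in the text: that execution time on a provisioned environment is billed at a reduced duration rate (`PROVISIONED_DURATION_PER_GB_S`). Treat that figure as illustrative and check it against current pricing for your region. Per-request charges are left out because they are the same in both modes.

```python
ON_DEMAND_PER_GB_S = 0.0000166667        # from the text
PROVISIONED_PER_GB_HOUR = 0.015          # from the text
# Assumption (not from the text): reduced duration rate on provisioned environments.
PROVISIONED_DURATION_PER_GB_S = 0.0000097222


def hourly_cost_on_demand(utilization: float, gb: float = 1.0) -> float:
    """Hourly cost of one environment's worth of work, billed on demand."""
    return utilization * 3600 * ON_DEMAND_PER_GB_S * gb


def hourly_cost_provisioned(utilization: float, gb: float = 1.0) -> float:
    """Hourly cost of one provisioned environment at the given utilization (0..1)."""
    busy = utilization * 3600 * PROVISIONED_DURATION_PER_GB_S * gb
    return PROVISIONED_PER_GB_HOUR * gb + busy


def break_even_utilization() -> float:
    """Utilization above which a provisioned environment is the cheaper option."""
    saving_per_busy_hour = 3600 * (ON_DEMAND_PER_GB_S - PROVISIONED_DURATION_PER_GB_S)
    return PROVISIONED_PER_GB_HOUR / saving_per_busy_hour
```

Under these assumptions the break-even point is around 60% utilization: below that, Provisioned Concurrency is a latency purchase rather than a cost saving.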

Function Architecture Changes

Move initialization code (DB connections, config loading, SDK clients) outside the handler to the init phase. The init phase runs once per execution environment, not per invocation.
Use a connection-pooling proxy (RDS Proxy for PostgreSQL/MySQL) so that new execution environments attach to an existing pool instead of opening fresh database connections on every cold start.
Reduce package size: smaller zip = faster download. Lambda layers cache commonly used packages at the worker level.
Choose runtimes with fast startup: Go, Rust (custom runtime), Node.js in preference to JVM for latency-sensitive endpoints.

Lambda Concurrency Model

Lambda's concurrency model is one-invocation-per-execution-environment. Two simultaneous requests to the same function require two separate execution environments. This is fundamentally different from a traditional web server that handles many concurrent requests in one process.

Lambda Concurrency = Simultaneous in-flight invocations

If 1000 users invoke the function simultaneously:
  → 1000 Firecracker MicroVMs running simultaneously
  → 1000 separate execution environments
  → Each consuming their allocated RAM

Concurrency limits:
  - Default account limit: 1000 concurrent executions per region
  - Reserved concurrency: guarantee N executions for a function
    (also caps the function at N — useful to protect downstream services)
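  - Burst scaling: historically a 500-3000 initial burst (region-dependent),
    then +500/minute; since late 2023 each function can add up to 1000
    concurrent executions every 10 seconds, up to the account limit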

Reserved concurrency serves dual purposes: floor (guarantee capacity is available) and ceiling (protect a downstream database that can't handle >100 connections). Setting reserved concurrency = 0 disables the function entirely (useful for emergency circuit breaking).
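Because each in-flight request occupies one environment, steady-state concurrency can be estimated as arrival rate times average duration. The sketch below applies that estimate and models reserved concurrency as a hard cap. It is a simplification: it ignores burst scaling and the unreserved pool shared with other functions, and both function names are illustrative.

```python
import math
from typing import Optional


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Steady-state in-flight invocations: arrival rate x average duration."""
    # round() guards against floating-point noise pushing ceil() up by one.
    return math.ceil(round(requests_per_second * avg_duration_s, 9))


def is_throttled(in_flight: int, reserved: Optional[int] = None,
                 account_limit: int = 1000) -> bool:
    """True if one more batch of in_flight executions would exceed the cap."""
    limit = account_limit if reserved is None else reserved
    return in_flight > limit


if __name__ == "__main__":
    need = required_concurrency(requests_per_second=200, avg_duration_s=0.25)
    print(f"200 rps x 250 ms -> {need} concurrent environments")
    print("throttled with reserved=0:", is_throttled(1, reserved=0))
```

The same arithmetic helps size a downstream database: if it tolerates 100 connections, setting reserved concurrency to 100 caps the function at what the database can absorb.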

Lambda Limitations

Resource            Limit
──────────────────────────────────────
Execution timeout   15 minutes
Memory              128MB – 10GB
vCPU                proportional to memory (max 6 vCPU at 10GB)
/tmp storage        512MB – 10GB (10GB requires config)
Package size        50MB (zipped), 250MB (unzipped), 10GB (container image)
Environment vars    4KB total
Payload (sync)      6MB request + 6MB response
Payload (async)     256KB
Layers              5 layers per function
Concurrent execs    1000 per region (default, soft limit)

The 15-minute timeout is the most commonly hit architectural constraint. Long-running batch jobs must be decomposed into smaller units (Step Functions for orchestration, SQS for queuing work).

Egress cost is a hidden Lambda cost: Lambda functions in a customer VPC pay standard VPC data transfer rates. Lambda functions outside a VPC (default) reach AWS services via AWS's internal network (no egress cost) but cannot reach private VPC resources.

Platform Comparison

Platform          Runtime Model        Cold Start     Max Duration  Pricing
──────────────────────────────────────────────────────────────────────────────
AWS Lambda        Firecracker MicroVM  100ms-3s+      15 min        per GB·ms
Google Cloud      gVisor container     200ms-2s+      60 min        per GB·s
  Functions
Azure Functions   Sandboxed container  200ms-2s+      10 min        per exec
  (Consumption)   (host process reuse)  (plan dep.)   (unbounded on Dedicated)
Cloudflare        V8 Isolate           <5ms           30 sec        per CPU·ms
  Workers         (no OS, no boot)
Fastly Compute    Wasm (wasmtime)      <1ms           N/A           per req
  @Edge
AWS Lambda@Edge   Firecracker MicroVM  100ms-300ms    5-30 seconds  per req
  (CloudFront)

Cloudflare Workers: V8 Isolates

Cloudflare Workers abandon OS-level isolation (no MicroVM, no container) in favor of V8 JavaScript isolates. Each Worker runs in its own V8 isolate, with its own heap, inside a runtime process shared with other Workers on a Cloudflare edge node. Isolation is provided by the JavaScript engine's memory model, not hardware virtualization.

Tradeoffs:
- Cold start under 5ms (no VM to boot, no kernel to start)
- Strict limitations: must write in JavaScript/TypeScript/WebAssembly, no arbitrary binaries
- Limited execution time (30s), limited memory (128MB)
- Security boundary is V8 isolation, not hardware VM isolation, which leaves more attack surface for sandbox escapes
- Runs at the edge (200+ PoPs), not in a central region

Use for: edge personalization, A/B testing, request/response transformation, authentication at the edge. Not suitable for: CPU-intensive computation, workloads requiring arbitrary system access, long-running operations.

Google Cloud Run: Containers with FaaS Billing

Cloud Run occupies a middle ground — you provide a Docker container image, Google manages scaling (including scale to zero). It's not pure FaaS (you control the runtime, not just a function), but shares FaaS billing and operational characteristics.

Differentiator: Cloud Run handles multiple concurrent requests per instance (configurable, up to 1000 per container). This reduces cold starts dramatically for moderate traffic and enables efficient connection pooling within a single container.
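The effect of per-instance concurrency on fleet size is simple arithmetic. The sketch below compares the one-request-per-environment model with a multi-request container. The concurrency value of 80 is only an illustrative setting and the function name is hypothetical.

```python
import math


def instances_needed(in_flight_requests: int, per_instance_concurrency: int) -> int:
    """Instances required to hold a given number of simultaneous requests."""
    if per_instance_concurrency < 1:
        raise ValueError("per-instance concurrency must be at least 1")
    return math.ceil(in_flight_requests / per_instance_concurrency)


if __name__ == "__main__":
    load = 1000
    print("one request per environment:", instances_needed(load, 1))
    print("80 requests per container:  ", instances_needed(load, 80))
```

Fewer instances means fewer cold starts while scaling up, and fewer connection pools opened against a shared database.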

Use Cases and Architectural Patterns

Event Processing: Lambda + SQS or Lambda + Kinesis for stream processing. Lambda polls the queue/stream and invokes in batches. Failure handling: SQS dead-letter queues, Kinesis bisect-on-error.

Webhooks and API callbacks: Infrequent HTTP callbacks (GitHub webhooks, Stripe events) — perfect for Lambda. No always-on server needed, and traffic spikes are handled automatically.

ETL and Data Processing: S3 event triggers → Lambda → transform → write to destination. Common pattern for log processing, image resizing, data normalization.

Infrastructure Automation: AWS Config rules, CloudWatch Events/EventBridge rules for compliance checking, automated remediation.

Microservice backends: GraphQL/REST API backed by Lambda functions. Works well when request volume is moderate and latency P99 requirements allow for occasional cold starts.

Debugging Notes

Cold start diagnosis: Lambda X-Ray traces include the Init duration separately from execution duration. INIT_START and INIT_REPORT in CloudWatch Logs identify cold starts, and the REPORT line of an invocation carries an Init Duration field only when that invocation was a cold start.
Memory and timeout tuning: always set Lambda memory above your actual RSS usage. Lambda allocates CPU in proportion to memory, so a low memory setting starves the function of CPU even if it is nowhere near the memory limit. Use Lambda Power Tuning (open source tool) to find the optimal memory/cost balance.
Async failure routing: asynchronous invocations that fail after all retry attempts can be routed to an SQS queue or SNS topic, either through an on-failure Lambda Destination or a function-level DLQ. Configure this on the function itself; a DLQ on the invocation source alone does not capture these failures.
/tmp across warm invocations: if your function writes to /tmp, a subsequent warm invocation on the same environment may find those files. Either clean up explicitly or treat /tmp as a single-invocation scratch space.
Execution environment reuse across function versions: Lambda creates separate execution environment pools per function version (including $LATEST). Rolling deploys create a brief period where both old and new versions serve traffic.
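Cold starts can be picked out of exported CloudWatch log lines by checking each REPORT line for an Init Duration field. The sketch below assumes the usual tab-separated REPORT layout. The function name is illustrative, and the sample line in the usage block uses a made-up request ID.

```python
import re
from typing import Optional

_INIT = re.compile(r"Init Duration: ([\d.]+) ms")
# Match the plain "Duration" field, not "Billed Duration" or "Init Duration".
_DURATION = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")


def parse_report(line: str) -> Optional[dict]:
    """Extract durations from a REPORT log line; None for any other line."""
    if not line.startswith("REPORT "):
        return None
    duration = _DURATION.search(line)
    init = _INIT.search(line)
    return {
        "duration_ms": float(duration.group(1)) if duration else None,
        "init_ms": float(init.group(1)) if init else None,
        "cold_start": init is not None,
    }


if __name__ == "__main__":
    sample = ("REPORT RequestId: example-id\tDuration: 12.50 ms\t"
              "Billed Duration: 13 ms\tMemory Size: 128 MB\t"
              "Max Memory Used: 45 MB\tInit Duration: 210.75 ms")
    print(parse_report(sample))
```

Counting `cold_start` results over a time window gives a quick cold start rate without needing X-Ray.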

Security Implications

Each Lambda execution environment receives temporary credentials for the function's execution role (obtained via STS AssumeRole). Lambda exposes them to the function code as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN). Compromising a Lambda function gives an attacker its IAM role, so minimize role permissions.
Lambda SnapStart snapshots include all in-memory state at snapshot time. Ensure no secrets, tokens, or sensitive data are in memory during the snapshot phase.
Lambda functions do not have access to the EC2 Instance Metadata Service (IMDS), so the classic SSRF request to 169.254.169.254 does not yield credentials. The exposure is the process environment instead: any flaw that lets an attacker read environment variables or /proc/self/environ (code injection, local file read) can steal Lambda's AWS credentials.
VPC Lambda functions can access private resources in your VPC. Ensure VPC Security Groups for Lambda's ENIs follow least-privilege: only allow outbound to specific services on specific ports.
Function code in ECR container images benefits from ECR image signing (cosign / Sigstore) to prevent tampering with the function's code at rest.

Performance Implications

CPU scales with memory. A 128MB Lambda gets 1/14th of a vCPU. A 1769MB Lambda gets exactly 1 vCPU. A 3538MB Lambda gets 2 vCPUs. CPU-bound workloads should be profiled at different memory settings — often increasing memory from 1GB to 2GB halves wall-clock time, making it cost-neutral while reducing latency.
Network throughput scales with memory allocation. 10GB Lambda functions get significantly more network bandwidth than 128MB functions.
Avoid synchronous invocations in serial loops. Each Lambda invocation incurs ~1ms overhead for Lambda service routing. 1000 serial sub-invocations = 1 second of overhead minimum. Prefer batch processing or parallel fan-out with Promise.all / asyncio.gather.
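The difference between serial sub-invocations and parallel fan-out can be demonstrated without any AWS calls. In the sketch below, `invoke` is a hypothetical stand-in that sleeps to simulate per-call latency. A semaphore bounds the number of calls in flight so that fan-out does not exhaust the concurrency limit.

```python
import asyncio
import time


async def invoke(payload: int, latency_s: float = 0.01) -> int:
    """Hypothetical stand-in for one downstream invocation."""
    await asyncio.sleep(latency_s)
    return payload * 2


async def serial(payloads):
    # Each call waits for the previous one: total time grows linearly.
    return [await invoke(p) for p in payloads]


async def fan_out(payloads, max_in_flight: int = 10):
    # Bounded parallelism: at most max_in_flight calls run at once.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(p):
        async with sem:
            return await invoke(p)

    return await asyncio.gather(*(bounded(p) for p in payloads))


if __name__ == "__main__":
    items = list(range(50))
    for fn in (serial, fan_out):
        start = time.perf_counter()
        asyncio.run(fn(items))
        print(f"{fn.__name__:<8} {time.perf_counter() - start:.3f}s")
```

`asyncio.gather` returns results in input order, so the fan-out version is a drop-in replacement for the serial loop.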

Failure Modes

Concurrency limit throttling: when account concurrency limit is reached, Lambda returns HTTP 429 (TooManyRequestsException) to synchronous callers. Async callers are queued for up to 6 hours. Set reserved concurrency on critical functions to guarantee capacity.
Downstream cascade: a Lambda function that synchronously calls another Lambda that times out will itself eventually time out. Timeouts accumulate along a synchronous chain, so each caller needs a timeout longer than its callee's, and a slow leaf stalls every function above it. Design with circuit breakers (AWS SDK retry/backoff, or explicit circuit breaker logic).
Payload size exceeded: 6MB synchronous limit. Functions that attempt to return larger responses fail with an unhandled error. Lambda Response Streaming (introduced 2023) bypasses this limit for HTTP invocations.
Init phase failure: if init code throws an exception, Lambda retries the cold start up to 3 times, then reports an initialization error. All invocations during this period fail — a bad deployment can cause 100% error rate.
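One way to keep a downstream cascade from consuming the caller's whole timeout is to derive each downstream call's timeout from the time the caller has left. The sketch below relies on the Python runtime's `context.get_remaining_time_in_millis()`. The helper name, the margin, and the cap are illustrative choices, and `FakeContext` exists only so the example runs outside Lambda.

```python
def downstream_timeout_s(context, margin_ms: int = 500, cap_s: float = 10.0) -> float:
    """Timeout for a downstream call, leaving margin_ms to handle the failure."""
    remaining_ms = context.get_remaining_time_in_millis()
    budget_s = (remaining_ms - margin_ms) / 1000
    if budget_s <= 0:
        # Fail fast rather than start a call that cannot finish in time.
        raise TimeoutError("not enough time left to call downstream")
    return min(budget_s, cap_s)


class FakeContext:
    """Local stand-in for the Lambda context object (testing only)."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


if __name__ == "__main__":
    print(downstream_timeout_s(FakeContext(3000)))
```

The returned value can be passed as the read timeout of whatever client makes the downstream call, so the caller still has time to log, return an error, or trip a circuit breaker.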

Modern Usage

EventBridge Pipes (2022) and EventBridge Scheduler have made serverless orchestration more composable — routing events between sources and targets without custom Lambda glue code. AWS Step Functions Express Workflows support Lambda orchestration at 100K executions/second, replacing hand-rolled state machines.

Lambda Function URLs (2022) provide direct HTTPS endpoints for Lambda without API Gateway, eliminating one abstraction layer for simple HTTP use cases. Response streaming (2023) lets a function return more than the 6MB limit that applies to buffered responses.

Future Directions

WebAssembly as a Lambda runtime: WASM's sub-millisecond startup and language-agnostic binary format make it an ideal FaaS substrate. AWS Lambda already supports WASM via custom runtimes; first-class support would bring Cloudflare Workers-style startup times to Lambda's security model.
GPU-backed Lambda for ML inference: Lambda offers no GPU support today, so inference runs on CPU, typically packaged as a container image to fit model weights; dedicated GPU execution environments are the next frontier for serverless ML.
Longer execution times: the 15-minute limit is increasingly constraining for agentic AI workflows. AWS has hinted at extensions via Durable Execution patterns (Step Functions).

Exercises

Build a Lambda function that processes messages from an SQS queue. Implement a dead-letter queue and observe the behavior when your function throws an exception. Verify the DLQ receives the message after the configured retry count.
Use AWS X-Ray to capture a trace of a cold-start Lambda invocation. Identify Init duration vs. execution duration. Implement Provisioned Concurrency and verify cold starts are eliminated in the traces.
Run the Lambda Power Tuning tool against a CPU-bound function (e.g., image compression). Plot the performance/cost curve across memory settings from 128MB to 3008MB.
Create a Cloudflare Worker that rate-limits requests by IP using the Workers KV store. Measure cold start latency from multiple geographic locations using a latency testing tool.
Compare Lambda cold start times for Python 3.12, Node.js 20, Java 21 (without SnapStart), Java 21 (with SnapStart), and a Go custom runtime using an identical simple HTTP handler. Document the differences and the optimal choice for each latency requirement.

References

AWS re:Invent 2018: "A Serverless Journey: AWS Lambda Under the Hood" (SRV409)
Firecracker: Lightweight Virtualization for Serverless Applications (NSDI 2020) — Agache et al.
Cloudflare Workers: "How Workers Works" (blog.cloudflare.com/how-workers-works)
AWS Lambda Power Tuning: https://github.com/alexcasalboni/aws-lambda-power-tuning
"Serverless in the Wild" (ATC 2020) — Shahrad et al., Microsoft Research trace analysis of Azure Functions
Google Cloud Run documentation: https://cloud.google.com/run/docs
CNCF Serverless Whitepaper v1.0 (2018)
AWS Lambda SnapStart deep-dive: https://aws.amazon.com/blogs/compute/reducing-java-cold-starts-on-aws-lambda-with-snapstart