Serverless and FaaS: Lambda Internals, Cold Starts, and Platform Comparison
Overview
Serverless computing is perhaps the most radical change in the operational model of software deployment. In the serverless model, developers provide code; the platform provides execution. No server provisioning, no OS patching, no capacity planning, no idle-resource cost. The billing model — per invocation and per millisecond of execution — eliminates the baseline infrastructure cost that every always-on deployment incurs.
AWS Lambda, the service that popularized FaaS when launched in 2014, runs on a remarkably sophisticated substrate. Each concurrent invocation executes in its own isolated MicroVM (Firecracker), managed by a fleet of worker hosts with carefully tuned lifecycle management. Understanding how Lambda actually works — why cold starts happen, how execution environments are reused, what the actual resource limits are — transforms your ability to design serverless systems that perform predictably.
Prerequisites
- Understanding of Linux process model, namespaces, and cgroups
- Familiarity with container runtimes and OCI image format
- Basic understanding of event-driven programming patterns
- Awareness of TCP/TLS connection overhead (for cold start analysis)
- JVM startup mechanics (for JVM cold start analysis)
Historical Context
Serverless as a concept predates Lambda. PaaS platforms (Heroku, Google App Engine) had long abstracted server management. But Lambda's innovation was the granularity: not "deploy an application," but "deploy a function that executes in response to an event, in under a second, with millisecond billing, scaling instantly to thousands of concurrent executions."
Lambda launched in November 2014. Within three years, "serverless" had become the standard term for this model and every major cloud provider had a competing FaaS offering. The CNCF Serverless Working Group formalized definitions in 2018. Cloudflare Workers (2017) introduced a different execution model (V8 isolates instead of MicroVMs) that gives up hardware-level isolation in exchange for dramatically shorter cold starts (<5ms).
Firecracker: The Lambda Foundation
AWS Lambda executes each concurrent invocation in a separate Firecracker MicroVM. Firecracker is an open-source VMM (Virtual Machine Monitor) written in Rust, developed by AWS specifically for multi-tenant FaaS workloads.
Key Firecracker properties:
- Boots a minimal Linux kernel in ~125ms (vs ~1-2 seconds for a full VM)
- Memory overhead per MicroVM: ~5MB (vs ~100MB+ for a full QEMU VM)
- Exposes only a minimal device set: virtio-net, virtio-blk, vsock, serial
- No BIOS emulation, no PCI bus emulation, no USB
- Enforces strict jailer (seccomp-bpf + cgroups) to limit MicroVM capabilities
- KVM-based: hardware virtualization, not software emulation
Each Lambda execution environment is one Firecracker MicroVM with:
- A specific amount of RAM (128MB to 10GB, customer-configured)
- vCPU allocation proportional to RAM (1 vCPU at 1769MB RAM, fractional below)
- /tmp filesystem (512MB default, configurable up to 10GB with ephemeral storage option)
- Immutable root filesystem (Lambda layer and function code)
- Network interface (ENI in Lambda's VPC, or in customer's VPC if configured)
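The proportional vCPU allocation can be approximated with a few lines of arithmetic. The sketch below assumes strictly linear scaling anchored at the 1769MB figure and a ceiling of 6 vCPUs; the function name `vcpu_share` and the linear model are illustrative assumptions, not a published AWS formula.

```python
FULL_VCPU_MB = 1769        # memory setting that maps to one full vCPU (from the text)
MAX_VCPUS = 6              # assumed ceiling at the largest memory setting
MIN_MB, MAX_MB = 128, 10240


def vcpu_share(memory_mb: int) -> float:
    """Approximate vCPU share for a memory setting, assuming linear scaling."""
    if not MIN_MB <= memory_mb <= MAX_MB:
        raise ValueError(f"memory_mb must be between {MIN_MB} and {MAX_MB}")
    return min(memory_mb / FULL_VCPU_MB, MAX_VCPUS)


if __name__ == "__main__":
    for mb in (128, 512, 1769, 3538, 10240):
        print(f"{mb:>5} MB -> {vcpu_share(mb):.2f} vCPU")
```

Under this model a 128MB function gets roughly 7% of a vCPU, which is why CPU-bound code often speeds up sharply when memory is raised.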
Lambda Execution Lifecycle
[Event arrives] → Lambda service checks for an available warm execution environment
                        │
            ┌───────────┴─────────────┐
            │ None available          │ Warm environment exists
            ▼                         ▼
       COLD START                WARM INVOCATION
       ──────────                ───────────────
  1. Download code               Skip to step 7
  2. Start Firecracker VM
  3. Boot minimal kernel
  4. Start language runtime
       (Python: ~50ms)
       (Node.js: ~100ms)
       (JVM: 500ms-3s)
       (Go/Rust: <50ms)
  5. Run init code (top-level)
  6. INIT_REPORT logged
            │
            ▼   ←── both paths converge here
     HANDLER EXECUTION
     ─────────────────
  7. Invoke handler function
  8. Return response
  9. Lambda service receives response
            │
            ▼
      POST-INVOCATION
      ───────────────
 10. Environment frozen (CPU suspended)
 11. [Next event] → thawed, back to step 7 (warm)
 12. [~15 minutes idle] → environment destroyed
The freeze/thaw cycle is critical: between invocations, the execution environment still exists but its CPU is not running. This is how Lambda serves warm invocations with almost no startup overhead while consuming no CPU at idle.
Any state stored in global variables or the /tmp filesystem persists across warm invocations of the same execution environment. This is a common source of bugs: database connections opened at init time are reused (good — saves connection overhead), but state from a previous invocation that leaked into global scope affects the next (bad — hard-to-reproduce bugs).
Cold Start Analysis
Cold starts are the primary operational concern for latency-sensitive Lambda functions. A cold start occurs whenever Lambda must provision a new execution environment — at first invocation, when concurrency exceeds the current pool of warm environments, or after extended idle periods.
Cold Start Duration Breakdown (approximate):
──────────────────────────────────────────────────────────────
│ Firecracker boot      │ ~125ms                             │
│ Kernel init           │ ~50ms                              │
│ Runtime init          │ varies:                            │
│   Python 3.12         │ ~50-150ms                          │
│   Node.js 20          │ ~100-200ms                         │
│   Java 21 (JVM)       │ ~1000-3000ms                       │
│   Go 1.21             │ ~30-80ms                           │
│   Rust (custom RT)    │ ~10-50ms                           │
│ Function init code    │ ~50ms-2s (your code)               │
│ VPC attachment (if)   │ ~1-10s (pre-2019 ENI allocation)   │
──────────────────────────────────────────────────────────────
Total (non-VPC, Go, minimal init code):  ~200-300ms
Total (VPC pre-2019, Java):              ~3-15s
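The totals follow from summing the per-phase ranges. The sketch below does that arithmetic with the approximate figures from the breakdown; the function name and dictionary keys are illustrative, and the VPC range reflects the historical ENI-allocation figure rather than a current measurement.

```python
# (low_ms, high_ms) ranges copied from the approximate breakdown above.
PLATFORM_MS = {"firecracker_boot": (125, 125), "kernel_init": (50, 50)}
RUNTIME_INIT_MS = {
    "python3.12": (50, 150),
    "nodejs20": (100, 200),
    "java21": (1000, 3000),
    "go1.21": (30, 80),
    "rust": (10, 50),
}


def cold_start_range_ms(runtime, init_code_ms=(0, 0), vpc_attach_ms=(0, 0)):
    """Sum the low and high ends of each phase to get a cold start range."""
    parts = list(PLATFORM_MS.values())
    parts += [RUNTIME_INIT_MS[runtime], init_code_ms, vpc_attach_ms]
    low = sum(p[0] for p in parts)
    high = sum(p[1] for p in parts)
    return low, high


if __name__ == "__main__":
    print("Go, no init code:", cold_start_range_ms("go1.21"))
    print(
        "Java, heavy init, historical VPC:",
        cold_start_range_ms("java21", init_code_ms=(50, 2000), vpc_attach_ms=(1000, 10000)),
    )
```

Treat the output as a budget for reasoning about where time goes, not as a prediction; measured Init durations are the only reliable numbers.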
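The JVM cold start problem is severe enough that AWS developed SnapStart, initially for Java functions. SnapStart is enabled in the function configuration rather than through a code annotation: when a version is published, Lambda runs the init phase once, takes a snapshot of the initialized execution environment, and stores it encrypted. On cold start, Lambda restores from the snapshot rather than booting and initializing from scratch. Code that must run around the snapshot (for example, to re-establish connections) uses runtime hooks. Cold start time typically drops from 1-3 seconds to well under a second for Java functions.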
VPC-attached Lambda cold starts were historically catastrophic (8-15 seconds for ENI provisioning) until AWS redesigned the VPC integration in 2019. In the new model, shared ENIs are created in the customer VPC when the function is created or its VPC configuration changes, not on the invocation path, which reduces the VPC cold start penalty to under 1 second.
Cold Start Mitigations
Provisioned Concurrency
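Pre-initialize a specified number of execution environments so that they are always warm. These environments are kept initialized (runtime started, init code run) and ready to respond immediately. Invocations served by provisioned environments incur no cold start; traffic beyond the provisioned count falls back to on-demand environments and can still cold start.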
Cost: you pay for provisioned concurrency even when idle (~$0.015/GB-hour for provisioned capacity, vs $0.0000166667/GB-second, or about $0.06/GB-hour, for on-demand execution time). For predictable traffic patterns, schedule Provisioned Concurrency changes with Application Auto Scaling.
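To put the two prices in the same unit, the arithmetic can be sketched as follows. It uses only the two figures quoted above and deliberately ignores per-request charges and the duration charge for invocations served by provisioned environments, so it is a lower bound on the idle cost, not a full bill. The function names are illustrative.

```python
PROVISIONED_USD_PER_GB_HOUR = 0.015          # figure quoted in the text
ON_DEMAND_USD_PER_GB_SECOND = 0.0000166667   # figure quoted in the text
SECONDS_PER_HOUR = 3600


def on_demand_usd_per_gb_hour() -> float:
    """Convert the on-demand per-second price to a per-hour price."""
    return ON_DEMAND_USD_PER_GB_SECOND * SECONDS_PER_HOUR


def provisioned_idle_cost_usd(environments: int, memory_mb: int, hours: float) -> float:
    """Cost of keeping N environments provisioned, before any invocation charges."""
    gb = memory_mb / 1024
    return environments * gb * hours * PROVISIONED_USD_PER_GB_HOUR


if __name__ == "__main__":
    print(f"on-demand: ${on_demand_usd_per_gb_hour():.4f} per GB-hour of execution")
    monthly = provisioned_idle_cost_usd(environments=10, memory_mb=1024, hours=730)
    print(f"10 x 1GB provisioned for a 730-hour month: ${monthly:.2f}")
```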
Function Architecture Changes
- Move initialization code (DB connections, config loading, SDK clients) outside the handler to the init phase. The init phase runs once per execution environment, not per invocation.
- Use a connection-pooling proxy (RDS Proxy for PostgreSQL/MySQL) to reduce per-cold-start connection establishment time and to shield the database from connection storms when many environments start at once.
- Reduce package size: smaller zip = faster download. Lambda layers cache commonly used packages at the worker level.
- Choose runtimes with fast startup: Go, Rust (custom runtime), Node.js in preference to JVM for latency-sensitive endpoints.
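The first point in the list above can be sketched as follows. `connect_to_database` is a hypothetical stand-in for any expensive setup (SDK client, connection, config load); the counter exists only to show that setup runs once per environment rather than once per invocation.

```python
import time

CONNECTIONS_OPENED = 0  # instrumentation for this demo only


def connect_to_database():
    """Hypothetical stand-in for an expensive client or connection setup."""
    global CONNECTIONS_OPENED
    CONNECTIONS_OPENED += 1
    time.sleep(0.01)  # simulate connection latency
    return {"connection_id": CONNECTIONS_OPENED}


# Init phase: executed once, when the execution environment is created.
DB = connect_to_database()


def handler(event, context=None):
    # Every warm invocation reuses the connection opened during init.
    return {"connection_id": DB["connection_id"], "echo": event.get("name")}


if __name__ == "__main__":
    for name in ("a", "b", "c"):
        print(handler({"name": name}))
    print("connections opened:", CONNECTIONS_OPENED)
```

The trade-off is that init-time work lengthens the cold start itself, so only move setup there if most invocations actually need it.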
Lambda Concurrency Model
Lambda's concurrency model is one-invocation-per-execution-environment. Two simultaneous requests to the same function require two separate execution environments. This is fundamentally different from a traditional web server that handles many concurrent requests in one process.
Lambda Concurrency = Simultaneous in-flight invocations

If 1000 users invoke the function simultaneously:
  → 1000 Firecracker MicroVMs running simultaneously
  → 1000 separate execution environments
  → Each consuming their allocated RAM

Concurrency limits:
- Default account limit: 1000 concurrent executions per region
- Reserved concurrency: guarantee N executions for a function
  (also caps the function at N, which is useful to protect downstream services)
- Burst limits: historically a 500-3000 initial burst (region-dependent), then +500/minute until the limit. Since late 2023, each function can instead scale by 1,000 concurrent executions every 10 seconds, up to the account limit.
Reserved concurrency serves dual purposes: floor (guarantee capacity is available) and ceiling (protect a downstream database that can't handle >100 connections). Setting reserved concurrency = 0 disables the function entirely (useful for emergency circuit breaking).
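Because concurrency is the number of in-flight invocations, the required concurrency for a steady workload can be estimated as request rate times average duration (Little's law). The sketch below applies that estimate and shows how much demand a reserved-concurrency ceiling would throttle; both function names are illustrative, and real traffic is burstier than a steady-state average.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Steady-state in-flight invocations: arrival rate times time in system."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def throttled_share(required: int, reserved: int) -> float:
    """Fraction of in-flight demand above the reserved-concurrency ceiling."""
    if required <= 0:
        return 0.0
    return max(required - reserved, 0) / required


if __name__ == "__main__":
    need = required_concurrency(requests_per_second=2000, avg_duration_seconds=0.25)
    print("environments needed:", need)
    print("share throttled with reserved=100:", throttled_share(need, reserved=100))
```

Halving the average duration halves the concurrency needed, which is often a cheaper fix than raising the account limit.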
Lambda Limitations
Resource            Limit
──────────────────────────────────────────────────────────────────────────────
Execution timeout   15 minutes
Memory              128MB – 10GB
vCPU                proportional to memory (max 6 vCPU at 10GB)
/tmp storage        512MB – 10GB (10GB requires config)
Package size        50MB (zipped), 250MB (unzipped), 10GB (container image)
Environment vars    4KB total
Payload (sync)      6MB request + 6MB response
Payload (async)     256KB
Layers              5 layers per function
Concurrent execs    1000 per region (default, soft limit)
The 15-minute timeout is the most commonly hit architectural constraint. Long-running batch jobs must be decomposed into smaller units (Step Functions for orchestration, SQS for queuing work).
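Several of the limits in the table can be checked before deployment. The following is a small sketch of such a check; the config keys (`timeout_seconds`, `memory_mb`, and so on) are hypothetical names chosen for this example, not fields of any AWS API, and the 1-second lower bound on timeout is an assumption.

```python
# Bounds taken from the limits table; key names are illustrative.
LIMITS = {
    "timeout_seconds": (1, 900),           # 15 minutes; lower bound assumed
    "memory_mb": (128, 10240),
    "ephemeral_storage_mb": (512, 10240),
    "layers": (0, 5),
}
MAX_ENV_VARS_BYTES = 4 * 1024


def limit_violations(config: dict) -> list:
    """Return a human-readable message for each setting outside its limit."""
    problems = []
    for key, (low, high) in LIMITS.items():
        if key in config and not low <= config[key] <= high:
            problems.append(f"{key}={config[key]} outside [{low}, {high}]")
    env = config.get("environment", {})
    env_bytes = sum(len(k.encode()) + len(v.encode()) for k, v in env.items())
    if env_bytes > MAX_ENV_VARS_BYTES:
        problems.append(f"environment is {env_bytes} bytes, limit {MAX_ENV_VARS_BYTES}")
    return problems


if __name__ == "__main__":
    print(limit_violations({"timeout_seconds": 1200, "memory_mb": 64, "layers": 3}))
```

A job that fails the timeout check is the signal to decompose it with Step Functions or SQS rather than to look for a larger limit.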
Egress cost is a hidden Lambda cost: Lambda functions in a customer VPC pay standard VPC data transfer rates. Lambda functions outside a VPC (default) reach AWS services via AWS's internal network (no egress cost) but cannot reach private VPC resources.
Platform Comparison
Platform                 Runtime Model           Cold Start     Max Duration    Pricing
────────────────────────────────────────────────────────────────────────────────────────────
AWS Lambda               Firecracker MicroVM     100ms-3s+      15 min          per GB·ms
Google Cloud Functions   gVisor container        200ms-2s+      60 min          per GB·s
Azure Functions          Sandboxed container     200ms-2s+      Unlimited       per exec
  (Consumption)          (host process reuse)                   (plan dep.)     (Dedicated)
Cloudflare Workers       V8 Isolate              <5ms           30 sec          per CPU·ms
                         (no OS, no boot)
Fastly Compute@Edge      Wasm (wasmtime)         <1ms           N/A             per req
AWS Lambda@Edge          Firecracker MicroVM     100ms-300ms    5-30 seconds    per req
  (CloudFront)
Cloudflare Workers: V8 Isolates
Cloudflare Workers abandon OS-level isolation (no MicroVM, no container) in favor of V8 JavaScript isolates. Each Worker runs in its own V8 isolate, with its own heap, inside a runtime process shared with other Workers on a Cloudflare edge node. Isolation is provided by the JavaScript engine's memory model, not hardware virtualization.
Tradeoffs:
- Cold start under 5ms (no VM to boot, no kernel to start)
- Strict limitations: must write in JavaScript/TypeScript/WebAssembly, no arbitrary binaries
- Limited execution time (30s), limited memory (128MB)
- Security boundary is V8 isolation, not hardware VM isolation; more attack surface for sandbox escapes
- Runs at the edge (200+ PoPs), not in a central region
Use for: edge personalization, A/B testing, request/response transformation, authentication at the edge. Not suitable for: CPU-intensive computation, workloads requiring arbitrary system access, long-running operations.
Google Cloud Run: Containers with FaaS Billing
Cloud Run occupies a middle ground — you provide a Docker container image, Google manages scaling (including scale to zero). It's not pure FaaS (you control the runtime, not just a function), but shares FaaS billing and operational characteristics.
Differentiator: Cloud Run handles multiple concurrent requests per instance (configurable, up to 1000 per container). This reduces cold starts dramatically for moderate traffic and enables efficient connection pooling within a single container.
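The effect of per-instance concurrency on instance count is simple division. The sketch below compares Lambda's one-invocation-per-environment model with a multi-request container; the function name is illustrative and the value 80 is just an example setting.

```python
import math


def instances_needed(in_flight_requests: int, concurrency_per_instance: int) -> int:
    """Instances required to hold a given number of simultaneous requests."""
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    return math.ceil(in_flight_requests / concurrency_per_instance)


if __name__ == "__main__":
    in_flight = 1000
    print("one request per environment:", instances_needed(in_flight, 1))
    print("80 requests per container:  ", instances_needed(in_flight, 80))
```

Fewer instances also means fewer cold starts during a ramp-up and fewer database connections, provided the application code is safe to run concurrently.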
Use Cases and Architectural Patterns
Event Processing: Lambda + SQS or Lambda + Kinesis for stream processing. Lambda polls the queue/stream and invokes in batches. Failure handling: SQS dead-letter queues, Kinesis bisect-on-error.
Webhooks and API callbacks: Infrequent HTTP callbacks (GitHub webhooks, Stripe events) — perfect for Lambda. No always-on server needed, and traffic spikes are handled automatically.
ETL and Data Processing: S3 event triggers → Lambda → transform → write to destination. Common pattern for log processing, image resizing, data normalization.
Infrastructure Automation: AWS Config rules, CloudWatch Events/EventBridge rules for compliance checking, automated remediation.
Microservice backends: GraphQL/REST API backed by Lambda functions. Works well when request volume is moderate and latency P99 requirements allow for occasional cold starts.
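For the event-processing pattern, one common way to keep a single bad message from failing a whole SQS batch is to report only the failed message IDs back to Lambda. The sketch below assumes the event source mapping has partial batch responses (ReportBatchItemFailures) enabled; `process` and the `order_id` field are hypothetical business logic.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic; raises on records it cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context=None):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible again; after the queue's
            # maxReceiveCount is exceeded, SQS moves it to the dead-letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


if __name__ == "__main__":
    sample = {"Records": [
        {"messageId": "m1", "body": json.dumps({"order_id": 1})},
        {"messageId": "m2", "body": json.dumps({"note": "no id"})},
    ]}
    print(handler(sample))
```

Without partial batch responses, raising from the handler returns every message in the batch to the queue, including the ones that succeeded, so processing must be idempotent either way.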
Debugging Notes
- Cold start diagnosis: Lambda X-Ray traces include the Init duration separately from execution duration. INIT_START and INIT_REPORT in CloudWatch Logs identify cold starts, as does the Init Duration field on the REPORT line.
- Memory and timeout tuning: always set Lambda memory above your actual RSS usage. Lambda allocates CPU in proportion to memory, so a low memory setting starves the function of CPU even if it is nowhere near the memory limit. Use Lambda Power Tuning (open source tool) to find the optimal memory/cost balance.
- Lambda destination failures: async Lambda invocations that fail after all retry attempts can be routed to an on-failure destination via Lambda Destinations, or to a dead-letter queue (SQS queue or SNS topic) configured on the function. Do not rely only on a DLQ attached to the invocation source.
- /tmp across warm invocations: if your function writes to /tmp, a subsequent warm invocation on the same environment may find those files. Either clean up explicitly or treat /tmp as a single-invocation scratch space.
- Execution environment reuse across function versions: Lambda creates separate execution environment pools per function version (including $LATEST). Rolling deploys create a brief period where both old and new versions serve traffic.
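Cold starts can also be spotted offline by scanning exported log lines. The sketch below parses the REPORT line that Lambda writes per invocation and treats the presence of an Init Duration field as the cold start marker. The sample lines are illustrative, and the regular expression assumes the usual "Name: value unit" layout of that line.

```python
import re

FIELD = re.compile(
    r"(Init Duration|Billed Duration|Duration|Memory Size|Max Memory Used): ([\d.]+) (?:ms|MB)"
)


def parse_report(line: str) -> dict:
    """Extract the numeric fields of a REPORT line into a dict of floats."""
    return {name: float(value) for name, value in FIELD.findall(line)}


def is_cold_start(line: str) -> bool:
    """A REPORT line carries Init Duration only when the init phase ran."""
    return line.startswith("REPORT") and "Init Duration" in parse_report(line)


if __name__ == "__main__":
    cold = ("REPORT RequestId: 11111111-2222-3333-4444-555555555555\tDuration: 12.50 ms\t"
            "Billed Duration: 13 ms\tMemory Size: 512 MB\tMax Memory Used: 80 MB\t"
            "Init Duration: 187.50 ms")
    warm = ("REPORT RequestId: 66666666-7777-8888-9999-000000000000\tDuration: 3.10 ms\t"
            "Billed Duration: 4 ms\tMemory Size: 512 MB\tMax Memory Used: 80 MB")
    for line in (cold, warm):
        print(is_cold_start(line), parse_report(line))
```

Comparing Max Memory Used against Memory Size from the same line is a quick first check before running a full power-tuning sweep.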
Security Implications
- Each Lambda execution environment receives temporary credentials for the function's execution role (obtained via STS AssumeRole). In the standard runtimes these are exposed to the function as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN). Compromising a Lambda function gives an attacker its IAM role, so minimize role permissions.
- Lambda SnapStart snapshots include all in-memory state at snapshot time. Ensure no secrets, tokens, or sensitive data are in memory during the snapshot phase.
- Lambda does not expose the EC2 Instance Metadata Service (IMDS) to function code, so the classic SSRF-to-IMDS credential theft seen on EC2 does not apply directly. The credentials are still reachable by anything that can read the process environment: code injection, a local file read of /proc/self/environ, or a debug endpoint that dumps environment variables can steal Lambda's AWS credentials.
- VPC Lambda functions can access private resources in your VPC. Ensure VPC Security Groups for Lambda's ENIs follow least-privilege: only allow outbound to specific services on specific ports.
- Function code in ECR container images benefits from ECR image signing (cosign / Sigstore) to prevent tampering with the function's code at rest.
Performance Implications
- CPU scales with memory. A 128MB Lambda gets 1/14th of a vCPU. A 1769MB Lambda gets exactly 1 vCPU. A 3538MB Lambda gets 2 vCPUs. CPU-bound workloads should be profiled at different memory settings — often increasing memory from 1GB to 2GB halves wall-clock time, making it cost-neutral while reducing latency.
- Network throughput scales with memory allocation. 10GB Lambda functions get significantly more network bandwidth than 128MB functions.
- Avoid synchronous invocations in serial loops. Each Lambda invocation incurs ~1ms overhead for Lambda service routing. 1000 serial sub-invocations = 1 second of overhead minimum. Prefer batch processing or parallel fan-out with Promise.all / asyncio.gather.
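The last point can be illustrated with asyncio.gather. In this sketch `invoke` is a hypothetical stand-in for one downstream call (its latency is simulated with a sleep), and the semaphore bound is an assumption added so the fan-out does not exhaust the downstream function's concurrency.

```python
import asyncio
import time


async def invoke(item: int, latency_s: float = 0.01) -> int:
    """Hypothetical stand-in for one downstream invocation."""
    await asyncio.sleep(latency_s)
    return item * 2


async def serial(items):
    # Each call waits for the previous one, so latencies add up.
    return [await invoke(i) for i in items]


async def fan_out(items, max_in_flight: int = 50):
    gate = asyncio.Semaphore(max_in_flight)  # cap simultaneous calls

    async def bounded(i):
        async with gate:
            return await invoke(i)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(i) for i in items))


if __name__ == "__main__":
    items = list(range(100))
    for fn in (serial, fan_out):
        start = time.perf_counter()
        asyncio.run(fn(items))
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

With a blocking SDK client the same shape applies, but the calls would need to run in a thread pool rather than as coroutines.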
Failure Modes
- Concurrency limit throttling: when account concurrency limit is reached, Lambda returns HTTP 429 (TooManyRequestsException) to synchronous callers. Async callers are queued for up to 6 hours. Set reserved concurrency on critical functions to guarantee capacity.
- Downstream cascade: a Lambda function that synchronously calls another Lambda that times out will itself eventually time out. Timeouts add up along a call chain, so each caller's timeout must exceed the combined timeouts of everything it waits on. Design with circuit breakers (AWS SDK retry/backoff, or explicit circuit breaker logic).
- Payload size exceeded: 6MB synchronous limit. Functions that attempt to return larger responses fail with an error. Lambda Response Streaming (introduced 2023) bypasses this limit for HTTP invocations.
- Init phase failure: if init code throws an exception, Lambda retries the cold start up to 3 times, then reports an initialization error. All invocations during this period fail — a bad deployment can cause 100% error rate.
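For the throttling case, synchronous callers usually retry with exponential backoff and jitter. The AWS SDKs already do this for their own calls; the sketch below shows the underlying idea for callers that need explicit control. `ThrottledError` is a hypothetical stand-in for a 429 / TooManyRequestsException, and the delay parameters are illustrative.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 / TooManyRequestsException."""


def call_with_backoff(fn, max_attempts=5, base_delay_s=0.1, max_delay_s=5.0, sleep=time.sleep):
    """Call fn, retrying on throttling with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out retry storms


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ThrottledError()
        return "ok"

    print(call_with_backoff(flaky), "after", attempts["n"], "attempts")
```

Retries only help with transient throttling; if the function is throttled because the concurrency limit is simply too low, backoff just delays the failure.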
Modern Usage
EventBridge Pipes (2022) and EventBridge Scheduler have made serverless orchestration more composable — routing events between sources and targets without custom Lambda glue code. AWS Step Functions Express Workflows support Lambda orchestration at 100K executions/second, replacing hand-rolled state machines.
Lambda Function URLs (2022) provide direct HTTPS endpoints for Lambda without API Gateway, eliminating one abstraction layer for simple HTTP use cases. Response streaming (2023) lifts the 6MB response cap for streamed responses; buffered responses remain limited to 6MB.
Future Directions
- WebAssembly as a Lambda runtime: WASM's sub-millisecond startup and language-agnostic binary format make it an ideal FaaS substrate. AWS Lambda already supports WASM via custom runtimes; first-class support would bring Cloudflare Workers-style startup times to Lambda's security model.
- GPU-backed Lambda for ML inference: currently requires container-based Lambda; dedicated GPU execution environments are the next frontier for serverless ML.
- Longer execution times: the 15-minute limit is increasingly constraining for agentic AI workflows. AWS has hinted at extensions via Durable Execution patterns (Step Functions).
Exercises
- Build a Lambda function that processes messages from an SQS queue. Implement a dead-letter queue and observe the behavior when your function throws an exception. Verify the DLQ receives the message after the configured retry count.
- Use AWS X-Ray to capture a trace of a cold-start Lambda invocation. Identify Init duration vs. execution duration. Implement Provisioned Concurrency and verify cold starts are eliminated in the traces.
- Run the Lambda Power Tuning tool against a CPU-bound function (e.g., image compression). Plot the performance/cost curve across memory settings from 128MB to 3008MB.
- Create a Cloudflare Worker that rate-limits requests by IP using the Workers KV store. Measure cold start latency from multiple geographic locations using a latency testing tool.
- Compare Lambda cold start times for Python 3.12, Node.js 20, Java 21 (without SnapStart), Java 21 (with SnapStart), and a Go custom runtime using an identical simple HTTP handler. Document the differences and the optimal choice for each latency requirement.
References
- AWS re:Invent 2018: "A Serverless Journey: AWS Lambda Under the Hood" (SRV409)
- Firecracker: Lightweight Virtualization for Serverless Applications (NSDI 2020) — Agache et al.
- Cloudflare Workers: "How Workers Works" (blog.cloudflare.com/how-workers-works)
- AWS Lambda Power Tuning: https://github.com/alexcasalboni/aws-lambda-power-tuning
- "Serverless in the Wild" (ATC 2020) — Shahrad et al., Microsoft Research trace analysis of Azure Functions
- Google Cloud Run documentation: https://cloud.google.com/run/docs
- CNCF Serverless Whitepaper v1.0 (2018)
- AWS Lambda SnapStart deep-dive: https://aws.amazon.com/blogs/compute/reducing-java-cold-starts-on-aws-lambda-with-snapstart