Cross-Domain Relationship Map

How topics in different sections of the archive interconnect. This document is for readers who already have some background and want to see the architecture of the knowledge landscape — where concepts from one domain illuminate another.

Part 1: Domain-to-Domain Connection Tables

Kernel Concepts ↔ Security

Kernel Concept	Security Implication	Exploit Class	Mitigation
Virtual memory / paging	Exploit needs to locate code/data	Address harvesting	ASLR + KASLR
Page permissions (NX bit)	Data pages are non-executable, so injected shellcode cannot run directly	Stack/heap shellcode	NX/XD (no-execute)
Privilege levels (ring 0/3)	Kernel code runs with full privilege	Privilege escalation	SMEP/SMAP
Slab/SLUB allocator	Objects at predictable addresses enable spray	Heap spray, cross-cache	Randomized freelists, slab hardening
Speculative execution	CPU executes code speculatively, updates cache	Spectre, Meltdown	KPTI, retpoline, microcode
copy_to_user() / copy_from_user()	Kernel touches user memory	SMEP/SMAP bypass vectors	SMAP enforcement
Page table structure	Kernel pagetables visible to user (pre-KPTI)	Meltdown read of kernel	KPTI (separate user/kernel page tables)
Kernel function pointers	Overwriting a function pointer redirects execution	ROP, ret2usr	CFI, KASLR
Loadable modules	Unsigned modules can run in ring 0	Rootkit installation	Module signing, Secure Boot
eBPF verifier	BPF programs run in kernel	Verifier bugs → ring 0	Verifier hardening, unprivileged BPF off
syscall interface	Boundary between user and kernel	Syscall filtering bypass	seccomp-BPF
RCU grace periods	Objects freed before the grace period ends remain reachable by readers	UAF via RCU	RCU-protected object lifetime rules

Key insight: Almost every kernel subsystem becomes an attack surface. Security researchers must understand kernel internals as deeply as kernel engineers do — often more, because they must think about unintended interactions between subsystems.
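
One way to see which of the hardware-level mitigations in the table are active on a given machine is to read the status files that Linux exposes under /sys/devices/system/cpu/vulnerabilities. The following is a minimal sketch, assuming a Linux kernel recent enough to provide that directory; the function names read_mitigations and unmitigated are illustrative, not part of any library.

```python
from pathlib import Path

# Default sysfs location on Linux; other platforms will not have it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(vuln_dir: Path = VULN_DIR) -> dict[str, str]:
    """Map each reported CPU vulnerability name to the kernel's status string."""
    if not vuln_dir.is_dir():
        return {}  # not Linux, or a kernel that does not expose the directory
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def unmitigated(status: dict[str, str]) -> list[str]:
    """Names whose status string starts with 'Vulnerable'."""
    return [name for name, text in status.items() if text.startswith("Vulnerable")]


if __name__ == "__main__":
    for name, text in read_mitigations().items():
        print(f"{name:28s} {text}")
```

The status strings are free-form text written by the kernel (for example "Not affected" or "Mitigation: PTI"), so the prefix check above is a heuristic rather than a stable interface.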

Memory Management ↔ Performance

Memory Concept	Performance Implication	Tool to Measure	Optimization
TLB coverage	More TLB misses = more page walks	perf stat -e dTLB-load-misses	Huge pages (2MB/1GB), reduce working set
Cache line size (64B)	False sharing wastes cache bandwidth	perf c2c (cache-to-cache)	Pad hot data to cache line boundaries
NUMA locality	Remote NUMA access ~2x slower	numastat, perf mem	NUMA-aware allocation, thread pinning
Page fault cost	Major fault = disk I/O (ms); minor fault = µs	perf stat -e page-faults	mlock, pre-fault, madvise MADV_POPULATE_READ / MADV_POPULATE_WRITE
Allocator fragmentation	Repeated alloc/free fragments heap; poor cache locality	valgrind massif	jemalloc/tcmalloc with size-class tuning
Copy-on-write	fork() is fast; first write to shared page triggers copy	/proc/[pid]/status VmRSS vs VmSize	Minimize writes after fork; use vfork for exec-only
Swap latency	A single swap-in adds ms to latency	vmstat si/so	Disable swap for latency-sensitive, use zswap
slab cache reuse	Freshly freed slab objects are hot in cache	perf mem, SLUB stats	Keep allocator hot path per-CPU
mmap vs read	mmap avoids double-copy for large files	strace bytes, perf	Use mmap for files > 1 page; use O_DIRECT for large sequential
Huge page alignment	Misaligned huge page boundaries cause extra TLB entries	perf stat -e iTLB-load-misses	Align hot code sections; use THP

Key insight: The memory subsystem is the source of the majority of performance surprises in production. Cache misses, TLB pressure, NUMA imbalance, and allocator behavior are the four mechanisms — understand them at the hardware level to reason about application-level performance.
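
The TLB coverage row comes down to simple arithmetic: reach equals the number of TLB entries times the page size. The sketch below works through it for the three page sizes in the table. The entry count of 1536 is an assumption chosen for illustration; real TLB sizes vary by CPU and by TLB level.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space that can be translated without a page walk."""
    return entries * page_size


def pages_needed(working_set: int, page_size: int) -> int:
    """Distinct pages (hence TLB entries) touched by a contiguous working set."""
    return -(-working_set // page_size)  # ceiling division


ENTRIES = 1536  # assumed TLB size, for illustration only

for name, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    reach_mib = tlb_reach(ENTRIES, size) / MIB
    pages = pages_needed(GIB, size)
    print(f"{name} pages: reach {reach_mib:.0f} MiB, {pages} pages for a 1 GiB working set")
```

With these assumed numbers, 4 KiB pages cover only a few MiB, which is why a working set of hundreds of MiB benefits from huge pages.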

Scheduling ↔ Distributed Systems

Scheduler Concept	Distributed Analog	Connection
Preemption	Lease expiration / timeout	Both ensure no single task monopolizes a resource indefinitely
CPU quota (cgroups)	Rate limiting / token bucket	Both bound resource consumption per entity
Work stealing (Go, CFS)	Consistent hashing with rebalancing	Idle processors/nodes pull work from overloaded ones
Priority inversion	Head-of-line blocking	A high-priority task blocked behind a low-priority one; in distributed: a high-priority request stuck behind slow ones in a queue
Real-time deadline scheduling (SCHED_DEADLINE)	Deadline-aware routing (Google Borg)	Scheduling with explicit deadlines to guarantee bounded response time
CPU affinity / pinning	Data locality in distributed systems	Move computation near data to reduce transport cost
Scheduler runqueue depth	Queue depth in message queues	Both are signals of saturation; high depth = latency spike
Idle task / C-states	Backpressure and flow control	Systems signal "no work" to reduce energy/overhead; distributed: backpressure prevents overload
Gang scheduling (MPI/GPU)	Coordinated reservation (YARN reservations, Kubernetes coscheduling / PodGroup)	Multiple tasks that must run simultaneously require coordinated scheduling
CFS fairness via vruntime	Weighted fair queuing	Both assign each entity a fair share of a shared resource over time

Key insight: Distributed systems are, at a deep level, resource scheduling problems at scale. The mental models for CPU scheduling (fairness, priorities, work stealing, deadlines) translate directly to distributed resource management. A scheduler is a distributed system with one shared resource; a distributed system is a scheduler with many resources.
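
The vruntime row can be illustrated with a toy simulation: always run the entity with the smallest virtual runtime, and advance that virtual runtime more slowly for heavier entities. This is a simplified sketch of the idea, not the kernel's implementation; the fixed time slice and the simulate function are assumptions made for illustration. The same loop describes weighted fair queuing if "task" is read as "flow" and "slice" as "packet".

```python
import heapq

NICE_0_WEIGHT = 1024  # reference weight; a task with this weight ages at wall-clock rate


def simulate(weights: dict[str, int], slices: int, slice_ns: int = 1_000_000) -> dict[str, int]:
    """Run `slices` time slices, always picking the smallest vruntime.

    Returns the nanoseconds of CPU time each task received.
    """
    # Heap entries are (vruntime, name); the name breaks ties deterministically.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    cpu_ns = {name: 0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(heap)
        cpu_ns[name] += slice_ns
        # Heavier tasks accumulate virtual time more slowly, so they are picked more often.
        vruntime += slice_ns * NICE_0_WEIGHT / weights[name]
        heapq.heappush(heap, (vruntime, name))
    return cpu_ns


if __name__ == "__main__":
    usage = simulate({"batch": 1024, "interactive": 2048}, slices=300)
    total = sum(usage.values())
    for name, ns in usage.items():
        print(f"{name}: {ns / total:.1%} of CPU")
```

Over a long enough run, each task's share of CPU time approaches its weight divided by the sum of all weights.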

Networking ↔ Cloud Infrastructure

Networking Concept	Cloud Infrastructure Manifestation	What Actually Happens
ARP and MAC learning	VPC virtual switch behavior	Cloud VPCs use SDN; no actual ARP broadcasts — overlay control plane answers ARP
BGP routing	AWS/GCP transit backbone, Direct Connect	Cloud providers use BGP internally between PoPs; customers run BGP sessions to provider edge routers
TCP connection establishment	Load balancer connection pooling	ALB/NLB terminate TCP; connection to backend is a separate TCP connection
NAT	NAT gateway + Elastic IP	Cloud instances have private IPs; DNAT/SNAT rules map them to public IPs
MTU and fragmentation	VPC overlay (VXLAN) MTU reduction	VXLAN adds 50 bytes of overhead; on a 1500-byte underlay, inner packets must be at most 1450 bytes to avoid fragmentation
TCP congestion control	Cross-AZ bandwidth pricing	AWS charges for cross-AZ traffic because bandwidth is a shared, constrained resource
DNS TTL	Service discovery and load balancing	Low TTL for rapid failover; high TTL for caching; Kubernetes DNS TTL ≈ 5s
UDP multicast	Replaced by overlay unicast in cloud	Most cloud VPCs don't support multicast natively; applications typically use gossip/unicast instead
ECMP load balancing	NLB and Route53 routing	Multiple equal-cost paths; stateless per-flow hashing
SR-IOV and VFIO	EC2 Enhanced Networking (ENA)	NIC hardware partitioned into virtual functions; bypasses hypervisor for lower latency
RDMA over Converged Ethernet (RoCE)	AWS EFA, Azure InfiniBand, GCP GPUDirect	Required for tightly coupled HPC and AI training workloads
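
The 50-byte VXLAN figure in the MTU row is the sum of the headers that encapsulation adds inside the underlay's IP MTU. The sketch below spells out that sum, assuming an IPv4 underlay with no VLAN tags and no IP options; the function names are illustrative.

```python
# Header sizes in bytes (no VLAN tags, no IP options).
INNER_ETHERNET = 14  # the encapsulated frame's own Ethernet header
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
OUTER_IPV6 = 40


def vxlan_overhead(outer_ipv6: bool = False) -> int:
    """Bytes consumed by encapsulation inside the underlay IP MTU."""
    outer_ip = OUTER_IPV6 if outer_ipv6 else OUTER_IPV4
    return INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + outer_ip


def inner_mtu(underlay_mtu: int, outer_ipv6: bool = False) -> int:
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    return underlay_mtu - vxlan_overhead(outer_ipv6)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"underlay MTU {mtu}: inner MTU {inner_mtu(mtu)}")
```

An IPv6 underlay adds another 20 bytes, which is one reason overlay networks often run on jumbo-frame underlays.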

Filesystems ↔ Databases

Filesystem Concept	Database Analog	Notes
WAL (Write-Ahead Log)	WAL / redo log	Both log changes before applying to data pages; enables crash recovery
Copy-on-Write (btrfs/ZFS)	MVCC snapshot isolation	Both maintain multiple versions of data; old versions for readers, new for writers
B-tree (ext4/NTFS index)	B+-tree storage engine (InnoDB, WiredTiger)	Filesystem directory index uses same structure as DB primary index
Journaling (ext4, XFS)	ARIES recovery	Both replay a log after a crash and checkpoint to bound recovery time; ARIES logs physiologically, while ext4's jbd2 logs whole blocks
Page cache (kernel)	Buffer pool (InnoDB, PostgreSQL)	Both cache disk pages in memory; both face the same replacement policy problem
Dentry cache (dcache)	Catalog cache / table handle cache	Both cache frequently accessed metadata to avoid disk lookups
Fragmentation (free space)	Heap fragmentation in storage engine	Both lead to performance degradation over time; both require periodic compaction/defrag
Atomic rename (rename())	Atomic table rename / online DDL	Both use an atomic filesystem operation to swap the new version in
Superblock / fsck	Database catalog / consistency check	Both have a "root of truth" for the filesystem/database state
mmap() for file access	mmap-based storage engine (LMDB)	LMDB maps the database file directly; avoids double buffering between kernel and DB
Sparse files	Sparse/compressed tables	Both allow logical size >> physical size
overlayfs layers	LSM-tree compaction layers	Both have a layered read/write model; overlayfs: upper/lower; LSM: L0→L6

Key insight: A database storage engine is, in essence, a userspace filesystem. The concepts developed for filesystems (journaling, page cache, B-trees, compaction) were either borrowed by or independently invented by database designers solving the same fundamental problems.
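
The atomic rename row is the basis of a common pattern for crash-safe file updates: write a complete new copy next to the target, flush it to disk, then rename it over the old one. The sketch below shows that pattern under POSIX rename semantics; atomic_write is a hypothetical helper name, and the directory fsync step is skipped on non-POSIX systems.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is durable before it becomes visible
        os.replace(tmp_path, path)  # the atomic swap
    except BaseException:
        # Leave the old file untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Persist the directory entry so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A database that builds a new table file and swaps it in follows the same order of operations: write, flush, rename, flush the directory.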

Virtualization ↔ Containers ↔ Cloud

                    ISOLATION SPECTRUM
    ┌────────────────────────────────────────────────────────────┐
    │                                                            │
    │  Bare Metal    VM (KVM)    Kata Container  Container (runc)│
    │      │            │              │                │        │
    │    No OS     Guest OS      Micro-VMM +       Linux         │
    │   overhead    overhead     mini kernel      namespaces     │
    │                            overhead          overhead      │
    │    ~0ms       ~5ms         ~100ms              ~1ms        │
    │   isolation  isolation     isolation          isolation    │
    │                                                            │
    └────────────────────────────────────────────────────────────┘

    Security boundary strength: VM > Kata > Container > Process
    Density per host:           Container > VM > Bare metal
    Boot time:                  Container < VM < Bare metal (instance)

Concept	Virtualization	Container	Cloud Manifestation
Isolation boundary	Hardware virt + EPT	Linux namespaces	AWS isolates tenants with Nitro (bare metal-equivalent)
Resource limits	VM vCPU/RAM allocation	cgroup v2 CPU/memory	EC2 instance types define both
Networking	virtio-net → bridge → physical	veth → bridge → physical	All overlay'd by VPC SDN
Storage	virtio-blk → LVM → disk	bind mount / volume	EBS is a network block device behind virtio
Image format	QCOW2 / VMDK	OCI image (layers)	AMI = EBS snapshot; Docker image on ECR
Live migration	KVM/QEMU live migration (over TCP, optionally RDMA)	Pod rescheduling (not live)	AWS evacuates instances; pods restart
Multi-tenancy	Multiple VMs per host	Multiple pods per node	Cloud accounts/projects as isolation boundaries
Escape vulnerabilities	VM escape (VENOM); cross-VM side channels (Spectre)	Container escape (runc CVE-2019-5736)	Cloud: cross-tenant attacks are critical severity
Density driver	Memory deduplication (KSM)	Shared image layers (overlayfs)	Cloud: bin-packing algorithms for profit margin

Key insight: The evolution from VMs to containers to serverless is a progressive trade of isolation strength for density and startup latency. Understanding this tradeoff — and what isolation mechanism each boundary provides — is essential for cloud security architecture.
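
On the container side, the resource limits row is visible in the cgroup v2 interface files cpu.max and memory.max. The sketch below parses the text those files contain ("<quota> <period>" or "max <period>" for CPU, a byte count or "max" for memory). The function names are illustrative, and locating the files for a particular container is left out.

```python
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit in cores, or None when the quota is 'max' (unlimited)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text: str) -> Optional[int]:
    """Return the memory limit in bytes, or None when it is 'max' (unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    print(parse_cpu_max("50000 100000"))    # half a core
    print(parse_memory_max("1073741824"))   # 1 GiB
```

A VM expresses the same limits as a fixed vCPU count and RAM size at boot, whereas these cgroup values can be rewritten while the workload runs.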

Hardware ↔ Performance Optimization

Hardware Reality	Performance Implication	Code-Level Response
L1 cache: ~64KB, 4 cycles	Hot data must fit in L1	Struct layout to minimize footprint; SOA over AOS
L3 cache: ~32MB, ~30 cycles	Working set > 32MB → LLC thrash	Partition work to fit; cache-oblivious algorithms
DRAM: ~100ns, ~40GB/s	Streaming loops cap near 5G 8B accesses/s (40GB/s ÷ 8B); dependent loads cap near 10M/s (1 ÷ 100ns)	Vectorize loops; prefetch; use HBM for AI workloads
Cache line 64B	Loading one byte loads 64B	Pack related data together; avoid pointer chasing
NUMA remote: ~200ns	Memory placed on a remote node costs roughly 2x per access	numactl; NUMA-aware allocators; pin threads to prevent migration
Branch predictor: ~98% accuracy	Misprediction flushes pipeline (~15 cycles)	Sort data before conditional loops; use branchless code
Superscalar (4-wide): 4 IPC theoretical	Serial dependency chains prevent IPC > 1	Break dependency chains; unroll loops
SIMD AVX-512: 512-bit vectors	Scalar code reaches 1/8 (64-bit elements) to 1/16 (32-bit elements) of potential throughput	Auto-vectorization or intrinsics; aligned data
HyperThreading: 2 logical per physical	Sharing execution resources with sibling	Bind latency-critical to one logical core; disable HT for RT
PCIe 4.0 x16: 32 GB/s	GPU copy bottleneck if frequent small transfers	Batch transfers; use unified memory; GPUDirect to bypass CPU
NVMe queue depth: 65535	Avoid using depth 1 (defeats NVMe parallelism)	Use io_uring with multiple inflight requests
HBM bandwidth: ~3 TB/s	Many AI training kernels are memory-bandwidth-bound rather than compute-bound	Maximize arithmetic intensity (FLOPS/byte); kernel fusion
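
The advice to maximize arithmetic intensity follows from the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth times FLOPs per byte. The sketch below uses the ~40 GB/s DRAM figure from the table; the 1 TFLOP/s peak is an assumed number for illustration only.

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound: min(compute roof, bandwidth in bytes/s * FLOPs per byte)."""
    return min(peak_flops, bandwidth * intensity)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth


PEAK = 1e12        # assumed peak, 1 TFLOP/s (illustrative)
DRAM_BW = 40e9     # ~40 GB/s, from the table above

if __name__ == "__main__":
    # y = a*x + y on float32: 2 FLOPs per 12 bytes moved (read x, read y, write y).
    saxpy = 2 / 12
    print(f"ridge point: {ridge_point(PEAK, DRAM_BW):.1f} FLOPs/byte")
    print(f"saxpy bound: {attainable_flops(PEAK, DRAM_BW, saxpy) / 1e9:.2f} GFLOP/s")
```

Under these assumptions a streaming kernel such as saxpy sits far below the ridge point, so faster arithmetic would not help it; fusing it with neighboring operations to raise FLOPs per byte would.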

Runtime Systems ↔ AI Infrastructure

Runtime Concept	AI Infrastructure Manifestation
JIT compilation (JVM HotSpot C2, V8 TurboFan)	torch.compile (PyTorch 2.0), XLA compilation for TPU
Garbage collection pauses	Training jitter from Python GC during backward pass
GIL (CPython)	Limits Python data loading throughput; workaround: multiprocessing
Memory allocator (jemalloc)	PyTorch uses CachingAllocator to reuse GPU memory blocks
Thread pool (Java Executor)	CUDA stream pool for overlapping compute and data transfer
Reference counting (Python)	Tensor lifetime management; lingering references keep GPU memory allocated and look like leaks
Coroutines / async/await	Async data prefetching to overlap I/O with GPU compute
Escape analysis (JVM)	PyTorch graph capture avoids Python overhead for hot loops
Tiered compilation (C1→C2)	PyTorch eager → torch.compile → TensorRT (progressive optimization)
Stack allocation (Rust)	On-device constant memory for weights avoids DRAM round-trips
Interpreter overhead (CPython)	PyTorch 2.0 traces out the Python overhead; Triton for custom kernels
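
The async prefetching row can be sketched with only the standard library: a background thread fills a bounded queue while the consumer works on the current batch. This is a simplified illustration, not how any particular framework's data loader is implemented; prefetch is a hypothetical helper. Because of the GIL, the overlap only helps when loading spends its time in I/O or in code that releases the GIL.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


def prefetch(batches: Iterable[T], depth: int = 2) -> Iterator[T]:
    """Yield items from `batches`, loading up to `depth` ahead in a background thread."""
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for batch in batches:
                buffer.put(batch)  # blocks when the buffer is full (backpressure)
            buffer.put(_DONE)
        except BaseException as exc:
            buffer.put(exc)  # surface loader errors in the consumer

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, BaseException):
            raise item
        yield item


if __name__ == "__main__":
    for batch in prefetch(range(5)):
        print("training on batch", batch)
```

If the consumer stops early, the daemon producer thread stays blocked on the full queue until the process exits; a production loader would add an explicit shutdown signal.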

Part 2: Domain Cluster ASCII Diagrams

The Full Stack: From Transistors to Distributed AI

┌────────────────────────────────────────────────────────────────┐
│                    AI TRAINING / INFERENCE                     │
│  Distributed training │ Model serving │ Checkpoint storage     │
└───────────────────────┬────────────────────────────────────────┘
                        │  calls into
┌───────────────────────▼────────────────────────────────────────┐
│              RUNTIME SYSTEMS & COMPILERS                       │
│   PyTorch / TF │ CUDA Graphs │ XLA │ Triton │ torch.compile    │
└───────────────────────┬────────────────────────────────────────┘
                        │  calls into
┌───────────────────────▼────────────────────────────────────────┐
│          OPERATING SYSTEM (Linux + CUDA Driver)                │
│  Scheduler │ Memory Manager │ CUDA Driver │ Network Driver     │
└──────────┬─────────────────────┬──────────────────┬───────────┘
           │                     │                  │
┌──────────▼──────┐  ┌───────────▼──────┐  ┌───────▼────────┐
│   CPU + DRAM    │  │   GPU + HBM      │  │  NIC + Fabric  │
│  (x86 / ARM64)  │  │  (A100 / H100)   │  │ (InfiniBand /  │
│  NUMA, PCIe     │  │  NVLink/NVSwitch │  │  Ethernet RoCE)│
└─────────────────┘  └──────────────────┘  └────────────────┘

Cross-Domain Security Attack Surface

┌──────────────────────────────────────────────────────────────┐
│                    ATTACK SURFACE MAP                        │
│                                                              │
│  [Hardware] ──Spectre/Meltdown──► [Kernel] ──SMEP bypass──► │
│       │                               │                      │
│  [Microcode]                  [Kernel Exploits]              │
│  [CPU Bugs]                          │                       │
│                               [Container Escape]             │
│                                      │                       │
│                               [Cloud Escape]                 │
│                                      │                       │
│  [Supply Chain] ──SBOM───────► [Image Signing]               │
│  [Compiler]     ──Backdoors──► [Reproducible Builds]         │
│  [Dependency]                                                │
└──────────────────────────────────────────────────────────────┘

Observability Connects All Domains

                    ┌──────────────────┐
                    │  OBSERVABILITY   │
                    │  (metrics/logs/  │
                    │  traces/profiles)│
                    └────────┬─────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
    ┌─────▼─────┐    ┌───────▼──────┐   ┌──────▼──────┐
    │  Kernel   │    │  Application │   │  Hardware   │
    │  (eBPF,   │    │  (OTel,      │   │  (perf PMU, │
    │  ftrace,  │    │  pprof,      │   │  Nsight,    │
    │  perf)    │    │  JFR)        │   │  turbostat) │
    └─────┬─────┘    └───────┬──────┘   └──────┬──────┘
          │                  │                  │
          └──────────────────▼──────────────────┘
                    ┌────────────────┐
                    │  CORRELATION   │
                    │ (TraceContext,│
                    │  exemplars,    │
                    │  corr. IDs)    │
                    └────────────────┘

Part 3: Unified Mental Model

The following is a systems-level mental model that unifies all domains in the archive into a single coherent framework.

The Fundamental Abstraction Stack

Every computer system can be understood as a series of abstraction layers, where each layer:

1. Hides complexity from the layer above
2. Exposes a simpler, more powerful interface
3. Introduces overhead (the abstraction tax)
4. Creates a trust boundary (privilege separation)

Layer 7: Applications (user intent)
           ↑ hides: all implementation
Layer 6: Runtime Systems (managed memory, GC, JIT)
           ↑ hides: memory safety, calling conventions
Layer 5: Operating System (processes, files, network)
           ↑ hides: hardware multiplexing, isolation
Layer 4: Kernel (syscalls, drivers, scheduling)
           ↑ hides: interrupt handling, DMA, device protocols
Layer 3: Firmware (UEFI/BIOS, device firmware)
           ↑ hides: hardware initialization
Layer 2: Microarchitecture (pipelines, caches, speculative execution)
           ↑ hides: instruction-level parallelism
Layer 1: Digital Logic (gates, flip-flops, buses)
           ↑ hides: analog behavior
Layer 0: Physics (electrons, semiconductor physics)

The Three Universal Constraints

Every systems problem, regardless of domain, is ultimately constrained by one or more of:

Bandwidth — how much data can move per second (memory bandwidth, network bandwidth, storage throughput, PCIe bandwidth, GPU memory bandwidth)
Latency — how long a unit of work takes end-to-end (cache miss latency, network RTT, disk seek time, context switch time, GC pause)
Coordination cost — the overhead of multiple entities working together (lock contention, consensus rounds, cache coherence traffic, network round trips for distributed transactions)

Performance engineering is the art of determining which constraint is binding, then attacking it at the right layer.
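
For a single transfer, deciding which constraint is binding can be reduced to comparing the fixed latency with the time spent moving bytes. The sketch below uses a first-order model (time = latency + size / bandwidth) and ignores coordination cost; the function names and the example link parameters are assumptions for illustration.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth: float) -> float:
    """First-order cost of one transfer: fixed latency plus serialization time."""
    return latency_s + size_bytes / bandwidth


def binding_constraint(size_bytes: float, latency_s: float, bandwidth: float) -> str:
    """Name the term that dominates the transfer time."""
    serialization = size_bytes / bandwidth
    return "latency" if latency_s > serialization else "bandwidth"


def crossover_size(latency_s: float, bandwidth: float) -> float:
    """Transfer size at which latency and serialization time are equal."""
    return latency_s * bandwidth


if __name__ == "__main__":
    rtt, link = 0.0005, 1.25e9  # assumed: 0.5 ms round trip, 10 Gbit/s link
    for size in (4096, 2**20, 2**30):
        print(size, binding_constraint(size, rtt, link))
```

Small transfers are latency-bound and benefit from batching; large transfers are bandwidth-bound and benefit from compression or a faster link.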

The Three Universal Safety Properties

Every security problem is fundamentally about one or more of:

Confidentiality — who can read what (memory isolation, encryption, access control)
Integrity — who can modify what (write protection, hash verification, signatures)
Availability — who can use what when (DoS resistance, resource limits, circuit breakers)

The Three Universal Reliability Properties

Every reliability problem is fundamentally about one or more of:

Failure detection — how quickly do we know something is wrong (health checks, metrics, alerts)
Failure isolation — how do we stop a failure from spreading (bulkheads, circuit breakers, namespaces)
Failure recovery — how do we restore normal operation (restart, failover, rollback)
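
The three properties can be seen together in a circuit breaker: it detects failure by counting errors, isolates it by refusing calls while open, and recovers by letting a trial call through after a cool-down. The following is a minimal single-threaded sketch; the class, its thresholds, and the injectable clock are assumptions for illustration rather than any library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Isolation: fail fast instead of piling load on a sick dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            # Detection: count consecutive failures.
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open and restart the cool-down
            raise
        # Recovery: a success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing the clock as a parameter keeps the sketch testable without sleeping; a concurrent version would also need a lock around the state updates.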

How All Domains Interconnect: The Dependency Web

[Physics/Transistors]
       │
       ▼
[Digital Logic / Gates]
       │
       ▼
[CPU Microarchitecture] ←─────────────── [DRAM / Storage Devices]
  │         │                                    │
  │         ▼                                    │
  │  [Cache Hierarchy / Memory Ordering]         │
  │         │                                    │
  ▼         ▼                                    ▼
[ISA / Privilege Levels]          [Storage I/O Stack]
       │                                    │
       ▼                                    ▼
[Kernel: Scheduler, Memory Mgr,     [Filesystems / Databases]
 Process Mgr, Network Stack,              │
 Device Drivers]                          │
       │         │                        │
       ▼         ▼                        ▼
[Virtualization]  [Containers]  [Distributed Systems]
       │              │                   │
       ▼              ▼                   ▼
       └──────────► [Cloud Infrastructure / Kubernetes]
                          │
                          ▼
              [Application Runtimes: JVM, Python, Go, Rust]
                          │
                          ▼
                [AI Infrastructure: GPU Clusters, Training,
                 Model Serving, Inference Optimization]

CROSS-CUTTING CONCERNS (affect all layers):
  [Security]         — threat model at every layer
  [Observability]    — instrumentation at every layer
  [Performance]      — constraints and bottlenecks at every layer
  [Reliability]      — failure modes and recovery at every layer

The Practitioner's Heuristic

When you encounter a system problem you don't understand, ask in order:

1. What resource is exhausted or contended? (CPU, memory, network, storage, lock)
2. Which layer is the bottleneck? (hardware, kernel, runtime, application, network, distributed)
3. What is the trust boundary involved? (user/kernel, VM/host, container/host, service/service)
4. What is the failure mode? (crash, corruption, performance degradation, availability loss)
5. What is the correct abstraction level to fix it? (hardware config, kernel tuning, application change, architecture change)

This heuristic works because every domain in this archive — from kernel scheduling to Kubernetes pod placement, from CUDA memory coalescing to TCP congestion control — is an instance of the same underlying pattern: multiple entities competing for shared resources, with a mechanism for arbitration, isolation, and recovery.

Understanding that pattern at every layer of the stack is the goal of this archive.