Cross-Domain Relationship Map

How topics in different sections of the archive interconnect. This document is for readers who already have some background and want to see the architecture of the knowledge landscape — where concepts from one domain illuminate another.


Part 1: Domain-to-Domain Connection Tables

Kernel Concepts ↔ Security

| Kernel Concept | Security Implication | Exploit Class | Mitigation |
|---|---|---|---|
| Virtual memory / paging | Exploit needs to locate code/data | Address harvesting | ASLR + KASLR |
| Page permissions (NX bit) | Writable data pages must not be executable, so injected shellcode cannot run | Stack/heap shellcode | NX/XD (no-execute) |
| Privilege levels (ring 0/3) | Kernel code runs with full privilege | Privilege escalation | SMEP/SMAP |
| Slab/SLUB allocator | Objects at predictable addresses enable spray | Heap spray, cross-cache | Randomized freelists, slab hardening |
| Speculative execution | Speculatively executed code leaves observable cache state | Spectre, Meltdown | KPTI, retpoline, microcode |
| copy_to_user / copy_from_user | Kernel touches user memory | SMEP/SMAP bypass vectors | SMAP enforcement |
| Page table structure | Kernel mappings present in user page tables (pre-KPTI) | Meltdown read of kernel memory | KPTI (separate user/kernel page tables) |
| Kernel function pointers | Overwriting a function pointer redirects execution | ROP, ret2usr | CFI, KASLR |
| Loadable modules | Unsigned modules can run in ring 0 | Rootkit installation | Module signing, Secure Boot |
| eBPF verifier | BPF programs run in kernel | Verifier bugs → ring 0 | Verifier hardening, unprivileged BPF off |
| syscall interface | Boundary between user and kernel | Syscall filtering bypass | seccomp-BPF |
| RCU grace periods | Objects freed before the grace period ends are still reachable by readers | UAF via RCU | RCU-protected object lifetime rules |

Key insight: Almost every kernel subsystem becomes an attack surface. Security researchers must understand kernel internals as deeply as kernel engineers do — often more, because they must think about unintended interactions between subsystems.
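
As a small illustration of how the hardware side of these mitigations can be inspected, the sketch below checks for the `nx`, `smep`, and `smap` CPU flags. It assumes an x86 Linux host where `/proc/cpuinfo` has a `flags` line; the function name `hardware_mitigations` is made up for this example, and a present flag only shows that the CPU supports the feature, not that the kernel enforces it.

```python
def hardware_mitigations(cpuinfo_text):
    """Report which of the NX/SMEP/SMAP flags appear in /proc/cpuinfo-style text.

    Assumes the x86 layout, where each CPU has a line such as
    "flags : fpu vme ... nx ... smep ... smap".
    """
    wanted = ("nx", "smep", "smap")
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present.update(value.split())
    return {flag: flag in present for flag in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(hardware_mitigations(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this platform")
```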


Memory Management ↔ Performance

| Memory Concept | Performance Implication | Tool to Measure | Optimization |
|---|---|---|---|
| TLB coverage | More TLB misses = more page walks | perf stat -e dTLB-load-misses | Huge pages (2MB/1GB), reduce working set |
| Cache line size (64B) | False sharing wastes cache bandwidth | perf c2c (cache-to-cache) | Pad hot data to cache line boundaries |
| NUMA locality | Remote NUMA access ~2x slower | numastat, perf mem | NUMA-aware allocation, thread pinning |
| Page fault cost | Major fault = disk I/O (ms); minor fault = µs | perf stat -e page-faults | mlock, pre-fault, madvise with MADV_POPULATE_READ / MADV_POPULATE_WRITE |
| Allocator fragmentation | Repeated alloc/free fragments heap; poor cache locality | valgrind --tool=massif | jemalloc/tcmalloc with size-class tuning |
| Copy-on-write | fork() is fast; first write to shared page triggers copy | /proc/[pid]/status VmRSS vs VmSize | Minimize writes after fork; use vfork for exec-only |
| Swap latency | A single swap-in adds ms to latency | vmstat si/so | Disable swap for latency-sensitive workloads, or use zswap |
| Slab cache reuse | Freshly freed slab objects are hot in cache | perf mem, SLUB stats | Keep allocator hot path per-CPU |
| mmap vs read | mmap avoids the extra copy from page cache to a user buffer for large files | strace bytes, perf | Use mmap for files > 1 page; use O_DIRECT for large sequential I/O |
| Huge page alignment | Misaligned huge page boundaries cause extra TLB entries | perf stat -e iTLB-load-misses | Align hot code sections; use THP |

Key insight: The memory subsystem is the source of the majority of performance surprises in production. Cache misses, TLB pressure, NUMA imbalance, and allocator behavior are the four mechanisms — understand them at the hardware level to reason about application-level performance.
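
The TLB row can be made concrete with a back-of-the-envelope calculation: TLB reach is the number of entries times the page size. The sketch below assumes a hypothetical 1536-entry second-level TLB; real entry counts vary by CPU and often differ per page size, so treat the numbers as illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries, page_size):
    """Memory that can be mapped without a TLB miss: entries * page size."""
    return entries * page_size


def pages_needed(working_set, page_size):
    """Number of pages (and so TLB entries) needed to cover a working set."""
    return -(-working_set // page_size)  # ceiling division


if __name__ == "__main__":
    entries = 1536  # hypothetical TLB size, not a measured value
    for name, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
        reach = tlb_reach_bytes(entries, size)
        print(f"{name} pages: reach = {reach / MIB:.0f} MiB, "
              f"1 GiB working set needs {pages_needed(GIB, size)} entries")
```

With 4 KiB pages a 1 GiB working set needs far more entries than any TLB holds, while 2 MiB pages bring it down to a few hundred, which is why huge pages appear in the optimization column.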


Scheduling ↔ Distributed Systems

| Scheduler Concept | Distributed Analog | Connection |
|---|---|---|
| Preemption | Lease expiration / timeout | Both ensure no single task monopolizes a resource indefinitely |
| CPU quota (cgroups) | Rate limiting / token bucket | Both bound resource consumption per entity |
| Work stealing (Go, CFS) | Consistent hashing with rebalancing | Idle processors/nodes pull work from overloaded ones |
| Priority inversion | Head-of-line blocking | Scheduler: a high-priority task waits on a resource held by a low-priority one; distributed: a high-priority request is stuck behind slow ones in a queue |
| Real-time deadline scheduling (SCHED_DEADLINE) | Deadline-aware routing (Google Borg) | Scheduling with explicit deadlines to guarantee bounded response time |
| CPU affinity / pinning | Data locality in distributed systems | Move computation near data to reduce transport cost |
| Scheduler runqueue depth | Queue depth in message queues | Both are signals of saturation; high depth = latency spike |
| Idle task / C-states | Backpressure and flow control | Scheduler: an idle CPU signals "no work" and drops into a low-power state; distributed: backpressure signals "no capacity" to prevent overload |
| Gang scheduling (MPI/GPU) | Coordinated reservation (YARN, k8s ResourceQuota) | Multiple tasks that must run simultaneously require coordinated scheduling |
| CFS fairness via vruntime | Weighted fair queuing | Both assign each entity a fair share of a shared resource over time |

Key insight: Distributed systems are, at a deep level, resource scheduling problems at scale. The mental models for CPU scheduling (fairness, priorities, work stealing, deadlines) translate directly to distributed resource management. A scheduler is a distributed system with one shared resource; a distributed system is a scheduler with many resources.
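
To make the vruntime and weighted-fair-queuing row concrete, here is a toy simulation of the core idea: always run the entity with the smallest virtual runtime, and advance its virtual runtime more slowly the heavier its weight. This is a simplified sketch, not the kernel's implementation; the `simulate` function, the fixed slice length, and the task names are assumptions made for the example. The same loop describes a weighted fair queue if "task" is read as "flow".

```python
import heapq

NICE_0_WEIGHT = 1024  # reference weight; a task at this weight ages at wall-clock rate


def simulate(weights, ticks, slice_ns=1_000_000):
    """Run `ticks` fixed-length slices and return how many each task received.

    `weights` maps task name -> weight. Each slice goes to the task with the
    smallest virtual runtime; ties are broken by name for determinism.
    """
    heap = [(0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    ran = {name: 0 for name in weights}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)
        ran[name] += 1
        # Heavier tasks accumulate virtual time more slowly, so they are
        # picked more often and end up with a proportionally larger share.
        vruntime += slice_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(heap, (vruntime, name))
    return ran


if __name__ == "__main__":
    print(simulate({"batch": 1024, "interactive": 2048}, ticks=300))
```

Over a long enough run, each task's share of slices converges to its weight divided by the sum of weights.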


Networking ↔ Cloud Infrastructure

| Networking Concept | Cloud Infrastructure Manifestation | What Actually Happens |
|---|---|---|
| ARP and MAC learning | VPC virtual switch behavior | Cloud VPCs use SDN; there are no actual ARP broadcasts, and the overlay control plane answers ARP |
| BGP routing | AWS/GCP transit backbone, Direct Connect | Cloud providers use BGP internally between PoPs; customers run BGP sessions to provider routers |
| TCP connection establishment | Load balancer connection pooling | ALB terminates TCP and opens a separate connection to the backend; NLB forwards flows at layer 4 and terminates only on TLS listeners |
| NAT | Elastic IP, NAT gateway | Cloud instances have private IPs; DNAT/SNAT rules map them to public IPs |
| MTU and fragmentation | VPC overlay (VXLAN) MTU reduction | VXLAN adds 50 bytes of overhead; on a 1500-byte underlay, inner packets must be at most 1450 bytes to avoid fragmentation |
| TCP congestion control | Cross-AZ bandwidth pricing | AWS charges for cross-AZ traffic because bandwidth is a shared, constrained resource |
| DNS TTL | Service discovery and load balancing | Low TTL for rapid failover; high TTL for caching; Kubernetes DNS TTL ≈ 5s |
| UDP multicast | Replaced by overlay unicast in cloud | Most cloud VPCs don't support multicast natively; applications typically fall back to gossip/unicast |
| ECMP load balancing | NLB and Route53 routing | Multiple equal-cost paths; stateless per-flow hashing |
| SR-IOV and VFIO | EC2 Enhanced Networking (ENA) | NIC hardware partitioned into virtual functions; bypasses hypervisor for lower latency |
| RDMA over Converged Ethernet (RoCE) | AWS EFA, Azure InfiniBand, GCP GPUDirect | Required for tightly coupled HPC and AI training workloads |
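
The 50-byte VXLAN figure in the MTU row can be derived from the encapsulation headers. The sketch below assumes an IPv4 underlay with no outer IP options and no inner VLAN tag; the constant and function names are illustrative only.

```python
# Headers that VXLAN wraps around the tenant's IP packet (sizes in bytes).
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header carrying the 24-bit VNI
INNER_ETHERNET = 14  # the encapsulated frame's own Ethernet header


def vxlan_overhead(outer_ip_header=OUTER_IPV4):
    """Bytes of the underlay IP MTU consumed by encapsulation."""
    return outer_ip_header + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def inner_mtu(underlay_mtu, outer_ip_header=OUTER_IPV4):
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    return underlay_mtu - vxlan_overhead(outer_ip_header)


if __name__ == "__main__":
    print(vxlan_overhead())                    # IPv4 underlay
    print(inner_mtu(1500))                     # standard Ethernet underlay
    print(inner_mtu(9000))                     # jumbo-frame underlay
    print(inner_mtu(1500, outer_ip_header=40)) # IPv6 underlay
```

An IPv6 underlay costs 20 more bytes, which is one reason overlay networks often run the underlay with jumbo frames.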

Filesystems ↔ Databases

| Filesystem Concept | Database Analog | Notes |
|---|---|---|
| WAL (Write-Ahead Log) | WAL / redo log | Both log changes before applying them to data pages; enables crash recovery |
| Copy-on-Write (btrfs/ZFS) | MVCC snapshot isolation | Both maintain multiple versions of data; old versions for readers, new for writers |
| B-tree (ext4/NTFS index) | B+-tree storage engine (InnoDB, WiredTiger) | Filesystem directory indexes use the same family of structure as a DB primary index (ext4 uses the hashed HTree variant) |
| Journaling (ext4, XFS) | ARIES recovery | Both replay a log after a crash and checkpoint to bound recovery time; ARIES uses physiological logging, while ext4 (jbd2) journals whole blocks |
| Page cache (kernel) | Buffer pool (InnoDB, PostgreSQL) | Both cache disk pages in memory; both face the same replacement policy problem |
| Dentry cache (dcache) | Catalog cache / table handle cache | Both cache frequently accessed metadata to avoid disk lookups |
| Fragmentation (free space) | Heap fragmentation in storage engine | Both lead to performance degradation over time; both require periodic compaction/defrag |
| Atomic rename (rename()) | Atomic table rename / online DDL | Both build the new version on the side, then swap it in with a single atomic operation |
| Superblock / fsck | Database catalog / consistency check | Both have a "root of truth" for the filesystem/database state |
| mmap() for file access | mmap-based storage engine (LMDB) | LMDB maps the database file directly; avoids double buffering between kernel and DB |
| Sparse files | Sparse/compressed tables | Both allow logical size >> physical size |
| overlayfs layers | LSM-tree compaction layers | Both have a layered read/write model; overlayfs: upper/lower; LSM: L0→L6 |

Key insight: A database storage engine is, in essence, a userspace filesystem. The concepts developed for filesystems (journaling, page cache, B-trees, compaction) were either borrowed by or independently invented by database designers solving the same fundamental problems.
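
The atomic-rename row is a pattern that applications use directly: write the new version to a temporary file in the same directory, flush it to stable storage, then rename it over the old one. The sketch below is one minimal way to do this on a POSIX system; the helper name `atomic_write` is made up for the example, and durability guarantees ultimately depend on the filesystem and mount options.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the rename itself by syncing the directory (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "state.json")
        atomic_write(target, b'{"version": 1}')
        atomic_write(target, b'{"version": 2}')
        with open(target, "rb") as f:
            print(f.read())
```

A database that swaps in a rebuilt table follows the same shape: build the replacement off to the side, make it durable, then switch a single pointer.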


Virtualization ↔ Containers ↔ Cloud

                    ISOLATION SPECTRUM
    ┌────────────────────────────────────────────────────────────┐
    │                                                            │
    │  Bare Metal    VM (KVM)    Kata Container  Container (runc)│
    │      │            │              │                │        │
    │    No OS     Guest OS      Micro-VMM +       Linux        │
    │   overhead    overhead     mini kernel      namespaces    │
    │                            overhead          overhead     │
    │    ~0ms       ~5ms         ~100ms              ~1ms       │
    │   isolation  isolation     isolation          isolation   │
    │                                                            │
    └────────────────────────────────────────────────────────────┘

    Security boundary strength: VM > Kata > Container > Process
    Density per host:           Container > VM > Bare metal
    Boot time:                  Container < VM < Bare metal (instance)

| Concept | Virtualization | Container | Cloud Manifestation |
|---|---|---|---|
| Isolation boundary | Hardware virt + EPT | Linux namespaces | AWS isolates tenants with Nitro (bare metal-equivalent) |
| Resource limits | VM vCPU/RAM allocation | cgroup v2 CPU/memory | EC2 instance types define both |
| Networking | virtio-net → bridge → physical | veth → bridge → physical | All overlaid by VPC SDN |
| Storage | virtio-blk → LVM → disk | bind mount / volume | EBS is a network block device presented to the guest as a local disk |
| Image format | QCOW2 / VMDK | OCI image (layers) | AMI = EBS snapshot; Docker image on ECR |
| Live migration | KVM live migration (over TCP or RDMA) | Pod rescheduling (not live) | AWS evacuates instances; pods restart |
| Multi-tenancy | Multiple VMs per host | Multiple pods per node | Cloud accounts/projects as isolation boundaries |
| Escape vulnerabilities | VM escape (VENOM); cross-VM side channels (Spectre) | Container escape (runc CVE-2019-5736) | Cloud: cross-tenant attacks are critical severity |
| Density driver | Memory deduplication (KSM) | Shared image layers (overlayfs) | Cloud: bin-packing algorithms for profit margin |

Key insight: The evolution from VMs to containers to serverless is a progressive trade of isolation strength for density and startup latency. Understanding this tradeoff — and what isolation mechanism each boundary provides — is essential for cloud security architecture.
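
For the container column's resource limits, cgroup v2 exposes the CPU limit as a `cpu.max` file containing a quota and a period in microseconds, or the word `max` for no limit. The sketch below parses that format; the path `/sys/fs/cgroup/cpu.max` is an assumption about how the container's cgroup is mounted, and the function name is illustrative.

```python
def cpu_limit(cpu_max_text):
    """Return the CPU limit in cores from cgroup v2 `cpu.max` contents.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no limit is set, in which case None is returned.
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    try:
        # Assumed location when running inside a container with cgroup v2.
        with open("/sys/fs/cgroup/cpu.max") as f:
            print(cpu_limit(f.read()))
    except OSError:
        print("cpu.max not available here")
```

A value of 2.0 means the group may consume two CPUs' worth of time per period, which is the container-side counterpart of a VM's vCPU count.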


Hardware ↔ Performance Optimization

| Hardware Reality | Performance Implication | Code-Level Response |
|---|---|---|
| L1 cache: ~64KB, 4 cycles | Hot data must fit in L1 | Struct layout to minimize footprint; SoA over AoS |
| L3 cache: ~32MB, ~30 cycles | Working set > 32MB → LLC thrash | Partition work to fit; cache-oblivious algorithms |
| DRAM: ~100ns, ~40GB/s | Streaming 8B loads cap near 5G/s (40 GB/s ÷ 8 B); if every load misses to a new 64B line, near 625M/s | Vectorize loops; prefetch; use HBM for AI workloads |
| Cache line 64B | Loading one byte loads 64B | Pack related data together; avoid pointer chasing |
| NUMA remote: ~200ns | Random allocs go to any NUMA node | numactl; NUMA-aware allocators; pin threads to prevent migration |
| Branch predictor: ~98% accuracy | Misprediction flushes pipeline (~15 cycles) | Sort data before conditional loops; use branchless code |
| Superscalar (4-wide): 4 IPC theoretical | Serial dependency chains prevent IPC > 1 | Break dependency chains; unroll loops |
| SIMD AVX-512: 512-bit vectors | Scalar code runs at 1/8 (64-bit lanes) to 1/16 (32-bit lanes) of potential throughput | Auto-vectorization or intrinsics; aligned data |
| HyperThreading: 2 logical per physical | Sharing execution resources with sibling | Bind latency-critical threads to one logical core; disable HT for real-time workloads |
| PCIe 4.0 x16: 32 GB/s | GPU copy bottleneck if frequent small transfers | Batch transfers; use unified memory; GPUDirect to bypass CPU |
| NVMe queue depth: 65535 | Avoid using depth 1 (defeats NVMe parallelism) | Use io_uring with multiple inflight requests |
| HBM bandwidth: ~3 TB/s | AI training is memory-bandwidth-bound | Maximize arithmetic intensity (FLOPS/byte); kernel fusion |
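
The arithmetic-intensity advice in the HBM row follows from the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth times FLOPs per byte. The sketch below uses the ~3 TB/s figure from the table and a hypothetical peak of 1e15 FLOP/s; the peak value and the function names are assumptions for illustration, not vendor numbers.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: min(compute roof, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_flops / mem_bandwidth


if __name__ == "__main__":
    peak = 1e15       # hypothetical peak FLOP/s, for illustration only
    bandwidth = 3e12  # ~3 TB/s HBM, as in the table

    # Elementwise float32 add: 1 FLOP per 12 bytes (two 4-byte reads, one write).
    elementwise = attainable_flops(peak, bandwidth, 1 / 12)
    print(f"elementwise add: {elementwise / 1e12:.2f} TFLOP/s")
    print(f"ridge point: {ridge_point(peak, bandwidth):.0f} FLOP/byte")
```

Under these assumptions an elementwise kernel reaches well under 0.1% of peak, which is why fusing several such operations into one pass over memory pays off.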

Runtime Systems ↔ AI Infrastructure

| Runtime Concept | AI Infrastructure Manifestation |
|---|---|
| JIT compilation (JVM HotSpot, V8 TurboFan) | torch.compile (PyTorch 2.0), XLA compilation for TPU |
| Garbage collection pauses | Training jitter from Python GC during backward pass |
| GIL (CPython) | Limits Python data loading throughput; workaround: multiprocessing |
| Memory allocator (jemalloc) | PyTorch uses a caching allocator to reuse GPU memory blocks |
| Thread pool (Java Executor) | CUDA stream pool for overlapping compute and data transfer |
| Reference counting (Python) | Tensor lifetime management; lingering references keep GPU memory allocated and look like leaks |
| Coroutines / async/await | Async data prefetching to overlap I/O with GPU compute |
| Escape analysis (JVM) | PyTorch graph capture avoids Python overhead for hot loops |
| Tiered compilation (C1→C2) | PyTorch eager → torch.compile → TensorRT (progressive optimization) |
| Stack allocation (Rust) | On-device constant memory for weights avoids DRAM round-trips |
| Interpreter overhead (CPython) | torch.compile (PyTorch 2.0) traces the model to remove Python overhead; Triton for custom kernels |
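
One common response to the GC-pause row is to turn off CPython's automatic cycle collector inside the hot loop and collect at points of your choosing. The sketch below shows the idea with the standard `gc` module; `gc_paused`, `run_steps`, and the `collect_every` parameter are names invented for this example, and whether it helps depends on how many reference cycles the workload creates.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic cycle collection inside the block, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def run_steps(step_fn, steps, collect_every=100):
    """Run `step_fn(i)` for each step, collecting cycles only at chosen points."""
    with gc_paused():
        for i in range(steps):
            step_fn(i)
            if (i + 1) % collect_every == 0:
                # Explicit collection still works while automatic GC is off.
                gc.collect()


if __name__ == "__main__":
    run_steps(lambda i: None, steps=1000, collect_every=250)
    print("automatic GC enabled again:", gc.isenabled())
```

Reference counting is unaffected, so objects without cycles are still freed immediately; only the periodic cycle scan is moved out of the latency-sensitive path.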

Part 2: Domain Cluster ASCII Diagrams

The Full Stack: From Transistors to Distributed AI

┌────────────────────────────────────────────────────────────────┐
│                    AI TRAINING / INFERENCE                     │
│  Distributed training │ Model serving │ Checkpoint storage     │
└───────────────────────┬────────────────────────────────────────┘
                        │  calls into
┌───────────────────────▼────────────────────────────────────────┐
│              RUNTIME SYSTEMS & COMPILERS                       │
│   PyTorch / TF │ CUDA Graphs │ XLA │ Triton │ torch.compile    │
└───────────────────────┬────────────────────────────────────────┘
                        │  calls into
┌───────────────────────▼────────────────────────────────────────┐
│          OPERATING SYSTEM (Linux + CUDA Driver)                │
│  Scheduler │ Memory Manager │ CUDA Driver │ Network Driver     │
└──────────┬─────────────────────┬──────────────────┬───────────┘
           │                     │                  │
┌──────────▼──────┐  ┌───────────▼──────┐  ┌───────▼────────┐
│   CPU + DRAM    │  │   GPU + HBM      │  │  NIC + Fabric  │
│  (x86 / ARM64)  │  │  (A100 / H100)   │  │ (InfiniBand /  │
│  NUMA, PCIe     │  │  NVLink/NVSwitch │  │  Ethernet RoCE)│
└─────────────────┘  └──────────────────┘  └────────────────┘

Cross-Domain Security Attack Surface

┌──────────────────────────────────────────────────────────────┐
│                    ATTACK SURFACE MAP                         │
│                                                              │
│  [Hardware] ──Spectre/Meltdown──► [Kernel] ──SMEP bypass──► │
│       │                               │                      │
│  [Microcode]                  [Kernel Exploits]              │
│  [CPU Bugs]                          │                       │
│                               [Container Escape]             │
│                                      │                       │
│                               [Cloud Escape]                  │
│                                      │                       │
│  [Supply Chain] ──SBOM───────► [Image Signing]               │
│  [Compiler]     ──Backdoors──► [Reproducible Builds]         │
│  [Dependency]                                                │
└──────────────────────────────────────────────────────────────┘

Observability Connects All Domains

                    ┌──────────────────┐
                    │  OBSERVABILITY   │
                    │  (metrics/logs/  │
                    │  traces/profiles)│
                    └────────┬─────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
    ┌─────▼─────┐    ┌───────▼──────┐   ┌──────▼──────┐
    │  Kernel   │    │  Application │   │  Hardware   │
    │  (eBPF,   │    │  (OTel,      │   │  (perf PMU, │
    │  ftrace,  │    │  pprof,      │   │  Nsight,    │
    │  perf)    │    │  JFR)        │   │  turbostat) │
    └─────┬─────┘    └───────┬──────┘   └──────┬──────┘
          │                  │                  │
          └──────────────────▼──────────────────┘
                    ┌────────────────┐
                    │  CORRELATION   │
                    │ (TraceContext, │
                    │  exemplars,    │
                    │  corr. IDs)    │
                    └────────────────┘

Part 3: Unified Mental Model

The following is a systems-level mental model that unifies all domains in the archive into a single coherent framework.

The Fundamental Abstraction Stack

Every computer system can be understood as a series of abstraction layers, where each layer:

  1. Hides complexity from the layer above
  2. Exposes a simpler, more powerful interface
  3. Introduces overhead (the abstraction tax)
  4. Creates a trust boundary (privilege separation)

Layer 7: Applications (user intent)
           ↑ hides: all implementation
Layer 6: Runtime Systems (managed memory, GC, JIT)
           ↑ hides: memory safety, calling conventions
Layer 5: Operating System (processes, files, network)
           ↑ hides: hardware multiplexing, isolation
Layer 4: Kernel (syscalls, drivers, scheduling)
           ↑ hides: interrupt handling, DMA, device protocols
Layer 3: Firmware (UEFI/BIOS, device firmware)
           ↑ hides: hardware initialization
Layer 2: Microarchitecture (pipelines, caches, speculative execution)
           ↑ hides: instruction-level parallelism
Layer 1: Digital Logic (gates, flip-flops, buses)
           ↑ hides: analog behavior
Layer 0: Physics (electrons, semiconductor physics)

The Three Universal Constraints

Every systems problem, regardless of domain, is ultimately constrained by one or more of:

  1. Bandwidth — how much data can move per second (memory bandwidth, network bandwidth, storage throughput, PCIe bandwidth, GPU memory bandwidth)

  2. Latency — how long a unit of work takes end-to-end (cache miss latency, network RTT, disk seek time, context switch time, GC pause)

  3. Coordination cost — the overhead of multiple entities working together (lock contention, consensus rounds, cache coherence traffic, network round trips for distributed transactions)

Performance engineering is the art of determining which constraint is binding, then attacking it at the right layer.
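
A first-order way to decide whether latency or bandwidth is binding is to model a transfer as a fixed latency plus size divided by bandwidth, and compare the two terms. The sketch below does that; the 10 µs latency is a hypothetical figure, the 32 GB/s bandwidth matches the PCIe 4.0 x16 number used earlier in this document, and coordination cost is deliberately left out of the model.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost model: fixed latency plus time on the wire."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def binding_constraint(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Name the term that dominates the transfer time."""
    wire_time = size_bytes / bandwidth_bytes_per_s
    return "latency" if latency_s >= wire_time else "bandwidth"


def crossover_size(latency_s, bandwidth_bytes_per_s):
    """Transfer size at which the latency and bandwidth terms are equal."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    latency = 10e-6   # hypothetical per-transfer latency
    bandwidth = 32e9  # 32 GB/s link
    for size in (4096, 2**30):
        print(size, binding_constraint(size, latency, bandwidth),
              f"{transfer_time(size, latency, bandwidth):.6f} s")
    print("crossover:", crossover_size(latency, bandwidth), "bytes")
```

Transfers well below the crossover size are latency-bound and benefit from batching; transfers well above it are bandwidth-bound and benefit from compression or moving less data.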

The Three Universal Safety Properties

Every security problem is fundamentally about one or more of:

  1. Confidentiality — who can read what (memory isolation, encryption, access control)
  2. Integrity — who can modify what (write protection, hash verification, signatures)
  3. Availability — who can use what when (DoS resistance, resource limits, circuit breakers)
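
As a small instance of the integrity property, hash verification compares a digest of the received bytes against a trusted expected value. The sketch below uses SHA-256 from the standard library; the function names are illustrative, and this only detects modification if the expected digest itself comes from a trusted channel.

```python
import hashlib
import hmac


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of `data` (bytes)."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected_hex):
    """True if `data` hashes to `expected_hex`, using a constant-time compare."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


if __name__ == "__main__":
    artifact = b"example artifact contents"
    digest = sha256_hex(artifact)
    print(verify(artifact, digest))
    print(verify(artifact + b"tampered", digest))
```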

The Three Universal Reliability Properties

Every reliability problem is fundamentally about one or more of:

  1. Failure detection — how quickly do we know something is wrong (health checks, metrics, alerts)
  2. Failure isolation — how do we stop a failure from spreading (bulkheads, circuit breakers, namespaces)
  3. Failure recovery — how do we restore normal operation (restart, failover, rollback)
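
A circuit breaker touches all three properties at once: it detects failure by counting errors, isolates it by rejecting calls while open, and recovers by letting a trial call through after a timeout. The following is a minimal single-threaded sketch; the class name, thresholds, and injectable `clock` parameter are assumptions for illustration rather than any particular library's API.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Isolation: fail fast instead of piling onto a sick dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, allow one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Detection: count failures; a failed trial call re-opens at once.
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Recovery: any success closes the circuit and clears the count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each outbound request in `breaker.call(...)` and treats `CircuitOpenError` as a signal to use a fallback; a production version would also need locking for concurrent callers.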

How All Domains Interconnect: The Dependency Web

[Physics/Transistors]
       │
       ▼
[Digital Logic / Gates]
       │
       ▼
[CPU Microarchitecture] ←─────────────── [DRAM / Storage Devices]
  │         │                                    │
  │         ▼                                    │
  │  [Cache Hierarchy / Memory Ordering]         │
  │         │                                    │
  ▼         ▼                                    ▼
[ISA / Privilege Levels]          [Storage I/O Stack]
       │                                    │
       ▼                                    ▼
[Kernel: Scheduler, Memory Mgr,     [Filesystems / Databases]
 Process Mgr, Network Stack,              │
 Device Drivers]                          │
       │         │                        │
       ▼         ▼                        ▼
[Virtualization]  [Containers]  [Distributed Systems]
       │              │                   │
       ▼              ▼                   ▼
       └──────────► [Cloud Infrastructure / Kubernetes]
                          │
                          ▼
              [Application Runtimes: JVM, Python, Go, Rust]
                          │
                          ▼
                [AI Infrastructure: GPU Clusters, Training,
                 Model Serving, Inference Optimization]

CROSS-CUTTING CONCERNS (affect all layers):
  [Security]         — threat model at every layer
  [Observability]    — instrumentation at every layer
  [Performance]      — constraints and bottlenecks at every layer
  [Reliability]      — failure modes and recovery at every layer

The Practitioner's Heuristic

When you encounter a system problem you don't understand, ask in order:

  1. What resource is exhausted or contended? (CPU, memory, network, storage, lock)
  2. Which layer is the bottleneck? (hardware, kernel, runtime, application, network, distributed)
  3. What is the trust boundary involved? (user/kernel, VM/host, container/host, service/service)
  4. What is the failure mode? (crash, corruption, performance degradation, availability loss)
  5. What is the correct abstraction level to fix it? (hardware config, kernel tuning, application change, architecture change)

This heuristic works because every domain in this archive — from kernel scheduling to Kubernetes pod placement, from CUDA memory coalescing to TCP congestion control — is an instance of the same underlying pattern: multiple entities competing for shared resources, with a mechanism for arbitration, isolation, and recovery.

Understanding that pattern at every layer of the stack is the goal of this archive.