Networking and Distributed Systems Terminology Glossary

A comprehensive reference for networking protocols, distributed systems concepts, and infrastructure terminology. Each entry includes a precise technical definition, related terms, and cross-references.

A

ACK (Acknowledgement)

A control message sent by a receiver to confirm that data has been successfully received. In TCP, every segment carries a cumulative ACK number indicating the next expected byte. Delayed ACKs (coalescing multiple ACKs into one) reduce network overhead but can interact negatively with Nagle's algorithm. In Kafka and other messaging systems, ACK policies determine durability guarantees: acks=0 (no wait), acks=1 (leader only), acks=all (all ISRs). In Raft consensus, ACKs from a quorum confirm log entry commitment.

Related terms: TCP, SYN, flow control, congestion control, Nagle
See also: TCP, flow control, congestion control

Anycast

A routing and addressing scheme where a single IP address is assigned to multiple nodes in different locations, and the network routes traffic to the topologically nearest (lowest-cost) node. Anycast is used by DNS root servers, CDN edge nodes, and DDoS mitigation services. BGP best-path selection naturally implements anycast: each anycast node advertises the same prefix, and routers select the best path, which is nearest in routing terms and not necessarily geographically. Unlike multicast (one-to-many), anycast is one-to-nearest.

Related terms: Unicast, multicast, BGP, DNS, routing table
See also: BGP, unicast, multicast

B

BGP (Border Gateway Protocol)

The exterior gateway protocol that exchanges routing information between autonomous systems (ASes) on the internet. BGP is a path-vector protocol: each route advertisement carries the full AS path, enabling loop detection and policy-based routing. BGP sessions run over TCP port 179. Internal BGP (iBGP) propagates routes within an AS; external BGP (eBGP) connects ASes. BGP convergence is slow (seconds to minutes), and its trust model (no built-in validation of the prefixes or AS paths a peer announces) makes it vulnerable to route hijacking; RPKI provides origin validation.

Related terms: AS, OSPF, routing table, FIB, RPKI, anycast, ECMP
See also: OSPF, routing table, FIB, anycast

C

CAP Theorem

A theoretical result, conjectured by Eric Brewer and later proved by Seth Gilbert and Nancy Lynch, stating that a distributed system can provide at most two of three guarantees simultaneously: Consistency (every read receives the most recent write), Availability (every request receives a non-error response), and Partition tolerance (the system continues operating despite network partitions). Since network partitions are a reality, practical systems must choose between CP (consistent but potentially unavailable during partitions, e.g., ZooKeeper, HBase) and AP (available but potentially inconsistent, e.g., Cassandra, DynamoDB). The PACELC theorem extends CAP to also consider the latency-consistency trade-off when no partition exists.

Related terms: PACELC, eventual consistency, linearizability, consensus, Raft, Paxos
See also: Eventual consistency, linearizability, Raft, Paxos

CIDR (Classless Inter-Domain Routing)

An IP addressing scheme that replaced the old class-based (A/B/C) system by allowing flexible prefix lengths, expressed as address/prefix-length (e.g., 192.168.1.0/24). CIDR enables route aggregation (supernetting), reducing the size of routing tables. A /24 covers 256 addresses; a /16 covers 65,536. CIDR is used universally in modern IP networking, including cloud VPC definitions, Kubernetes pod CIDRs, and BGP prefix advertisements.

Related terms: IP routing, routing table, BGP, NAT, subnetting
See also: IP routing, routing table, BGP
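
The prefix arithmetic and route aggregation described in the CIDR entry can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch; the `10.1.0.0/16` network and the two `/25` halves are example values chosen here, not taken from any real deployment.

```python
import ipaddress

# A /24 leaves 32 - 24 = 8 host bits, so it spans 2**8 addresses.
net24 = ipaddress.ip_network("192.168.1.0/24")
net16 = ipaddress.ip_network("10.1.0.0/16")
print(net24.num_addresses)  # 256
print(net16.num_addresses)  # 65536

# Route aggregation (supernetting): two adjacent /25s collapse into one /24.
halves = [
    ipaddress.ip_network("192.168.1.0/25"),
    ipaddress.ip_network("192.168.1.128/25"),
]
aggregated = list(ipaddress.collapse_addresses(halves))
print(aggregated)  # [IPv4Network('192.168.1.0/24')]

# Membership test: is an address covered by the prefix?
inside = ipaddress.ip_address("192.168.1.77") in net24
print(inside)  # True
```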

Circuit Breaker

A resilience pattern in distributed systems that prevents cascading failures by detecting repeated failures to a remote service and temporarily short-circuiting calls to that service. The circuit breaker transitions through three states: Closed (normal operation, requests pass through), Open (failures exceeded threshold, requests immediately fail without attempting the call), and Half-Open (test requests allowed to check if the service has recovered). Popularized by Netflix Hystrix; now implemented in Resilience4j, Envoy, and Istio. Named by analogy to electrical circuit breakers.

Related terms: Failover, service mesh, retry, timeout, saga pattern
See also: Failover, service mesh
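
The three-state behavior in the Circuit Breaker entry can be sketched as a small state machine. The class below is a hypothetical minimal version, not the Hystrix or Resilience4j API: the names `failure_threshold`, `reset_timeout`, and `clock` are assumptions, and it counts consecutive failures where production libraries typically use sliding windows.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal breaker; thresholds and names are illustrative."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast without contacting the remote service.
                raise RuntimeError("circuit open: call rejected")
            self.state = "half-open"  # timeout elapsed: allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

A caller wraps each remote call as `breaker.call(fetch, url)`, where `fetch` is whatever client function is being protected, and treats the `RuntimeError` as an immediate failure.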

Congestion Control

Algorithms that regulate the rate at which senders inject data into a network to avoid overloading intermediate routers and causing packet loss. Classic loss-based TCP congestion control (Tahoe and Reno originally, CUBIC later) uses packet loss as a congestion signal. Newer algorithms such as BBR instead estimate bottleneck bandwidth and RTT; QUIC builds congestion control into the transport itself. Key mechanisms: slow start (exponential increase from cold start), congestion avoidance (linear increase), fast retransmit/fast recovery (avoid full timeout on loss). Explicit Congestion Notification (ECN) allows routers to signal congestion without dropping packets.

Related terms: TCP, flow control, BBR, QUIC, ECN, latency
See also: TCP, flow control, QUIC
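
The slow start, congestion avoidance, and fast recovery phases from the Congestion Control entry can be traced with a toy model. This sketch works in whole segments per RTT and is a deliberately simplified Reno-style approximation; the function name `cwnd_trace` and its parameters are illustrative assumptions, and real stacks track bytes, duplicate ACKs, and timers.

```python
def cwnd_trace(rounds, ssthresh, loss_rounds=()):
    """Return the congestion window (in segments) at the start of each RTT round."""
    cwnd = 1
    trace = []
    for rnd in range(rounds):
        trace.append(cwnd)
        if rnd in loss_rounds:
            # Loss detected via fast retransmit: halve the window and
            # continue in congestion avoidance instead of restarting.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
    return trace


print(cwnd_trace(10, ssthresh=16, loss_rounds={6}))
# [1, 2, 4, 8, 16, 17, 18, 9, 10, 11]
```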

Consensus

A fundamental problem in distributed computing: getting a group of nodes to agree on a single value, even in the presence of failures. Consensus algorithms (Paxos, Raft, Zab, Multi-Paxos) guarantee safety (all nodes that decide, decide the same value) and liveness (eventually some node decides) under specific failure models. Consensus is the foundation of replicated state machines, distributed databases, and coordination services (etcd, ZooKeeper). The FLP impossibility theorem shows consensus cannot be solved deterministically in an asynchronous system with even one crash failure.

Related terms: Raft, Paxos, Zab, replication, leader election, linearizability
See also: Raft, Paxos, leader election, linearizability

Consistent Hashing

A hashing technique designed for distributed systems where adding or removing nodes requires remapping only a minimal fraction of keys. Both nodes and keys are placed on a conceptual "ring" (hash circle) using the same hash function. A key is assigned to the first node clockwise from its position on the ring. When a node is added or removed, only the keys between it and its predecessor are remapped. Consistent hashing minimizes rebalancing overhead in distributed caches (Memcached, Redis Cluster), CDNs, and sharded databases.

Related terms: Hash ring, sharding, replica, load balancing, DHT
See also: Hash ring, sharding, replica
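
The ring lookup and the minimal-remapping property from the Consistent Hashing entry can be sketched with a sorted list and binary search. The class `HashRing`, the helper `ring_position`, and the node and key names are hypothetical; MD5 is used only as a convenient placement hash, and virtual nodes are omitted for brevity.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Any well-distributed hash works; this is placement, not security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        # First node at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that changed owner must have moved to the newly added node.
print(all(after[k] == "node-d" for k in moved))  # True
```

With a plain `hash(key) % num_nodes` scheme, adding a fourth node would instead reassign most keys, which is the rebalancing cost consistent hashing avoids.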

CRDT (Conflict-Free Replicated Data Type)

A class of data structures designed to be replicated across multiple nodes in a distributed system and merged without conflicts, guaranteeing eventual consistency without requiring coordination. CRDTs come in two flavors: state-based (CvRDT, where merge is commutative, associative, and idempotent) and operation-based (CmRDT, where concurrent operations commute). Examples: G-Counter (grow-only counter), OR-Set (observed-remove set with add-wins semantics), LWW-Register (last-write-wins), and RGA (Replicated Growable Array, a sequence CRDT aimed at collaborative text editing). Many collaborative editors, including Google Docs, rely on operational transformation rather than CRDTs.

Related terms: Eventual consistency, vector clock, replica, CAP theorem, Riak
See also: Eventual consistency, vector clock, replica
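
A G-Counter is the simplest state-based CRDT mentioned in the entry above, and it shows why a commutative, associative, idempotent merge needs no coordination. The sketch below is a minimal illustration; the class and replica names are assumptions made for the example.

```python
class GCounter:
    """State-based grow-only counter: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so merges can
        # arrive in any order and any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
b.merge(a)  # repeated merge changes nothing
print(a.value(), b.value())  # 5 5
```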

D

DCCP (Datagram Congestion Control Protocol)

A transport layer protocol (IANA protocol number 33) providing unreliable, connection-oriented datagram transport with built-in congestion control. DCCP fills the gap between TCP (reliable, congestion-controlled) and UDP (unreliable, no congestion control), making it suitable for real-time streaming applications (VoIP, gaming) that need congestion control but not reliability. DCCP supports multiple pluggable congestion control profiles (CCID 2: TCP-like, CCID 3: TFRC). Despite its design merits, DCCP has seen limited deployment; QUIC has largely superseded its role.

Related terms: UDP, TCP, QUIC, congestion control, transport layer
See also: UDP, TCP, QUIC

DDoS (Distributed Denial of Service)

An attack that overwhelms a target system, network link, or service with traffic from many distributed sources (often a botnet), preventing legitimate users from accessing it. Attack vectors include volumetric floods (UDP amplification, ICMP flood), protocol attacks (SYN flood exhausting connection tables), and application-layer attacks (HTTP slowloris). Mitigation techniques include traffic scrubbing centers, anycast diffusion, rate limiting, BGP blackholing, and BPF-based packet filtering. Cloud providers offer DDoS protection services.

Related terms: Anycast, BGP, BPF/eBPF, XDP, rate limiting, amplification
See also: Anycast, BGP, XDP

DNS (Domain Name System)

A hierarchical, distributed naming system that translates human-readable domain names (e.g., example.com) into IP addresses. DNS is a tree-structured database with root servers at the top, TLD servers (.com, .org) below, and authoritative nameservers for individual domains. Resolvers query recursively or iteratively. DNS records include A (IPv4), AAAA (IPv6), CNAME (alias), MX (mail), NS (nameserver), TXT (verification/SPF), SRV (service location), and PTR (reverse). DNSSEC adds cryptographic signatures to prevent cache poisoning.

Related terms: FQDN, anycast, TTL, DNSSEC, resolver, mTLS
See also: Anycast, FQDN

DPDK (Data Plane Development Kit)

A set of user-space libraries and drivers that bypass the Linux kernel network stack for high-performance packet processing. DPDK uses poll-mode drivers (PMDs) that continuously poll NIC receive queues in user space, eliminating interrupt overhead and context switches. Hugepages provide TLB-efficient memory for DMA buffers. DPDK achieves 10–100 Mpps throughput on commodity hardware, enabling software routers, firewalls, and 5G user-plane functions. Requires dedicated CPU cores and is not suitable for general-purpose use.

Related terms: XDP, kernel bypass, SR-IOV, huge page, PMD, zero-copy
See also: XDP, zero-copy networking, SR-IOV

E

Epoll

A Linux kernel I/O event notification interface that efficiently monitors large numbers of file descriptors for readiness events (readable, writable, error). Unlike select()/poll() (which scan all monitored FDs on each call), epoll uses an event-driven model: file descriptors are registered once with epoll_ctl(), and epoll_wait() returns only the FDs that are ready. This gives O(1) performance per active event rather than O(n) per monitored FD. epoll supports edge-triggered (ET) and level-triggered (LT) modes and is the I/O multiplexing foundation for Node.js, Nginx, Redis, and most high-performance servers.

Related terms: Socket, non-blocking I/O, kqueue, io_uring, event loop
See also: Socket, TCP
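
The register-once, wait-many pattern from the Epoll entry can be sketched with Python's standard `selectors` module, whose `DefaultSelector` uses epoll on Linux when it is available (and another mechanism such as kqueue elsewhere). The socket pair and the `"reader"` label are illustrative assumptions.

```python
import selectors
import socket

# DefaultSelector picks the best mechanism available; on Linux that is epoll.
sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)

# Register once (epoll_ctl under the hood on Linux), then wait many times.
sel.register(reader, selectors.EVENT_READ, data="reader")

idle = sel.select(timeout=0)   # nothing written yet, so nothing is ready
writer.send(b"ping")
ready = sel.select(timeout=1)  # returns only the descriptors that are ready

received = b""
for key, _events in ready:
    received += key.fileobj.recv(16)
print(len(idle), len(ready), received)  # 0 1 b'ping'

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```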

Eventual Consistency

A consistency model where a distributed system guarantees that, if no new updates are made to a data item, all replicas will eventually converge to the same value. Eventual consistency trades strong consistency for availability and partition tolerance (as per CAP theorem). It is used in DNS (TTL-based propagation), Amazon DynamoDB, Apache Cassandra, and S3. Techniques like read-repair, anti-entropy (gossip), and CRDTs help drive convergence. Tunable consistency (quorum reads/writes) can strengthen guarantees at the cost of latency.

Related terms: CAP theorem, CRDT, vector clock, consensus, replica
See also: CAP theorem, CRDT, linearizability

F

Failover

The process of automatically switching traffic or workloads to a standby replica, secondary node, or backup data center when the primary fails. Failover can be automatic (triggered by health checks, heartbeats, or a consensus-based leader election) or manual. Key metrics: Recovery Time Objective (RTO, how quickly the system recovers), Recovery Point Objective (RPO, how much data can be lost). Active-passive failover wastes the standby capacity; active-active setups use both nodes simultaneously. DNS-based failover uses low TTLs to redirect traffic.

Related terms: RPO/RTO, leader election, replica, circuit breaker, HA
See also: Leader election, RPO/RTO, replica

FIB (Forwarding Information Base)

The optimized data structure (derived from the routing table/RIB) used by the kernel and network hardware to make fast packet forwarding decisions. While the RIB stores full routing information (BGP attributes, multiple paths, administrative distance), the FIB contains only what is needed to forward a packet: a mapping from destination prefix to next hop and outgoing interface. The Linux kernel's IPv4 FIB uses a trie-based structure (an LC-trie); hardware ASICs implement the FIB in TCAM for line-rate lookups. FIB entries are populated by routing protocols via the RIB.

Related terms: Routing table, BGP, OSPF, RIB, IP routing, longest prefix match
See also: Routing table, IP routing, BGP
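
The prefix-to-next-hop lookup in the FIB entry relies on longest-prefix match. The sketch below shows the selection rule with a linear scan; the table contents and next-hop strings are made-up example values, and a real FIB would use a trie or TCAM rather than scanning every prefix.

```python
import ipaddress

# Hypothetical FIB: destination prefix -> next hop (example values only).
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0 via 203.0.113.1",
    ipaddress.ip_network("10.0.0.0/8"): "eth1 via 10.0.0.1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2 via 10.1.0.1",
}


def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    # The most specific (longest) matching prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]


print(lookup("10.1.2.3"))  # eth2 via 10.1.0.1
print(lookup("10.9.9.9"))  # eth1 via 10.0.0.1
print(lookup("8.8.8.8"))   # eth0 via 203.0.113.1
```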

Flow Control

A mechanism that prevents a sender from overwhelming a receiver with data faster than it can be processed. In TCP, flow control is implemented via the receiver's advertised window (rwnd): the receiver tells the sender how much buffer space is available, and the sender limits its unacknowledged data to that amount. Flow control operates end-to-end between sender and receiver, while congestion control operates network-wide. Hardware-level flow control (Ethernet pause frames, PFC in RDMA networks) prevents buffer overflow in switches.

Related terms: Congestion control, TCP, RDMA, window, PFC
See also: Congestion control, TCP

G

Gossip Protocol

A decentralized, peer-to-peer protocol for propagating information (state updates, membership changes, failures) in a distributed system by having each node periodically exchange state with a random subset of peers. Gossip is epidemic: information spreads exponentially fast, reaching all nodes in O(log N) rounds with high probability. Gossip is resilient to node failures and network partitions. Used in: Cassandra (cluster membership and node state), Consul (service discovery), Serf (cluster membership), and Amazon's Dynamo (membership and failure detection). Gossip provides eventual consistency for metadata propagation.

Related terms: Eventual consistency, replica, consistent hashing, membership, anti-entropy
See also: Eventual consistency, replica

H

Hash Ring

A conceptual circular data structure used in consistent hashing where both nodes and keys are mapped to positions on a ring using a hash function. Keys are assigned to the next node clockwise. Adding or removing a node affects only the adjacent portion of the ring. Virtual nodes (multiple hash positions per physical node) improve load distribution. Used in distributed caches, storage systems, and load balancers to achieve stable key-to-node mapping with minimal rebalancing.

Related terms: Consistent hashing, sharding, replica, load balancing
See also: Consistent hashing, sharding

HOL Blocking (Head-of-Line Blocking)

A performance problem where a queue's first item, delayed due to processing or loss, blocks all subsequent items even if they are ready to proceed. In HTTP/1.1, HOL blocking occurs within a single TCP connection: requests are processed in order, so a slow response blocks subsequent ones. HTTP/2 multiplexes streams over one TCP connection but suffers from TCP-level HOL blocking (a lost packet stalls all streams). HTTP/3 (QUIC) solves this by multiplexing over UDP with independent stream loss recovery.

Related terms: HTTP/2, QUIC, pipelining, multiplexing, TCP, latency
See also: HTTP/2, QUIC, pipelining

HTTP/2

The second major version of the HTTP protocol (originally RFC 7540, since replaced by RFC 9113), introducing binary framing, multiplexed streams, header compression (HPACK), and server push over a single TCP connection. HTTP/2 addresses HTTP/1.1's HOL blocking at the application layer by allowing multiple concurrent request/response streams. Header compression reduces overhead for repeated headers (cookies, user-agent). HTTP/2 is widely deployed for web traffic; HTTP/3 (QUIC-based) is its successor and resolves the remaining TCP-level HOL blocking.

Related terms: HTTP/3, QUIC, HOL blocking, TLS, SNI, multiplexing
See also: QUIC, HOL blocking, TLS

I

ICMP (Internet Control Message Protocol)

A network-layer protocol used for diagnostic and control purposes, operating alongside IP (protocol number 1 for ICMPv4, 58 for ICMPv6). ICMP messages include: Echo Request/Reply (ping), Destination Unreachable (various codes for port unreachable, fragmentation needed), Time Exceeded (TTL expiry, used by traceroute), and Redirect. ICMP does not carry application data and is not connection-oriented. ICMPv6 also handles Neighbor Discovery Protocol (NDP), replacing ARP for IPv6.

Related terms: IP routing, ping, traceroute, NDP, MTU, TTL
See also: IP routing, NAT

Idempotent

A property of an operation that can be applied multiple times without changing the result beyond the initial application. In distributed systems, idempotent operations are crucial for safe retry logic: if a network failure leaves a request's fate unknown, it can be retried without risk of double-effects. HTTP GET, PUT, and DELETE are defined as idempotent; POST is not. In messaging systems (Kafka, SQS), exactly-once semantics require idempotent producers. Idempotency keys (unique per-request IDs) enable idempotent APIs.

Related terms: Exactly-once, at-least-once, saga pattern, retry, two-phase commit
See also: Saga pattern, two-phase commit
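
The idempotency-key technique from the Idempotent entry can be sketched as a server that remembers the response for each key. `PaymentService`, `charge`, and the key `"req-123"` are hypothetical names for illustration; a real service would persist the keys and expire them.

```python
class PaymentService:
    """Hypothetical API: charge() is made idempotent with a client-supplied key."""

    def __init__(self):
        self.total_charged = 0
        self._results = {}  # idempotency key -> stored response

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Replay of a request already processed: no second side effect.
            return self._results[idempotency_key]
        self.total_charged += amount
        response = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = response
        return response


svc = PaymentService()
first = svc.charge("req-123", 50)
retry = svc.charge("req-123", 50)  # e.g. the client timed out and retried
print(svc.total_charged, first == retry)  # 50 True
```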

InfiniBand

A high-speed, low-latency interconnect technology used in HPC clusters and data centers, providing 200+ Gb/s bandwidth per port and sub-microsecond latencies. InfiniBand supports RDMA (Remote Direct Memory Access), allowing a node to read or write directly into another node's memory without CPU involvement. InfiniBand uses a dedicated network fabric with switches, HCAs (Host Channel Adapters), and a subnet manager. It is used in supercomputers (Top500 list), deep learning clusters, and high-performance storage systems.

Related terms: RDMA, RoCE, IB, HCA, MPI, latency
See also: RDMA, RoCE

IP Routing

The process of forwarding IP packets from a source to a destination across potentially many intermediate networks (hops). Each router makes an independent forwarding decision based on the destination IP address and its local FIB, using longest-prefix-match to select the best route. Routing protocols (BGP, OSPF, IS-IS) propagate reachability information between routers. Routing decisions consider metric, administrative distance, and policy. Equal-Cost Multi-Path (ECMP) enables load balancing across multiple equal-cost routes.

Related terms: FIB, BGP, OSPF, ECMP, NAT, routing table, TTL
See also: FIB, BGP, OSPF, routing table

iptables / nftables

Linux kernel frameworks for packet filtering, NAT, and traffic manipulation. iptables (legacy, based on netfilter) uses chains and tables (filter, nat, mangle, raw) to match packets against rules and apply actions (ACCEPT, DROP, DNAT, SNAT). nftables (modern replacement) provides a single framework with better performance, atomic rule updates, and a cleaner syntax. Both hook into the netfilter framework at various points in the kernel's packet processing path. Used for firewalls, container networking (Docker, Kubernetes kube-proxy), and traffic shaping.

Related terms: Netfilter, NAT, conntrack, BPF/eBPF, XDP
See also: Netfilter, NAT, XDP

ISR (In-Sync Replica) — Kafka

In Apache Kafka, the set of replicas that are fully caught up with the partition leader's log. Kafka's replication protocol requires that a message be acknowledged by all ISRs before it is considered committed (when acks=all). If a replica falls too far behind (exceeds replica.lag.time.max.ms), it is removed from the ISR. The leader monitors ISR membership; only ISR members are eligible to be elected as the new leader. A smaller ISR reduces durability guarantees while a replica is catching up.

Related terms: Kafka partition, replication, leader election, WAL, ACK
See also: Kafka partition, leader election, WAL

K

Kafka Partition

The fundamental unit of parallelism and ordering in Apache Kafka. Each Kafka topic is divided into one or more partitions; each partition is an ordered, immutable log of records with monotonically increasing offsets. Partitions are distributed across brokers in a cluster, and each partition has one leader and zero or more follower replicas. Producers write to partitions (using key-based or round-robin partitioning); consumers read from partitions independently. The number of partitions determines the maximum consumer parallelism for a consumer group.

Related terms: ISR, leader election, WAL, offset, replication, Kafka
See also: ISR, WAL, leader election
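
Key-based and round-robin partition selection, as described in the Kafka Partition entry, can be sketched as follows. This is not Kafka's partitioner: the Java client hashes keys with murmur2, and `zlib.crc32` is swapped in here only as a readily available stand-in. The function name and `NUM_PARTITIONS` value are assumptions.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 6        # illustrative partition count
_round_robin = count()    # shared counter for records without a key


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    if key is None:
        # No key: spread records evenly across partitions.
        return next(_round_robin) % num_partitions
    # Same key -> same partition, which preserves per-key ordering.
    return zlib.crc32(key) % num_partitions  # stand-in hash, not murmur2


p1 = choose_partition(b"order-42")
p2 = choose_partition(b"order-42")
print(p1 == p2)  # True
unkeyed = [choose_partition(None) for _ in range(4)]
print(unkeyed)   # [0, 1, 2, 3]
```

Because the mapping depends on the partition count, changing the number of partitions can move existing keys to different partitions.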

Keepalive

A mechanism for detecting dead connections by sending periodic probe messages when no data has been exchanged for a configured period. TCP keepalives (enabled per socket with SO_KEEPALIVE and tuned on Linux via tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes) allow endpoints to detect silently dropped connections (e.g., due to NAT timeout, firewall state expiry). Application-level keepalives (HTTP/2 PING frames, which gRPC uses for its keepalive pings, and WebSocket ping/pong) serve the same purpose. The HTTP/1.1 Connection: keep-alive header is a different feature: it requests connection reuse and does not probe liveness. Keepalives also maintain NAT mappings and stateful firewall entries.

Related terms: TCP, NAT, socket, connection, mTLS
See also: TCP, NAT, socket

L

Latency

The time elapsed from the initiation of a request or operation to the receipt of a response or completion of the operation. Network latency components: propagation delay (distance / signal propagation speed, bounded by the speed of light), transmission delay (packet size / link bandwidth), queuing delay (time in router buffers), and processing delay. In distributed systems, latency is often expressed as percentile distributions (p50, p95, p99) to capture tail behavior. Tail latency is critical for large-scale services where the slowest component in a fanout determines overall response time.

Related terms: RTT, p50/p95/p99, HOL blocking, congestion, queueing theory
See also: RTT, p50/p95/p99
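
The delay components and percentile reporting in the Latency entry reduce to simple arithmetic. In the sketch below, the propagation speed of 2e8 m/s is a common approximation for light in optical fibre, the sample values are invented, and `percentile` uses the nearest-rank definition, which is one of several in use.

```python
import math


def propagation_delay_s(distance_m, speed_m_per_s=2e8):
    # Propagation delay is distance divided by signal speed.
    return distance_m / speed_m_per_s


def transmission_delay_s(packet_bytes, link_bits_per_s):
    # Time to push all of the packet's bits onto the link.
    return packet_bytes * 8 / link_bits_per_s


def percentile(samples, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


print(propagation_delay_s(1_000_000))    # 0.005 s for 1,000 km of fibre
print(transmission_delay_s(1500, 1e9))   # 1.2e-05 s for 1500 bytes at 1 Gb/s

samples_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14]
print(percentile(samples_ms, 50), percentile(samples_ms, 99))  # 13 250
```

The single 250 ms outlier barely affects the median but dominates p99, which is why tail percentiles are reported separately.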

Leader Election

The process by which nodes in a distributed system select one node to act as the coordinator (leader) for a given role. Leaders typically handle writes, coordinate replication, and make decisions on behalf of the group. Leader election must be safe (at most one leader at a time) and live (a new leader is elected if the current one fails). Algorithms include Raft's term-based election (majority vote), Zab in ZooKeeper, and the Bully algorithm. etcd and ZooKeeper are commonly used as leader election services.

Related terms: Raft, Paxos, consensus, failover, heartbeat, ISR
See also: Raft, Paxos, consensus, failover

Linearizability

The strongest single-object consistency model for distributed systems, also called atomic consistency. An execution is linearizable if every operation appears to take effect instantaneously at some point between its invocation and completion, and the total order of operations is consistent with real time. Linearizability makes a distributed object behave as if it were on a single node. It is the consistency guarantee typically offered by consensus-backed systems such as etcd and ZooKeeper (for writes); Google Spanner provides the related transactional guarantee of external consistency. Linearizability is stronger than sequential consistency. It is distinct from serializability, a database property of multi-operation transactions that imposes no real-time ordering; strict serializability combines the two.

Related terms: CAP theorem, consensus, eventual consistency, Raft, Paxos
See also: CAP theorem, consensus, eventual consistency

Liveness

A property of a distributed system guaranteeing that the system will eventually make progress or respond to requests (something good will eventually happen). Contrast with safety (something bad will never happen). In consensus algorithms, liveness means a decision will eventually be reached. In message queues, liveness means messages will eventually be delivered. Liveness can be violated by deadlock, livelock, starvation, or network partitions. FLP impossibility shows that strict liveness cannot be guaranteed in fully asynchronous systems under crash failures.

Related terms: Safety, consensus, deadlock, CAP theorem, Raft
See also: CAP theorem, consensus

Load Balancing

The distribution of incoming requests or workloads across multiple backend servers or replicas to maximize throughput, minimize response time, and avoid overloading any single node. Algorithms include: Round Robin, Weighted Round Robin, Least Connections, IP Hash (for session affinity), and Power of Two Choices. Layer 4 (L4) load balancers operate at the transport layer (TCP/UDP); Layer 7 (L7) load balancers operate at the application layer and can route based on URL, headers, or content. Hardware LBs (F5), software LBs (HAProxy, Nginx, Envoy), and cloud LBs (AWS ALB/NLB) are all common.

Related terms: Consistent hashing, service mesh, anycast, round-robin, health check
See also: Consistent hashing, service mesh, anycast
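
Of the algorithms listed in the Load Balancing entry, Power of Two Choices is the least obvious: sample two backends at random and send the request to the less loaded one. The sketch below assumes connection counts are available locally; the backend names and the `pick_backend` helper are hypothetical.

```python
import random


def pick_backend(active_connections, rng=random):
    """Sample two distinct backends and return the less loaded one."""
    a, b = rng.sample(list(active_connections), 2)
    return a if active_connections[a] <= active_connections[b] else b


rng = random.Random(7)  # seeded only to make the demo repeatable
load = {f"backend-{i}": 0 for i in range(5)}
for _ in range(1000):
    # Connections never close in this toy run, so load only accumulates.
    load[pick_backend(load, rng)] += 1
print(load)
```

Compared with picking one backend uniformly at random, the second sample sharply reduces the worst-case imbalance while still needing only two load lookups per request.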

M

mTLS (Mutual TLS)

An extension of TLS where both the client and server authenticate each other using X.509 certificates. Standard TLS only authenticates the server; mTLS provides two-way authentication, ensuring both parties are who they claim to be. mTLS is foundational to zero-trust network architectures and service meshes (Istio, Linkerd), where every service-to-service call is authenticated and encrypted. Certificate management (rotation, revocation) is a significant operational challenge for mTLS deployments.

Related terms: TLS, service mesh, SNI, PKI, X.509, zero-trust
See also: TLS, service mesh, SNI

Multicast

A network addressing and routing scheme for one-to-many communication where a single packet is delivered to a group of interested receivers. IP multicast uses class D addresses (224.0.0.0–239.255.255.255). Receivers join multicast groups using IGMP (IPv4) or MLD (IPv6). Network routers use PIM (Protocol Independent Multicast) to build multicast distribution trees. Multicast is used for IPTV, financial market data feeds, and video conferencing. Anycast selects the single nearest receiver; multicast delivers to all group members.

Related terms: Unicast, anycast, IGMP, PIM, BGP, multicast routing
See also: Unicast, anycast, BGP

N

NAT (Network Address Translation)

A technique that translates IP addresses and/or port numbers in packet headers as they transit a router or firewall, allowing multiple devices on a private network to share a single public IP address. SNAT (Source NAT) translates the source address of outbound packets; DNAT (Destination NAT) translates the destination address of inbound packets (used for port forwarding and load balancing). NAT breaks end-to-end connectivity (a fundamental Internet principle) and complicates protocols that embed IP addresses in payloads (FTP, SIP). Connection tracking (conntrack) maintains state for NAT mappings.

Related terms: Conntrack, iptables, CIDR, IP routing, keepalive, STUN
See also: iptables/nftables, IP routing, conntrack

Netfilter

The Linux kernel framework that provides hooks throughout the network stack for packet filtering, NAT, and stateful connection tracking. Netfilter hooks intercept packets at five points: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. iptables and nftables register rules at these hooks. Connection tracking (nf_conntrack) enables stateful firewalls and NAT. XDP bypasses netfilter for highest performance. Netfilter is also used by Kubernetes kube-proxy (iptables and IPVS modes) for service load balancing.

Related terms: iptables/nftables, NAT, conntrack, XDP, BPF/eBPF, kube-proxy
See also: iptables/nftables, XDP, NAT

Network Namespace

A Linux kernel mechanism that provides isolated instances of the entire network stack, including interfaces, routing tables, firewall rules, and socket tables. Each network namespace has its own set of network interfaces (including a loopback), routing tables, ARP/neighbor tables, and netfilter rules. Processes in different network namespaces cannot see each other's sockets. Container runtimes create a new network namespace per container and connect it to the host via veth pairs. Network namespaces are the basis for Kubernetes pod networking.

Related terms: Namespace, veth, container, pod, routing table
See also: Namespace, veth, VLAN, VXLAN

NFS (Network File System)

A distributed file system protocol (originally developed by Sun Microsystems) that allows a client to access files on a remote server as if they were local. NFSv3 is stateless (the server keeps no per-client session state, which simplifies crash recovery); NFSv4 is stateful and adds strong security (Kerberos via RPCSEC_GSS), integrated locking semantics, and compound operations. NFS is built on ONC RPC (Sun RPC) with XDR encoding, typically carried over TCP. Performance is affected by network latency and the choice of mount options (sync vs. async, close-to-open cache coherency). NFS is foundational in enterprise and HPC shared storage.

Related terms: VFS, RPC, inode, mount, Kerberos, SMB
See also: VFS, RPC

NVMe-oF (NVMe over Fabrics)

A protocol extension that allows NVMe commands (designed for local PCIe SSDs) to be transported over network fabrics, providing remote storage access with latencies approaching those of local NVMe. NVMe-oF supports multiple transports: RDMA (NVMe/RDMA over InfiniBand or RoCE), Fibre Channel (NVMe/FC), and TCP (NVMe/TCP, attractive because it runs on ubiquitous Ethernet and TCP infrastructure without specialized NICs). Used in disaggregated storage architectures where compute and storage nodes are separate, enabling flexible scaling.

Related terms: NVMe, RDMA, RoCE, InfiniBand, iSCSI, storage fabric
See also: RDMA, RoCE, NVMe

O

OSPF (Open Shortest Path First)

A link-state interior gateway protocol (IGP) used within an autonomous system to compute shortest-path routes. Each OSPF router maintains a complete map of its area's topology (LSDB, Link-State Database) and runs Dijkstra's shortest-path algorithm to compute its routing table. OSPF routers form adjacencies, flood LSAs (Link State Advertisements) throughout the area, and elect a Designated Router (DR) on broadcast networks to reduce flooding overhead. OSPFv2 is for IPv4; OSPFv3 for IPv6. OSPF typically converges much faster than BGP.

Related terms: BGP, IS-IS, routing table, FIB, LSA, adjacency
See also: BGP, routing table, FIB

P

Paxos

A family of consensus protocols for reaching agreement among distributed nodes in the presence of crash failures, designed by Leslie Lamport. Basic Paxos requires two phases: Prepare/Promise (a proposer obtains promises for its ballot number from a majority) and Accept/Accepted (the proposer proposes a value, accepted by a majority). Multi-Paxos optimizes by electing a stable leader to skip Phase 1 for subsequent rounds. Paxos is notoriously difficult to implement correctly; Raft was designed as a more understandable alternative. Used in Google Chubby and Spanner. ZooKeeper's Zab is a related but distinct atomic broadcast protocol, not a Paxos variant.

Related terms: Raft, consensus, leader election, quorum, linearizability
See also: Raft, consensus, leader election

PCIe

See PCIe in the Kernel and OS Glossary (01-kernel-and-os-terms.md). In networking context: PCIe is the bus used to attach high-speed NICs (100/400 GbE) and SmartNICs to servers; PCIe bandwidth can become a bottleneck for line-rate processing.

See also: 01-kernel-and-os-terms.md#PCIe, DPDK, SR-IOV

Pipelining

A technique where a client sends multiple requests without waiting for a response to each, allowing requests to be "in flight" simultaneously. HTTP/1.1 supports request pipelining (but not response reordering, leading to HOL blocking). TCP uses pipelining via its sliding window. CPU instruction pipelines are the hardware analogue. In database systems, pipelining query execution stages (scan → filter → aggregate) improves throughput. Pipelining improves efficiency when latency is significant relative to request processing time.

Related terms: HOL blocking, HTTP/2, QUIC, TCP window, latency
See also: HOL blocking, HTTP/2, QUIC

Q

QUIC

A modern transport protocol (RFC 9000) developed by Google and standardized by the IETF, designed as the transport layer for HTTP/3. QUIC runs over UDP and provides: multiplexed streams without TCP HOL blocking (a lost packet only blocks the stream it belongs to, not others), 0-RTT connection resumption, built-in TLS 1.3 encryption, and improved connection migration (connections survive IP address changes). QUIC moves the transport protocol to user space (implemented in QUIC libraries rather than the kernel), enabling faster evolution.

Related terms: HTTP/2, HTTP/3, UDP, TLS, HOL blocking, congestion control
See also: HTTP/2, UDP, TLS, HOL blocking

R

Raft

A consensus algorithm designed to be more understandable than Paxos, published by Diego Ongaro and John Ousterhout in 2014. Raft decomposes consensus into leader election, log replication, and safety. A Raft cluster elects a single leader (via randomized election timeouts); the leader appends all writes to its log and replicates to followers; an entry is committed once a majority acknowledges it. Leader failures trigger new elections. Raft is implemented in etcd (used by Kubernetes), CockroachDB, TiKV, and Consul.
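
Example (simplified sketch): the majority rule for committing an entry can be expressed as a function over the highest log index known to be stored on each node. The name commit_index and the list layout are assumptions. A real Raft leader additionally commits only entries from its current term.

```python
from typing import List


def commit_index(match_index: List[int]) -> int:
    """Return the highest log index replicated on a majority of nodes.

    match_index holds one entry per cluster member (leader included):
    the highest log index known to be stored on that node.
    """
    ranked = sorted(match_index, reverse=True)
    majority = len(match_index) // 2 + 1
    # The majority-th largest value is present on at least `majority` nodes.
    return ranked[majority - 1]
```

For a five-node cluster with match indexes [5, 5, 3, 2, 1], index 3 is the highest entry stored on three nodes, so entries up to 3 are committed.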

Related terms: Paxos, consensus, leader election, etcd, log replication, quorum
See also: Paxos, consensus, leader election

RDMA (Remote Direct Memory Access)

A technology that allows a process on one machine to directly read from or write to the memory of a process on another machine, bypassing the OS kernel and CPU on the remote end. RDMA offers very low latency (typically single-digit microseconds or less) and high bandwidth while consuming minimal CPU. It requires special NICs (RNIC or HCA) that handle the RDMA protocol in hardware. RDMA is used in HPC (MPI over RDMA), high-frequency trading, distributed databases (FaRM, RAMCloud), and machine learning parameter servers. RDMA transports include InfiniBand, RoCE, and iWARP.

Related terms: InfiniBand, RoCE, NVMe-oF, zero-copy, DMA, DPDK
See also: InfiniBand, RoCE, zero-copy networking

Replica

A copy of data maintained on a separate node to provide fault tolerance, improved read throughput, or geographic distribution. Replication strategies: synchronous (write confirmed only after all replicas acknowledge: strong durability, higher latency), asynchronous (write returns after the leader writes: lower latency, risk of data loss), and semi-synchronous (write returns after a subset of replicas acknowledge). Replicas can be homogeneous (any replica can serve reads, as in leaderless Dynamo-style stores) or differentiated (leader for writes, followers for reads, as in MySQL replication). In Raft, followers hold full copies of the log, but linearizable reads are normally served through the leader.

Related terms: ISR, Raft, Paxos, consensus, failover, CAP theorem
See also: ISR, Raft, consensus, failover

RoCE (RDMA over Converged Ethernet)

A network protocol enabling RDMA over standard Ethernet infrastructure, providing InfiniBand-like performance without requiring a dedicated InfiniBand fabric. RoCEv1 operates at Layer 2 (same broadcast domain); RoCEv2 operates at Layer 3 (routable over IP/UDP), making it practical for data center deployment. RoCE requires lossless Ethernet (Priority Flow Control, ECN) since RDMA operations are sensitive to packet loss. Widely used in GPU clusters (AI/ML training), storage systems, and high-frequency trading infrastructure.

Related terms: RDMA, InfiniBand, PFC, ECN, NVMe-oF, DPDK
See also: RDMA, InfiniBand, NVMe-oF

Routing Table

The data structure maintained by a router or host OS that maps destination network prefixes to next-hop information. Entries include: destination prefix, next-hop IP or interface, metric (cost), and source (static, OSPF, BGP). The routing table (RIB) holds all learned routes and their attributes; the FIB is the optimized forwarding table derived from it. When several entries match a destination, the most specific one (longest prefix) wins. The Linux kernel maintains multiple routing tables (local, main, default, plus custom tables selected by policy rules via ip rule) and uses ip route for management. Default routes (0.0.0.0/0) match any destination when no more specific route exists.
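
Example (minimal sketch): destination-based lookup with a default route can be illustrated with Python's ipaddress module. The ROUTES table, its next-hop addresses (taken from documentation ranges), and the lookup function are hypothetical. Real routers use tries or hardware tables rather than a linear scan.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (destination prefix, next hop)

ROUTES: List[Route] = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2"),
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3"),
]


def lookup(routes: List[Route], destination: str) -> Optional[str]:
    """Return the next hop of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop
```

A lookup for 10.1.2.3 matches all three entries and selects the /16; a lookup for 8.8.8.8 matches only the default route.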

Related terms: FIB, BGP, OSPF, IP routing, ECMP, PBR
See also: FIB, IP routing, BGP

RPO / RTO (Recovery Point Objective / Recovery Time Objective)

Key metrics in business continuity and disaster recovery planning. RPO defines the maximum acceptable amount of data loss measured in time: a 1-hour RPO means at most 1 hour of data can be lost. RTO defines the maximum acceptable time to restore service after a failure. RPO drives backup frequency and replication strategies; RTO drives failover automation, redundancy, and disaster recovery testing. Lower RPO/RTO values require more sophisticated infrastructure (synchronous replication, hot standby) and are more expensive to achieve.

Related terms: Failover, replica, WAL, HA, backup, PITR
See also: Failover, replica, WAL

S

Saga Pattern

A pattern for managing distributed transactions across multiple microservices without two-phase commit. A saga is a sequence of local transactions, each publishing events or messages that trigger the next step. If a step fails, compensating transactions are executed in reverse order to undo the steps that already completed (backward recovery). Sagas come in two coordination styles: choreography (services react to events from each other, no central coordinator) and orchestration (a central saga orchestrator directs participants). Sagas trade ACID atomicity and isolation for availability and loose coupling.
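
Example (orchestration-style sketch): backward recovery can be modeled as a list of (action, compensation) pairs. The run_saga function is hypothetical, and the sketch assumes compensations themselves never fail, which a real orchestrator cannot assume.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Run each local transaction in order; on failure, undo completed steps."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Backward recovery: compensate in reverse order of completion.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

The failing step's own compensation is not run, because its local transaction never committed.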

Related terms: Two-phase commit, idempotent, eventual consistency, microservices
See also: Two-phase commit, idempotent, eventual consistency

SDN (Software-Defined Networking)

An architecture that decouples the control plane (routing decisions) from the data plane (packet forwarding), centralizing network control in a software controller. SDN controllers (OpenDaylight, ONOS, Ryu) communicate with switches via southbound APIs (OpenFlow, gRPC/gNMI). This enables programmatic network configuration, dynamic traffic engineering, and rapid service deployment. Cloud providers use SDN for their virtual network overlays (AWS VPC, Google Cloud VPC). eBPF-based networking (Cilium) applies SDN principles within individual nodes.

Related terms: OpenFlow, veth, VXLAN, VLAN, controller, data plane
See also: VXLAN, VLAN, BPF/eBPF

Service Mesh

An infrastructure layer for service-to-service communication in microservices architectures, typically implemented as a sidecar proxy (Envoy, Linkerd-proxy) alongside each service instance. The service mesh handles: mutual TLS authentication, load balancing, circuit breaking, retries, timeout propagation, distributed tracing, and metrics collection — transparently, without application code changes. The control plane (Istio, Linkerd, Consul Connect) configures the proxies. The data plane (the proxies themselves) enforces policy on each request.

Related terms: mTLS, circuit breaker, load balancing, Envoy, Istio, sidecar
See also: mTLS, circuit breaker, load balancing

Sharding

A horizontal scaling technique that partitions a large dataset or workload across multiple nodes (shards), with each shard holding a subset of the data. Each shard is an independent database or service instance, enabling parallelism. Sharding strategies: range-based (contiguous key ranges per shard, prone to hot spots), hash-based (hash of key modulo N shards, uniform distribution), and directory-based (lookup table mapping keys to shards). Resharding (changing the number of shards) is operationally complex; consistent hashing minimizes data movement.
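
Example (illustrative sketch): hash-based placement and the cost of resharding can be shown in a few lines. The functions shard_for and moved_fraction are hypothetical, and SHA-256 is used only as a stable hash that does not vary between runs.

```python
import hashlib
from typing import Iterable


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using hash(key) modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: Iterable[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)
```

Growing from 4 to 5 shards under modulo placement relocates roughly four keys in five, since a key stays put only when its hash has the same remainder under both moduli. Consistent hashing avoids this by moving only about 1/N of the keys.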

Related terms: Consistent hashing, hash ring, replica, CRDT, distributed database
See also: Consistent hashing, hash ring, replica

SLA / SLO / SLI

SLI (Service Level Indicator): A quantitative measure of service behavior (e.g., request success rate, p99 latency, error rate). SLO (Service Level Objective): A target value or range for an SLI (e.g., "99.9% of requests succeed in under 200ms"). SLA (Service Level Agreement): A binding contract between a provider and customer that includes SLOs plus consequences for missing them (credits, penalties). SLOs define the reliability target; error budgets (100% minus the SLO) determine how much unreliability is acceptable before changes are frozen.
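
Example (arithmetic sketch): an availability SLO translates into an error budget expressed as allowed downtime. The function name and the 30-day default window are assumptions.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0
```

A 99.9% SLO over 30 days leaves about 43.2 minutes of budget; tightening it to 99.99% leaves about 4.3 minutes.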

Related terms: SRE, latency, p50/p95/p99, error budget, availability
See also: RPO/RTO, latency

SNI (Server Name Indication)

A TLS extension that allows a client to specify the hostname it is connecting to during the TLS handshake, before any certificate is presented. SNI enables a single server (or load balancer) to host multiple TLS-protected virtual servers on the same IP address and port, presenting the appropriate certificate for each hostname. Without SNI, only one certificate could be served per IP:port combination. SNI is now universally supported and is used by reverse proxies, CDNs, and Kubernetes Ingress controllers for TLS termination routing.

Related terms: TLS, mTLS, HTTP/2, load balancing, virtual hosting
See also: TLS, mTLS, load balancing

Socket

An endpoint for network communication represented as a file descriptor in Unix-like systems. Sockets abstract the transport layer: an application creates a socket, binds/connects it, and reads/writes data using standard file I/O semantics. Socket types: SOCK_STREAM (TCP, connection-oriented), SOCK_DGRAM (UDP, connectionless), SOCK_RAW (raw IP packets). Unix domain sockets provide IPC via the filesystem namespace. The socket API (Berkeley sockets) is the universal networking interface across platforms.
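
Example (minimal sketch): the create, bind, connect, read/write sequence for a SOCK_STREAM socket can be exercised on the loopback interface. The helper names recv_exact and loopback_echo are hypothetical. The sketch runs both ends in one thread, which works only because the kernel completes the handshake before accept() is called.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (a stream socket may return partial reads)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:          # peer closed the connection
            break
        buf += chunk
    return buf


def loopback_echo(payload: bytes) -> bytes:
    """Send payload to a local TCP server and return what it echoes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _addr = server.accept()
            with conn:
                client.sendall(payload)
                conn.sendall(recv_exact(conn, len(payload)))
                return recv_exact(client, len(payload))
```

Calling loopback_echo(b"ping") returns the same bytes after a round trip through the kernel's TCP stack.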

Related terms: TCP, UDP, epoll, bind, connect, IPC, veth
See also: TCP, UDP, epoll

SR-IOV (Single Root I/O Virtualization)

A PCIe specification that allows a single physical NIC to present multiple lightweight Virtual Functions (VFs) to VMs or containers, each appearing as a separate PCIe device with direct hardware access. This bypasses the software virtual switch, dramatically reducing I/O latency and CPU overhead for virtualized workloads. The Physical Function (PF) manages the device; VFs are assigned to guests via VFIO or directly attached to containers. SR-IOV is used in NFV (Network Function Virtualization) and cloud provider bare-metal or enhanced networking offerings.

Related terms: DPDK, RDMA, PCIe, VF, VFIO, NIC
See also: DPDK, PCIe, RDMA

T

TCP (Transmission Control Protocol)

A connection-oriented, reliable, ordered, error-checked transport protocol (IANA protocol number 6) that provides a byte stream abstraction over IP. TCP uses a three-way handshake (SYN, SYN-ACK, ACK) for connection establishment and a four-way exchange (FIN, ACK, FIN, ACK) for graceful termination; an RST segment aborts a connection immediately. Key mechanisms: sequence numbers (ordering and duplicate detection), acknowledgements (reliability), sliding window (flow control), congestion control (slow start, CWND management). TCP is the transport for HTTP/1.1 and HTTP/2, SMTP, SSH, and most internet applications.
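
Example (simplified sketch): a receiver's cumulative acknowledgement can be derived from sequence numbers alone. The function name is hypothetical, and the sketch ignores 32-bit sequence-number wraparound and selective acknowledgements.

```python
from typing import Iterable, Tuple


def cumulative_ack(initial_seq: int, segments: Iterable[Tuple[int, int]]) -> int:
    """Return the next expected byte given received (seq, length) segments.

    Segments may arrive out of order or overlap; the ACK advances only
    across contiguous data starting at initial_seq.
    """
    expected = initial_seq
    for seq, length in sorted(segments):
        if seq > expected:
            break                      # gap: later data cannot be acknowledged yet
        expected = max(expected, seq + length)
    return expected
```

If bytes 1000-1099 and 1200-1299 have arrived but 1100-1199 has not, the receiver keeps acknowledging 1100 until the gap is filled.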

Related terms: UDP, QUIC, ACK, flow control, congestion control, socket
See also: UDP, QUIC, ACK, congestion control, flow control

TLS (Transport Layer Security)

A cryptographic protocol providing authenticated, encrypted communication over a network, successor to SSL. TLS 1.3 (RFC 8446) is the current standard, offering forward secrecy (ephemeral key exchange), reduced handshake latency (1-RTT, with optional 0-RTT data on session resumption), and a much smaller set of cipher suites limited to AEAD algorithms. The TLS handshake authenticates the server (and optionally the client in mTLS) via certificates, negotiates encryption parameters, and establishes session keys. TLS is used by HTTPS, SMTP, IMAP, and most secure protocols.

Related terms: mTLS, SNI, HTTP/2, QUIC, PKI, X.509, certificate
See also: mTLS, SNI, HTTP/2, QUIC

Two-Phase Commit (2PC)

A distributed transaction protocol that ensures all-or-nothing atomicity across multiple nodes. Phase 1 (Prepare): a coordinator sends a prepare request to all participants; each participant writes the transaction to a durable WAL and votes Yes or No. Phase 2 (Commit/Abort): if all vote Yes, the coordinator sends Commit; if any votes No, it sends Abort. 2PC is blocking: if the coordinator crashes after participants have voted Yes but before it delivers the decision, those participants cannot decide on their own and hold their locks until the coordinator recovers. Three-phase commit (3PC) and Paxos-based commit improve availability at higher complexity.
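
Example (in-memory sketch): the coordinator's decision rule can be written directly from the two phases. The Participant class and two_phase_commit function are hypothetical. Durable logging, timeouts, and coordinator failure are deliberately left out.

```python
from typing import List


class Participant:
    def __init__(self, can_commit: bool = True) -> None:
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        """Phase 1: vote Yes (and stay prepared) or vote No."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        """Phase 2: apply the coordinator's decision."""
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Commit only if every participant votes Yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision
```

A single No vote is enough to abort the transaction on every participant, including those that voted Yes.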

Related terms: Saga pattern, consensus, Paxos, WAL, ACID, idempotent
See also: Saga pattern, consensus, WAL

U

UDP (User Datagram Protocol)

A connectionless, unreliable transport protocol (IANA protocol number 17) providing a simple send-and-forget datagram service. UDP has minimal overhead (8-byte header vs. 20+ bytes for TCP) and no connection setup, making it suitable for latency-sensitive applications (DNS, NTP, gaming, VoIP) that handle their own reliability. UDP supports broadcasting and multicasting. QUIC and most modern real-time protocols are built on UDP. DPDK and XDP can process UDP at line rate with minimal CPU overhead.
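
Example (minimal sketch): the 8-byte UDP header consists of four 16-bit fields in network byte order. The function name is hypothetical, and the checksum is left at zero (meaning "not computed" over IPv4) rather than calculated.

```python
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Build a UDP header: source port, destination port, length, checksum."""
    length = 8 + len(payload)          # header plus payload, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)
```

The length field covers the header itself, so a datagram with an empty payload still reports a length of 8.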

Related terms: TCP, QUIC, DCCP, DNS, NTP, multicast, socket
See also: TCP, QUIC, DCCP

Unicast

The standard network communication model where a packet is addressed to and delivered to exactly one destination. The vast majority of internet traffic (HTTP, SSH, SMTP) is unicast. IPv4 unicast addresses fall outside multicast (224.0.0.0/4) and broadcast (255.255.255.255) ranges. Unicast routing uses the standard routing table with destination-based forwarding. Contrast with anycast (nearest of multiple nodes), multicast (all group members), and broadcast (all nodes on a segment).

Related terms: Anycast, multicast, IP routing, routing table, BGP
See also: Anycast, multicast, IP routing

V

Vector Clock

A data structure used in distributed systems to capture causal relationships between events. Each node maintains a vector of logical timestamps (one entry per node). When an event occurs, the node increments its own counter. When a message is sent, the sender includes its vector; the receiver takes the element-wise maximum and increments its own counter. Vector clocks determine causality: if vector A ≤ vector B element-wise and A ≠ B, A happened-before B; if neither A ≤ B nor B ≤ A holds, the events are concurrent. Used in Dynamo-style databases for conflict detection and in some CRDT implementations to track causality.
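
Example (minimal sketch): the merge and comparison rules can be written over dictionaries keyed by node name, with missing entries treated as zero. The function names merge and compare are hypothetical.

```python
from typing import Dict

Clock = Dict[str, int]


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receive: element-wise maximum, then increment the receiver's own counter."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    merged[node] = merged.get(node, 0) + 1
    return merged


def compare(a: Clock, b: Clock) -> str:
    """Classify the causal relationship between two vector clocks."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"        # a happened-before b
    if b_le_a:
        return "after"         # b happened-before a
    return "concurrent"
```

Two clocks such as {"a": 2, "b": 0} and {"a": 1, "b": 1} are concurrent: each is ahead of the other in one component, so neither event could have influenced the other.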

Related terms: CRDT, eventual consistency, Lamport clock, replica, causal consistency
See also: CRDT, eventual consistency

Veth (Virtual Ethernet Pair)

A Linux kernel virtual network device that comes in connected pairs: data written to one end appears as received data on the other end. Veth pairs act as a virtual "patch cable" between two network namespaces, enabling container networking. Typically, one end is placed in a container's network namespace and the other in the host or a bridge namespace. Container runtimes (Docker, containerd, CRI-O) and Kubernetes use veth pairs to give containers their network interfaces. Veth pairs are also used to connect containers to OVS (Open vSwitch) bridges.

Related terms: Network namespace, container, bridge, OVS, VXLAN
See also: Network namespace, VLAN, VXLAN

VLAN (Virtual LAN)

A Layer 2 networking mechanism that partitions a physical network into multiple isolated logical networks using 802.1Q tag headers (4 bytes, including a 12-bit VLAN ID, supporting 4094 VLANs). Switches use VLAN tags to separate traffic between networks without requiring separate physical infrastructure. Trunk ports carry traffic for multiple VLANs (tagged); access ports connect to end devices on one VLAN (untagged). VLANs provide broadcast domain isolation, security segmentation, and multi-tenant network sharing in data centers and enterprises.
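
Example (minimal sketch): the 4-byte 802.1Q tag is a 16-bit TPID (0x8100) followed by a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible indicator, and the 12-bit VLAN ID. The function name is hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value identifying an 802.1Q-tagged frame


def vlan_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag for a usable VLAN ID (1-4094)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN IDs 0 and 4095 are reserved")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be 0-7 and dei 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)
```

The 12-bit field has 4096 possible values; excluding the two reserved IDs gives the 4094 usable VLANs.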

Related terms: VXLAN, network namespace, bridge, 802.1Q, SDN
See also: VXLAN, network namespace, SDN

VXLAN (Virtual Extensible LAN)

A network overlay encapsulation protocol (RFC 7348) that encapsulates Layer 2 Ethernet frames in UDP/IP packets (default port 4789), allowing Layer 2 network segments to span Layer 3 IP boundaries. VXLAN uses a 24-bit VXLAN Network Identifier (VNI), supporting about 16 million logical networks (vs. 4094 for VLANs). VTEP (VXLAN Tunnel Endpoint) devices encapsulate and decapsulate traffic. VXLAN is a widely deployed overlay protocol in data center SDN and is supported as an encapsulation mode by Kubernetes CNI plugins such as Flannel, Calico, and Cilium; public cloud virtual networks (AWS VPC, Azure VNet) provide comparable overlays, though their internal encapsulation is provider-specific.
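
Example (minimal sketch): the 8-byte VXLAN header defined in RFC 7348 carries a flags byte with the I bit set (0x08), 24 reserved bits, the 24-bit VNI, and 8 further reserved bits. The function names are hypothetical, and the outer Ethernet, IP, and UDP headers are omitted.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8
```

The 24-bit identifier allows 2^24 = 16,777,216 distinct values, which is where the "16 million" figure comes from.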

Related terms: VLAN, veth, SDN, overlay, VTEP, container networking
See also: VLAN, veth, SDN

W

WAL (Write-Ahead Log)

A durability mechanism where all changes are written sequentially to an append-only log on durable storage before being applied to the primary data store. WAL ensures durability: if a crash occurs, the log can be replayed to recover committed transactions. WAL also enables replication (streaming the log to followers) and point-in-time recovery. Used in: PostgreSQL, MySQL InnoDB (redo log), RocksDB, Kafka (each partition is a WAL), and etcd. Sequential WAL writes are significantly faster than random writes to the data file.
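
Example (toy sketch): the log-before-apply rule and crash recovery by replay can be shown with a JSON-lines file. The WriteAheadLog class, put function, and demo function are hypothetical. Real systems add checksums, batching, and log truncation after checkpoints.

```python
import json
import os
import tempfile
from typing import Any, Dict


class WriteAheadLog:
    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: Dict[str, Any]) -> None:
        """Append one record and force it to durable storage."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self) -> Dict[str, Any]:
        """Rebuild state by applying every logged record in order."""
        state: Dict[str, Any] = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    state[record["key"]] = record["value"]
        return state


def put(wal: WriteAheadLog, store: Dict[str, Any], key: str, value: Any) -> None:
    wal.append({"key": key, "value": value})   # 1. log the change first
    store[key] = value                          # 2. then apply it


def demo() -> Dict[str, Any]:
    with tempfile.TemporaryDirectory() as d:
        wal = WriteAheadLog(os.path.join(d, "wal.log"))
        store: Dict[str, Any] = {}
        put(wal, store, "a", 1)
        put(wal, store, "a", 2)
        put(wal, store, "b", 3)
        # Simulated crash: the in-memory store is discarded and rebuilt from the log.
        return wal.replay()
```

Because every change reaches the log before the store, replaying the log after a crash reproduces the last acknowledged state.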

Related terms: Two-phase commit, Kafka partition, ISR, replication, PITR, journaling
See also: Two-phase commit, Kafka partition

X

XDP (eXpress Data Path)

A high-performance packet processing framework in the Linux kernel that allows eBPF programs to be executed at the earliest point in the network receive path, before the kernel allocates a socket buffer (skb). XDP programs can drop, pass, redirect, or modify packets at near line rate. XDP supports three modes: native (driver support, highest performance), offloaded (processing on the NIC hardware), and generic (any driver, lower performance). Used for DDoS mitigation, load balancing (Facebook Katran, Cloudflare), and software routing.

Related terms: BPF/eBPF, DPDK, netfilter, iptables, zero-copy, NIC
See also: BPF/eBPF, DPDK, netfilter, zero-copy networking

Z

Zero-Copy Networking

A class of techniques that eliminate unnecessary data copies between kernel and user space during network I/O, reducing CPU overhead and memory bandwidth consumption. Traditional networking: data is copied from the NIC to kernel buffers, then to user-space buffers. Zero-copy approaches: sendfile() (kernel-to-kernel copy avoiding user space), splice(), scatter-gather DMA, and RDMA (zero copies end-to-end). DPDK achieves zero copy by mapping NIC DMA buffers directly into user space. Critical for high-throughput applications: web servers serving large files, storage gateways, video streaming.
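
Example (minimal sketch): Python exposes the sendfile() path through socket.sendfile(), which uses os.sendfile() where the platform supports it and falls back to ordinary send() calls otherwise. The helper names send_file and demo are hypothetical, and the demo uses a local socket pair rather than a real network connection.

```python
import os
import socket
import tempfile


def send_file(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the kernel moves file data to the
        # socket without copying it through a user-space buffer.
        return sock.sendfile(f)


def demo(size: int = 1000) -> bytes:
    """Write a small temporary file, send it over a socket pair, return what arrives."""
    sender, receiver = socket.socketpair()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
    try:
        send_file(sender, tmp.name)
        sender.close()                 # signal end of stream to the receiver
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        receiver.close()
        os.unlink(tmp.name)
```

The application never reads the file contents into its own buffer; it only hands the kernel a file object and a socket.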

Related terms: DPDK, RDMA, DMA, sendfile, splice, XDP
See also: DPDK, RDMA, DMA

End of Networking and Distributed Systems Terminology Glossary. Total entries: 60+ terms.