Kernel and OS Terminology Glossary
A comprehensive reference for kernel and operating system terminology. Each entry includes a precise technical definition, related terms, and cross-references.
A
Address Space
The range of memory addresses accessible to a process or the kernel. Each process typically has its own virtual address space, which the operating system and MMU map to physical memory pages. The address space is divided into user space and kernel space regions, with the boundary enforced by page-table permissions and the CPU's privilege levels. On 64-bit systems, the theoretical address space is 2^64 bytes, though practical limits are smaller: current x86_64 processors implement 48-bit (256 TiB) or 57-bit (128 PiB) virtual addresses.
- Related terms: Virtual memory, page table, VMA, MMU, ASLR
- See also: Virtual memory, Page table, VMA
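As a rough illustration of the gap between the theoretical and practical limits, the arithmetic below compares a full 64-bit space with the 48-bit and 57-bit virtual address widths used by x86_64 with 4-level and 5-level paging. The helper name is illustrative only.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits

TIB = 1 << 40
PIB = 1 << 50
EIB = 1 << 60

full_64 = address_space_bytes(64)   # theoretical 64-bit limit
va_48 = address_space_bytes(48)     # x86_64 with 4-level paging
va_57 = address_space_bytes(57)     # x86_64 with 5-level paging

print(full_64 // EIB, "EiB")   # 16 EiB
print(va_48 // TIB, "TiB")     # 256 TiB
print(va_57 // PIB, "PiB")     # 128 PiB
```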
ASLR (Address Space Layout Randomization)
A kernel security technique that randomizes the base addresses of key memory regions — including the stack, heap, and loaded libraries — each time a process starts. ASLR makes it significantly harder for attackers to craft exploits that rely on knowing the exact memory location of code or data. The kernel implements ASLR during exec() by adding random offsets to the virtual memory layout. Linux implements several levels of ASLR controlled via /proc/sys/kernel/randomize_va_space.
- Related terms: Address space, exploit, DEP/NX, PIE, KASLR
- See also: KASLR, exec, virtual memory
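A minimal sketch of reading the ASLR setting mentioned above. The level descriptions follow the documented meaning of randomize_va_space (0, 1, 2); the function names are hypothetical, and the reader returns None on systems where the file does not exist.

```python
from pathlib import Path

# Meaning of each value of /proc/sys/kernel/randomize_va_space
ASLR_LEVELS = {
    0: "disabled",
    1: "stack, mmap base, and vDSO randomized",
    2: "level 1 plus heap (brk) randomized",
}

def describe_aslr(value: int) -> str:
    """Translate a randomize_va_space value into a short description."""
    return ASLR_LEVELS.get(value, "unknown")

def current_aslr_level(path: str = "/proc/sys/kernel/randomize_va_space"):
    """Return the current setting, or None where the file is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

level = current_aslr_level()
print(level, describe_aslr(level) if level is not None else "not available")
```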
B
Barrier (Memory)
A synchronization primitive that enforces ordering constraints on memory operations. A memory barrier prevents the CPU and compiler from reordering reads and writes across the barrier boundary. Hardware architectures with relaxed memory models (such as ARM and POWER) require explicit barriers; x86 provides a relatively strong memory model but still requires barriers for certain operations. The Linux kernel provides smp_mb(), smp_rmb(), and smp_wmb() macros for portable memory barriers.
- Related terms: Memory barrier, RCU, spinlock, cache coherence, LKMM
- See also: Memory barrier, RCU, lock
Boot Loader
A small program that runs after the platform firmware (BIOS or UEFI) has initialized the hardware and before the operating system, responsible for loading the kernel image into memory and transferring control to it. Common boot loaders include GRUB2 (widely used on x86 Linux), U-Boot (for embedded systems), and systemd-boot. The boot loader typically reads configuration from a boot partition, loads the kernel image and any initramfs into memory, sets up initial hardware state, and passes a device tree or boot parameters to the kernel. On many architectures the compressed kernel image decompresses itself once it gains control. UEFI-based systems use EFI boot loaders that run as EFI applications.
- Related terms: GRUB, UEFI, EFI, kernel, initramfs, device tree
- See also: EFI, initramfs, kernel module
BPF / eBPF (Extended Berkeley Packet Filter)
A kernel subsystem that allows sandboxed programs to be loaded and run inside the kernel without modifying kernel source or loading kernel modules. eBPF programs are usually written in a restricted subset of C, compiled to BPF bytecode, and verified by the kernel's BPF verifier before execution. eBPF is used for networking (XDP, tc), observability (tracing syscalls, profiling), and security (LSM hooks); seccomp filters use the older classic BPF instruction set rather than eBPF. The verifier rejects any program it cannot prove to be memory-safe and guaranteed to terminate.
- Related terms: XDP, seccomp, kprobe, tracepoint, BPF verifier, BTF
- See also: XDP, seccomp, softirq
Buddy Allocator
The primary physical memory allocator in the Linux kernel, which manages free memory in blocks of 2^order contiguous pages. When a block of a given order is needed, the allocator either uses an available block of that order or splits a larger block into two "buddies." When blocks are freed, adjacent free buddies are merged back into larger blocks, reducing fragmentation. Block sizes range from a single page (order 0) up to a configured maximum order: historically MAX_ORDER - 1, and in recent kernels the inclusive limit MAX_PAGE_ORDER.
- Related terms: Slab allocator, page, huge page, vmalloc, OOM killer
- See also: Slab allocator, page, huge page
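The split-and-merge behavior described above can be sketched with a toy allocator. This is an illustration under simplifying assumptions (one zone, page indices instead of struct page, no migration types), not the kernel's implementation; the class and method names are hypothetical.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages (not the kernel's code)."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        # free_lists[k] holds the first page index of each free order-k block
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)

    def alloc(self, order: int) -> int:
        """Return the first page index of a block of 2**order pages."""
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                start = min(self.free_lists[k])
                self.free_lists[k].remove(start)
                # Split down to the requested order; the upper half of each
                # split becomes a free buddy one order lower.
                while k > order:
                    k -= 1
                    self.free_lists[k].add(start + (1 << k))
                return start
        raise MemoryError("no free block large enough")

    def free(self, start: int, order: int) -> None:
        """Free a block, merging with its buddy for as long as the buddy is free."""
        while order < self.max_order:
            buddy = start ^ (1 << order)  # buddy differs only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)


b = BuddyAllocator(max_order=4)   # 16 pages in total
a = b.alloc(0)                    # one page, carved out by repeated splitting
c = b.alloc(1)                    # two pages
b.free(a, 0)
b.free(c, 1)
print(b.free_lists[4])            # {0}: everything merged back into one block
```

The XOR in free() works because two buddies of order k always differ in exactly bit k of their starting page index.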
C
cgroup (Control Group)
A Linux kernel mechanism for organizing processes into hierarchical groups and applying resource limits, accounting, and isolation to each group. cgroups v1 used a multi-hierarchy model with separate hierarchies per subsystem; cgroups v2 unifies all controllers under a single hierarchy rooted at /sys/fs/cgroup. Subsystems (controllers) include cpu, memory, blkio, and net_cls. cgroups are foundational to containers: Docker, Kubernetes pods, and systemd services all rely on cgroups for resource management.
- Related terms: namespace, container, OOM killer, process, scheduler
- See also: Namespace, OOM killer, scheduler
Context Switch
The kernel operation of saving the CPU state of a currently running process or thread and restoring the saved state of another. A context switch involves saving registers (including the program counter and stack pointer), switching the page table when the next task uses a different address space (which can require TLB flushes unless the hardware tags TLB entries, as with PCID on x86 or ASIDs on ARM), and updating kernel data structures like task_struct. Context switches are triggered by timer interrupts, blocking syscalls, or explicit yield operations. Frequent context switches add overhead due to TLB invalidations and cache warming costs.
- Related terms: task_struct, scheduler, TLB, process, thread, preemption
- See also: Scheduler, TLB, preemption, task_struct
Copy-on-Write (CoW)
An optimization strategy where two or more entities share the same memory pages until one of them attempts to modify the data, at which point a private copy is made. The Linux kernel uses CoW in fork(): both parent and child initially share the same physical pages (mapped read-only), and a page fault triggers copying only when a write occurs. CoW is also used in filesystems (btrfs, ZFS), virtual disks, and some memory allocators.
- Related terms: Fork, page fault, virtual memory, VMA, page table
- See also: Fork, page fault, VMA
CPU Ring
A hardware privilege level enforced by the CPU that controls what instructions and memory regions a running program may access. x86 defines four rings (0–3); Linux uses ring 0 (kernel mode) and ring 3 (user mode). Ring 0 code can execute privileged instructions (e.g., cli, hlt, I/O port access) and access all memory; ring 3 code cannot. Transitions between rings are controlled and expensive, occurring via system calls, interrupts, and exceptions.
- Related terms: Privilege level, syscall, interrupt, hypervisor, user space
- See also: Privilege level, syscall, trap, hypervisor
D
Deadlock
A condition in which two or more processes or threads are permanently blocked, each waiting for a resource held by another in a circular dependency chain. Deadlocks require four necessary conditions (Coffman conditions): mutual exclusion, hold-and-wait, no preemption of held resources, and circular wait. The Linux kernel uses lockdep to detect potential deadlock cycles at runtime during development. Kernel code avoids deadlocks by enforcing consistent lock ordering and using trylock patterns.
- Related terms: Spinlock, mutex, lockdep, starvation, lock ordering
- See also: Lock (spinlock/mutex), starvation, lockdep
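The idea of checking lock ordering can be sketched as a small graph check: record an edge whenever one lock is taken while another is held, and flag any acquisition that would close a cycle. This is a simplified illustration in the spirit of lockdep, not its actual algorithm; the class and lock names are hypothetical.

```python
from collections import defaultdict

class LockOrderChecker:
    """Records 'held -> acquired' edges and flags orderings that form a cycle."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _reachable(self, src, dst) -> bool:
        """Depth-first search: is there a recorded ordering path src -> dst?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new) -> bool:
        """Record taking `new` while holding `held`; False means a possible deadlock."""
        ok = True
        for h in held:
            # If `new` already precedes `h` somewhere, h -> new closes a cycle.
            if self._reachable(new, h):
                ok = False
            self.edges[h].add(new)
        return ok


checker = LockOrderChecker()
print(checker.acquire(["A"], "B"))   # True: first time A is held while taking B
print(checker.acquire(["B"], "A"))   # False: B -> A conflicts with A -> B
```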
Demand Paging
A virtual memory strategy in which pages are not loaded into physical memory until they are actually accessed. When a process accesses a page that is not currently in RAM, a page fault is triggered, and the kernel loads the required page from disk (or zeroes a new page for anonymous mappings). Demand paging allows processes to start quickly and use only the physical memory they actually touch, enabling efficient memory overcommitment.
- Related terms: Page fault, virtual memory, swap, page cache, OOM killer
- See also: Page fault, swap, page cache
Dentry (Directory Entry)
An in-memory kernel data structure representing a component of a filesystem path. The dentry cache (dcache) maps path components to inodes, enabling fast path lookups without repeated disk reads. Dentries form a tree structure mirroring the filesystem hierarchy and are reference-counted. The VFS layer uses dentries to resolve pathnames; negative dentries (caching misses) also exist to speed up failed lookups.
- Related terms: VFS, inode, superblock, page cache, filesystem
- See also: VFS, inode, superblock
Device Driver
Kernel code that provides an interface between the OS and a specific hardware device or class of devices. Device drivers abstract hardware details and expose standardized interfaces (e.g., block device, character device, network device) to higher kernel layers. Drivers register with the kernel via bus-specific APIs (PCI, USB, platform) and respond to interrupts from their hardware. A poorly written driver is one of the most common causes of kernel panics and oopses.
- Related terms: Kernel module, interrupt, DMA, PCIe, IRQ
- See also: Kernel module, interrupt, DMA, IRQ
DMA (Direct Memory Access)
A hardware capability that allows peripherals to transfer data directly to or from system memory without involving the CPU for each byte. A DMA controller or the bus-mastering device itself performs the transfer and typically raises an interrupt when it completes; an IOMMU, where present, translates and restricts the addresses the device may access. DMA enables high-throughput I/O (NICs, NVMe drives) without saturating the CPU. The kernel uses DMA mapping APIs (dma_alloc_coherent, dma_map_single) to manage cache coherency and address translation between the CPU's view and the device's view of memory.
- Related terms: IOMMU, interrupt, PCIe, device driver, cache coherence
- See also: Device driver, interrupt, PCIe, IOMMU
E
Epoch-Based Reclamation
A memory reclamation technique used in concurrent data structures where objects are not freed immediately but are deferred until all threads that could possibly have a reference to them have advanced past a safe point (epoch). Threads read the current global epoch and record their local epoch; an object is safe to free once all threads have advanced past the epoch in which the object was logically deleted. EBR is simpler and lower-overhead than hazard pointers but requires threads to periodically announce their epoch.
- Related terms: RCU, grace period, hazard pointer, memory reclamation
- See also: RCU, grace period
Exception
A synchronous event generated by the CPU in response to an instruction that causes an error condition, such as a divide-by-zero, illegal instruction, or page fault. Unlike asynchronous interrupts, exceptions are tied to the instruction that caused them. The CPU transfers control to an exception handler; on x86 the handler is registered in the IDT (Interrupt Descriptor Table). In Linux, exceptions may be handled transparently (e.g., copy-on-write page faults), deliver a signal to the process (e.g., SIGFPE for arithmetic errors such as integer divide-by-zero), or cause a kernel oops/panic.
- Related terms: Interrupt, trap, page fault, signal, IDT
- See also: Interrupt, trap, page fault, signal
exec (execve)
A family of system calls that replace the current process's memory image with a new program. execve() loads the specified executable (ELF or script with shebang), sets up a new stack, heap, and text segment, applies ASLR layout, and begins executing at the entry point. The process ID remains the same, and open file descriptors stay open across the call unless they are marked close-on-exec (FD_CLOEXEC, set for example by opening with O_CLOEXEC). The address space is replaced, and caught signal handlers are reset to their defaults. exec is typically called after fork() to launch a new program.
- Related terms: Fork, process, address space, ELF, dynamic linker, ASLR
- See also: Fork, address space, VMA
F
Fault (Page Fault)
See Page fault.
Fork
A system call that creates a new process (child) as a near-identical copy of the calling process (parent). After fork(), both parent and child run concurrently from the same point in the code. The kernel uses copy-on-write to efficiently share the parent's memory pages with the child until a write occurs. The child inherits file descriptors, signal handlers, and memory mappings. fork() is the fundamental process creation mechanism in Unix; the child typically calls exec() to replace itself with a new program.
- Related terms: Copy-on-write, exec, process, task_struct, zombie process
- See also: Copy-on-write, exec, zombie process
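A minimal sketch of the behavior described above, assuming a Unix-like system: after fork() the child gets its own private copy of the parent's data, so a change made in the child is not visible in the parent. A pipe (an inherited file descriptor pair) carries the child's value back. The function and variable names are illustrative.

```python
import os

def fork_demo():
    """Fork once; return (value seen by parent, value computed by child)."""
    counter = 100
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: runs from the same point with a private copy of `counter`.
        os.close(read_fd)
        counter += 1
        os.write(write_fd, str(counter).encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately without running parent cleanup code
    # Parent: its own `counter` is unaffected by the child's write.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return counter, child_value

print(fork_demo())  # (100, 101)
```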
Futex (Fast Userspace muTEX)
A kernel primitive that enables efficient userspace synchronization by avoiding system calls in the uncontended case. A futex uses a shared integer in userspace; when uncontended, locking and unlocking are pure userspace atomic operations with no kernel involvement. When contention occurs, threads call into the kernel (futex() syscall) to sleep on or wake a wait queue associated with the futex address. POSIX mutexes, condition variables, and semaphores in glibc are built on futexes.
- Related terms: Mutex, spinlock, syscall, wait queue, synchronization
- See also: Lock (spinlock/mutex), syscall, semaphore
G
Grace Period (RCU)
The time interval in RCU during which the kernel ensures that all pre-existing RCU read-side critical sections have completed before freeing or modifying old data. After a writer performs an update and calls synchronize_rcu() (or call_rcu() for an asynchronous callback), it must wait for one full grace period before reclaiming the old version. A grace period ends when all CPUs have passed through a quiescent state (e.g., a context switch, idle, or user-space execution). Grace periods are the core mechanism enabling RCU's wait-free readers.
- Related terms: RCU, quiescent state, epoch-based reclamation, memory reclamation
- See also: RCU, epoch-based reclamation
H
Huge Page
A memory page larger than the standard 4 KB page size, typically 2 MB or 1 GB on x86_64. Using huge pages reduces TLB pressure, since a single TLB entry covers more memory, improving performance for memory-intensive workloads. Linux supports huge pages via HugeTLBFS (explicit, preallocated) and Transparent Huge Pages (THP, which automatically promotes 4 KB pages to 2 MB pages when possible). THP can cause latency spikes due to compaction; some latency-sensitive applications disable it.
- Related terms: TLB, page, page table, buddy allocator, THP
- See also: TLB, page, page table
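The TLB benefit can be made concrete with a little arithmetic: TLB reach is the number of entries multiplied by the page size. The entry count below is an assumed figure chosen for illustration, not a property of any particular CPU.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory covered when every TLB entry maps one page."""
    return entries * page_size

ENTRIES = 1024  # assumed TLB size, for illustration only

print(tlb_reach(ENTRIES, 4 * KIB) // MIB, "MiB")   # 4 MiB
print(tlb_reach(ENTRIES, 2 * MIB) // GIB, "GiB")   # 2 GiB
print(tlb_reach(ENTRIES, 1 * GIB) // GIB, "GiB")   # 1024 GiB
```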
Hypercall
The virtualization equivalent of a system call: an instruction or ABI by which a guest virtual machine requests services from the hypervisor (VMM). Just as a syscall transitions from user mode to kernel mode, a hypercall transitions from guest mode to hypervisor mode. Common hypercalls include requesting memory balloon inflation, paravirtual I/O operations, and clock synchronization. On x86, hypercalls use vmcall (VMX) or vmmcall (SVM) instructions; KVM and Xen both define hypercall ABIs.
- Related terms: Hypervisor, KVM, VMX, paravirtualization, EPT
- See also: Hypervisor, CPU ring, VMX
Hypervisor
Software (or firmware) that creates and manages virtual machines (VMs), multiplexing the physical hardware among multiple guest operating systems. A Type 1 (bare-metal) hypervisor runs directly on hardware (e.g., KVM, Xen, ESXi); a Type 2 hypervisor runs as a process inside a host OS (e.g., QEMU, VirtualBox). The hypervisor is responsible for CPU virtualization (using hardware-assisted VMX/SVM), memory virtualization (EPT/NPT), and I/O virtualization (VirtIO, SR-IOV).
- Related terms: KVM, VMX, EPT, hypercall, virtual machine, QEMU
- See also: Hypercall, CPU ring, VMX, EPT
I
Inode
A data structure in a filesystem that stores metadata about a file or directory: ownership, permissions, timestamps, size, and pointers to data blocks. Inodes do not store the filename; the dentry maps names to inode numbers. Each file has exactly one inode, but may have multiple hard links (directory entries pointing to the same inode). The VFS layer defines a generic inode structure (struct inode) that each filesystem implements. Running out of inodes (even with disk space remaining) prevents creating new files.
- Related terms: Dentry, VFS, superblock, page cache, filesystem
- See also: Dentry, VFS, superblock
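The relationship between names and inodes can be observed from user space. The sketch below, which assumes a filesystem that supports hard links, creates a second name for a file and checks that both names resolve to the same inode number and that the link count is 2. File names are arbitrary.

```python
import os
import tempfile

def hard_link_demo():
    """Return (same inode?, link count) for a file with two names."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("hello")
        os.link(original, alias)           # second directory entry, same inode
        a, b = os.stat(original), os.stat(alias)
        return a.st_ino == b.st_ino, a.st_nlink

print(hard_link_demo())  # (True, 2)
```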
Interrupt
An asynchronous hardware signal that causes the CPU to suspend the current execution context and jump to a registered handler (ISR). Interrupts are used by hardware devices to notify the CPU of events: network packet arrival, disk I/O completion, timer expiry. The CPU saves the current register state, looks up the handler address in the IDT, and executes the ISR in kernel mode. After the ISR completes, the interrupted context is resumed. Interrupts are categorized as maskable (can be disabled with cli) and non-maskable (NMI).
- Related terms: IRQ, interrupt handler, IDT, softirq, exception, NMI
- See also: Interrupt handler, IRQ, softirq, exception
Interrupt Handler (ISR)
The kernel function registered to handle a specific interrupt. Because interrupt handlers run with interrupts partially or fully disabled and preempt arbitrary kernel code, they must be fast, non-blocking, and must not sleep. The Linux interrupt model splits work into a "top half" (the ISR itself, runs immediately with interrupts disabled) and a "bottom half" (deferred work via softirqs, tasklets, or workqueues). The IRQ subsystem manages handler registration via request_irq().
- Related terms: Interrupt, IRQ, softirq, workqueue, tasklet
- See also: Interrupt, softirq, workqueue
IPC (Inter-Process Communication)
The set of mechanisms that allow processes to exchange data and synchronize. Linux IPC mechanisms include: pipes and FIFOs (byte-stream, unidirectional), message queues (discrete messages), shared memory (fastest, requires explicit synchronization), semaphores (counting synchronization), sockets (network and Unix domain), and signals. POSIX IPC and System V IPC are both supported. Modern systems also use D-Bus and Binder (Android) as higher-level IPC frameworks.
- Related terms: Signal, socket, shared memory, semaphore, pipe, futex
- See also: Signal, semaphore, syscall
IRQ (Interrupt Request)
A signal line (physical or logical) on which a device asserts a request for CPU attention, triggering an interrupt. On legacy x86 systems, IRQs were numbered 0–15 via the 8259 PIC; modern systems use the APIC, which supports many more interrupt vectors. The kernel maps IRQ numbers to registered interrupt handlers via the IRQ descriptor table. MSI and MSI-X allow PCIe devices to deliver interrupts directly as memory writes, bypassing IRQ lines and improving scalability.
- Related terms: Interrupt, MSI/MSI-X, APIC, interrupt handler, device driver
- See also: Interrupt, MSI/MSI-X, interrupt handler
J
Jiffies
The kernel's internal time unit, representing the number of timer interrupts (ticks) since system boot. One jiffy equals 1/HZ seconds, where HZ is a compile-time constant (typically 100, 250, or 1000 on Linux). Jiffies are stored in a global jiffies variable and are used for coarse-grained time measurements, scheduling timeslices, and timeout calculations. For high-resolution timing, the kernel uses ktime_t and high-resolution timers (hrtimers) backed by hardware clock sources.
- Related terms: Scheduler, timer interrupt, HZ, ktime, preemption
- See also: Scheduler, interrupt
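The conversions implied above, plus a wraparound-tolerant comparison, can be sketched as follows. The Python function names mirror kernel helpers of the same names but are re-implementations for illustration; the 32-bit counter width is an assumption.

```python
def jiffies_to_msecs(j: int, hz: int) -> float:
    """One jiffy lasts 1/HZ seconds, i.e. 1000/HZ milliseconds."""
    return j * 1000 / hz

def msecs_to_jiffies(ms: int, hz: int) -> int:
    """Round up so a timeout never expires earlier than requested."""
    return -(-ms * hz // 1000)

def time_after(a: int, b: int, bits: int = 32) -> bool:
    """True if tick count `a` is later than `b`, even across counter wraparound."""
    diff = (b - a) & ((1 << bits) - 1)
    # A difference with the top bit set is negative in signed arithmetic,
    # which means `a` lies ahead of `b`.
    return diff >= (1 << (bits - 1))

print(jiffies_to_msecs(250, hz=250))   # 1000.0
print(msecs_to_jiffies(10, hz=250))    # 3
print(time_after(5, 0xFFFFFFF0))       # True: 5 is just past the wrap point
```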
K
Kernel Module
See Loadable Kernel Module.
Kernel Oops
A non-fatal kernel error condition where the kernel detects an inconsistency (such as a NULL pointer dereference in kernel space) and prints a diagnostic message (register dump, stack trace, module list) to the kernel log. After an oops, the kernel may continue running, but the process that triggered it is typically killed and the kernel's internal state may be compromised. An oops is distinct from a kernel panic: a panic is fatal, while an oops may be recoverable. CONFIG_PANIC_ON_OOPS can promote oopses to panics.
- Related terms: Kernel panic, BUG(), WARN(), stack trace, KASAN
- See also: Kernel panic, KASAN
Kernel Panic
A fatal, unrecoverable error condition in which the kernel determines that continuing execution would be unsafe and halts the system. Common causes include NULL pointer dereferences in atomic context, stack overflows, unhandled exceptions in kernel mode, and corrupted kernel data structures. On panic, the kernel prints diagnostic information, optionally dumps a core, and either halts or reboots (controlled by kernel.panic sysctl). A kernel panic is the kernel's equivalent of a crash.
- Related terms: Kernel oops, BUG(), WARN(), watchdog, kdump
- See also: Kernel oops, interrupt handler
Kthread (Kernel Thread)
A thread that runs entirely in kernel space, created by the kernel itself rather than a user process. Kernel threads are used for background work such as memory compaction (kcompactd), workqueue processing including writeback (kworker), RCU callbacks (rcuop), and network processing. They are created with kthread_create() or kthread_run(), and can be stopped gracefully. Like user threads, kthreads are scheduled by the kernel's normal or real-time scheduling classes, but they always execute in kernel mode.
- Related terms: Process, workqueue, scheduler, RCU, softirq
- See also: Workqueue, scheduler, RCU
L
Loadable Kernel Module (LKM)
A piece of object code that can be dynamically loaded into a running kernel without rebooting, extending kernel functionality at runtime. LKMs are used for device drivers, filesystems, network protocols, and security modules. When loaded via insmod or modprobe, the module's init function is called; rmmod calls the exit function. LKMs run with full kernel privilege (ring 0) and have unrestricted access to kernel internals, making module security a significant concern.
- Related terms: Kernel module, device driver, BPF/eBPF, LSM, ring
- See also: Device driver, CPU ring
Lock (Spinlock / Mutex)
Synchronization primitives used to protect shared data from concurrent access. A spinlock busy-waits (spins in a loop) until the lock is available, making it suitable only for very short critical sections in contexts where sleeping is not allowed (such as interrupt handlers and other atomic contexts). A mutex (mutual exclusion lock) puts the waiting thread to sleep if the lock is unavailable, making it appropriate for longer critical sections in process context. Linux also provides rwlocks (reader-writer spinlocks) and rw_semaphores for reader-writer access patterns.
- Related terms: Spinlock, mutex, deadlock, lockdep, futex, semaphore
- See also: Deadlock, futex, semaphore, RCU
M
Memory Barrier
An instruction (or compiler directive) that enforces ordering of memory operations. A full memory barrier ensures all loads and stores before it complete before any loads and stores after it. Read barriers (rmb) and write barriers (wmb) enforce ordering for only loads or only stores respectively. Memory barriers are essential for lock-free programming on weakly-ordered architectures (ARM, POWER, RISC-V) and for device driver I/O (where MMIO writes must be ordered). The Linux kernel provides architecture-portable barrier macros.
- Related terms: Barrier (memory), cache coherence, RCU, LKMM, spinlock
- See also: Barrier (memory), RCU
Memory-Mapped I/O (MMIO)
A technique in which hardware device registers are mapped into the CPU's physical address space, allowing the CPU to read and write device registers using ordinary load and store instructions rather than special I/O port instructions. PCIe devices expose their control registers and status via BAR (Base Address Register) regions, which the OS maps into kernel virtual address space using ioremap(). MMIO accesses bypass the CPU cache (using write-combining or uncacheable mappings) to ensure registers are always read/written directly.
- Related terms: PCIe, DMA, device driver, IOMMU, BAR
- See also: DMA, PCIe, device driver
Microkernel
An OS design philosophy in which the kernel contains only the minimal functionality required to run: typically scheduling, basic IPC, and memory management. Device drivers, filesystems, and protocol stacks run as user-space servers. Microkernels offer improved isolation and fault tolerance (a crashing driver doesn't crash the kernel) but historically suffered from IPC overhead. Modern examples include seL4, L4, and QNX. Contrast with monolithic kernel.
- Related terms: Monolithic kernel, unikernel, IPC, process, hypervisor
- See also: Monolithic kernel, unikernel
Monolithic Kernel
An OS architecture in which most OS services (device drivers, filesystems, network stacks, memory management) run in a single kernel address space in privileged mode (ring 0 on x86). The Linux and traditional Unix kernels are monolithic. Monolithic kernels achieve high performance via direct function calls between subsystems, but a bug in any component can crash the entire kernel. Linux adds runtime extensibility through loadable modules, which run with full kernel privilege, and through eBPF, whose programs are checked by a verifier before they run.
- Related terms: Microkernel, loadable kernel module, unikernel, Linux
- See also: Microkernel, unikernel, loadable kernel module
MSI / MSI-X (Message Signaled Interrupts)
A PCIe interrupt delivery mechanism in which devices signal interrupts by writing to a specific memory address rather than asserting a dedicated IRQ line. MSI eliminates interrupt sharing (multiple devices on one IRQ line) and reduces latency. MSI-X extends MSI to support up to 2048 interrupt vectors per device and allows each vector to be individually configured (different CPU affinity, priority). MSI-X enables modern NICs and NVMe SSDs to use per-CPU interrupt queues for high scalability.
- Related terms: IRQ, interrupt, PCIe, APIC, NVMe
- See also: IRQ, interrupt, PCIe
N
Namespace
A Linux kernel mechanism that partitions global system resources so that each namespace sees its own isolated instance. Linux currently supports eight namespace types: mount (filesystem trees), UTS (hostname/domain), IPC (System V IPC and POSIX MQs), PID (process IDs), network (network stack), user (UIDs/GIDs), cgroup (cgroup hierarchies), and time (system clocks). Namespaces are the core isolation primitive for containers. New namespaces are created via the clone() and unshare() syscalls; setns() moves the calling process into an existing namespace.
- Related terms: cgroup, container, process, VFS, network namespace
- See also: cgroup, process, VFS
NUMA (Non-Uniform Memory Access)
A multiprocessor architecture in which each CPU (or group of CPUs) has its own local memory bank, with faster access to local memory than to memory attached to other CPUs (remote memory). The OS and applications should place data close to the CPU that will use it to minimize remote memory accesses (NUMA penalties). Linux's scheduler and memory allocator are NUMA-aware: by default the kernel tries to allocate memory on the node where the requesting thread is running, and applications can set explicit policies with the mbind() and set_mempolicy() syscalls or the numactl utility.
- Related terms: SMP, scheduler, page, memory allocation, cache
- See also: Scheduler, page, context switch
O
OOM Killer (Out-of-Memory Killer)
The kernel subsystem invoked when the system runs out of free memory and cannot reclaim any more through paging or cache eviction. The OOM killer scores candidate user-space processes mainly by their memory footprint (resident pages, swap usage, and page tables), shifted by each process's OOM score adjustment, then kills the highest-scoring process to free memory. Processes can adjust their score via /proc/[pid]/oom_score_adj, which ranges from -1000 (never kill) to 1000. Container environments use cgroup memory limits, which trigger OOM kills at the cgroup level before the global OOM killer is invoked.
- Related terms: Swap, page cache, buddy allocator, cgroup, virtual memory
- See also: Swap, page cache, cgroup
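A simplified sketch of victim selection, loosely modeled on scoring by memory footprint plus the oom_score_adj value. The task names, page counts, and function names are hypothetical, and the real kernel applies additional eligibility checks that are omitted here.

```python
OOM_SCORE_ADJ_MIN = -1000  # a task with this adjustment is never selected

def badness(rss_pages, swap_pages, pgtable_pages, oom_score_adj, total_pages):
    """Score a task; higher means a more likely victim. None means exempt."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    points = rss_pages + swap_pages + pgtable_pages
    # Each adjustment unit shifts the score by 1/1000 of total memory.
    points += oom_score_adj * (total_pages // 1000)
    return points

def pick_victim(tasks, total_pages):
    """tasks maps name -> (rss, swap, pgtables, oom_score_adj)."""
    scored = {}
    for name, (rss, swap, pgt, adj) in tasks.items():
        score = badness(rss, swap, pgt, adj, total_pages)
        if score is not None:
            scored[name] = score
    return max(scored, key=scored.get) if scored else None

tasks = {
    "db":      (400_000, 0, 1_000, -500),   # large, but protected by its adjustment
    "browser": (200_000, 0, 1_000, 300),    # smaller, but marked as expendable
    "batch":   (300_000, 0, 1_000, 0),
    "sshd":    (2_000, 0, 100, -1000),      # exempt
}
print(pick_victim(tasks, total_pages=1_000_000))  # browser
```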
P
Page
The fundamental unit of memory management in modern operating systems, typically 4 KB on x86_64 (though architectures support multiple page sizes). The MMU translates virtual addresses to physical addresses using page tables at page granularity. All physical memory management (allocation, tracking, swapping) is done at page granularity. The kernel represents each physical page with a struct page in the page frame descriptor array (mem_map).
- Related terms: Page table, page fault, buddy allocator, huge page, VMA
- See also: Page table, page fault, huge page, buddy allocator
Page Cache
The in-memory cache of file data (and block device data) managed by the kernel. When a file is read, its pages are loaded into the page cache and remain there for future reads. The page cache uses the buddy allocator for memory and LRU-like eviction policies. Dirty pages (modified data not yet written to disk) are flushed by the writeback subsystem. The page cache also serves as the backing store for memory-mapped files. Linux lets the page cache grow into otherwise unused memory and reclaims clean pages when memory is needed elsewhere, so a system reporting little free memory is not necessarily under memory pressure.
- Related terms: VFS, inode, dentry, swap, dirty page, writeback
- See also: VFS, inode, demand paging
Page Fault
An exception triggered when a process accesses a virtual memory address that is not currently mapped to a physical page in RAM, or accesses a mapped page in a way its permissions forbid (for example, writing to a read-only page). The kernel's page fault handler determines the cause: a valid access that can be satisfied without disk I/O (minor fault, for example from the page cache, by zero-filling a new page, or by making a copy-on-write copy), a valid access whose data must be read from swap or from a file on disk (major fault), or an invalid access (causing a SIGSEGV signal). Page faults are the mechanism by which demand paging and copy-on-write are implemented.
- Related terms: Exception, virtual memory, demand paging, copy-on-write, swap
- See also: Exception, demand paging, copy-on-write, swap
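On Unix-like systems a process can observe its own fault counts. The sketch below maps an anonymous region, touches each page once, and reports how the minor and major fault counters changed. The region size is arbitrary, and the exact numbers vary by system (for example, transparent huge pages reduce the count), so no specific output is shown.

```python
import mmap
import resource

def count_faults(size_bytes: int = 16 * 1024 * 1024):
    """Touch every page of a fresh anonymous mapping; return (minor, major) deltas."""
    page = mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF)
    region = mmap.mmap(-1, size_bytes)          # anonymous mapping, not yet populated
    for offset in range(0, size_bytes, page):
        region[offset] = 1                      # first touch of each page
    after = resource.getrusage(resource.RUSAGE_SELF)
    region.close()
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)

minor, major = count_faults()
print("minor faults:", minor, "major faults:", major)
```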
Page Table
A data structure used by the MMU to translate virtual addresses to physical addresses. Linux uses a multi-level page table (4 or 5 levels on x86_64): PGD (Page Global Directory), P4D, PUD (Page Upper Directory), PMD (Page Middle Directory), and PTE (Page Table Entry). Each level is a 4 KB page of 512 8-byte entries. The TLB caches recent translations to avoid full page table walks on every memory access. Each process has its own page table; the CR3 register points to the current page table root.
- Related terms: TLB, page, VMA, virtual memory, MMU, context switch
- See also: TLB, virtual memory, page
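Because each level holds 512 entries, each level consumes 9 bits of the virtual address, with the low 12 bits left as the offset within a 4 KB page. The sketch below splits an address for the 4-level x86_64 layout (where the P4D level is folded away); the function name and sample address are illustrative.

```python
def split_virtual_address(va: int) -> dict:
    """Decompose a 48-bit virtual address: four 9-bit indices and a 12-bit offset."""
    return {
        "pgd": (va >> 39) & 0x1FF,   # bits 47..39
        "pud": (va >> 30) & 0x1FF,   # bits 38..30
        "pmd": (va >> 21) & 0x1FF,   # bits 29..21
        "pte": (va >> 12) & 0x1FF,   # bits 20..12
        "offset": va & 0xFFF,        # bits 11..0
    }

print(split_virtual_address(0x00007FFFFFFFE123))
# {'pgd': 255, 'pud': 511, 'pmd': 511, 'pte': 510, 'offset': 291}
```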
PCIe (PCI Express)
A high-speed serial bus standard used to connect CPUs to peripherals such as GPUs, NICs, NVMe SSDs, and FPGAs. PCIe uses a lane-based architecture where each lane provides a full-duplex serial link (e.g., PCIe 4.0 ×16 provides roughly 32 GB/s in each direction, about 64 GB/s in total). Devices are enumerated at boot, BAR regions are mapped into the physical address space, and MSI/MSI-X interrupts are configured. PCIe's root complex connects to the CPU; switches extend the topology.
- Related terms: MSI/MSI-X, DMA, IOMMU, NVMe, SR-IOV, BAR
- See also: DMA, MSI/MSI-X, IOMMU
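The bandwidth figure for a link follows from the per-lane transfer rate, the line encoding, and the lane count. The sketch below computes the raw rate for PCIe 4.0 (16 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, so real throughput is lower. The function name is illustrative.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int,
                      payload_bits: int = 128, total_bits: int = 130) -> float:
    """Raw one-direction bandwidth in GB/s after line-encoding overhead."""
    return gt_per_s * (payload_bits / total_bits) * lanes / 8

per_direction = pcie_gbytes_per_s(16.0, lanes=16)   # PCIe 4.0 x16
print(round(per_direction, 1))       # 31.5
print(round(2 * per_direction, 1))   # 63.0 (both directions combined)
```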
Preemption
The ability of the kernel to interrupt a currently executing task and switch to a higher-priority task. In a non-preemptible kernel, a task running kernel code is switched out only voluntarily (when it blocks or explicitly yields) or when it returns to user space; user-mode code can always be preempted. A preemptible kernel (CONFIG_PREEMPT) allows switching even in the middle of kernel code (excluding regions with preemption disabled). Real-time kernels (PREEMPT_RT) convert most spinlocks to sleeping locks to reduce preemption latency to the microsecond range.
- Related terms: Scheduler, context switch, real-time, spinlock, RT
- See also: Scheduler, context switch, real-time
Privilege Level
See CPU Ring.
Process
An instance of a running program, consisting of an address space, one or more threads of execution, open file descriptors, signal handlers, and associated kernel resources. The Linux kernel represents processes and threads uniformly as tasks (struct task_struct). A process is created by fork() and transitions through states: running, sleeping (interruptible or uninterruptible), stopped, and zombie. The kernel identifies each process with a PID (process ID) and TGID (thread group ID).
- Related terms: task_struct, fork, exec, thread, context switch, cgroup
- See also: task_struct, fork, exec, zombie process
R
RCU (Read-Copy Update)
A synchronization mechanism in the Linux kernel optimized for read-heavy workloads. RCU allows multiple readers to run concurrently with an updater at near-zero cost to the readers (no locks and no atomic operations on the read path in most configurations); concurrent updaters must still coordinate among themselves, typically with a lock. Writers create a new version of the data, atomically replace the pointer to the old version, and then wait for a grace period (all ongoing read-side critical sections to finish) before freeing the old version. RCU is used extensively in the kernel for routing tables, file descriptor tables, and network protocol data.
- Related terms: Grace period, quiescent state, epoch-based reclamation, lock, barrier
- See also: Grace period, epoch-based reclamation, lock
Real-Time (RT)
A class of computing where responses to events must occur within deterministic, bounded time limits. Hard real-time systems guarantee a bounded worst-case response time, which requires knowing each task's worst-case execution time (WCET); soft real-time systems tolerate occasional missed deadlines and aim to keep latency low. The PREEMPT_RT patch set, now largely merged into mainline Linux, makes the kernel far more deterministic by running most interrupt handlers in threads and turning most spinlocks into sleeping locks, achieving worst-case interrupt latencies in the tens of microseconds on suitable hardware. RT scheduling policies include SCHED_FIFO and SCHED_RR.
- Related terms: Preemption, scheduler, WCET, latency, interrupt
- See also: Preemption, scheduler, interrupt
Rootkit
Malware that runs at elevated privilege levels (kernel or firmware) to hide its presence and maintain persistent access to a compromised system. Kernel rootkits hook system calls, manipulate VFS operations, or modify kernel data structures (e.g., the process list, module list) to hide processes, files, and network connections. Defending against rootkits requires kernel integrity mechanisms such as Secure Boot, kernel lockdown, module signing, and IMA/EVM.
- Related terms: Kernel module, LSM, IMA, Secure Boot, loadable kernel module
- See also: Loadable kernel module, seccomp
S
Scheduler
The kernel subsystem responsible for deciding which task runs on which CPU at any given moment. For normal (SCHED_NORMAL/SCHED_BATCH) tasks, Linux used the Completely Fair Scheduler (CFS) from 2.6.23 until 6.6, when it was replaced by EEVDF (Earliest Eligible Virtual Deadline First); both track a weighted "virtual runtime" per task to approximate fair CPU sharing. Real-time tasks use SCHED_FIFO and SCHED_RR. Scheduling decisions are made at task wakeup, on the timer tick (preemption check), when a task sleeps voluntarily, and during load balancing, which migrates tasks between CPUs. The scheduler interacts heavily with NUMA topology, CPU frequency scaling, and power management.
- Related terms: Context switch, preemption, CFS, real-time, cgroup, NUMA
- See also: Context switch, preemption, real-time, cgroup
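The weighted virtual runtime idea can be sketched with a toy simulation. This is a simplified model under stated assumptions, not the kernel's algorithm: it uses a binary heap instead of the kernel's tree, a fixed 1 ms slice, and hypothetical task weights; the function name simulate is made up for the example.

```python
import heapq

NICE_0_WEIGHT = 1024  # reference weight; task weights passed in are hypothetical


def simulate(weights, ticks, slice_ms=1.0):
    """Run the min-vruntime task each tick; return real runtime per task in ms."""
    # Heap entries are (vruntime, name); ties are broken by name.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    runtime = {name: 0.0 for name in weights}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)
        runtime[name] += slice_ms
        # Heavier tasks accumulate virtual runtime more slowly, so they
        # reach the front of the queue more often.
        vruntime += slice_ms * NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return runtime


if __name__ == "__main__":
    print(simulate({"A": 1024, "B": 2048}, ticks=3000))
```

With weights 1024 and 2048, task B receives twice the CPU time of task A, which is the proportional sharing the virtual runtime model aims for.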
Seccomp (Secure Computing Mode)
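A Linux kernel security mechanism that restricts the set of system calls a process may invoke. In strict mode (SECCOMP_MODE_STRICT, mode 1), a process is limited to read(), write(), _exit(), and sigreturn(). In filter mode (SECCOMP_MODE_FILTER, mode 2, also called seccomp-BPF), a classic BPF program inspects each syscall number and its register arguments and returns a verdict: allow, fail with a chosen errno, kill the thread or process, raise SIGSYS, or notify a user-space supervisor. Filters see only argument values and cannot dereference pointers. Seccomp is a core building block of the sandboxes in Chrome, Firefox, Docker, and most container runtimes, where it is usually combined with namespaces and other mechanisms.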
- Related terms: BPF/eBPF, syscall, LSM, namespace, sandbox
- See also: BPF/eBPF, syscall, namespace
Segmentation Fault (SIGSEGV)
A signal delivered to a process when it attempts an invalid memory access in its virtual address space: touching an unmapped region, writing to a read-only mapping, or executing a non-executable region. The kernel's page fault handler raises SIGSEGV when it determines that no valid VMA covers the faulting address or that the VMA's permissions do not allow the access. Stack overflows typically manifest as segfaults when the stack grows into its guard region or beyond its size limit.
- Related terms: Page fault, signal, VMA, address space, ASLR
- See also: Page fault, signal, VMA
Semaphore
A synchronization primitive that maintains an integer count representing the number of available resources. A wait (P) operation decrements the count; if the count is already zero, the calling thread blocks until another thread signals. A signal (V) operation increments the count and wakes a blocked waiter, if any. Binary semaphores (count 0 or 1) behave much like mutexes but have no owner: any thread may signal them. Counting semaphores are useful for resource pools. Linux provides both POSIX semaphores (sem_init, sem_wait) and System V semaphores in user space, as well as the kernel-internal struct semaphore.
- Related terms: Mutex, futex, lock, IPC, synchronization
- See also: Lock (spinlock/mutex), futex, IPC
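A counting semaphore guarding a resource pool can be sketched as follows. The example uses Python's threading.Semaphore rather than the POSIX or kernel APIs named above; the function name run_pool and the pool sizes are illustrative assumptions.

```python
import threading
import time


def run_pool(pool_size=2, workers=6):
    """Return the peak number of workers that held a resource at once."""
    slots = threading.Semaphore(pool_size)   # count = free resources
    stats = {"in_use": 0, "peak": 0}
    stats_lock = threading.Lock()

    def worker():
        with slots:                          # wait (P): blocks while count is 0
            with stats_lock:
                stats["in_use"] += 1
                stats["peak"] = max(stats["peak"], stats["in_use"])
            time.sleep(0.01)                 # pretend to use the resource
            with stats_lock:
                stats["in_use"] -= 1
        # Leaving the "with slots" block performs signal (V).

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]


if __name__ == "__main__":
    print("peak concurrent holders:", run_pool())
```

However many workers are started, the peak never exceeds the initial count, because each additional worker blocks in the wait operation until a holder signals.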
Signal
An asynchronous notification mechanism used to inform a process of an event; signals are generated by the kernel itself or sent by other processes via kill(). Signals have predefined numbers (e.g., SIGKILL=9, SIGTERM=15, SIGSEGV=11) and default actions (terminate, core dump, stop, ignore). User processes can register custom signal handlers for most signals; SIGKILL and SIGSTOP cannot be caught, blocked, or ignored. A pending signal is delivered when the target task next returns to user mode, for example on return from a syscall or interrupt. The kernel also supports real-time signals (POSIX.1b), which are queued rather than merged and are delivered in a defined order.
- Related terms: IPC, exception, interrupt, process, kill(), sigaction
- See also: IPC, exception, process
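Handler registration and the special status of SIGKILL can be demonstrated from user space. This sketch assumes a Unix system and that it runs in the main thread (a requirement of Python's signal module); the function names are made up for the example.

```python
import os
import signal
import time


def catch_sigusr1():
    """Send SIGUSR1 to this process; return the signal number the handler saw."""
    seen = []

    def handler(signum, frame):
        seen.append(signum)

    previous = signal.signal(signal.SIGUSR1, handler)   # install the handler
    try:
        os.kill(os.getpid(), signal.SIGUSR1)            # send to ourselves
        deadline = time.monotonic() + 1.0
        while not seen and time.monotonic() < deadline:
            time.sleep(0.01)                            # give the handler time to run
    finally:
        # Restore the earlier disposition (fall back to the default action).
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGUSR1, restore)
    return seen[0] if seen else None


def can_catch_sigkill():
    """Return False if the kernel refuses a handler for SIGKILL."""
    try:
        signal.signal(signal.SIGKILL, lambda signum, frame: None)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("handler saw:", catch_sigusr1())
    print("SIGKILL catchable:", can_catch_sigkill())
```

The attempt to install a SIGKILL handler is rejected by the kernel, which is why that signal always terminates the target.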
Slab Allocator
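A kernel memory allocator that manages pools of fixed-size objects to efficiently satisfy frequent allocation/deallocation requests for common kernel data structures. The slab allocator reduces fragmentation by maintaining per-type caches (e.g., task_struct cache, inode cache), and it backs kmalloc() through a set of general-purpose size-class caches. Linux historically offered three implementations: SLAB (classic), SLUB, and SLOB (for very small embedded systems). SLUB has been the default for many years and is the only one remaining, since SLOB was removed in 6.4 and SLAB in 6.8. The allocator obtains its pages from the buddy allocator and responds to memory pressure through shrinkers, which ask cache owners to release reclaimable objects.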
- Related terms: Buddy allocator, kmalloc, OOM killer, page, fragmentation
- See also: Buddy allocator, page
Softirq (Software IRQ)
A deferred interrupt processing mechanism used to complete work initiated by hardware interrupt handlers. Because interrupt handlers must be brief, they defer expensive or non-urgent work to softirqs, which run with hardware interrupts enabled, usually right after the hard interrupt handler returns. The set of softirq types is fixed at compile time (e.g., NET_TX, NET_RX, TIMER, RCU). When softirq activity is high, the kernel hands the remaining work to the per-CPU ksoftirqd kernel threads so that softirq processing does not starve user processes.
- Related terms: Interrupt handler, workqueue, tasklet, IRQ, ksoftirqd
- See also: Interrupt, interrupt handler, workqueue
Starvation
A condition in which a process or thread is perpetually denied access to a resource it needs (typically CPU time or a lock), even though the system is not deadlocked. Starvation can occur when higher-priority tasks continuously preempt lower-priority ones, or when a lock protocol favors certain waiters. Linux's fair scheduler limits CPU starvation among normal tasks by tracking virtual runtimes and favoring the runnable task that has received the least CPU time relative to its weight. Real-time policies offer no such protection: a runaway SCHED_FIFO task can starve normal tasks, which is why the kernel throttles RT tasks by default. Priority inversion (a high-priority task blocked by a low-priority one holding a lock) is a related problem.
- Related terms: Deadlock, scheduler, lock, priority inversion, CFS
- See also: Deadlock, scheduler, lock
Superblock
A data structure stored on disk (and cached in memory) that describes the overall state of a filesystem: type, size, block size, inode count, free block count, and the location of the root directory inode. Each mounted filesystem has one superblock, which the VFS layer represents in memory as struct super_block. Filesystem-specific superblock operations (such as write_inode, sync_fs, and statfs) are provided by each filesystem driver. A corrupted superblock can render the filesystem unmountable; mkfs writes the superblock, and fsck can often repair it, for example from a backup copy on filesystems that keep one.
- Related terms: Inode, dentry, VFS, filesystem, mount
- See also: Inode, dentry, VFS
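The figures a filesystem reports through its statfs operation are visible from user space. The sketch below uses os.statvfs(); the function name filesystem_summary and the choice of "/" as the default path are assumptions for illustration, and on pseudo-filesystems the values may be synthesized rather than read from an on-disk superblock.

```python
import os


def filesystem_summary(path="/"):
    """Return statfs-derived statistics for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,                 # fundamental block size in bytes
        "total_bytes": st.f_blocks * st.f_frsize,  # f_blocks is counted in f_frsize units
        "free_bytes": st.f_bfree * st.f_frsize,
        "total_inodes": st.f_files,
        "free_inodes": st.f_ffree,
    }


if __name__ == "__main__":
    for key, value in filesystem_summary().items():
        print(f"{key}: {value}")
```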
Swap
Disk (or SSD) space used to store anonymous memory pages that have been evicted from physical RAM. When physical memory is under pressure, the kernel reclaims cold pages: anonymous pages (heap, stack, shared memory) are written to swap, while file-backed pages are not swapped but are dropped if clean or written back to their file if dirty. The vm.swappiness tunable biases reclaim between anonymous and file-backed pages. Swapping adds high latency, because a swapped-out page must be read back from storage rather than DRAM on its next access. On SSDs, swap is faster but adds write wear. Linux also supports zswap (a compressed cache in RAM in front of the swap device) to reduce swap I/O.
- Related terms: Page fault, demand paging, OOM killer, page cache, swappiness
- See also: Page fault, demand paging, OOM killer
Syscall (System Call)
The primary interface by which user-space programs request services from the kernel. A system call transitions the CPU from user mode (ring 3) to kernel mode (ring 0) via a controlled gate mechanism (software interrupt int 0x80 on legacy x86, or the syscall instruction on x86_64). The kernel validates arguments, performs the requested operation, and returns a result. Linux defines several hundred system calls; the exact number depends on the architecture and kernel version. The vDSO implements certain frequently called syscalls (e.g., gettimeofday, clock_gettime) in code mapped into user space, avoiding the mode-switch overhead.
- Related terms: CPU ring, trap, vDSO, process, privilege level
- See also: CPU ring, trap, exec, fork
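A system call can be issued by number through libc's generic syscall() wrapper. This is a hedged sketch: it assumes 64-bit Linux, and the table of getpid numbers covers only x86_64 and aarch64; the helper name raw_getpid is made up, and on other platforms it returns None rather than guessing.

```python
import ctypes
import os
import platform
import sys

# getpid syscall numbers for two common 64-bit Linux ABIs (extend as needed).
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Invoke getpid through libc's syscall() wrapper; None if unsupported."""
    if not sys.platform.startswith("linux"):
        return None
    if ctypes.sizeof(ctypes.c_void_p) != 8:      # table is for 64-bit ABIs only
        return None
    number = SYS_GETPID.get(platform.machine())
    if number is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)     # symbols of the running process
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))


if __name__ == "__main__":
    print("raw syscall:", raw_getpid(), "os.getpid():", os.getpid())
```

Because syscall numbers differ between architectures, portable programs normally call the named libc wrapper (here getpid()) instead of using raw numbers.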
T
task_struct
The central kernel data structure representing a process or thread in Linux. struct task_struct contains the process state, scheduling information (priority, runtime, wait queues), memory management pointer (mm_struct), open file descriptor table, signal state, credential information (UID/GID), cgroup membership, and much more. It is allocated by the slab allocator when a process is created and freed when the last reference is dropped. The scheduler operates on task_struct instances.
- Related terms: Process, scheduler, fork, mm_struct, cgroup
- See also: Process, scheduler, fork
TLB (Translation Lookaside Buffer)
A hardware cache inside the MMU that stores recent virtual-to-physical address translations. A TLB hit lets a virtual address be translated with little or no added latency; a TLB miss requires a page table walk (several memory accesses). TLBs are small (typically 64–2048 entries) and are often organized as split first-level instruction and data TLBs backed by a larger second-level TLB. Page table changes require invalidating stale entries; on multiprocessors this is a TLB shootdown, delivered to other CPUs via IPIs on x86, which is expensive. A context switch traditionally flushes the TLB; tagging entries with an address space identifier, such as PCID (Process-Context Identifier) on x86, avoids most of those flushes.
- Related terms: Page table, context switch, huge page, PCID, MMU
- See also: Page table, context switch, huge page, PCID
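The hit, miss, and flush behavior can be modeled with a small software cache. This is a toy model under stated assumptions: 4 KiB pages, LRU replacement (real TLBs use various policies), and a dictionary standing in for the page table walk; the class name TinyTLB is hypothetical.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                       # assumes 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TinyTLB:
    """LRU cache of virtual page number -> physical frame number."""

    def __init__(self, capacity, page_table):
        assert capacity >= 1
        self.capacity = capacity
        self.page_table = page_table  # dict standing in for a page table walk
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)             # mark as most recently used
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]  # "walk" the page table
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)      # evict least recently used
        return (self.entries[vpn] << PAGE_SHIFT) | offset

    def flush(self):
        """Drop all cached translations, as after a full TLB flush."""
        self.entries.clear()


if __name__ == "__main__":
    tlb = TinyTLB(capacity=2, page_table={0: 5, 1: 9})
    print(hex(tlb.translate(0x0010)), hex(tlb.translate(0x0020)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

Two accesses to the same page cost one miss and one hit; after flush(), the next access to any page misses again, which is the cost that PCID-style tagging tries to avoid on context switches.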
Trap
A synchronous transfer of control to a kernel handler, caused by the instruction being executed rather than by an external device. Traps in this broad sense include system calls (intentional), exceptions (faults and aborts caused by erroneous instructions or invalid accesses), and software breakpoints. Unlike asynchronous interrupts, traps are directly tied to the instruction stream. After handling a fault, the kernel resumes at the faulting instruction so that it can be retried; after a system call or breakpoint, it resumes at the next instruction. The x86 int3 instruction raises a breakpoint trap used by debuggers.
- Related terms: Syscall, exception, interrupt, CPU ring, IDT
- See also: Syscall, exception, interrupt
U
Unikernel
An OS design in which a single application is compiled together with a minimal OS library into a single-address-space executable that runs directly on bare hardware or a hypervisor, with no user/kernel separation. Unikernels offer very small attack surfaces, fast boot times (milliseconds), and potentially high performance due to zero-overhead OS calls. Examples include MirageOS, OSv, and IncludeOS. Unikernels sacrifice generality (one application per VM) for specialization and security.
- Related terms: Microkernel, monolithic kernel, hypervisor, container
- See also: Microkernel, monolithic kernel
User Space
The CPU execution mode (ring 3 on x86) in which user-level applications run, with restricted access to hardware and memory. User-space processes cannot execute privileged instructions or directly access physical memory, device registers, or other processes' memory. All interaction with the kernel occurs through the controlled syscall interface. The separation between user space and kernel space is the fundamental security boundary in modern operating systems.
- Related terms: CPU ring, syscall, kernel space, privilege level, process
- See also: CPU ring, syscall
V
VFS (Virtual File System)
The kernel abstraction layer that provides a uniform file access interface regardless of the underlying filesystem type. VFS defines a set of generic operations (open, read, write, lookup) and data structures (superblock, inode, dentry, file) that each filesystem must implement. This allows user space to use the same open()/read()/write() calls for ext4, tmpfs, NFS, or procfs. The VFS layer also manages the dentry cache and coordinates page cache interaction.
- Related terms: Inode, dentry, superblock, page cache, filesystem, mount
- See also: Inode, dentry, superblock, page cache
Virtual Memory
The abstraction that gives each process the illusion of a large, contiguous, private address space, independent of physical memory layout. The kernel and MMU cooperate to map virtual addresses to physical addresses via page tables. Virtual memory enables: process isolation (each process has its own address space), demand paging, copy-on-write, memory-mapped files, and overcommitment (allocating more virtual memory than physical RAM). The Linux virtual memory subsystem manages VMAs, page tables, and the page fault handler.
- Related terms: Address space, page table, VMA, TLB, page fault, demand paging
- See also: Address space, page table, VMA, page fault
Vmalloc (Virtually Mapped Memory)
A kernel memory allocator that provides virtually contiguous but possibly physically non-contiguous memory in the kernel address space. While kmalloc returns physically contiguous memory (from slab caches backed by the buddy allocator), vmalloc allocates individual physical pages and maps them into a contiguous virtual range by creating page table entries. vmalloc is used for large kernel allocations where physical contiguity is not required (e.g., loading kernel modules). It is slower than kmalloc because it must set up and later tear down these mappings, and the resulting small-page mappings put more pressure on the TLB.
- Related terms: Buddy allocator, slab allocator, page table, kmalloc, TLB
- See also: Buddy allocator, slab allocator, page table
VMA (Virtual Memory Area)
A contiguous range of virtual addresses in a process's address space with uniform properties (permissions, mapping type, backing file). The kernel represents each VMA as a struct vm_area_struct. A process's address space is a set of non-overlapping VMAs covering its stack, heap, text segment, mapped files, and anonymous regions. VMAs were long kept in a red-black tree for fast lookup during page fault handling; since Linux 6.1 they are stored in a maple tree. The /proc/[pid]/maps file lists all VMAs for a process.
- Related terms: Virtual memory, address space, page fault, mmap, mm_struct
- See also: Virtual memory, address space, page fault
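Each line of /proc/[pid]/maps describes one VMA as "start-end perms offset dev inode pathname". The following sketch parses that format; the names Vma, parse_maps_line, and read_vmas are made up for this example, and read_vmas works only on Linux.

```python
from typing import NamedTuple


class Vma(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a Vma."""
    fields = line.split(None, 5)           # pathname is optional, may hold spaces
    addr_range, perms, offset = fields[0], fields[1], fields[2]
    start, end = (int(x, 16) for x in addr_range.split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, perms, int(offset, 16), path)


def read_vmas(pid="self"):
    """Return all VMAs of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps:
        return [parse_maps_line(line) for line in maps if line.strip()]


if __name__ == "__main__":
    for vma in read_vmas():
        size_kib = (vma.end - vma.start) // 1024
        print(f"{vma.start:#x}-{vma.end:#x} {vma.perms} {size_kib} KiB {vma.path}")
```

Anonymous mappings have an empty pathname, while special regions appear with bracketed names such as [heap] and [stack].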
W
Workqueue
A kernel mechanism for deferring work to be executed by kernel threads (kworkers) in process context, allowing sleeping and blocking operations that are not permitted in interrupt handlers or softirqs. Work items are submitted via queue_work() and executed asynchronously. Linux's concurrency-managed workqueue (cmwq) framework automatically manages the pool of kworker threads based on workload. Workqueues are used throughout the kernel for I/O completion, delayed work (queue_delayed_work()), and driver deferred work.
- Related terms: Softirq, kthread, interrupt handler, tasklet, concurrency
- See also: Softirq, kthread, interrupt handler
Z
Zombie Process
A process that has terminated but whose exit status has not yet been collected by its parent via wait() or waitpid(). When a process exits, the kernel releases its memory, open files, and most other resources, but keeps its task_struct, PID, and exit status in the process table until the parent reaps it. If the parent never calls wait(), zombies accumulate and consume PIDs; if the parent exits, its children (including zombies) are reparented to the nearest subreaper or to PID 1 (init/systemd), which is expected to reap them promptly.
- Related terms: Process, fork, task_struct, wait(), orphan, init
- See also: Process, fork, task_struct
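The zombie state and its removal by wait() can be reproduced in a few lines. This sketch assumes a Unix system with os.fork() and os.waitid(); the function names are hypothetical. It uses WNOWAIT to learn that the child has exited without reaping it, so the child is still a zombie when its PID is probed.

```python
import os


def _pid_exists(pid):
    """Check whether a PID is still present in the process table."""
    try:
        os.kill(pid, 0)                # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                    # exists, but belongs to another user
    return True


def zombie_demo(exit_code=7):
    """Fork a child, observe it as a zombie, then reap it.

    Returns (exists_before_reap, exit_status, exists_after_reap).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)            # child: terminate immediately

    # Block until the child has exited but leave it unreaped (WNOWAIT),
    # so it stays in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    exists_before = _pid_exists(pid)

    _, status = os.waitpid(pid, 0)     # reap: the kernel frees the table entry
    exists_after = _pid_exists(pid)
    return exists_before, os.WEXITSTATUS(status), exists_after


if __name__ == "__main__":
    print(zombie_demo())
```

Before the waitpid() call the PID still exists even though the child no longer runs; afterwards the entry is gone and the exit status has been handed to the parent.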
End of Kernel and OS Terminology Glossary. Total entries: 60+ terms.