Kernel Memory Model

Technical Overview

The kernel's view of memory is fundamentally different from a user-space process's view. While a user process sees a clean, contiguous virtual address space from 0 to some maximum, the kernel must simultaneously manage: its own virtual address space (for its code, data, stacks, and temporary mappings), the mapping of all physical memory, the virtual memory of every user process, a pool of virtually contiguous but physically non-contiguous allocations, and a highly optimized object cache for the thousands of fixed-size structures it creates and destroys every second.

Understanding the kernel's memory model is essential for writing kernel code, debugging memory issues, and understanding performance behavior. Many kernel bugs — use-after-free, slab corruption, virtual memory leaks — are comprehensible only when you understand how memory is laid out and allocated.

Prerequisites

01-kernel-data-structures.md: struct page usage patterns
02-kernel-initialization.md: mm_init() in the boot sequence
Understanding of virtual memory and page tables (user-space perspective)
03-cpu-privilege-rings.md: kernel address space location

Core Content

Kernel Virtual Address Space Layout (x86-64)

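x86-64 does not implement all 64 bits of a virtual address. With 4-level paging, addresses are 48 bits wide: 256 TiB in total, split into a 128 TiB (2^47 bytes) user half and a 128 TiB kernel half. With 5-level paging, addresses are 57 bits wide: 128 PiB in total, 64 PiB (2^56 bytes) per half. The kernel half has the same layout in every process's page tables (with KPTI, the user-mode copy of each process's page tables maps only a minimal part of it). On an x86-64 kernel using 4-level paging (the Linux default before 5-level):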

Kernel Virtual Address Space (x86-64, 4-level paging)

0xffff800000000000 (-128 TiB from the top of the address space, start of kernel VA space)
│
│  [Non-canonical hole below this point: 0x0000800000000000 to 0xffff7fffffffffff]
│
0xffff888000000000  ──── Direct Physical Memory Mapping ──────────────
│  Maps all physical RAM directly into kernel VA space
│  Physical address P → virtual address 0xffff888000000000 + P
│  Size: up to 64 TiB of physical RAM mapped here
│  Accessed via __va(phys_addr) macro
│
0xffffc90000000000  ──── vmalloc / ioremap area ──────────────────────
│  Virtually contiguous, physically non-contiguous allocations
│  vmalloc(), ioremap(), vmap()
│  Size: 32 TiB
│
0xffffe90000000000  ──── Hole ──────────────────────────────────────────
│
0xffffea0000000000  ──── Virtual memory map (vmemmap) ──────────────────
│  Compact array of struct page entries
│  struct page for physical page N → vmemmap + N * sizeof(struct page)
│  Size: 1 TiB
│
│  [Unused holes and special regions such as KASAN shadow memory omitted]
│
0xffffffff80000000  ──── Kernel text (vmlinux) ──────────────────────
│  Kernel code, read-only data, init sections
│  "at -2 GiB" from end of address space
│  Size: 512 MiB window by default; the image itself is typically 20-50 MiB
│
0xffffffffa0000000  ──── Kernel modules ──────────────────────────────
│  Loadable kernel modules mapped here
│  Near kernel text for short (32-bit) call/jump instructions
│  Size: ~1.5 GiB
│
0xffffffffffffffff  ──── End of virtual address space ─────────────────
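
As a rough aid for reading addresses in oops messages, the layout above can be turned into a small lookup. The sketch below is user-space C, not kernel code. The helper names is_canonical_48() and va_region() are hypothetical, and the region bounds are the fixed 4-level values from the diagram, so they would not hold with 5-level paging or with randomized region bases.

```c
#include <stdint.h>

/* Region bases taken from the 4-level layout diagram (fixed, non-randomized). */
#define DIRECT_MAP_BASE 0xffff888000000000ULL
#define DIRECT_MAP_SIZE (64ULL << 40)           /* 64 TiB */
#define VMALLOC_BASE    0xffffc90000000000ULL
#define VMALLOC_END     0xffffe90000000000ULL   /* start of the hole after vmalloc */
#define VMEMMAP_BASE    0xffffea0000000000ULL
#define VMEMMAP_SIZE    (1ULL << 40)            /* 1 TiB */
#define KERNEL_TEXT     0xffffffff80000000ULL   /* -2 GiB */

/* A 48-bit address is canonical when bits 63..47 are all equal. */
static int is_canonical_48(uint64_t va)
{
    uint64_t top = va >> 47;                    /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}

/* Hypothetical helper: name the region a virtual address falls in. */
static const char *va_region(uint64_t va)
{
    if (!is_canonical_48(va))
        return "non-canonical";
    if (va < 0x0000800000000000ULL)
        return "user";
    if (va >= KERNEL_TEXT)
        return "kernel image";                  /* top 2 GiB: text and modules */
    if (va >= VMEMMAP_BASE && va - VMEMMAP_BASE < VMEMMAP_SIZE)
        return "vmemmap";
    if (va >= VMALLOC_BASE && va < VMALLOC_END)
        return "vmalloc";
    if (va >= DIRECT_MAP_BASE && va - DIRECT_MAP_BASE < DIRECT_MAP_SIZE)
        return "direct map";
    return "other kernel";
}
```

For example, va_region(0xffff888000001000) reports "direct map", while 0x0000800000000000 is rejected as non-canonical.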

Key macros, shown in simplified form (arch/x86/include/asm/page_64_types.h, arch/x86/include/asm/page.h):

#define PAGE_OFFSET     _AC(0xffff888000000000, UL)
#define __va(x)         ((void *)((unsigned long)(x) + PAGE_OFFSET))
#define __pa(x)         ((unsigned long)(x) - PAGE_OFFSET)

// Physical address 0x1000 → virtual 0xffff888000001000
void *kernel_ptr = __va(0x1000);     // → 0xffff888000001000
unsigned long phys = __pa(kernel_ptr); // → 0x1000

With 5-level paging (CONFIG_X86_5LEVEL, used only when the CPU supports 57-bit virtual addresses), all offsets shift and the direct mapping starts at 0xff11000000000000.
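
The direct-map arithmetic can be tried out in user space with plain integers. This is a sketch only: the function names are illustrative, the two base constants are the fixed values quoted above, and a real kernel may randomize the base at boot.

```c
#include <stdint.h>

/* Direct-map bases quoted in the text (fixed layout assumed). */
#define DIRECT_MAP_BASE_4L 0xffff888000000000ULL   /* 4-level paging */
#define DIRECT_MAP_BASE_5L 0xff11000000000000ULL   /* 5-level paging */

/* Physical address -> direct-map virtual address, for a given base. */
static uint64_t direct_map_va(uint64_t base, uint64_t phys)
{
    return base + phys;
}

/* Direct-map virtual address -> physical address, for a given base.
 * Only meaningful for addresses that really lie inside the direct map. */
static uint64_t direct_map_pa(uint64_t base, uint64_t va)
{
    return va - base;
}
```

With the 4-level base, physical address 0x1000 maps to 0xffff888000001000, and the two functions are inverses of each other for any address inside the mapping.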

`struct page`: Page Frame Metadata

include/linux/mm_types.h

Every physical page frame (4 KiB on x86) has a corresponding struct page in the mem_map array. On a system with 64 GiB of RAM, that's 16,777,216 pages, each with a struct page. The struct is carefully designed to be small (64 bytes on x86-64) via a union of many possible uses:

struct page {
    unsigned long flags;        // PG_locked, PG_dirty, PG_writeback, PG_lru, etc.

    // The big union: a page can be used in exactly one way at a time
    union {
        struct {    // Used when page is in the page cache or anonymous
            struct address_space *mapping;  // file mapping or anon_vma
            pgoff_t index;                  // offset within mapping
            unsigned long private;          // for fs private data
        };
        struct {    // Used when page backs a slab (moved to struct slab in newer kernels)
            struct kmem_cache *slab_cache;
            void *freelist;
            union {
                void *s_mem;
                unsigned long counters;
            };
        };
        // Free page in the buddy allocator: the block's order is stored in
        // `private` above, so no separate member is needed
        // ... other uses: page table, compound page, etc.
    };

    atomic_t _refcount;         // page reference count
    atomic_t _mapcount;         // number of PTEs mapping this page (-1 = not mapped)

#ifdef CONFIG_MEMCG
    unsigned long memcg_data;   // memory cgroup tracking
#endif

    // For compound pages (huge pages): first page has compound_head/tail links
    // For buddy allocator: lru links for free lists
};
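
Because every page frame carries one struct page, the metadata cost scales linearly with RAM. A minimal sketch of that arithmetic, assuming 4 KiB pages and a 64-byte struct page as stated above (the helper names are illustrative, not kernel functions):

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE        4096ULL   /* 4 KiB page frames */
#define DEMO_STRUCT_PAGE_SIZE 64ULL     /* sizeof(struct page) on x86-64 */

/* Number of page frames needed to cover ram_bytes of physical memory. */
static uint64_t page_frames(uint64_t ram_bytes)
{
    return ram_bytes / DEMO_PAGE_SIZE;
}

/* Bytes of struct page metadata needed to describe ram_bytes of RAM. */
static uint64_t struct_page_overhead(uint64_t ram_bytes)
{
    return page_frames(ram_bytes) * DEMO_STRUCT_PAGE_SIZE;
}
```

For 64 GiB of RAM this gives 16,777,216 frames and 1 GiB of struct page storage, which is 1/64 of memory (about 1.6%).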

Key flags (in the flags bitfield, defined in include/linux/page-flags.h):
- PG_locked: page is locked (I/O in progress)
- PG_dirty: page has been modified, needs writeback
- PG_writeback: page is being written to disk
- PG_lru: page is on an LRU list
- PG_active: page is in the active LRU list (recently accessed)
- PG_referenced: page was accessed (for LRU aging)
- PG_uptodate: page content is valid
- PG_reserved: page should never be freed
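
Each flag is one bit in the flags word, so testing and updating page state comes down to bit operations. The sketch below models that in user space. The bit positions in the demo enum are illustrative only (the real numbering is set by the kernel's page-flags enum and varies between versions), and the kernel itself uses atomic bit operations through generated accessors rather than plain |= and &=.

```c
#include <stdbool.h>

/* Illustrative bit positions, NOT the kernel's actual numbering. */
enum demo_pageflags {
    DEMO_PG_locked,
    DEMO_PG_dirty,
    DEMO_PG_writeback,
    DEMO_PG_lru,
};

struct demo_page {
    unsigned long flags;
};

static void demo_set_flag(struct demo_page *p, enum demo_pageflags f)
{
    p->flags |= 1UL << f;
}

static void demo_clear_flag(struct demo_page *p, enum demo_pageflags f)
{
    p->flags &= ~(1UL << f);
}

static bool demo_test_flag(const struct demo_page *p, enum demo_pageflags f)
{
    return (p->flags >> f) & 1UL;
}
```

A page that has been modified and queued for writeback would have both the dirty and writeback bits set at once, since the flags are independent of each other.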

`mem_map` Array and SPARSEMEM

The mem_map (or vmemmap in modern kernels) is an array of struct page entries, indexed by physical page frame number (PFN). Three memory models determine how it is organized:

FLATMEM (simple systems, single continuous RAM region):

extern struct page *mem_map;
// struct page for PFN n: &mem_map[n]
#define pfn_to_page(pfn)  (mem_map + (pfn))
#define page_to_pfn(page) ((unsigned long)((page) - mem_map))

DISCONTIGMEM (multiple RAM regions, each NUMA node with its own mem_map; superseded by SPARSEMEM and removed from mainline in Linux 5.14): each node's array is reached through NODE_DATA(nid)->node_mem_map.

SPARSEMEM (modern, supports memory hotplug and non-contiguous physical memory): Memory is divided into "sections" (128 MiB each on x86-64). Each section that exists (has physical RAM) gets a struct mem_section with a pointer to the struct page array for that section. SPARSEMEM_VMEMMAP (the common variant) maps all struct page entries contiguously into the vmemmap virtual address range:

vmemmap layout (SPARSEMEM_VMEMMAP):
  Physical PFN 0:       struct page at 0xffffea0000000000 + 0 * 64
  Physical PFN 1:       struct page at 0xffffea0000000000 + 1 * 64
  ...
  Physical PFN n:       struct page at 0xffffea0000000000 + n * 64

pfn_to_page(pfn) = vmemmap + pfn

This gives O(1) PFN→struct page and struct page→PFN conversions without any branches, even with memory hotplug holes.
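
The conversions above are ordinary pointer arithmetic, which a user-space model makes easy to check. In this sketch a static array stands in for vmemmap and a 64-byte dummy struct stands in for struct page. All demo_ names are hypothetical, and the section size assumes the 128 MiB figure quoted above.

```c
#include <stdint.h>

/* Stand-in for struct page: 64 bytes, like the x86-64 layout described above. */
struct demo_page {
    unsigned char pad[64];
};

#define DEMO_NR_PAGES 1024
static struct demo_page demo_vmemmap[DEMO_NR_PAGES];   /* stand-in for vmemmap */

/* PFN -> struct page: index into the array. */
static struct demo_page *demo_pfn_to_page(unsigned long pfn)
{
    return demo_vmemmap + pfn;
}

/* struct page -> PFN: pointer subtraction yields an element index, not bytes. */
static unsigned long demo_page_to_pfn(const struct demo_page *page)
{
    return (unsigned long)(page - demo_vmemmap);
}

/* Kernel virtual address of the struct page for a PFN in the fixed 4-level layout. */
static uint64_t demo_vmemmap_va(uint64_t pfn)
{
    return 0xffffea0000000000ULL + pfn * sizeof(struct demo_page);
}

/* 128 MiB sections of 4 KiB pages: 2^27 / 2^12 = 2^15 pages per section. */
static unsigned long demo_section_nr(unsigned long pfn)
{
    return pfn >> (27 - 12);
}
```

Note that demo_page_to_pfn() subtracts typed pointers. Subtracting the addresses as integers would return a byte offset that is 64 times too large.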

Memory Zones and Their Kernel Addresses

Physical memory is divided into zones based on access constraints:

Zone	Typical range (x86-64)	Purpose
`ZONE_DMA`	0 – 16 MiB	Legacy DMA devices that can only address 24-bit addresses
`ZONE_DMA32`	16 MiB – 4 GiB	32-bit DMA devices (PCIe without IOMMU)
`ZONE_NORMAL`	4 GiB – end of memory	Normal memory, directly mapped in kernel VA space
`ZONE_MOVABLE`	Configurable	Memory that can be defragmented/hot-removed
`ZONE_DEVICE`	Device-dependent	For persistent memory (PMEM), CXL memory

On modern 64-bit systems with IOMMU, ZONE_DMA and ZONE_DMA32 are less relevant (the IOMMU handles DMA address translation). But the zone structure is maintained for compatibility.
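
The zone boundaries in the table can be expressed as a simple classification by physical address. This is a user-space sketch using the typical x86-64 ranges listed above. The function names are illustrative, and ZONE_MOVABLE and ZONE_DEVICE are left out because their ranges are configuration-dependent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classify a physical address using the typical x86-64 zone boundaries. */
static const char *demo_zone_for_phys(uint64_t phys)
{
    if (phys < (16ULL << 20))       /* below 16 MiB: 24-bit addressable */
        return "DMA";
    if (phys < (4ULL << 30))        /* below 4 GiB: 32-bit addressable */
        return "DMA32";
    return "Normal";
}

/* Can a device that drives only addr_bits address lines reach this address
 * without an IOMMU translating for it? */
static bool demo_device_can_reach(uint64_t phys, unsigned int addr_bits)
{
    if (addr_bits >= 64)
        return true;
    return phys < (1ULL << addr_bits);
}
```

A 32-bit-only device can reach everything in the DMA and DMA32 zones but nothing in Normal, which is the constraint the zones exist to encode.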

/proc/buddyinfo shows free pages per zone per order:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32      0      0      0      0      0      0      0      0      0      0      3
Node 0, zone   Normal  17802  10987   4821   2438   1289    650    255    109     42     15    876
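
Each column in that output is a count of free blocks at one order, so the free memory in a zone is the sum of count × 2^order pages. A small user-space sketch that performs this sum for one line follows. The function name is hypothetical and the parser assumes the line format shown above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the free pages described by one /proc/buddyinfo line.
 * Returns -1 if the line does not contain a "zone <name>" field. */
static long buddyinfo_free_pages(const char *line)
{
    const char *p = strstr(line, "zone");
    char zone[32];
    int consumed = 0;
    long total = 0;

    if (!p || sscanf(p, "zone %31s%n", zone, &consumed) != 1)
        return -1;
    p += consumed;

    /* Remaining fields: free block counts for order 0, 1, 2, ... */
    for (int order = 0; ; order++) {
        char *end;
        long count = strtol(p, &end, 10);

        if (end == p)               /* no more numbers on the line */
            break;
        total += count << order;    /* each block holds 2^order pages */
        p = end;
    }
    return total;
}
```

Applied to the DMA line above, the sum is 3976 pages, or roughly 15.5 MiB with 4 KiB pages.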

The Buddy Allocator

The buddy allocator (mm/page_alloc.c) manages free pages in "orders" (powers of 2):
- Order 0: 1 page (4 KiB)
- Order 1: 2 pages (8 KiB)
- Order 2: 4 pages (16 KiB)
- ...
- Order 10: 1024 pages (4 MiB, MAX_ORDER-1)

Buddy Allocator Free Lists (one set per zone, per NUMA node):

order 0: [page A] → [page C] → NULL
order 1: [pages B,B+1] → NULL
order 2: NULL
order 3: [pages D, D+1, D+2, D+3, D+4, D+5, D+6, D+7] → NULL
...

Allocating 1 page: take from order-0 list. If empty, split an order-1 block (put half on order-0, return the other half). If order-1 is also empty, split order-2, etc.

Freeing a page: check if its "buddy" (the adjacent page at the same alignment) is also free. If so, merge them into an order-1 block and recursively check if the order-1 buddy is also free, up to order 10.

Each allocation or free performs at most one split or merge per order, so both are O(log n), where n is the largest block size in pages (about 10 steps here).
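
The "buddy" of a block is found by flipping one bit of its starting PFN, which is what makes the merge check cheap. The following user-space sketch shows that calculation and a toy merge loop. The names are illustrative, and the free-list lookup is reduced to an array holding at most one free block start per order (-1 meaning none).

```c
#define DEMO_MAX_ORDER 10   /* largest order used in the text above */

/* Smallest order whose block holds at least nr_pages pages. */
static unsigned int order_for_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* The buddy of an order-aligned block differs only in bit `order` of its PFN. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/* Toy merge loop: free_block[o] is the start PFN of the single free block
 * at order o, or -1. Returns the order of the block after coalescing. */
static unsigned int coalesced_order(unsigned long pfn, unsigned int order,
                                    const long free_block[])
{
    while (order < DEMO_MAX_ORDER &&
           free_block[order] == (long)buddy_pfn(pfn, order)) {
        pfn &= ~(1UL << order);     /* merged block starts at the lower buddy */
        order++;
    }
    return order;
}
```

Freeing page 0 while page 1 (order 0) and pages 2-3 (order 1) are free merges twice and yields one order-2 block, since the loop stops at the first order whose buddy is not free.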

Per-CPU page sets (PCP): For performance, each CPU caches up to pcp->high order-0 pages in a per-CPU list. Allocating one page usually hits the PCP cache, avoiding the per-zone spinlock entirely.

kmalloc, vmalloc, kzalloc, kcalloc

kmalloc(size, gfp_flags) (include/linux/slab.h, mm/slub.c): Returns physically contiguous kernel memory, suitable for DMA. Backed by the SLUB allocator (slab of pre-allocated objects for common sizes: 8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192 bytes). For sizes ≤ 8192 bytes: O(1) allocation from per-CPU slab cache. For sizes > 8192: falls back to page allocator.

Physical contiguity is guaranteed. Virtual address = physical address + PAGE_OFFSET (directly mapped).
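
Because requests are served from fixed size classes, a request is rounded up to the next class and the difference is wasted. A sketch of that rounding, using the size list given above, follows. The function name is hypothetical, and the special handling of zero-size requests is not modeled.

```c
#include <stddef.h>

/* Size classes as listed in the text. */
static const size_t demo_kmalloc_sizes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Return the size class that would serve a request of `size` bytes,
 * or 0 if the request is too large for any class (page allocator path). */
static size_t demo_kmalloc_bucket(size_t size)
{
    size_t n = sizeof(demo_kmalloc_sizes) / sizeof(demo_kmalloc_sizes[0]);

    for (size_t i = 0; i < n; i++) {
        if (size <= demo_kmalloc_sizes[i])
            return demo_kmalloc_sizes[i];
    }
    return 0;
}
```

A 100-byte structure lands in the 128-byte class and wastes 28 bytes per object, which is one reason dedicated caches are preferred for objects allocated in large numbers.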

vmalloc(size) (mm/vmalloc.c): Returns virtually contiguous but physically non-contiguous kernel memory. Uses the vmalloc address range. Each page is individually allocated from the buddy allocator and mapped into the vmalloc area. Slower than kmalloc (requires setting up page table entries). Not suitable for DMA (physically non-contiguous). Maximum allocation size: far larger than kmalloc's, bounded by available physical memory and by the 32 TiB vmalloc area.

kzalloc(size, gfp_flags): Equivalent to kmalloc(size, gfp_flags | __GFP_ZERO), so the memory comes back zeroed. Prefer it over kmalloc followed by a manual memset to prevent uninitialized memory bugs.

kcalloc(n, size, gfp_flags): Allocates n * size bytes, zeroed. Checks for overflow in n * size. Always use for array allocations.
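
The overflow check matters because n * size can wrap around and produce a buffer far smaller than the caller expects. The sketch below shows the idea in user-space C, with calloc() standing in for the kernel allocator. The demo_ names are hypothetical and the saturating multiply is only similar in spirit to the kernel's own overflow helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply two sizes, saturating to SIZE_MAX if the product would wrap. */
static size_t demo_size_mul(size_t a, size_t b)
{
    if (b != 0 && a > SIZE_MAX / b)
        return SIZE_MAX;
    return a * b;
}

/* Allocate a zeroed array of n elements, refusing requests whose total
 * size overflows instead of silently under-allocating. */
static void *demo_zalloc_array(size_t n, size_t size)
{
    size_t bytes = demo_size_mul(n, size);

    if (bytes == SIZE_MAX)
        return NULL;
    return calloc(1, bytes);
}
```

Without the check, a count taken from untrusted input could be chosen so that the multiplication wraps to a small value, after which every index past the short buffer is an out-of-bounds write.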

kmem_cache_alloc(cache, gfp_flags): Allocates one object from a pre-defined slab cache. For fixed-size, frequently allocated objects (e.g., struct task_struct, struct inode, struct sk_buff). Fastest option when allocating/freeing many objects of the same size.
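
The reason a dedicated cache is fast can be seen in a toy fixed-size object pool: free objects are chained through their own storage, so allocation and free are each a couple of pointer moves. This is a user-space analogy under simplifying assumptions (a single slab, no per-CPU state, no locking), not the SLUB implementation. All demo_ names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_cache {
    size_t obj_size;    /* rounded-up object size */
    void *freelist;     /* singly linked list threaded through free objects */
    char *slab;         /* backing storage for all objects */
};

/* Carve one backing allocation into nr_objs objects and chain them. */
static int demo_cache_init(struct demo_cache *c, size_t obj_size, size_t nr_objs)
{
    /* Each free object must be able to hold the "next" pointer. */
    if (obj_size < sizeof(void *))
        obj_size = sizeof(void *);
    obj_size = (obj_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    c->slab = malloc(obj_size * nr_objs);
    if (!c->slab)
        return -1;
    c->obj_size = obj_size;
    c->freelist = NULL;
    for (size_t i = nr_objs; i-- > 0; ) {
        void *obj = c->slab + i * obj_size;

        *(void **)obj = c->freelist;
        c->freelist = obj;
    }
    return 0;
}

/* Pop the first free object, or return NULL when the pool is empty. */
static void *demo_cache_alloc(struct demo_cache *c)
{
    void *obj = c->freelist;

    if (obj)
        c->freelist = *(void **)obj;
    return obj;
}

/* Push an object back; the most recently freed object is reused first. */
static void demo_cache_free(struct demo_cache *c, void *obj)
{
    *(void **)obj = c->freelist;
    c->freelist = obj;
}
```

The last-in, first-out reuse keeps recently touched objects hot in the CPU cache, and it is also why a freed object's memory tends to be handed out again almost immediately.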

GFP Flags: Memory Allocation Context

The gfp_t (Get Free Pages) flags control how the allocator behaves when memory is tight:

Flag	Meaning	Where to use
`GFP_KERNEL`	May sleep, may do reclaim	Normal kernel context (process context)
`GFP_ATOMIC`	Cannot sleep, no direct reclaim, may dip into reserves	Interrupt context, softirq, spinlock held
`GFP_NOWAIT`	Cannot sleep, no direct reclaim, no access to reserves	Process context where blocking must be avoided
`GFP_NOIO`	No disk I/O during reclaim	Filesystem/storage code to prevent deadlock
`GFP_NOFS`	No filesystem operations	Filesystem code to prevent recursive reclaim
`GFP_USER`	Allocation on behalf of user space that the kernel or hardware must also access directly	Buffers mapped into user space (e.g., by drivers)
`GFP_DMA`	Must come from ZONE_DMA	Legacy DMA devices
`GFP_DMA32`	Must come from ZONE_DMA32	32-bit DMA
`__GFP_ZERO`	Zero the allocation	Combined with others: `GFP_KERNEL | __GFP_ZERO`
`__GFP_NOFAIL`	Must succeed, may block forever	Critical allocations

Critical rule: Never use GFP_KERNEL in interrupt context or while holding a spinlock. The allocator may sleep (call schedule()) when memory is tight, which is illegal in atomic context. Kernels built with CONFIG_DEBUG_ATOMIC_SLEEP report such calls through might_sleep() checks.

// In interrupt handler (GFP_ATOMIC required):
struct my_event *ev = kmalloc(sizeof(*ev), GFP_ATOMIC);
if (!ev)
    return;  // Must handle allocation failure — GFP_ATOMIC can fail

// In process context (GFP_KERNEL preferred):
struct my_buffer *buf = kzalloc(size, GFP_KERNEL);
if (!buf)
    return -ENOMEM;
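
The common GFP masks are combinations of a few low-level bits, and whether an allocation may sleep comes down to whether direct reclaim is allowed. The user-space sketch below models that composition. The bit values are illustrative placeholders (the real values live in the kernel's gfp headers and change between versions), and all DEMO_ and demo_ names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative bit values, NOT the kernel's actual encoding. */
#define DEMO_GFP_HIGH            0x01u   /* may use emergency reserves */
#define DEMO_GFP_IO              0x02u   /* may start disk I/O */
#define DEMO_GFP_FS              0x04u   /* may call into the filesystem */
#define DEMO_GFP_DIRECT_RECLAIM  0x08u   /* caller may reclaim, hence sleep */
#define DEMO_GFP_KSWAPD_RECLAIM  0x10u   /* may wake background reclaim */

#define DEMO_GFP_RECLAIM (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/* How the common masks are built from the bits above. */
#define DEMO_GFP_KERNEL  (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS    (DEMO_GFP_RECLAIM | DEMO_GFP_IO)
#define DEMO_GFP_NOIO    (DEMO_GFP_RECLAIM)
#define DEMO_GFP_ATOMIC  (DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_NOWAIT  (DEMO_GFP_KSWAPD_RECLAIM)

/* An allocation may block only if it is allowed to reclaim directly. */
static bool demo_may_block(unsigned int gfp)
{
    return (gfp & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* Masks that cannot block are the only ones usable in atomic context. */
static bool demo_safe_in_atomic(unsigned int gfp)
{
    return !demo_may_block(gfp);
}
```

In this model GFP_ATOMIC and GFP_NOWAIT differ only in the reserve bit, while GFP_NOIO and GFP_NOFS can still sleep and are therefore no substitute for GFP_ATOMIC in an interrupt handler.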

Memory Allocation Failure Handling

Unlike user-space malloc(), kernel allocations can and do fail. Correct handling is mandatory — a kernel function that ignores a NULL return from kmalloc will dereference NULL and cause a kernel oops.

struct my_obj *obj = kmalloc(sizeof(*obj), GFP_KERNEL);
if (!obj)
    return -ENOMEM;  // propagate error to caller

// For caches the kernel cannot run without, pass SLAB_PANIC at creation:
static struct kmem_cache *my_cache;
my_cache = kmem_cache_create("my_objects", sizeof(struct my_obj),
                              0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
// SLAB_PANIC: if cache creation fails, kernel panics immediately
// Use only for truly critical caches (task_struct, mm_struct, etc.)

GFP_NOWAIT vs. GFP_ATOMIC: Both avoid sleeping. GFP_ATOMIC additionally sets __GFP_HIGH, which lets the allocation dip below the normal minimum watermark into a small emergency reserve. GFP_NOWAIT respects the watermarks and therefore fails sooner under pressure. Use GFP_ATOMIC only in actual atomic (non-preemptible) contexts; use GFP_NOWAIT when sleeping would be legal but is undesirable.

Historical Context

Linux's kernel memory management has evolved significantly across versions:

SLAB allocator (developed 1996-1997, shipped in the Linux 2.2 stable series): Linux implemented the slab design that Jeff Bonwick described for SunOS. It used "caches" of equal-size objects, per-CPU object arrays, and cache coloring to improve hardware cache utilization.

SLOB (Simple List Of Blocks, Linux 2.6.16, 2006): A compact allocator for embedded systems with very limited memory (< 32 MB). Trades off performance for reduced metadata overhead.

SLUB (Linux 2.6.22, 2007, by Christoph Lameter): Replaced SLAB as the default. Simpler design, better debugging support, superior SMP scalability (fewer locks, per-CPU partial slabs rather than per-CPU queues). SLUB is the default slab allocator on all modern Linux systems, and the only one since SLOB (Linux 6.4) and SLAB (Linux 6.8) were removed.

SPARSEMEM and memory hotplug support were added in Linux 2.6.17 (2006), enabling servers to add RAM while running (common in enterprise server hardware).

5-level paging (Linux 4.14, 2017): Linux gained support for Intel's 5-level paging (57-bit virtual addresses, 52-bit physical addresses), which allows up to 128 PiB of virtual address space and 4 PiB of physical RAM.

Production Examples

Memory accounting in databases: PostgreSQL and MySQL maintain their own buffer pools in anonymous memory obtained from user space with mmap(MAP_ANONYMOUS). The kernel's job is to provide these anonymous pages, track dirty pages, and manage eviction under memory pressure. PostgreSQL's shared buffers (typically 25% of RAM) are a large region of anonymous mmapped memory. When the system is under memory pressure, the kernel's page reclaim chooses between evicting application data (the database buffer pool) and filesystem cache, which is why tuning vm.dirty_ratio, vm.swappiness, and vm.min_free_kbytes matters.

/proc/slabinfo analysis: On a production web server, running cat /proc/slabinfo | sort -k 3 -n -r | head -20 lists the slab caches holding the most objects (column 3 is num_objs; multiply by objsize in column 4 to estimate memory use). Common culprits: dentry (directory entry cache, grows with filesystem metadata access), inode_cache, ext4_inode_cache, radix_tree_node (page cache tree nodes). A server that has processed millions of unique file paths will have a large dentry cache. slabtop provides a live view.
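
Object count and memory use are different rankings, so it helps to compute bytes per cache explicitly. The sketch below parses one data line of /proc/slabinfo in user space. It assumes the version 2.1 column order (name, active_objs, num_objs, objsize, ...), the function name is hypothetical, and the result ignores per-slab overhead.

```c
#include <stdio.h>

/* Parse one data line of /proc/slabinfo and return num_objs * objsize
 * in bytes. The cache name is copied into name[64].
 * Returns -1 for header lines or anything else that does not parse. */
static long long demo_slab_bytes(const char *line, char name[64])
{
    unsigned long active_objs, num_objs, objsize;

    if (sscanf(line, "%63s %lu %lu %lu",
               name, &active_objs, &num_objs, &objsize) != 4)
        return -1;
    return (long long)num_objs * (long long)objsize;
}
```

A cache with 2000 objects of 192 bytes accounts for 384,000 bytes by this estimate, which can be larger or smaller than a cache with many more but much smaller objects.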

NUMA-aware allocation in HPC: High-performance computing applications running on multi-socket servers (e.g., 4 × Intel Xeon with 4 NUMA nodes) must ensure memory is allocated on the same NUMA node as the CPU that will access it. The kernel's alloc_pages_node(nid, gfp, order) allocates from a specific node. numactl --localalloc forces user-space allocations to be local. Misconfigured NUMA allocation can cost 2–4x memory access latency for cross-node accesses.

Debugging Notes

# Physical memory layout
cat /proc/iomem

# Memory zones and buddy allocator state
cat /proc/buddyinfo
cat /proc/zoneinfo

# Slab allocator statistics
cat /proc/slabinfo
# or:
slabtop -s c    # sort by cache size

# Memory usage breakdown
cat /proc/meminfo
# Key fields:
#   MemTotal: total physical RAM
#   MemFree:  completely unused
#   MemAvailable: can be used without swapping (includes reclaimable caches)
#   Buffers:  block device cache
#   Cached:   page cache
#   Slab / SReclaimable / SUnreclaim: slab memory (total, reclaimable, unreclaimable)

# vmalloc usage
cat /proc/vmallocinfo | head -50

# Find a kernel virtual address's physical address (from kernel module):
# phys_addr_t pa = virt_to_phys(kernel_ptr);
# Or in user space (requires root):
# /proc/PID/pagemap maps user virtual → physical pages

# Check for memory corruption (SLUB debug):
# Boot with: slub_debug=FZPU (F: sanity checks, Z: red zoning, P: poisoning, U: alloc/free owner tracking)
dmesg | grep "SLUB"

# eBPF: trace kmalloc calls over a threshold
bpftrace -e 'kprobe:__kmalloc { if (arg0 > 1048576) { printf("large alloc: %d bytes from %s\n", arg0, comm); } }'

Security Implications

Heap spraying: Kernel heap spraying attacks fill the kernel heap with attacker-controlled data before triggering a use-after-free vulnerability. The goal is to have the freed object's memory reallocated with attacker-controlled content. Kernel SLUB hardening mitigations:
- CONFIG_SLAB_FREELIST_HARDENED: XOR-encodes freelist pointers, making pointer corruption detectable
- CONFIG_SLAB_FREELIST_RANDOM: randomizes slab freelist order, making controlled heap layout harder
- CONFIG_INIT_ON_ALLOC_DEFAULT_ON: zeroes all slab allocations, prevents uninitialized kernel memory leaks
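
The effect of XOR-encoding freelist pointers can be illustrated with a simplified user-space model. It assumes the stored value is the next pointer XORed with a per-cache secret and with the byte-swapped address of the slot that holds it. The demo_ names are hypothetical and the kernel's actual encoding may differ in detail.

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t demo_swab64(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* Value stored in a free object in place of the raw "next" pointer. */
static uint64_t demo_encode_next(uint64_t next, uint64_t secret, uint64_t slot_addr)
{
    return next ^ secret ^ demo_swab64(slot_addr);
}

/* XOR is its own inverse, so decoding applies the same transformation. */
static uint64_t demo_decode_next(uint64_t stored, uint64_t secret, uint64_t slot_addr)
{
    return stored ^ secret ^ demo_swab64(slot_addr);
}
```

An attacker who overwrites the stored value with a raw target address, without knowing the secret, gets a decoded pointer that does not equal the target and is most likely invalid, which turns a silent freelist hijack into a detectable fault.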

KASLR and the direct map: With KASLR, the kernel image base (CONFIG_RANDOMIZE_BASE) and the bases of the direct physical mapping, vmalloc, and vmemmap regions (CONFIG_RANDOMIZE_MEMORY) are randomized at boot. An attacker who knows a physical address must also learn the randomized direct-map base to compute the corresponding kernel virtual address, which KASLR makes non-trivial.

Kernel pointer leaks: Exposing kernel virtual addresses (e.g., through /proc, ioctl return values, printk) allows attackers to defeat KASLR. Mitigations include CONFIG_SECURITY_DMESG_RESTRICT, hashing of pointers printed with plain %p, the %pK format specifier (which honors the kptr_restrict sysctl), and hiding the addresses in /proc/kallsyms from unprivileged users.

Performance Implications

SLUB vs. SLAB performance: SLUB generally scales better than SLAB on SMP systems because its per-CPU partial slabs have better locality and its locking is finer-grained. On a single-threaded embedded system, the difference is negligible.

vmalloc TLB cost: Each vmalloc allocation requires its own page table entries, and by default these map 4 KiB pages, whereas the direct map is built from 2 MiB or 1 GiB pages where possible. Large vmalloc areas therefore consume more TLB entries and cause more TLB misses than the same amount of kmalloc memory. Kernel modules are placed in the modules area (near kernel text, reachable with 32-bit relative jumps) rather than in the general vmalloc range (which would require 64-bit indirect calls). For DMA buffers larger than 8 MiB, dma_alloc_coherent() (which may use vmalloc internally) can be a bottleneck.

Per-CPU page set (PCP) tuning: The PCP cache holds up to pcp->high pages per CPU. Under high page allocation/free rates (e.g., a server with many short-lived connections), tuning vm.percpu_pagelist_high_fraction can improve performance by keeping more pages in the PCP cache.

NUMA imbalance detection: numastat -p PID shows per-NUMA-node memory allocation for a process. A process with all memory on one NUMA node but threads running on multiple nodes (cross-node memory access) is a common performance problem. On CPUs that expose the events, perf stat -e node-loads,node-load-misses can confirm cross-NUMA traffic.

Failure Modes and Real Incidents

OOM killer activation: When the system exhausts both RAM and swap, the kernel's OOM (Out-Of-Memory) killer selects a process to kill based on oom_score (derived mainly from the process's resident memory, swap usage, and page-table size, then shifted by oom_score_adj). The selection is often suboptimal: it may kill a database process with a large resident set rather than the process whose growth caused the shortage. Production systems tune /proc/PID/oom_score_adj (set to -1000 for critical processes to exclude them from OOM selection) and vm.overcommit_memory to control OOM behavior.

SLUB corruption producing data corruption: A bug in a driver that double-frees a slab object can corrupt the slab's freelist. A later allocation from that slab may return an object that is still in use, leaving two owners of the same memory, which typically shows up as silent data corruption or type confusion. CONFIG_SLUB_DEBUG (with slub_debug enabled at boot) detects this: it writes poison patterns to freed objects and checks them before allocation.

vmalloc area exhaustion: The vmalloc area is 32 TiB on x86-64. Freed vmalloc regions return to the pool, but fragmentation of the address range can prevent large allocations from reusing them. On 32-bit kernels, where the vmalloc area was only about 128 MiB, vmalloc exhaustion was a common production issue. On 64-bit kernels it is rarely a problem but can occur on systems with many loaded kernel modules, heavy ioremap use, or large vmap buffers.

Mellanox/RDMA and DMA zone depletion: RDMA NICs require physically contiguous DMA buffers in ZONE_DMA32. A server with heavy RDMA traffic allocating many small buffers can exhaust ZONE_DMA32, causing RDMA allocation failures even when plenty of ZONE_NORMAL memory is free. Fix: use IOMMU to allow RDMA to use any physical memory regardless of address range.

Modern Usage

Folio: Linux 5.16 introduced struct folio as a replacement for struct page in many MM paths. A folio is a power-of-2 contiguous group of pages (1 page = order-0 folio, 512 pages = order-9 folio). This simplifies compound page handling and enables future optimizations for huge pages in the page cache. Much of mm/filemap.c and mm/page-writeback.c is being migrated to use folios.

memfd and anonymous shared memory: memfd_create(2) creates anonymous memory-backed files. Combined with ftruncate and mmap, it creates anonymous shared memory without a filesystem file. Used by container runtimes and Wayland for zero-copy buffer sharing between compositor and applications. The kernel allocates pages via shmem_alloc_page() — the tmpfs/shmem path in the MM.

Memory tagging (ARM MTE, HWASAN): ARM64's Memory Tagging Extension (MTE) tags pointer values and memory with 4-bit tags. Mismatched tags cause a fault. KASAN (Kernel Address SANitizer) — a software equivalent — instruments every kernel memory access with shadow memory tracking. CONFIG_KASAN_GENERIC or CONFIG_KASAN_HW_TAGS (using MTE) detect use-after-free and out-of-bounds accesses at runtime.
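
The tag check itself is a comparison between a few pointer bits and a tag stored for the memory being accessed. The user-space model below assumes the 4-bit tag sits in bits 59:56 of the pointer. It only illustrates the comparison (real MTE performs it in hardware per 16-byte granule), and all demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TAG_SHIFT 56       /* tag occupies pointer bits 59:56 */
#define DEMO_TAG_MASK  0xfULL   /* 4-bit tag */

/* Extract the tag carried in a pointer value. */
static unsigned int demo_ptr_tag(uint64_t ptr)
{
    return (unsigned int)((ptr >> DEMO_TAG_SHIFT) & DEMO_TAG_MASK);
}

/* Return the pointer with its tag bits replaced by `tag`. */
static uint64_t demo_set_tag(uint64_t ptr, unsigned int tag)
{
    ptr &= ~(DEMO_TAG_MASK << DEMO_TAG_SHIFT);
    return ptr | (((uint64_t)tag & DEMO_TAG_MASK) << DEMO_TAG_SHIFT);
}

/* An access is allowed only when pointer tag and memory tag agree. */
static bool demo_access_ok(uint64_t ptr, unsigned int memory_tag)
{
    return demo_ptr_tag(ptr) == (memory_tag & DEMO_TAG_MASK);
}
```

If an allocator retags memory on free, a stale pointer still carries the old tag and fails the comparison, which is how a use-after-free becomes an immediate fault. With only 16 tag values, a stale pointer can still match by chance.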

Future Directions

folio completion: The migration from struct page to struct folio throughout the MM subsystem is ongoing. When complete, it will simplify huge page handling and improve performance for large file I/O.
CXL memory: CXL 3.0 (PCIe 6.0-based) enables memory pooling where multiple hosts share a pool of DRAM. The kernel's ZONE_DEVICE and the new ZONE_CXL in development will manage CXL memory. This requires extending the NUMA model to include non-local, non-CPU-attached memory.
DAMON (Data Access MONitoring): A kernel feature (merged Linux 5.15) that profiles which memory regions are actually accessed. Used by DAMOS (DAMON-based Operation Schemes) to automatically apply huge pages, reclaim cold pages, or move data between memory tiers based on actual access patterns.

Exercises

On a Linux system, run cat /proc/iomem and identify all System RAM entries. Calculate the total RAM from these entries. Compare with cat /proc/meminfo | grep MemTotal. Why might they differ?
Use cat /proc/slabinfo | awk 'NR>2 {size=$3*$4; print size, $1}' | sort -rn | head -10 to find the 10 slab caches consuming the most memory. For each cache, look up its struct definition in the kernel source and explain what it represents.
Write a kernel module that allocates memory using all four methods: kmalloc, vmalloc, kzalloc, and kmem_cache_alloc. For each, log the virtual address returned. Using virt_to_phys(), determine which allocation methods return physically contiguous memory. Verify your answer against the expected behavior.
Read mm/slub.c, specifically kmem_cache_alloc() and slab_alloc_node(). Trace the fast path: what happens when the per-CPU cache has a free object? What happens when the per-CPU cache is empty (slow path)? How does this compare to a user-space malloc?
Read /proc/buddyinfo on a running system. Calculate the total free memory in ZONE_NORMAL across all orders. Compare with MemFree in /proc/meminfo. Why might there be a difference? What does MemAvailable include that MemFree does not?

References

Linux kernel source: mm/, include/linux/mm_types.h, include/linux/slab.h, include/linux/vmalloc.h, arch/x86/include/asm/pgtable_64_types.h
Linux kernel documentation: Documentation/admin-guide/mm/, Documentation/mm/
Mel Gorman, "Understanding the Linux Virtual Memory Manager": https://www.kernel.org/doc/gorman/ (most comprehensive reference)
Christoph Lameter, "SLUB: The unqueued slab allocator", Ottawa Linux Symposium 2007
Matthew Wilcox, "Folios", LWN.net: https://lwn.net/Articles/849538/
NUMA Best Practices: https://www.kernel.org/doc/Documentation/vm/numa.rst
Robert Love, Linux Kernel Development, 3rd ed., Chapter 12 (Memory Management)
Brendan Gregg, "Linux Memory Analysis" in BPF Performance Tools, Addison-Wesley, 2019
DAMON documentation: Documentation/admin-guide/mm/damon/