Kernel Decompression
Technical Overview
The Linux kernel is distributed as a compressed image. When a bootloader loads a Linux kernel, it loads not the kernel itself but a self-decompressing stub that contains a compressed kernel payload. This stub runs in protected mode (or long mode on x86-64), decompresses the actual kernel into its load address, applies KASLR (Kernel Address Space Layout Randomization) if enabled, and transfers control to the uncompressed kernel's entry point.
This design achieves three goals: smaller on-disk and in-memory footprint during transfer, faster I/O from bootloader to RAM (less data to read), and the ability to choose the load address dynamically (KASLR). The decompression process is one of the most intricate parts of the boot sequence, executing with no C library, no memory allocator, and minimal hardware abstraction.
Prerequisites
- x86 protected mode and long mode
- ELF binary format (sections, segments, entry points)
- Basic compression algorithm understanding (LZ77, Huffman coding)
- Linux boot protocol (how GRUB passes control to the kernel)
- Memory map concepts from BIOS/UEFI sections
Historical Context
Early Linux kernels (0.01, 1991) were loaded directly without compression. As the kernel grew, Linus Torvalds added gzip compression to the kernel image around 1992–1993. The "z" in "zImage" and "bzImage" denotes compression (not the bzip2 algorithm — the "bz" in bzImage stands for "Big zImage," a joke name that stuck).
The zImage format was loaded into low memory, so the compressed image had to fit below 640KB (in practice about 512KB). When kernels grew beyond this, the bzImage format was introduced: its protected-mode part is loaded above 1MB, lifting the size restriction. The setup code and the compressed kernel are combined into a single binary that the bootloader (for example GRUB) loads.
The kernel gained xz support in 2011 (Linux 2.6.38; smaller images, slower decompression), lz4 in 2013 (fastest decompression, larger images), and zstd in 2020 (Linux 5.9; best ratio/speed tradeoff for modern hardware). The choice of compression algorithm is a kernel build-time configuration option.
Kernel Image Formats
Kernel Image Format Taxonomy
vmlinux
- Raw, uncompressed ELF binary
- Output of the kernel build (direct linker output)
- Contains all debug symbols if not stripped
- Not directly bootable — must be processed further
- Used for crash dump analysis (with kdump)
- Size: 50–300MB depending on config and debug info
vmlinuz (or vmlinux.bin)
- vmlinux stripped of debug symbols and compressed
- Generic name used by distributions for the bootable image
Image (arch/x86/boot/compressed/vmlinux)
- Intermediate: compressed vmlinux as ELF
- Not the final bootable image
arch/x86/boot/bzImage
- The actual bootable kernel image
- Contains: setup code (16-bit) + decompressor + compressed kernel
- This is what GRUB loads as 'linux /boot/vmlinuz-...'
- Installed as /boot/vmlinuz-<version> on the system
On other architectures:
- ARM: zImage or Image (uncompressed, for devices with XIP)
- ARM64: Image (typically externally compressed by bootloader)
- RISC-V: Image (uncompressed, bootloader handles compression)
bzImage Internal Structure
bzImage Layout (arch/x86/boot/bzImage)
+------------------------------------------+ Offset 0
| Real-Mode Setup Code |
| (arch/x86/boot/header.S + main.c) |
| 16-bit code: "bootsector" protocol |
| |
| Linux Boot Header at offset 0x1F1: |
| setup_sects (number of 512B sectors) |
| syssize (size of protected-mode |
| kernel in 16B units) |
| vid_mode (video mode) |
| boot_flag = 0xAA55 |
| jump (JMP over header to setup code) |
| header = "HdrS" (magic) |
| version (boot protocol version) |
| kernel_version (human-readable string)|
| type_of_loader (bootloader ID) |
| loadflags (CAN_USE_HEAP, etc.) |
| code32_start (load address for kernel)|
| ramdisk_image (initrd address) |
| ramdisk_size (initrd size) |
| cmd_line_ptr (cmdline address) |
| initrd_addr_max (max initrd address) |
| kernel_alignment |
| relocatable_kernel |
| min_alignment |
| xloadflags (64-bit entry, etc.) |
+------------------------------------------+ Offset (setup_sects+1)*512
| Protected-Mode Kernel |
| (arch/x86/boot/compressed/vmlinux) |
| |
| startup_32 (entry from bootloader) |
| startup_64 (entry for 64-bit) |
| |
| Decompressor code: |
| - decompress_kernel() |
| - KASLR: choose_random_location() |
| |
| piggy.o: compressed vmlinux payload |
| (embedded as .rodata section) |
| |
| vmlinux.bin.gz (or .xz/.lz4/.zstd) |
| (the actual kernel, compressed) |
+------------------------------------------+
The bootloader (GRUB2) reads the Linux boot header to determine how to load the kernel:
- setup_sects: number of 512-byte sectors of setup code
- relocatable_kernel: if set, bootloader can load kernel anywhere with the right alignment
- kernel_alignment: required alignment (typically 2MB for KASLR)
- loadflags & LOADED_HIGH: if set (always for bzImage), kernel decompresses above 1MB
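The fields a bootloader consults can be read straight out of the image. The following is a minimal sketch, not GRUB's actual logic: the function name `parse_setup_header` and the `pm_offset` and `loaded_high` keys are illustrative names, while the offsets are the ones the x86 boot protocol documents.

```python
import struct

# (offset, struct format) pairs from the x86 boot protocol header layout.
# kernel_alignment and relocatable_kernel exist from protocol 2.05 on,
# xloadflags from 2.12 on; older images hold other data at those offsets.
FIELDS = {
    "setup_sects":        (0x1F1, "<B"),
    "boot_flag":          (0x1FE, "<H"),
    "version":            (0x206, "<H"),
    "loadflags":          (0x211, "<B"),
    "kernel_alignment":   (0x230, "<I"),
    "relocatable_kernel": (0x234, "<B"),
    "xloadflags":         (0x236, "<H"),
}
LOADED_HIGH = 0x01  # bit 0 of loadflags


def parse_setup_header(image: bytes) -> dict:
    """Decode a few boot-header fields from the start of a bzImage."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("missing HdrS magic: not a bzImage with a boot header")
    hdr = {
        name: struct.unpack_from(fmt, image, offset)[0]
        for name, (offset, fmt) in FIELDS.items()
    }
    # The protocol defines setup_sects == 0 to mean 4 (backward compatibility).
    sects = hdr["setup_sects"] or 4
    # Hypothetical convenience keys, not header fields:
    hdr["pm_offset"] = (sects + 1) * 512          # where protected-mode code starts
    hdr["loaded_high"] = bool(hdr["loadflags"] & LOADED_HIGH)
    return hdr
```

A call such as `parse_setup_header(open(path, "rb").read(0x300))` is enough for these fields, since they all sit in the first 0x238 bytes of the file.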
Compression Algorithms
| Algorithm | Config option | Size ratio | Decomp speed | Comp speed | Notes |
|---|---|---|---|---|---|
| gzip | CONFIG_KERNEL_GZIP | baseline | good | good | Default, universal availability |
| bzip2 | CONFIG_KERNEL_BZIP2 | best (classic) | slow | slow | Rarely used today |
| lzma | CONFIG_KERNEL_LZMA | very good | medium | slow | Superseded by xz |
| xz | CONFIG_KERNEL_XZ | very good | medium | slow | Best ratio, slow boot on slow systems |
| lz4 | CONFIG_KERNEL_LZ4 | worst | fastest | fast | Embedded/fast-boot systems |
| zstd | CONFIG_KERNEL_ZSTD | very good | fast | fast | Recommended for modern systems |
Typical bzImage sizes (x86-64 kernel, approximately):
- gzip: ~12MB
- xz: ~9MB
- lz4: ~16MB
- zstd: ~11MB (with comparable decomp speed to lz4 at level 1)
The choice matters for boot time, and the right answer depends on where the time goes (NVMe SSD vs spinning disk vs network boot). On slow media such as a spinning disk or a network boot, read time dominates, so xz's smaller size wins. On fast storage such as NVMe, decompression time dominates, so lz4 or zstd wins.
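The tradeoff can be made concrete with a two-term model: time to read the image plus time to decompress it. The sketch below is illustrative only. The image sizes are the approximate figures listed above, while the decompression times are assumed placeholders rather than measurements, and the function names are hypothetical.

```python
def load_time_ms(image_mb: float, read_mb_per_s: float, decompress_ms: float) -> float:
    """Time to read the compressed image plus time to decompress it."""
    return image_mb / read_mb_per_s * 1000.0 + decompress_ms


# name: (compressed image size in MB, assumed decompression time in ms)
CANDIDATES = {
    "xz":   (9, 1500),
    "gzip": (12, 550),
    "zstd": (11, 400),
    "lz4":  (16, 300),
}


def fastest(read_mb_per_s: float) -> str:
    """Return the algorithm with the lowest modeled load time at this read bandwidth."""
    return min(
        CANDIDATES,
        key=lambda name: load_time_ms(CANDIDATES[name][0], read_mb_per_s, CANDIDATES[name][1]),
    )


if __name__ == "__main__":
    for bandwidth in (1, 10, 2000):  # MB/s: slow network link, slow disk, fast local storage
        print(f"{bandwidth:>5} MB/s -> {fastest(bandwidth)}")
```

Under these assumed numbers the winner shifts from the smallest image at very low bandwidth to the fastest decompressor at high bandwidth, which is the crossover the paragraph above describes. Real measurements on the target hardware should replace the placeholders before drawing conclusions.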
Decompression Process (Step by Step)
Kernel Decompression Flow
GRUB2 (or UEFI stub)
|
| Reads boot header, loads setup code to 0x90000
| Loads protected-mode kernel to 0x100000 (1MB)
| Sets up boot_params structure
| Sets CS:IP = 0x9020:0x0000 (setup code)
|
v
Real-Mode Setup Code (arch/x86/boot/main.c)
|
| Checks hardware (video, memory, CPU features)
| Calls INT 15h/E820 to get memory map
| Enables A20 gate
| Calls go_to_protected_mode()
|
v
Protected Mode Entry (arch/x86/boot/pmjump.S)
|
| Sets up GDT (flat 32-bit segments)
| Jumps to startup_32 in compressed kernel
|
v
startup_32 (arch/x86/boot/compressed/head_64.S)
|
| Establishes identity-mapped page tables (for decompressor)
| Enables PAE paging
| Jumps to startup_64
|
v
startup_64 (same file)
|
| Sets up 64-bit stack
| Parses UEFI memory map if booted via EFI stub
| Calls choose_random_location() for KASLR
| → Finds a suitable physical address range
| → Uses RDRAND or RDTSC for entropy
| → Ensures chosen range does not overlap
| firmware, ACPI, reserved memory
|
v
decompress_kernel() (arch/x86/boot/compressed/misc.c; named extract_kernel() in newer kernels)
|
| Calls chosen decompressor:
| gunzip() / unxz() / unlz4() / unzstd()
| Output: uncompressed vmlinux at chosen_addr
|
v
parse_elf() (arch/x86/boot/compressed/misc.c)
|
| Parses ELF PT_LOAD segments from vmlinux header
| Copies each segment to its physical load address
| (p_paddr shifted by the chosen load offset)
|
v
handle_relocations() (arch/x86/boot/compressed/misc.c)
|
| Applies ELF relocation entries with KASLR delta
| Patches all absolute addresses in the kernel
| with the KASLR randomization offset
|
v
Jump to kernel entry point
| (startup_64 in the uncompressed kernel at randomized address)
|
v
Uncompressed kernel running at KASLR-randomized physical address
Proceeds to early CPU init, page table setup, start_kernel()...
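The parse_elf() step in the flow above can be approximated in a few lines. This is a simplified sketch of the idea, not the kernel's C code: it handles only 64-bit little-endian ELF, and the names `load_segments`, `link_phys_base`, and `chosen_base` are illustrative. It returns a copy plan rather than moving memory.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)
PT_LOAD = 1


def load_segments(elf: bytes, link_phys_base: int, chosen_base: int):
    """Return (file_offset, destination_phys_addr, size) for each PT_LOAD segment.

    link_phys_base is the physical address the image was linked for;
    chosen_base is where the decompressor decided to place it.
    """
    fields = EHDR.unpack_from(elf, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2:
        raise ValueError("not a 64-bit ELF image")

    delta = chosen_base - link_phys_base
    plan = []
    for i in range(phnum):
        p_type, _flags, p_offset, _vaddr, p_paddr, p_filesz, _memsz, _align = \
            PHDR.unpack_from(elf, phoff + i * phentsize)
        if p_type == PT_LOAD:
            # Segments are placed by physical address, shifted by the load delta.
            plan.append((p_offset, p_paddr + delta, p_filesz))
    return plan
```

Running this against a distribution's debug vmlinux with `chosen_base` equal to `link_phys_base` should reproduce the physical addresses that `readelf -l` prints for the LOAD entries.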
KASLR During Decompression
KASLR (Kernel Address Space Layout Randomization) is applied during decompression. The kernel does not decompress to a fixed address — it decompresses to a randomly chosen physical address within a range.
The function choose_random_location() in arch/x86/boot/compressed/kaslr.c:
1. Collects entropy: tries RDRAND (hardware RNG), then RDTSC, then a combination of memory map characteristics
2. Determines valid slots: physical memory regions large enough to hold the uncompressed kernel, aligned to kernel_alignment (2MB), not overlapping firmware/ACPI/device regions
3. Picks a random slot within the valid range
4. Returns the chosen physical address
The KASLR offset is typically in the range 0–(CONFIG_RANDOMIZE_BASE_MAX_OFFSET - kernel_size). The default range is 512MB with 2MB alignment granularity, giving at most 512MB / 2MB = 256 possible positions (fewer once the kernel's own size is subtracted from the range).
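The slot arithmetic can be sketched as follows. This is a simplified model under stated assumptions: it takes a list of already-usable memory regions as given and ignores the avoidance of firmware, ACPI, and other reserved ranges that the real choose_random_location() performs. All function names are hypothetical.

```python
def align_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def slots_in_region(start: int, end: int, image_size: int, align: int) -> int:
    """Number of aligned start addresses in [start, end) where the image fits."""
    first = align_up(start, align)
    if first + image_size > end:
        return 0
    return (end - first - image_size) // align + 1


def pick_address(regions, image_size: int, align: int, rand: int):
    """Map a random integer onto one slot across all regions; None if nothing fits."""
    counts = [slots_in_region(s, e, image_size, align) for s, e in regions]
    total = sum(counts)
    if total == 0:
        return None
    index = rand % total
    for (start, _end), count in zip(regions, counts):
        if index < count:
            return align_up(start, align) + index * align
        index -= count
    return None  # unreachable: index is always < total
```

For example, a single 512MB region with a 32MB image and 2MB alignment yields (512 - 32) / 2 + 1 = 241 slots, somewhat fewer than the 256 upper bound.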
KASLR on the physical address is separate from the virtual KASLR (CONFIG_RANDOMIZE_BASE) that randomizes where the kernel maps itself in virtual address space. Both operate during decompression.
KASLR limitations:
- Without hardware RNG (RDRAND), entropy is weak on embedded/VM systems
- Physical memory fragmentation limits available slots
- Disabled with nokaslr kernel parameter (or automatically if decompressor cannot find sufficient entropy)
- Side channels (Rowhammer, cache timing) can defeat KASLR in practice
KASLR Physical Address Selection:
Physical Memory
0MB 512MB 2GB
|------------------------+-----------+-----------|
^kernel zone^
|<------ valid slots -------->|
slot 0 slot 1 slot 2 ... slot N
^
| chosen by RNG
| kernel decompressed here
piggy.o — The Embedded Compressed Kernel
The compressed kernel payload is embedded in the decompressor binary as a data object called piggy.o. This is generated by arch/x86/boot/compressed/Makefile:
# Compress vmlinux
$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
$(call if_changed,gzip)
# Wrap compressed binary as ELF .o file
$(obj)/piggy.o: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/piggy.S FORCE
$(call if_changed,as_s_o)
piggy.S is a minimal assembly file that includes the compressed binary as .incbin:
/* arch/x86/boot/compressed/piggy.S */
.section ".rodata..compressed","a",@progbits
.globl z_input_len
z_input_len = <compressed_size>
.globl z_extract_offset
z_extract_offset = <kernel_size_aligned>
.globl input_data, input_data_end
input_data:
.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
input_data_end:
The decompressor reads input_data (address of start of compressed payload) and z_input_len (compressed size) to call the decompression function.
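Besides the input length, the decompressor also needs to know how large the output will be so it can reserve space. For gzip specifically, the stream's own trailer carries the uncompressed size (the ISIZE field, the last four bytes, little-endian, modulo 2^32). The sketch below shows how both numbers could be derived from a compressed blob; the function name and the `output_len` label are illustrative, not taken from the kernel build scripts.

```python
import gzip
import struct


def payload_lengths(compressed: bytes):
    """Return (input_len, output_len) for a single-member gzip stream.

    input_len is simply the size of the compressed blob (what z_input_len records).
    output_len is read from the gzip trailer's ISIZE field.
    """
    input_len = len(compressed)
    (output_len,) = struct.unpack("<I", compressed[-4:])
    return input_len, output_len


if __name__ == "__main__":
    original = b"kernel text and data " * 4096   # stand-in for vmlinux.bin
    blob = gzip.compress(original)
    in_len, out_len = payload_lengths(blob)
    print(f"compressed={in_len} bytes, uncompressed={out_len} bytes")
```

Other formats do not necessarily end with a size field, so this trick should not be assumed to work unchanged for xz, lz4, or zstd streams.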
Early Output: early_printk
During decompression, the standard kernel printk infrastructure is not available. Early output is provided by early_printk which writes directly to a hardware device:
- earlyprintk=vga: Write to VGA text buffer at 0xB8000
- earlyprintk=serial,0x3f8,115200: Write to COM1 serial port
- earlyprintk=efi: Write to UEFI ConOut (console)
- earlyprintk=dbgp: Write to USB debug port (EHCI debug port)
In the decompressor stage specifically, debug_putstr() in arch/x86/boot/compressed/misc.c provides serial output before the full early_printk infrastructure is set up.
Enable with kernel command line: earlyprintk=serial,ttyS0,115200 keep (the keep flag preserves the console device after main kernel init).
Build System Integration
Kernel Build Pipeline for bzImage:
Source files →
vmlinux (uncompressed ELF, all symbols)
↓ objcopy -R .comment -S (strip symbols; the payload stays an ELF file for parse_elf())
arch/x86/boot/compressed/vmlinux.bin
↓ gzip/xz/lz4/zstd
arch/x86/boot/compressed/vmlinux.bin.gz (etc.)
↓ as (assemble piggy.S with .incbin)
arch/x86/boot/compressed/piggy.o
↓ link with head_64.o, misc.o, kaslr.o, etc.
arch/x86/boot/compressed/vmlinux (the decompressor ELF)
↓ objcopy -O binary
arch/x86/boot/compressed/vmlinux.bin (binary)
↓ combine with setup code (arch/x86/boot/setup.bin)
arch/x86/boot/bzImage ← final bootable image
Production Examples
Inspecting a bzImage:
# Check kernel version encoded in bzImage header
file /boot/vmlinuz-$(uname -r)
# → Linux kernel x86 boot executable bzImage, version ...
# Extract and inspect boot protocol fields
python3 - <<'EOF'
import os
import struct

# Offsets come from the x86 boot protocol header
with open('/boot/vmlinuz-' + os.uname().release, 'rb') as f:
    f.seek(0x1F1)
    setup_sects = struct.unpack('<B', f.read(1))[0]  # setup size in 512-byte sectors
    f.seek(0x202)
    magic = f.read(4)                                 # expected b'HdrS'
    f.seek(0x206)                                     # protocol version (0x20E holds the kernel_version pointer)
    version = struct.unpack('<H', f.read(2))[0]
print(f"setup_sects={setup_sects}, magic={magic}, version={version:#x}")
EOF
# Locate the protected-mode part of a bzImage
# (scripts/extract-vmlinux in the kernel tree automates the full extraction)
IMG=/boot/vmlinuz-$(uname -r)
SETUP_SECTS=$(python3 -c "import sys; print(open(sys.argv[1],'rb').read()[0x1F1])" "$IMG")
# The boot protocol treats setup_sects == 0 as 4
[ "$SETUP_SECTS" -eq 0 ] && SETUP_SECTS=4
OFFSET=$(( (SETUP_SECTS + 1) * 512 ))
# This region begins with decompressor code, not with the compressed stream,
# so 'file' typically reports plain data here; the gzip/xz/zstd payload lies further in.
tail -c +$(( OFFSET + 1 )) "$IMG" | file -
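Boot protocol 2.08 and later record where the compressed payload sits, in the payload_offset (0x248) and payload_length (0x24C) header fields, with payload_offset counted from the start of the protected-mode code. A minimal sketch of using them is shown below. The function name is hypothetical, and the magic-byte table is a small subset chosen for illustration.

```python
import struct

# Leading magic bytes of common compressed streams (illustrative subset).
MAGICS = {
    b"\x1f\x8b\x08": "gzip",
    b"\xfd7zXZ\x00": "xz",
    b"BZh": "bzip2",
    b"\x5d\x00\x00": "lzma",
    b"\x02\x21\x4c\x18": "lz4",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def extract_payload(image: bytes):
    """Return (format_name, payload_bytes) using the header's payload fields."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("missing HdrS magic")
    version = struct.unpack_from("<H", image, 0x206)[0]
    if version < 0x0208:
        raise ValueError("boot protocol older than 2.08: no payload fields")

    setup_sects = image[0x1F1] or 4            # 0 means 4 per the boot protocol
    pm_offset = (setup_sects + 1) * 512        # start of protected-mode code
    payload_offset, payload_length = struct.unpack_from("<II", image, 0x248)
    if payload_offset == 0:
        raise ValueError("payload_offset not set")

    start = pm_offset + payload_offset
    payload = image[start:start + payload_length]
    kind = next(
        (name for magic, name in MAGICS.items() if payload.startswith(magic)),
        "unknown",
    )
    return kind, payload
```

If the reported format is gzip, `gzip.decompress(payload)` should yield the kernel ELF; for other formats the matching decompressor is needed, and any trailing bytes the build may append after the stream have to be tolerated.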
KASLR offset visibility:
# View current kernel load address (KASLR result)
sudo cat /proc/kallsyms | grep ' _text$'
# Changes on every boot when KASLR is active
# Disable KASLR for debugging (never in production):
# Add to kernel cmdline: nokaslr
# Then: sudo cat /proc/kallsyms | grep ' _text$' → always 0xffffffff81000000 on x86-64
# (without root, kptr_restrict may show all addresses as zeros)
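The virtual KASLR displacement can be computed from that _text line. The sketch below assumes an x86-64 kernel whose unrandomized _text is 0xffffffff81000000, as stated above; the function name is illustrative.

```python
DEFAULT_TEXT = 0xFFFFFFFF81000000  # unrandomized _text on x86-64 (assumed default config)


def kaslr_offset(kallsyms_text: str):
    """Return the virtual KASLR offset in bytes, or None if it cannot be determined."""
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "_text":
            address = int(parts[0], 16)
            if address == 0:
                return None  # addresses hidden from unprivileged readers
            return address - DEFAULT_TEXT
    return None


if __name__ == "__main__":
    # Needs root to see real addresses on most systems.
    with open("/proc/kallsyms") as f:
        offset = kaslr_offset(f.read())
    print("unknown" if offset is None else f"KASLR offset: {offset:#x}")
```

An offset of zero indicates the kernel is running at its default virtual base, which is what nokaslr produces.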
Crash dump with vmlinux:
# kdump uses vmlinux (uncompressed) for symbol resolution
# Install kernel debug info:
dnf install kernel-debuginfo-$(uname -r)
# Analyze crash dump:
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
/var/crash/127.0.0.1-<timestamp>/vmcore
Debugging Notes
Decompression fails silently: If the compressed kernel payload is truncated or corrupted, the decompressor may output partial data or jump to invalid code. The only indication is a system reset or hang immediately after "Decompressing Linux..." message. Use early_printk to see the message; verify bzImage checksum with sha256sum.
KASLR debugging: Add nokaslr to the kernel command line to disable kernel KASLR (both physical and virtual). norandmaps is a separate parameter that disables userspace address randomization only. With KASLR disabled, /proc/kallsyms addresses match the ELF addresses in vmlinux.
Boot hangs between GRUB and kernel messages: Often indicates a problem in the setup code or early decompressor. Try earlyprintk=serial,ttyS0,115200 to get output from the setup code. Also try adding noefi if booting via EFI stub.
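Large kernel won't fit in MBR gap (BIOS mode): This symptom is often misattributed to the kernel. The decompressor and piggy payload do need contiguous memory at load time, but that is unrelated to the MBR gap, which holds GRUB's core.img. If the MBR gap (sectors 1–2047 = ~1MB) is too small for core.img, you'll see GRUB install errors. This is a GRUB core.img size issue, not a kernel decompression issue.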
Security Implications
KASLR entropy quality: KASLR is only as strong as its entropy source. On VMs without RDRAND passthrough, KASLR may have weak entropy. kaslr_seed in the boot params (settable by UEFI firmware or GRUB) supplements this.
bzImage + initrd integrity: GRUB2 does not verify bzImage or initrd integrity by default. Under Secure Boot, the kernel is signed and verified, but initrd is not (as of current shim/GRUB2 implementations). An attacker with ESP write access can replace initrd. Full-chain integrity requires UKIs (kernel + initrd + cmdline as single signed unit).
Decompressor attack surface: The decompressor runs with full machine privileges before the kernel's security infrastructure exists. Vulnerabilities in the decompressor code (e.g., in unzstd, gunzip) would be catastrophic. The surface is small but historically understudied.
Kernel load address exposure: On systems without KASLR (embedded, certain VMs), _text is at a fixed address (0xffffffff81000000). This makes kernel ROP gadget addresses trivially predictable, enabling reliable kernel exploits.
Performance Implications
Decompression time by algorithm (rough benchmarks on modern hardware):
- lz4: 200–400ms for a 20MB compressed image (roughly 50–100MB/s of compressed input)
- zstd (level 1): 300–500ms, smaller input (better I/O, similar decomp time)
- gzip: 400–700ms
- xz: 1000–2000ms
Total time from bootloader to first kernel printk:
- NVMe system with lz4: 400–600ms
- NVMe system with xz: 800–1200ms
- Network boot with xz: potentially 5–10s (smaller image = faster transfer, slow decomp)
Boot time optimization: Distributions targeting fast boot (e.g., automotive Linux, mobile) use lz4 or even an uncompressed Image, with CONFIG_KERNEL_UNCOMPRESSED=y on architectures that offer that option. The overhead of decompression is often worth eliminating on fast flash storage.
Failure Modes
Decompression output overlaps input: The decompressor checks that the decompression output region does not overlap the input (compressed) region. If KASLR chooses an address that overlaps, it chooses again. If no non-overlapping slot exists, the kernel falls back to a known-safe address.
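The overlap test itself is ordinary interval arithmetic. A minimal sketch follows, with hypothetical function names and a caller-supplied list of candidate addresses standing in for the KASLR slot list.

```python
def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if half-open ranges [a_start, a_start+a_size) and [b_start, b_start+b_size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def first_safe_address(candidates, output_size: int, input_start: int, input_size: int, fallback: int):
    """Return the first candidate whose output range avoids the compressed input.

    Falls back to a caller-supplied known-safe address when every candidate collides.
    """
    for address in candidates:
        if not overlaps(address, output_size, input_start, input_size):
            return address
    return fallback
```

Ranges that merely touch (one ends exactly where the other begins) are not treated as overlapping, which matches the half-open convention used for memory regions.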
vmlinux parse failure (parse_elf): If the ELF header of the decompressed kernel is corrupted, parse_elf() will fail. This manifests as a hang or triple fault. Indicates either decompressor bug or memory corruption during decompression.
Relocation failure: On kernels built without CONFIG_RELOCATABLE, the kernel must be loaded at its linked address. If KASLR is requested but the kernel is not relocatable, KASLR is silently disabled.
Modern Usage
The bzImage format remains the primary kernel image format for x86 Linux. Key modern developments:
EFI Stub + Unified Kernel Images (UKI): Instead of GRUB loading bzImage, the EFI stub allows the UEFI firmware to load bzImage directly as an EFI application. UKIs go further: they bundle the kernel, initramfs, and cmdline as sections of a single PE binary, which is signed as a unit for Secure Boot.
Zstd as default: Linux 5.9+ distributions increasingly default to CONFIG_KERNEL_ZSTD for the best modern balance of size and speed.
ARM64 Image: On ARM64, Image is typically uncompressed (the bootloader or firmware handles compression). The ARM64 kernel does not use the bzImage decompressor scheme — the kernel itself jumps directly to its entry point after being placed in memory by the bootloader.
Future Directions
- Post-decompression hardening: Zeroing compressed payload in memory after decompression (prevents cold-boot attacks on kernel code), now done automatically
- KASLR with hardware entropy: Platform requirements for RDRAND are increasing; cloud firmware provides strong entropy in boot params
- Measured decompression: Integrating TPM measurement of the kernel image into the decompressor (measure vmlinux before executing it) for full measured boot coverage
- Rust in decompressor: As Rust infrastructure in the kernel grows, the decompressor (safety-critical, no_std code) is a candidate for Rust rewrite for memory safety
Exercises
- bzImage Header Parsing: Write a Python script that opens /boot/vmlinuz-$(uname -r) and parses the Linux boot protocol header. Print: setup_sects, boot_protocol version, kernel version string (at offset kernel_version + 0x200), loadflags, kernel_alignment, and init_size. Verify against file /boot/vmlinuz-*.
- Compression Algorithm Comparison: Build the kernel three times with CONFIG_KERNEL_GZIP, CONFIG_KERNEL_LZ4, and CONFIG_KERNEL_ZSTD. Record: bzImage size, decompression time (measured with earlyprintk timestamps), and first kernel printk time. Plot the size/speed tradeoff.
- KASLR Entropy Source Inspection: Check if your CPU supports RDRAND (grep rdrand /proc/cpuinfo). Boot with earlyprintk=serial,ttyS0,115200 and kaslr (explicit). Observe the entropy source selection messages from choose_random_location(). Compare with a VM that lacks RDRAND.
- vmlinux Segment Analysis: Using readelf -l vmlinux (the uncompressed kernel from a debug package), list all PT_LOAD segments. Calculate the total memory footprint. Identify the text, data, BSS, and percpu segments. Understand why bzImage init_size must be larger than the sum of these segments.
- piggy.o Extraction: From a bzImage, extract the compressed payload using the offset calculation (setup_sects determines where the protected-mode code starts). Decompress with the appropriate algorithm. Use readelf -h to verify it's a valid ELF file. Confirm by checking the ELF magic \x7fELF at the start.
References
- Linux kernel Documentation/x86/boot.rst: x86 Linux boot protocol
- arch/x86/boot/: setup code source
- arch/x86/boot/compressed/: decompressor source (head_64.S, misc.c, kaslr.c)
- arch/x86/boot/compressed/Makefile: build system for bzImage construction
- "Linux Kernel Development" by Robert Love, chapter on the kernel build system
- kernel.org booting guide: https://www.kernel.org/doc/html/latest/x86/boot.html
- KASLR: https://lwn.net/Articles/569635/
- "How the kernel boots" — LWN.net series
- vmlinux-to-elf tool: https://github.com/marin-m/vmlinux-to-elf
- Intel SDM Vol 3, Chapter 9 — Processor Management and Initialization (protected mode entry)