05 — Xen Hypervisor

Prerequisites

  • Virtualization fundamentals: hypervisor types, Popek-Goldberg, trap-and-emulate
  • Linux kernel: process scheduling, memory management, interrupt handling
  • x86 privilege rings: Ring 0 (kernel), Ring 3 (user), Ring -1 (hypervisor in VT-x)
  • Paravirtualization concept: guest kernel modification, hypercalls

Historical Context

Xen was developed at the University of Cambridge Computer Laboratory by Ian Pratt, Keir Fraser, and colleagues. The landmark paper "Xen and the Art of Virtualization" appeared at SOSP 2003 and introduced paravirtualization as a practical approach to x86 virtualization before Intel VT-x hardware was available.

The core insight was pragmatic: rather than attempting full software emulation of sensitive x86 instructions (as VMware did with binary translation), Xen required minimal changes to the guest OS — replacing ~3,000 lines in the Linux kernel. The result was near-native performance: overhead of 2–5% vs native for compute-bound workloads, compared to VMware's 20–30% overhead at the time.

Xen was the hypervisor that launched cloud computing. Amazon Web Services launched EC2 in 2006 using Xen. Rackspace, Linode, and virtually every early cloud provider ran Xen. The Linux kernel gained native Xen PV support in 2.6.23 (2007), meaning unmodified Linux distributions could boot as Xen guests without custom kernels.

AWS began transitioning from Xen to its custom Nitro hypervisor (KVM-based) in 2017, completing the transition for new instance types by 2019. However, Xen remains deployed on AWS's legacy instance families and continues to be actively maintained as an open-source project.


Xen Architecture Overview

Xen is a true Type 1 (bare-metal) hypervisor. It runs directly on hardware and manages multiple guest operating systems called domains (Dom):

Xen Architecture:

+-----------+     +----------+     +----------+
|  Dom0     |     | DomU 1   |     | DomU 2   |
|  (Linux)  |     | (Linux)  |     | (Windows)|
|  Privileged|    | Unprivilg|     | Unprivilg|
|  Management|   | Guest    |     | Guest    |
+-----------+     +----------+     +----------+
|  Xen Tools|     |  Ring 1  |     |  Ring 1  |
|  (xen-api)|     |  (guest  |     |  (guest  |
|  xenstore |     |   kernel)|     |   kernel)|
+-----------+     +----------+     +----------+
       |                |                |
+------+----------------+----------------+------+
|                   XEN HYPERVISOR               |
|           (Ring -1 / Privilege Level 0)        |
|   Domain scheduler | Memory mgmt | Event ch    |
+------------------------------------------------+
|                   HARDWARE                     |
+------------------------------------------------+

Dom0 (Domain Zero)

Dom0 is the privileged management domain — always the first domain started at boot. It runs a standard Linux kernel with Xen-specific drivers. Dom0 has:

  • Direct hardware access for most devices (it runs the backend drivers for storage and networking)
  • Dom builder privilege: can create, destroy, pause, and migrate DomU instances
  • xenstore: a key-value store used by all domains for configuration and discovery
  • Xen toolstack: xl (built on libxl) for VM management; the older xend/xm toolstack was removed in Xen 4.5

Dom0 is a critical trust boundary — a compromise of Dom0 compromises all DomU guests. This "fat Dom0" design was recognized as a security weakness, leading to the concept of driver domains and stub domains to minimize Dom0 privileges.

DomU (Domain Unprivileged)

Guest VMs. They have no direct hardware access. I/O is mediated through the split driver model via Dom0 (or a dedicated driver domain).


Xen PV — Paravirtualization

In Xen PV mode, the guest kernel is modified to:

  1. Run at Ring 1 instead of Ring 0 on classic x86, because Xen occupies Ring 0
  2. Replace privileged instructions with explicit hypercalls
  3. Use Xen-provided mechanisms for memory management, interrupts, and time

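On x86-64, which dropped the segment-limit protection that made Ring 1 usable, Xen PV runs the guest kernel in Ring 3 alongside guest user space and keeps the two apart with separate page tables. PV guests do not use VT-x at all; every privileged operation goes through the hypercall ABI (no hardware emulation path).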

Hypercalls

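Hypercalls are the guest→hypervisor ABI. On x86, the guest calls into the hypercall page: a 4 KB page that Xen fills with stubs at an address the guest chooses (a PV kernel names it in an ELF note; an HVM guest registers it by writing an MSR). Each hypercall has a 32-byte slot in this page containing the instruction appropriate to the guest type: int 0x82 for 32-bit PV, SYSCALL for 64-bit PV, and VMCALL (Intel) or VMMCALL (AMD) for HVM and PVH guests.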

Key hypercalls:

Hypercall                      Number  Description
HYPERVISOR_set_trap_table      0       Register guest trap handlers (virtual IDT) with Xen
HYPERVISOR_mmu_update          1       Update page table entries (batch)
HYPERVISOR_set_callbacks       4       Register event/failsafe handlers
HYPERVISOR_memory_op           12      Memory balloon, populate physmap
HYPERVISOR_update_va_mapping   14      Update single VA mapping
HYPERVISOR_xen_version         17      Query Xen version
HYPERVISOR_console_io          18      PV console I/O
HYPERVISOR_grant_table_op      20      Grant table operations
HYPERVISOR_sched_op            29      Yield, block, shutdown
HYPERVISOR_event_channel_op    32      Event channel operations
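
As a rough illustration of how a guest finds a hypercall stub, the sketch below computes the byte offset of a stub inside the hypercall page, assuming the 32-byte slot size of Xen's x86 ABI. The function and dictionary names are invented for this example; only the hypercall numbers are taken from the table above.

```python
PAGE_SIZE = 4096   # one hypercall page
SLOT_SIZE = 32     # bytes reserved per hypercall stub (x86 ABI)

# A few hypercall numbers from the table above.
HYPERCALL_NUMBERS = {
    "mmu_update": 1,
    "xen_version": 17,
    "sched_op": 29,
    "event_channel_op": 32,
}


def hypercall_stub_offset(name):
    """Return the byte offset of the stub for `name` within the hypercall page."""
    number = HYPERCALL_NUMBERS[name]
    offset = number * SLOT_SIZE
    if offset + SLOT_SIZE > PAGE_SIZE:
        raise ValueError(f"hypercall {number} does not fit in one page")
    return offset


if __name__ == "__main__":
    for name in HYPERCALL_NUMBERS:
        print(f"{name:18s} -> page offset {hypercall_stub_offset(name):#06x}")
```

A guest kernel would add this offset to the address of its hypercall page and call the resulting address, with arguments passed in registers.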

Shared Info Page

Xen maps a shared_info page into every domain's address space. This page contains:

  • Wallclock time and system time (updated by Xen, read by the guest without a hypercall)
  • Per-vCPU info: a vcpu_info array holding each vCPU's upcall-pending flag, upcall mask, pending-selector word, and time info
  • Domain-wide info: the event channel pending and mask bitmaps

This avoids hypercalls for time queries — a critical optimization since time is read very frequently.
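
A minimal sketch of how a guest might read system time from the shared page without a hypercall is shown below. It assumes the versioned time record Xen publishes per vCPU (an odd version means an update is in progress) and the usual TSC scaling formula. The class name, the two callables, and the values in the usage example are hypothetical stand-ins for real shared-memory and TSC reads.

```python
from dataclasses import dataclass


@dataclass
class VcpuTimeInfo:
    version: int            # odd while Xen is updating the record
    tsc_timestamp: int      # TSC value at the last update
    system_time: int        # nanoseconds since boot at the last update
    tsc_to_system_mul: int  # 32.32 fixed-point multiplier
    tsc_shift: int          # pre-multiplication shift (may be negative)


def scale_delta(delta, mul, shift):
    """Convert a TSC delta to nanoseconds using the published scale factors."""
    if shift >= 0:
        delta <<= shift
    else:
        delta >>= -shift
    return (delta * mul) >> 32


def read_system_time(read_info, read_tsc):
    """Retry until a consistent snapshot is read (seqlock-style protocol)."""
    while True:
        before = read_info()
        if before.version & 1:
            continue  # update in progress, try again
        now = before.system_time + scale_delta(
            read_tsc() - before.tsc_timestamp,
            before.tsc_to_system_mul,
            before.tsc_shift,
        )
        if read_info().version == before.version:
            return now


if __name__ == "__main__":
    info = VcpuTimeInfo(4, 1000, 5000, 2**31, 0)  # 0.5 ns per TSC tick
    print(read_system_time(lambda: info, lambda: 3000))
```

The version check is what lets the guest read several fields consistently while Xen may be rewriting them on another CPU.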


Xen HVM — Hardware Virtual Machine

With HVM mode (enabled by Intel VT-x / AMD-V), Xen can run unmodified guest OSes (Windows, unmodified Linux):

  • Guest runs in VMX non-root mode at Ring 0
  • Sensitive instructions cause VMEXIT to Xen
  • QEMU runs as a device model process in Dom0 for legacy device emulation (BIOS, IDE, NIC)
  • Xen PV drivers can be installed in HVM guests ("PVHVM") for better I/O performance; VirtIO devices provided by QEMU are an alternative

HVM adds ~5–10% overhead vs PV for CPU-bound workloads due to VMEXIT costs, but enables running any OS without modification.


Xen PVH — The Best of Both

PVH (PV in an HVM container) first appeared as an experimental feature in Xen 4.4 (2014):

  • Uses the PV boot protocol (no legacy BIOS/MBR, direct kernel loading)
  • Runs with HVM execution semantics (hardware virtualization extensions, hardware-assisted paging with EPT or NPT)
  • No QEMU device model required for basic boot
  • Modern Linux and FreeBSD support PVH boot natively

PVH is now the recommended mode for new Linux guests: minimal overhead, no QEMU attack surface, full hardware support.


Event Channels

Xen event channels are the lightweight inter-domain notification mechanism, replacing hardware interrupts for most guest activity:

Event Channel Types:
1. Interdomain channels:  Dom0 ←→ DomU  (split driver notification)
2. Physical IRQ bindings: Hardware IRQ → Dom0 event
3. Virtual IRQ bindings:  Xen timer → DomU event
4. IPI channels:          vCPU ←→ vCPU  (same or different domain)

Event Channel Mechanics

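In the original two-level ABI, each domain has an event pending bitmap in its shared_info page: an array of machine words, 32 × 32 bits for 32-bit guests or 64 × 64 bits for 64-bit guests. Each vCPU's vcpu_info structure holds a selector word with one bit per bitmap word, plus an upcall-pending flag. When Xen wants to notify a domain: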

  1. Xen sets the corresponding bit in the pending bitmap
  2. Xen sets the evtchn_upcall_pending flag in vcpu_info
  3. If the vCPU is blocked (waiting), Xen schedules it

The guest scans the pending bitmap in its event callback (registered via HYPERVISOR_set_callbacks). This callback fires like an interrupt, but it is delivered via a dedicated software path rather than through x86 IDT emulation.

Benefits:

  • No VMEXIT required to deliver an event to a running PV domain (the bit is set in shared memory)
  • Batching: multiple events can be pending simultaneously, reducing notification overhead
  • Scalable: up to 4096 event channel ports per 64-bit domain (1024 for 32-bit) with the two-level ABI; the FIFO ABI added in Xen 4.4 raises the limit to 131,072
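
The notification steps above can be modelled in a few lines. The sketch below is a toy, single-threaded model that assumes a two-level layout for a 64-bit guest (a selector word plus 64 bitmap words of 64 bits, giving 4096 ports). The class and method names are invented for illustration, and real guests use atomic bit operations rather than plain integers.

```python
BITS_PER_WORD = 64  # assumed 64-bit guest: 64 words x 64 bits = 4096 ports


class EventChannels:
    def __init__(self):
        self.pending = [0] * BITS_PER_WORD  # one bit per port
        self.mask = [0] * BITS_PER_WORD     # set bit = port masked by guest
        self.pending_sel = 0                # one bit per word of `pending`
        self.upcall_pending = False

    def send(self, port):
        """Hypervisor side: mark a port pending and request an upcall."""
        word, bit = divmod(port, BITS_PER_WORD)
        self.pending[word] |= 1 << bit
        if not (self.mask[word] >> bit) & 1:
            self.pending_sel |= 1 << word
            self.upcall_pending = True

    def handle_upcall(self):
        """Guest side: scan selected words and return the ports to service."""
        self.upcall_pending = False
        selector, self.pending_sel = self.pending_sel, 0
        delivered = []
        for word in range(BITS_PER_WORD):
            if not (selector >> word) & 1:
                continue
            ready = self.pending[word] & ~self.mask[word]
            self.pending[word] &= ~ready  # acknowledge what we deliver
            for bit in range(BITS_PER_WORD):
                if (ready >> bit) & 1:
                    delivered.append(word * BITS_PER_WORD + bit)
        return delivered


if __name__ == "__main__":
    ec = EventChannels()
    for port in (3, 130, 3):
        ec.send(port)
    print(ec.handle_upcall())
```

Sending port 3 twice before the guest runs produces a single delivery, which is the batching effect described above.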


Xen Grant Tables

Grant tables enable zero-copy memory sharing between domains — the fundamental mechanism behind the split driver model:

Grant Table: DomU shares page with Dom0

  DomU grant table:               Dom0 receives:
  +------------------+            +------------------+
  | grant_ref 42:    |            | Domain 5,        |
  |   domid=Dom0     |   -------> | grant_ref 42     |
  |   frame=0x3a000  |            | → maps 0x3a000   |
  |   flags=RW       |            |   into Dom0 VA   |
  +------------------+            +------------------+

Grant reference: a numeric ID (uint32_t) that DomU gives to Dom0. Dom0 calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ...) to map the shared page into its own virtual address space. The page is now accessible from both domains simultaneously with no data copying.

This is how blkfront/blkback achieves zero-copy disk I/O:

  1. DomU allocates a page for the I/O buffer
  2. DomU grants Dom0 access to that page and passes it the grant reference
  3. The Dom0 backend reads or writes DomU's page directly through the grant mapping
  4. Disk DMA targets the guest page directly (Dom0's IOMMU mapping points to the same host physical address)
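
The access checks behind a grant mapping can be sketched as follows. This is a simplified model, not Xen's implementation: it assumes version 1 style entries (flags, domid, frame) and two flag bits, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

GTF_PERMIT_ACCESS = 1  # entry is a valid access grant
GTF_READONLY = 4       # grantee may only read the frame


@dataclass
class GrantEntry:
    flags: int
    domid: int  # the only domain allowed to map this grant
    frame: int  # frame number being shared


class GrantTable:
    def __init__(self, owner_domid):
        self.owner_domid = owner_domid
        self.entries = {}  # grant_ref -> GrantEntry

    def grant_access(self, ref, domid, frame, readonly=False):
        """Granting domain: publish an entry for `domid`."""
        flags = GTF_PERMIT_ACCESS | (GTF_READONLY if readonly else 0)
        self.entries[ref] = GrantEntry(flags, domid, frame)

    def map_grant(self, ref, caller_domid, want_write):
        """Hypervisor-side checks performed when the grantee maps a grant."""
        entry = self.entries.get(ref)
        if entry is None or not entry.flags & GTF_PERMIT_ACCESS:
            raise PermissionError("no such grant")
        if entry.domid != caller_domid:
            raise PermissionError("grant belongs to another domain")
        if want_write and entry.flags & GTF_READONLY:
            raise PermissionError("grant is read-only")
        return entry.frame


if __name__ == "__main__":
    domu = GrantTable(owner_domid=5)
    domu.grant_access(42, domid=0, frame=0x3A000)
    print(hex(domu.map_grant(42, caller_domid=0, want_write=True)))
```

The example mirrors the diagram: domain 5 grants reference 42 to Dom0, and only Dom0 can turn that reference into a mapping of frame 0x3a000.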


Xen Split Driver Model

Every I/O device in Xen is split into a frontend driver (in the guest DomU) and a backend driver (in Dom0 or a driver domain):

Xen Split Driver Model (netfront/netback example):

+---------------+          +--------------------+
|    DomU       |          |     Dom0           |
|               |          |                    |
|  Application  |          |  Physical NIC      |
|      |        |          |       ^            |
|  Guest TCP/IP |          |  e1000 driver      |
|      |        |          |       ^            |
|  netfront     |  xenbus  |  netback           |
|  (I/O ring    | -------> |  (reads shared     |
|  via shared   |          |   ring, calls      |
|  grant pages) |          |   dev_queue_xmit)  |
+---------------+          +--------------------+
        |                           |
        +------- Event channel -----+
                  (notification)

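xenbus / xenstore: Dom0 and DomU communicate device configuration through xenstore. netback publishes its state under /local/domain/0/backend/vif/<domid>/<devid>/ (for a backend in Dom0). netfront finds its backend by reading the backend key under its own device directory, /local/domain/<domid>/device/vif/<devid>/.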

Ring buffers (shared page protocol): netfront and netback share ring pages (via the grant table), one ring for each direction. The rings carry:

  • netif_tx_request/netif_tx_response for transmit
  • netif_rx_request/netif_rx_response for receive

Each request grants the backend access to another page (the actual packet buffer), achieving zero-copy for large packets.
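
To make the ring protocol concrete, here is a toy model of a shared request/response ring with free-running producer and consumer indices. It assumes the common Xen layout in which responses reuse the slots of consumed requests. The class, method, and field names are illustrative only, and memory barriers and event channel notifications are omitted.

```python
class SharedRing:
    def __init__(self, size=8):
        if size & (size - 1):
            raise ValueError("ring size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # written by frontend
        self.req_cons = 0  # private to backend
        self.rsp_prod = 0  # written by backend
        self.rsp_cons = 0  # private to frontend

    def _slot(self, index):
        return index & (self.size - 1)

    def push_request(self, request):
        """Frontend: queue a request if a slot is free."""
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self._slot(self.req_prod)] = request
        self.req_prod += 1

    def process_requests(self, handler):
        """Backend: consume requests and write responses into freed slots."""
        while self.req_cons != self.req_prod:
            request = self.slots[self._slot(self.req_cons)]
            self.req_cons += 1
            self.slots[self._slot(self.rsp_prod)] = handler(request)
            self.rsp_prod += 1

    def pop_responses(self):
        """Frontend: collect completed responses."""
        responses = []
        while self.rsp_cons != self.rsp_prod:
            responses.append(self.slots[self._slot(self.rsp_cons)])
            self.rsp_cons += 1
        return responses


if __name__ == "__main__":
    ring = SharedRing(size=4)
    ring.push_request({"gref": 1, "size": 100})
    ring.push_request({"gref": 2, "size": 200})
    ring.process_requests(lambda r: {"id": r["gref"], "status": 0})
    print(ring.pop_responses())
```

Each request here carries a hypothetical gref field standing in for the grant reference of the packet buffer, which is how the ring stays small while the payload lives in separately granted pages.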


AWS EC2 and Xen History

AWS launched the EC2 beta in August 2006 using Xen with a Linux-based Dom0. Early EC2 instance types (m1.small through m2.4xlarge) all ran on Xen, initially as PV guests and later also as HVM guests with PV drivers.

Key events:

  • 2006: EC2 beta (Xen)
  • 2008: EC2 general availability; Windows HVM support added
  • 2013: C3 instances: first "enhanced networking" using SR-IOV (bypassing Xen for networking)
  • 2016: C5 instances announced
  • 2017: C5 GA: first instances on the Nitro hypervisor (KVM-based), with networking and storage offloaded to Nitro hardware cards
  • 2019: All new instance types use Nitro; Xen still used for legacy t2, m3, c3 families

The Nitro migration was driven by performance: Xen's Dom0 consumed CPU resources for I/O, limiting tenant VM CPU availability. Nitro offloads I/O to dedicated hardware, leaving nearly all of the host CPU for tenants.


Xen Scheduler

Xen includes multiple CPU schedulers:

  • Credit scheduler (Xen 3.0): default for years. vCPUs earn credits based on their weight; credit runs out → vCPU preempted. Does not provide strong latency guarantees.
  • Credit2 scheduler (default since Xen 4.12): run queue per socket, work-stealing, better NUMA awareness, lower jitter.
  • RTDS (Real-Time Deferrable Server): for latency-sensitive domains; allows specifying period and budget.
  • ARINC 653 (Xen 4.5): for safety-critical embedded/avionics, fixed time-slicing.
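
The weight-to-credit idea behind the Credit scheduler can be sketched as a small simulation. This is a deliberately simplified model, not Xen's algorithm: the constants (300 credits per accounting period, 10 credits debited per tick) and all function names are assumptions chosen for illustration, and the real scheduler round-robins within priority classes instead of always picking the largest balance.

```python
def replenish(credits, weights, credits_per_period=300):
    """Hand out one period's credits in proportion to each domain's weight."""
    total_weight = sum(weights.values())
    for dom, weight in weights.items():
        credits[dom] = credits.get(dom, 0) + credits_per_period * weight // total_weight


def pick_next(credits):
    """Prefer domains that still have credit; fall back to any domain."""
    under = [dom for dom, credit in credits.items() if credit > 0]
    pool = under or list(credits)
    return max(pool, key=lambda dom: credits[dom])


def simulate(weights, ticks, credits_per_period=300, cost_per_tick=10):
    """Run one accounting period and count how many ticks each domain gets."""
    credits = {}
    replenish(credits, weights, credits_per_period)
    runs = {dom: 0 for dom in weights}
    for _ in range(ticks):
        dom = pick_next(credits)
        credits[dom] -= cost_per_tick  # running burns credit
        runs[dom] += 1
    return runs


if __name__ == "__main__":
    print(simulate({"a": 256, "b": 512}, ticks=30))
```

With weights 256 and 512, the second domain receives twice the credits and therefore twice the CPU ticks over the period.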

Security Implications

  • Dom0 compromise = all VMs compromised: Dom0 has full hardware access and can read/write any guest's memory. AWS mitigated this with Nitro by removing Dom0 from the I/O path and replacing it with a minimal management microVM.
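  • Inter-domain communication bugs: grant table vulnerabilities could allow a domain to access memory beyond what was granted to it, and guests can trigger bugs in backend grant handling. CVE-2021-28688 (XSA-371), for example, is a persistent-grant leak in the Linux blkback driver that a guest can use to exhaust backend resources.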
  • Event channel resource exhaustion: a malicious DomU flooding Dom0 with events can cause a DoS. Mitigated by per-domain event channel limits.
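  • XSA (Xen Security Advisory) database: Xen maintains a rigorous CVE process. Major XSAs: XSA-108 (x2APIC MSR emulation bug leaking hypervisor memory to HVM guests), XSA-133 (VENOM, the QEMU floppy controller overflow), XSA-155 (double fetches from shared memory in PV backend drivers).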
  • Spectre/Meltdown in Xen: Xen required extensive patching for all speculative execution side channels. XPTI (Xen Page Table Isolation) was implemented analogously to KPTI in Linux.
  • PV privilege escalation: in PV mode, the guest kernel runs deprivileged (Ring 1 on 32-bit, Ring 3 on 64-bit) but can still call hypercalls directly, so a guest kernel exploit reaches the hypercall interface. Xen must validate all hypercall arguments carefully (type validation, range checking of frame numbers).

Performance Implications

Mode    CPU overhead   I/O overhead          Memory overhead   Requires modification
PV      ~2-5%          ~1-3%                 Low               Yes (guest kernel)
HVM     ~5-10%         ~15-25% (emulated)    Low               No
PVHVM   ~5-10%         ~2-5% (PV drivers)    Low               No (PV drivers as modules)
PVH     ~2-5%          ~2-5% (PV drivers)    Low               Minimal (boot protocol)

Debugging Notes

# Dom0: list all domains
xl list
#   Name        ID   Mem VCPUs    State    Time(s)
#   Domain-0     0  8192     8     r-----  38400.2
#   ubuntu-vm    1  2048     2     -b----    342.8

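# Dom0: query domain info (xl info reports on the host, not a single domain)
xl list -l ubuntu-vm
xenstore-ls /local/domain/1

# Dom0: check the vif frontend state in xenstore (4 = connected)
xenstore-read /local/domain/1/device/vif/0/state

# Dom0: dump event channel (e) and grant table (g) usage to the Xen console ring
xl debug-keys e
xl debug-keys g
xl dmesg | tail -n 50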

# DomU: verify Xen paravirt is active
dmesg | grep -i xen
cat /sys/hypervisor/type
systemd-detect-virt   # → xen

# Dom0: console output
xl console ubuntu-vm

# Xen debug via serial console
# Add 'loglvl=all guest_loglvl=all' to Xen bootloader args
# Connect to serial: minicom -D /dev/ttyS0

Failure Modes

  • Dom0 OOM: Dom0 kernel runs out of memory (backend buffers + management overhead). All DomU I/O halts. Monitor Dom0 memory carefully; fix Dom0's allocation with dom0_mem=4096M,max:4096M on the Xen command line so it is not ballooned down.
  • Event channel storm: DomU receives bursts of events faster than its vCPU can process them. Xen event callback fires continuously, consuming 100% vCPU. Mitigated by event masking in the driver.
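  • Grant table exhaustion: the number of grant entries is limited per domain. Workloads with thousands of outstanding I/Os can exhaust grant slots. The limit is expressed in grant frames (64 by default in recent Xen releases); raise it with gnttab_max_frames on the Xen command line or max_grant_frames in the domain's xl configuration.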
  • xenstore corruption: xenstore database corruption (due to Dom0 crash or write errors) can prevent new DomU creation and break existing inter-domain communication. xenstore is a single point of failure.
  • Xen Hypervisor assertion / crash: rare but catastrophic — all guests lose state simultaneously. Xen kdump via kexec can capture a crash dump for post-mortem analysis.
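
For the grant exhaustion case above, a back-of-the-envelope calculation shows how a frame limit translates into usable grant references. The sketch assumes 4 KB grant frames and entry sizes of 8 bytes (version 1) and 16 bytes (version 2); the function name is invented for this example and the frame counts are arbitrary inputs, not defaults.

```python
FRAME_SIZE = 4096                 # bytes per grant table frame
ENTRY_SIZE = {1: 8, 2: 16}        # assumed bytes per grant entry, by ABI version


def grant_capacity(frames, version=1):
    """Number of grant references that fit in `frames` grant table frames."""
    return frames * FRAME_SIZE // ENTRY_SIZE[version]


if __name__ == "__main__":
    for frames in (32, 64, 128):
        print(frames, "frames:", grant_capacity(frames), "v1 entries,",
              grant_capacity(frames, version=2), "v2 entries")
```

Comparing this capacity with the number of in-flight I/O pages across all of a guest's virtual disks and NICs gives a quick estimate of whether the limit needs raising.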

Modern Usage and Future Directions

Qubes OS: uses Xen as the foundation for a security-focused desktop OS. Each application category (personal, work, banking) runs in a separate DomU. Dom0 is minimal; networking and USB are isolated in dedicated VMs. Xen's isolation is the core security property.

OpenXT (derived from Citrix's XenClient XT): an open-source Xen-based client hypervisor for enterprise laptops. Runs Windows and Linux simultaneously, each in separate domains, with hardware-accelerated GPU sharing.

Unikraft on Xen: Unikraft unikernels can boot directly as Xen PVH domains, achieving 200μs boot times with 1–2 MB memory footprint. Used for ephemeral function invocations.

Xen on ARM: Xen has supported ARM64 since Xen 4.4. Used in automotive (AUTOSAR hypervisor profile), aerospace, and mobile (some Android automotive implementations).


Exercises

  1. Install Xen on a bare-metal Ubuntu host. Boot a DomU using xl create. Observe the split driver model by examining xenstore-ls /local/domain/<id>/device/.
  2. Measure network throughput inside a Xen PV guest vs a Xen HVM guest (same hardware). Explain the difference in terms of driver paths.
  3. Examine the Xen netfront/netback source code (drivers/net/xen-netfront.c, drivers/net/xen-netback/). Trace a single TX packet from the guest application call to the physical NIC interrupt.
  4. Write a xenstore watch in Python: monitor a DomU's xenstore:/local/domain/<id>/cpu/0/availability and log when vCPU availability changes.
  5. Research XSA-155 (CVE-2015-8550, double fetches from shared memory in PV backend drivers). Identify which components were vulnerable, the exploit conditions, and the fix.

References

  • Barham, P. et al. (2003). "Xen and the Art of Virtualization." SOSP 2003.
  • Fraser, K. et al. (2004). "Safe Hardware Access with the Xen Virtual Machine Monitor." OASIS Workshop 2004.
  • Xen Project. Xen Hypervisor Documentation. https://xenbits.xen.org/docs/
  • Clark, C. et al. (2005). "Live Migration of Virtual Machines." NSDI 2005. (Uses Xen as platform)
  • Pratt, I. et al. (2005). "Xen 3.0 and the Art of Virtualization." Linux Symposium 2005.
  • Nikolaev, S. & Back, G. (2011). "Perfctr-Xen: a Framework for Performance Counter Virtualization." VEE 2011.
  • AWS re:Invent 2017. "A Closer Look at the AWS Nitro System." (Transition from Xen).