Real-Time Kernels
Technical Overview
A real-time operating system (RTOS) guarantees that computational tasks will complete within defined time bounds. The key property is not speed but determinism — the ability to bound the worst-case response time to external events. A system that responds in 1ms on average but occasionally spikes to 100ms is not real-time; a system that always responds within 5ms (even if the average is 4ms) is.
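To make the worst-case criterion concrete, the C sketch below summarizes measured response times and judges them against a bound using only the maximum. Helper names are hypothetical and latencies are in microseconds. A system whose responses are 3, 4 and 5 ms (mean 4 ms) passes a 5 ms bound. A system with a single 100 ms spike fails it, however low its mean is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Summary of measured response times, in microseconds.
struct latency_summary {
    uint32_t min_us;
    uint32_t max_us;
    double avg_us;
};

// Compute min, mean and max over n > 0 samples.
static struct latency_summary summarize(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    struct latency_summary s = { samples[0], samples[0], 0.0 };
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min_us) s.min_us = samples[i];
        if (samples[i] > s.max_us) s.max_us = samples[i];
        sum += samples[i];
    }
    s.avg_us = (double)sum / (double)n;
    return s;
}

// Only the worst case counts: a low mean cannot hide one late response.
static int meets_bound(const uint32_t *samples, size_t n, uint32_t bound_us)
{
    return summarize(samples, n).max_us <= bound_us;
}
```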
Real-time requirements span a spectrum from hard real-time (missing a deadline causes system failure — fly-by-wire, airbag controllers, pacemakers) to soft real-time (degraded quality but not failure — media streaming, gaming, audio processing).
Prerequisites
- Preemptive scheduling fundamentals
- Interrupt handling and interrupt latency
- Priority inversion and priority inheritance
- Cache effects on latency
- Basic understanding of lock types (spinlocks, mutexes)
Core Concepts
Hard vs. Soft Real-Time
Real-Time Spectrum
==================
                    | HARD REAL-TIME         | SOFT REAL-TIME         | GENERAL PURPOSE
--------------------|------------------------|------------------------|--------------------
Missed deadline     | Catastrophic failure   | Degraded service       | No deadline
                    |                        | (tolerable)            | guarantee
Examples            | Fly-by-wire control    | Video streaming        | Web server
                    | Airbag controller      | Audio playback         | Database
                    | Pacemaker              | Gaming                 | Desktop apps
                    | Nuclear rod control    | VoIP                   | Batch jobs
                    | ABS brakes             | Industrial HMI         |
                    | Mars Rover servo       |                        |
Typical deadline    | 1µs - 10ms             |                        |
Worst-case response | MUST be bounded        |                        |
RTOS Requirements
For a kernel to support hard real-time applications, it must provide:
- Preemptible kernel: The kernel itself must be preemptible at almost any point, not just at explicit yield points
- Bounded interrupt latency: Time from interrupt arrival to ISR execution must be bounded
- Priority inheritance: Mutexes must implement priority inheritance to prevent priority inversion
- Bounded lock hold times: No lock may be held for unbounded duration
- Deterministic memory allocation: Dynamic allocation in real-time paths must have bounded latency (or be avoided entirely)
- No unbounded deferrals: Real-time paths must not wait on RCU grace periods or on deferred work whose latency is unbounded
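Deterministic allocation is usually achieved by reserving memory up front. The sketch below is a hypothetical fixed-block pool. Its names and sizes are illustrative, and it is not thread-safe: a real RTOS pool would guard it with a lock or critical section. Allocation and free are each a single free-list pop or push, so both take constant time. When the pool is exhausted, allocation returns NULL instead of blocking or searching.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical fixed-block pool: alloc and free are O(1), so their
// worst-case latency is bounded, unlike a general-purpose malloc().
#define POOL_BLOCKS     8
#define POOL_BLOCK_SIZE 64

union pool_block {
    union pool_block *next;                  // link while the block is free
    unsigned char payload[POOL_BLOCK_SIZE];  // user data while allocated
};

struct pool {
    union pool_block blocks[POOL_BLOCKS];    // storage reserved up front
    union pool_block *free_list;
};

// Build the free list once, at startup (outside any real-time path).
static void pool_init(struct pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

// Pop one block: constant time; NULL when exhausted, never blocks.
static void *pool_alloc(struct pool *p)
{
    union pool_block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

// Push a block back: constant time.
static void pool_free(struct pool *p, void *ptr)
{
    union pool_block *b = ptr;
    assert(b >= p->blocks && b < p->blocks + POOL_BLOCKS);
    b->next = p->free_list;
    p->free_list = b;
}
```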
Preemption Models in Linux
Linux supports multiple preemption configurations, trading latency for throughput:
Linux Preemption Modes
=======================
CONFIG_PREEMPT_NONE (Server/HPC default)
- Kernel code runs until it returns to user space, blocks, or explicitly calls schedule()
- No forced preemption within the kernel
- Best throughput, worst latency
- Typical worst-case latency: 100ms - 10s (upper end only in rare cases)
CONFIG_PREEMPT_VOLUNTARY (Desktop)
- Adds explicit preemption points: might_sleep() calls also reschedule if a higher-priority task is waiting
- Reduced latency, slight throughput impact
- Typical worst-case latency: 10ms - 100ms
CONFIG_PREEMPT (Low-Latency Desktop)
- Kernel preemptible except while holding spinlocks, with interrupts disabled, or in other preempt_disable() regions
- Most kernel code can be preempted
- Typical worst-case latency: 100µs - 10ms
- Suitable for soft real-time (audio, desktop responsiveness)
CONFIG_PREEMPT_RT (PREEMPT_RT patch set, merged into mainline in 6.12)
- Fully preemptible kernel
- Most spinlocks (spinlock_t, rwlock_t) become sleeping locks built on rt_mutex; raw_spinlock_t still spins
- Most IRQ handlers run as kernel threads with real-time priority
- Most previously non-preemptible sections become preemptible (raw spinlock and low-level entry code excepted)
- Typical worst-case latency: 20µs - 200µs (hardware dependent)
- Hard real-time capable on appropriate hardware
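One way to check which of these models a running kernel was built with is to inspect the uname(2) version string. The sketch below assumes the build embeds PREEMPT, PREEMPT_DYNAMIC or PREEMPT_RT in that string, as recent mainline builds do. PREEMPT_NONE and PREEMPT_VOLUNTARY leave no tag, so this method cannot tell them apart. With PREEMPT_DYNAMIC, the effective mode is chosen at boot. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

enum preempt_model { PM_UNKNOWN, PM_NONE_OR_VOLUNTARY, PM_FULL, PM_DYNAMIC, PM_RT };

// Classify a kernel build string such as "#1 SMP PREEMPT_RT ...".
// Assumption: the build tags the version with PREEMPT, PREEMPT_DYNAMIC or
// PREEMPT_RT (older out-of-tree RT kernels printed "PREEMPT RT"), while
// PREEMPT_NONE and PREEMPT_VOLUNTARY add no tag.
static enum preempt_model classify_preempt(const char *version)
{
    assert(version != NULL);
    if (strstr(version, "PREEMPT_RT") || strstr(version, "PREEMPT RT"))
        return PM_RT;        // test the longer tags before plain "PREEMPT"
    if (strstr(version, "PREEMPT_DYNAMIC"))
        return PM_DYNAMIC;   // actual mode is selected at boot (preempt=)
    if (strstr(version, "PREEMPT"))
        return PM_FULL;
    return PM_NONE_OR_VOLUNTARY;
}

// Classify the running kernel using uname(2).
static enum preempt_model running_kernel_model(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return PM_UNKNOWN;
    return classify_preempt(u.version);
}
```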
PREEMPT_RT: Converting Linux to Hard Real-Time
The PREEMPT_RT patch set (Thomas Gleixner, Ingo Molnár, Steven Rostedt, and others) transforms Linux into a hard real-time kernel through several key changes:
1. Threaded IRQ Handlers
Conventional IRQ handling runs in a non-preemptible context (hard IRQ context). PREEMPT_RT converts interrupt handlers to run as real-time kernel threads:
Traditional IRQ handling:
Hardware interrupt fires
→ CPU vectors to interrupt handler (non-preemptible)
→ Handler runs to completion
→ Return from interrupt
[During handler: nothing can preempt it, not even an RT thread]
PREEMPT_RT threaded IRQs:
Hardware interrupt fires
→ Minimal hardirq handler (just acknowledges IRQ)
→ Wakes IRQ thread (kernel thread with RT priority)
→ Scheduler runs: may preempt lower-priority work
→ IRQ thread executes handler
[IRQ thread can be preempted by higher-priority RT thread]
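The split between a minimal top half and a schedulable thread can be mimicked in user space. The analogy below is not kernel code, and all names in it are hypothetical. A signal handler plays the hardirq stub and only posts a semaphore, while a separate pthread does the actual handling. Because the handling runs in an ordinary thread, the scheduler decides when it runs. Threaded IRQs give PREEMPT_RT the same property.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

// User-space analogy only: SIGUSR1 stands in for the hardware interrupt,
// the signal handler for the minimal hardirq stub, and a pthread for the
// IRQ thread. A real IRQ thread would also run with SCHED_FIFO priority.
static sem_t irq_pending;
static atomic_int events_handled;

// "Hardirq" part: do the bare minimum and wake the thread.
static void hardirq_stub(int sig)
{
    (void)sig;
    sem_post(&irq_pending);   // async-signal-safe
}

// "IRQ thread" part: sleeps until woken, then does the real handling.
static void *irq_thread(void *arg)
{
    int expected = *(const int *)arg;
    for (int i = 0; i < expected; i++) {
        while (sem_wait(&irq_pending) != 0 && errno == EINTR)
            ;                 // retry if interrupted by a signal
        atomic_fetch_add(&events_handled, 1);   // handler body goes here
    }
    return NULL;
}

// Raise n simulated interrupts; return how many were handled (-1 on setup error).
static int simulate_threaded_irqs(int n)
{
    struct sigaction sa;
    pthread_t tid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = hardirq_stub;
    sigemptyset(&sa.sa_mask);
    if (sem_init(&irq_pending, 0, 0) != 0 || sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    atomic_store(&events_handled, 0);

    if (pthread_create(&tid, NULL, irq_thread, &n) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);       // the handler runs before raise() returns
    pthread_join(tid, NULL);
    sem_destroy(&irq_pending);
    return atomic_load(&events_handled);
}
```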
2. PI Mutexes (Priority Inheritance Mutexes)
Classic priority inversion scenario:
Priority Inversion Problem
===========================
Time →
T=0: Task L (low prio) acquires mutex X
T=1: Task H (high prio) tries to acquire X → blocks
T=2: Task M (medium prio) preempts L (M doesn't need X)
[M runs, L is preempted, H waits for L, which can't run]
[H is effectively held at L's priority: INVERSION]
Priority Inheritance Solution:
T=0: Task L acquires mutex X (prio = low)
T=1: Task H blocks on X → kernel boosts L's priority to H's level
T=2: Task M becomes runnable but cannot preempt L (L now at H's priority)
T=3: L releases X (priority reverts to low)
T=4: H acquires X, continues
// In-kernel rt_mutex usage (priority-inheriting mutex, simplified)
#include <linux/rtmutex.h>
static DEFINE_RT_MUTEX(my_lock);
void rt_critical_section(void) {
    rt_mutex_lock(&my_lock);    // may block; a higher-priority waiter boosts the owner
    // ... critical section ...
    rt_mutex_unlock(&my_lock);  // release; a boosted owner drops back to its own priority
}
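The user-space counterpart is a POSIX mutex with the PTHREAD_PRIO_INHERIT protocol. On Linux with glibc, such mutexes are implemented with PI futexes, which the kernel backs with rt_mutex. The minimal initializer below (hypothetical helper name) assumes the platform supports the POSIX priority-inheritance option.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

// Initialize *m as a priority-inheritance mutex. Returns 0 or an errno value.
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   // the mutex keeps its own copy
    return rc;
}
```

The mutex is then used with pthread_mutex_lock() and pthread_mutex_unlock() as usual. The boosting only becomes visible when the contending threads run under SCHED_FIFO or SCHED_RR.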
3. Preemptible RCU
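In non-preemptible kernels, RCU (Read-Copy-Update) read-side critical sections run with preemption disabled, so a long reader adds directly to scheduling latency. PREEMPT_RT uses preemptible RCU, whose readers can be preempted. SRCU (Sleepable RCU) is a separate variant for readers that need to block. RCU callback processing can also be offloaded to kernel threads (rcu_nocbs=) so it stays off isolated real-time CPUs.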
Latency Comparison Across RTOS Options
RTOS Latency Comparison (approximate, hardware-dependent)
===========================================================
System | Worst-Case | Typical | Notes
| Latency | Latency |
----------------------|---------------|--------------|------------------
PREEMPT_RT Linux | 20-200 µs | 10-50 µs | Hardware quality matters
(x86-64, isolated) | | |
PREEMPT_RT Linux | 50-500 µs | 20-100 µs | SMI/BIOS interference
(x86-64, production)| | |
QNX Neutrino 7.x | 1-10 µs | 2-5 µs | POSIX RTOS, x86/ARM
VxWorks 7.x | 1-10 µs | 2-5 µs | Avionics certified
FreeRTOS (ARM M4) | 0.5-5 µs | 1-2 µs | Bare metal, no MMU
Zephyr (ARM M33) | 1-10 µs | 2-5 µs | Embedded RTOS
RTEMS (ARM/x86) | 1-20 µs | 5-10 µs | Space/avionics
Windows (non-RT) | 10ms-1s | 1-10 ms | No RT guarantees
Linux (PREEMPT only) | 1ms-10s | 0.5-2 ms | Soft RT only
Notes on measurement:
- Hardware SMI (System Management Interrupts) can add 100µs-10ms
of unpreemptible latency on x86 — major PREEMPT_RT challenge
- Hyperthreading, CPU frequency scaling, NUMA effects all add jitter
- ARM SoCs often have simpler interrupt topology → lower jitter
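- Linux numbers are cyclictest measurements; cyclictest does not run on most of the other systems, so their figures come from comparable latency benchmarks and are not strictly like-for-like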
VxWorks (Wind River Systems)
VxWorks is the dominant commercial RTOS for aerospace and defense:
- Certification: DO-178C (avionics software) level A/B, ARINC 653 for partitioned avionics
- Architecture: Monolithic kernel with a real-time scheduler, POSIX APIs
- Scheduler: Priority-based preemptive with round-robin at each priority level
- IPC: Message queues, shared memory, semaphores, pipes — all with deterministic latency
- Memory model: Flat kernel address space, with optional MMU-protected user-mode Real-Time Processes (RTPs)
Deployments:
- Mars Rovers (Spirit, Opportunity, Curiosity): VxWorks is the flight computer OS
- Boeing 787 flight management system
- F-35 Joint Strike Fighter mission computer
- NASA Mars Pathfinder (the famous priority inversion incident; see below)
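// VxWorks task creation (native taskLib API)
#include <vxWorks.h>
#include <taskLib.h>
#include <sysLib.h>

extern void process_sensor_data(void);   // hypothetical application routine

void myRealtimeTask(void) {
    // taskDelay() counts system clock ticks. With a typical default tick
    // rate of 60 Hz, sysClkRateGet() / 1000 truncates to 0, and
    // taskDelay(0) merely yields. Round up to at least one tick; a true
    // 1ms delay needs a 1 kHz tick (e.g. sysClkRateSet(1000)).
    int ticks = (sysClkRateGet() + 999) / 1000;
    while (1) {
        // Do work with bounded execution time
        process_sensor_data();
        // Relative delay: the period is work time plus delay, so it drifts
        taskDelay(ticks);
    }
}

TASK_ID startRealtimeTask(void) {
    return taskSpawn(
        "tRealtime",                 // name
        50,                          // priority (0=highest, 255=lowest in VxWorks)
        0,                           // options
        4096,                        // stack size in bytes
        (FUNCPTR)myRealtimeTask,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // ten task arguments (unused here)
    );
}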
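Because taskDelay() waits a relative number of ticks, the loop's real period is the execution time of process_sensor_data() plus the delay, and that error accumulates. On a POSIX system, a common drift-free pattern sleeps until an absolute release time instead. The sketch below uses hypothetical helper names; CLOCK_MONOTONIC and clock_nanosleep() with TIMER_ABSTIME are standard POSIX. It also counts overruns, meaning cycles in which the work ran past the next release.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

// Advance an absolute time by ns nanoseconds, keeping tv_nsec normalized.
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

static int timespec_after(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec > b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_nsec > b->tv_nsec);
}

// Call work() (may be NULL) once per period. Each release time is derived
// from the previous release, not from when work() finished, so execution
// time does not accumulate as drift. Returns the number of overruns.
static int run_periodic(int cycles, long period_ns, void (*work)(void))
{
    struct timespec next, now;
    int overruns = 0;

    assert(cycles >= 0 && period_ns > 0);
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        if (work != NULL)
            work();
        timespec_add_ns(&next, period_ns);
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (timespec_after(&now, &next))
            overruns++;       // this cycle's work ran past the next release
        // Sleep until the absolute release; a time already passed returns at once.
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
    }
    return overruns;
}
```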
QNX Neutrino
QNX is covered in detail in 02-microkernels.md. Key RTOS properties:
- Microkernel: The QNX kernel is genuinely minimal — all drivers and services run as user-space processes with real-time priorities
- Adaptive partitioning: CPU time can be partitioned into guaranteed budgets (critical automotive control gets guaranteed 30% CPU, infotainment gets the rest)
- POSIX certified: Highly conformant POSIX implementation enables application portability
FreeRTOS
FreeRTOS (under Amazon's stewardship since 2017) is the dominant embedded RTOS for microcontrollers (no MMU, often <512KB RAM):
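// FreeRTOS task and queue example
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

// Hypothetical board-support routines (not part of FreeRTOS)
extern int32_t read_adc_channel(int channel);
extern void apply_control_output(int32_t value);

static QueueHandle_t xQueue;

void vSensorTask(void *pvParameters) {
    (void)pvParameters;
    int32_t reading;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for (;;) {
        reading = read_adc_channel(0);
        // Copy the reading into the queue; block up to 10ms if it is full
        if (xQueueSend(xQueue, &reading, pdMS_TO_TICKS(10)) != pdPASS) {
            // queue stayed full: the consumer is falling behind
        }
        // Fixed 100ms period measured from the previous wake time (no drift);
        // requires INCLUDE_vTaskDelayUntil = 1 in FreeRTOSConfig.h
        vTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(100));
    }
}

void vControlTask(void *pvParameters) {
    (void)pvParameters;
    int32_t reading;
    for (;;) {
        // Block until data arrives
        if (xQueueReceive(xQueue, &reading, portMAX_DELAY) == pdPASS) {
            apply_control_output(reading);
        }
    }
}

int main(void) {
    xQueue = xQueueCreate(10, sizeof(int32_t));   // 10 items, copied by value
    configASSERT(xQueue != NULL);
    // Stack depth is in words, not bytes; a larger number is a higher priority
    xTaskCreate(vSensorTask, "Sensor", 256, NULL, 2, NULL);
    xTaskCreate(vControlTask, "Control", 256, NULL, 3, NULL);
    vTaskStartScheduler();   // returns only if the scheduler could not start
    for (;;) { }
}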
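FreeRTOS queues copy each item into storage reserved by xQueueCreate(), which is why vSensorTask can reuse its local reading immediately after xQueueSend(). The hosted sketch below models only that copy-by-value ring buffer. Its names are hypothetical, and it is single-threaded, with none of the blocking, timeouts or locking that the real kernel adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Q_CAPACITY  10
#define Q_ITEM_SIZE sizeof(int32_t)

// Storage is fixed when the queue is created; items are stored by value.
struct byval_queue {
    unsigned char storage[Q_CAPACITY * Q_ITEM_SIZE];
    size_t head;   // index of the oldest item
    size_t count;  // number of items currently queued
};

// Append a copy of *item; false when full (FreeRTOS would block or time out).
static bool q_send(struct byval_queue *q, const void *item)
{
    if (q->count == Q_CAPACITY)
        return false;
    size_t tail = (q->head + q->count) % Q_CAPACITY;
    memcpy(&q->storage[tail * Q_ITEM_SIZE], item, Q_ITEM_SIZE);
    q->count++;
    return true;
}

// Copy the oldest item into *out; false when empty.
static bool q_receive(struct byval_queue *q, void *out)
{
    if (q->count == 0)
        return false;
    memcpy(out, &q->storage[q->head * Q_ITEM_SIZE], Q_ITEM_SIZE);
    q->head = (q->head + 1) % Q_CAPACITY;
    q->count--;
    return true;
}
```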
FreeRTOS is deployed on Arduino, ESP32, STM32, and essentially every MCU platform. It runs on a large share of IoT edge devices.
Zephyr RTOS (Linux Foundation)
Zephyr (2016) is designed for resource-constrained connected devices with modern features:
- Formal threat modeling in design
- Hardware security abstractions
- Bluetooth, Wi-Fi, Thread/Zigbee stacks built in
- Supports POSIX-subset API
RTEMS (Real-Time Executive for Multiprocessor Systems)
RTEMS is an open-source RTOS with a strong space systems heritage:
- Space heritage: Parker Solar Probe, OSIRIS-REx, Mars Science Laboratory ground systems
- Standards: POSIX-subset, ARINC 653, DO-178
- Architecture: Runs on SMP systems; supports partitioned scheduling
- Open source: Unlike VxWorks, RTEMS is entirely open source, which matters for long-lived space missions
RT Benchmarking Tools
# cyclictest: measures scheduling latency
# Standard tool for PREEMPT_RT qualification
# Basic measurement (run as root, set RT priority):
cyclictest -p 80 -m -t 1 -i 500 -D 60
# Options:
# -p 80 : run at RT priority 80
# -m : lock memory (mlockall, avoid paging)
# -t 1 : 1 thread
# -i 500 : 500µs interval between tests
# -D 60 : run for 60 seconds
# -h 400 : (optional) print a latency histogram up to 400µs to stdout after the run
# Typical output on PREEMPT_RT system:
# T: 0 (12345) P:80 I:500 C:120000 Min: 3 Act: 15 Avg: 12 Max: 48
# Min = best case, Act = latest sample, Avg = mean, Max = worst case
# (all latencies in µs)
# hwlatdetect: detect hardware (SMI) latency spikes
# Runs a tight polling loop, measures gaps > threshold
hwlatdetect --duration=120 --threshold=20
# Output:
# hwlatdetect: sample window width(us): 1000000
# hwlatdetect: sample window size(us): 950000
# hwlatdetect: thresh(us): 20
# hwlatdetect: number of samples: 120
# Samples exceeding threshold:
# 2024-01-15T10:23:45: inner=112 outer=114
#   ^ 112µs gap not caused by the OS (most likely an SMI)
# rt-tests suite: comprehensive RT characterization
# Install your distribution's rt-tests package or build from source
# hackbench: stress scheduler
hackbench -p -T -l 1000
# pip_stress: priority inheritance stress test
pip_stress
# ssdd: ptrace single-step stress (tracer/tracee pairs)
ssdd --forks=4
# Isolate a CPU for RT testing:
# Add to kernel cmdline: isolcpus=3 nohz_full=3 rcu_nocbs=3
taskset -c 3 cyclictest -p 80 -t 1 -i 200 -D 300 -m
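What cyclictest measures can be reduced to a short loop: sleep until an absolute wakeup time, read the clock on wakeup, and record how late the task actually woke. The C sketch below (hypothetical names) shows only that core idea. Unlike the invocations above, it does not set an RT priority (-p), lock memory (-m), or pin itself to an isolated CPU. Results under the default SCHED_OTHER policy will therefore be much noisier.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static long long ts_to_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * NSEC_PER_SEC + t->tv_nsec;
}

// Core idea behind cyclictest: sleep until an absolute wakeup time, read the
// clock on wakeup, and record how late we were. Returns the worst lateness
// over `loops` iterations, in microseconds.
static long long max_wakeup_latency_us(int loops, long interval_us)
{
    struct timespec next, now;
    long long worst_ns = 0;

    assert(loops > 0 && interval_us > 0);
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        next.tv_nsec += interval_us * 1000L;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long late_ns = ts_to_ns(&now) - ts_to_ns(&next);  // the sleep never ends early
        if (late_ns > worst_ns)
            worst_ns = late_ns;
    }
    return worst_ns / 1000;
}
```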
Historical Context
NASA Mars Pathfinder Priority Inversion (1997)
The most famous real-time incident in computing history. The Mars Pathfinder lander (not the Sojourner rover) ran VxWorks. During the surface mission, the spacecraft suffered repeated total system resets: a periodic check found that a critical high-priority task had not finished its cycle in time, declared an error, and reset the computer.
Root cause: classic priority inversion
- A low-priority task L held a mutex protecting a shared-memory information bus
- A high-priority task H needed the mutex and blocked
- Medium-priority tasks M, which did not need the mutex, preempted L
- H stayed blocked while M ran: priority inversion
- The higher-priority scheduling check saw that H had not completed, treated it as a critical failure, and reset the system
VxWorks supported priority inheritance, but it was disabled on that mutex. The fix was uploaded from Earth: a patch enabled priority inheritance on the mutex, and the resets stopped.
This incident became a canonical teaching example for priority inheritance in OS courses and textbooks.
DO-178C Certification
DO-178C (Software Considerations in Airborne Systems and Equipment Certification) defines rigorous software development standards for aviation. Level A (catastrophic failure consequence) requires:
- Full MC/DC (Modified Condition/Decision Coverage) testing
- Formal requirements traceability
- Structural coverage analysis at source and object code level
- Tool qualification for any tools used in development
VxWorks 653 (for ARINC 653 partitioned avionics) and QNX carry DO-178C Level A qualifications — a multi-year, multi-million dollar certification effort.
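MC/DC goes beyond branch coverage, which only requires each decision to evaluate both true and false. Under MC/DC, every individual condition must be shown to change the outcome independently. For a decision with N conditions this is often achievable with N + 1 test vectors. The illustration below uses a hypothetical three-condition decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical decision with three conditions.
static bool decision(bool a, bool b, bool c)
{
    return a && (b || c);
}

// One MC/DC-adequate set of N + 1 = 4 vectors. Each condition has a pair of
// vectors that differ only in that condition and flip the outcome:
//   a: {T,T,F} -> true   vs  {F,T,F} -> false
//   b: {T,T,F} -> true   vs  {T,F,F} -> false
//   c: {T,F,F} -> false  vs  {T,F,T} -> true
static const struct { bool a, b, c, expected; } mcdc_vectors[] = {
    { true,  true,  false, true  },
    { false, true,  false, false },
    { true,  false, false, false },
    { true,  false, true,  true  },
};
```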
Production Examples
Tesla Autopilot
Tesla's Autopilot system runs a hard real-time component for actuator control (steering, braking, acceleration) alongside a soft real-time component for perception and planning. The hard RT component uses a deterministic loop at ~100Hz, tolerating no more than ~5ms jitter for safe operation.
Medical Device: Infusion Pump
A drug infusion pump must deliver precise doses at programmed rates. A timing error that delivers medication 30% early could cause patient harm. These devices run RTOSes (typically VxWorks, QNX, or INTEGRITY) with bounded interrupt latency to ensure dosing accuracy.
Industrial Robot Control
ABB, Fanuc, and Kuka industrial robots use RTOSes for the servo control loop — typically running at 1-4kHz with latency requirements under 250µs. PREEMPT_RT Linux is increasingly used for the coordination layer while dedicated DSPs handle the lowest-latency servo loops.
Debugging Notes
# PREEMPT_RT latency tracing with ftrace
# Set up tracer:
echo 0 > /sys/kernel/debug/tracing/tracing_on
echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer
echo 0 > /sys/kernel/debug/tracing/tracing_max_latency   # reset the recorded maximum
echo 1 > /sys/kernel/debug/tracing/tracing_on
# Run the real-time workload; the tracer keeps the trace of the longest
# preemption/IRQ-disabled section it observes
# Read the trace:
cat /sys/kernel/debug/tracing/trace | head -50
# Typical output shows the code path that kept preemption/IRQs disabled longest:
# preemptirqsoff latency trace v1.1.5 on 5.15.0-rt17
# latency: 47 us, #4/4, CPU#0 | (M:RT VP:0, KP:0, SP:0 HP:0)
# ...
# <idle>-0 0dNh3 0µs: IRQ disabled in intel_idle
# perf trace for RT analysis
# Capture scheduling events for RT task
perf trace -p $PID -e sched:sched_switch,sched:sched_wakeup
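# Observe: time from a task's sched_wakeup to the sched_switch that runs it
# = scheduling latency. sched_wakeup fires in the waker's context, so
# per-PID capture can miss it; perf sched record followed by
# perf sched latency captures system-wide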
Security Implications
Safety-Critical System Attack Surface
For safety-critical RTOS deployments, security concerns are inseparable from safety:
- DO-326A: Airworthiness security for aviation; required alongside DO-178C safety
- ISO/SAE 21434: Cybersecurity engineering for road vehicles
- IEC 62443: Industrial automation and control systems security
A security breach in a hard real-time safety system may manifest as a safety failure (the attacker disrupts timing guarantees → missed deadline → physical consequence).
QNX Adaptive Partitioning as Security
QNX's adaptive partitioning ensures that security monitoring tasks receive guaranteed CPU budget. Even under denial-of-service attack (CPU exhaustion), the security monitor continues to run.
Failure Modes and Real Incidents
SMI Interference on x86
System Management Interrupts (SMIs) are x86 hardware events that cause the CPU to enter System Management Mode (SMM), running firmware code that's completely invisible to the OS. SMIs can hold the CPU for 100µs to several milliseconds.
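There is no software solution to SMI latency: SMM code runs outside the OS's control, so this is a hardware/firmware issue. For hard real-time on x86, hardware qualification must include SMI measurement. This is one reason embedded RTOS deployments often prefer ARM SoCs, which lack x86-style SMM, although secure-world firmware (TrustZone/EL3 services) can still steal CPU time and should be measured as well.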
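The idea behind hwlatdetect can be approximated in user space by spinning on the clock and recording any gap between consecutive reads. The sketch below (hypothetical names) is only an approximation. The kernel hwlat tracer that hwlatdetect drives samples with interrupts disabled, whereas this loop also sees interrupts and preemption by other tasks. A large gap here therefore does not by itself prove an SMI.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

static long long mono_ns(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (long long)t.tv_sec * 1000000000LL + t.tv_nsec;
}

// Spin for duration_ms reading the clock back to back. Any large gap between
// two consecutive reads is time the CPU spent elsewhere. Returns the largest
// gap in microseconds and stores the number of gaps above threshold_us.
static long long max_gap_us(int duration_ms, long long threshold_us, int *over_threshold)
{
    assert(duration_ms > 0 && over_threshold != NULL);
    long long prev = mono_ns();
    long long end = prev + (long long)duration_ms * 1000000LL;
    long long worst = 0;

    *over_threshold = 0;
    while (prev < end) {
        long long t = mono_ns();
        long long gap = t - prev;
        if (gap > worst)
            worst = gap;
        if (gap > threshold_us * 1000LL)
            (*over_threshold)++;
        prev = t;
    }
    return worst / 1000;
}
```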
Priority Ceiling Protocol Misconfiguration
The priority ceiling protocol (an alternative to priority inheritance) assigns each mutex a "ceiling" equal to the highest priority of any task that might lock it. In its common immediate-ceiling form (POSIX PTHREAD_PRIO_PROTECT, OSEK resources), any thread acquiring the mutex is immediately raised to the ceiling priority.
Incorrect ceiling assignment, such as a ceiling set too low, can reintroduce unexpected priority inversions. Verifying ceiling assignments in complex systems is non-trivial and a known source of certification effort.
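Ceilings are normally derived from a static table recording which tasks may lock which mutex. The sketch below shows that computation. The configuration layout and names are hypothetical, and a larger number means a higher priority, as with SCHED_FIFO.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCKS 8

// Hypothetical static configuration: uses[t][m] is nonzero if task t may lock
// mutex m; prio[t] is task t's priority (larger = higher, as with SCHED_FIFO).
// Each mutex's ceiling is the highest priority of any task that may lock it;
// -1 marks a mutex that no task uses.
static void compute_ceilings(size_t ntasks, const int prio[], size_t nlocks,
                             int uses[][MAX_LOCKS], int ceiling[])
{
    assert(nlocks <= MAX_LOCKS);
    for (size_t m = 0; m < nlocks; m++) {
        ceiling[m] = -1;
        for (size_t t = 0; t < ntasks; t++)
            if (uses[t][m] && prio[t] > ceiling[m])
                ceiling[m] = prio[t];
    }
}
```

On POSIX systems, the resulting value would be applied with pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT) and pthread_mutexattr_setprioceiling().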
Modern Usage
PREEMPT_RT in Linux Mainline: After years as a patch set, PREEMPT_RT was substantially merged into the Linux mainline kernel in version 6.12 (2024). This makes Linux-based hard real-time much more accessible without maintaining out-of-tree patches.
ROS 2 on PREEMPT_RT: The Robot Operating System 2 recommends PREEMPT_RT Linux for robot control applications, bringing modern robotics middleware to real-time Linux.
Automotive AUTOSAR Classic: The AUTOSAR Classic platform (for ECUs) builds on OSEK/VDX, a standardized embedded RTOS API. Most automotive ECUs run some variant of OSEK.
Future Directions
PREEMPT_RT Completion and SMP Refinement: Continued improvements to PREEMPT_RT for x86 SMP, addressing per-CPU data structures and inter-processor interrupt latency.
Time-Sensitive Networking (TSN): IEEE 802.1 TSN standards provide hardware-level determinism in Ethernet. PREEMPT_RT Linux with TSN network drivers can provide end-to-end bounded latency for industrial control over Ethernet.
Rust for RTOS: The community freertos-rust crate provides Rust wrappers for FreeRTOS tasks and primitives. Zephyr is adding Rust support. Rust's memory safety addresses a critical RTOS concern: a buffer overflow in real-time code can corrupt timing data structures with catastrophic results.
Exercises
- cyclictest Baseline: On a Linux system with PREEMPT_RT (or standard Linux for comparison), run cyclictest for 10 minutes at different CPU loads (idle, 50%, 100%). Record min/avg/max latency and the latency histogram. What causes the maximum latency spikes you observe?
- Priority Inversion Demo: Write a three-task POSIX program (using pthreads with SCHED_FIFO) that demonstrates priority inversion: a low-priority task holds a mutex, a medium-priority task preempts it, and a high-priority task blocks waiting for the mutex. Instrument with timestamps. Then initialize the mutex with pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) and observe the change.
- FreeRTOS Task Scheduling: Implement a FreeRTOS application (on ESP32 or simulated via QEMU) with three tasks: a 10ms sensor reading task, a 100ms processing task, and a 1000ms reporting task. Use queue communication. Measure actual task period jitter using GPIO toggling or hardware timers.
- PREEMPT_RT Configuration: Configure and compile a Linux kernel with PREEMPT_RT. Measure the impact on system throughput (iperf3 TCP throughput, disk I/O bandwidth) compared to CONFIG_PREEMPT. Quantify the real-time/throughput tradeoff.
- Mars Pathfinder Simulation: Implement the priority inversion scenario from the Mars Pathfinder incident using POSIX threads. Include a "watchdog" thread that panics if it doesn't run within a deadline. Demonstrate the system hanging, then fix it with priority inheritance. Document the exact timing behavior difference.
References
- Yodaiken, V. "Against Priority Inheritance." FSMLabs Technical Report, 2004. [Critique of the standard solution]
- Sha, L., Rajkumar, R., and Lehoczky, J. "Priority Inheritance Protocols: An Approach to Real-Time Synchronization." IEEE Transactions on Computers, 1990.
- Corbet, J., Rubini, A., Kroah-Hartman, G. Linux Device Drivers, 3rd ed. O'Reilly, 2005.
- Gleixner, T., et al. PREEMPT_RT patchset and documentation: https://rt.wiki.kernel.org/
- Jones, M.T. "Inside the Real-time Linux Kernel." IBM developerWorks, 2008.
- VxWorks documentation: https://docs.windriver.com/
- QNX RTOS documentation: https://www.qnx.com/developers/docs/
- FreeRTOS documentation: https://www.freertos.org/Documentation/
- Zephyr documentation: https://docs.zephyrproject.org/
- Jones, M. "Mars Pathfinder Incident." Inside the Real-time Microkernel, 1997. [Priority inversion incident]
- Cyclictest tool: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest