Embedded Linux

Overview

Embedded Linux applies the mainline Linux kernel and GNU userspace to resource-constrained hardware. It occupies the space between bare-metal/RTOS systems (too limited for complex networking or filesystems) and full desktop Linux (far too resource-heavy). The canonical target runs with 64MB to 1GB of RAM, uses eMMC or NAND flash for storage, and may boot in 1-20 seconds depending on optimization.

Embedded Linux unlocks the full TCP/IP networking stack, POSIX APIs, package management, and a vast ecosystem of open-source software. The cost is a minimum hardware footprint of roughly 32MB RAM and 64MB flash, power consumption in the hundreds of milliwatts range, and a complex toolchain encompassing bootloader, kernel configuration, and root filesystem construction.

Prerequisites

  • Linux system administration fundamentals
  • Embedded systems fundamentals (see 01-embedded-systems-fundamentals.md)
  • Basic understanding of cross-compilation toolchains
  • Familiarity with shell scripting and make-based build systems
  • ARM architecture basics (Cortex-A series)

When Embedded Linux Makes Sense

Choose embedded Linux when the product requires:

  • Full TCP/IP stack (HTTP, MQTT, TLS, WebSocket, IPv6)
  • Persistent filesystem (logging, databases, OTA update packages)
  • POSIX compatibility (porting existing Linux software)
  • Complex display rendering (X11, Wayland, Qt, OpenGL ES)
  • USB host connectivity (webcams, mass storage, USB audio)
  • High-bandwidth data processing (H.264 video, audio DSP, ML inference)
  • Multiple concurrent user-space processes with isolation

Do not choose embedded Linux when:

  • Deterministic sub-millisecond response time is required on a stock kernel (use an RTOS or bare metal, or evaluate PREEMPT_RT)
  • Total system RAM is below ~32MB
  • Boot time under 500ms is mandatory (bare-metal or RTOS boots in milliseconds)
  • Power budget is in the microwatt range

Common products running embedded Linux: Raspberry Pi (BCM2711), Amazon Echo (MediaTek), smart TVs (ARM Cortex-A), home routers (Qualcomm IPQ series, running OpenWRT), industrial HMIs (NXP i.MX6/8), IP cameras (Ambarella SoC), automotive infotainment secondary processors.


Embedded Linux Boot Chain

Power-On
    |
    v
+------------------+
|  ROM Bootloader  |  On-chip ROM code (vendor-specific)
|  (BootROM)       |  Runs from on-chip SRAM; finds and loads SPL
+------------------+
    |
    v  (loads from NAND/eMMC/SD/SPI flash)
+------------------+
|  SPL             |  Secondary Program Loader
|  (U-Boot SPL)    |  Minimal init: PLL, DRAM full init
|  ~32-64KB        |  Loads full U-Boot into DRAM
+------------------+
    |
    v
+------------------+
|  U-Boot          |  Universal Bootloader
|  (~500KB)        |  Network boot, DHCP, TFTP
|                  |  Environment variables (env in flash)
|                  |  Loads kernel + DTB + initrd into DRAM
|                  |  Passes DTB to kernel via register r2 (ARM32)
|                  |         or x0 (ARM64)
+------------------+
    |
    v  (kernel Image/zImage/Image.gz + dtb)
+------------------+
|  Linux Kernel    |  Decompresses, sets up MMU
|  (~5-20MB image) |  Parses Device Tree
|                  |  Initializes subsystems: MM, VFS, net
|                  |  Probes platform devices from DT
|                  |  Mounts root filesystem
|                  |  Spawns init (PID 1)
+------------------+
    |
    v  (rootfs: ext4/squashfs/ubifs)
+------------------+
|  Init (PID 1)    |  systemd or BusyBox init
|                  |  Starts system services
|                  |  Launches application daemons
|                  |  System ready
+------------------+

Historical Context

Linux first ran on embedded hardware in the late 1990s. uClinux (microcontroller Linux) enabled operation on MMU-less processors. By 2001-2002, Lineo and MontaVista were shipping commercial embedded Linux distributions. Linux on the Compaq iPAQ handheld in 2001 was an early example on consumer hardware.

The embedded Linux tooling ecosystem matured significantly around 2007-2010 with OpenEmbedded, Ångström, and eventually the formation of the Yocto Project in 2010 (a Linux Foundation project backed by Intel, ST, and others). The Buildroot project provided a simpler alternative. ARM's dominance of the Cortex-A application processor market standardized toolchains around arm-linux-gnueabihf-gcc and later aarch64-linux-gnu-gcc.


Components in Detail

BusyBox

BusyBox implements over 300 common Unix utilities, called applets (ls, sh, awk, sed, grep, vi, wget, ifconfig, mount, ...), in a single ~1MB binary. Each applet is typically installed as a symlink to the BusyBox binary, which decides what to run from the name it was invoked as. BusyBox init, BusyBox sh, and BusyBox networking tools replace GNU coreutils, bash, and iproute2 in minimal root filesystems.

A minimal BusyBox rootfs with kernel and U-Boot fits in ~12MB flash, boots in ~2 seconds, and provides a functional Linux shell. It is the foundation of many embedded products that add their application daemon on top.
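The single-binary design can be illustrated with a small sketch of multi-call dispatch: the program inspects the name it was invoked as and runs the matching applet. This is a toy model in Python, not BusyBox's implementation; the applet table and function names are hypothetical.

```python
import os

def applet_echo(args):
    """Toy 'echo' applet: join arguments with spaces."""
    return " ".join(args)

def applet_basename(args):
    """Toy 'basename' applet: strip the directory part of a path."""
    return os.path.basename(args[0].rstrip("/"))

# Hypothetical applet table; a real multi-call binary has hundreds of entries.
APPLETS = {"echo": applet_echo, "basename": applet_basename}

def dispatch(argv):
    """Select the applet from the invocation name (argv[0])."""
    name = os.path.basename(argv[0])
    args = argv[1:]
    # Also accept the explicit 'busybox <applet> <args>' form.
    if name == "busybox" and args:
        name, args = args[0], args[1:]
    if name not in APPLETS:
        raise SystemExit(f"{name}: applet not found")
    return APPLETS[name](args)

# A symlink /bin/echo -> busybox would arrive here with argv[0] == "/bin/echo".
result = dispatch(["/bin/echo", "hello", "world"])
```

Because every applet shares one executable, common code (option parsing, libc startup) is stored once, which is where most of the flash savings come from.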

systemd on Embedded

systemd has made inroads on higher-end embedded systems (those with ≥64MB RAM and eMMC storage). It provides:

  • Parallel service startup (faster boot than sequential init.d scripts)
  • Service supervision and automatic restart
  • Journal logging (journald)
  • Socket and D-Bus activation (services start on demand)

The tradeoff: the systemd binary alone is ~1MB+, and its dependencies (libsystemd, udev) add more, so a full installation needs considerably more flash and RAM than BusyBox init. It is a poor fit for flash-constrained targets, where BusyBox init with /etc/inittab remains the default.


U-Boot Deep Dive

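U-Boot (Das U-Boot, the Universal Boot Loader, formerly PPCBoot) is the de facto standard bootloader for embedded Linux. It is not the first code to run: the SoC's BootROM executes first and hands over to U-Boot. U-Boot is written in C and configured per board.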

Two-Stage Boot (SPL)

Most modern SoCs (i.MX6, AM335x, Allwinner A-series) cannot load a full 400KB+ U-Boot into the tiny on-chip SRAM available at power-on. The solution is a two-stage boot:

BootROM -> U-Boot SPL (~32KB, fits in on-chip SRAM) -> U-Boot proper (in DRAM)

SPL initializes the DRAM controller (DDR PHY training), then loads full U-Boot from the boot device.

U-Boot Environment

U-Boot stores configuration in a dedicated flash region. Environment variables control boot behavior:

# Typical U-Boot environment variables
bootcmd=run mmcboot
bootargs=console=ttymxc0,115200 root=/dev/mmcblk0p2 rootwait rw
bootdelay=3
mmcboot=mmc dev 0; fatload mmc 0:1 0x80800000 Image; \
        fatload mmc 0:1 0x83000000 imx8mm-evk.dtb; \
        booti 0x80800000 - 0x83000000

bootargs is passed to the Linux kernel as its command line. booti boots an uncompressed ARM64 Image; its three arguments are the kernel address, the initrd address (- means none), and the DTB address.
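To make the structure of bootargs concrete, here is a minimal sketch that splits a kernel command line into key/value pairs, using the example string from the environment above. It is a simplification: the real kernel parser also handles quoting and repeated parameters, which this version does not (a repeated key simply overwrites the earlier one).

```python
def parse_bootargs(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params

args = parse_bootargs("console=ttymxc0,115200 root=/dev/mmcblk0p2 rootwait rw")

# The console parameter carries the device name and the baud rate.
console_device, _, console_baud = args["console"].partition(",")
```

A helper like this is handy in test scripts that read /proc/cmdline on the target to confirm that the bootloader passed the expected root device and console.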

A/B Redundant Boot

Production embedded devices implement A/B update for resilient OTA:

Flash Layout:
+----------+----------+----------+----------+----------+----------+
| U-Boot   |  env     | Kernel A | Kernel B | RootFS A | RootFS B |
| SPL+full |          | DTB A    | DTB B    |          |          |
+----------+----------+----------+----------+----------+----------+

U-Boot logic:
  - Try slot A; if boot fails N times -> switch to slot B
  - After successful boot: mark slot as "good"

U-Boot's bootcount mechanism and upgrade_available environment variable implement this pattern. Google uses A/B on Android (using U-Boot or proprietary bootloaders). The Raspberry Pi uses a similar mechanism with tryboot on RPi 4.
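The fallback logic can be sketched as a small state machine. This is a simplified Python model, assuming a boot counter that is only incremented while an update is pending and a limit after which the bootloader reverts to the other slot; the class and function names are hypothetical, and real U-Boot expresses the same idea through environment variables and scripts rather than code like this.

```python
from dataclasses import dataclass

@dataclass
class BootEnv:
    """Hypothetical subset of a bootloader environment used for A/B selection."""
    active_slot: str = "A"
    bootcount: int = 0
    bootlimit: int = 3
    upgrade_available: bool = False

def other(slot):
    return "B" if slot == "A" else "A"

def select_slot(env):
    """Bootloader side: count the attempt, fall back once the limit is exceeded."""
    if env.upgrade_available:
        env.bootcount += 1
        if env.bootcount > env.bootlimit:
            # Too many unconfirmed boots: revert to the previous slot.
            env.active_slot = other(env.active_slot)
            env.upgrade_available = False
            env.bootcount = 0
    return env.active_slot

def mark_good(env):
    """Userspace side: called once the new system is confirmed healthy."""
    env.upgrade_available = False
    env.bootcount = 0

# An update was just written to slot B, but it never boots far enough
# for userspace to call mark_good().
env = BootEnv(active_slot="B", upgrade_available=True)
attempts = [select_slot(env) for _ in range(4)]
```

The important design point is that the "good" mark comes from userspace after the application has started, not from the bootloader, so a kernel that boots but an application that crashes still triggers the fallback.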


Device Tree

The Device Tree Specification (derived from Open Firmware / IEEE 1275) describes hardware topology in a hierarchical data structure. Its purpose: decouple hardware description from kernel source code.

Before Device Tree, ARM boards required board-specific C files (arch/arm/mach-xxx/board-yyy.c) describing every peripheral. The result was a flood of small, near-duplicate board files and constant merge churn. In 2011 Linus Torvalds famously complained that "this whole ARM thing is a f***ing pain in the ass". Device Tree moved hardware description out of C and into .dts text files, which are compiled to binary .dtb files.

DTS Anatomy

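/dts-v1/;
/* Illustrative only: a real board file would #include "imx6q.dtsi", which
 * defines the interrupt controller, the &clks provider, and the SoC nodes. */
#include <dt-bindings/interrupt-controller/irq.h>
#include <dt-bindings/clock/imx6qdl-clock.h>

/ {
    model = "My Custom Board based on i.MX6Q";
    compatible = "mycompany,myboard", "fsl,imx6q";
    #address-cells = <1>;
    #size-cells = <1>;

    cpus {
        #address-cells = <1>;
        #size-cells = <0>;

        cpu@0 {
            device_type = "cpu";
            compatible = "arm,cortex-a9";
            reg = <0>;
        };
    };

    memory@10000000 {
        device_type = "memory";
        reg = <0x10000000 0x40000000>; /* 1GB starting at 256MB */
    };

    soc {
        compatible = "simple-bus";  /* children become platform devices */
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;

        uart1: serial@2020000 {
            compatible = "fsl,imx6q-uart", "fsl,imx21-uart";
            reg = <0x2020000 0x4000>;
            interrupts = <0 26 IRQ_TYPE_LEVEL_HIGH>;
            clocks = <&clks IMX6QDL_CLK_UART_IPG>,
                     <&clks IMX6QDL_CLK_UART_SERIAL>;
            clock-names = "ipg", "per";
            status = "okay";
        };

        i2c1: i2c@21a0000 {
            compatible = "fsl,imx6q-i2c", "fsl,imx21-i2c";
            reg = <0x021a0000 0x4000>;
            #address-cells = <1>;   /* child reg is a 7-bit I2C address */
            #size-cells = <0>;
            status = "okay";
            clock-frequency = <400000>;

            temp_sensor: lm75@48 {
                compatible = "national,lm75";
                reg = <0x48>;  /* I2C address */
            };
        };
    };
};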

compatible strings are how the kernel matches DT nodes to kernel drivers. The kernel walks the Device Tree, and for each node finds a matching of_device_id table entry in a platform driver.
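The matching step can be sketched as follows. This is a simplified Python model, assuming that a node's compatible list is ordered from most specific to most generic and that the first string found in any driver's table wins; the driver names and tables are hypothetical stand-ins for of_device_id arrays, and the kernel's actual scoring is more involved.

```python
# Hypothetical driver match tables, mirroring of_device_id arrays.
DRIVERS = {
    "imx-uart": ["fsl,imx6q-uart", "fsl,imx21-uart"],
    "imx-i2c": ["fsl,imx21-i2c"],
    "lm75": ["national,lm75", "ti,tmp75"],
}

def match_driver(node_compatible, drivers=DRIVERS):
    """Return (driver, matched string) for the most specific match, or None."""
    for compat in node_compatible:  # ordered most specific -> most generic
        for driver, table in drivers.items():
            if compat in table:
                return driver, compat
    return None

# The i2c1 node lists a SoC-specific string first and a generic fallback second.
i2c_match = match_driver(["fsl,imx6q-i2c", "fsl,imx21-i2c"])
```

Here no driver claims "fsl,imx6q-i2c", so the node binds through its fallback string. That is the reason device trees list a generic compatible after the specific one: new SoCs keep working with existing drivers.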

DTS Compilation

.dts source -> dtc compiler -> .dtb binary (flattened device tree)

U-Boot loads the .dtb at a known DRAM address and passes that address to the kernel in register x0 (ARM64) or r2 (ARM32). The kernel unflattens the DTB into an in-memory tree and walks it to instantiate platform_device structures for each enabled node.
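A .dtb file begins with a fixed header of ten big-endian 32-bit fields, starting with the magic number 0xd00dfeed, as laid out in the Devicetree Specification. The sketch below parses that header in Python; the function name is hypothetical, and it validates only the size and magic, not the rest of the blob.

```python
import struct

FDT_MAGIC = 0xD00DFEED
# Ten big-endian 32-bit fields make up the flattened device tree header.
FDT_HEADER = struct.Struct(">10I")
FIELDS = ("magic", "totalsize", "off_dt_struct", "off_dt_strings",
          "off_mem_rsvmap", "version", "last_comp_version",
          "boot_cpuid_phys", "size_dt_strings", "size_dt_struct")

def parse_fdt_header(blob):
    """Decode the FDT header from the start of a .dtb blob."""
    if len(blob) < FDT_HEADER.size:
        raise ValueError("blob shorter than FDT header")
    header = dict(zip(FIELDS, FDT_HEADER.unpack_from(blob)))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    return header

# On a development host this could be applied to a real file, for example:
#   with open("board.dtb", "rb") as f:
#       print(parse_fdt_header(f.read()))
```

A quick header check like this is useful in flashing scripts to catch a truncated or mislabeled .dtb before it reaches the target.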

Overlay files (.dtbo) allow dynamic modification at boot time — used by Raspberry Pi's config.txt to enable/disable interfaces without recompiling the base DTB.


Yocto Project

Yocto is not a Linux distribution; it is a build framework for creating custom embedded Linux distributions. It produces: bootloader, kernel, root filesystem image, and optionally an SDK for application development.

Core Concepts

+--------------------------------------------------+
|                   Yocto Build                    |
|                                                  |
|  meta-layer-A   meta-layer-B   meta-bsp-vendor   |
|  (base recipes) (your recipes) (board support)   |
|         |              |              |          |
|         v              v              v          |
|  +--------------------------------------------+  |
|  |           bitbake build engine             |  |
|  |  - Resolves recipe dependencies            |  |
|  |  - Fetches sources (git/tarball/local)     |  |
|  |  - Cross-compiles for target arch          |  |
|  |  - Packages (ipk/rpm/deb)                  |  |
|  |  - Assembles root filesystem image         |  |
|  +--------------------------------------------+  |
|         |                                        |
|         v                                        |
|   tmp/deploy/images/[machine]/                   |
|   - core-image-minimal.wic.gz                    |
|   - u-boot.imx                                   |
|   - Image (kernel)                               |
|   - board.dtb                                    |
+--------------------------------------------------+

Recipes (.bb files): Define how to fetch, configure, compile, and package a software component. A recipe for a custom daemon might look like:

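DESCRIPTION = "My sensor daemon"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://LICENSE;md5=..."

# Recent releases require an explicit protocol for GitHub URLs.
SRC_URI = "git://github.com/mycompany/sensord.git;protocol=https;branch=main"
SRCREV = "abc1234..."
# Git sources unpack here on releases up to scarthgap.
S = "${WORKDIR}/git"

inherit cmake systemd

SYSTEMD_SERVICE:${PN} = "sensord.service"

# Overrides the install step that the cmake class would otherwise provide.
do_install() {
    install -d ${D}${bindir}
    install -m 0755 ${B}/sensord ${D}${bindir}/
    # The systemd class expects the unit named above to be installed.
    # Assumes sensord.service ships in the top of the source tree.
    install -d ${D}${systemd_system_unitdir}
    install -m 0644 ${S}/sensord.service ${D}${systemd_system_unitdir}/
}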

Layers: Recipes are organized into layers, directories conventionally named with a meta- prefix (for example meta-openembedded and meta-freescale). BSP layers provide the board-specific kernel configuration, U-Boot patch sets, and device tree files.

Image recipes: Define the content of the root filesystem. core-image-minimal produces ~6MB squashfs. core-image-full-cmdline adds more utilities. Custom images inherit from base images and add IMAGE_INSTALL += "my-package".

Build reuse and reproducibility: Yocto's shared-state (sstate) cache stores each task's output under a hash of its inputs, so developers and CI servers reuse identical artifacts instead of rebuilding them. Combined with pinned layer and source revisions, this supports the repeatable builds that compliance and audit trails in regulated industries depend on.
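The idea of caching build output by a hash of its inputs can be sketched in a few lines. This is a toy Python model, not bitbake's signature algorithm or on-disk format; the class and function names are hypothetical, and the "build" is simulated by a counter.

```python
import hashlib

def task_signature(recipe_text, dep_signatures=()):
    """Hash a task's inputs: its own metadata plus its dependencies' signatures."""
    h = hashlib.sha256()
    h.update(recipe_text.encode())
    for sig in sorted(dep_signatures):
        h.update(sig.encode())
    return h.hexdigest()

class SstateCache:
    """Toy cache keyed by input signature."""
    def __init__(self):
        self.store = {}
        self.builds = 0

    def build(self, recipe_text, dep_signatures=()):
        sig = task_signature(recipe_text, dep_signatures)
        if sig not in self.store:
            self.builds += 1  # cache miss: run the (simulated) build
            self.store[sig] = f"artifact-{sig[:8]}"
        return sig, self.store[sig]

cache = SstateCache()
libc_sig, _ = cache.build("libc 2.39")
cache.build("libc 2.39")                       # identical inputs: reused
app_sig, _ = cache.build("app v1", [libc_sig])  # depends on libc's signature
```

Because a dependency's signature feeds into the dependent task's hash, changing a low-level component invalidates everything built on top of it, while unrelated components stay cached.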


Buildroot

Buildroot is a simpler, faster alternative to Yocto. Where Yocto models complex layer dependencies and cross-compilation with full recipe metadata, Buildroot uses a flat menuconfig-driven configuration:

make menuconfig       # Configure: arch, packages, bootloader, kernel
make linux-menuconfig # Kernel configuration
make busybox-menuconfig
make                  # Build everything
# Output in output/images/

Buildroot usually builds faster than Yocto (tens of minutes versus hours for a first build), has a gentler learning curve, and is excellent for getting a working system quickly. Its limitations: less flexible for complex custom packages, less support for generating SDKs, and a less active vendor BSP ecosystem.

Choose Buildroot for: research, prototyping, products with simple software stacks. Choose Yocto for: products requiring ongoing maintenance, large teams, complex package dependency management, or when vendor-provided BSP layers exist.


Kernel Configuration for Embedded

The mainline kernel has thousands of configuration options. For embedded targets, start with allnoconfig or tinyconfig and add back what is needed.

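make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- allnoconfig
# allnoconfig: every option that can be disabled is set to "n"
# tinyconfig: allnoconfig plus size-oriented settings; the smallest
#             buildable kernel (a few hundred KB compressed on x86)

# Add required items (keep CROSS_COMPILE so compiler-dependent options resolve):
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- menuconfig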

Key reductions for embedded:

  • Disable loadable modules (CONFIG_MODULES=n) when every required driver can be built in, for example on fixed-function images with a read-only squashfs root
  • Disable filesystems the product does not use (XFS, Btrfs, NFS client)
  • Enable only relevant network protocols
  • Disable SLUB debugging, KASAN, and KMEMLEAK in production images
  • Enable CONFIG_EXPERT=y to access advanced size reductions

The scripts/config utility allows scripting kernel config changes without interactive menuconfig:

scripts/config --enable CONFIG_USB_GADGET
scripts/config --disable CONFIG_SOUND
make olddefconfig  # Fill in unset options with defaults

Cross-Compilation Toolchain

Cross-compilation means compiling on a host architecture (x86-64 Ubuntu) for a target architecture (ARM64). The toolchain prefix encodes the target:

  • arm-linux-gnueabihf- : ARM 32-bit, Linux, GNU libc, hard-float ABI
  • aarch64-linux-gnu- : ARM 64-bit, Linux, GNU libc
  • arm-none-eabi- : ARM, bare-metal (no OS), EABI (used for MCU development)

Toolchains are provided by:

  • Linaro / ARM Developer: Pre-built GCC toolchains for ARM
  • Crosstool-NG: Build custom toolchains from source
  • Yocto SDK: Generated by bitbake -c populate_sdk; includes a sysroot with target libraries for application development


Production Examples

Amazon Echo Dot (3rd gen): MediaTek MT8516 quad-core Cortex-A35. Runs Fire OS, Amazon's Android-derived Linux platform, with the Alexa client as a user-space process. Read-only system partition plus a separate writable data partition. A/B OTA partition scheme.

OpenWRT on Qualcomm IPQ4019: Quad-core Cortex-A7 home router SoC. OpenWRT image of ~10MB on NOR or NAND flash. Kernel + musl libc + BusyBox + OpenWRT's netifd, dnsmasq, hostapd. Configuration stored in UCI (Unified Configuration Interface) flat files on a writable overlay (JFFS2 on NOR flash, UBIFS on NAND).

Raspberry Pi Compute Module 4 (Industrial): BCM2711 quad-core Cortex-A72, up to 8GB RAM, optional eMMC. Buildroot or Yocto used for production; Raspberry Pi OS for development. Videocore VI GPU driver enables OpenGL ES 3.1. Used in industrial HMIs, digital signage, kiosks.

GENIVI / AGL Automotive: NXP i.MX8M or Renesas R-Car SoC. Yocto-based AGL (Automotive Grade Linux) distribution. Wayland/Weston compositor. systemd-based service management. Used by Toyota, Renault, Suzuki in vehicle infotainment.


Debugging Notes

  • Serial console is essential: Always bring up serial UART before anything else. console=ttyS0,115200 on the kernel command line (the device name is platform-specific, e.g. ttymxc0 on i.MX or ttyAMA0 on Raspberry Pi) ensures boot messages are visible even before the display driver initializes.
  • Kernel oops/panic: Enable CONFIG_KALLSYMS=y and CONFIG_KALLSYMS_ALL=y for symbolic stack traces. scripts/decode_stacktrace.sh translates raw addresses to function names with line numbers.
  • Boot argument debugging: initcall_debug logs every kernel initcall with timing. Identifies slow drivers delaying boot. ignore_loglevel ensures all printk messages appear regardless of log level.
  • NFS rootfs for development: Mount root filesystem over NFS during development — eliminates flash reflashing cycle. root=/dev/nfs nfsroot=192.168.1.1:/nfsroot,v3,tcp ip=dhcp on kernel command line.
  • TFTP kernel loading in U-Boot: dhcp; tftp 0x80080000 Image; tftp 0x83000000 board.dtb; booti 0x80080000 - 0x83000000 — load new kernel from network without touching flash.
  • strace on embedded: Cross-compiled strace is invaluable for diagnosing daemon startup failures. If a binary fails silently, strace reveals the missing library, file permission, or device node.
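As a worked example for the initcall_debug item above, the following Python sketch extracts the slowest initcalls from a captured boot log. It assumes the kernel's "initcall <function> returned <code> after <N> usecs" line format; the function name and the sample log are illustrative, not taken from a real board.

```python
import re

# Matches lines such as:
#   initcall mmc_blk_init+0x0/0x90 returned 0 after 412345 usecs
INITCALL_RE = re.compile(r"initcall (\S+) .*?returned (-?\d+) after (\d+) usecs")

def slowest_initcalls(log_text, top=3):
    """Return [(function, microseconds)] sorted slowest first."""
    timings = []
    for line in log_text.splitlines():
        m = INITCALL_RE.search(line)
        if m:
            func = m.group(1).split("+")[0]  # drop the +offset/size suffix
            timings.append((func, int(m.group(3))))
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top]

# Illustrative log excerpt (made-up timings).
SAMPLE_LOG = """\
[    0.512345] initcall pci_driver_init+0x0/0x20 returned 0 after 3 usecs
[    1.204000] calling  mmc_blk_init+0x0/0x90 @ 1
[    1.616345] initcall mmc_blk_init+0x0/0x90 returned 0 after 412345 usecs
[    2.000100] initcall brcmfmac_module_init+0x0/0x1000 [brcmfmac] returned 0 after 98765 usecs
"""

worst = slowest_initcalls(SAMPLE_LOG, top=2)
```

Feeding it the output of dmesg from a boot with initcall_debug gives a short list of drivers worth deferring, building as modules, or probing asynchronously.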

Security Implications

  • Read-only root filesystem: Mount rootfs as squashfs (compressed, read-only). Overlay a writable tmpfs for /tmp and /var, and keep persistent writable data on a separate partition. This prevents malware from persisting in the root filesystem across reboots and reduces flash wear.
  • Secure boot chain: Each stage verifies the next. The SoC BootROM authenticates SPL/U-Boot against a public-key hash fused into OTP efuses (vendor-specific, e.g. HAB on i.MX). U-Boot then verifies the kernel and DTB packaged as a signed FIT image (SHA256 + RSA2048/4096) before executing them. This is enabled via U-Boot's CONFIG_FIT_SIGNATURE=y, with the public key embedded in U-Boot's control device tree.
  • Kernel hardening: Enable CONFIG_SECURITY_YAMA, CONFIG_HARDENED_USERCOPY, CONFIG_FORTIFY_SOURCE, CONFIG_STACKPROTECTOR_STRONG. Disable CONFIG_DEVMEM (disables /dev/mem access). Enable CONFIG_RANDOMIZE_BASE (KASLR) if available for the architecture.
  • SELinux on embedded: Android enforces SELinux. Industrial Linux with stringent security requirements (IEC 62443) increasingly uses SELinux in enforcing mode with domain-specific policies.
  • Default credentials: Default or hardcoded credentials are among the most widely exploited weaknesses in deployed embedded Linux devices (the Mirai botnet spread this way). Require unique per-device credentials provisioned during manufacturing.
  • CVE exposure: Shipping a frozen kernel version means accumulating unpatched CVEs as they are published. Establish a kernel update cadence, and base products on a long-term stable (LTS) kernel (5.15 LTS, 6.1 LTS, 6.6 LTS) that receives backported security fixes.
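The kernel hardening options listed above lend themselves to an automated check in CI. The sketch below, in Python, audits the text of a kernel .config against that list; the function names are hypothetical, and CONFIG_RANDOMIZE_BASE is included on the assumption that the target architecture supports it.

```python
# Options named in the hardening item above.
REQUIRED_ON = [
    "CONFIG_SECURITY_YAMA",
    "CONFIG_HARDENED_USERCOPY",
    "CONFIG_FORTIFY_SOURCE",
    "CONFIG_STACKPROTECTOR_STRONG",
    "CONFIG_RANDOMIZE_BASE",  # assumption: available on the target architecture
]
REQUIRED_OFF = ["CONFIG_DEVMEM"]

def parse_config(text):
    """Map option -> value for set options; '# ... is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options

def audit(text):
    """Return a list of human-readable findings; empty means the config passes."""
    options = parse_config(text)
    findings = [f"{opt} should be enabled"
                for opt in REQUIRED_ON if options.get(opt) != "y"]
    findings += [f"{opt} should be disabled"
                 for opt in REQUIRED_OFF if opt in options]
    return findings

# Illustrative .config fragment.
SAMPLE_CONFIG = """\
CONFIG_SECURITY_YAMA=y
CONFIG_HARDENED_USERCOPY=y
# CONFIG_FORTIFY_SOURCE is not set
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_DEVMEM=y
"""

findings = audit(SAMPLE_CONFIG)
```

Running such a check against the final .config in the build output, rather than against a defconfig, catches options that were silently dropped by dependency resolution.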

Performance Implications

  • Boot time optimization: systemd-analyze blame (for systemd) or manual timing with kernel timestamps identifies bottlenecks. Parallel fsck, reduced kernel initcall overhead, pre-built initramfs, and swapping to squashfs from ext4 all reduce boot time.
  • CPU frequency scaling: cpufreq drivers with ondemand governor dynamically scale CPU frequency. For headless IoT, powersave governor reduces power. For real-time control: performance governor ensures maximum frequency.
  • Memory footprint: smem reports per-process USS, PSS, and RSS. /proc/meminfo shows the system-wide breakdown. To reduce kernel memory: CONFIG_SLUB_TINY, smaller TCP socket buffers (net.ipv4.tcp_rmem / tcp_wmem), and smaller dentry/inode caches.
  • I/O performance on NAND: UBIFS on top of UBI is the correct choice for raw NAND (not ext4). UBI handles bad-block management and wear leveling; UBIFS adds a journaled, power-cut-tolerant filesystem on top. ext4 on MTD (via gluebi + mtdblock) is fragile and slow.
  • tmpfs for writes: Placing /tmp, /var/run, /var/log on tmpfs eliminates flash wear from frequent small writes. Combine with logrotate and remote syslog for persistent logging.
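For the memory footprint item above, a small monitoring helper can read the system-wide numbers directly. The following Python sketch parses /proc/meminfo-style text; the function names and the sample values are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info

def available_fraction(info):
    """Fraction of total RAM the kernel estimates is available to new work."""
    return info["MemAvailable"] / info["MemTotal"]

# Illustrative values for a 128MB-class target.
SAMPLE = """\
MemTotal:         126976 kB
MemFree:           40212 kB
MemAvailable:      63488 kB
"""

info = parse_meminfo(SAMPLE)
fraction = available_fraction(info)

# On a target, the same functions could be applied to the live file:
#   with open("/proc/meminfo") as f:
#       info = parse_meminfo(f.read())
```

MemAvailable is generally a better health signal than MemFree, since it accounts for reclaimable page cache.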

Failure Modes

  • Corrupt flash after power-cut during write: A journaling filesystem (UBIFS, ext4 with journal) keeps the filesystem structure consistent but cannot recover data that was mid-write. The root cause is usually inadequate brownout detection or a writable root; products that allow writes to rootfs remain vulnerable.
  • NAND bad block exhaustion: NAND flash has a finite erase-cycle budget per block (roughly 100K cycles for SLC, and 10K or fewer for MLC/TLC). Wear leveling distributes writes across blocks. Without UBI/UBIFS, managing bad blocks manually is error-prone.
  • Kernel OOM killer: Under memory pressure, the kernel OOM killer terminates processes. Critical daemons should set /proc/<pid>/oom_score_adj to -1000 (exempt from OOM killing). Alternatively, limit all processes with cgroups.
  • Time sync failure: Embedded Linux devices without a hardware RTC boot with epoch time (1970) until NTP sync. SSL certificate validation fails because the system clock predates certificate issuance. Fix: use a hardware RTC + battery backup, or accept a post-NTP-sync initialization for time-sensitive operations.
  • DNS resolution in early boot: The C library resolver, and tools such as BusyBox's nslookup, read nameservers from /etc/resolv.conf. On minimal systems this file is often missing or empty. Hardcode DNS servers in /etc/resolv.conf, or ensure DHCP (or systemd-resolved, where used) populates it before daemons start.
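The OOM-killer mitigation above amounts to a single write into procfs. The Python sketch below shows one way a daemon or supervisor might do it; the function name and the proc_root parameter (added so the code can be exercised off-target) are assumptions. On a real system, lowering the value requires the CAP_SYS_RESOURCE capability.

```python
import os

def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM adjustment for a process and return the path written.

    Valid values are -1000 (never kill) to 1000 (kill first).
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path

# Typical use early in a critical daemon's startup:
#   set_oom_score_adj(-1000)
```

Exempting a process from the OOM killer shifts the risk to other processes, so it is usually reserved for the one or two daemons the product cannot function without.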

Modern Usage

  • Yocto kirkstone / scarthgap LTS: Modern Yocto releases with 4-year LTS support cycles, matching Linux LTS kernel branches.
  • Containers on embedded: Docker and Podman run on embedded Linux (with Cortex-A + sufficient RAM). Balena.io provides Docker-based OTA for embedded Linux. Enables cloud-native deployment patterns on edge devices.
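  • Real-time + Linux (PREEMPT_RT): See Section 35. Combining embedded Linux with PREEMPT_RT (long maintained as an out-of-tree patchset and merged into mainline in Linux 6.12) brings worst-case scheduling latency well below 1ms on suitable hardware while retaining the Linux userspace.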
  • OP-TEE: Open Portable Trusted Execution Environment. ARM TrustZone OS running alongside Linux normal world. Provides secure key storage, cryptographic operations, and DRM. Used in Android's Keystore and on industrial devices requiring HSM functionality.

Future Directions

  • RISC-V embedded Linux: RISC-V application processors (SiFive U74, StarFive JH7110) running Linux. OpenWRT support added. Growing ecosystem of dev boards (BeagleV, VisionFive 2).
  • Rust in the Linux kernel: Rust as a second language for kernel drivers (merged in Linux 6.1). Embedded Linux device drivers increasingly available in Rust, with improved memory safety.
  • Zephyr + Linux heterogeneous: NXP i.MX8M runs Cortex-A53 (Linux) + Cortex-M4 (Zephyr) with RPMsg IPC. This pattern of Linux for connectivity + RTOS for deterministic I/O is growing.
  • OCI-compliant edge runtimes: Kata Containers and Firecracker-based micro-VMs for workload isolation on embedded Linux gateways, bringing cloud security models to the edge.

Exercises

  1. Build a minimal embedded Linux for Raspberry Pi 4 using Buildroot. Target: boot to BusyBox shell in under 5 seconds. Measure total image size and boot time. Strip unnecessary kernel modules to achieve the minimum.
  2. Write a Device Tree overlay (.dtbo) that enables an SPI-connected MCP3204 ADC on a Raspberry Pi. Load it via config.txt. Verify the SPI device node appears and write a user-space program to read it via the /dev/spidev interface.
  3. Configure U-Boot A/B redundant boot for a custom board. Implement the boot selection logic in the bootcmd environment variable. Test by corrupting the kernel in slot A and verifying fallback to slot B.
  4. Profile the boot time of a Yocto image that uses systemd as init (core-image-minimal defaults to SysVinit) using systemd-analyze. Identify the three slowest initcalls using initcall_debug on the kernel command line. Investigate whether any can be made asynchronous or deferred.
  5. Implement read-only rootfs with overlayfs. Mount a squashfs rootfs, overlay a tmpfs writable layer. Verify that changes survive only until the next reboot, and that the integrity of the squashfs image can be verified against its root hash with dm-verity.

References

  • Chris Simmonds, Mastering Embedded Linux Programming (3rd ed., Packt, 2021)
  • Karim Yaghmour, Building Embedded Linux Systems (2nd ed., O'Reilly, 2008)
  • Yocto Project Documentation: https://docs.yoctoproject.org/
  • Buildroot User Manual: https://buildroot.org/downloads/manual/manual.html
  • U-Boot Documentation: https://u-boot.readthedocs.io/
  • Device Tree Specification: https://github.com/devicetree-org/devicetree-specification
  • Linux Kernel Documentation, Device Tree: https://www.kernel.org/doc/html/latest/devicetree/
  • eLinux.org (Embedded Linux Wiki): https://elinux.org/Main_Page
  • OpenWRT Project: https://openwrt.org/docs/start
  • OP-TEE Documentation: https://optee.readthedocs.io/