Init Systems

Technical Overview

The init system is PID 1: the first user-space process started by the kernel, the ancestor of all other processes, and the process responsible for bringing the system to a usable state. When the kernel finishes its own initialization, it executes /sbin/init (or the path specified by the init= kernel parameter; with an initramfs, the kernel first runs the initramfs's /init, which later hands over to the real init). From that point forward, the kernel handles hardware and scheduling, and init handles everything else: starting services, mounting filesystems, managing service restarts, reaping orphaned processes that the kernel re-parents to it, and handling shutdown.

The evolution of init systems — from UNIX init through SysVinit, Upstart, and systemd — reflects the growing complexity of modern Linux systems: more services, complex dependency ordering, service supervision, on-demand activation, and integration with hardware event systems. The init system is one of the most consequential and contested pieces of software in the Linux ecosystem.

Prerequisites

Process creation and PID concepts
Shell scripting (SysVinit uses shell scripts heavily)
Basic understanding of UNIX sockets and D-Bus
Linux kernel concepts: cgroups, namespaces (for systemd section)
initramfs handoff (see 06-initramfs.md)

Historical Context

UNIX init (1969)

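The original UNIX init was a simple process that read its list of terminals (/etc/ttys in Research UNIX; /etc/inittab came later), spawned getty processes on each terminal, and respawned them when they died. If PID 1 exited, the kernel panicked, and this is still true today ("Attempted to kill init!"). systemd guards against it with a crash handler that, by default, freezes PID 1 instead of letting it exit, so a systemd crash usually leaves the system hung rather than panicked.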

SysVinit (1983–present in legacy systems)

SysVinit (System V init) introduced the concept of runlevels: discrete system states, each associated with a set of running services. A transition between runlevels involved running the scripts linked from /etc/rcN.d/ (/etc/rc.d/rcN.d/ on Red Hat-style systems). SysVinit shipped with System V UNIX in 1983, and a Linux reimplementation (sysvinit) became the standard for Linux distributions throughout the 1990s and 2000s.

Upstart (2006–2014)

Ubuntu introduced Upstart as a replacement for SysVinit's fundamentally sequential, dependency-unaware boot process. Upstart used an event-driven model: instead of scripts that ran in order, Upstart jobs started when events occurred (e.g., the "filesystem mounted" event triggered services that needed that filesystem).

systemd (2010–present)

Lennart Poettering and Kay Sievers created systemd as a more comprehensive replacement, borrowing ideas such as socket activation from Apple's launchd and favoring explicit dependencies over Upstart's event-driven model. systemd became the default on Fedora (2011), then RHEL, Ubuntu (2015), Debian (2015), Arch, and essentially all major distributions. Its adoption was among the most contentious technical changes in Linux history.

SysVinit

Runlevels

Runlevel	Meaning
0	Halt (shutdown)
1	Single-user mode (maintenance, no network)
2	Multi-user without NFS (Debian: full multiuser)
3	Multi-user with networking, no GUI
4	Undefined (reserved for local use)
5	Multi-user, networking, GUI (X display manager)
6	Reboot

/etc/rc*.d/ Directory Structure

/etc/init.d/        ← actual service scripts
  apache2
  sshd
  networking
  mysql

/etc/rc0.d/         ← symlinks for halt
  K20apache2 → ../init.d/apache2
  K01mysql   → ../init.d/mysql

/etc/rc3.d/         ← symlinks for runlevel 3
  S20networking → ../init.d/networking
  S50sshd      → ../init.d/sshd
  S80apache2   → ../init.d/apache2
  S90mysql     → ../init.d/mysql

/etc/rc5.d/         ← symlinks for runlevel 5
  S...(same as rc3 + X display manager)

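Naming convention: S prefix = Start (the script is run with "start" when entering the runlevel), K prefix = Kill (the script is run with "stop" when entering a runlevel in which the service should not run). When entering a runlevel, all K links run first, then all S links, each group in lexical (ASCII) order of the link name. That is why the numbers are two digits (00–99): lower numbers run first.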
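The ordering rule can be sketched in Python. This is a simplified model rather than the code of any distribution's rc driver (such as /etc/init.d/rc on Debian): it assumes two-digit sequence numbers and ignores refinements such as skipping services that are already in the desired state. The names rc_plan and rc_plan_for_dir are illustrative.

```python
import os


def rc_plan(link_names):
    """Return (script, action) pairs in the order an rc driver would run them.

    Simplified model: K* links run first with "stop", then S* links with
    "start", each group in plain lexical (ASCII) order of the link name.
    Assumes two-digit numbers, so the script name starts at index 3.
    """
    kills = sorted(name for name in link_names if name.startswith("K"))
    starts = sorted(name for name in link_names if name.startswith("S"))
    return ([(name[3:], "stop") for name in kills]
            + [(name[3:], "start") for name in starts])


def rc_plan_for_dir(rc_dir):
    """Same plan, read from a real directory such as /etc/rc3.d."""
    return rc_plan(os.listdir(rc_dir))


if __name__ == "__main__":
    rc3 = ["S90mysql", "S20networking", "S80apache2", "S50sshd"]
    for script, action in rc_plan(rc3):
        print(f"/etc/init.d/{script} {action}")
    # Why numbers are zero-padded: lexical order puts "S5..." after "S20..."
    print(sorted(["S5sshd", "S20networking"]))
```

Run on the rc3.d links above, it starts networking, sshd, apache2, and then mysql; the last line shows that an unpadded "S5sshd" would sort after "S20networking".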

A typical init.d script (LSB-compliant):

#!/bin/bash
### BEGIN INIT INFO
# Provides:          sshd
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: OpenSSH server daemon
### END INIT INFO

# LSB helper functions such as status_of_proc (Debian/Ubuntu path)
. /lib/lsb/init-functions

DAEMON=/usr/sbin/sshd
PIDFILE=/var/run/sshd.pid

case "$1" in
  start)
    echo "Starting SSH daemon..."
    # Without -D, sshd detaches on its own and writes $PIDFILE
    "$DAEMON"
    ;;
  stop)
    echo "Stopping SSH daemon..."
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  status)
    status_of_proc -p "$PIDFILE" "$DAEMON" sshd
    exit $?
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac

exit 0

SysVinit Limitations

Sequential startup: SysVinit executes scripts one at a time. If script S50sshd takes 5 seconds, script S80apache2 must wait. The total boot time is the sum of all script times.

No dependency management: The numbering system is a crude approximation of dependency ordering. There is no mechanism to express "start when this specific service is ready" vs "start after this script returns."

No service supervision: If sshd crashes, SysVinit does not restart it. External tools (daemontools, runit, monit) were used to fill this gap.

No on-demand activation: Services either run all the time or not at all. Sockets are not activated on demand.

Race conditions: Attempts to parallelize by running scripts in background (used in some distributions) introduced race conditions because ordering was not tracked.
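To make the cost of sequential startup concrete, the sketch below compares the SysVinit model (boot time is the sum of all script durations) with the best case for a dependency-aware parallel init, where boot time is the length of the longest dependency chain (the critical path). The service durations and dependency edges are hypothetical, the model assumes unlimited parallelism, and graphlib requires Python 3.9+.

```python
from graphlib import TopologicalSorter


def sequential_time(durations):
    """SysVinit model: every script waits for the previous one."""
    return sum(durations.values())


def parallel_time(durations, deps):
    """Dependency-aware model: a service starts as soon as everything it
    depends on has finished; boot ends when the last service finishes.
    Assumes unlimited parallelism (no CPU or disk contention)."""
    finish = {}
    # static_order() yields each service only after all of its dependencies
    for svc in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(svc, ())), default=0.0)
        finish[svc] = start + durations[svc]
    return max(finish.values())


if __name__ == "__main__":
    # Hypothetical start-up durations (seconds) and dependencies
    durations = {"networking": 4.0, "mysql": 5.0, "sshd": 1.0,
                 "apache2": 2.0, "cron": 0.5}
    deps = {"sshd": {"networking"}, "apache2": {"networking", "mysql"},
            "cron": set()}
    print("sequential:", sequential_time(durations))
    print("parallel:", parallel_time(durations, deps))
```

In this made-up example the sequential boot takes 12.5 seconds, while the parallel one takes 7.0: the critical path is mysql followed by apache2, which is the kind of chain systemd-analyze critical-chain reports for a real boot.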

Upstart

Ubuntu 6.10 (2006) replaced SysVinit with Upstart. Upstart's design:

Event-driven model: Services are defined as jobs that start when specific events fire.

Stanzas (job syntax):

# /etc/init/ssh.conf (Upstart job definition)
description "OpenSSH server"
start on (filesystem and started network-services)
stop on runlevel [!2345]

# restart if the process exits unexpectedly
respawn
# stop respawning after 5 restarts within 60 seconds
respawn limit 5 60

exec /usr/sbin/sshd -D

Upstart events:
- runlevel: triggered by init level changes
- startup: emitted once by init when it starts
- starting, started, stopping, stopped: job lifecycle events
- filesystem: all filesystems mounted
- net-device-up: a network interface came up
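A toy model of the event-driven idea: each job waits for a set of events, and starting a job emits a "started JOB" event that may release further jobs. This simplifies Upstart considerably (real "start on" expressions also support or, event arguments, and stop conditions), and the job names below are illustrative.

```python
from collections import deque


def run_events(jobs, initial_events):
    """Return job names in the order they start.

    jobs maps a job name to the set of events it waits for, treated as an
    AND (loosely modelled on `start on (a and b)`). Starting a job emits
    "started <job>", which may in turn satisfy other jobs.
    """
    seen, started = set(), []
    queue = deque(initial_events)
    while queue:
        event = queue.popleft()
        if event in seen:
            continue
        seen.add(event)
        for name, needs in jobs.items():
            if name not in started and needs <= seen:
                started.append(name)
                queue.append(f"started {name}")
    return started


if __name__ == "__main__":
    jobs = {
        "ssh": {"filesystem", "started network-services"},
        "network-services": {"net-device-up"},
        "rsyslog": {"filesystem"},
        "nfs-client": {"started portmap"},  # never fires in this run
    }
    print(run_events(jobs, ["startup", "filesystem", "net-device-up"]))
```

The start order falls out of when events happen to arrive rather than from a declared dependency graph: if net-device-up never fires, ssh silently never starts, which is the kind of ordering subtlety that made Upstart jobs hard to reason about.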

Upstart was a genuine improvement over SysVinit in expressiveness and parallelism, but it had its own issues: the event model could produce subtle ordering bugs, it required understanding a new configuration format, and it lacked the comprehensive scope of systemd.

Canonical announced in 2014 that Ubuntu would adopt systemd, and Ubuntu 15.04 (April 2015) made the switch, ending Upstart's role as Ubuntu's init.

systemd

Architecture

systemd is more than a service manager: it is a suite of daemons and tools (most running as separate processes, not inside PID 1) that manages:
- Service lifecycle (start, stop, restart, enable, disable)
- Socket activation (start services on first connection)
- Timer activation (cron replacement)
- Filesystem mounts (mount units, generated from /etc/fstab or written directly)
- Device management (udev integration)
- Login session management (logind)
- Journal (logging infrastructure)
- Network configuration (networkd)
- DNS resolution (resolved)
- Time synchronization (timesyncd)
- Container management (nspawn)

systemd Architecture (PID 1)

  +----------------------------------------------+
  |                systemd (PID 1)               |
  |                                              |
  |  Unit Manager:                               |
  |  +---------+ +--------+ +-------+ +-------+  |
  |  | service | | socket | | timer | | mount |  |
  |  | units   | | units  | | units | | units |  |
  |  +---------+ +--------+ +-------+ +-------+  |
  |                                              |
  |  +--------+ +--------+ +--------+            |
  |  | target | | device | | path   |            |
  |  | units  | | units  | | units  |            |
  |  +--------+ +--------+ +--------+            |
  |                                              |
  |  Subsystems (separate daemons):              |
  |  journald  logind    networkd  resolved      |
  |  timesyncd hostnamed localed   timedated     |
  +----------------------------------------------+
        |            |             |
     cgroup        dbus          udev
     hierarchy    communication  device events

Unit File Syntax

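Units are declarative configuration files describing a managed entity. Any unit type may contain [Unit] and [Install] sections; each type also has its own section ([Service], [Socket], etc.). Comments must occupy whole lines starting with # or ; because text after a value on the same line is treated as part of that value.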

Example: Service unit

# /usr/lib/systemd/system/sshd.service
[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8)
After=network.target sshd-keygen.target
Wants=sshd-keygen.target

[Service]
Type=notify
EnvironmentFile=-/etc/crypto-policies/back-ends/opensshserver.config
# config test before starting
ExecStartPre=/usr/sbin/sshd -t
# main process
ExecStart=/usr/sbin/sshd -D $OPTIONS
# graceful reload
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

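Unit section directives:

| Directive | Meaning |
|-----------|---------|
| Description= | Human-readable name |
| After= | Ordering only: if both units are being started, start this one after the listed units; does not pull them in |
| Requires= | Hard dependency: pull the listed units in; if one fails to start and this unit is also ordered After= it, this unit fails too; stopping a listed unit stops this one |
| Wants= | Soft dependency: try to start the listed units, but don't fail if they are unavailable |
| BindsTo= | Like Requires=, and also stop this unit if the bound unit stops |
| PartOf= | When the listed unit is stopped or restarted, stop or restart this unit too |
| Conflicts= | Cannot run simultaneously: starting one stops the other |
| ConditionPathExists= | Only start if the path exists (otherwise the unit is skipped, not failed) |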

Service section directives:

| Directive | Meaning |
|-----------|---------|
| Type=simple | ExecStart= is the main process (default) |
| Type=forking | ExecStart= forks and its parent exits; set PIDFile= so systemd can track the daemon |
| Type=notify | Service sends READY=1 via sd_notify() when it is ready |
| Type=oneshot | Service runs to completion (not a daemon) |
| Type=dbus | Service is ready once it acquires the name given in BusName= on D-Bus |
| Restart= | Restart policy: no (default), on-success, on-failure, on-abnormal, on-watchdog, on-abort, always |
| ExecStart= | Main command |
| ExecStartPre= | Run before the main command |
| ExecReload= | Command for systemctl reload |
| User=/Group= | Run as this user/group |
| WorkingDirectory= | Working directory for the process |
| Environment= | Environment variables |
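For Type=notify, readiness is reported over a Unix datagram socket whose address systemd passes in the NOTIFY_SOCKET environment variable. Below is a minimal Python stand-in for sd_notify(3), assuming the documented protocol (newline-separated KEY=VALUE assignments in a single datagram; a leading @ denotes a Linux abstract-namespace address). Real services would normally use libsystemd or an existing binding, and this sketch handles only AF_UNIX addresses.

```python
import os
import socket


def sd_notify(state):
    """Send a notification such as "READY=1" to the service manager.

    Returns False (and sends nothing) when NOTIFY_SOCKET is unset, i.e.
    when the process is not running as a Type=notify service.
    Only AF_UNIX addresses are handled in this sketch.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace socket: Python expects a leading NUL instead of "@"
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        # One datagram; several KEY=VALUE assignments may be newline-separated
        sock.sendto(state.encode(), addr)
    return True
```

A service would call sd_notify("READY=1") once its sockets are bound and its configuration is loaded; until then, systemd keeps the unit in the "activating" state and units ordered After= it wait.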

Dependency Model

systemd Dependency Graph (boot target):

  sysinit.target
    ├── local-fs.target
    │     └── /etc/fstab mounts
    ├── swap.target
    │     └── swap entries
    └── systemd-udevd.service

  basic.target
    ├── sysinit.target
    ├── sockets.target
    │     └── dbus.socket → (activates) dbus.service
    └── timers.target

  multi-user.target
    ├── basic.target
    ├── sshd.service
    ├── NetworkManager.service
    └── crond.service

  graphical.target
    ├── multi-user.target
    └── display-manager.service
              ↓
          gdm.service or sddm.service

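Targets replace SysVinit runlevels. Common targets:
- rescue.target: single-user equivalent (runlevel 1)
- multi-user.target: runlevel 3 equivalent
- graphical.target: runlevel 5 equivalent
- reboot.target, poweroff.target, halt.target

For compatibility, systemd also provides runlevel0.target through runlevel6.target as aliases for these targets.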
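systemd's transaction engine is far more involved, but the ordering part of the model above can be sketched as a topological sort. In this simplified sketch (an assumption, not systemd's code), every edge is treated as an After= ordering, whereas in real units Wants=/Requires= decide what is pulled in and After=/Before= decide the order. Python's graphlib (3.9+) groups units into waves that could start in parallel and raises CycleError on an ordering cycle; the graph is a hypothetical subset of the tree above.

```python
from graphlib import CycleError, TopologicalSorter


def start_waves(after):
    """Group units into waves that could start in parallel.

    `after` maps each unit to the set of units it is ordered After=.
    Every unit in a wave has all of its After= units in earlier waves.
    Raises graphlib.CycleError if the ordering graph contains a cycle.
    """
    ts = TopologicalSorter(after)
    ts.prepare()  # cycle detection happens here
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for stable output
        waves.append(ready)
        ts.done(*ready)
    return waves


if __name__ == "__main__":
    after = {
        "sysinit.target": {"local-fs.target", "swap.target",
                           "systemd-udevd.service"},
        "dbus.socket": {"sysinit.target"},
        "sockets.target": {"dbus.socket"},
        "basic.target": {"sysinit.target", "sockets.target", "timers.target"},
        "sshd.service": {"basic.target"},
        "crond.service": {"basic.target"},
        "multi-user.target": {"sshd.service", "crond.service"},
    }
    for i, wave in enumerate(start_waves(after), 1):
        print(i, wave)
    try:
        start_waves({"a.service": {"b.service"}, "b.service": {"a.service"}})
    except CycleError as err:
        print("ordering cycle:", err.args[1])
```

sshd.service and crond.service land in the same wave because neither is ordered against the other, which is where parallel startup comes from.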

Socket Activation

Socket activation starts a service when a connection arrives on its socket, not at boot time. The systemd socket unit holds the socket file descriptor; the service inherits it when started.

# /usr/lib/systemd/system/sshd.socket
[Socket]
ListenStream=22
# Accept=yes: spawn one instance of the template unit sshd@.service per connection
Accept=yes

[Install]
WantedBy=sockets.target

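With Accept=no (the default), a single service instance receives the listening socket itself and handles every connection, which suits cases like D-Bus, where all clients connect to one daemon: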

# dbus.socket
[Socket]
ListenStream=/run/dbus/system_bus_socket
[Install]
WantedBy=sockets.target

# dbus.service
[Unit]
Requires=dbus.socket

Advantage: because the listening socket exists from early boot, the system appears "ready" (accepting connections) before services are fully initialized; the kernel queues early connections until the service is up to handle them. Services start on demand, reducing boot time and memory usage for rarely-used services.
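On the service side, socket activation hands over the listening sockets as already-open file descriptors starting at fd 3, together with the LISTEN_FDS (count) and LISTEN_PID environment variables. A minimal Python reading of that convention, modelled on sd_listen_fds(3) but without libsystemd, might look like this. It ignores LISTEN_FDNAMES and does not unset the variables, and get_listener with its fallback port 8080 is a hypothetical helper.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor, per sd_listen_fds(3)


def listen_fds(environ=None, pid=None):
    """Return the file descriptors passed by socket activation, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID stops a child process from misreading its parent's fds
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(port=8080):
    """Use the inherited socket when socket-activated, else bind our own."""
    fds = listen_fds()
    if fds:
        # family and type are auto-detected from the descriptor (Python 3.7+)
        return socket.socket(fileno=fds[0])
    return socket.create_server(("", port))
```

Because the fallback binds its own socket, the same program runs both under a .socket unit and standalone, a common pattern for socket-activatable daemons.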

systemd-journald

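journald is systemd's logging daemon. It can replace a traditional syslogd, although many distributions still run rsyslog alongside it. It collects stdout/stderr from all systemd-managed services, messages sent via syslog(3) to /dev/log, kernel messages (kmsg), and structured log entries from programs using the sd_journal API.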

The journal is stored in binary format at /var/log/journal/ (persistent) or /run/log/journal/ (volatile, lost on reboot).

# Basic journal queries
journalctl                          # all logs, oldest first
journalctl -f                       # follow (like tail -f)
journalctl -u sshd                  # logs for sshd unit
journalctl -b                       # current boot only
journalctl -b -1                    # previous boot
journalctl -k                       # kernel messages only
journalctl --since "2024-01-01" --until "2024-01-02"
journalctl -p err                   # error priority and above
journalctl _SYSTEMD_UNIT=sshd.service _PID=1234  # structured fields
journalctl -o json-pretty           # JSON output
journalctl --disk-usage             # space used by journal
journalctl --vacuum-time=2weeks     # purge old entries
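Structured entries can also be sent without libsystemd by writing the journal's native protocol to /run/systemd/journal/socket. The sketch below follows the documented encoding as we understand it: one datagram per entry, KEY=VALUE lines, and a binary-safe form (KEY, newline, 64-bit little-endian length, value, newline) for values containing newlines. Entries too large for a datagram, which journald accepts via a passed memfd, are not handled, and the CACHE_NAME field used below is hypothetical.

```python
import os
import re
import socket
import struct

JOURNAL_SOCKET = "/run/systemd/journal/socket"
# Uppercase letters, digits, underscore; must start with a letter (max 64)
FIELD_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,63}")


def encode_fields(fields):
    """Serialize a dict of fields into one journal native-protocol payload."""
    out = bytearray()
    for key, value in fields.items():
        if not FIELD_NAME.fullmatch(key):
            raise ValueError(f"invalid journal field name: {key!r}")
        data = str(value).encode()
        if b"\n" in data:
            # Binary-safe form: KEY\n + little-endian uint64 length + data + \n
            out += key.encode() + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            out += key.encode() + b"=" + data + b"\n"
    return bytes(out)


def journal_send(message, priority=6, **fields):
    """Send one structured entry (priority 6 = info); False if no journald."""
    payload = encode_fields({"MESSAGE": message, "PRIORITY": priority, **fields})
    if not os.path.exists(JOURNAL_SOCKET):
        return False
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, JOURNAL_SOCKET)
    return True
```

After journal_send("cache rebuilt", CACHE_NAME="thumbnails"), the entry can be found with journalctl CACHE_NAME=thumbnails. Field names starting with an underscore are rejected because journald reserves them for trusted fields it adds itself, such as _PID.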

cgroup Integration

systemd creates a cgroup per service, providing:
- Resource accounting: CPU, memory, and I/O usage per service
- Resource limits: MemoryMax=, CPUQuota=, IOWeight= in unit files
- Process containment: a service's processes cannot leave its cgroup by forking or daemonizing; only a process with write access to the cgroup files (normally root) can move them
- Clean process killing: when stopping a service (with the default KillMode=control-group), systemd kills all processes in its cgroup, leaving no orphans

# View cgroup tree
systemd-cgls

# Resource limits in unit file:
# [Service]
# MemoryMax=512M        ← hard memory limit
# MemoryHigh=256M       ← soft limit (throttled above this)
# CPUQuota=50%          ← max 50% of one CPU core
# TasksMax=256          ← max number of tasks (threads/processes)

# View resource usage
systemd-cgtop

The systemd-oomd daemon monitors cgroup memory pressure (PSI) and swap usage and, when configured thresholds are exceeded, kills the processes of the offending cgroup before the kernel OOM killer has to act.
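Two pieces of the cgroup plumbing can be sketched in Python, assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup. First, the arithmetic behind CPUQuota=: systemd writes it to the cgroup's cpu.max file as a quota and a period in microseconds, with a default period of 100 ms, so CPUQuota=50% becomes "50000 100000". Second, the unit a process belongs to can be read from the cgroup path in /proc/PID/cgroup, and the processes that stopping a unit must kill are those listed in its cgroup subtree. The helper names are hypothetical.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def cpu_quota_to_cpu_max(percent, period_us=100_000):
    """CPUQuota=N% -> cgroup v2 cpu.max contents: "quota period" in us."""
    return f"{round(period_us * percent / 100)} {period_us}"


def unit_from_cgroup(proc_cgroup_text):
    """Find the unit name in /proc/PID/cgroup text (v2 line is "0::/path")."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3 or parts[0] != "0":
            continue  # skip cgroup v1 controller lines
        # Walk from the leaf upward: /system.slice/sshd.service -> sshd.service
        for component in reversed(parts[2].split("/")):
            if component.endswith((".service", ".scope")):
                return component
    return None


def cgroup_pids(unit, slice_name="system.slice", root=CGROUP_ROOT):
    """All PIDs in the unit's cgroup subtree, i.e. what stopping it must kill."""
    base = Path(root) / slice_name / unit
    pids = []
    for procs in base.rglob("cgroup.procs"):
        pids += [int(p) for p in procs.read_text().split()]
    return sorted(pids)


if __name__ == "__main__":
    print(cpu_quota_to_cpu_max(50))
```

A quota above 100% simply means more than one CPU's worth of time per period: CPUQuota=200% maps to "200000 100000".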

systemctl Management

# Service lifecycle
systemctl start sshd
systemctl stop sshd
systemctl restart sshd
systemctl reload sshd      # run the unit's ExecReload= command (often a SIGHUP)
systemctl status sshd      # detailed status including recent journal

# Boot behavior
systemctl enable sshd      # symlink into the .wants/ dirs named in [Install]
systemctl disable sshd     # remove those symlinks
systemctl is-enabled sshd

# System-wide operations
systemctl list-units --type=service --state=running
systemctl list-units --failed
systemctl daemon-reload    # reload unit file changes
systemctl isolate rescue.target  # switch to single-user

# Dependency inspection
systemctl list-dependencies sshd
systemctl cat sshd.service    # show effective unit file
systemctl show sshd.service   # show all properties
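Enabling a unit is mostly file manipulation driven by its [Install] section. The sketch below mimics the core of systemctl enable for the WantedBy= case only, with a simple line-based parse and the target directory passed as a parameter so it can be tried against a scratch directory. The real command also handles RequiredBy=, Also=, Alias=, templates, and presets, and then reloads the manager configuration; wanted_by and enable are illustrative names.

```python
from pathlib import Path


def wanted_by(unit_text):
    """Collect the WantedBy= targets from a unit file's [Install] section."""
    targets, section = [], None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # comments occupy whole lines in unit files
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Install" and line.startswith("WantedBy="):
            targets += line.split("=", 1)[1].split()
    return targets


def enable(unit_path, etc_dir="/etc/systemd/system"):
    """Create etc_dir/TARGET.wants/UNIT symlinks pointing at unit_path."""
    unit_path = Path(unit_path)
    links = []
    for target in wanted_by(unit_path.read_text()):
        wants_dir = Path(etc_dir) / f"{target}.wants"
        wants_dir.mkdir(parents=True, exist_ok=True)
        link = wants_dir / unit_path.name
        if not link.is_symlink():  # already enabled: nothing to do
            link.symlink_to(unit_path)
        links.append(link)
    return links
```

For sshd.service with WantedBy=multi-user.target, this produces /etc/systemd/system/multi-user.target.wants/sshd.service, which is the link systemctl enable reports creating.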

systemd Criticisms

systemd adoption triggered significant debate, raising real engineering concerns:

"Do one thing well" (Unix philosophy): systemd violates the Unix philosophy of small, focused tools. Its scope includes init, logging, DNS, time sync, network config, containers, and more. Critics argue this creates a single point of failure and makes the system harder to understand.

Binary log format: journald's binary logs require journalctl to read. Traditional syslog is plain text, viewable with any text tool. Journal files can be corrupted (though journald detects corruption and rotates to a fresh file), and reading logs from an unbootable system requires journalctl, for example journalctl -D /mnt/var/log/journal from a rescue environment.

PID 1 complexity: PID 1 is uniquely dangerous. If it exits, the kernel panics; if it freezes (systemd's default reaction to its own crash), nothing can be started, stopped, or supervised. The complexity of systemd as PID 1 (vs minimal SysVinit) increases the blast radius of bugs. In practice, systemd PID 1 has been extremely stable.

D-Bus dependency: systemd deeply integrates with D-Bus for communication between its components. Debugging D-Bus communication is non-trivial.

Opaque behavior: Complex unit dependencies, generator scripts, and drop-in files make it harder to trace why a system does what it does at boot compared to explicit shell scripts.

These criticisms have merit but have not prevented systemd's dominance. The performance benefits (parallel startup, socket activation) and features (journal, cgroup integration, unified service management) have proven compelling enough for virtually all major distributions to adopt it.

OpenRC and runit Alternatives

OpenRC (Gentoo, Alpine Linux, Artix Linux):
- Init scripts are shell scripts (familiar SysVinit-like syntax)
- Supports parallel service startup (rc_parallel, off by default)
- Dependency-based ordering (not just numbered scripts)
- Service supervision optional (via OpenRC's own supervise-daemon or external supervisors such as s6)
- Does not require D-Bus
- Used in Alpine Linux (containers often use Alpine for minimal footprint)

# OpenRC commands
rc-service sshd start
rc-service sshd status
rc-update add sshd default    # enable at default runlevel
rc-status                     # view all services

runit (Void Linux, used in some Alpine configs):
- Extremely minimal (< 5000 lines of C)
- Three stages: Stage 1 (system init), Stage 2 (service supervision), Stage 3 (shutdown)
- Service directories: each service has a run script and an optional log service
- Automatic restart of crashed services
- Process supervision tree: runsvdir starts one runsv per service directory, and each runsv supervises its service's process

# runit service structure
/etc/runit/runsvdir/default/
  sshd -> /etc/sv/sshd    # the symlink enables the service

/etc/sv/sshd/run:
#!/bin/sh
# -e: log to stderr instead of syslog; 2>&1 hands it to log/run
exec /usr/sbin/sshd -D -e 2>&1

/etc/sv/sshd/log/run:
#!/bin/sh
exec svlogd -tt /var/log/sshd/

# runit commands
sv start sshd
sv stop sshd
sv status sshd

s6 (used in some embedded systems):
- Similar supervision model to runit but with strict process lifecycle guarantees
- Designed for container PID 1 scenarios
- Very small attack surface
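Restart caps such as Upstart's respawn limit 5 60 (or systemd's StartLimitBurst=/StartLimitIntervalSec=) come down to a sliding-window counter. The Python sketch below is a simplified model, not any supervisor's actual code: it restarts a command whenever it exits, as runit or Restart=always would, and gives up once the command has been started burst times within interval seconds. StartLimiter and supervise are illustrative names.

```python
import subprocess
import time
from collections import deque


class StartLimiter:
    """Allow at most `burst` starts within any `interval`-second window."""

    def __init__(self, burst=5, interval=60.0, clock=time.monotonic):
        self.burst = burst
        self.interval = interval
        self.clock = clock
        self.starts = deque()  # timestamps of recent starts, oldest first

    def allow(self):
        now = self.clock()
        # Drop starts that have slid out of the window
        while self.starts and now - self.starts[0] >= self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            return False
        self.starts.append(now)
        return True


def supervise(argv, limiter):
    """Run argv and restart it whenever it exits, until the limiter refuses.

    Returns how many times the command was started.
    """
    runs = 0
    while limiter.allow():
        runs += 1
        subprocess.run(argv)  # blocks until the child exits, for any reason
    return runs
```

With StartLimiter(burst=5, interval=60), a command that crashes immediately is started five times and then abandoned, matching respawn limit 5 60. Real supervisors then report the failure: systemd marks the unit failed, and Upstart stops the job.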

Debugging Notes

Service fails to start:

systemctl status sshd.service    # shows exit code + recent log
journalctl -u sshd --since "5 min ago"  # detailed logs
systemctl cat sshd.service       # view unit file
# Check: ExecStartPre= may fail; check ConditionPathExists=

Boot is slow — finding bottlenecks:

systemd-analyze                  # total boot time
systemd-analyze blame            # per-service time
systemd-analyze critical-chain   # critical path through dependencies
systemd-analyze plot > boot.svg  # visual waterfall chart

Service cannot be stopped (timeout):

# Default TimeoutStopSec=90 — service gets 90s to stop
# If process ignores SIGTERM, SIGKILL is sent after timeout
# Add to [Service]: TimeoutStopSec=5

Dependency cycle:

systemd-analyze verify sshd.service  # verify unit, detect cycles
systemctl list-dependencies --all sshd.service

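PID 1 issues (container environments): In Docker containers, the entrypoint runs as PID 1. The kernel ignores signals for which a namespace's PID 1 has not installed a handler, so a process that never handles SIGTERM will not stop cleanly, and PID 1 must also reap orphaned zombie processes. Running systemd as container PID 1 typically needs extra privileges and setup (often --privileged and a cgroup mount). Lighter alternatives (tini, dumb-init; docker run --init injects tini) are often used instead.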
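The core job of tini or dumb-init is small: run the real workload as a child, forward termination signals to it, and reap exited children. The Python sketch below shows that loop under those assumptions; it is not tini's code, run_as_init is a hypothetical name, and it needs Python 3.9+ for os.waitstatus_to_exitcode. Run outside a container (not as PID 1), it can only reap its own children, because orphans are re-parented elsewhere.

```python
import os
import signal
import sys

FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv):
    """Spawn argv, forward termination signals to it, reap children.

    Returns the workload's exit status, using the shell convention of
    128 + N when it was killed by signal N.
    """
    child = os.spawnvp(os.P_NOWAIT, argv[0], argv)

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # the child has already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        while True:
            # Reap any child; as a real PID 1 this also collects orphans
            pid, status = os.waitpid(-1, 0)
            if pid == child:
                code = os.waitstatus_to_exitcode(status)
                return 128 - code if code < 0 else code
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

Used as an image entrypoint (for example python3 mini_init.py myserver, with both names hypothetical), a docker stop then reaches the workload as SIGTERM instead of being ignored, and the container exits with the workload's status.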

Security Implications

systemd unit hardening:

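[Service]
# Reduce attack surface (each comment on its own line; unit files have no inline comments)
# Block privilege gain via setuid/setgid binaries or file capabilities:
NoNewPrivileges=yes
# Entire file system read-only except /dev, /proc, /sys; re-allow writes with ReadWritePaths=
ProtectSystem=strict
# Make /home, /root, and /run/user inaccessible:
ProtectHome=yes
# Private /tmp and /var/tmp (mount namespace):
PrivateTmp=yes
# Only a private loopback interface; omit this for network-facing daemons such as sshd:
PrivateNetwork=yes
# Forbid creating SUID/SGID files:
RestrictSUIDSGID=yes
# Syscall allowlist:
SystemCallFilter=@system-service
# Empty value drops all capabilities:
CapabilityBoundingSet=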

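Running systemd-analyze security sshd.service reports an overall "exposure level" from 0.0 to 10.0 for the service, where lower is better: 10 means fully exposed (rated UNSAFE), and values near 0 indicate a tightly restricted service.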

Drop-in file injection: A compromised package could place a .conf file in /etc/systemd/system/sshd.service.d/ to add ExecStartPost= that runs malicious code. Auditing drop-in directories is important in security-sensitive environments.
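A first-pass audit can simply list every command-executing directive found in drop-in files. The Python sketch below scans one unit directory (by default /etc/systemd/system; drop-ins can also live under /run/systemd/system and /usr/lib/systemd/system), and audit_dropins is a hypothetical name. For day-to-day checks, systemd-delta --type=extended lists units extended by drop-ins, and systemctl cat shows them inline.

```python
from pathlib import Path

# Directives that make systemd execute a command
EXEC_KEYS = {"ExecCondition", "ExecStartPre", "ExecStart", "ExecStartPost",
             "ExecReload", "ExecStop", "ExecStopPost"}


def audit_dropins(unit_dir="/etc/systemd/system"):
    """Yield (path, line) for every Exec*= directive found in a drop-in file."""
    for conf in sorted(Path(unit_dir).glob("*.d/*.conf")):
        for raw in conf.read_text(errors="replace").splitlines():
            line = raw.strip()
            key = line.split("=", 1)[0].strip()
            if "=" in line and key in EXEC_KEYS:
                yield conf, line
```

Every hit deserves a look: a drop-in that only sets MemoryMax= is harmless, but one that adds ExecStartPost= runs arbitrary code with the service's privileges.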

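Journal tampering: journald supports Forward Secure Sealing (FSS), which periodically seals the journal with a key that evolves forward in time, so an attacker who obtains the current sealing key cannot undetectably modify entries sealed earlier. Enable it with journalctl --setup-keys: this stores the sealing key on the machine and prints a verification key, which must be kept elsewhere. Later, journalctl --verify --verify-key=KEY checks that the logs were not modified.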

Performance Implications

Boot time comparison (rough, indicative figures; real boot times depend heavily on hardware and on which services are installed):
- SysVinit (sequential): 30–90 seconds typical boot on 2010s hardware
- systemd (parallel): 5–15 seconds on the same hardware
- systemd with socket activation: services "available" in <3 seconds, fully started later

Runtime overhead: systemd's PID 1 is resident in memory always. Memory usage: ~20–50MB for systemd + journald + basic units. Trivial on modern hardware, significant on embedded 128MB systems.

Socket activation latency: The first connection to a socket-activated service incurs latency (service startup time). Subsequent connections are served directly. This tradeoff is acceptable for infrequently-used services.

Failure Modes

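A critical unit fails at boot, and the system drops to emergency mode: If a unit that default.target depends on fails (for example, an /etc/fstab mount without the nofail option), systemd activates emergency.target, which prompts for the root password and opens a maintenance shell on the console instead of completing the boot. The kernel panics only if PID 1 itself exits, not because a service failed.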

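Ordering cycle at boot: When After=/Before= relationships form a cycle, systemd detects it while building the boot transaction, logs "Found ordering cycle", and breaks the cycle by deleting a job that the transaction does not strictly need (typically one pulled in via Wants=), so one of the units involved does not start; if no such job exists, the transaction fails. Detection: systemd-analyze verify. Resolution: remove or reverse the After=/Before= ordering that closes the loop.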

Unit file syntax error: A misspelled directive is logged as a warning and ignored, which is easy to miss, and a malformed value may make the unit fail to load with a cryptic error. Always use systemd-analyze verify <unit> after editing.

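journald disk full: journald caps its own usage by default (SystemMaxUse= defaults to 10% of the file system and SystemKeepFree= to 15%, each capped at 4G), but if the disk fills up for any reason, journald can no longer write and log entries are lost. Tune SystemMaxUse= and SystemKeepFree= in /etc/systemd/journald.conf to match the disk.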

Modern Usage

systemd is now the standard on all major Linux distributions. Key modern developments:

systemd-nspawn: Container tool for OS-level virtualization; used for Fedora development containers and machinectl.

systemd-homed: User home directory management with per-user LUKS encryption.

systemd portable services: Containers that integrate with systemd unit management.

Recent systemd features: credentials management (LoadCredential=), per-service log rate limiting (LogRateLimitIntervalSec=, LogRateLimitBurst=), and more fine-grained cgroup v2 delegation.

Future Directions

cgroup v2 full adoption: systemd supports the cgroup v2 unified hierarchy, which major distributions now use by default, and recent systemd releases are phasing out cgroup v1 support
systemd in Fedora Atomic / OSTree: System services managed by systemd on immutable OS images
service credentials: LoadCredential= provides secret injection into services without environment variable exposure
Varlink API replacement for D-Bus: systemd is migrating some internal communication from D-Bus to Varlink (simpler, faster protocol)
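On the service side, a credential loaded with LoadCredential=NAME:PATH appears as a file named NAME in the directory given by the CREDENTIALS_DIRECTORY environment variable, rather than as an environment variable that child processes would inherit. A minimal Python reader, assuming that convention, is sketched below; the credential name db-password and the function name read_credential are hypothetical.

```python
import os
from pathlib import Path


def read_credential(name):
    """Return the bytes of credential `name`, or None if unavailable.

    With a hypothetical unit line such as
        LoadCredential=db-password:/etc/myapp/db-password
    systemd exposes the secret at $CREDENTIALS_DIRECTORY/db-password.
    """
    directory = os.environ.get("CREDENTIALS_DIRECTORY")
    if not directory:
        return None  # not started by systemd with credentials
    path = Path(directory) / name
    return path.read_bytes() if path.is_file() else None
```

Reading the secret from a file keeps it out of /proc/PID/environ and out of the environment of any processes the service spawns.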

Exercises

SysVinit Script Analysis: On a system with /etc/init.d/ scripts (or a Debian VM), examine three service scripts. Identify the LSB header, the start/stop/status handlers, and the runlevel configuration. Manually trace what S50 and K20 prefixes imply about the startup and shutdown order relative to other services.
systemd Unit Authoring: Write a systemd service unit for a simple Python HTTP server (python3 -m http.server 8080). Add: restart on failure, run as a non-root user, private /tmp, no new privileges, resource limits (50% CPU, 256MB RAM). Test with systemctl start and systemctl status, and simulate a crash with kill -KILL (a plain kill sends SIGTERM, which Restart=on-failure treats as a clean exit).
Boot Time Analysis: Run systemd-analyze blame and systemd-analyze critical-chain. Identify the three slowest services. For each, determine if the delay is genuinely necessary or can be reduced (e.g., by using socket activation, dropping an After= ordering the service does not actually need, or disabling unused services).
Journal Structured Logging: Write a program in Python or C that sends structured log messages to the systemd journal using sd_journal_send() or the native protocol on /run/systemd/journal/socket (or use systemd-cat as a wrapper). Query these messages with journalctl using the custom field you defined.
OpenRC/runit vs systemd Comparison: Install Alpine Linux (OpenRC) or Void Linux (runit), and Arch Linux (systemd), in separate VMs. Start with the same set of services (sshd, httpd, a custom script). Compare: configuration syntax, startup time, resource usage, failure recovery behavior, and the mechanism used to enable/disable services at boot.

References

systemd project: https://systemd.io/
systemd man pages: https://www.freedesktop.org/software/systemd/man/
Lennart Poettering's systemd blog series (2010): http://0pointer.de/blog/projects/systemd.html
"Rethinking PID 1" — Lennart Poettering (original systemd announcement)
OpenRC project: https://github.com/OpenRC/openrc
runit: https://smarden.org/runit/
s6: https://skarnet.org/software/s6/
systemd-analyze(1) man page
"The Tragedy of systemd" — Benno Rice (critical but fair analysis)
Linux kernel Documentation/admin-guide/init.rst
SysVinit source: https://savannah.nongnu.org/projects/sysvinit
Upstart cookbook: https://upstart.ubuntu.com/cookbook/