Init Systems
Technical Overview
The init system is PID 1: the first user-space process started by the kernel, the ancestor of all other processes, and the process responsible for bringing the system to a usable state. When the kernel finishes its own initialization, it executes the program named by the init= kernel parameter or, failing that, tries /sbin/init, /etc/init, /bin/init and /bin/sh in turn (with an initramfs, the kernel first runs /init from the initramfs, which later hands over to the real init). From that point forward, the kernel handles hardware and scheduling; init handles everything else: starting services, mounting filesystems, restarting failed services, adopting and reaping orphaned processes, and handling shutdown.
The evolution of init systems — from UNIX init through SysVinit, Upstart, and systemd — reflects the growing complexity of modern Linux systems: more services, complex dependency ordering, service supervision, on-demand activation, and integration with hardware event systems. The init system is one of the most consequential and contested pieces of software in the Linux ecosystem.
Prerequisites
- Process creation and PID concepts
- Shell scripting (SysVinit uses shell scripts heavily)
- Basic understanding of UNIX sockets and D-Bus
- Linux kernel concepts: cgroups, namespaces (for systemd section)
- initramfs handoff (see 06-initramfs.md)
Historical Context
UNIX init (1969)
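The original UNIX init was a simple process: it ran the /etc/rc startup script, spawned a getty process on each terminal listed in its configuration file (/etc/ttys in early Research UNIX; /etc/inittab is the later System V format), and respawned each getty when it died. On Linux, if PID 1 exits for any reason the kernel panics ("Attempted to kill init!"), and this is still true with systemd today.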
SysVinit (1983–present in legacy systems)
SysVinit (System V init) introduced the concept of runlevels: discrete system states, each associated with a set of running services. A transition between runlevels ran the scripts linked from /etc/rcN.d/ (/etc/rc.d/rcN.d/ on Red Hat-style systems). The design shipped with AT&T System V UNIX in 1983; its Linux reimplementation, sysvinit by Miquel van Smoorenburg, became the standard for Linux distributions throughout the 1990s and 2000s.
Upstart (2006–2014)
Ubuntu introduced Upstart as a replacement for SysVinit's fundamentally sequential, dependency-unaware boot process. Upstart used an event-driven model: instead of scripts that ran in order, Upstart jobs started when events occurred (e.g., the "filesystem mounted" event triggered services that needed that filesystem).
systemd (2010–present)
Lennart Poettering and Kay Sievers created systemd (announced in 2010) as a more comprehensive replacement, drawing on ideas from Upstart and Apple's launchd. systemd became the default on Fedora 15 (2011), followed by Arch Linux (2012), RHEL 7 (2014), Debian 8 and Ubuntu 15.04 (both 2015), and essentially all other major distributions. Its adoption was one of the most contentious technical changes in Linux history.
SysVinit
Runlevels
| Runlevel | Meaning |
|---|---|
| 0 | Halt (shutdown) |
| 1 | Single-user mode (maintenance, no network) |
| 2 | Multi-user without NFS (Debian: full multiuser) |
| 3 | Multi-user with networking, no GUI |
| 4 | Undefined (reserved for local use) |
| 5 | Multi-user, networking, GUI (X display manager) |
| 6 | Reboot |
/etc/rc*.d/ Directory Structure
/etc/init.d/              ← actual service scripts
    apache2
    sshd
    networking
    mysql
/etc/rc0.d/               ← symlinks for halt
    K20apache2 → ../init.d/apache2
    K01mysql → ../init.d/mysql
/etc/rc3.d/               ← symlinks for runlevel 3
    S20networking → ../init.d/networking
    S50sshd → ../init.d/sshd
    S80apache2 → ../init.d/apache2
    S90mysql → ../init.d/mysql
/etc/rc5.d/               ← symlinks for runlevel 5
    S... (same as rc3.d, plus the X display manager)
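Naming convention: S = Start, K = Kill. When entering runlevel N, the rc script first runs every K* link in /etc/rcN.d/ with the stop argument, then every S* link with start. The two-digit numbers (00–99) set the order within each group: links run in lexical order, so lower numbers run first.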
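To make the ordering rule concrete, here is a minimal Python sketch of what the rc script does when entering a runlevel: all K links with stop, then all S links with start, each group in plain string order. The K01bluetooth link is hypothetical, and real rc implementations also skip services already in the desired state, which this sketch ignores.

```python
def rc_plan(links: list[str]) -> list[tuple[str, str]]:
    """Return (script, action) pairs in the order an rc script would run
    them when entering a runlevel whose /etc/rcN.d/ contains `links`."""
    # K (kill) links run first with "stop", then S (start) links with "start".
    # Plain string sorting orders each group, which is why the numbers are
    # zero-padded to two digits (S05 sorts before S20).
    kills = sorted(name for name in links if name.startswith("K"))
    starts = sorted(name for name in links if name.startswith("S"))
    # Drop the prefix letter and the two digits to recover the script name
    plan = [(name[3:], "stop") for name in kills]
    plan += [(name[3:], "start") for name in starts]
    return plan


# Contents of /etc/rc3.d/ from the listing above, plus one hypothetical K link
print(rc_plan(["S80apache2", "S20networking", "K01bluetooth", "S50sshd", "S90mysql"]))
# [('bluetooth', 'stop'), ('networking', 'start'), ('sshd', 'start'), ('apache2', 'start'), ('mysql', 'start')]
```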
A typical init.d script (LSB-compliant):
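#!/bin/bash
### BEGIN INIT INFO
# Provides:          sshd
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: OpenSSH server daemon
### END INIT INFO
# LSB helper functions (provides status_of_proc on Debian-based systems)
. /lib/lsb/init-functions
DAEMON=/usr/sbin/sshd
PIDFILE=/var/run/sshd.pid
case "$1" in
  start)
    echo "Starting SSH daemon..."
    # Without -D, sshd detaches by itself and writes $PIDFILE
    "$DAEMON"
    ;;
  stop)
    echo "Stopping SSH daemon..."
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
    ;;
  restart)
    "$0" stop
    sleep 1   # crude: give the old daemon time to release port 22
    "$0" start
    ;;
  status)
    status_of_proc -p "$PIDFILE" "$DAEMON" sshd
    exit $?
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac
exit 0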
SysVinit Limitations
Sequential startup: SysVinit executes scripts one at a time. If script S50sshd takes 5 seconds, script S80apache2 must wait. The total boot time is the sum of all script times.
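No dependency management: SysVinit itself orders services only by the two-digit numbers, a crude approximation of dependency ordering. There is no mechanism to express "start when this specific service is ready" vs "start after this script returns." (LSB headers and tools such as Debian's insserv later derived the numbers from declared dependencies, but only when services were installed or enabled.)
No service supervision: Services launched from rc scripts are not supervised; if sshd crashes, SysVinit does not restart it (only processes listed in /etc/inittab with the respawn action, typically gettys, are restarted). External tools (daemontools, runit, monit) were used to fill this gap.
No on-demand activation: Services either run all the time or not at all; apart from inetd/xinetd for simple network services, sockets are not activated on demand.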
Race conditions: Attempts to parallelize by running scripts in background (used in some distributions) introduced race conditions because ordering was not tracked.
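The cost of sequential startup can be made concrete with a small Python sketch. It compares the SysVinit total (the sum of all script durations) with the best case for a dependency-aware manager, where each service waits only for its own dependencies, so boot time becomes the longest dependency chain (the critical path). The durations and dependencies are invented for illustration, and the graph is assumed to be acyclic.

```python
from functools import cache

# Hypothetical start-up durations in seconds
DURATION = {"networking": 2.0, "sshd": 5.0, "apache2": 3.0, "mysql": 4.0}
# Hypothetical dependencies: every service needs only networking
DEPENDS = {"networking": [], "sshd": ["networking"],
           "apache2": ["networking"], "mysql": ["networking"]}


def sequential_boot() -> float:
    # SysVinit: one script at a time, so the durations simply add up
    return sum(DURATION.values())


def parallel_boot() -> float:
    @cache
    def finish(svc: str) -> float:
        # A service starts as soon as its slowest dependency has finished
        start = max((finish(dep) for dep in DEPENDS[svc]), default=0.0)
        return start + DURATION[svc]

    return max(finish(svc) for svc in DURATION)


print(sequential_boot(), parallel_boot())  # prints: 14.0 7.0
```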
Upstart
Ubuntu 6.10 (2006) replaced SysVinit with Upstart. Upstart's design:
Event-driven model: Services are defined as jobs that start when specific events fire.
Stanzas (job syntax):
# /etc/init/ssh.conf (Upstart job definition)
description "OpenSSH server"
start on (filesystem and started network-services)
stop on runlevel [!2345]
# restart the process if it exits unexpectedly
respawn
# give up after more than 5 restarts within 60 seconds
respawn limit 5 60
exec /usr/sbin/sshd -D
Upstart events:
- runlevel: Triggered by init level changes
- startup: Emitted once by init when the system boots
- starting, started, stopping, stopped: Job lifecycle events
- filesystem: All filesystems mounted
- net-device-up: Network interface came up
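The event-driven idea fits in a few lines of Python. This is a toy model, not Upstart's implementation: each job waits until all of its listed events have fired (an "and" condition like the ssh job's start on stanza), and starting a job emits a started <job> event that can satisfy other jobs. The network-services job's condition is a hypothetical assumption.

```python
from collections import deque

# Toy job table: job name -> events that must ALL have fired before it starts
START_ON = {
    "network-services": {"net-device-up"},  # hypothetical condition
    "ssh": {"filesystem", "started network-services"},
}


def run_events(initial_events: list[str]) -> list[str]:
    """Process events in arrival order; return jobs in the order they start."""
    seen: set[str] = set()
    started: list[str] = []
    queue = deque(initial_events)
    while queue:
        event = queue.popleft()
        seen.add(event)
        for job, needs in START_ON.items():
            if job not in started and needs <= seen:
                started.append(job)
                # Starting a job emits a lifecycle event other jobs can wait on
                queue.append(f"started {job}")
    return started


print(run_events(["startup", "filesystem", "net-device-up"]))
# ['network-services', 'ssh']
```

The arrival order of filesystem and net-device-up does not matter here; ssh starts as soon as both conditions hold, which is what made the model more parallel than numbered scripts.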
Upstart was a genuine improvement over SysVinit in expressiveness and parallelism, but it had its own issues: the event model could produce subtle ordering bugs, it required understanding a new configuration format, and it lacked the comprehensive scope of systemd.
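Canonical announced in 2014 that Ubuntu would move to systemd, and Ubuntu 15.04 (April 2015) was the first release to use systemd by default; Upstart is no longer developed.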
systemd
Architecture
systemd is more than a service manager; it is a suite of daemons and tools that manages:
- Service lifecycle (start, stop, restart, enable, disable)
- Socket activation (start services on first connection)
- Timer activation (cron replacement)
- Filesystem mounts (mount units, generated from /etc/fstab by systemd-fstab-generator)
- Device management (udev integration via systemd-udevd)
- Login session management (logind)
- Journal (logging infrastructure)
- Network configuration (networkd)
- DNS resolution (resolved)
- Time synchronization (timesyncd)
- Container management (nspawn)
systemd Architecture (PID 1)
+------------------------------------------------+
|                systemd (PID 1)                 |
|                                                |
|  Unit manager:                                 |
|  +---------+ +--------+ +-------+ +--------+   |
|  | service | | socket | | timer | | mount  |   |
|  | units   | | units  | | units | | units  |   |
|  +---------+ +--------+ +-------+ +--------+   |
|                                                |
|  +---------+ +--------+ +-------+              |
|  | target  | | device | | path  |              |
|  | units   | | units  | | units |              |
|  +---------+ +--------+ +-------+              |
+------------------------------------------------+
        |                  |                  |
      cgroup             D-Bus              udev
     hierarchy       communication     device events
Companion daemons (separate processes, each started as a service):
  journald  logind  networkd  resolved
  timesyncd  hostnamed  localed  timedated
Unit File Syntax
Units are declarative configuration files describing a managed entity. All unit types share the [Unit] and [Install] sections; most types add a type-specific section ([Service], [Socket], [Timer], etc.), while some, such as targets, have none.
Example: Service unit
# /usr/lib/systemd/system/sshd.service
[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8)
After=network.target sshd-keygen.target
Wants=sshd-keygen.target
[Service]
Type=notify
EnvironmentFile=-/etc/crypto-policies/back-ends/opensshserver.config
# Config test: refuse to start if sshd_config is invalid
ExecStartPre=/usr/sbin/sshd -t
# Main process (-D: stay in the foreground so systemd can track it)
ExecStart=/usr/sbin/sshd -D $OPTIONS
# Graceful reload: sshd re-reads its configuration on SIGHUP
ExecReload=/bin/kill -HUP $MAINPID
# Stop only the listener, so open SSH sessions survive a restart
KillMode=process
Restart=on-failure
RestartSec=42s
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
Unit section directives:
| Directive | Meaning |
|-----------|---------|
| Description= | Human-readable name |
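| After= | Ordering only: if both units are being started, start this one after them; does not pull them in |
| Requires= | Hard dependency: pulls the units in; if one fails to start (and After= is also set), this unit is not started; explicitly stopping one stops this unit too |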
| Wants= | Soft dependency — try to start, don't fail if unavailable |
| BindsTo= | If bound-to unit stops, stop this unit |
| PartOf= | If parent stops/restarts, do the same |
| Conflicts= | Cannot run simultaneously |
| ConditionPathExists= | Start only if the path exists; otherwise the unit is skipped, not marked failed |
Service section directives:
| Directive | Meaning |
|-----------|---------|
| Type=simple | ExecStart is the main process (default) |
| Type=forking | Service forks; PID file tracks main process |
| Type=notify | Service sends READY=1 via sd_notify() |
| Type=oneshot | Service runs to completion (not a daemon) |
| Type=dbus | Service is ready once it acquires the D-Bus name set in BusName= |
| Restart=no/on-failure/always | Restart policy |
| ExecStart= | Main command |
| ExecStartPre= | Run before main command |
| ExecReload= | Command for systemctl reload |
| User=/Group= | Run as this user/group |
| WorkingDirectory= | CWD for process |
| Environment= | Environment variables |
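For Type=notify, the readiness message is a datagram sent to the AF_UNIX socket whose address systemd passes in the NOTIFY_SOCKET environment variable. The sketch below sends READY=1 without libsystemd; it is a simplified stand-in for sd_notify(3) that assumes Linux, handles only AF_UNIX addresses (including the abstract-namespace form with a leading @), and does nothing when the process was not started with a notification socket.

```python
import os
import socket


def notify(state: str = "READY=1") -> bool:
    """Send a state string such as READY=1 to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset (not running under Type=notify)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace socket: the leading "@" stands for a NUL byte
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


# In a service: finish initialization (bind ports, load config), then call
#   notify("READY=1")
```

Sending READY=1 only after initialization is the point of the design: units ordered After= this service wait until it is genuinely usable, not merely forked.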
Dependency Model
systemd Dependency Graph (boot target):
sysinit.target
├── local-fs.target
│ └── /etc/fstab mounts
├── swap.target
│ └── swap entries
└── systemd-udevd.service
basic.target
├── sysinit.target
├── sockets.target
│ └── dbus.socket → (activates) dbus.service
└── timers.target
multi-user.target
├── basic.target
├── sshd.service
├── NetworkManager.service
└── crond.service
graphical.target
├── multi-user.target
└── display-manager.service
↓
gdm.service or sddm.service
Targets replace SysVinit runlevels. Common targets:
- rescue.target: single-user equivalent
- multi-user.target: runlevel 3 equivalent
- graphical.target: runlevel 5 equivalent
- reboot.target, poweroff.target, halt.target
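Because ordering is declared per unit, systemd can start every unit whose dependencies are already satisfied at the same time. The following Python sketch uses the standard library's graphlib (Python 3.9+) to group a simplified version of the graph above into such "waves". It only approximates systemd's job engine, which also tracks readiness notifications, failures and job types; the edge list is an assumption distilled from the tree.

```python
from graphlib import TopologicalSorter

# Simplified ordering graph: unit -> units it must start after
AFTER = {
    "local-fs.target": [],
    "swap.target": [],
    "systemd-udevd.service": [],
    "sysinit.target": ["local-fs.target", "swap.target", "systemd-udevd.service"],
    "dbus.socket": ["sysinit.target"],
    "sockets.target": ["dbus.socket"],
    "timers.target": ["sysinit.target"],
    "basic.target": ["sockets.target", "timers.target"],
    "sshd.service": ["basic.target"],
    "crond.service": ["basic.target"],
    "multi-user.target": ["sshd.service", "crond.service"],
}


def start_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group units into batches that can be started concurrently."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every unit whose dependencies are done
        waves.append(ready)
        ts.done(*ready)  # pretend they all started successfully
    return waves


for wave in start_waves(AFTER):
    print(wave)
```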
Socket Activation
Socket activation starts a service when a connection arrives on its socket, not at boot time. The systemd socket unit holds the socket file descriptor; the service inherits it when started.
# /usr/lib/systemd/system/sshd.socket
[Socket]
ListenStream=22
# Accept=yes: spawn one instance of the template unit sshd@.service per connection
Accept=yes
[Install]
WantedBy=sockets.target
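With the default Accept=no, systemd passes the listening socket itself to a single service instance, which accepts all connections. D-Bus works this way, since all clients connect to one daemon: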
# dbus.socket
[Socket]
ListenStream=/run/dbus/system_bus_socket
[Install]
WantedBy=sockets.target
# dbus.service
[Unit]
Requires=dbus.socket
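Advantage: systemd creates the listening sockets early in boot, so clients can connect (the kernel queues their connections) before the service has finished starting, or before it has started at all. Services therefore need less explicit ordering between each other, and rarely used services start only on demand, reducing boot time and memory usage.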
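On the service side, the inherited sockets arrive as already-open file descriptors starting at 3, described by the LISTEN_PID and LISTEN_FDS environment variables (the protocol behind sd_listen_fds(3)). A minimal Python sketch, assuming Linux and ignoring the optional LISTEN_FDNAMES variable:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor, as in sd_listen_fds(3)


def listen_fds(environ=os.environ, pid=None) -> list[int]:
    """Return the file descriptors systemd passed to this process, if any."""
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID guards against a child process inheriting the variables
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def inherited_sockets() -> list[socket.socket]:
    # socket.socket(fileno=...) detects family and type from the descriptor
    return [socket.socket(fileno=fd) for fd in listen_fds()]
```

With Accept=no (the dbus.socket case), the first socket is a listening socket to call accept() on; with Accept=yes, each instance receives a single already-connected socket as fd 3.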
systemd-journald
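journald is systemd's logging service; it can replace syslogd or run alongside one, forwarding messages to a traditional daemon such as rsyslog. It collects stdout/stderr from all systemd-managed services, kernel messages (kmsg), messages sent via syslog(3), and structured log entries from programs using the sd_journal API.
The journal is stored in binary format at /var/log/journal/ (persistent) or /run/log/journal/ (volatile, lost on reboot); with the default Storage=auto, journald writes persistently only if /var/log/journal/ exists.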
# Basic journal queries
journalctl # all logs, oldest first
journalctl -f # follow (like tail -f)
journalctl -u sshd # logs for sshd unit
journalctl -b # current boot only
journalctl -b -1 # previous boot
journalctl -k # kernel messages only
journalctl --since "2024-01-01" --until "2024-01-02"
journalctl -p err # error priority and above
journalctl _SYSTEMD_UNIT=sshd.service _PID=1234 # structured fields
journalctl -o json-pretty # JSON output
journalctl --disk-usage # space used by journal
journalctl --vacuum-time=2weeks # purge old entries
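Programs can attach their own fields by writing to journald's native protocol socket, /run/systemd/journal/socket, which is what sd_journal_send(3) does. Each field is KEY=value followed by a newline; a value containing a newline is sent as KEY, a newline, its length as a 64-bit little-endian integer, the raw bytes and a final newline. The sketch below serializes and sends one entry; the MYAPP_REQUEST_ID field name is a hypothetical example, and large entries (which sd_journal_send passes through a memfd) are not handled.

```python
import socket
import struct

JOURNAL_SOCKET = "/run/systemd/journal/socket"


def encode_entry(fields: dict[str, str]) -> bytes:
    """Serialize fields in the journal native protocol format."""
    out = bytearray()
    for key, value in fields.items():
        data = value.encode()
        if b"\n" in data:
            # Binary-safe form: KEY\n <uint64 little-endian length> data \n
            out += key.encode() + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            out += key.encode() + b"=" + data + b"\n"
    return bytes(out)


def journal_send(message: str, **fields: str) -> None:
    # Field names: uppercase letters, digits and underscores, no leading "_"
    entry = encode_entry({"MESSAGE": message, "PRIORITY": "6", **fields})
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(entry, JOURNAL_SOCKET)


# On a systemd host:
#   journal_send("request handled", MYAPP_REQUEST_ID="42")
# then query it with:
#   journalctl MYAPP_REQUEST_ID=42
```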
cgroup Integration
systemd places each service (and every other unit that runs processes) in its own cgroup, arranged under slices such as system.slice, providing:
- Resource accounting: CPU, memory, I/O usage per service
- Resource limits: MemoryMax=, CPUQuota=, IOWeight= in unit files
- Process containment: A process cannot leave its service's cgroup by double-forking, and moving it to another cgroup requires privileges
- Clean process killing: When stopping a service, systemd kills all processes in the cgroup (no orphans)
# View cgroup tree
systemd-cgls
# Resource limits in unit file:
# [Service]
# MemoryMax=512M ← hard memory limit
# MemoryHigh=256M ← soft limit (throttled above this)
# CPUQuota=50% ← max 50% of one CPU core
# TasksMax=256 ← max number of tasks (threads/processes)
# View resource usage
systemd-cgtop
The systemd-oomd daemon watches cgroup v2 pressure stall information (PSI) and swap usage, and kills the processes of an offending cgroup before the kernel OOM killer would act.
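The cgroup a process belongs to is visible in /proc/<pid>/cgroup; on a cgroup v2 (unified hierarchy) system the relevant line has the form 0::/path, for example 0::/system.slice/sshd.service. A minimal Python sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup, that finds the owning unit and reads its current memory usage (the same accounting systemd-cgtop displays; memory.current does not exist for the root cgroup):

```python
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # usual cgroup v2 mount point


def cgroup_path(pid: str = "self") -> str:
    """Return a process's cgroup v2 path, e.g. /system.slice/sshd.service."""
    for line in Path(f"/proc/{pid}/cgroup").read_text().splitlines():
        hierarchy_id, _controllers, path = line.split(":", 2)
        if hierarchy_id == "0":  # the unified (v2) hierarchy entry
            return path
    raise RuntimeError("no cgroup v2 entry (cgroup v1-only system?)")


def unit_of(path: str) -> Optional[str]:
    """Return the innermost .service or .scope component of a cgroup path."""
    for part in reversed(path.strip("/").split("/")):
        if part.endswith((".service", ".scope")):
            return part
    return None


def memory_current(path: str) -> int:
    """Bytes currently charged to the cgroup, including its descendants."""
    return int((CGROUP_ROOT / path.lstrip("/") / "memory.current").read_text())


# Example: path = cgroup_path(); print(unit_of(path), memory_current(path))
```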
systemctl Management
# Service lifecycle
systemctl start sshd
systemctl stop sshd
systemctl restart sshd
systemctl reload sshd # run the unit's ExecReload= command (here: kill -HUP $MAINPID)
systemctl status sshd # detailed status including recent journal
# Boot behavior
systemctl enable sshd # create symlinks per [Install], e.g. in multi-user.target.wants/
systemctl disable sshd # remove symlink
systemctl is-enabled sshd
# System-wide operations
systemctl list-units --type=service --state=running
systemctl list-units --failed
systemctl daemon-reload # reload unit file changes
systemctl isolate rescue.target # switch to single-user
# Dependency inspection
systemctl list-dependencies sshd
systemctl cat sshd.service # show the unit file plus any drop-ins
systemctl show sshd.service # show all properties
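Unlike systemctl status, systemctl show is intended for machine parsing: it prints one KEY=value property per line. A small Python sketch that queries selected properties; ActiveState, SubState and NRestarts are real property names, while the helper names are hypothetical:

```python
import subprocess


def parse_show(text: str) -> dict[str, str]:
    """Turn `systemctl show` output (KEY=value per line) into a dict."""
    props = {}
    for line in text.splitlines():
        # Split at the first "=" only; values may contain "=" themselves
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def unit_properties(unit: str, *names: str) -> dict[str, str]:
    cmd = ["systemctl", "show", unit]
    if names:
        # --property accepts a comma-separated list of property names
        cmd.append("--property=" + ",".join(names))
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show(out)


# Example: unit_properties("sshd.service", "ActiveState", "SubState", "NRestarts")
```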
systemd Criticisms
systemd adoption triggered significant debate, raising real engineering concerns:
"Do one thing well" (Unix philosophy): Critics argue that systemd violates the Unix philosophy of small, focused tools. The project's scope includes init, logging, DNS, time sync, network config, containers, and more; although many of these are separate daemons, critics argue that the tight coupling creates a single point of failure and makes the system harder to understand.
Binary log format: journald's binary logs require journalctl to read. Traditional syslog is plain text, viewable with any text tool. The binary format can be corrupted (though journald has checksums), and reading logs from a dead system requires specific tools.
PID 1 complexity: PID 1 is uniquely dangerous, because if it exits the kernel panics. Running a large, complex program as PID 1 (compared with a minimal SysVinit) increases the blast radius of bugs. In practice, systemd's PID 1 has proven very stable.
D-Bus dependency: systemd deeply integrates with D-Bus for communication between its components. Debugging D-Bus communication is non-trivial.
Opaque behavior: Complex unit dependencies, generator scripts, and drop-in files make it harder to trace why a system does what it does at boot compared to explicit shell scripts.
These criticisms have merit but have not prevented systemd's dominance. The performance benefits (parallel startup, socket activation) and features (journal, cgroup integration, unified service management) have proven compelling enough for virtually all major distributions to adopt it.
OpenRC and runit Alternatives
OpenRC (Gentoo, Alpine Linux, Artix Linux):
- Init scripts are shell scripts (familiar SysVinit-like syntax)
- Supports parallel service startup
- Dependency-based ordering (not just numbered scripts)
- Service supervision is optional (via OpenRC's own supervise-daemon or an external supervisor such as s6)
- Traditionally runs on top of sysvinit's or BusyBox's PID 1 (recent versions also ship openrc-init)
- Does not require D-Bus
- Used in Alpine Linux (containers often use Alpine for minimal footprint)
# OpenRC commands
rc-service sshd start
rc-service sshd status
rc-update add sshd default    # enable in the default runlevel
rc-status                     # services in the current runlevel (--all: every runlevel)
runit (Void Linux, used in some Alpine configs):
- Extremely minimal (< 5000 lines of C)
- Three stages: Stage 1 (system init), Stage 2 (service supervision), Stage 3 (shutdown)
- Service directories: each service has a run script and optional log service
- Automatic restart of crashed services
- Process supervision tree: runsvdir starts one runsv per service directory, and each runsv supervises its service (and optional log service) as direct children
# runit service structure
/etc/runit/runsvdir/default/
  sshd -> /etc/sv/sshd      # the symlink enables the service
/etc/sv/sshd/
  run                       # executable script:
                            #   #!/bin/sh
                            #   exec /usr/sbin/sshd -D
  log/
    run                     # executable script:
                            #   #!/bin/sh
                            #   exec svlogd -tt /var/log/sshd/
# runit commands
sv start sshd
sv stop sshd
sv status sshd
s6 (used in some embedded systems):
- Supervision model similar to runit's, with stricter guarantees about the process lifecycle
- s6-svscan can run as PID 1, and the s6-overlay project packages it for containers
- Very small attack surface
Debugging Notes
Service fails to start:
systemctl status sshd.service # shows exit code + recent log
journalctl -u sshd --since "5 min ago" # detailed logs
systemctl cat sshd.service # view unit file
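# Check whether an ExecStartPre= command failed, or whether a Condition...= check
# such as ConditionPathExists= was not met (the unit is then skipped, not failed)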
Boot is slow — finding bottlenecks:
systemd-analyze # total boot time
systemd-analyze blame # per-service time
systemd-analyze critical-chain # critical path through dependencies
systemd-analyze plot > boot.svg # visual waterfall chart
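The output of systemd-analyze blame is easy to post-process: each line is a duration followed by a unit name, with durations written like 812ms, 5.123s or 1min 2.345s. A Python sketch that parses such lines and returns the slowest units; it handles only the h, min, s, ms and us suffixes, which is an assumption about the formats you will meet. Keep in mind that a unit can look slow simply because it was waiting for another unit, so critical-chain remains the better guide to what actually delays boot.

```python
import re

# Time-span suffix -> seconds
UNITS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
# "min" and "ms" are listed before "s" so the longer suffixes match first
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)")


def to_seconds(span: str) -> float:
    """Convert a systemd time span such as '1min 2.345s' to seconds."""
    return sum(float(num) * UNITS[unit] for num, unit in TOKEN.findall(span))


def slowest(blame_output: str, n: int = 3) -> list[tuple[str, float]]:
    rows = []
    for line in blame_output.splitlines():
        # The unit name is the last field; everything before it is the time
        span, _, unit = line.strip().rpartition(" ")
        if span:
            rows.append((unit, to_seconds(span)))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


# Example:
#   out = subprocess.run(["systemd-analyze", "blame"],
#                        capture_output=True, text=True).stdout
#   print(slowest(out))
```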
Service cannot be stopped (timeout):
# Default TimeoutStopSec=90 — service gets 90s to stop
# If process ignores SIGTERM, SIGKILL is sent after timeout
# Add to [Service]: TimeoutStopSec=5
Dependency cycle:
systemd-analyze verify sshd.service # verify unit, detect cycles
systemctl list-dependencies --all sshd.service
PID 1 issues (container environments):
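In Docker containers, the container's PID 1 must install a SIGTERM handler for a clean shutdown (the kernel ignores signals sent to PID 1 for which it has no handler) and must reap orphaned zombie processes. Running systemd as a container's PID 1 needs extra setup, typically a writable cgroup filesystem and elevated privileges (often --privileged with Docker; Podman supports it natively). Lighter alternatives such as tini (available via docker run --init) or dumb-init are often used instead.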
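The two jobs a container's PID 1 must do, forwarding termination signals to the workload and reaping zombies, are small enough to sketch in Python. This is a simplified illustration of the idea behind tini or dumb-init, not a replacement for them: it ignores process groups, signals other than SIGTERM and SIGINT, and background processes left behind by the main child.

```python
import os
import signal


def run_as_init(argv: list[str]) -> int:
    """Spawn argv, forward SIGTERM/SIGINT to it, reap children until the
    main child exits, and return the main child's exit code."""
    child = os.posix_spawnp(argv[0], argv, os.environ)

    def forward(signum, _frame):
        # The kernel ignores signals sent to PID 1 that it has no handler
        # for, so termination requests must be passed on explicitly.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # child already gone

    handled = (signal.SIGTERM, signal.SIGINT)
    previous = {sig: signal.signal(sig, forward) for sig in handled}
    try:
        while True:
            # waitpid(-1, 0) reaps ANY child. When this runs as PID 1,
            # orphaned processes are re-parented to it, so the loop also
            # keeps them from lingering as zombies.
            pid, status = os.waitpid(-1, 0)
            if pid == child:
                return os.waitstatus_to_exitcode(status)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)


# Example: sys.exit(run_as_init(["/usr/local/bin/myserver", "--port", "8080"]))
```

A negative return value means the child was killed by that signal number, mirroring os.waitstatus_to_exitcode.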
Security Implications
systemd unit hardening:
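[Service]
# Reduce attack surface. Only whole lines starting with # are comments;
# text after a value would become part of that value.
# setuid/setgid bits and file capabilities grant no extra privileges
NoNewPrivileges=yes
# whole filesystem read-only except /dev, /proc, /sys (exceptions: ReadWritePaths=)
ProtectSystem=strict
# /home, /root and /run/user inaccessible
ProtectHome=yes
# private /tmp and /var/tmp (mount namespace)
PrivateTmp=yes
# own network namespace with only loopback (unsuitable for network daemons such as sshd)
PrivateNetwork=yes
# no SUID/SGID file creation
RestrictSUIDSGID=yes
# allow-list of system calls typical for services
SystemCallFilter=@system-service
# empty value: drop all capabilities from the bounding set
CapabilityBoundingSet=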
Running systemd-analyze security sshd.service reports an overall exposure score for the service from 0.0 to 10.0, where lower is better: 0 means heavily restricted and 10 means fully exposed.
Drop-in file injection:
A compromised package could place a .conf file in /etc/systemd/system/sshd.service.d/ to add ExecStartPost= that runs malicious code. Auditing drop-in directories is important in security-sensitive environments.
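Auditing can be partly automated. A Python sketch that lists every Exec*= line in drop-in files under one directory; it scans only /etc/systemd/system by default, whereas drop-ins can also live under /run/systemd/system and /usr/lib/systemd/system (systemd-delta shows which units are extended).

```python
from pathlib import Path


def audit_dropins(root: str = "/etc/systemd/system") -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every Exec*= setting found in
    drop-in files (<unit>.d/*.conf) under root."""
    findings = []
    for conf in sorted(Path(root).glob("*.d/*.conf")):
        text = conf.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            key, sep, _value = stripped.partition("=")
            # Comment lines start with # or ; and never match "Exec"
            if sep and key.strip().startswith("Exec"):
                findings.append((str(conf), lineno, stripped))
    return findings


# Example: for path, n, line in audit_dropins(): print(f"{path}:{n}: {line}")
```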
Journal tampering:
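journald can use Forward Secure Sealing (FSS) to make tampering with stored entries detectable. journalctl --setup-keys generates a sealing key, which stays on the machine and is regularly evolved, and a verification key, which should be stored elsewhere; journalctl --verify --verify-key=KEY later proves that sealed entries were not modified. FSS makes alteration detectable; it does not prevent it.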
Performance Implications
Boot time comparison (rough, illustrative figures; real boot times depend heavily on hardware and configuration):
- SysVinit (sequential): 30–90 seconds typical boot on 2010s hardware
- systemd (parallel): 5–15 seconds on the same hardware
- systemd with socket activation: services "available" in <3 seconds, fully started later
Runtime overhead: systemd's PID 1 stays resident for the life of the system. Memory usage is roughly 20–50MB for systemd, journald and the basic units: trivial on modern hardware, but significant on an embedded system with 128MB of RAM.
Socket activation latency: The first connection to a socket-activated service incurs latency (service startup time). Subsequent connections are served directly. This tradeoff is acceptable for infrequently-used services.
Failure Modes
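A critical unit fails at boot:
Some units, notably local-fs.target (for example, when an /etc/fstab entry cannot be mounted), have OnFailure=emergency.target, so their failure drops the system into emergency mode, which runs sulogin on the console. Other failures leave the system running but degraded (systemctl is-system-running reports degraded). The kernel panics only if PID 1 itself exits; systemd's crash handler normally freezes PID 1 instead of letting it exit.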
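Ordering cycle:
If After=/Before= relationships form a loop, systemd does not deadlock: it breaks the cycle by deleting one of the start jobs from the transaction and logs "Found ordering cycle" in the journal, so one unit in the loop is not started, which is easy to miss. Detection: systemd-analyze verify, or search the boot journal for that message. Resolution: remove the After=/Before= edge that closes the loop; replacing Requires= with Wants= does not help, because it leaves the ordering unchanged.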
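Cycles can also be checked before deploying new units. Given each unit's ordering edges (After=, with every Before= rewritten as an After= on the other unit), Python's graphlib reports a cycle directly. A sketch with hypothetical units a.service, b.service and c.service; it works on a hand-built graph, not on real unit files.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_ordering_cycle(after: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one ordering cycle as a list of units, or None if acyclic.

    `after` maps each unit to the units it is ordered After=."""
    try:
        TopologicalSorter(after).prepare()
    except CycleError as err:
        # args[1] holds the cycle; its first and last entries are the same unit
        return err.args[1]
    return None


print(find_ordering_cycle({
    "a.service": ["b.service"],
    "b.service": ["c.service"],
    "c.service": ["a.service"],
}))
```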
Unit file syntax error:
A typo in a directive name or value usually makes systemd ignore that line, logging only a warning to the journal, so the unit may start with unexpected settings. Always use systemd-analyze verify <unit> after editing.
journald disk full:
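By default, journald caps persistent storage at 10% of the filesystem and keeps 15% free (each capped at 4GB); if the disk fills anyway, journald drops log entries. Set SystemMaxUse= in /etc/systemd/journald.conf to tighten the limit on journal disk usage.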
Modern Usage
systemd is now the default on nearly all major general-purpose Linux distributions (Alpine, Gentoo and Void are notable exceptions). Key modern developments:
systemd-nspawn: Container tool for OS-level virtualization; used for Fedora development containers and machinectl.
systemd-homed: User home directory management with per-user LUKS encryption.
systemd portable services: Containers that integrate with systemd unit management.
Newer features: credentials management (LoadCredential=), per-unit log rate limiting (LogRateLimitIntervalSec=, LogRateLimitBurst=), and finer-grained cgroup v2 delegation.
Future Directions
- cgroup v2 full adoption: systemd 244+ fully supports cgroup v2 unified hierarchy; migration is ongoing
- systemd in Fedora Atomic / OSTree: System services managed by systemd on immutable OS images
- Service credentials: LoadCredential= provides secret injection into services without environment variable exposure
- Varlink API replacement for D-Bus: systemd is migrating some internal communication from D-Bus to Varlink (simpler, faster protocol)
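A service reads credentials passed with LoadCredential= (or SetCredential=) as files in the directory named by the CREDENTIALS_DIRECTORY environment variable, so secrets never appear in the environment or on the command line. A minimal Python sketch; the credential name db-password and the unit line in the comment are hypothetical.

```python
import os
from pathlib import Path


def read_credential(name: str) -> str:
    """Read a credential that systemd placed in $CREDENTIALS_DIRECTORY."""
    directory = os.environ.get("CREDENTIALS_DIRECTORY")
    if directory is None:
        raise RuntimeError("not started with credentials (CREDENTIALS_DIRECTORY unset)")
    # Each credential is a read-only file named after the credential;
    # a trailing newline is stripped here for convenience
    return (Path(directory) / name).read_text().rstrip("\n")


# Unit file (hypothetical):  LoadCredential=db-password:/etc/myapp/db-password
# Service code:              password = read_credential("db-password")
```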
Exercises
- SysVinit Script Analysis: On a system with /etc/init.d/ scripts (or a Debian VM), examine three service scripts. Identify the LSB header, the start/stop/status handlers, and the runlevel configuration. Manually trace what the S50 and K20 prefixes imply about the startup and shutdown order relative to other services.
- systemd Unit Authoring: Write a systemd service unit for a simple Python HTTP server (python3 -m http.server 8080). Add: restart on failure, run as a non-root user, private /tmp, no new privileges, resource limits (50% CPU, 256MB RAM). Test with systemctl start and systemctl status, then simulate a crash with kill -9 (with Restart=on-failure, termination by SIGTERM counts as a clean exit and does not trigger a restart).
- Boot Time Analysis: Run systemd-analyze blame and systemd-analyze critical-chain. Identify the three slowest services. For each, determine whether the delay is genuinely necessary or can be reduced (e.g., by using socket activation, dropping an After= ordering that is not really needed, or disabling unused services).
- Journal Structured Logging: Write a program in Python or C that sends structured log messages with a custom field to the systemd journal, using sd_journal_send() or the native protocol on /run/systemd/journal/socket (systemd-cat can wrap a program's output, but does not attach custom fields). Query these messages with journalctl using the custom field you defined.
- runit vs systemd Comparison: Install Void Linux (runit) and Arch Linux (systemd) in two separate VMs. Start with the same set of services (sshd, httpd, a custom script). Compare: configuration syntax, startup time, resource usage, failure recovery behavior, and the mechanism used to enable/disable services at boot.
References
- systemd project: https://systemd.io/
- systemd man pages: https://www.freedesktop.org/software/systemd/man/
- Lennart Poettering, "Rethinking PID 1" (2010), the original systemd announcement: http://0pointer.de/blog/projects/systemd.html
- OpenRC project: https://github.com/OpenRC/openrc
- runit: https://smarden.org/runit/
- s6: https://skarnet.org/software/s6/
- systemd-analyze(1) man page
- "The Tragedy of systemd", Benno Rice (critical but fair analysis)
- Linux kernel Documentation/admin-guide/init.rst
- SysVinit source: https://savannah.nongnu.org/projects/sysvinit
- Upstart cookbook: https://upstart.ubuntu.com/cookbook/