05 — IPC Mechanisms

Technical Overview

Inter-Process Communication (IPC) is the set of mechanisms by which separate processes exchange data or synchronize actions. No single IPC mechanism suits all situations: the right choice depends on latency requirements, bandwidth needs, whether processes share a machine or are distributed, whether the communication is structured or stream-based, and whether data ordering or message boundaries matter. Linux provides at least eight distinct IPC families, each with its own trade-offs and legacy baggage.

This file covers the full taxonomy, with emphasis on the mechanisms used in modern production systems.


Prerequisites

  • 01-process-concept.md: address space, file descriptors, mm_struct
  • 02-fork-and-exec.md: dup_fd, inherited file descriptors
  • Basic understanding of kernel buffers and reference counting

Core Content

IPC Comparison Table

┌─────────────────────┬───────────┬──────────────┬────────────────────────────────────┐
│ Mechanism           │ Latency   │ Bandwidth    │ Best use case                       │
├─────────────────────┼───────────┼──────────────┼────────────────────────────────────┤
│ Anonymous pipe      │ ~1 µs     │ ~4 GB/s      │ Parent↔child, shell pipelines       │
│ Named pipe (FIFO)   │ ~1 µs     │ ~4 GB/s      │ Unrelated processes, same host      │
│ POSIX MQ            │ 2–5 µs    │ ~1 GB/s      │ Priority messages, real-time        │
│ System V msg queue  │ 5–10 µs   │ ~500 MB/s    │ Legacy; avoid in new code           │
│ System V shared mem │ <100 ns   │ memory speed │ Legacy; avoid; use POSIX shm        │
│ POSIX shared memory │ <100 ns   │ memory speed │ Fastest bulk data transfer          │
│ AF_UNIX stream      │ ~5 µs     │ ~3 GB/s      │ Structured protocol, fd passing     │
│ AF_UNIX dgram       │ ~3 µs     │ ~2 GB/s      │ Message-oriented, connectionless    │
│ memfd + mmap        │ <100 ns   │ memory speed │ Anonymous file-backed shared mem    │
│ eventfd             │ ~500 ns   │ N/A (events) │ Lightweight event notification      │
│ signalfd            │ ~1 µs     │ N/A (signals)│ Signal handling in event loop       │
│ timerfd             │ ~500 ns   │ N/A (timers) │ Timer events in epoll loop          │
└─────────────────────┴───────────┴──────────────┴────────────────────────────────────┘

Latency and bandwidth figures are rough order-of-magnitude estimates for same-host communication on a 5.x kernel and modern x86-64 hardware. Actual numbers depend heavily on CPU cache state and copy sizes.


Anonymous Pipes

The simplest IPC: a unidirectional byte stream between two related processes.

int pipefd[2];
pipe2(pipefd, O_CLOEXEC);   // pipefd[0] = read end, pipefd[1] = write end

pid_t pid = fork();
if (pid == 0) {
    // child: writer
    close(pipefd[0]);
    write(pipefd[1], "hello", 5);
    close(pipefd[1]);
    _exit(0);
}
// parent: reader
close(pipefd[1]);
char buf[64];
ssize_t n = read(pipefd[0], buf, sizeof(buf));

Internals:

  • A pipe is backed by a buffer in kernel memory, not by a file on disk.
  • Default capacity is 65,536 bytes (16 pages of 4 KiB) on Linux. PIPE_BUF (4096) is a separate limit that governs atomic writes, not capacity.
  • Capacity is adjustable per pipe via fcntl(fd, F_SETPIPE_SZ, size), up to /proc/sys/fs/pipe-max-size (default 1 MB) for unprivileged processes.
  • Atomic writes: writes of ≤ PIPE_BUF (4096) bytes are atomic, so they will not be interleaved with writes from other processes to the same pipe.
  • Blocking semantics: write blocks when the pipe is full; read blocks when it is empty. Once all write ends are closed, read drains any remaining data and then returns 0 (EOF). Once all read ends are closed, write fails with EPIPE and the writer receives SIGPIPE.

  Writer process                   Reader process
  ┌──────────┐                     ┌──────────┐
  │ write()  │──►┌─────────────┐──►│ read()   │
  └──────────┘   │ kernel pipe │   └──────────┘
                 │ buffer      │
                 │ [65536 B]   │
                 └─────────────┘

Named Pipes (FIFOs)

A FIFO is a pipe with a name in the filesystem. Any process with appropriate permissions can open it.

mkfifo /tmp/my_pipe
# Writer:
echo "data" > /tmp/my_pipe &
# Reader:
cat /tmp/my_pipe

Key behavior: open() on a FIFO blocks until both a reader and a writer have opened it, unless O_NONBLOCK is used. The FIFO name is in the filesystem but the data never touches disk — it lives in the same kernel pipe buffer as anonymous pipes.

Use cases: log aggregation pipelines, inter-service communication in legacy Unix scripts, systemd socket activation for FIFO-based services.


POSIX Message Queues

POSIX MQs (mqueue) provide a message-oriented (not stream-oriented) channel with priority ordering: messages with a numerically higher priority are dequeued first regardless of arrival order, and messages of equal priority are delivered in FIFO order. POSIX specifies this behavior and Linux implements it as specified; priorities range from 0 to MQ_PRIO_MAX - 1 (32,767 on Linux).

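#include <fcntl.h>      // O_* flags
#include <mqueue.h>     // older glibc versions need -lrt at link time

// Sender: create the queue with an explicit message size.
// With a NULL attr, Linux defaults to 8192-byte messages and the
// 256-byte receive buffer below would fail with EMSGSIZE.
struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 256 };
mqd_t mq = mq_open("/my_queue", O_CREAT | O_WRONLY, 0644, &attr);
mq_send(mq, "msg", 3, 10 /* priority */);
mq_close(mq);

// Receiver (a separate process); buf must hold at least mq_msgsize bytes:
mqd_t mq = mq_open("/my_queue", O_RDONLY);
char buf[256];
unsigned int prio;
ssize_t n = mq_receive(mq, buf, sizeof(buf), &prio);  // n = message length, -1 on error
mq_close(mq);
mq_unlink("/my_queue");  // remove when done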

Properties:

  • Queues are visible under /dev/mqueue when the mqueue filesystem is mounted there (check with ls /dev/mqueue).
  • Maximum message count and message size are set per queue via struct mq_attr at creation time, within the system-wide limits in /proc/sys/fs/mqueue/.
  • mq_notify(3): asynchronous notification via signal or thread creation when a message arrives on an empty queue.
  • Used in real-time systems, POSIX-compliant IPC code, and cases where message ordering by priority is needed without a full message broker.


System V IPC (Legacy)

System V IPC (SysV) provides three mechanisms: message queues (msgget/msgsnd/msgrcv), shared memory (shmget/shmat/shmdt), and semaphores (semget/semop).

Avoid in new code. Reasons:

  • Objects are identified by integer keys (generated with ftok(path, id)) rather than by names or file descriptors; keys are fragile and prone to collision.
  • Objects are not cleaned up automatically when a process exits (shared memory segments persist until explicitly removed via shmctl(IPC_RMID) or reboot).
  • They cannot be monitored with poll/epoll/select, because the identifiers are not file descriptors.
  • Introspection is limited: objects show up in ipcs and /proc/sysvipc, not in any process's /proc/PID/fd.

Inspection:

ipcs -a          # show all SysV IPC objects
ipcs -m          # shared memory segments
ipcs -q          # message queues
ipcs -s          # semaphore sets
ipcrm -m <shmid> # remove a shared memory segment

SysV IPC is still encountered in: database shared memory (old PostgreSQL, Oracle), legacy financial trading systems, and any C code written before POSIX IPC was available.


POSIX Shared Memory

The fastest IPC mechanism: once established, both processes read/write the same physical pages with no kernel involvement in the data path.

// Creator:
int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0644);
ftruncate(fd, 4096);
void *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
close(fd);   // can close fd; mapping remains

// Consumer:
int fd = shm_open("/my_shm", O_RDWR, 0);
void *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Shared memory is backed by a tmpfs file in /dev/shm. Both processes' mmap calls map the same physical pages. A write by one process is immediately visible to the other — no copies, no kernel transitions.

Synchronization: shared memory alone is insufficient. Add a synchronization primitive in the shared region: POSIX unnamed semaphore (sem_init(ptr, 1 /* pshared */, 1)), futex-based mutex, or an atomic (__atomic_*) lock-free structure.

Process A                     Process B
┌──────────────────────┐      ┌──────────────────────┐
│ virtual address:     │      │ virtual address:      │
│   ptr → [0x7f...]    │      │   ptr → [0x7e...]     │
└──────────┬───────────┘      └──────────┬────────────┘
           │                             │
           └─────────────┬───────────────┘
                         ▼
                  ┌─────────────────┐
                  │ Physical pages  │
                  │ (tmpfs: /dev/shm│
                  │  /my_shm)       │
                  └─────────────────┘

AF_UNIX Sockets

Unix domain sockets are the IPC workhorse of modern Linux systems. Unlike pipes, they are bidirectional (SOCK_STREAM), support message boundaries (SOCK_DGRAM, SOCK_SEQPACKET), and can transmit ancillary data — including file descriptors (SCM_RIGHTS) and credentials (SCM_CREDENTIALS).

Types:

  • SOCK_STREAM: byte stream, bidirectional, like TCP but without network stack overhead
  • SOCK_DGRAM: message boundaries preserved, connectionless; on Linux, AF_UNIX datagrams are reliable and are not reordered
  • SOCK_SEQPACKET: message boundaries + ordering + connection, combining the strengths of the other two

// Server (abstract socket — no filesystem entry):
int sfd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
struct sockaddr_un addr = { .sun_family = AF_UNIX, .sun_path = "\0my_service" };
bind(sfd, (struct sockaddr*)&addr, sizeof(sa_family_t) + 1 + strlen("my_service"));
listen(sfd, 128);

Passing file descriptors via SCM_RIGHTS:

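// Send fd over Unix socket (fd_to_send and sock are assumed to exist).
// At least one byte of ordinary data must accompany the ancillary data.
char byte = 0;
struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl = {0};
struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                      .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf) };
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));   // required; the kernel ignores the fd without it
memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
sendmsg(sock, &msg, 0);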

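This is how:

  • systemd's file descriptor store works: a service hands fds to the manager with sd_pid_notify_with_fds() and gets them back after a restart. (Socket activation itself passes pre-bound sockets by inheritance across exec, not via SCM_RIGHTS.)
  • Container runtimes pass namespace and console fds between their helper processes
  • Wayland clients pass shared-memory and GPU buffer fds to the compositor
  • D-Bus messages carry file descriptors (the UNIX_FD type), which the bus forwards to the recipient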

Real-world users: systemd (private manager socket at /run/systemd/private), D-Bus (system bus at /run/dbus/system_bus_socket), Docker daemon (/var/run/docker.sock), containerd (/run/containerd/containerd.sock), X11 (abstract socket @/tmp/.X11-unix/X0).


memfd_create: Anonymous File-Based IPC

memfd_create(2) (Linux 3.17) creates an anonymous file backed by tmpfs. The file descriptor is the only reference — no filesystem path exists.

int fd = memfd_create("my_buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
ftruncate(fd, size);
void *ptr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
// Share fd with another process via SCM_RIGHTS over Unix socket
// Recipient calls mmap() on the received fd

Sealing (MFD_ALLOW_SEALING): once a seal is set on a memfd, the corresponding operations are permanently forbidden. Seals are added with fcntl(fd, F_ADD_SEALS, seals):

  • F_SEAL_WRITE: no more writes, which makes the content immutable. Adding this seal fails with EBUSY while a writable shared mapping exists.
  • F_SEAL_SHRINK / F_SEAL_GROW: the file cannot be shrunk / grown
  • F_SEAL_SEAL: no further seals can be added

Use case: a server generates a response, seals it as read-only, sends the fd to a client via SCM_RIGHTS. The client mmap's the fd and reads it at memory speed — no copy. Wayland's wl_shm protocol uses this for buffer passing between compositors and clients.


File-Descriptor-Based Event Primitives

Linux 2.6+ provides three fds that integrate with epoll for event-driven programming:

eventfd(2) — a counter with read/write semantics:

int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
// Increment counter:
uint64_t val = 1;
write(efd, &val, sizeof(val));
// Read and reset the counter atomically. Because of EFD_NONBLOCK, read fails
// with EAGAIN while the counter is zero; without that flag it would block.
uint64_t count;
read(efd, &count, sizeof(count));

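Used by: io_uring (optional completion notification through a registered eventfd), event loops that need a wakeup fd for cross-thread notification, and cgroup notifications such as the OOM events consumed by container runtimes like Docker's libcontainer.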

timerfd_create(2) — a timer as a file descriptor:

int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK);
struct itimerspec ts = { .it_value = {1, 0}, .it_interval = {1, 0} };
timerfd_settime(tfd, 0, &ts, NULL);
// tfd becomes readable every second; read returns number of expirations

signalfd(2) — signals as a file descriptor (covered in 04-signals.md).

All three integrate naturally with epoll_wait(), enabling a single-thread event loop that handles I/O, timers, and signals without threads or signal handlers.


Historical Context

Pipes are the oldest UNIX IPC mechanism (1973, Third Edition UNIX). The | pipe operator in shells was one of the defining innovations of UNIX, allowing the composition of small programs into processing pipelines: the "tools and pipes" philosophy.

System V IPC was introduced in Unix System V (AT&T, 1983) and brought from the UNIX world into BSD and Linux. It was widely used until POSIX IPC standardized a cleaner interface in the early 1990s.

AF_UNIX sockets appeared in BSD 4.2 (1983). The ability to pass file descriptors (SCM_RIGHTS) was added in BSD 4.3 and is one of the most powerful features of the Unix socket API.

timerfd, eventfd, and signalfd are Linux-specific extensions that emerged in the 2.6.x era (2007–2008); memfd_create followed in Linux 3.17 (2014). All of them reflect the shift toward fd-based, epoll-compatible interfaces.


Production Examples

Check SysV IPC accumulation (common production issue):

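ipcs -m | awk 'NR>3 && NF>=6 {print $2, $6}' | sort -k2 -rn | head
# Column 2 is shmid, column 6 is nattch. Segments with nattch == 0 have no
# attached process and are orphan candidates (a segment may also be idle on purpose).
# Print the removal commands and review them before running anything:
ipcs -m | awk 'NR>3 && NF>=6 && $6 == 0 {print "ipcrm -m", $2}'
# Without the NR/NF guards, awk treats the missing field on header and blank
# lines as 0 and emits bogus ipcrm commands. Append "| sh" only after review.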

Unix socket connection count:

ss -x | grep /var/run/docker.sock | wc -l

Pipe buffer size tuning for high-throughput pipelines:

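// In C, after creating a pipe (needs #define _GNU_SOURCE before <fcntl.h>):
fcntl(pipefd[1], F_SETPIPE_SZ, 1048576);  // request a 1 MB pipe buffer

# In the shell, check the maximum an unprivileged process may request:
cat /proc/sys/fs/pipe-max-size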

memfd for zero-copy IPC benchmark:

# Using memfd to share 100MB between two processes via fd passing:
# Measure: mmap + seal + pass fd via Unix socket + mmap in receiver
# vs: write() + read() through a pipe
# Expected: memfd approach is 10-100x faster for large payloads

Debugging Notes

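  • Pipe full / blocked writer: cat /proc/PID/fdinfo/N (where N is the write-end fd number) shows pos: and flags: for the descriptor, but not how full the pipe is. From inside the process, fcntl(fd, F_GETPIPE_SZ) returns the capacity and ioctl(fd, FIONREAD, &n) returns the number of unread bytes. Use lsof -p PID to list the process's pipe endpoints; the same pipe inode in another process's listing identifies the peer.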
  • SysV shm not cleaning up: use ipcs -m to find orphaned segments. The nattch column (number of attached processes) being 0 with a large segsz indicates an orphan.
  • AF_UNIX socket permission errors: check ownership and permissions of the socket file. Abstract sockets (path starts with \0) have no filesystem entry — permissions are handled differently (peer credentials via SO_PEERCRED).
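  • fd passing failure: the failure shows up on the receiving side, not in sendmsg. If accepting the descriptors would push the receiver past RLIMIT_NOFILE, or its control buffer is too small, the kernel closes the excess descriptors and sets MSG_CTRUNC in msg_flags. Check MSG_CTRUNC after every recvmsg, and check the receiver's RLIMIT_NOFILE and current fd count.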
  • Tracing IPC: strace -e trace=read,write,recvmsg,sendmsg -p PID shows all pipe/socket activity. bpftrace -e 'tracepoint:syscalls:sys_enter_pipe { ... }' for production tracing.

Security Implications

  • Pipe data visibility: /proc/PID/fd/N shows the pipe endpoint, but the data in the kernel pipe buffer is not directly readable from userspace (unlike procfs memory maps). However, if you can ptrace the writing process, you can intercept the data.
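  • AF_UNIX credential spoofing: SCM_CREDENTIALS is filled in by the sender. The kernel only accepts the sender's own pid and its real, effective, or saved uid/gid, unless the sender holds CAP_SYS_ADMIN (for the pid), CAP_SETUID, or CAP_SETGID, in which case it can claim another identity. Prefer SO_PEERCRED, which the kernel populates when the connection is established and the peer cannot alter. systemd uses SO_PEERCRED to identify which service is connecting to its management socket.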
  • POSIX shm and world-readable files: shm_open("/foo", O_CREAT, 0644) creates a world-readable shared memory object in /dev/shm. Any process on the system can map it and read (or write, if writable) its contents.
  • memfd sealing as a security primitive: a memfd created with MFD_ALLOW_SEALING and sealed with F_SEAL_WRITE can be passed to an untrusted process that needs to read the data but must not be able to modify it. Chrome uses memfd-based IPC with sealing for renderer/GPU process communication.

Performance Implications

  • Copy count: pipes require two copies (user→kernel pipe buffer, kernel→user). Shared memory requires zero copies after setup. For large payloads (>1 MB), shared memory is dramatically faster.
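  • Synchronization cost: shared memory's zero-copy advantage shrinks when every exchange goes through a contended mutex or semaphore, because each contended operation becomes a futex syscall (uncontended futex-based locks stay in user space). For extremely latency-sensitive shared-memory IPC, use lock-free atomics or short spin loops.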
  • Unix socket vs. loopback TCP: AF_UNIX sockets skip the entire TCP/IP stack (no header parsing, no checksum, no port table lookup). Typical throughput is 2–5x higher than loopback TCP for small messages.
  • epoll + eventfd/timerfd: polling a single fd via epoll_wait is essentially free (~500 ns) compared to the overhead of thread synchronization. The fd-based event primitives enable efficient single-threaded event loops without OS-level thread costs.

Failure Modes

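┌──────────────────────────┬──────────────────────────────────────┬─────────────────────────────────────────────────────────────────┐
│ Failure                  │ Symptom                              │ Cause                                                           │
├──────────────────────────┼──────────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ Broken pipe              │ SIGPIPE / EPIPE                      │ All readers closed before write                                 │
│ Pipe deadlock            │ Both processes blocked               │ Writer waiting on full pipe; reader waiting on writer           │
│ SysV shm orphan          │ Growing shm usage after process exit │ No shmctl(IPC_RMID) called                                      │
│ AF_UNIX ECONNREFUSED     │ Connection refused                   │ Socket file exists but nothing is listening (server crashed)    │
│ memfd permission denied  │ EACCES or EPERM on mmap              │ Writable shared mapping of a read-only fd or write-sealed memfd │
│ FD table full on fd-pass │ MSG_CTRUNC set by recvmsg            │ Receiver at RLIMIT_NOFILE; kernel closes the excess fds         │
└──────────────────────────┴──────────────────────────────────────┴─────────────────────────────────────────────────────────────────┘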

Modern Usage

io_uring and IPC: io_uring (Linux 5.1+) can signal completions through a registered eventfd, and internally uses ring buffers shared between user and kernel space, which is itself a form of shared memory IPC with careful memory ordering.

Wayland: the Wayland display protocol uses AF_UNIX sockets for all compositor↔client communication, SCM_RIGHTS for GPU buffer fd passing, and memfd with sealing for shared pixel buffers. It is a textbook modern Unix IPC design.

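D-Bus → dbus-broker: both the original dbus-daemon and dbus-broker (2017) route messages over AF_UNIX sockets and copy each message through the broker. dbus-broker is a reimplementation aimed at higher throughput and reliability. memfd-based zero-copy transfer of large payloads belonged to the earlier in-kernel kdbus proposal, which was never merged.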


Future Directions

  • io_uring IPC: proposals for io_uring-native IPC that would allow a process to submit a "receive from shared ring buffer" operation as an io_uring SQE, completing it asynchronously when data is available — bypassing epoll entirely.
  • Landlock and IPC access control: the Landlock LSM (Linux 5.13) initially covered only filesystem access. Later kernels added TCP bind/connect rules (6.7) and scoping of abstract AF_UNIX sockets and signals (6.12). Finer-grained control over AF_UNIX socket connectivity is still in discussion and would enable process isolation policies without full seccomp complexity.
  • Cross-namespace pipes: current pipe semantics require shared mount/filesystem access for FIFOs; proposals for kernel-mediated cross-namespace byte channels would simplify container-to-host IPC.

Exercises

  1. Pipe throughput benchmark: write two programs (producer/consumer) communicating via a pipe. Measure throughput for message sizes of 4 B, 4 KB, 64 KB, 1 MB. At what size does the pipe's 65536-byte buffer become a bottleneck? Then call fcntl(fd, F_SETPIPE_SZ, 1048576) on the pipe and repeat.

  2. Zero-copy IPC: implement a benchmark comparing three methods for transferring a 100 MB payload from process A to process B: (a) write()/read() on a pipe, (b) POSIX shared memory with a mutex handshake, (c) memfd + SCM_RIGHTS fd passing + mmap. Plot latency and throughput for each.

  3. AF_UNIX fd passing: write a server that opens /etc/passwd (or any file), then passes the open fd to a client via SCM_RIGHTS. The client reads from the received fd without knowing the filename. Verify with lsof -p client_pid.

  4. SysV IPC audit: on a test Linux system, write a C program that uses shmget to create a shared memory segment without ever calling shmctl(IPC_RMID). Run it 100 times and observe ipcs -m accumulating orphaned segments. Write a cleanup script using ipcrm.

  5. eventfd-based task queue: implement a thread-safe work queue using a shared circular buffer and an eventfd for notification. One producer thread writes work items; one consumer thread uses epoll on the eventfd to wake up. Measure scheduling latency (time from eventfd write to consumer wakeup).


References

  • fs/pipe.c — kernel pipe implementation
  • ipc/mqueue.c — POSIX message queue
  • ipc/shm.c, ipc/msg.c, ipc/sem.c — System V IPC
  • mm/shmem.c — tmpfs backing for POSIX shm and memfd
  • net/unix/af_unix.c — AF_UNIX socket implementation
  • fs/eventfd.c, fs/timerfd.c, fs/signalfd.c
  • Kerrisk, The Linux Programming Interface: Chapters 44 (pipes and FIFOs), 45–48 (SysV IPC), 52 (POSIX MQ), 54 (POSIX shm), 57 (UNIX domain sockets)
  • man 2 pipe2, man 3 mq_open, man 2 shmget, man 3 shm_open, man 2 memfd_create, man 2 eventfd, man 2 timerfd_create
  • man 7 unix — AF_UNIX socket details, SCM_RIGHTS, SO_PEERCRED
  • Wayland protocol specification — wl_shm and buffer passing
  • LWN: "Sealed files with memfd_create()" (2014)