Skip to content

Section 10: Synchronization

Purpose and Scope

Synchronization is the discipline of coordinating concurrent execution units — threads, processes, or processors — to produce correct, predictable outcomes when shared state is involved. This section covers every layer of the synchronization hierarchy: from hardware primitives like compare-and-swap and memory fences, through kernel primitives like futex and spinlocks, to high-level constructs like condition variables, read-write locks, and barriers. It extends into the frontier of lock-free and wait-free algorithms, transactional memory, and the full taxonomy of concurrency hazards (deadlock, livelock, starvation, and race conditions).

The scope is intentionally broad: correctness properties, performance trade-offs, implementation strategies, and real-world failure modes are all treated as first-class concerns.


Prerequisites

  • Section 02 (CPU Architecture): pipeline stages, cache coherence, memory hierarchy
  • Section 03 (Operating System Fundamentals): process/thread model, kernel/user mode
  • Section 05 (Scheduling): thread states, context switching, CPU affinity
  • Familiarity with C or C++ at the systems level
  • Basic understanding of assembly for x86-64 (LOCK prefix, MFENCE, CMPXCHG)

Learning Objectives

Upon completing this section you will be able to:

  1. Explain why memory reordering at the CPU and compiler level makes naive shared-memory programming incorrect.
  2. Implement a correct mutex using only atomic compare-and-swap and a futex syscall.
  3. Articulate the differences between spin-based, blocking, and adaptive locking and select appropriately for a given workload.
  4. Design lock-free data structures (stacks, queues) and reason formally about their progress guarantees.
  5. Identify deadlock, livelock, and starvation conditions from code and from runtime traces.
  6. Explain the C11/C++11 memory model and apply memory_order annotations correctly.
  7. Describe hardware transactional memory (HTM) semantics and its fallback requirements.
  8. Reason about ABA problems, priority inversion, and false sharing.

Architecture Overview

  User Space                    Kernel Space                   Hardware
 ┌─────────────────────┐       ┌──────────────────────┐       ┌──────────────┐
 │  C++11 std::mutex   │       │  futex (fast path)   │       │  CMPXCHG     │
 │  pthread_mutex      │──────>│  wait queue          │──────>│  LOCK XADD   │
 │  std::atomic<T>     │       │  priority inheritance │       │  MFENCE      │
 │  lock-free structs  │       │  PI mutexes           │       │  TSX/RTM     │
 └──────────┬──────────┘       └──────────────────────┘       └──────┬───────┘
            │                                                         │
            │  fast path (uncontended): no syscall                    │
            │  slow path (contended): FUTEX_WAIT / FUTEX_WAKE         │
            │                                                         │
 ┌──────────▼──────────────────────────────────────────────────────────▼──────┐
 │                      Memory Ordering Layer                                  │
 │  Sequential Consistency ─ Acquire/Release ─ Relaxed ─ Compiler Barrier     │
 │  C11 _Atomic / C++11 std::atomic ─ Linux READ_ONCE / WRITE_ONCE            │
 └────────────────────────────────────────────────────────────────────────────┘
            │
 ┌──────────▼──────────────────────────┐
 │       Cache Coherence (MESI/MOESI)  │
 │  L1$─L2$─L3$ per core              │
 │  Invalidation traffic on contention │
 └─────────────────────────────────────┘

Key Concepts

  • Critical Section: A region of code that must execute atomically with respect to other threads accessing the same data.
  • Mutual Exclusion (Mutex): Guarantee that at most one thread executes the critical section at any time.
  • Spinlock: A mutex whose waiting threads burn CPU cycles polling the lock word rather than sleeping.
  • Semaphore: A generalized integer counter allowing N concurrent holders; binary semaphore equals a mutex.
  • Condition Variable: A mechanism for threads to sleep until a predicate becomes true, always paired with a mutex.
  • Read-Write Lock (rwlock): Allows multiple concurrent readers or one exclusive writer.
  • Barrier: A rendezvous point where all participating threads must arrive before any may proceed.
  • Atomic Operation: An indivisible read-modify-write; hardware ensures no interleaved access.
  • Memory Ordering: Constraints on the order in which loads and stores become visible to other processors.
  • Futex: Linux's "fast userspace mutex" — an atomic word in user memory backed by a kernel wait queue, enabling lock acquisition without a syscall in the common uncontended case.
  • Lock-Free: A progress property guaranteeing that at least one thread makes progress in a finite number of steps, even if others are delayed.
  • Wait-Free: A stronger property guaranteeing every thread makes progress in a bounded number of its own steps.
  • ABA Problem: A compare-and-swap false positive when a value changes A→B→A between read and CAS.
  • Transactional Memory (TM): Optimistic concurrency where a set of memory accesses executes speculatively and commits atomically or aborts on conflict.
  • Priority Inversion: A high-priority thread blocked behind a low-priority thread holding a lock; solved by priority inheritance.
  • False Sharing: Two threads writing different data that reside on the same cache line, causing unnecessary coherence traffic.

Major Historical Milestones

Year Milestone
1965 Dijkstra introduces semaphores and the mutual exclusion problem
1968 Dijkstra states the dining philosophers problem
1972 Hoare introduces monitors and condition variables
1974 Lamport publishes the bakery algorithm (software mutual exclusion without atomics)
1987 Herlihy & Wing define linearizability as a correctness criterion
1990 Herlihy proves the universality of compare-and-swap
1991 Intel 486 ships with CMPXCHG instruction
1993 Herlihy introduces wait-free synchronization and the "consensus hierarchy"
1996 Linux 2.0 introduces kernel spinlocks; SMP support matures
2002 Futex introduced into Linux (Rusty Russell, Hubertus Franke)
2003 Michael & Scott non-blocking queue algorithm widely adopted
2004 Read-Copy-Update (RCU) mainlined into Linux kernel
2009 Intel TSX (Transactional Synchronization Extensions) announced
2011 C++11 and C11 standardize a formal memory model with std::atomic
2013 Intel Haswell ships with RTM/HLE (TSX hardware transactional memory)
2017 TSX errata lead to disabling on many SKUs via microcode
2020 Linux adds futex2 proposal (multi-wait, variable size)

Modern Relevance and Production Use Cases

Database engines (PostgreSQL, MySQL InnoDB, RocksDB) are built on careful lock hierarchies — buffer pool latches, page-level rwlocks, and lock-free freelist structures are tuned obsessively because lock contention is a primary OLTP bottleneck.

High-frequency trading systems use wait-free queues and lock-free ring buffers to pass market data between threads with deterministic latency. A single cache-line bounce can cost hundreds of nanoseconds.

Linux kernel employs the full spectrum: RCU for read-mostly structures (routing tables, inode caches), spinlocks for short critical sections in interrupt context, mutexes for sleeping locks, and sequence locks (seqlocks) for infrequently-written, frequently-read counters.

Go runtime scheduler, Java's java.util.concurrent, and Rust's std::sync all implement adaptive mutexes that spin briefly then park, based on empirical observations that most locks are released quickly.

GPU programming (CUDA, ROCm) requires barrier synchronization across thousands of threads executing in SIMT fashion; __syncthreads() is a barrier within a thread block.


File Map

File Description
01-hardware-primitives.md CAS, LL/SC, LOCK prefix, memory fences, x86 vs ARM atomics
02-memory-ordering.md Memory models, acquire/release/relaxed/SC, C++11 memory_order, compiler barriers
03-spinlocks.md Test-and-set, ticket spinlocks, MCS lock, queued spinlock in Linux
04-mutexes.md Blocking mutexes, adaptive mutexes, pthread_mutex internals
05-futex.md Futex design, FUTEX_WAIT/WAKE, priority inheritance, robust futexes
06-semaphores.md POSIX semaphores, counting semaphores, use cases and pitfalls
07-condition-variables.md Spurious wakeups, wait/signal/broadcast, monitor pattern
08-rwlocks.md Reader/writer locks, read starvation, seqlocks, RCU
09-barriers.md Thread barriers, memory barriers, Linux barrier() and smp_mb()
10-atomic-operations.md std::atomic, fetch_add, compare_exchange_weak vs strong
11-lock-free-data-structures.md Lock-free stack, Michael-Scott queue, hazard pointers, epochs
12-wait-free-algorithms.md Universal construction, wait-free queues, practical considerations
13-transactional-memory.md HTM (RTM/HLE), STM, conflict detection, abort handling
14-deadlock.md Conditions, detection (cycle detection), prevention, avoidance (Banker's)
15-race-conditions.md Data races, TOCTOU, detecting with TSan/Helgrind
16-starvation-livelock.md Progress guarantees, fairness, livelock examples and fixes
17-rcu.md Read-Copy-Update: grace periods, quiescent states, usage in Linux
18-priority-inversion.md Priority inversion, priority inheritance, priority ceiling protocol
19-false-sharing.md Cache line alignment, padding, __cacheline_aligned, profiling

Cross-References

  • Section 02 (CPU Architecture): cache coherence protocols (MESI), out-of-order execution, store buffers
  • Section 03 (OS Fundamentals): kernel vs user mode, syscall overhead
  • Section 05 (Scheduling): priority, real-time scheduling, SCHED_FIFO and priority inversion
  • Section 08 (IPC): pipes, message queues, shared memory synchronization
  • Section 11 (Memory Management): memory barriers and NUMA-aware allocation
  • Section 17 (Distributed Systems): distributed locking, consensus, and their relation to local synchronization primitives
  • Section 18 (Database Internals): MVCC, WAL latching, buffer pool synchronization