06 — Race Condition Detection

Technical Overview

A data race occurs when two or more threads access the same memory location concurrently, at least one of the accesses is a write, and no synchronization orders them. In C and C++ the result is undefined behavior: not merely "unpredictable behavior" but anything the compiler and CPU happen to produce, including silent data corruption and memory-safety violations. Data races are among the most dangerous software bugs because they are intermittent (they manifest only when threads interleave in a specific order), often non-reproducible, and can become exploitable vulnerabilities in security-critical code.

The Linux kernel's lockdep validator is among the most sophisticated concurrency checkers in any production software system: it verifies at runtime that the lock acquisition orders it observes are consistent, flagging potential deadlocks before they occur in production. In userspace, ThreadSanitizer (TSan) detects data races dynamically during testing.

Prerequisites

Understanding of threading models (pthreads, kernel tasks)
Familiarity with synchronization primitives (mutex, spinlock, RCU, atomic operations)
Knowledge of the C memory model (C11) or C++ memory model
For lockdep: familiarity with the Linux kernel's locking primitives

Core Content

What Makes a Data Race

The C/C++ standard definition (ISO C11 §5.1.2.4):

Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location. The execution of a program contains a data race if it contains two conflicting evaluations in different threads, at least one of which is not atomic, and neither happens before the other according to the happens-before relation. Any such data race results in undefined behavior.

// Classic data race (WRONG):
int counter = 0;  // shared variable

void thread_a() { counter++; }  // read-modify-write: 3 non-atomic operations
void thread_b() { counter++; }  // same

// Possible execution interleaving:
// Thread A: read counter (0) → ...
// Thread B: read counter (0) → increment (1) → write counter (1)
// Thread A: ... increment (1) → write counter (1)
// Result: counter = 1, not 2. One increment was lost.

// CORRECT (atomic):
#include <stdatomic.h>
_Atomic int counter = 0;
void thread_a() { atomic_fetch_add(&counter, 1); }
void thread_b() { atomic_fetch_add(&counter, 1); }

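// CORRECT (mutex-protected):
#include <pthread.h>
pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;  // every access to counter must hold mu
void thread_a() {
    pthread_mutex_lock(&mu);
    counter++;
    pthread_mutex_unlock(&mu);
}
void thread_b() {
    pthread_mutex_lock(&mu);    // same lock as thread_a, otherwise the race remains
    counter++;
    pthread_mutex_unlock(&mu);
}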

ThreadSanitizer (TSan)

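TSan (Serebryany et al., Google) first appeared in 2009 as a Valgrind-based tool; the current compile-time version shipped in Clang 3.2 and GCC 4.8. It detects data races through compiler instrumentation and shadow memory. For every memory access, TSan records the accessing thread, a logical timestamp taken from that thread's vector clock, the position and size of the access within its word, and whether it was a write. When two conflicting accesses are found with no happens-before relationship between them, a race is reported.

Shadow memory layout: each 8-byte application word maps to N shadow cells (N is configurable; the classic design uses four 8-byte cells), and each cell records one recent access to that word. With four cells, shadow memory alone is 4x the size of application memory; total memory overhead in practice is typically 5-10x.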
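The happens-before check can be sketched with fixed-size vector clocks. This is a minimal illustration under stated assumptions, not TSan's actual data structures: the names vclock_t, shadow_access_t, vc_tick, vc_join, and is_race are hypothetical, and the thread count is fixed at a small constant for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 4  /* hypothetical fixed bound for this sketch */

/* clk[t] is the latest event of thread t that the owner of this
 * vector clock is known to have observed. */
typedef struct { uint32_t clk[MAX_THREADS]; } vclock_t;

/* One recorded access, roughly what a shadow cell remembers. */
typedef struct {
    int      tid;       /* thread that performed the access */
    uint32_t epoch;     /* that thread's own clock value at the access */
    bool     is_write;
} shadow_access_t;

/* Thread tid performs a local event (for example a memory access). */
static void vc_tick(vclock_t *vc, int tid) {
    assert(tid >= 0 && tid < MAX_THREADS);
    vc->clk[tid]++;
}

/* Synchronization edge (for example unlock followed by lock):
 * dst learns everything src had observed. */
static void vc_join(vclock_t *dst, const vclock_t *src) {
    for (int i = 0; i < MAX_THREADS; i++)
        if (src->clk[i] > dst->clk[i])
            dst->clk[i] = src->clk[i];
}

/* prev happens-before the current access iff the current thread has
 * already observed prev's epoch. */
static bool happens_before(const shadow_access_t *prev, const vclock_t *cur) {
    return prev->epoch <= cur->clk[prev->tid];
}

/* Two accesses race if they come from different threads, at least one
 * is a write, and no happens-before edge orders them. */
static bool is_race(const shadow_access_t *prev, int cur_tid,
                    bool cur_is_write, const vclock_t *cur) {
    if (prev->tid == cur_tid) return false;              /* program order */
    if (!prev->is_write && !cur_is_write) return false;  /* read/read */
    return !happens_before(prev, cur);
}
```

In this sketch a mutex would carry its own vclock_t: unlock joins the thread's clock into the mutex clock, and lock joins the mutex clock into the acquiring thread's clock. That join is what turns a reported race into an ordered pair of accesses.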

# Build with TSan
gcc -fsanitize=thread -fno-omit-frame-pointer -g -o myprogram myprogram.c -lpthread
# or Clang:
clang -fsanitize=thread -fno-omit-frame-pointer -g -o myprogram myprogram.c

# Run — TSan reports races as they occur:
./myprogram

# Example output:
==================
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 4 at 0x000102345678 by thread T2:
    #0 update_counter counter.c:45 (myprogram+0x401234)
    #1 worker_thread worker.c:87 (myprogram+0x402345)
    #2 pthread_start_thread ... (myprogram+0x...)

  Previous read of size 4 at 0x000102345678 by thread T1:
    #0 read_counter counter.c:38 (myprogram+0x401189)
    #1 reporter_thread reporter.c:54 (myprogram+0x402678)
    #2 pthread_start_thread ...

  Location is global 'request_count' (counter.c:12 in myprogram+0x...)

  Thread T2 (tid=12347, running) created by main thread at:
    #0 pthread_create ... 
    #1 main main.c:23 (myprogram+0x400f00)

  Thread T1 (tid=12346, running) created by main thread at:
    #0 pthread_create ...
    #1 main main.c:18 (myprogram+0x400ec0)

SUMMARY: ThreadSanitizer: data race counter.c:45 in update_counter
==================

TSan runtime options:

# Show verbose race information
TSAN_OPTIONS=verbosity=2 ./myprogram

# Halt on first race
TSAN_OPTIONS=halt_on_error=1 ./myprogram

# History size for access tracking (more = better reports, more memory)
TSAN_OPTIONS=history_size=7 ./myprogram  # valid values 0-7; the default depends on the TSan version

# Print both lock acquisition stacks in lock-order-inversion (potential deadlock) reports
TSAN_OPTIONS=second_deadlock_stack=1 ./myprogram

# Write report to file
TSAN_OPTIONS=log_path=/tmp/tsan ./myprogram

TSan for Go: Go's race detector is built on the TSan runtime and ships with the standard toolchain. Enable it with:

go test -race ./...                          # run all tests with race detector
go build -race ./cmd/myserver               # build with race detector
go run -race main.go                         # run with race detector

# Output:
# ==================
# WARNING: DATA RACE
# Write at 0x00c000018120 by goroutine 7:
#   main.updateCounter()
#       /home/user/main.go:34 +0x44
#
# Previous read at 0x00c000018120 by goroutine 6:
#   main.readCounter()
#       /home/user/main.go:27 +0x34
# ==================

TSan for Java: mainline JDK builds include no TSan equivalent. Alternatives include third-party dynamic race detectors such as RV-Predict, and the experimental OpenJDK TSan project, whose JVM builds accept -XX:+ThreadSanitizer.

KTSAN: Kernel ThreadSanitizer

KTSAN (also called "Kernel Thread Sanitizer") applies TSan to the Linux kernel. It is experimental and has not been merged to mainline. It adds per-memory-word shadow data to track access history in the kernel. The main challenge: the kernel's lock primitives (spinlock, mutex, RCU) must be fully annotated so KTSAN knows what constitutes a synchronization point.

# Build a kernel with KTSAN (experimental, out-of-tree patches required)
# CONFIG_KTSAN=y (not in mainline — requires KTSAN patch series)
# Once enabled, KTSAN reports appear in dmesg like KASAN reports

KTSAN is primarily used by kernel developers who can apply the patches to their development environment. For production kernel race detection, lockdep is the primary tool.

lockdep: The Linux Kernel Lock Validator

lockdep is the kernel's runtime lock validator. It does not detect data races directly; instead, it checks that the kernel's locking discipline is consistent, meaning that no combination of the lock acquisition orders it has observed can lead to a deadlock. It operates by tracking lock acquisition order across all CPUs and all execution contexts (process, softirq, hardirq).

lockdep approach: for every lock class (roughly, all locks initialized at the same site), record which lock classes were already held whenever a lock of that class is acquired. If lock A is ever acquired while holding lock B, and lock B is ever acquired while holding lock A (directly or transitively), lockdep reports a potential deadlock.
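The ordering check can be sketched as cycle detection on a directed graph of lock classes. This is a simplified illustration, not lockdep's implementation: lock_graph_t, graph_init, reachable, and record_acquire are hypothetical names, classes are plain integer indices, and the IRQ-context tracking that lockdep also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 16  /* hypothetical small bound for this sketch */

/* edge[a][b] == true means: some task acquired class b while holding a. */
typedef struct { bool edge[MAX_CLASSES][MAX_CLASSES]; } lock_graph_t;

static void graph_init(lock_graph_t *g) { memset(g, 0, sizeof *g); }

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reachable(const lock_graph_t *g, int from, int to, bool *seen) {
    if (from == to) return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (g->edge[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    return false;
}

/* Record "acquire `next` while holding `held`".
 * Returns false, and records nothing, if the new edge would close a
 * cycle (this includes re-acquiring the same class recursively). */
static bool record_acquire(lock_graph_t *g, int held, int next) {
    assert(held >= 0 && held < MAX_CLASSES);
    assert(next >= 0 && next < MAX_CLASSES);
    bool seen[MAX_CLASSES] = { false };
    if (reachable(g, next, held, seen))
        return false;  /* `held` is already ordered after `next` */
    g->edge[held][next] = true;
    return true;
}
```

Because the check runs when an edge is first recorded, the inversion is reported the first time the second ordering is seen, whether or not two tasks ever actually block each other.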

# Enable lockdep (kernel build config)
# CONFIG_PROVE_LOCKING=y   (main lockdep config)
# CONFIG_DEBUG_LOCKDEP=y   (more validation, more overhead)
# CONFIG_LOCK_STAT=y        (lock contention statistics)

# Check if lockdep is active on running kernel
cat /proc/lockdep_stats
# Output (abridged; counts are illustrative):
#  lock-classes:                         2321
#  direct dependencies:                 10234
#  indirect dependencies:               45678
#  all direct dependencies:            198765
#  dependency chains:                    5432
#  in-hardirq chains:                      45
#  in-softirq chains:                     234
#  in-process chains:                    5153
#  ...
#  debug_locks:                             1   ← 0 means lockdep reported a problem and turned itself off

Reading a lockdep report:

====================================
WARNING: possible circular locking dependency detected
5.15.0-76-generic #83 Tainted: G

socket_operations/12345 is trying to acquire lock:
ffffffff82a12340 (sk->sk_lock.slock){+.+.}, at: lock_sock+0x1c/0x40
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  (lock class name)  (lock flags) (call site)

but task is already holding lock:
ffffffff82b34560 (rtnl_mutex){+.+.}, at: rtnl_lock+0x16/0x20

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.}:
       lock_acquire+0xd0/0x290
       __mutex_lock+0x58/0x990
       rtnl_lock+0x16/0x20
       dev_close+0x23/0x40      ← dev_close acquired rtnl_mutex while holding sk_lock

-> #0 (sk->sk_lock.slock){+.+.}:
       lock_acquire+0xd0/0x290
       _raw_spin_lock+0x29/0x40
       lock_sock+0x1c/0x40      ← lock_sock acquires sk_lock while holding rtnl_mutex

stack backtrace:
 CPU: 2 PID: 12345 Comm: socket_operations
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x59
  check_noncircular+0xd9/0x100
  __lock_acquire+0x1271/0x1d90
  lock_acquire+0xd0/0x290
  ...
====================================

This report says: an earlier code path (dev_close) acquired rtnl_mutex while holding sk_lock, so lockdep recorded the dependency sk_lock → rtnl_mutex. The current task holds rtnl_mutex and is trying to acquire sk_lock, which would add the opposite dependency rtnl_mutex → sk_lock. If the two paths ever run concurrently, each can hold one lock while waiting forever for the other: the classic A→B / B→A deadlock scenario. lockdep detected this at runtime without a deadlock actually occurring.

Key insight: lockdep detects POTENTIAL deadlocks (based on observed acquisition orders) before they actually happen in production. This is its power — it finds deadlock bugs at system boot or during testing, not after the system has hung in production.

Assertion macros for locking requirements:

// In kernel code, document and check the locking a function expects:
#include <linux/lockdep.h>

// These macros only assert lock state; they do not acquire or release anything
lockdep_assert_held(&my_lock);      // warn if my_lock is not held here
lockdep_assert_not_held(&my_lock);  // warn if my_lock is held here

// For reader/writer locks (rwlock, rwsem), not RCU:
lockdep_assert_held_read(&my_rwlock);   // held for reading
lockdep_assert_held_write(&my_rwlock);  // held for writing

LOCK_STAT: Lock Contention Statistics

# View lock contention stats (requires CONFIG_LOCK_STAT=y); times are in microseconds
cat /proc/lock_stat | head -30
# Output (abridged; values are illustrative):
# lock_stat version 0.4
# ---------------------------------------------------------------
# class name       con-bounces   contentions   waittime-min  waittime-max  waittime-total  waittime-avg
# ---------------------------------------------------------------
# &sk->sk_lock.slock:  12345678    23456789    1.00          10000.00     9876543210.00     421.05
# rtnl_mutex:            123456      234567    5.00          50000.00      987654321.00    4210.54

This shows which locks are most contended and how long tasks wait for them on average, which is the data you need to find synchronization bottlenecks.

Dynamic Analysis Limitations

Both TSan and lockdep are dynamic tools: they can only analyze code paths and interleavings that actually execute while they are watching. A race that needs a specific thread interleaving may never be triggered during test runs.

Coverage implications:
- A data race that occurs only at >10,000 req/s may never be triggered during tests
- A deadlock that requires three threads to hit specific code points simultaneously may never be triggered in a single test run
- Schedule-dependent races are particularly hard: a race may only manifest on a machine with a specific number of cores, or only when a certain kernel version schedules threads differently

Improving coverage:
1. Stress testing with TSan: run tests under load for extended periods
2. Random scheduling (sched_yield() injection): randomly yield inside tight loops to create more interleaving opportunities
3. TSAN_OPTIONS=force_seq_cst_atomics=1: makes TSan treat every atomic as seq_cst. This hides rather than exposes ordering bugs, so use it for triage: if a report disappears with the flag set, the race depends on a weaker-than-seq_cst ordering
4. Sleep injection: insert short random sleeps around suspect accesses to widen race windows further than a yield does
5. Kernel: CONFIG_DEBUG_SPINLOCK=y: adds extra checks to spinlock operations
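The yield injection idea from the list above can be sketched as a small helper that test builds call inside hot loops. The names xorshift32 and maybe_yield are hypothetical, and the per-thread seed is an assumption made so that the helper adds no shared state (and therefore no new synchronization) of its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* xorshift32: tiny PRNG with caller-owned state. The state must be
 * non-zero; a non-zero state never becomes zero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Yield the CPU with probability percent/100.
 * `seed` should be private to the calling thread.
 * Returns true if the call yielded. */
static bool maybe_yield(uint32_t *seed, unsigned percent) {
    assert(seed && *seed != 0 && percent <= 100);
    if (xorshift32(seed) % 100 < percent) {
        sched_yield();  /* give other runnable threads a chance to interleave */
        return true;
    }
    return false;
}
```

A worker loop might call maybe_yield(&my_seed, 5) between the read and the write of a suspect update. Seeding each thread differently (for example from its index plus one) varies the schedule from run to run while keeping any single run replayable from its seeds.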

Static Analysis for Races

Static analysis tools can find races without running the program:

Sparse (kernel static analysis): the Linux kernel uses sparse for type checking. Lock annotations via __acquires(lock), __releases(lock):

void my_function(void) __acquires(my_lock);  // tells sparse this function acquires my_lock

# Run sparse on kernel source
make C=1 CHECK="sparse" drivers/mydriver/

Coverity: commercial static analyzer that includes concurrency checkers. The Linux kernel is scanned regularly through the free Coverity Scan service.

Clang Thread Safety Analysis: TSan itself is purely dynamic and has no static mode. Its closest static counterpart is Clang's -Wthread-safety analysis, which checks at compile time that data annotated as guarded by a lock is only accessed while that lock is held. It is used mainly in C++ code bases.

Dirty COW: CVE-2016-5195

Dirty COW is a classic example of a race condition security vulnerability in the Linux kernel. It allowed local users to gain write access to read-only memory-mapped files, including setuid binaries (enabling privilege escalation to root).

The race:

The bug was in how get_user_pages() (GUP) handled Copy-On-Write (COW) for private mappings.
When a process writes through /proc/self/mem to a read-only file page mapped with MAP_PRIVATE:
  1. GUP looks up the page with write intent, finds it read-only, and triggers a COW fault
  2. The fault handler copies the page-cache page into a new private page
  3. GUP drops its write intent (the COW is assumed done) and retries the lookup,
     expecting to find the private copy

The race: between steps 2 and 3, another thread calls madvise(MADV_DONTNEED),
which discards the private copy. The retried lookup no longer asks for write
access, so it faults the original page-cache page back in and returns it. The
write then goes DIRECTLY to the read-only file page, bypassing COW.

Timeline:
  Thread 1 (write to /proc/self/mem):
  ┌─────────────────────────────────────────────────────────────┐
  │ write() → fault → copy_to_user → mmap_sem.read_lock        │
  │                                  ↓                         │
  │ time to write                    get_user_pages()          │
  │                                  ↓                         │
  │ ... retry ...                    GUP returns dirty page     │
  └─────────────────────────────────────────────────────────────┘
  Thread 2 (madvise to drop):
  ┌─────────────────────────────────────────────────────────────┐
  │ madvise(MADV_DONTNEED, addr) → removes dirty pages         │
  │                                (drops COW copy)             │
  └─────────────────────────────────────────────────────────────┘

Race outcome: write goes to original read-only page

This race existed since at least Linux 2.6.22 (2007) and was fixed in 4.8.3 (October 2016). The fix introduced a FOLL_COW flag: instead of dropping write intent on the retry, GUP remembers that a COW was performed and accepts a read-only page table entry only if it is dirty, that is, only if it really is the private copy.

Dirty COW shows the limits of lock-order validation: lockdep would not have caught it, because it is not a lock-ordering issue but a check-then-use (TOCTOU) race between the GUP retry and a concurrent madvise() issued from userspace. It also shows why races in security-critical code are particularly dangerous.

Historical Context

The formal study of data races in concurrent programs dates to Leslie Lamport's work on happens-before relations (1978) and the development of Petri nets. Practical race detection tools emerged in the 1990s:

Eraser (Savage et al., 1997) introduced the "lockset algorithm": track which locks are held when each variable is accessed. If the intersection of locksets for all accesses to a variable becomes empty, there's a potential race. This is the conceptual predecessor to TSan.
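The core lockset refinement can be sketched with a bitmask per shared variable. This is a minimal illustration, not Eraser's implementation: lockset_t, var_state_t, var_init, and var_access are hypothetical names, locks are assumed to be numbered 0-31, and the state machine Eraser uses to tolerate initialization and read-only sharing is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lockset_t;            /* bit i set = lock i is held */
#define LOCKSET_ALL ((lockset_t)~0u)   /* "every lock" before any access */

typedef struct {
    lockset_t candidates;  /* locks that were held on every access so far */
} var_state_t;

static void var_init(var_state_t *v) {
    assert(v);
    v->candidates = LOCKSET_ALL;
}

/* Refine the candidate set with the locks held at this access.
 * Returns false once no single lock has protected every access,
 * which is the condition reported as a potential race. */
static bool var_access(var_state_t *v, lockset_t held) {
    assert(v);
    v->candidates &= held;
    return v->candidates != 0;
}
```

Unlike a happens-before detector, this check never looks at timing: two accesses under different locks are flagged even if they ran hours apart, which explains both the algorithm's sensitivity and its false positives.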

Helgrind (Valgrind-based) began as an Eraser-style lockset detector and was reworked around 2008 into a happens-before detector.

ThreadSanitizer (Serebryany et al., Google, 2009) combined lockset and happens-before tracking in a hybrid detector; its later compile-time rewrite relies on vector-clock happens-before tracking alone, which sharply reduced false positives while keeping overhead practical.

lockdep (Ingo Molnár, 2006, merged in Linux 2.6.18) was a revolutionary addition to the Linux kernel, implementing a lock-ordering validator that runs continuously in debug kernels. Its key insight: instead of detecting actual deadlocks after they occur, check that every observed lock ordering is consistent, so that deadlocks are caught before they happen. It uncovered many latent deadlocks in the Linux kernel soon after it was merged.

Production Examples

# CI: run Go tests with race detector
go test -race -count=100 ./...   # run each test 100 times with race detector
# The -count flag increases the chance of races being triggered

# Find races in kernel during development:
# Boot with CONFIG_PROVE_LOCKING=y and run the affected workload
# lockdep reports appear in dmesg within seconds of the first lock violation

# View lockdep-detected violations
dmesg | grep -A 100 "WARNING: possible circular locking dependency"

# Clear lock_stat counters (to measure contention after a change)
# lockdep itself stops checking after its first report; reboot to re-arm it
echo 0 > /proc/lock_stat  # requires CONFIG_LOCK_STAT=y

# Check lockdep is enabled
cat /proc/lockdep | head -20

Debugging Notes

TSan false positive in lock-free code: TSan may report races in correctly implemented lock-free algorithms. The usual cause is ordering established through standalone fences (atomic_thread_fence), which TSan does not model: the happens-before relationship exists, but TSan cannot see it. Fix: attach the ordering to the atomic operations themselves (memory_order_release stores paired with memory_order_acquire loads, or memory_order_seq_cst throughout), or describe the synchronization with TSan's __tsan_acquire/__tsan_release annotation API.
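One way to keep the happens-before edge visible to TSan is to put the ordering on the atomic operations themselves. The sketch below shows a release/acquire handoff of a plain variable; payload, ready, producer, consumer, and run_handoff are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain, non-atomic data being handed off */
static atomic_int ready;   /* publication flag, zero-initialized */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* 1. write data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2. publish */
    return NULL;
}

static void *consumer(void *out) {
    /* The acquire load pairs with the release store above. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the producer publishes */
    *(int *)out = payload;  /* ordered after the producer's write */
    return NULL;
}

/* Runs one producer/consumer pair; returns the value the consumer read. */
static int run_handoff(void) {
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Replacing the release store with a relaxed store followed by a separate atomic_thread_fence is also valid C11, but that is the form TSan tends to misreport, so the operation-level ordering is the safer choice for code that must run clean under TSan.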

lockdep "too many lock classes": lockdep's class table has a fixed size (MAX_LOCKDEP_KEYS). All locks initialized at the same call site share one class, so the table is normally exhausted only when keys are created dynamically, for example by registering a separate lock_class_key for every object. Symptom: BUG: MAX_LOCKDEP_KEYS too low! in dmesg. Fix: share one static lock_class_key across all instances of an object type (set it with lockdep_set_class()) instead of creating a key per instance.

TSan reports race in signal handler: Signal handlers can interrupt any point in a program, creating implicit concurrency. TSan treats signal handlers as separate threads. Fix: either make signal handlers only call async-signal-safe functions, or use sigprocmask to block signals during critical sections.
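Both fixes can be sketched together: a handler that only sets a flag, and a wrapper that blocks the signal around a critical section. The names got_signal, on_signal, install_handler, with_signal_blocked, and sample_critical are hypothetical, and SIGUSR1 is an arbitrary choice. The sketch uses sigprocmask as named above; in a multithreaded program pthread_sigmask takes the same arguments and is the well-defined choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* volatile sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t got_signal;

/* Handler does the minimum: no printf, malloc, or locks. */
static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Install the handler for SIGUSR1; returns 0 on success. */
static int install_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Run fn(arg) with SIGUSR1 blocked so the handler cannot interrupt it.
 * A signal that arrives meanwhile stays pending and is delivered when
 * the old mask is restored. Returns 0 on success, -1 on error. */
static int with_signal_blocked(void (*fn)(void *), void *arg) {
    sigset_t block, old;
    assert(fn);
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    fn(arg);
    return sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Example critical section: raises the signal on itself and records
 * whether the handler ran before the section finished. */
static void sample_critical(void *arg) {
    raise(SIGUSR1);             /* becomes pending, not delivered */
    *(int *)arg = got_signal;   /* handler has not run yet */
}
```

The main loop of a program built this way polls got_signal and does the real work outside the handler, which keeps the handler async-signal-safe and removes the implicit concurrency that TSan reports.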

lockdep report that is not a "real" deadlock: lockdep reports potential deadlocks, not actual ones. The code path that would create the deadlock may be unreachable in practice. However, the lock ordering violation should still be fixed, because future code changes may make the path reachable.

Security Implications

Race conditions (TOCTOU — Time of Check to Time of Use) in setuid programs and kernel code are exploitable for privilege escalation (Dirty COW being the canonical example).
TSan and lockdep in production code: while these tools are not for production due to overhead, their findings in development prevent security vulnerabilities from reaching production. Enabling -race in Go CI is a standard security practice.
lockdep in production-ish kernels: CONFIG_PROVE_LOCKING=y can be enabled in staging environments to catch races before production. Overhead: ~15-20% — acceptable for staging, too high for production.
Lock-free algorithms in security-critical code deserve extra scrutiny. The C11/C++11 memory model is complex; even experts make mistakes with relaxed atomics.

Performance Implications

TSan: 5-15x CPU slowdown, 5-10x memory increase. Suitable for CI, unit tests, integration tests. Not for load tests or production.
Go race detector: per the Go documentation, typically 2-20x slower execution and 5-10x more memory. Go's standard recommendation: enable in all CI tests, disable in production binaries.
lockdep: 15-20% overhead. Acceptable for staging. Disabled in production kernels (not compiled in).
KTSAN (experimental): similar to TSan, ~5-10x overhead on kernel operations.
LOCK_STAT: 5-10% overhead from counter updates. Can be enabled in production for contention analysis if overhead is acceptable.

Failure Modes and Real Incidents

Go map race causing HTTP/2 server corruption (production incident, anonymous): A Go HTTP/2 server was storing connection state in a shared map, protected by a mutex in most paths. A goroutine in the h2 library accessed the map without the lock in a specific connection upgrade path. The race caused occasional nil pointer panics in production. Caught by go test -race after reproducing with a concurrent test load. Fix: move the shared map access inside the lock.

Linux kernel network driver race causing packet corruption (2019, i40e driver): The Intel i40e NIC driver had a race between the TX completion IRQ handler and the driver reset path. Both paths accessed the same TX ring buffer structure without locking. Under specific conditions (reset during high TX load), the ring pointers could be corrupted, causing the NIC to DMA to wrong memory locations (potentially kernel memory). Discovered by lockdep in a debug kernel + syzbot fuzzing. Fix: add proper locking around the shared ring pointer access.

Dirty COW exploited in production (2016): Within hours of CVE-2016-5195 publication, exploit code appeared that used the race to write to /etc/passwd or replace a setuid binary with a malicious version. Systems not yet patched were compromised. The race had existed for ~9 years undetected. Because it is a logic race between two individually well-locked kernel paths rather than an unsynchronized memory access, neither lockdep nor a data race detector would have flagged it directly.

Modern Usage

Go race detector: enabled in all production CI pipelines for Go projects at major companies (Cloudflare, Datadog, etc.). Finding races in Go's standard library continues — dozens per year.
TSan in Chrome CI: Google's Chrome build infrastructure runs certain tests under TSan. Found 300+ races since 2011.
lockdep in syzbot: Google's syzbot (syzkaller fuzzer) runs Linux kernel fuzzing with CONFIG_PROVE_LOCKING=y + KASAN + KCSAN. Reports all lock-ordering violations found during fuzzing to LKML automatically.
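KCSAN (Kernel Concurrency Sanitizer): the mainline alternative to KTSAN, merged in Linux 5.8. It is a sampling-based data race detector built on compiler instrumentation and software watchpoints: it briefly delays a sampled access and reports any conflicting access that arrives in the meantime. It is a debug-kernel tool for now, but is progressing toward broader use.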

Future Directions

KCSAN maturation: KCSAN is the path toward production-viable kernel data race detection. As its overhead decreases and false positive rate drops, it may become suitable for staging kernels.
Formal verification of concurrent algorithms: Spin model checker, TLA+ model checking for concurrency — increasingly used to prove correctness of distributed protocols and lock-free data structures before implementation.
Hardware race detection (Intel TSX + watchpoints): using hardware transactional memory as a race detector — transactional regions that conflict with concurrent accesses abort instead of racing.

Exercises

TSan in CI: Take an existing multi-threaded C++ or Go program. Add a deliberate data race (concurrent access to an unprotected global). Enable TSan in your build. Write a test that reliably triggers the race (run 1000 iterations with 2 threads). Verify TSan catches it and the report points to the correct source lines.
lockdep experiment: On a Linux kernel development VM with CONFIG_PROVE_LOCKING=y, write a kernel module that acquires two locks in inconsistent order (lock A then B in one path, lock B then A in another path). Load the module. Exercise both paths. Observe the lockdep warning in dmesg. Decode the dependency chain.
Dirty COW reproduction: (In a VM with an unpatched Linux kernel ≤4.8.2) Reproduce the Dirty COW exploit to write to a read-only file. Analyze the race condition in the kernel source (mm/gup.c: faultin_page() and follow_page_pte(), plus the COW fault path in mm/memory.c). After reproducing, patch the kernel source with the fix and verify the exploit no longer works.
lock contention analysis: On a production-similar system running a multithreaded application (e.g., nginx with multiple worker threads), enable CONFIG_LOCK_STAT=y and exercise the system under load. Read /proc/lock_stat and identify the top-3 most-contended locks. For each, explain why the lock is contended and propose an alternative design (lock-free, sharded locks, etc.) that would reduce contention.
Go concurrent map race: Write a Go program that uses a map[string]int from 10 goroutines simultaneously (reading and writing). Run it 1000 times without -race. Does it ever panic? Run it once with -race. How quickly does it detect the race? Now fix it using sync.Map or a mutex-protected wrapper. Verify -race shows no errors.

References

Serebryany, Konstantin and Iskhodzhanov, Timur. "ThreadSanitizer — data race detection in practice." WBIA 2009.
Savage, Stefan et al. "Eraser: A Dynamic Data Race Detector for Multithreaded Programs." ACM TOCS 1997.
Molnár, Ingo. "Lockdep: The kernel lock validator." LWN.net, 2006. https://lwn.net/Articles/185666/
Linux Kernel Documentation: Documentation/locking/lockdep-design.rst
CVE-2016-5195 (Dirty COW) analysis: https://dirtycow.ninja/
Elver, Marco. "KCSAN: The Kernel Concurrency Sanitizer." LWN.net, 2019.
Go Data Race Detector: https://golang.org/doc/articles/race_detector
ThreadSanitizer project: https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm
Lamport, Leslie. "Time, Clocks, and the Ordering of Events in a Distributed System." CACM 1978.