Skip to content

Transactional Memory

Overview

Transactional memory (TM) is a concurrency control mechanism that allows programmers to designate regions of code as "transactions" — atomic, isolated, and consistent blocks that either commit all their memory effects or abort and retry as a unit, similar to database transactions. The promise of transactional memory is compelling: replace complex lock hierarchies with intuitive atomic regions, eliminate deadlock by construction, and let the hardware or runtime system handle conflict detection. Hardware transactional memory (HTM) — notably Intel TSX and IBM's POWER HTM — brought this vision to commodity hardware. Reality proved more complicated: Intel TSX had significant bugs, correctness pitfalls, and limited capacity; software transactional memory (STM) added too much overhead for most workloads. Today, TM occupies an important but narrow niche, and the lessons from its partial failure inform modern concurrent programming.

Prerequisites

  • Understanding of CAS and atomic operations
  • Familiarity with lock-based synchronization
  • Basic understanding of database ACID properties
  • Knowledge of cache coherency (MESI)

Core Technical Content

Transactional Memory Concept

A transaction executes a sequence of reads and writes atomically:

// Without TM: explicit locking
pthread_mutex_lock(&account_lock);
account_a -= 100;
account_b += 100;
pthread_mutex_unlock(&account_lock);

// With TM: atomic region
__transaction_atomic {
    account_a -= 100;
    account_b += 100;
}
// Either both happen, or neither — no partial state visible

A transaction commits if no conflict is detected (no other transaction wrote to the same memory locations). It aborts if a conflict is detected and retries automatically (by default). The programmer does not write retry logic.

Properties of Transactions

Atomicity: All or nothing. If a transaction aborts, none of its writes become visible.

Isolation: A transaction's reads and writes appear to other transactions as if they happened at a single instant.

Consistency: (Weaker than databases) The program remains in a consistent state after each successful transaction. No partial states are visible.

Note: TM does NOT provide durability (no persistence to disk). This is a key difference from database transactions.

Hardware Transactional Memory: Intel TSX

Intel TSX (Transactional Synchronization Extensions) added two mechanisms in Haswell (2013):

RTM (Restricted Transactional Memory): Explicit transactional regions using XBEGIN/XEND/XABORT:

#include <immintrin.h>

retry:
    unsigned status = _xbegin();  // start transaction
    if (status == _XBEGIN_STARTED) {
        // transactional region
        shared_data++;
        _xend();  // commit
    } else {
        // aborted: status contains reason
        if (status & _XABORT_RETRY) goto retry;
        // fall back to lock
        pthread_mutex_lock(&fallback_lock);
        shared_data++;
        pthread_mutex_unlock(&fallback_lock);
    }

XBEGIN records a "fallback address" (taken on abort). XEND commits if no conflict. XABORT explicitly aborts with a reason code.

HLE (Hardware Lock Elision): Transparent optimization of existing LOCK-prefixed instructions. The LOCK prefix is "elided" — the CPU speculatively executes the critical section without acquiring the lock. If a conflict occurs, it falls back to the real lock. This requires zero code changes:

; Normal lock (non-transactional):
lock xadd [counter], eax

; HLE: speculative execution, no actual lock
xacquire lock xchg [mutex], eax  ; acquire hint: start transaction
...
xrelease lock xchg [mutex], eax  ; release hint: try to commit

Intel TSX Implementation Details

TSX uses the L1/L2 cache as the transactional buffer. Modified lines in the cache are kept speculative. At commit time, the coherency protocol checks that no other cache has the exclusive copy:

Transaction executing:
  CPU cache: [track read set] [track write set]
  Write set kept in L1/L2 cache as modified but uncommitted

Conflict detection:
  If another CPU requests exclusive access to a line in our read/write set:
    --> Transaction ABORTS

Commit:
  If XEND and no conflicts:
    --> Write set becomes globally visible atomically

RTM Abort Causes

A transaction can abort for many reasons: - Conflict: Another CPU wrote to a line in the read set, or read/wrote a line in the write set. - Capacity overflow: Read or write set exceeds L1/L2 cache capacity (typically ~32KB). - Non-transactional instruction: CPUID, PAUSE, system calls, I/O — any instruction that crosses a transaction boundary. - Interrupt/exception: Any hardware interrupt (timer, NMI, machine check) aborts the transaction. - Explicit XABORT: Programmer-initiated abort. - Debug/trap: Breakpoints, single-step, watchpoints. - XACQUIRE on locked mutex: The lock is actually held (contended case).

The TSX Fallback Problem

Because transactions can abort for many reasons beyond programmer control, every RTM use must have a fallback path:

unsigned status = _xbegin();
if (status == _XBEGIN_STARTED) {
    // Check if fallback lock is held (abort if so)
    if (fallback_lock_held) _xabort(0xff);
    // ... transactional work ...
    _xend();
} else {
    // MUST have a correct non-transactional fallback
    pthread_mutex_lock(&fallback_lock);
    // ... same work ...
    pthread_mutex_unlock(&fallback_lock);
}

The fallback path is serialized via a real lock. The TM only optimizes the common uncontended case.

Intel TSX Bugs and Deprecation

TSX has a troubled history:

Errata TSX bugs (Haswell, 2013): The initial Haswell TSX implementation had correctness bugs causing spurious aborts and incorrect behavior under specific microarchitectural conditions. Intel issued a microcode update that disabled TSX entirely on affected Haswell processors.

TAA (TSX Asynchronous Abort, CVE-2019-11135): A Meltdown-like speculative execution side channel via TSX's asynchronous abort mechanism. When a transaction aborts due to an asynchronous condition, the CPU may briefly expose data from other security domains (other VMs, the kernel). Mitigated by disabling TSX (tsx=off kernel parameter) or microcode updates.

TSX removal in Alder Lake (2021): Intel removed TSX from Alder Lake (12th generation) Core i3/i5/i7 desktop processors. TSX is retained only in some Xeon server processors. This effectively ends TSX as a general-purpose mechanism.

GCC __transaction_atomic removal: GCC's TM support (-fgnu-tm) was experimental and is no longer actively developed due to TSX's deprecation trajectory.

IBM POWER HTM

IBM's POWER8/9/10 architecture includes HTM with stronger properties than TSX: - Uses a dedicated transactional memory buffer separate from the cache. - Larger read/write set capacity. - No security vulnerabilities equivalent to TAA. - No HLE equivalent (RTM-style tbegin/tend/tabort instructions only).

POWER HTM is used in IBM's mainframe products and some high-end server workloads.

Software Transactional Memory (STM)

STM implements transactional semantics entirely in software, without hardware support. Every load/store inside a transaction goes through STM read/write barriers:

// STM pseudocode
TxBegin(&tx);
    int x = TxLoad(&tx, &shared_x);  // record in read set
    TxStore(&tx, &shared_y, x + 1);  // buffer in write set
status = TxCommit(&tx);
// Commit: validate read set, apply write set atomically
if (status == CONFLICT) retry;

STM overhead is enormous: every load/store requires a function call and metadata update. Typical STM adds 5-50x overhead to transactional code. STM is generally practical only for coarse-grained transactions (microseconds or longer) or as a research vehicle.

Notable STMs: - TinySTM (EPFL): Log-based STM with commit-time validation. - JVSTM (Clojure): Clojure's software transactional memory for atoms and refs. - GCC libitm: The GNU STM library, supporting the C++ TM TS.

Conflict Detection: Eager vs Lazy

Eager (optimistic) conflict detection: Detect conflicts as they occur during transaction execution. When a read/write is made, immediately check for conflicts. Aborts earlier, potentially wastes less work, but has higher overhead per operation.

Lazy (pessimistic) conflict detection: Buffer all reads and writes. Validate the entire read set at commit time. Higher chance of reaching commit but more wasted work if conflict is detected at commit.

Intel TSX uses eager detection: a cache line conflict (write to a read set line by another CPU) immediately causes abort.

Contention Management

When two transactions conflict, which one should abort and which should continue? Contention management policies:

  • Aggressive: The winner (commit) is the one that detects the conflict. The other retries. Fast but can cause starvation.
  • Polite (backoff): The aborting transaction backs off exponentially before retrying.
  • Karma: Tracks "karma points" — transactions that have been aborted many times accumulate karma and eventually get priority.
  • Timestamp-based: The older transaction wins (timestamp priority).

In RTM hardware, the hardware decides (usually: the transaction detecting the conflict wins). Contention management in STM is software-controlled.

Connection to Optimistic Concurrency Control

Database systems have used optimistic concurrency control (OCC, Kung & Robinson, 1981) for decades: 1. Read-phase: Execute transaction, record read/write set. 2. Validation-phase: Check that read set is still valid. 3. Write-phase: Apply writes atomically.

Hardware TM is essentially OCC in hardware, with the cache as the read/write set buffer. This is why transactional memory resonates naturally with database practitioners.

Modern databases like FoundationDB, CockroachDB, and Google Spanner use OCC-like MVCC (Multi-Version Concurrency Control) — logically equivalent to TM but without hardware support.

Nesting and Exception Handling

RTM supports nested transactions: XBEGIN inside an outer XBEGIN. The outer transaction is the "true" transaction; inner XEND simply decrements a nesting counter. An inner XABORT aborts the outermost transaction.

This makes composability straightforward: a transactional library function can use XBEGIN/XEND and it will correctly nest inside a caller's transaction.

STM nesting is more complex: many STMs use "closed nesting" (inner aborts roll back only inner writes) or "open nesting" (inner commits are immediately visible).

Historical Context

Herlihy and Moss proposed hardware transactional memory in "Transactional Memory: Architectural Support for Lock-Free Data Structures" (ISCA, 1993). The key insight was using cache coherency hardware for conflict detection rather than software tracking.

Shavit and Touitou proposed software transactional memory in "Software Transactional Memory" (PODC, 1995), showing TM could be implemented without hardware support.

Harris et al. proposed language-level TM with a retry operator in "Composable Memory Transactions" (2005), which influenced Haskell's STM.

Intel's TSX (Haswell, 2013) was the first widely-deployed commercial HTM. IBM had HTM in POWER6 (2007) but with limited API.

Production Examples

  • Linux kernel (TSX lock elision attempt): Linux 3.15-4.4 attempted to use TSX HLE for spinlock optimization (/arch/x86/include/asm/spinlock.h). Disabled due to TSX TAA vulnerability.
  • glibc PTHREAD_MUTEX_DEFAULT with HLE: Some glibc versions used HLE transparently for pthread_mutex_lock() on TSX-capable hardware. Disabled after TAA.
  • IBM DB2: Uses HTM for row-level lock elision in multi-version concurrency control.
  • Clojure STM: Clojure's ref, alter, and dosync implement software transactional memory as a first-class language feature.
  • Intel TBB (Threading Building Blocks): Uses TSX for concurrent container access optimization.

Debugging Notes

  • XABORT reason codes: The status returned by _xbegin() contains flags indicating why the transaction aborted. Log these to understand abort patterns.
  • XBEGIN success rate: Measure the ratio of successful transactions to total attempts. Low success rate (< 50%) indicates contention or capacity issues — reconsider using TM.
  • TSX perf events: perf stat -e tx-start,tx-commit,tx-abort,tx-conflict on Intel CPUs shows transaction statistics.
  • tsx=off: Kernel boot parameter to disable TSX entirely, applied automatically on vulnerable systems by the TSX TAA microcode mitigation.
  • Clojure STM: (dosync) block with (alter) throws LockingTransaction$RetryEx on abort — Clojure retries automatically.

Security Implications

  • CVE-2019-11135 (TSX Asynchronous Abort / TAA): Exploits the transactional abort mechanism as a Meltdown-like side channel. An attacking program can use TSX XBEGIN/abort cycles to leak data from other processes' private memory, including kernel memory and other VMs. The leak works via the CPU's L1 fill buffer. Fully mitigated only by disabling TSX or applying microcode + OS patches that flush microarchitectural buffers on context switches.
  • Transaction-based timing attacks: The timing of transaction commit vs abort can leak information about the read set. If an attacker can influence which cache lines are in the transaction, they can probe memory contents via commit/abort timing.
  • STM rollback and exception safety: In STM, a transaction abort rolls back writes. If C++ destructors fire on rollback (for stack-allocated objects), interactions between exception handling and TM are complex and potentially incorrect.

Performance Implications

  • Successful RTM transaction: ~10-30 cycles overhead vs a non-transactional version. Comparable to a spinlock acquire/release.
  • Failed RTM transaction: ~50-200 cycles (transaction setup + abort + fallback lock path). If abort rate is high, TM is slower than locking.
  • TSX capacity: Limited by L1/L2 size (~32-256KB). Transactions touching more data will abort due to capacity. Most useful for small, hot critical sections.
  • STM overhead: 5-50x over non-transactional code. Practical only for granular operations where the overhead is amortized.
  • TSX lock elision speedup: Benchmarks show 10-60% improvement for read-heavy workloads on TSX-capable hardware. Negated if contention is high.

Common Pitfalls

  1. No fallback path: Using RTM without a correct lock-based fallback. If the transaction aborts 100% of the time (e.g., due to a microcode update disabling TSX), the program hangs or corrupts data.
  2. Non-transactional code in transaction: Any system call, I/O, or non-transactional instruction inside an RTM region will immediately abort the transaction. Debug with XABORT reason code checking.
  3. Assuming commit = success: After XEND, the transaction is committed, but this doesn't mean the transaction was conflict-free — it means no conflict was detected. Rare hardware bugs could theoretically allow incorrect commits.
  4. STM with blocking operations: STM requires all writes to be revokable. Blocking operations (mutex lock, I/O) cannot be revoked. STM must either prohibit them or handle them specially (open nesting).
  5. TSX performance cliff: Performance that looks good in testing can cliff on production hardware due to:
  6. NUMA (remote access invalidates cache lines)
  7. OS interrupts (timer interrupt fires ~1000/sec and aborts any transaction running at that moment)
  8. Virtualization (VM exits abort transactions)

Real-World Failure Cases

glibc TSX mutex regression (2014): glibc 2.19 added transparent HLE optimization to pthread_mutex_lock(). On some workloads (notably, GCC bootstrap), this regressed performance by 30% due to high abort rates from cache pressure. The optimization was made conditional and eventually removed after TAA.

Intel TSX disable via microcode (2014/2019): In 2014, Intel silently disabled TSX via microcode update on affected Haswell processors due to correctness bugs. Users who had written RTM-dependent code (without proper fallbacks) found their programs taking the fallback path 100% of the time. This demonstrated the critical importance of always having a correct non-TM fallback.

CVE-2019-11135 (TAA): Disclosed November 2019. Intel Xeon processors using TSX were found to leak memory via the transactional abort side channel. The mitigation required disabling TSX on non-Xeon processors and flushing microarchitectural buffers (MDS mitigation) on context switches on Xeon. This had up to 20% performance impact on I/O-intensive workloads.

Modern Usage and Cloud-Scale

  • TSX in cloud: AWS, GCP, and Azure selectively expose TSX to VMs based on processor generation and security posture. Most modern instances run with TSX disabled due to TAA.
  • Database TM connection: MVCC in databases (PostgreSQL, MySQL InnoDB, Oracle) implements the same OCC concept in software at the row/page level, with undo logs serving as the write buffer and MVCC timestamps as conflict detection.
  • IBM Z (mainframe) HTM: IBM's mainframe processors continue to invest in HTM, using it for lock elision in the z/OS kernel and commercial RDBMS products.
  • RISC-V TM extension: A transactional memory extension (TM) is being standardized for RISC-V, learning from x86's TSX experience.

Future Directions

  • Persistent TM: Transactional memory for non-volatile memory (Optane DCPMM), combining atomicity with crash consistency. Active research area.
  • Speculative lock elision in RISC-V: The RISC-V TM extension design incorporates lessons from TSX bugs, aiming for a more reliable implementation.
  • TM and deterministic execution: TM can be combined with deterministic execution frameworks (where thread interleavings are predetermined) to provide both performance and reproducibility.
  • TM in managed runtimes: JVM and CLR could implement object-level TM using object headers and write barriers already present for GC, potentially with lower overhead than generic STM.

Summary Table

Mechanism Conflict detection Fallback Capacity Security Status
Intel TSX/RTM Cache coherency (eager) Lock required L1/L2 (~32-256KB) TAA (CVE-2019-11135) Deprecated (most CPUs)
Intel HLE Cache coherency Transparent L1/L2 TAA Deprecated
IBM POWER HTM Cache coherency Lock required Larger buffer No known CVE Active (Xeon-class)
Software STM Software log N/A Unlimited N/A Niche (Clojure, research)
GCC libitm Software log N/A Unlimited N/A Maintained, rarely used

Exercises

  1. RTM microbenchmark: On a TSX-capable Intel processor (Haswell through Ice Lake without microcode TSX disable), implement a lock-protected counter and an RTM-protected counter. Measure throughput at 1, 2, 4, 8 threads. Observe where RTM wins and where contention causes high abort rates.

  2. Abort reason logging: Write an RTM program that logs _xbegin() status codes. Intentionally cause each abort type: (a) explicit _xabort(), (b) system call inside transaction, (c) large read set (>32KB). Log and categorize the abort reasons.

  3. Clojure STM experiment: Write a Clojure bank transfer using ref and dosync. Add a (Thread/sleep 100) inside the transaction. Observe that Clojure's STM retries handle the artificial race correctly.

  4. TSX fallback correctness: Write an RTM implementation where the fallback path is deliberately buggy (e.g., missing a mutex unlock). Trigger the TSX to abort 100% of the time (by touching >32KB inside the transaction). Observe the bug manifest. Fix the fallback and verify.

  5. Simulate MVCC as STM: Implement a simple in-memory MVCC key-value store. Transactions read from a snapshot (timestamp). Writes go to a pending set. At commit, validate read set against committed timestamps. This simulates STM semantics using database-style OCC.

References

  • Herlihy, M. & Moss, J.E.B. (1993). "Transactional Memory: Architectural Support for Lock-Free Data Structures." ISCA '93.
  • Shavit, N. & Touitou, D. (1995). "Software Transactional Memory." PODC '95.
  • Harris, T., Marlow, S., Peyton Jones, S., & Herlihy, M. (2005). "Composable Memory Transactions." PPoPP '05.
  • Yoo, R.M. & Lee, H.H.S. (2008). "Adaptive Transaction Scheduling for Transactional Memory Systems." SPAA '08.
  • Intel Transactional Synchronization Extensions Programming Reference (Intel SDM)
  • CVE-2019-11135: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11135
  • Dice, D., Shalev, O., & Shavit, N. (2006). "Transactional Locking II." DISC '06.
  • Kung, H.T. & Robinson, J.T. (1981). "On Optimistic Methods for Concurrency Control." TODS 6(2):213-226.