08 — Transaction Isolation Levels

Technical Overview

Transaction isolation levels define the degree to which one transaction can see uncommitted changes from other concurrent transactions. The ANSI SQL-92 standard defined four isolation levels — READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE — and characterized them by the anomalies they allow or prevent. In practice, actual implementations using MVCC provide stronger guarantees than ANSI SQL-92 implies, while simultaneously allowing anomalies (like write skew) that the standard did not anticipate.

Understanding isolation levels is critical for building correct concurrent applications. The choice of isolation level is a tradeoff between correctness and performance. Getting it wrong in either direction — choosing too strong an isolation level and serializing all transactions, or too weak and allowing data anomalies — both cause real production incidents.

Prerequisites

Understanding of MVCC (how version chains and snapshots work)
Familiarity with lock-based concurrency (shared and exclusive locks)
Knowledge of transaction commit/abort semantics
Basic understanding of concurrent programming and race conditions

Core Content

ANSI SQL Isolation Levels and Anomalies

The ANSI SQL-92 standard defines isolation levels by the anomalies they prevent:

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read	Write Skew	Lost Update
READ UNCOMMITTED	Allowed	Allowed	Allowed	Allowed	Allowed
READ COMMITTED	Prevented	Allowed	Allowed	Allowed	Allowed
REPEATABLE READ	Prevented	Prevented	Allowed*	Allowed**	Prevented
SERIALIZABLE	Prevented	Prevented	Prevented	Prevented	Prevented

PostgreSQL REPEATABLE READ prevents phantom reads (stronger than ANSI requires, due to snapshot semantics) *Write skew is not in the ANSI SQL-92 standard; it was identified later as an anomaly allowed by snapshot isolation

Anomaly Definitions

Dirty Read: Transaction T2 reads a row written by T1 that has not yet committed. If T1 aborts, T2 has read data that never existed.

T1: UPDATE accounts SET balance = balance - 100 WHERE id=1;  -- not committed
T2: SELECT balance FROM accounts WHERE id=1;                  -- sees T1's uncommitted change
T1: ROLLBACK;
T2 has now acted on a balance that was never real.

Non-Repeatable Read: T1 reads a row, T2 modifies and commits that row, T1 reads the same row again and gets a different value.

T1: SELECT salary FROM employees WHERE id=42;  -- returns 50000
T2: UPDATE employees SET salary=60000 WHERE id=42; COMMIT;
T1: SELECT salary FROM employees WHERE id=42;  -- returns 60000 (changed!)

Phantom Read: T1 executes a range query, T2 inserts a row matching that range and commits, T1 re-executes the same range query and gets additional rows.

T1: SELECT * FROM orders WHERE amount > 1000;   -- returns 5 rows
T2: INSERT INTO orders (id, amount) VALUES (99, 2000); COMMIT;
T1: SELECT * FROM orders WHERE amount > 1000;   -- returns 6 rows (phantom!)

Write Skew: Two transactions read overlapping data, both decide to write (non-overlapping writes), and the result violates a constraint that should have prevented the write.

Constraint: At least one doctor must be on-call at all times.
Both doctors see 2 on-call → both go off-call → 0 on-call (violated!)
(Detailed example in 04-mvcc.md)

Lost Update: Two transactions both read a value, both compute updates, and the second transaction's write overwrites the first transaction's write.

T1: x = SELECT balance FROM accounts WHERE id=1;  -- x=100
T2: x = SELECT balance FROM accounts WHERE id=1;  -- x=100
T1: UPDATE accounts SET balance = 100+50 WHERE id=1;  -- balance=150; COMMIT
T2: UPDATE accounts SET balance = 100+30 WHERE id=1;  -- balance=130; COMMIT
Result: 130. T1's deposit of 50 was lost!

MVCC-Based Implementation

Modern databases implement isolation levels via MVCC snapshots rather than pure locking:

READ COMMITTED (PostgreSQL default for BEGIN without explicit level): - Each SQL statement gets a fresh snapshot. - Sees all committed data as of the start of that statement. - Non-repeatable reads are possible: two SELECTs within the same transaction can see different data if a concurrent transaction committed between them.

REPEATABLE READ (InnoDB default! PostgreSQL: must specify explicitly): - The entire transaction sees a single snapshot taken at the start of the first query. - Sees only data committed before the transaction began. - In PostgreSQL (but not ANSI SQL): also prevents phantoms, because the snapshot is taken at transaction start, not statement start.

SERIALIZABLE (implemented via SSI in PostgreSQL since 9.1): - Equivalent to transactions executing one at a time in some serial order. - Prevents all anomalies including write skew.

PostgreSQL snapshot timing:

READ COMMITTED:
  T1: BEGIN;
  T1: SELECT ...  (snapshot S1 taken here)
  T1: SELECT ...  (new snapshot S2 taken here — may see T2's commits)
  T1: COMMIT;

REPEATABLE READ:
  T1: BEGIN;
  T1: SELECT ...  (snapshot S1 taken here)
  T1: SELECT ...  (snapshot S1 reused — never sees T2's commits)
  T1: COMMIT;

SELECT FOR UPDATE and Locking

SELECT FOR UPDATE acquires an exclusive row-level lock, converting an MVCC read into a locking read. This is the mechanism for preventing lost updates and implementing pessimistic concurrency control on top of an MVCC system.

-- Safe "read-modify-write" pattern:
BEGIN;
SELECT balance FROM accounts WHERE id=1 FOR UPDATE;  -- acquires exclusive row lock
-- Now no other transaction can modify this row until our transaction ends
UPDATE accounts SET balance = balance + 100 WHERE id=1;
COMMIT;

SELECT FOR SHARE (SELECT ... LOCK IN SHARE MODE in MySQL) acquires a shared lock. Prevents concurrent writes but allows concurrent readers.

SELECT FOR UPDATE SKIP LOCKED skips rows that are already locked by other transactions. Used for implementing work queues (a worker takes the next unlocked job).

SELECT FOR UPDATE NOWAIT fails immediately with an error rather than waiting if the row is locked.

Serializable Snapshot Isolation (SSI)

Snapshot Isolation (SI) allows write skew. Making SI truly serializable requires detecting and aborting transactions that form a "dangerous structure." Cahill, Röhm, and Fekete (2008) identified that all SI anomalies involve a specific pattern in the "serialization graph."

The key insight: under SI, a transaction T reads some version V1 and writes V2. If another transaction T' reads V2 and writes something that "would have" affected V1, we have a cycle. SERIALIZABLE isolation requires aborting one of the transactions in such cycles.

PostgreSQL SSI (since 9.1): PostgreSQL's SSI (READ WRITE transactions at REPEATABLE READ level become SERIALIZABLE) tracks: - rw-anti-dependency edges: T1 read a version that T2's write supersedes (T1 reads old, T2 writes new) - Serialization conflict: When two rw-anti-dependency edges form a cycle

Implementation uses SIREAD locks (predicate locks) that record which ranges a transaction has read. These are not blocking locks — they only track dependencies. The predicate.c module in PostgreSQL manages SIREAD locks.

/* src/backend/storage/lmgr/predicate.c */
/* CheckForSerializableConflictOut -- check for rw-conflict on a
 * tuple being read that was modified by another transaction */
void CheckForSerializableConflictOut(bool visible, Relation relation,
                                      HeapTuple htup, Buffer buffer,
                                      Snapshot snapshot)

When a dangerous cycle is detected, one of the transactions is aborted with ERROR: could not serialize access due to read/write dependencies among transactions. The application must retry the transaction.

MySQL InnoDB Default: REPEATABLE READ with Gap Locks

MySQL InnoDB's default isolation level is REPEATABLE READ (not READ COMMITTED, as PostgreSQL's is). This is stronger than most applications need and has implications for lock behavior.

InnoDB's REPEATABLE READ uses gap locks and next-key locks to prevent phantom reads in some (but not all) cases:

Record lock: Locks a specific row.
Gap lock: Locks the gap between index records (prevents INSERT into the gap). Gap locks don't lock existing records.
Next-key lock: A combination of a record lock and a gap lock on the gap before the record. Default lock type in InnoDB.

Index: [10, 20, 30, 40]

Gap lock on gap (20, 30):  prevents INSERT of 21-29
Next-key lock on record 30: locks the row 30 AND the gap (20, 30)

Transaction T1:
  SELECT * FROM t WHERE id > 15 FOR UPDATE;
  -> Acquires next-key locks on (20, gap(10,20)), 30, 40, and the supremum gap
  -> Prevents concurrent INSERT of any id > 15 until T1 commits

Transaction T2:
  INSERT INTO t VALUES (25);  -- BLOCKED by T1's gap lock

Gap locks in InnoDB can cause surprising deadlocks when two transactions try to insert into the same gap from different rows.

Practical Advice on Isolation Level Selection

READ COMMITTED is appropriate for: - Reporting queries that don't need a consistent view across multiple statements - High-throughput OLTP where lock contention matters - When your application is designed around single-statement atomicity

REPEATABLE READ is appropriate for: - Multi-statement transactions that read the same data multiple times - Financial transactions where consistency within a transaction is critical - Any transaction that performs a check-then-act pattern (but beware write skew!)

SERIALIZABLE is appropriate for: - Check-then-act patterns that would be vulnerable to write skew - Complex financial operations involving multiple accounts - Any case where write skew could violate business invariants

Never use READ UNCOMMITTED in application code. Dirty reads produce logically inconsistent results. The performance difference between READ UNCOMMITTED and READ COMMITTED is negligible (a few percent) — not worth the correctness risk.

Isolation Bugs in Production

The banking double-spend: Two concurrent withdrawal requests, both checking IF balance >= amount THEN deduct. Under READ COMMITTED with application-level checks, both can pass the check before either commits, both withdraw, and balance goes negative. Fix: SELECT FOR UPDATE or a CHECK constraint at the database level.

The inventory race: Two users simultaneously "reserve" the last item. Both check stock > 0, both see 1, both decrement to 0, stock goes to -1. Fix: Use UPDATE inventory SET stock = stock - 1 WHERE item_id = ? AND stock > 0 and check the affected row count.

The TOCTOU booking: A hotel reservation system checks room availability then inserts a booking in two separate queries without FOR UPDATE. Two concurrent bookings for the same room can both pass the availability check. Fix: unique constraint + retry on conflict, or SELECT FOR UPDATE on the room record.

Historical Context

The ANSI SQL-92 standard's isolation levels were a compromise that attempted to describe existing implementations (notably IBM DB2) rather than defining a theoretically clean model. Berenson, Bernstein, Gray, Melton, O'Neil, and O'Neil's 1995 paper "A Critique of ANSI SQL Isolation Levels" demonstrated that the ANSI definitions were ambiguous and incomplete, and introduced the "snapshot isolation" level that most MVCC systems actually implement.

The write skew anomaly was first characterized in this 1995 paper, exposing that snapshot isolation allows an anomaly the ANSI standard failed to classify. This motivated research into Serializable Snapshot Isolation (Cahill et al., 2008; Fekete et al., 2005) and its eventual integration into PostgreSQL 9.1 by Kevin Grittner and Dan Ports.

Production Examples

PostgreSQL SSI: src/backend/storage/lmgr/predicate.c (>5000 lines). GetSerializableTransactionSnapshot() initializes SSI tracking. CheckForSerializableConflictOut() and CheckForSerializableConflictIn() detect anti-dependency edges. FlagSxactUnsafe() marks a transaction for potential abort.

InnoDB isolation levels: storage/innobase/trx/trx0trx.cc. trx_set_rw_mode() and isolation level management. Next-key locking in storage/innobase/lock/lock0lock.cc.

CockroachDB: Implements serializable isolation (no weaker levels available to users). Uses write intents (a special lock record in RocksDB MVCC) and timestamp uncertainty intervals (HLC timestamps) to detect conflicts without a central lock manager.

Debugging Notes

Identify current isolation level: SHOW transaction_isolation; (PostgreSQL) or SELECT @@transaction_isolation; (MySQL).
Detect serialization failures: In PostgreSQL, ERROR: could not serialize access due to read/write dependencies indicates an SSI abort. Implement retry logic in application.
InnoDB lock wait debugging: SELECT * FROM information_schema.INNODB_LOCK_WAITS; and SELECT * FROM information_schema.INNODB_LOCKS; (MySQL 5.7). In MySQL 8.0: SELECT * FROM performance_schema.data_lock_waits;.
Deadlock detection: PostgreSQL: SET deadlock_timeout = '1s'; and check pg_stat_activity for waiting transactions. InnoDB: SHOW ENGINE INNODB STATUS\G shows the most recent deadlock.
Long-running transactions holding locks: SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;

Security Implications

Information leakage via READ UNCOMMITTED: Applications that accidentally use READ UNCOMMITTED can expose partially-written data from other transactions, including data that will eventually be rolled back. This violates data integrity and may expose sensitive data before it should be visible.
Time-of-check to time-of-use (TOCTOU): Race conditions in application code that check a condition then act on it without holding locks are TOCTOU vulnerabilities. Database isolation levels don't help if the check and action are in separate transactions. Always use SELECT FOR UPDATE for check-then-act patterns.
Serialization retries as DoS: Under SERIALIZABLE isolation, an adversarial workload can trigger excessive serialization failures, causing legitimate transactions to abort repeatedly. Production systems should implement exponential backoff for retried transactions.
Gap lock exploitation (InnoDB): An adversarial transaction can hold gap locks on a range and prevent concurrent legitimate inserts from completing, causing timeout-based DoS. This is mitigated by innodb_lock_wait_timeout.

Performance Implications

SERIALIZABLE overhead (PostgreSQL SSI): ~5-15% overhead vs REPEATABLE READ in typical OLTP workloads due to SIREAD lock tracking. In write-skew-prone workloads, the overhead of tracking + aborting + retrying is workload-dependent.
Gap locks (InnoDB REPEATABLE READ): Gap locks cause significantly more lock contention than READ COMMITTED's row-level locks only. High-concurrency INSERT workloads often benefit from switching to READ COMMITTED.
READ COMMITTED reduces lock wait: Under READ COMMITTED, statement-level snapshots mean readers don't hold long-lived MVCC snapshots, reducing the need for VACUUM (PostgreSQL) and undo log retention (InnoDB).
FOR UPDATE lock granularity: PostgreSQL SELECT FOR UPDATE acquires locks on individual tuples. For queries returning many rows, this can create thousands of lock entries. Use NOWAIT or SKIP LOCKED to avoid queueing.

Failure Modes

Serialization cascade: Under high contention with SERIALIZABLE isolation, transactions abort each other in a cascade. Implement retry with exponential backoff. Limit the number of retries before returning an error to the user.
Phantom-based bugs at REPEATABLE READ in MySQL: InnoDB's REPEATABLE READ prevents some but not all phantom scenarios (gap locks prevent phantom reads, but gap locks are not taken for all query patterns). Test phantom scenarios explicitly.
InnoDB deadlock from gap locks: Transaction A holds a gap lock on (10, 20), trying to also lock (20, 30). Transaction B holds a gap lock on (20, 30), trying to also lock (10, 20). Deadlock. MySQL detects this and rolls back the smaller transaction. Implement retry logic.
Snapshot isolation write skew (silent correctness bug): The most insidious failure mode. The application appears to work correctly under low concurrency but produces logically inconsistent results under high concurrency. Reproduce in testing with concurrent load, or upgrade to SERIALIZABLE.

Modern Usage

Most production OLTP applications use READ COMMITTED. PostgreSQL's default. This works because most application-level transactions are short and single-statement, and application developers often add SELECT FOR UPDATE where needed.

Vitesse / Neon / PlanetScale: Cloud-native serverless databases default to READ COMMITTED for connection pooler compatibility (connection poolers typically operate at statement granularity, requiring READ COMMITTED semantics).

CockroachDB's forced SERIALIZABLE: By providing only SERIALIZABLE isolation, CockroachDB forces application developers to handle serialization retries, but eliminates an entire class of correctness bugs. Their documentation provides retry wrappers for popular ORMs.

Future Directions

Formal verification of isolation: Tools like Elle (Jepsen's test framework) and Hermitage systematically test database isolation levels for anomalies. Production databases are increasingly tested with these tools in CI.
Weak isolation for performance-consistency tradeoffs: Research into "invariant-based" isolation (Balegas et al., 2015 — "Putting Consistency Back into Eventual Consistency") where the isolation level is chosen based on the invariants each transaction must preserve.
Hardware transactional memory: Intel TSX and IBM Power's hardware transactional memory allow speculative execution of transactions with hardware-level conflict detection. Prototypes have explored replacing software MVCC with HTM for very short OLTP transactions.

Exercises

Demonstrate dirty reads in MySQL: set Session 1 to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. In Session 2, start a transaction, update a row, but don't commit. In Session 1, read the row. Then rollback Session 2. Observe Session 1 had a dirty read.
Reproduce write skew at REPEATABLE READ: use the doctor on-call example. Implement it in PostgreSQL with REPEATABLE READ. Confirm write skew occurs. Switch to SERIALIZABLE and confirm one transaction aborts.
Implement retry logic for serialization failures: write a Python function that wraps a database transaction with retry-on-serialization-failure and exponential backoff.
Demonstrate gap lock deadlock in InnoDB: two concurrent sessions inserting into the same range. Observe the deadlock and verify MySQL rolls back one of them.
Measure READ COMMITTED vs REPEATABLE READ performance: using pgbench on PostgreSQL, run the default TPC-B-like benchmark at both isolation levels for 60 seconds. Compare TPS and identify if any queries behave differently.

References

ANSI/ISO SQL Standard: SQL-92 (ISO 9075:1992), Transaction Isolation Levels.
Berenson, H., et al. (1995). A Critique of ANSI SQL Isolation Levels. SIGMOD 1995.
Fekete, A., et al. (2005). Making Snapshot Isolation Serializable. ACM TODS, 30(2), 492–528.
Cahill, M. J., Röhm, U., & Fekete, A. D. (2008). Serializable Isolation for Snapshot Databases. SIGMOD 2008.
Ports, D. R. K., & Grittner, K. (2012). Serializable Snapshot Isolation in PostgreSQL. PVLDB 2012.
Bailis, P., et al. (2012). Probabilistically Bounded Staleness for Practical Partial Quorums. PVLDB 2012.
PostgreSQL transaction isolation documentation: https://www.postgresql.org/docs/current/transaction-iso.html