Skip to content

12 — Locking and Concurrency Control

Technical Overview

Locking is the oldest and most fundamental approach to concurrency control in database systems. Before MVCC became dominant, all production databases used locking to serialize concurrent access to shared data. Even in MVCC-based systems, locking remains necessary for certain operations: DDL, certain DML patterns (SELECT FOR UPDATE), range predicate locking in serializable isolation, and auto-increment sequences.

Two-Phase Locking (2PL) is the theoretical foundation guaranteeing serializability. The InnoDB lock manager implements strict 2PL with next-key locking. PostgreSQL uses locking for heavyweight operations but relies on MVCC for regular DML. CockroachDB uses write intents as a form of distributed lock. Understanding the full locking spectrum — from row-level record locks to database-wide DDL locks — is essential for debugging deadlocks and diagnosing lock wait chains.

Prerequisites

  • Understanding of MVCC and transaction isolation levels
  • Familiarity with graph theory (directed graphs, cycles) for deadlock detection
  • Knowledge of transaction serializability concepts
  • Basic understanding of mutex/semaphore primitives

Core Content

Two-Phase Locking (2PL)

2PL invariant: A transaction must not acquire any new locks after it has released any lock. This divides transaction execution into two phases: - Growing phase: acquire locks as needed; never release any - Shrinking phase: release locks; never acquire new ones

Proof of serializability: In the conflict-serialization graph of a 2PL schedule, there can be no cycles. If transactions T1 and T2 conflict (one holds a lock the other wants), the waiting transaction's position in the serial order is determined by when it acquires the conflicting lock. No cycle → topological sort → serial order exists → serializable.

Strict 2PL: All exclusive (write) locks are held until transaction commit/abort. Prevents cascading aborts (if T1 modifies a row and T2 reads T1's modification before T1 commits, T2 must be rolled back if T1 aborts — a "cascading abort"). Strict 2PL is required in practice.

Rigorous 2PL (Strong Strict 2PL): All locks (shared AND exclusive) held until commit/abort. Used by some database systems to avoid even shared-lock cascading.

Lock Granularity

Databases support multiple lock granularities. Coarser locks are faster to manage but cause more contention; finer locks reduce contention but increase overhead.

Granularity Use Case Lock Count
Database DDL operations (CREATE DATABASE) 1
Schema DDL operations (ALTER TABLE) 1
Table Full table scans, TRUNCATE 1
Page Rare; some legacy systems Many
Row/Tuple OLTP DML (UPDATE specific rows) Millions
Column Column-level updates (rare in RDBMS) Millions

A transaction performing UPDATE customers SET vip=TRUE might lock 10,000 rows. Each lock entry is ~50-100 bytes in InnoDB's lock manager. 10,000 rows × 80 bytes = 800KB of lock memory.

Intention Locks

When a transaction wants a fine-grained lock (row-level), it must first declare its intent at each ancestor granularity. This prevents a coarser-grained lock from being granted to another transaction while the fine-grained locks are held.

Intention lock types: - IS (Intention Shared): Intends to acquire S locks on some tuples - IX (Intention eXclusive): Intends to acquire X locks on some tuples - SIX (Shared + Intention eXclusive): Holds S on the table AND intends X on some tuples (e.g., reading a table while updating specific rows)

Lock Compatibility Matrix

Requesting  |         Held Lock
Lock        |  IS    IX   S    SIX   X
------------+---------------------------------
IS          |  OK    OK   OK   OK    NO
IX          |  OK    OK   NO   NO    NO
S           |  OK    NO   OK   NO    NO
SIX         |  OK    NO   NO   NO    NO
X           |  NO    NO   NO   NO    NO

Legend: OK = compatible (can grant immediately), NO = incompatible (must wait)

Row-level lock compatibility:
            |  S (shared)    X (exclusive)
------------+-----------------------------
S           |  OK            NO
X           |  NO            NO

Example: - T1 holds IX on Table T (wants to update some rows in T) - T2 requests S on Table T (wants to read all rows in T) - Compatibility: IX vs S = NO → T2 must wait - But T3 requests IS on Table T (wants to read some rows) → IS vs IX = OK → T3 can proceed

This allows T1 to update specific rows while T3 reads other specific rows, but prevents T2 from scanning the full table while T1 is updating.

Lock Manager Implementation

The lock manager maintains a lock table: a hash map from resource identifiers to lock lists.

Lock Table (hash by resource):

Resource: (table=5, row=42)
  +------------------------------------------+
  | Lock Entry: txn=100, mode=X, status=HELD |
  | Lock Entry: txn=101, mode=S, status=WAIT |
  | Lock Entry: txn=102, mode=S, status=WAIT |
  +------------------------------------------+

Resource: (table=5, row=17)
  +------------------------------------------+
  | Lock Entry: txn=103, mode=S, status=HELD |
  | Lock Entry: txn=104, mode=S, status=HELD |
  +------------------------------------------+

When T100 commits and releases its X lock on (table=5, row=42): 1. Find the lock entry list for the resource 2. Remove T100's lock entry 3. Check waiting transactions (T101, T102 both want S) 4. S + S = compatible → grant both T101 and T102 simultaneously 5. Wake up both waiting transactions

In InnoDB, the lock table is implemented in storage/innobase/lock/lock0lock.cc. The lock table is partitioned into LOCK_TABLE_SIZE (256 by default in older versions, more in recent) buckets, each protected by a mutex or read-write latch.

InnoDB lock data structures:

// storage/innobase/include/lock0types.h
struct lock_t {
    trx_t*      trx;          // Transaction that owns the lock
    UT_LIST_NODE_T(lock_t) trx_locks; // Link in transaction's lock list
    dict_index_t* index;      // Index for a record lock
    lock_t*     hash;         // Hash chain node
    union {
        lock_table_t tab_lock; // Table lock
        lock_rec_t   rec_lock; // Record lock
    };
    ib_uint32_t type_mode;    // Lock type and mode
};

Deadlock Detection

A deadlock occurs when a cycle forms in the wait-for graph: T1 waits for T2, T2 waits for T3, T3 waits for T1.

Wait-For Graph:

  T1 ----waits for----> T2
  ^                      |
  |                      v
  T4 <---waits for---- T3

  Cycle: T1 → T2 → T3 → T4 → T1  (deadlock!)

Cycle detection: InnoDB runs a background deadlock detector that traverses the wait-for graph periodically (or when a new wait edge is added). If a cycle is found, InnoDB selects a victim transaction (typically the one with the fewest locks held, to minimize rollback cost) and rolls it back with ERROR 1213: Deadlock found when trying to get lock.

Timeout-based deadlock detection: An alternative to cycle detection. If a lock wait exceeds innodb_lock_wait_timeout (default 50 seconds), InnoDB rolls back the waiting transaction. This handles deadlocks but also kills legitimate long-running transactions.

PostgreSQL's deadlock detector (src/backend/storage/lmgr/deadlock.c) uses a more sophisticated algorithm that also considers lock upgrading conflicts. It runs when a lock wait exceeds deadlock_timeout (default 1 second).

Optimistic Concurrency Control (OCC)

OCC (Kung & Robinson, 1981) assumes conflicts are rare and allows transactions to proceed without locking, validating at commit time.

Three phases: 1. Read phase: Execute transaction against a private workspace (no locking). Record read set and write set. 2. Validation phase: Check if the transaction's read set was modified by any concurrently committed transaction. If not, proceed; if yes, abort and restart. 3. Write phase: If validation passes, apply write set to the database.

OCC Example:
  T1 reads rows A, B (read_set={A,B}), writes C (write_set={C})
  T2 reads row D (read_set={D}), writes A (write_set={A})

  T1 validation: has any committed transaction modified {A, B}?
    T2 committed and wrote A → T2's write_set ∩ T1's read_set = {A} ≠ ∅
    → T1 must ABORT and RESTART

  If T2 had written only D (not in T1's read_set), T1 would have committed.

OCC excels when conflict rates are low (< 5-10%). For high-conflict workloads, OCC degrades because many transactions abort in the validation phase, wasting work. MVCC is usually preferred over OCC for databases because MVCC readers never conflict with writers.

Comparison: OCC vs 2PL vs MVCC

Property 2PL OCC MVCC
Readers block writers Yes (S blocks X) No No
Writers block readers Yes (X blocks S) No No (reads old version)
Deadlock possible Yes No Yes (for X locks)
Conflict detection At lock acquisition At commit At commit (write-write)
Wasted work on abort Low (immediate block) High (entire txn) Low
Best for High-conflict OLTP Low-conflict OLTP Mixed read/write OLTP

Predicate Locking

Standard record locking locks existing records. But what about preventing INSERTs into a range that a transaction has scanned (preventing phantom reads)?

Predicate locks lock a predicate (query condition) rather than specific records. Any INSERT that satisfies the predicate conflicts with the predicate lock.

T1: SELECT * FROM orders WHERE amount > 1000;
  → Acquires predicate lock on: amount > 1000

T2: INSERT INTO orders (id, amount) VALUES (99, 2000);
  → Conflicts with T1's predicate lock (2000 > 1000)
  → T2 must wait until T1 commits/aborts

Exact predicate locking is expensive to implement (requires checking every new insert/update against all active predicates). In practice, databases use approximations: - PostgreSQL SSI predicate locks: Use SIREAD locks covering entire pages or relations as approximations of row-level predicate locks. - InnoDB gap locks: Lock the gap between index records, preventing inserts into that gap.

Gap Locking (InnoDB)

InnoDB's REPEATABLE READ implementation uses gap locks to prevent phantom reads within transactions using SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE.

Index: [10, 20, 30, 40, 50]

T1: SELECT * FROM t WHERE id BETWEEN 20 AND 40 FOR UPDATE;

InnoDB acquires:
  Record lock on 20 (existing row)
  Record lock on 30 (existing row)
  Record lock on 40 (existing row)
  Gap lock on (20, 30) -- prevents INSERT of 21-29
  Gap lock on (30, 40) -- prevents INSERT of 31-39
  Gap lock on (10, 20) -- prevents INSERT of 11-19 (lower bound of range scan)

T2: INSERT INTO t VALUES (25);  -- BLOCKED by T1's gap lock on (20, 30)
T2: INSERT INTO t VALUES (50);  -- ALLOWED (50 not in T1's range)
T2: UPDATE t SET val=100 WHERE id=15;  -- BLOCKED (15 in gap (10,20))

Next-key lock: InnoDB's default lock type for most reads with FOR UPDATE. Combines a record lock on the row AND a gap lock on the gap before the row. For the record at key 30, the next-key lock covers the gap (20, 30] — all values from 21 to 30 inclusive.

Next-Key Locks for: SELECT * FROM t WHERE id >= 20 FOR UPDATE;
  Next-key lock on 20: locks (10, 20] — gap before 20 + record 20
  Next-key lock on 30: locks (20, 30] — gap before 30 + record 30
  ...
  Lock on supremum gap: locks (50, +∞) — prevents inserting after 50

Gap locks interact in non-obvious ways, causing deadlocks:

Deadlock scenario:
  T1: INSERT INTO t (id) VALUES (15);
    → Checks gap lock on (10, 20): no lock held → proceeds
    → Waits to acquire INSERT intention lock on gap (10, 20)
  T2: INSERT INTO t (id) VALUES (25);
    → Checks gap lock on (20, 30): no lock held → proceeds
    → Waits to acquire INSERT intention lock on gap (20, 30)
  T1: Also needs to check (20, 30) gap → conflicts with T2 → DEADLOCK

Isolation Level Effect on Locking

Different isolation levels affect which locks are acquired:

READ COMMITTED (InnoDB): - Only record locks; no gap locks - Locks released after each statement (not held to end of transaction) - Greatly reduces lock contention and deadlock probability

REPEATABLE READ (InnoDB default): - Record + gap + next-key locks - All locks held until transaction end - Higher contention, more deadlocks

SERIALIZABLE (InnoDB and PostgreSQL SSI): - Full predicate locking (PostgreSQL: SIREAD locks; InnoDB: all SELECT operations implicitly get LOCK IN SHARE MODE behavior) - Highest contention, lowest anomalies

OCC in Practice: Optimistic vs Pessimistic for Specific Patterns

InnoDB uses optimistic concurrency for INSERT operations (most inserts don't conflict) and pessimistic for UPDATE/DELETE (must lock existing rows). This hybrid is common in production systems.

InnoDB "optimistic" INSERT:
  1. Locate the target page
  2. Check if another transaction holds a conflicting gap lock
  3. If no conflict: insert without waiting
  4. If conflict: fall back to pessimistic path (wait for lock)

Compare with "pessimistic" UPDATE:
  1. Locate target row
  2. Acquire exclusive record lock (wait if conflicting lock held)
  3. Modify row under lock
  4. Lock held until transaction end

Lock Overhead and Lock Escalation

In some systems (SQL Server), if a transaction acquires too many row-level locks, the lock manager automatically "escalates" to a table-level lock to reduce lock manager memory overhead. This can cause unexpected blocking:

Lock Escalation (SQL Server):
  T1 acquires 5000 row locks → lock escalation threshold → upgrades to TABLE LOCK
  T2 wants any row lock on same table → blocked by T1's table lock

InnoDB does not have lock escalation — it can hold millions of row-level locks (bounded only by innodb_buffer_pool_size for lock memory). PostgreSQL does not have row-level locks in the traditional sense for regular DML (MVCC handles this); it acquires lightweight "buffer locks" (latches) for page access, not long-held application locks.

Historical Context

Two-phase locking was first formalized by Eswaran, Gray, Lorie, and Traiger at IBM Research in their 1976 paper "The Notions of Consistency and Predicate Locks in a Database System." This paper defined the theoretical foundation of locking-based serializability and remains a key reference.

Optimistic concurrency control was proposed by Kung and Robinson in 1981 as an alternative for low-conflict workloads, with the insight that locking overhead is unnecessary when conflicts are rare.

MVCC largely superseded lock-based concurrency for reads in the 1990s-2000s, but locking remains essential for write conflicts. Modern systems like PostgreSQL use a hybrid: MVCC for readers, locks for write-write conflicts and DDL.

Production Examples

InnoDB lock manager: storage/innobase/lock/lock0lock.cc. lock_rec_add_to_queue() adds a record lock to the lock table. lock_rec_has_to_wait() checks compatibility. lock_deadlock_check_and_resolve() runs the deadlock detection cycle.

PostgreSQL lock manager: src/backend/storage/lmgr/lock.c. LockAcquire() is the main lock acquisition function. DeadLockCheck() in deadlock.c implements cycle detection. LockWaitError() returns the deadlock error to the application. pg_locks view exposes all current lock wait chains.

CockroachDB write intents: pkg/storage/intentresolver/intent_resolver.go. Write intents in RocksDB are resolved by the intent resolver. LockTableGuard in pkg/kv/kvserver/concurrency/ tracks locks per transaction.

Debugging Notes

  • PostgreSQL lock waits: SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE state='active' AND wait_event_type='Lock'; shows all transactions currently blocked on locks.
  • PostgreSQL lock chains: SELECT blocked_locks.pid, blocked_activity.query, blocking_locks.pid, blocking_activity.query FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_locks.pid = blocked_activity.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.transactionid = blocked_locks.transactionid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_locks.pid = blocking_activity.pid WHERE NOT blocked_locks.granted;
  • InnoDB deadlock log: SHOW ENGINE INNODB STATUS\G shows the last deadlock. SET GLOBAL innodb_print_all_deadlocks=ON; logs all deadlocks to the error log.
  • InnoDB lock waits: SELECT * FROM performance_schema.data_lock_waits; and SELECT * FROM performance_schema.data_locks; (MySQL 8.0) show all current lock waits.
  • InnoDB transactions with locks: SELECT * FROM information_schema.INNODB_TRX ORDER BY trx_lock_structs DESC; — high trx_lock_structs indicates transactions holding many locks.

Security Implications

  • Lock-based DoS: An attacker (or runaway query) holding a table-level lock (e.g., via LOCK TABLE t IN ACCESS EXCLUSIVE MODE in PostgreSQL) blocks all DML and DDL on that table. Set lock_timeout = '5s' to prevent lock wait indefinitely.
  • Lock escalation exploitation: In systems with lock escalation (SQL Server), a user with INSERT rights on a table can cause a table-level lock by inserting many rows in a single transaction, blocking other users. This is a privilege escalation-style availability attack.
  • Gap lock starvation: A long-running transaction holding a large range of gap locks can prevent inserts for the entire range's duration. This is an unintentional but exploitable DoS pattern in InnoDB REPEATABLE READ.
  • Deadlock as availability attack: Deliberately constructing transactions that create deadlocks forces the database to roll back victim transactions. While the database recovers, the application may retry infinitely, creating a resource exhaustion loop.

Performance Implications

  • Lock granularity choice: Row-level locking (InnoDB default) minimizes contention but uses more memory. For bulk operations (UPDATE 1M rows), acquiring 1M row locks is slower than a table lock. Use LOCK TABLE explicitly for batch operations, then release.
  • Isolation level and lock volume: READ COMMITTED holds locks briefly (per-statement) and acquires only record locks (no gap locks). Switching from REPEATABLE READ to READ COMMITTED can dramatically reduce deadlocks in INSERT-heavy workloads.
  • OCC overhead for low-conflict workloads: For workloads where conflicts are rare (<1%), OCC eliminates lock acquisition overhead. For high-conflict workloads, aborted transactions waste significant CPU and I/O.
  • Lock memory: InnoDB allocates lock memory from the buffer pool. SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Innodb_row_lock_current_waits'; and Innodb_row_lock_waits provide wait statistics.

Failure Modes

  1. Deadlock cascade: In a microservices architecture, each service holds a connection with open transactions. A deadlock in one connection causes a retry, which may create new deadlocks if the retry logic does not break the cycle. Implement exponential backoff for lock-related errors.
  2. Lock wait queue starvation: With many transactions waiting for the same lock, FIFO ordering is not always guaranteed. Some transactions at the back of the queue may starve. Implement application-level timeouts on operations.
  3. Lock memory exhaustion: If a single transaction acquires millions of row locks without releasing them, InnoDB's lock memory grows until it is limited by the buffer pool. This can cause out of memory errors in the lock manager. Set max_execution_time or innodb_lock_wait_timeout to limit runaway transactions.
  4. Phantom lock upgrade: A transaction that reads a row (acquiring S lock) and then wants to write it (needs X lock) must upgrade S → X. If another transaction also holds S on the same row, neither can upgrade → deadlock. Fix: always acquire X lock immediately if write is anticipated (SELECT FOR UPDATE).

Modern Usage

Modern OLTP databases use locking for write-write conflicts but MVCC for read-write interactions. The boundaries are: - Regular DML (INSERT/UPDATE/DELETE): MVCC creates new versions; X locks prevent concurrent conflicting writes - DDL (ALTER TABLE, CREATE INDEX): Table-level locks (ACCESS EXCLUSIVE in PostgreSQL) required; CONCURRENTLY variants use weaker locks - Advisory locks: PostgreSQL pg_advisory_lock(key) provides application-level cooperative locking outside the standard lock manager, used for distributed leader election and cross-process mutual exclusion

Distributed locking: In distributed systems (microservices, Kubernetes), database advisory locks are a common implementation of distributed mutual exclusion: SELECT pg_advisory_lock(42); on a shared PostgreSQL instance provides a cluster-wide mutex. This is simpler than Zookeeper/etcd for simple coordination needs.

Future Directions

  • Hardware transactional memory (HTM): Intel TSX and ARM TLE provide hardware-assisted optimistic concurrency: transactions execute speculatively; hardware detects conflicts and rolls back. Database implementations using HTM (e.g., H-Store with TM) achieve near-zero locking overhead for short transactions that don't conflict.
  • RDMA-based lock managers: Disaggregated locking (PolarDB, POLARFS) implements the lock manager on a dedicated DPU or remote memory, allowing lock acquisition without involving the storage server's CPU.
  • Formal verification of lock protocols: Tools like TLA+ and Alloy are used to formally verify deadlock freedom and serializability of lock protocols in new distributed database designs. CockroachDB uses TLA+ specs for some of its concurrency protocols.

Exercises

  1. Reproduce a deadlock in MySQL InnoDB: two concurrent transactions, each inserting into gaps in a table with REPEATABLE READ isolation. Observe the ERROR 1213: Deadlock found and examine SHOW ENGINE INNODB STATUS for the deadlock trace.
  2. Implement the 2PL algorithm in Python: a lock manager with acquire_lock(txn, resource, mode) and release_lock(txn, resource). Implement strict 2PL by preventing lock release until commit/abort. Test with a schedule that would cause a non-repeatable read without locking.
  3. Demonstrate lock compatibility: in PostgreSQL, use SELECT pg_sleep(30) FROM t WHERE id=1 FOR UPDATE to hold an exclusive lock. Then in another session, verify that SELECT ... FOR SHARE blocks while SELECT (without FOR) succeeds (because MVCC provides a snapshot read).
  4. Implement OCC in Python: a key-value store where read(key) records to read_set, write(key, value) records to write_set, and commit() validates that no other committed transaction has modified any key in read_set since the transaction started.
  5. In PostgreSQL, use pg_locks to observe the full lock hierarchy for a transaction doing a table scan for update: identify the AccessShareLock on the table, the RowExclusiveLock, and any tuple-level locks.

References

  • Eswaran, K. P., Gray, J. N., Lorie, R. A., & Traiger, I. L. (1976). The Notions of Consistency and Predicate Locks in a Database System. Communications of the ACM, 19(11), 624–633.
  • Kung, H. T., & Robinson, J. T. (1981). On Optimistic Methods for Concurrency Control. ACM TODS, 6(2), 213–226.
  • Bernstein, P. A., Hadzilacos, V., & Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison-Wesley. (Free online at https://www.microsoft.com/en-us/research/people/philbe/)
  • Gray, J., & Reuter, A. (1992). Transaction Processing: Concepts and Techniques. Morgan Kaufmann. Chapter 7 (Locking).
  • Mohan, C., et al. (1990). ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes. VLDB 1990.
  • InnoDB locking documentation: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html
  • PostgreSQL lock monitoring: https://www.postgresql.org/docs/current/view-pg-locks.html