09 — Indexing Strategies

Technical Overview

An index is a separate data structure that provides an alternative access path to data, trading storage space and write overhead for faster reads. The right index can reduce a query from a full table scan (O(n)) to a single page fetch (O(log n) or O(1)), transforming queries from hours to milliseconds. Conversely, over-indexing creates write overhead that degrades insert/update/delete throughput and bloats storage.

PostgreSQL supports more index types than any other major RDBMS: B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and Bloom. Each serves a distinct purpose. Understanding when to use each, and how they interact with the query planner, is a core DBA and database engineer competency.

Prerequisites

Understanding of B+ trees (see 02-b-trees-and-b-plus-trees.md)
Familiarity with MVCC and tuple visibility
Basic knowledge of PostgreSQL's query planner and cost model
Knowledge of Bloom filters (probabilistic set membership)

Core Content

Index Access Methods in PostgreSQL

PostgreSQL exposes index types through the Access Method API (pg_am catalog table). Each AM implements a set of functions (strategy numbers) for comparisons, and operators that can use the index:

SELECT amname, amtype FROM pg_am;
-- Results: btree, hash, gist, gin, spgist, brin, bloom (if installed)

B-Tree Index

The default and most general index. Supports all comparison operators (=, <, >, <=, >=, BETWEEN, LIKE 'prefix%'). Implemented as a B+ tree. See 02-b-trees-and-b-plus-trees.md for detailed internals.

CREATE INDEX idx_employees_lastname ON employees(last_name);
-- Uses for: = equality, < > range, LIKE 'Smith%' prefix match
-- Does NOT use for: LIKE '%Smith', full-text search, geometric queries

Hash Index

A hash-based index supporting only equality (=) lookups. PostgreSQL's hash index (src/backend/access/hash/) is a disk-resident extensible hashing structure (linear hashing). It does not support range queries.

CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
-- Uses for: exact equality: WHERE token = 'abc123'
-- Does NOT use for: range queries, sorting

InnoDB Adaptive Hash Index (AHI): InnoDB has a different hash index — the Adaptive Hash Index, which is built automatically (not manually) on frequently accessed B-tree pages. It acts as a B-tree lookup cache. The AHI maps (index_id, n_fields, key) → page pointer, allowing some B-tree lookups to skip tree traversal entirely. Controlled by innodb_adaptive_hash_index. Note: AHI is an internal optimization, not a user-created index.

PostgreSQL hash indexes were not WAL-logged before PostgreSQL 10 — this meant they were corrupted on crash recovery. Since PostgreSQL 10, they are fully WAL-logged and crash-safe.

When to use hash over B-tree: Almost never for on-disk databases. Hash index is slightly faster (O(1) vs O(log n)) for equality-only workloads, but the margin is small (sub-millisecond) because B-tree's height is typically 3-4 levels. Hash indexes are larger than B-tree indexes for low-cardinality columns, and do not support multi-column lookups or range scans.

GiST (Generalized Search Tree)

GiST (Hellerstein, Naughton & Pfeffer, 1995) is a framework for building balanced tree indexes on arbitrary data types with arbitrary search predicates. GiST decouples the tree structure from the key comparison logic, allowing custom index types without modifying the core.

GiST requires implementing 8 methods: - consistent: Does a subtree possibly contain matches for this query? - union: Compute the bounding predicate for a set of entries - compress/decompress: Convert key to index-internal form - penalty: Cost of inserting a new key into a subtree - picksplit: How to split an overfull node - same: Are two keys equal? - distance (optional): For nearest-neighbor searches

PostgreSQL ships GiST opclasses for: - Geometric types (point, box, polygon, circle): point_ops, using R-tree-like bounding boxes - Range types (tsrange, daterange, int4range): range containment and overlap queries - Full-text search (tsvector): text search with GiST index (less efficient than GIN for text search but supports nearest-neighbor) - Cube extension: multi-dimensional point searches - PostGIS: Spatial indexes using GiST

-- Geometric containment query using GiST:
CREATE INDEX idx_locations_pos ON locations USING GIST (position);
SELECT * FROM locations WHERE position <@ circle(point(10, 20), 5);  -- within circle
-- GiST supports: @> (contains), <@ (contained by), && (overlaps), |>> (strictly above), etc.

GIN (Generalized Inverted Index)

GIN (Teodor Sigaev & Oleg Bartunov, PostgreSQL 8.2) is an inverted index — it maps each "element" to the set of rows containing it. Optimized for composite values (arrays, JSONB, full-text tsvector) where each row may contain multiple elements.

GIN structure:

GIN Index for array column:
  "apple"  -> {row 1, row 5, row 22}
  "banana" -> {row 3, row 5, row 17}
  "cherry" -> {row 1, row 17}

Query: WHERE fruits @> ARRAY['apple', 'banana']
  -> Lookup "apple": {1,5,22}
  -> Lookup "banana": {3,5,17}
  -> Intersect: {5}  <- rows containing both apple AND banana

-- GIN for full-text search:
CREATE INDEX idx_articles_search ON articles USING GIN (to_tsvector('english', body));
SELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('database & index');

-- GIN for JSONB:
CREATE INDEX idx_events_data ON events USING GIN (data jsonb_path_ops);
SELECT * FROM events WHERE data @> '{"type": "click", "button": "submit"}';

-- GIN for arrays:
CREATE INDEX idx_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'index'];

GIN uses a "pending list" for fast inserts: new entries go to a pending list and are merged into the main GIN structure later (via gin_pending_list_limit or explicit VACUUM). This amortizes the cost of updating the inverted index.

GIN vs GiST for full-text search: - GIN: faster queries (direct lookup in inverted index), slower builds, more storage - GiST: faster builds, slower queries (signature approximation may have false positives requiring heap check)

BRIN (Block Range Index)

BRIN (Block Range Index, PostgreSQL 9.5) stores summary information about ranges of consecutive pages (blocks). For each range of pages (default 128 pages), BRIN stores the minimum and maximum value of the indexed column.

BRIN for column 'created_at' (128 pages per range):
  Range 0 (pages 0-127):   min=2020-01-01, max=2020-06-30
  Range 1 (pages 128-255): min=2020-07-01, max=2020-12-31
  Range 2 (pages 256-383): min=2021-01-01, max=2021-06-30

Query: WHERE created_at BETWEEN '2021-01-01' AND '2021-03-31'
  -> Range 0: max=2020-06-30 < 2021-01-01: SKIP
  -> Range 1: max=2020-12-31 < 2021-01-01: SKIP
  -> Range 2: min=2021-01-01, max=2021-06-30 overlaps range: SCAN pages 256-383

BRIN is extremely small (just min/max per range) and extremely fast to build, but only useful when the physical order of rows correlates with the column value. Ideal for: - Append-only tables (log data, time-series) with timestamp as the BRIN column - Bulk-loaded tables sorted by date

CREATE INDEX idx_logs_ts ON application_logs USING BRIN (logged_at) WITH (pages_per_range=64);
-- Tiny index, very fast to build, excellent for time-range queries on append-only tables

BRIN is not useful for randomly ordered data (e.g., a UUIDs column in a randomly-inserted table will have min/max spanning the entire range for every block, providing no selectivity).

Partial Indexes

A partial index indexes only rows satisfying a predicate. Dramatically reduces index size and maintenance cost when only a fraction of rows are queried.

-- Index only active orders (90% of orders may be 'completed', only 10% 'active')
CREATE INDEX idx_orders_active ON orders(created_at) WHERE status = 'active';

-- Query that uses this index:
SELECT * FROM orders WHERE status = 'active' AND created_at > NOW() - INTERVAL '1 day';

-- The planner uses the partial index because:
-- 1. The WHERE clause implies status='active'
-- 2. The index is much smaller than a full index on created_at

Partial indexes are one of the most underused optimization techniques. They reduce: - Index build time - Index maintenance overhead on INSERT/UPDATE/DELETE - Index storage size - Buffer pool pressure (smaller index fits in cache)

Expression Indexes (Functional Indexes)

An expression index indexes the result of a function applied to a column. Enables index lookups on computed values.

-- Case-insensitive search:
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');
-- Without this index, the query cannot use an index on 'email' column

-- Date truncation (group by month):
CREATE INDEX idx_orders_month ON orders (date_trunc('month', order_date));
SELECT date_trunc('month', order_date), count(*) FROM orders
WHERE date_trunc('month', order_date) = '2024-01-01';

The function in an expression index must be IMMUTABLE (deterministic, no side effects) to guarantee that the indexed value is stable.

Multi-Column Indexes and the Leading Column Rule

A multi-column index can be used for queries that filter on the leading columns (leftmost columns in the index definition).

CREATE INDEX idx_employees_dept_salary ON employees(department, salary);

-- Can use the index:
SELECT * FROM employees WHERE department = 'Engineering';         -- leading column
SELECT * FROM employees WHERE department = 'Engineering' AND salary > 80000;  -- both columns

-- Cannot efficiently use the index:
SELECT * FROM employees WHERE salary > 80000;  -- salary is NOT the leading column
-- (PostgreSQL may still use a bitmap scan on this, but it's less efficient)

Column ordering in a multi-column index matters significantly: - Put the most selective column first if the query uses equality on that column - Put the equality column before the range column: (department, salary) is better than (salary, department) for WHERE department='Eng' AND salary > 80000

Index-Only Scans and Covering Indexes

An index-only scan (PostgreSQL) or covering index satisfies a query entirely from the index without accessing the heap (main table). This eliminates the second B+ tree lookup for secondary indexes.

-- Index that covers the query:
CREATE INDEX idx_employees_dept_covering ON employees(department) INCLUDE (name, salary);

-- This query does NOT need to access the heap:
SELECT name, salary FROM employees WHERE department = 'Engineering';
-- All projected columns (name, salary) are in the index leaf nodes

The INCLUDE clause (PostgreSQL 11+, SQL:2011 standard) adds non-key columns to the leaf nodes. These columns are not part of the sort key and cannot be used in predicates, but they are returned during index-only scans.

For PostgreSQL index-only scans to actually skip heap access, the visibility map must indicate the page is all-visible (set by VACUUM). If the visibility map bit is not set for a page, PostgreSQL falls back to the heap even with an index-only scan.

Bloom Filter Indexes (PostgreSQL Extension)

The bloom extension in PostgreSQL provides a bloom filter index that maps rows to compact bit signatures. Useful for multi-column equality queries where a B-tree composite index would be too large.

CREATE EXTENSION bloom;
CREATE INDEX idx_bloom_multi ON t USING BLOOM (col1, col2, col3, col4)
  WITH (length=80, col1=4, col2=4, col3=4, col4=4);
-- Each row maps to an 80-bit signature; each column contributes 4 bits
-- False positive rate: ~(1 - e^(-k*n/m))^k

The bloom index has false positives (may return rows that don't match), requiring a heap recheck. It excels for queries like WHERE col1 = v1 AND col2 = v2 AND col3 = v3 AND col4 = v4 across many columns where the cardinality is high.

Index Maintenance Overhead

Every index must be maintained on INSERT, UPDATE, and DELETE:

INSERT into table:
  → INSERT into every index for that table
  Cost: O(num_indexes * log(index_size))

UPDATE of indexed column:
  → DELETE old index entry + INSERT new index entry
  Cost: O(2 * num_indexes_on_changed_columns * log(index_size))

DELETE:
  → Logical deletion (mark index entry as dead)
  → Physical removal on VACUUM
  Cost: O(num_indexes * log(index_size)) for logical deletion

A table with 10 indexes has write throughput ~5-10x lower than one with 0 indexes. Index maintenance is the primary reason not to create indexes speculatively.

Index Bloat and REINDEX

Over time, B-tree indexes can develop bloat: pages with many dead index entries (from DELETEs and UPDATEs) that VACUUM marks as "empty" but does not reclaim from the OS. The index structure remains at its peak size.

Diagnosis:

-- Using pgstattuple extension:
SELECT avg_leaf_density, leaf_fragmentation FROM pgstattuple('my_index');
-- leaf_fragmentation > 30% indicates significant bloat

Remediation:

-- Option 1: REINDEX (locks table, rebuilds index from scratch)
REINDEX INDEX CONCURRENTLY my_index;  -- PostgreSQL 12+: CONCURRENTLY avoids table lock

-- Option 2: VACUUM (reclaims dead entries but doesn't compact)
VACUUM ANALYZE my_table;

-- Option 3: pg_repack extension (rebuilds without exclusive lock)
pg_repack --table my_table my_database

InnoDB index defragmentation: OPTIMIZE TABLE t; rebuilds the table and all indexes, defragmenting them. Equivalent to ALTER TABLE t ENGINE=InnoDB;.

Historical Context

B-tree indexes have been the dominant access method since the 1970s. Hash indexes are as old as DBMS — IBM System R used a hash-based join and storage mechanism.

GiST was invented at UC Berkeley by Joseph Hellerstein, Jeff Naughton, and Ariel Pfeffer (1995) to provide a generalized framework for R-trees, B+-trees, and other balanced tree variants. It was the first index AM that allowed user-defined types to plug into the tree without modifying the DBMS source.

GIN was developed for PostgreSQL by Teodor Sigaev and Oleg Bartunov (introduced in PostgreSQL 8.2, 2006), initially to support full-text search. It was later extended to JSONB and array operations.

BRIN was contributed by Álvaro Herrera (2014, PostgreSQL 9.5) as an extremely compact index for large, sorted data. It was inspired by Oracle's "table statistics" and IBM DB2's "range indexes."

Production Examples

GiST for PostGIS: PostGIS (ST_Contains, ST_Intersects, ST_DWithin) relies on GiST indexes for spatial queries. A GiST index on a geometry column stores R-tree bounding boxes; the spatial query first prunes candidates by bounding box, then exact geometry comparison is done at the heap.

GIN for JSONB: Systems storing semi-structured data in JSONB (user preferences, event metadata, product attributes) use GIN with jsonb_path_ops or jsonb_ops for containment queries. Elasticsearch-style document search within PostgreSQL.

BRIN for IoT time-series: Tables storing sensor readings with millions of rows per day, inserted in timestamp order, use BRIN on the timestamp column. BRIN index may be 100x smaller than a B-tree index with near-equivalent selectivity for date-range queries.

InnoDB AHI: On a production MySQL server with a hot B-tree index, SHOW ENGINE INNODB STATUS reports Hash table size 276671, node heap has N buffer(s), N.NN hash searches/s, N.NN non-hash searches/s. The ratio indicates AHI effectiveness.

Debugging Notes

Why isn't the index being used?: Check with EXPLAIN (ANALYZE, BUFFERS). Look for: table too small (sequential scan is cheaper), statistics are stale (ANALYZE), non-indexed predicate expression (wrong column or wrapped in function), or random_page_cost too high.
Index size: SELECT pg_size_pretty(pg_relation_size('my_index'));
Index usage stats: SELECT * FROM pg_stat_user_indexes WHERE relname='my_table'; — idx_scan=0 after many queries means the index is never used.
BRIN validity: After a large batch insert into the middle of a BRIN-indexed table (violating sort order), run SELECT brin_summarize_new_values('my_brin_index') to update the range summaries.
GIN pending list: SELECT * FROM pg_stat_user_indexes WHERE indexrelname LIKE '%gin%' — if GIN has a large pending list, it may slow queries. Run VACUUM ANALYZE my_table or SELECT gin_clean_pending_list('my_gin_index').

Security Implications

Index side-channel timing: The latency difference between an index hit and a miss leaks information about data existence. For security-sensitive columns (e.g., a secret_token table), index hits reveal whether a token exists even if the row is inaccessible.
Partial index predicate exposure: The predicate of a partial index is visible in pg_indexes. If the predicate contains sensitive information (e.g., WHERE is_admin = TRUE), it reveals business logic to anyone with access to the catalog.
Expression index immutability: A non-immutable function in an expression index can be called with elevated privileges during index scans. Ensure only truly immutable functions are used in expression indexes.

Performance Implications

Index selection tradeoff: Each additional index costs ~10-30% of table write throughput (rough estimate). For write-heavy tables (>10K INSERT/s), minimize indexes to 1-3.
Multi-column index vs multiple single-column indexes: A single multi-column index on (dept, salary) is better than separate indexes on dept and salary for queries filtering both. The planner can also use bitmap scan intersection for separate indexes, but multi-column is usually faster.
Covering index for hot queries: Identify the top-5 queries from pg_stat_statements by total execution time. For each, check if an index-only scan is possible (add INCLUDE columns). This can reduce query time 2-5x for read-heavy workloads.
GIN vs B-tree for arrays: For containment queries on small arrays (<5 elements), a GIN index may be slower than a B-tree on each element (if elements are normalized). GIN shines for documents with 10+ elements.

Failure Modes

Index bloat from rapid churn: A table with high UPDATE rates on indexed columns (e.g., a status column updated frequently) accumulates bloat rapidly. Schedule REINDEX CONCURRENTLY during off-peak hours.
Wrong index chosen: The planner chooses a worse index due to stale statistics. Run ANALYZE my_table after large data changes. Use SET enable_indexscan=off to diagnose if an index scan is the problem.
GIN pending list overflow: If GIN's gin_pending_list_limit is exceeded without a VACUUM, GIN queries scan the entire pending list for each query. This causes O(pending_list_size) overhead per query.
BRIN misuse on unsorted data: A BRIN index on a column with no physical correlation (e.g., an integer column where values were inserted randomly) has min=1 and max=10M for every block range, providing zero selectivity. The planner may still try to use it and perform worse than a sequential scan.

Modern Usage

Index design has evolved to support JSONB (schemaless), arrays, full-text, and geospatial workloads in a single database. Modern PostgreSQL indexes handle use cases that previously required separate Elasticsearch, MongoDB, and PostGIS deployments.

pgvector: The ivfflat and hnsw index types from the pgvector extension add approximate nearest-neighbor (ANN) search for high-dimensional vector embeddings (ML model outputs) to PostgreSQL. hnsw uses a Hierarchical Navigable Small World graph for sub-millisecond nearest-neighbor queries on millions of 1536-dimensional vectors.

Future Directions

Learned indexes: ALEX (Ding et al., 2020) replaces B-tree index pages with linear models learned from key distribution, achieving better cache efficiency for static data.
Adaptive indexing: "Database cracking" (Idreos et al., 2007) incrementally builds indexes during query execution, with no explicit CREATE INDEX command. The index reflects actual query patterns.
ML-guided index recommendation: DB2 Index Advisor, Azure SQL Index Advisor, and pganalyze use workload analysis and ML to recommend index changes.

Exercises

Build a table with 1M rows containing tags (array column). Compare query performance for tags @> ARRAY['foo','bar'] with: no index, a GiST index, and a GIN index. Measure both query latency and index build time.
Create a partial index on a status column where only 5% of rows have status='pending'. Measure the index size vs a full B-tree index on status. Benchmark a query WHERE status='pending' against both.
Demonstrate index-only scan: create a table, add a covering index with INCLUDE, run a query that would normally require heap access, and use EXPLAIN ANALYZE to verify the heap is not accessed (no Heap Fetches).
Demonstrate BRIN effectiveness: create an append-only table with 10M rows inserted in timestamp order. Create a BRIN index and a B-tree index on the timestamp. Compare sizes and query latency for a 1-day range query.
Measure index maintenance overhead: a table with 1M rows, run 100K random UPDATEs. Compare throughput with 0, 3, and 8 indexes on the table.

References

Hellerstein, J. M., Naughton, J. F., & Pfeffer, A. (1995). Generalized Search Trees for Database Systems. VLDB 1995.
Ramakrishnan, R., & Gehrke, J. (2003). Database Management Systems, 3rd ed. McGraw-Hill. (Chapter on indexing)
Stonebraker, M., & Rowe, L. A. (1986). The Design of Postgres. SIGMOD 1986.
Idreos, S., Kersten, M. L., & Manegold, S. (2007). Database Cracking. CIDR 2007.
PostgreSQL Index Types documentation: https://www.postgresql.org/docs/current/indexes-types.html
GIN documentation: https://www.postgresql.org/docs/current/gin.html
BRIN documentation: https://www.postgresql.org/docs/current/brin.html