2. EXPLAIN ANALYZE
Your query is slow. You know it is slow. But where is it slow? Is it reading too many rows? Is it sorting in memory or spilling to disk? Is it using the index you created, or ignoring it entirely?
EXPLAIN ANALYZE answers all of these questions. It runs your query, instruments every step of the execution plan, and tells you exactly what happened. This is the single most important tool for PostgreSQL performance work.
EXPLAIN vs EXPLAIN ANALYZE
These two commands look similar but do very different things.
EXPLAIN (without ANALYZE)
Shows the planner's estimated plan without actually running the query. Safe to run on production. The costs, row counts, and widths are all guesses based on table statistics.
EXPLAIN ANALYZE
Actually executes the query and reports real timing, real row counts, and real I/O stats. This is what you need for debugging. Be careful with INSERT/UPDATE/DELETE queries because they will modify data. Wrap them in a transaction and roll back:
-- Safe way to EXPLAIN ANALYZE a write query:
BEGIN;
EXPLAIN ANALYZE DELETE FROM logs WHERE created_at < '2024-01-01';
ROLLBACK;
Reading the Output
Let's start with a simple table and query:
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount NUMERIC(10,2) NOT NULL,
    status TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT now()
);
-- 100k rows of test data
INSERT INTO orders (customer, amount, status, created_at)
SELECT
    'customer_' || (random() * 1000)::int,
    (random() * 500)::numeric(10,2),
    (ARRAY['pending','shipped','delivered'])[floor(random()*3+1)],
    now() - (random() * interval '365 days')
FROM generate_series(1, 100000);
ANALYZE orders;
Now let's run our first EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';
Seq Scan on orders (cost=0.00..2137.00 rows=33521 width=52)
(actual time=0.013..12.41 rows=33289 loops=1)
Filter: (status = 'pending')
Rows Removed by Filter: 66711
Planning Time: 0.085 ms
Execution Time: 14.12 ms
Every line has a lot packed into it. Let's break each part down.
Cost: The Planner's Estimate
cost=0.00..2137.00
The first number is the estimated startup cost (work done before the first row can be returned); the second is the estimated total cost to return all rows. Cost units are based on seq_page_cost (default 1.0) and cpu_tuple_cost (default 0.01). You don't usually need to think about the absolute numbers. What matters is comparing costs between different plans for the same query.
Estimated vs Actual Rows
rows=33521 <-- planner's estimate
rows=33289 <-- what actually happened
The planner estimated 33,521 rows but the actual result was 33,289. That's close enough. The planner got this right because we ran ANALYZE, which collected up-to-date statistics.
When estimates are way off:
Bad row estimates are the #1 cause of bad plans. If the planner thinks a filter returns 10 rows but it actually returns 100,000, it might choose a Nested Loop when a Hash Join would be orders of magnitude faster.
Actual Time
actual time=0.013..12.41
These are real wall-clock times in milliseconds: time until the first row was produced, then total time for the node. Unlike cost, these are absolute. If a node says 500ms, it took 500ms.
Important: in a tree with multiple nodes, the time of a parent node includes the time of its children. To find how long a specific node took by itself, subtract its children's time.
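A quick worked example, using the actual times from the Hash Join plan shown later in this chapter:

```
Hash Join  (actual time=0.34..25.71 rows=11063 loops=1)    -- total: 25.71 ms
  -> Seq Scan on orders o  (actual time=0.006..7.82 ...)   -- child: 7.82 ms
  -> Hash  (actual time=0.31..0.31 ...)                    -- child: 0.31 ms

-- Time spent in the join node itself: 25.71 - 7.82 - 0.31 = 17.58 ms
```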
Rows Removed by Filter
Filter: (status = 'pending')
Rows Removed by Filter: 66711
This is a red flag detector. The Seq Scan read all 100,000 rows but only 33,289 passed the filter. That means 66,711 rows were read from disk (or cache) and then thrown away.
For this query, that might be acceptable. But if you are filtering down to 10 rows out of 10 million, a Seq Scan that reads all 10 million rows is extremely wasteful. That is when you need an index.
-- Add an index to avoid the full table scan
CREATE INDEX idx_orders_status ON orders(status);
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';
Bitmap Heap Scan on orders (cost=375.40..1678.94 rows=33521 width=52)
(actual time=2.18..8.92 rows=33289 loops=1)
Recheck Cond: (status = 'pending')
Heap Blocks: exact=1137
-> Bitmap Index Scan on idx_orders_status
(cost=0.00..367.02 rows=33521 width=0)
(actual time=1.92..1.92 rows=33289 loops=1)
Index Cond: (status = 'pending')
Planning Time: 0.153 ms
Execution Time: 10.45 ms
Now PostgreSQL uses a Bitmap Index Scan. It reads the index to find which pages contain matching rows, then fetches only those pages from the heap. No more "Rows Removed by Filter" because every row it reads matches.
Notice the planner chose a Bitmap scan instead of a plain Index Scan. That is because ~33% of rows match. For such a large fraction, reading pages sequentially via bitmap is faster than jumping around the heap via an Index Scan.
Scan Types Explained
Seq Scan
Reads every page of the table from start to finish. Filters are applied after reading. Cheapest per-page cost because it reads sequentially, but expensive overall when you only need a few rows.
Index Scan
Walks the index (B-tree) to find matching row pointers, then fetches each row from the heap. Best for highly selective queries (returning a small fraction of rows). Each heap fetch is a random I/O, so this gets expensive if many rows match.
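For example, a lookup by primary key is maximally selective, so the planner picks a plain Index Scan (plan output abbreviated; exact numbers will vary):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE id = 42;
-- Index Scan using orders_pkey on orders
--   Index Cond: (id = 42)
```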
Index Only Scan
The best case. All requested columns are in the index, so PostgreSQL can skip the heap entirely. Heap visits are avoided only for pages the visibility map marks all-visible; recently modified pages still require a heap check, reported as Heap Fetches in the plan. Regular VACUUM keeps the visibility map current.
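A sketch of how to get an Index Only Scan on our orders table (the index name is my choice, and the planner may still pick another scan depending on costs):

```sql
CREATE INDEX idx_orders_status_amount ON orders (status, amount);
VACUUM orders;  -- refresh the visibility map

EXPLAIN ANALYZE
SELECT status, amount FROM orders WHERE status = 'shipped';
-- Expect: Index Only Scan using idx_orders_status_amount on orders
--         Heap Fetches: 0  (if all pages were all-visible)
```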
Bitmap Index Scan + Bitmap Heap Scan
Two-phase approach. First, scan the index and build a bitmap of matching page locations. Then, fetch those pages in physical order (sequential I/O). Best for medium selectivity where Index Scan would cause too much random I/O.
-- Rough selectivity guide for which scan type the planner picks:
--
-- < 5% of rows match --> Index Scan (few random reads)
-- 5-20% of rows match --> Bitmap Scan (batch random reads)
-- > 20% of rows match --> Seq Scan (just read everything)
--
-- These thresholds depend on table size, page layout, and cost settings.
BUFFERS: Where the Time Really Goes
Add BUFFERS to see I/O details. This is where you find out if your query is fast because everything is cached, or slow because it is hitting disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';
Bitmap Heap Scan on orders (cost=375.40..1678.94 rows=33521 width=52)
(actual time=2.31..9.14 rows=33289 loops=1)
Recheck Cond: (status = 'pending')
Heap Blocks: exact=1137
Buffers: shared hit=1228
-> Bitmap Index Scan on idx_orders_status
(cost=0.00..367.02 rows=33521 width=0)
(actual time=1.95..1.95 rows=33289 loops=1)
Index Cond: (status = 'pending')
Buffers: shared hit=91
Planning Time: 0.12 ms
Execution Time: 10.72 ms
Reading buffer stats:
shared hit means the page was found in PostgreSQL's shared buffer cache; shared read means it had to be fetched from the OS (and possibly from disk). Here all 1,228 pages were cache hits, so no disk I/O occurred. If you see temp read or temp written, your query is spilling sorts or hash tables to disk. Increase work_mem for that session:
SET work_mem = '256MB'; -- per-operation, not global
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer, sum(amount)
FROM orders
GROUP BY customer
ORDER BY sum(amount) DESC;
Joins: Nested Loop, Hash, Merge
When your query joins tables, the plan becomes a tree with join nodes. The planner chooses between three join strategies:
-- Setup: add a customers table
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL
);
INSERT INTO customers (name, tier)
SELECT 'customer_' || i, (ARRAY['free','pro','enterprise'])[floor(random()*3+1)]
FROM generate_series(1, 1000) AS i;
ANALYZE customers;
EXPLAIN ANALYZE
SELECT c.name, o.amount, o.status
FROM orders o
JOIN customers c ON c.name = o.customer
WHERE c.tier = 'enterprise';
Hash Join (cost=20.50..2670.50 rows=11111 width=44)
(actual time=0.34..25.71 rows=11063 loops=1)
Hash Cond: (o.customer = c.name)
-> Seq Scan on orders o (cost=0.00..2137.00 rows=100000 width=36)
(actual time=0.006..7.82 rows=100000 loops=1)
-> Hash (cost=16.50..16.50 rows=320 width=16)
(actual time=0.31..0.31 rows=331 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on customers c (cost=0.00..16.50 rows=320 width=16)
(actual time=0.005..0.15 rows=331 loops=1)
Filter: (tier = 'enterprise')
Rows Removed by Filter: 669
Planning Time: 0.25 ms
Execution Time: 26.38 ms
Hash Join
Builds a hash table from the smaller table (customers filtered to enterprise tier = 331 rows, using only 24kB). Then scans the larger table (orders) and probes the hash table for each row. Great when one side is small enough to fit in memory.
Nested Loop
For each row in the outer table, look up matching rows in the inner table (usually via an index). Best when the outer table returns very few rows and the inner table has an index. Worst case is O(n*m) without an index.
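A sketch of a query shape that favors a Nested Loop, assuming we add an index on orders(customer) (the index is my addition to the chapter's schema):

```sql
CREATE INDEX idx_orders_customer ON orders (customer);

EXPLAIN ANALYZE
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer = c.name
WHERE c.id = 7;  -- the outer side returns a single row
-- Expect: Nested Loop with an Index Scan on idx_orders_customer as the inner side
```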
Merge Join
Both inputs must be sorted on the join key. Then it walks through both sorted lists simultaneously, merging matches. Efficient when both sides are large and already sorted (or have an index that provides order).
Watch the Batches field on Hash nodes. If Batches: 1, the hash table fit in memory. If batches > 1, it spilled to disk and you should increase work_mem.
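To see what an alternative join strategy would cost, you can temporarily disable one strategy and re-run the plan. This is a diagnostic trick, not a production setting:

```sql
BEGIN;
SET LOCAL enable_hashjoin = off;  -- steer the planner away from Hash Join
EXPLAIN ANALYZE
SELECT c.name, o.amount, o.status
FROM orders o
JOIN customers c ON c.name = o.customer
WHERE c.tier = 'enterprise';
ROLLBACK;  -- SET LOCAL reverts at transaction end anyway
```

Comparing the two plans' actual times tells you whether the planner's choice was right.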
Planning Time vs Execution Time
Planning Time: 0.25 ms
Execution Time: 26.38 ms
Note: Execution Time does not include network transfer. EXPLAIN ANALYZE discards the result rows instead of sending them to the client, so a query that returns millions of rows can take noticeably longer in your application than the reported Execution Time suggests. Planning Time is also reported separately; add the two for the full picture.
Patterns to Watch For
1. Seq Scan on a large table with heavy filtering
Rows Removed by Filter: 9999000 out of 10M rows means you need an index. The database is reading the entire table to find a handful of rows.
Fix: Create an index on the filtered column.
2. Estimated vs actual rows diverge wildly
rows=100 (estimated) vs rows=500000 (actual). The planner made a bad choice because it thought the result was small.
Fix: Run ANALYZE on the table. Consider increasing default_statistics_target for columns with skewed distributions.
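For example (500 is an arbitrary value for illustration; the default default_statistics_target is 100):

```sql
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;  -- rebuild statistics with the larger sample
```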
3. Sort with temp files
Sort Method: external merge Disk: 45MB means the sort did not fit in work_mem and spilled to disk.
Fix: Increase work_mem for the session, or add an index that provides the sort order so no sorting is needed.
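For instance, if a query sorts by created_at, an index on that column lets the planner read rows in order instead of sorting (the index name is my choice):

```sql
CREATE INDEX idx_orders_created_at ON orders (created_at);

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at DESC LIMIT 100;
-- Expect: Index Scan Backward using idx_orders_created_at, and no Sort node
```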
4. Nested Loop with high loops count
loops=50000 on an inner Index Scan means the nested loop ran the inner scan 50,000 times. The actual time shown is the per-loop average, so multiply it by loops to get the total time spent in that node.
Fix: If loops is high, a Hash Join is usually faster. Check if the planner underestimated the outer row count.
5. Index exists but not used
You created an index but the planner still picks a Seq Scan. This is often correct. If the query returns a large fraction of the table, a Seq Scan is genuinely faster than bouncing around the index.
Check: Is the filter selective enough? Is the index on the right column? Is the data type matching (no implicit casts)?
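One common way an index gets silently ignored is wrapping the indexed column in a function or cast. A sketch, assuming a plain index on created_at:

```sql
-- The function call on the column defeats a plain index on created_at:
EXPLAIN SELECT * FROM orders
WHERE date_trunc('day', created_at) = '2024-06-01';

-- Rewriting as a range predicate on the bare column lets the index be used:
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-06-02';
```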
The Full EXPLAIN Toolkit
EXPLAIN accepts several options that can be combined:
-- The full diagnostic toolkit:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ...;
-- Machine-readable output:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT ...;
-- Verbose: show output columns, schema-qualified table names
EXPLAIN (ANALYZE, VERBOSE)
SELECT ...;
-- WAL: show WAL (write-ahead log) usage for writes
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO ...;
-- Settings: show non-default planner settings that affect the plan
EXPLAIN (ANALYZE, SETTINGS)
SELECT ...;
The combination I use most often is EXPLAIN (ANALYZE, BUFFERS). It gives you timing, row counts, and I/O stats, which covers 95% of debugging needs.
Quick debugging checklist:
1. Run EXPLAIN (ANALYZE, BUFFERS) on the slow query.
2. Compare estimated vs actual rows at each node; if they diverge wildly, run ANALYZE on the table.
3. Look for large Rows Removed by Filter counts; consider an index on the filtered column.
4. Check for temp read/written and on-disk sorts; raise work_mem for the session if needed.
5. On Hash nodes, check for Batches > 1; on Nested Loops, check for high loops counts.