1. Query Execution Pipeline
You Run a Query
You open your terminal, connect to PostgreSQL, and run something like:

SELECT name, email FROM users WHERE age > 25 ORDER BY name;
Behind the scenes, PostgreSQL will take this SQL string through a pipeline of stages before a single row is returned. Each stage has a specific job, and understanding them helps you write faster queries and debug slow ones.
The first thing that happens is your client (psql, your app, pgAdmin) sends the query text over a TCP connection to the PostgreSQL backend process assigned to your session.
Stage 1: Parser
The Parser is the first stage. It takes your raw SQL string and checks whether it is grammatically correct. This is purely about syntax, not meaning.
It does two things:

1. Lexing: breaks the SQL string into tokens: SELECT, name, ,, email, FROM, and so on.
2. Parsing: arranges the tokens into a parse tree (an abstract syntax tree). If your SQL has a syntax error, like a missing comma or a misspelled keyword, it fails here with a syntax error.
At this point, PostgreSQL does not know whether the table users even exists. It only knows the query is structurally valid SQL.
-- This fails at the parser (syntax error):
SELCT name FROM users;
ERROR: syntax error at or near "SELCT"
Stage 2: Analyzer (Semantic Analysis)
The Analyzer takes the parse tree and gives it meaning. This is where PostgreSQL checks whether the things you referenced actually exist.
It queries the system catalog (PostgreSQL's internal metadata tables) to verify that the table users exists, that the columns you referenced exist, and that the data types in your expressions are compatible.

The output is a query tree: the parse tree enriched with type information, resolved table OIDs, and column references.
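You can peek at the same catalog the analyzer consults. A minimal sketch, assuming the users table from earlier exists:

```sql
-- pg_class holds one row per table, index, view, etc.
-- The oid column is the internal identifier the analyzer
-- resolves names to and records in the query tree.
SELECT oid, relname, relkind
FROM pg_class
WHERE relname = 'users';
```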
-- This passes the parser but fails at the analyzer:
SELECT name FROM nonexistent_table;
ERROR: relation "nonexistent_table" does not exist
Stage 3: Planner / Optimizer
The Planner is where the real intelligence lives. It takes the query tree and figures out the most efficient way to execute it.
For our query, the planner considers multiple strategies:

Sequential scan: read every row in the table and filter age > 25. Simple, but slow on large tables.

Index scan: if an index exists on age, use it to jump directly to matching rows. Fast for selective queries.

Bitmap scan: build a bitmap of matching pages from the index, then fetch those pages. Good when many rows match, but not all.
The planner uses table statistics (row count, value distribution, null fraction) collected by ANALYZE to estimate the cost of each plan. It picks the one with the lowest estimated total cost.
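You can inspect the statistics the planner relies on through the pg_stats view. A quick sketch, again assuming the users table:

```sql
-- n_distinct and null_frac feed directly into the planner's
-- selectivity estimates for WHERE clauses.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'users';

-- If the estimates look stale, refresh them:
ANALYZE users;
```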
For the ORDER BY name, the planner also decides: sort in memory, use an index that's already ordered, or spill to disk if the data is too large.
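You can see which choice it made in EXPLAIN ANALYZE output, where the Sort node reports its method. A sketch, assuming the same table:

```sql
EXPLAIN (ANALYZE) SELECT name FROM users ORDER BY name;
-- Look for a line like "Sort Method: quicksort  Memory: 25kB"
-- (sorted in memory) versus "Sort Method: external merge"
-- (spilled to disk because the data exceeded work_mem).
```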
The output is an execution plan, which is a tree of operations (nodes) to execute. This is what EXPLAIN shows you.
Stage 4: Executor
The Executor walks the plan tree and executes each node. PostgreSQL uses a pull-based model: the top node asks its child for a row, which asks its child, and so on down the tree. Rows flow upward one at a time.
-- EXPLAIN output:
Sort  (cost=1.12..1.13 rows=3 width=36)
  Sort Key: name
  ->  Seq Scan on users  (cost=0.00..1.10 rows=3 width=36)
        Filter: (age > 25)
The executor processes this bottom-up: the Seq Scan node reads rows from users and applies the age > 25 filter; the Sort node pulls those filtered rows from its child, sorts them by name, and emits the ordered rows to the top of the tree.
But where does the executor actually read the data from? That depends on whether the pages are already in memory.
Stage 5: Shared Buffers and Disk
PostgreSQL does not read directly from disk for every query. It maintains a large memory area called shared buffers (a page cache). When the executor needs a data page, it checks shared buffers first.
Cache hit: the page is already in shared buffers from a previous query. PostgreSQL reads it directly from memory, which is measured in microseconds.

Cache miss: the page is not in memory. PostgreSQL reads it from disk (or the OS page cache), loads it into shared buffers, then reads it. This involves I/O and is orders of magnitude slower.
This is why EXPLAIN (ANALYZE, BUFFERS) is so valuable. It shows you exactly how many pages were hits vs reads:
Seq Scan on users  (actual time=0.01..0.03 rows=3)
  Filter: (age > 25)
  Buffers: shared hit=1
-- "shared hit=1" means 1 page was served from memory
-- "shared read=1" would mean 1 page was read from disk
The shared_buffers setting controls the size of this cache. The typical recommendation is 25% of system RAM. PostgreSQL also benefits from the OS page cache, which acts as a second layer.
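You can check the current cache size, and watch the cache warm up by running the same query twice. A sketch, assuming the users table:

```sql
-- Current size of the shared buffer pool:
SHOW shared_buffers;

-- A first run on a cold cache may show "shared read=..." in the
-- buffer stats; a second run should show "shared hit=..." for
-- the same pages, now served from memory:
EXPLAIN (ANALYZE, BUFFERS)
SELECT name, email FROM users WHERE age > 25;
```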
Results Return to the Client
The executor sends the result rows back to the client through the connection. For large result sets, PostgreSQL streams rows progressively rather than buffering everything in memory.
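If the client wants to pace consumption itself, it can pull rows in batches with a server-side cursor instead of receiving the whole result at once. A minimal sketch, assuming the users table:

```sql
BEGIN;
-- The cursor holds the executor's position in the plan tree;
-- each FETCH pulls the next batch of rows on demand.
DECLARE user_cur CURSOR FOR
    SELECT name, email FROM users ORDER BY name;
FETCH 100 FROM user_cur;
CLOSE user_cur;
COMMIT;
```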
Let's recap the full pipeline: parser (syntax) → analyzer (semantics, catalog lookups) → planner (choose the cheapest plan) → executor (pull rows through the plan tree) → shared buffers and disk (fetch the actual pages) → results streamed back to the client.
Understanding this pipeline tells you where to look when a query is slow. EXPLAIN ANALYZE maps directly to these stages: the planner chose the plan, the executor ran it, and the buffer stats tell you if I/O was the bottleneck.
Most of the time, your queries are slow because of the planner's choices (missing indexes, stale statistics) or I/O (cold cache, large table scans). Rarely is it the parser or analyzer.