1. Query Execution Pipeline
You Run a Query
You open your terminal, connect to PostgreSQL, and run something like:

SELECT name, email FROM users WHERE age > 25 ORDER BY name;
Behind the scenes, PostgreSQL will take this SQL string through a pipeline of stages before a single row is returned. Each stage has a specific job, and understanding them helps you write faster queries and debug slow ones.
The first thing that happens is your client (psql, your app, pgAdmin) sends the query text over a TCP connection to the PostgreSQL backend process assigned to your session.
Stage 1: Parser
The Parser is the first stage. It takes your raw SQL string and checks whether it is grammatically correct. This is purely about syntax, not meaning.
It does two things:

1. Lexing: breaks the SQL string into tokens: SELECT, name, ,, email, FROM, and so on.
2. Parsing: arranges the tokens into a parse tree (an abstract syntax tree). If your SQL has a syntax error, like a missing comma or a misspelled keyword, it fails here with a syntax error.
At this point, PostgreSQL does not know whether the table users even exists. It only knows the query is structurally valid SQL.
-- This fails at the parser (syntax error):
SELCT name FROM users;
ERROR: syntax error at or near "SELCT"
Stage 2: Analyzer (Semantic Analysis)
The Analyzer takes the parse tree and gives it meaning. This is where PostgreSQL checks whether the things you referenced actually exist.
It queries the system catalog (PostgreSQL's internal metadata tables) to verify that the table users exists, that the columns you referenced exist, and that the data types in your expressions are compatible.

The output is a query tree: the parse tree enriched with type information, resolved table OIDs, and column references.
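You can peek at the same catalog the analyzer consults. A minimal sketch, assuming the users table from earlier exists:

```sql
-- pg_class holds one row per table, index, view, etc.
-- The oid column is the internal identifier the analyzer
-- resolves names to and records in the query tree.
SELECT oid, relname, relkind
FROM pg_class
WHERE relname = 'users';
```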
-- This passes the parser but fails at the analyzer:
SELECT name FROM nonexistent_table;
ERROR: relation "nonexistent_table" does not exist
Stage 3: Planner / Optimizer
The Planner is where the real intelligence lives. It takes the query tree and figures out the most efficient way to execute it.
For our query, the planner considers multiple strategies:

Sequential scan: read every row in the table and filter age > 25. Simple, but slow on large tables.

Index scan: if an index exists on age, use it to jump directly to matching rows. Fast for selective queries.

Bitmap scan: build a bitmap of matching pages from the index, then fetch those pages. Good when many rows match, but not all.
The planner uses table statistics (row count, value distribution, null fraction) collected by ANALYZE to estimate the cost of each plan. It picks the one with the lowest estimated total cost.
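You can inspect the statistics the planner relies on through the pg_stats view. A quick sketch, again assuming the users table:

```sql
-- n_distinct and null_frac feed directly into the planner's
-- selectivity estimates for WHERE clauses.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'users';

-- If the estimates look stale, refresh them:
ANALYZE users;
```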
For the ORDER BY name, the planner also decides: sort in memory, use an index that's already ordered, or spill to disk if the data is too large.
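You can see which choice it made in EXPLAIN ANALYZE output, where the Sort node reports its method. A sketch, assuming the same table:

```sql
EXPLAIN (ANALYZE) SELECT name FROM users ORDER BY name;
-- Look for a line like "Sort Method: quicksort  Memory: 25kB"
-- (sorted in memory) versus "Sort Method: external merge"
-- (spilled to disk because the data exceeded work_mem).
```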
The output is an execution plan, which is a tree of operations (nodes) to execute. This is what EXPLAIN shows you.
Stage 4: Executor
The Executor walks the plan tree and executes each node. PostgreSQL uses a pull-based model: the top node asks its child for a row, which asks its child, and so on down the tree. Rows flow upward one at a time.
-- EXPLAIN output:
Sort  (cost=1.12..1.13 rows=3 width=36)
  Sort Key: name
  ->  Seq Scan on users  (cost=0.00..1.10 rows=3 width=36)
        Filter: (age > 25)
The executor processes this bottom-up: the Seq Scan node reads rows from users and applies the age > 25 filter; the Sort node pulls those filtered rows from its child, sorts them by name, and emits the ordered rows to the top of the tree.
But where does the executor actually read the data from? That depends on whether the pages are already in memory.
Stage 5: Shared Buffers and Disk
PostgreSQL does not read directly from disk for every query. It maintains a large memory area called shared buffers (a page cache). When the executor needs a data page, it checks shared buffers first.
Cache hit: the page is already in shared buffers from a previous query. PostgreSQL reads it directly from memory, which is measured in microseconds.

Cache miss: the page is not in memory. PostgreSQL reads it from disk (or the OS page cache), loads it into shared buffers, then reads it. This involves I/O and is orders of magnitude slower.
This is why EXPLAIN (ANALYZE, BUFFERS) is so valuable. It shows you exactly how many pages were hits vs reads:
Seq Scan on users  (actual time=0.01..0.03 rows=3)
  Filter: (age > 25)
  Buffers: shared hit=1
-- "shared hit=1" means 1 page was served from memory
-- "shared read=1" would mean 1 page was read from disk
The shared_buffers setting controls the size of this cache. The typical recommendation is 25% of system RAM. PostgreSQL also benefits from the OS page cache, which acts as a second layer.
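You can check the current cache size, and watch the cache warm up by running the same query twice. A sketch, assuming the users table:

```sql
-- Current size of the shared buffer pool:
SHOW shared_buffers;

-- A first run on a cold cache may show "shared read=..." in the
-- buffer stats; a second run should show "shared hit=..." for
-- the same pages, now served from memory:
EXPLAIN (ANALYZE, BUFFERS)
SELECT name, email FROM users WHERE age > 25;
```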
Results Return to the Client
The executor sends the result rows back to the client through the connection. For large result sets, PostgreSQL streams rows progressively rather than buffering everything in memory.
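If the client wants to pace consumption itself, it can pull rows in batches with a server-side cursor instead of receiving the whole result at once. A minimal sketch, assuming the users table:

```sql
BEGIN;
-- The cursor holds the executor's position in the plan tree;
-- each FETCH pulls the next batch of rows on demand.
DECLARE user_cur CURSOR FOR
    SELECT name, email FROM users ORDER BY name;
FETCH 100 FROM user_cur;
CLOSE user_cur;
COMMIT;
```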
Let's recap the full pipeline: parser (syntax) → analyzer (semantics, catalog lookups) → planner (choose the cheapest plan) → executor (pull rows through the plan tree) → shared buffers and disk (fetch the actual pages) → results streamed back to the client.
Understanding this pipeline tells you where to look when a query is slow. EXPLAIN ANALYZE maps directly to these stages: the planner chose the plan, the executor ran it, and the buffer stats tell you if I/O was the bottleneck.
Most of the time, your queries are slow because of the planner's choices (missing indexes, stale statistics) or I/O (cold cache, large table scans). Rarely is it the parser or analyzer.