4. Processes & Memory
Before you can tune PostgreSQL, you need to understand what is actually running when you start it. PostgreSQL is not a single program. It is a collection of cooperating processes that share memory. Understanding these processes and how memory is divided between them is the foundation for every performance decision you will make.
Process Architecture
PostgreSQL is built around three types of processes. Understanding what each one does tells you where CPU time and memory are going.
Postmaster
The parent process that starts when you launch PostgreSQL. It listens on the configured port (default 5432), manages server initialization, spawns all other processes, and monitors their health. When a client connects, the postmaster authenticates the connection and forks a new dedicated backend process to handle it.
Backend Processes
The workhorses. Each client connection gets its own dedicated backend process that executes queries, manages transactions, and accesses shared memory and disk. This process-per-connection model provides strong isolation between clients and simplifies crash recovery. If one backend crashes, it does not bring down other connections. The tradeoff is higher memory overhead compared to thread-based architectures.
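A quick way to see the process-per-connection model in action is to count the dedicated backends in pg_stat_activity, where each row corresponds to one OS process:

```sql
-- Count dedicated backend processes: one per client connection.
-- backend_type distinguishes client backends from background workers.
SELECT count(*) AS client_backends
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```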
Background Workers
Processes that run continuously to maintain database health. They handle checkpointing, WAL flushing, buffer management, autovacuum, statistics collection, and WAL archiving. The postmaster spawns and monitors them. If any background worker crashes, the postmaster restarts it automatically.
PostgreSQL process architecture: postmaster forks backend and background processes
All processes, both backends and background workers, access the same shared memory region. This is how data is passed between them without expensive IPC. When a backend modifies a page in shared buffers, the checkpointer can later write that dirty page to disk without any explicit communication.
-- View all PostgreSQL processes and their roles
SELECT
pid,
usename,
backend_type,
state,
LEFT(query, 60) AS query_preview
FROM pg_stat_activity
ORDER BY backend_type, pid;
-- On Linux/macOS, you can also see the process tree:
-- ps aux | grep postgres
Background Processes
Backend processes handle your queries, but the background processes keep the database running smoothly behind the scenes. Each one has a specific role in moving data from memory to disk and keeping the system healthy.
How background processes move data from memory to disk
Checkpointer
Periodically writes all dirty pages from shared buffers to disk, creating a consistency point for crash recovery. After a checkpoint, PostgreSQL knows that all data up to that point is safely on disk, so earlier WAL files can be recycled.
Checkpoints happen on two triggers: time (checkpoint_timeout, default 5 minutes) or WAL volume (max_wal_size). The checkpoint_completion_target setting (default 0.9) spreads writes over 90% of the checkpoint interval to avoid I/O spikes.
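All three checkpoint controls are ordinary configuration parameters, so you can inspect their current values directly:

```sql
-- Inspect the settings that govern when checkpoints fire
-- and how their writes are spread out over time.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout',
               'max_wal_size',
               'checkpoint_completion_target');
```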
Background Writer (bgwriter)
Works proactively between checkpoints. It wakes up every bgwriter_delay milliseconds (default 200ms), finds a few dirty buffers, writes them to disk, and goes back to sleep. This smooths out I/O and ensures clean buffers are available when backends need to load new pages.
Without the bgwriter, backends would have to evict dirty buffers themselves (synchronous writes), which stalls the query until the write completes.
WAL Writer
Periodically flushes WAL buffers to WAL files on disk (every wal_writer_delay, default 200ms). The write-ahead log is the foundation of PostgreSQL's durability guarantee: before any data change is confirmed to the client, the WAL record describing that change must be on disk.
Backend processes can flush their own WAL on commit, but the WAL writer batches multiple flushes together, reducing the number of fsync() calls. This is especially beneficial for high-throughput workloads and asynchronous commits.
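When a workload can tolerate losing its last few hundred milliseconds of committed work after a crash, you can hand the flushing entirely to the WAL writer instead of waiting on fsync() at every commit. A per-session sketch:

```sql
-- Asynchronous commit: COMMIT returns once the WAL record is in the
-- WAL buffers; the WAL writer flushes it to disk shortly afterwards.
-- A crash can lose the most recent commits, but never corrupts data.
SET synchronous_commit = off;
-- ... run high-throughput inserts here ...
RESET synchronous_commit;
```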
Autovacuum Launcher + Workers
The launcher wakes up every autovacuum_naptime (default 60s), checks which tables have accumulated enough dead tuples, and spawns worker processes to clean them. Each worker handles one table at a time.
Autovacuum does three critical things: it removes dead tuples so their space can be reused, it updates the planner statistics that ANALYZE maintains, and it freezes old tuples to prevent transaction ID wraparound.
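The global thresholds are often too lax for large, frequently updated tables. A sketch of per-table overrides, using a hypothetical table named orders:

```sql
-- Vacuum once ~1% of rows are dead (the global default is 20%),
-- and re-analyze after ~0.5% of rows change.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```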
Stats Collector
Gathers performance metrics from all backends: row counts, block reads, cache hits, sequential scans, index usage. This data powers pg_stat_user_tables, pg_stat_activity, and other system views. In PostgreSQL 15+, this was replaced with shared memory statistics, removing the overhead of a separate process.
-- Monitor checkpoint and bgwriter activity
-- (columns as of PostgreSQL 16 and earlier; in 17+ the checkpoint
-- and backend-write counters moved to pg_stat_checkpointer and pg_stat_io)
SELECT
checkpoints_timed, -- Scheduled checkpoints
checkpoints_req, -- Forced checkpoints (max_wal_size hit)
buffers_checkpoint, -- Pages written by checkpointer
buffers_clean, -- Pages written by bgwriter
buffers_backend -- Pages written by backends (bad if high)
FROM pg_stat_bgwriter;
-- If buffers_backend is high relative to buffers_clean,
-- bgwriter is not keeping up. Increase bgwriter_lru_maxpages.
-- If checkpoints_req >> checkpoints_timed, increase max_wal_size.
-- Monitor autovacuum activity
SELECT
schemaname, relname,
n_dead_tup,
n_live_tup,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- Check WAL generation rate
SELECT
wal_records,
pg_size_pretty(wal_bytes) AS wal_generated
FROM pg_stat_wal;
Memory Architecture
PostgreSQL memory is divided into two categories: shared memory (accessible by all processes) and per-process memory (private to each backend). Understanding this split is essential for capacity planning and tuning.
PostgreSQL memory areas: shared across all processes vs per-connection
shared_buffers
The heart of PostgreSQL's caching system. When a query needs a data page, it checks shared buffers first. If the page is there (buffer hit), it reads directly from memory in microseconds. If not (buffer miss), it reads from disk, loads the page into shared buffers, then reads it.
The recommended size is 25% of system RAM for dedicated database servers. Setting it higher is not always better because PostgreSQL also relies on the OS page cache as a second layer. The OS cache uses whatever RAM is left over, so the two layers work together.
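shared_buffers can only be changed with a server restart; ALTER SYSTEM records the new value in postgresql.auto.conf for the next startup. A sketch for a dedicated 32 GB server:

```sql
-- Set the buffer cache to 25% of a 32 GB server's RAM.
-- Takes effect only after the server is restarted.
ALTER SYSTEM SET shared_buffers = '8GB';

-- After a config reload, pending_restart flags values
-- that are waiting on a restart to apply.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = 'shared_buffers';
```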
work_mem
Controls how much memory each operation (not each query) can use before spilling to disk. A single query with 3 sorts and 2 hash joins can use up to 5 × work_mem; since PostgreSQL 13, hash-based operations may additionally use up to hash_mem_multiplier × work_mem (default 2.0 as of version 15).
The default 4MB is intentionally conservative. With 100 concurrent connections, aggressive work_mem can consume enormous amounts of RAM. The safe approach is to keep the global default low and increase it per-session for expensive queries:
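For example, a transaction-scoped override around one expensive report query:

```sql
-- Keep the global default low; raise work_mem only where it matters.
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies to this transaction only
-- ... run the sort/hash-heavy query here ...
COMMIT;                        -- work_mem reverts automatically
```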
If you see Sort Method: external merge Disk: 45MB in your EXPLAIN output, that operation exceeded work_mem and spilled to temporary files on disk.
maintenance_work_mem
Used by maintenance operations that typically run one at a time: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY. Since only one maintenance operation runs per session, you can set this much higher than work_mem (512MB to 2GB) to speed up index builds and vacuuming.
Autovacuum has its own setting: autovacuum_work_mem. If not set, it defaults to maintenance_work_mem. Since multiple autovacuum workers can run concurrently (default 3), set this carefully.
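The same per-session pattern works for a one-off index build, sketched here against a hypothetical table big_table:

```sql
-- Give this session's index build plenty of sort memory.
SET maintenance_work_mem = '2GB';
CREATE INDEX CONCURRENTLY idx_big_table_created_at
    ON big_table (created_at);
RESET maintenance_work_mem;
```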
effective_cache_size
This does not allocate any memory. It is a hint to the query planner about how much total memory is available for caching (shared_buffers + OS page cache). Setting it to ~75% of system RAM helps the planner make better decisions about whether an index scan is likely to find its pages in cache.
If set too low, the planner assumes most data is on disk and favors sequential scans over index scans, even when the data is actually cached.
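Because effective_cache_size is only a planner hint, you can change it per session and watch plans flip without allocating any real memory. A sketch, again using a hypothetical big_table:

```sql
-- With a low hint the planner may prefer a sequential scan;
-- with a realistic one, an index scan on the same query can win.
SET effective_cache_size = '256MB';
EXPLAIN SELECT * FROM big_table
WHERE created_at > now() - interval '1 day';
RESET effective_cache_size;
```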
-- View current memory configuration
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN (
'shared_buffers', 'work_mem', 'maintenance_work_mem',
'effective_cache_size', 'temp_buffers', 'wal_buffers',
'autovacuum_work_mem'
)
ORDER BY name;
-- Check your cache hit ratio (should be > 95% for OLTP)
SELECT
SUM(heap_blks_hit) AS hits,
SUM(heap_blks_read) AS disk_reads,
ROUND(
100.0 * SUM(heap_blks_hit) /
NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2
) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- If cache_hit_ratio < 95%, consider:
-- 1. Increasing shared_buffers
-- 2. Adding more RAM
-- 3. Reducing working set size (partitioning, archiving old data)
Configuration by Server Size
These are starting points, not final answers. Every workload is different. Start here, monitor with pg_stat_bgwriter and pg_statio_user_tables, and adjust.
| Setting | 8 GB RAM | 32 GB RAM | 128 GB RAM |
|---|---|---|---|
| shared_buffers | 2 GB | 8 GB | 32 GB |
| effective_cache_size | 6 GB | 24 GB | 96 GB |
| work_mem | 8 MB | 32 MB | 64 MB |
| maintenance_work_mem | 512 MB | 1 GB | 2 GB |
| max_connections | 100 | 200 | 300 |
| checkpoint_timeout | 10 min | 15 min | 15 min |
| max_wal_size | 2 GB | 4 GB | 8 GB |
A complete example pulling these settings together:
-- Example postgresql.conf for a dedicated 32GB server (OLTP)
-- shared_buffers = 8GB
-- effective_cache_size = 24GB
-- work_mem = 32MB
-- maintenance_work_mem = 1GB
-- max_connections = 200 -- use pgbouncer for more
-- checkpoint_timeout = 15min
-- max_wal_size = 4GB
-- checkpoint_completion_target = 0.9
-- wal_buffers = 64MB
-- bgwriter_delay = 200ms
-- bgwriter_lru_maxpages = 100
-- autovacuum_max_workers = 4
-- autovacuum_vacuum_scale_factor = 0.02
-- autovacuum_analyze_scale_factor = 0.01
The architecture is the map. The configuration is the tuning. Once you understand that shared_buffers is the cache, work_mem controls spills, the bgwriter prevents backend stalls, and autovacuum keeps bloat under control, every tuning decision becomes intuitive.