4. Processes & Memory
Before you can tune PostgreSQL, you need to understand what is actually running when you start it. PostgreSQL is not a single program. It is a collection of cooperating processes that share memory. Understanding these processes and how memory is divided between them is the foundation for every performance decision you will make.
Process Architecture
PostgreSQL is built around three types of processes. Understanding what each one does tells you where CPU time and memory are going.
Postmaster
The parent process that starts when you launch PostgreSQL. It listens on the configured port (default 5432), manages server initialization, spawns all other processes, and monitors their health. When a client connects, the postmaster authenticates the connection and forks a new dedicated backend process to handle it.
Backend Processes
The workhorses. Each client connection gets its own dedicated backend process that executes queries, manages transactions, and accesses shared memory and disk. This process-per-connection model provides strong isolation between clients and simplifies crash recovery. If one backend crashes, it does not bring down other connections. The tradeoff is higher memory overhead compared to thread-based architectures.
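A quick way to see the process-per-connection model in action is to count the dedicated backends in pg_stat_activity, where each row corresponds to one OS process:

```sql
-- Count dedicated backend processes: one per client connection.
-- backend_type distinguishes client backends from background workers.
SELECT count(*) AS client_backends
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```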
Background Workers
Processes that run continuously to maintain database health. They handle checkpointing, WAL flushing, buffer management, autovacuum, statistics collection, and WAL archiving. The postmaster spawns and monitors them. If any background worker crashes, the postmaster restarts it automatically.
PostgreSQL process architecture: postmaster forks backend and background processes
All processes, both backends and background workers, access the same shared memory region. This is how data is passed between them without expensive IPC. When a backend modifies a page in shared buffers, the checkpointer can later write that dirty page to disk without any explicit communication.
-- View all PostgreSQL processes and their roles
SELECT
pid,
usename,
backend_type,
state,
LEFT(query, 60) AS query_preview
FROM pg_stat_activity
ORDER BY backend_type, pid;
-- On Linux/macOS, you can also see the process tree:
-- ps aux | grep postgres
Background Processes
Backend processes handle your queries, but the background processes keep the database running smoothly behind the scenes. Each one has a specific role in moving data from memory to disk and keeping the system healthy.
How background processes move data from memory to disk
Checkpointer
Periodically writes all dirty pages from shared buffers to disk, creating a consistency point for crash recovery. After a checkpoint, PostgreSQL knows that all data up to that point is safely on disk, so earlier WAL files can be recycled.
Checkpoints happen on two triggers: time (checkpoint_timeout, default 5 minutes) or WAL volume (max_wal_size). The checkpoint_completion_target setting (default 0.9) spreads writes over 90% of the checkpoint interval to avoid I/O spikes.
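All three checkpoint controls are ordinary configuration parameters, so you can inspect their current values directly:

```sql
-- Inspect the settings that govern when checkpoints fire
-- and how their writes are spread out over time.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout',
               'max_wal_size',
               'checkpoint_completion_target');
```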
Background Writer (bgwriter)
Works proactively between checkpoints. It wakes up every bgwriter_delay milliseconds (default 200ms), finds a few dirty buffers, writes them to disk, and goes back to sleep. This smooths out I/O and ensures clean buffers are available when backends need to load new pages.
Without the bgwriter, backends would have to evict dirty buffers themselves (synchronous writes), which stalls the query until the write completes.
WAL Writer
Periodically flushes WAL buffers to WAL files on disk (every wal_writer_delay, default 200ms). The write-ahead log is the foundation of PostgreSQL's durability guarantee: before any data change is confirmed to the client, the WAL record describing that change must be on disk.
Backend processes can flush their own WAL on commit, but the WAL writer batches multiple flushes together, reducing the number of fsync() calls. This is especially beneficial for high-throughput workloads and asynchronous commits.
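When a workload can tolerate losing its last few hundred milliseconds of committed work after a crash, you can hand the flushing entirely to the WAL writer instead of waiting on fsync() at every commit. A per-session sketch:

```sql
-- Asynchronous commit: COMMIT returns once the WAL record is in the
-- WAL buffers; the WAL writer flushes it to disk shortly afterwards.
-- A crash can lose the most recent commits, but never corrupts data.
SET synchronous_commit = off;
-- ... run high-throughput inserts here ...
RESET synchronous_commit;
```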
Autovacuum Launcher + Workers
The launcher wakes up every autovacuum_naptime (default 60s), checks which tables have accumulated enough dead tuples, and spawns worker processes to clean them. Each worker handles one table at a time.
Autovacuum does three critical things: it removes dead tuples so their space can be reused, it updates the planner statistics that ANALYZE maintains, and it freezes old tuples to prevent transaction ID wraparound.
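The global thresholds are often too lax for large, frequently updated tables. A sketch of per-table overrides, using a hypothetical table named orders:

```sql
-- Vacuum once ~1% of rows are dead (the global default is 20%),
-- and re-analyze after ~0.5% of rows change.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```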
Stats Collector
Gathers performance metrics from all backends: row counts, block reads, cache hits, sequential scans, index usage. This data powers pg_stat_user_tables, pg_stat_activity, and other system views. In PostgreSQL 15+, this was replaced with shared memory statistics, removing the overhead of a separate process.
-- Monitor checkpoint and bgwriter activity
-- (columns as of PostgreSQL 16 and earlier; in 17+ the checkpoint
-- and backend-write counters moved to pg_stat_checkpointer and pg_stat_io)
SELECT
checkpoints_timed, -- Scheduled checkpoints
checkpoints_req, -- Forced checkpoints (max_wal_size hit)
buffers_checkpoint, -- Pages written by checkpointer
buffers_clean, -- Pages written by bgwriter
buffers_backend -- Pages written by backends (bad if high)
FROM pg_stat_bgwriter;
-- If buffers_backend is high relative to buffers_clean,
-- bgwriter is not keeping up. Increase bgwriter_lru_maxpages.
-- If checkpoints_req >> checkpoints_timed, increase max_wal_size.
-- Monitor autovacuum activity
SELECT
schemaname, relname,
n_dead_tup,
n_live_tup,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- Check WAL generation rate
SELECT
wal_records,
pg_size_pretty(wal_bytes) AS wal_generated
FROM pg_stat_wal;
Memory Architecture
PostgreSQL memory is divided into two categories: shared memory (accessible by all processes) and per-process memory (private to each backend). Understanding this split is essential for capacity planning and tuning.
PostgreSQL memory areas: shared across all processes vs per-connection
shared_buffers
The heart of PostgreSQL's caching system. When a query needs a data page, it checks shared buffers first. If the page is there (buffer hit), it reads directly from memory in microseconds. If not (buffer miss), it reads from disk, loads the page into shared buffers, then reads it.
The recommended size is 25% of system RAM for dedicated database servers. Setting it higher is not always better because PostgreSQL also relies on the OS page cache as a second layer. The OS cache uses whatever RAM is left over, so the two layers work together.
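shared_buffers can only be changed with a server restart; ALTER SYSTEM records the new value in postgresql.auto.conf for the next startup. A sketch for a dedicated 32 GB server:

```sql
-- Set the buffer cache to 25% of a 32 GB server's RAM.
-- Takes effect only after the server is restarted.
ALTER SYSTEM SET shared_buffers = '8GB';

-- After a config reload, pending_restart flags values
-- that are waiting on a restart to apply.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = 'shared_buffers';
```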
work_mem
Controls how much memory each operation (not each query) can use before spilling to disk. A single query with 3 sorts and 2 hash joins can use up to 5 × work_mem; since PostgreSQL 13, hash-based operations may additionally use up to hash_mem_multiplier × work_mem (default 2.0 as of version 15).
The default 4MB is intentionally conservative. With 100 concurrent connections, aggressive work_mem can consume enormous amounts of RAM. The safe approach is to keep the global default low and increase it per-session for expensive queries:
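For example, a transaction-scoped override around one expensive report query:

```sql
-- Keep the global default low; raise work_mem only where it matters.
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies to this transaction only
-- ... run the sort/hash-heavy query here ...
COMMIT;                        -- work_mem reverts automatically
```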
If you see Sort Method: external merge Disk: 45MB in your EXPLAIN output, that operation exceeded work_mem and spilled to temporary files on disk.
maintenance_work_mem
Used by maintenance operations that typically run one at a time: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY. Since only one maintenance operation runs per session, you can set this much higher than work_mem (512MB to 2GB) to speed up index builds and vacuuming.
Autovacuum has its own setting: autovacuum_work_mem. If not set, it defaults to maintenance_work_mem. Since multiple autovacuum workers can run concurrently (default 3), set this carefully.
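The same per-session pattern works for a one-off index build, sketched here against a hypothetical table big_table:

```sql
-- Give this session's index build plenty of sort memory.
SET maintenance_work_mem = '2GB';
CREATE INDEX CONCURRENTLY idx_big_table_created_at
    ON big_table (created_at);
RESET maintenance_work_mem;
```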
effective_cache_size
This does not allocate any memory. It is a hint to the query planner about how much total memory is available for caching (shared_buffers + OS page cache). Setting it to ~75% of system RAM helps the planner make better decisions about whether an index scan is likely to find its pages in cache.
If set too low, the planner assumes most data is on disk and favors sequential scans over index scans, even when the data is actually cached.
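Because effective_cache_size is only a planner hint, you can change it per session and watch plans flip without allocating any real memory. A sketch, again using a hypothetical big_table:

```sql
-- With a low hint the planner may prefer a sequential scan;
-- with a realistic one, an index scan on the same query can win.
SET effective_cache_size = '256MB';
EXPLAIN SELECT * FROM big_table
WHERE created_at > now() - interval '1 day';
RESET effective_cache_size;
```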
-- View current memory configuration
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN (
'shared_buffers', 'work_mem', 'maintenance_work_mem',
'effective_cache_size', 'temp_buffers', 'wal_buffers',
'autovacuum_work_mem'
)
ORDER BY name;
-- Check your cache hit ratio (should be > 95% for OLTP)
SELECT
SUM(heap_blks_hit) AS hits,
SUM(heap_blks_read) AS disk_reads,
ROUND(
100.0 * SUM(heap_blks_hit) /
NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2
) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- If cache_hit_ratio < 95%, consider:
-- 1. Increasing shared_buffers
-- 2. Adding more RAM
-- 3. Reducing working set size (partitioning, archiving old data)
Configuration by Server Size
These are starting points, not final answers. Every workload is different. Start here, monitor with pg_stat_bgwriter and pg_statio_user_tables, and adjust.
| Setting | 8 GB RAM | 32 GB RAM | 128 GB RAM |
|---|---|---|---|
| shared_buffers | 2 GB | 8 GB | 32 GB |
| effective_cache_size | 6 GB | 24 GB | 96 GB |
| work_mem | 8 MB | 32 MB | 64 MB |
| maintenance_work_mem | 512 MB | 1 GB | 2 GB |
| max_connections | 100 | 200 | 300 |
| checkpoint_timeout | 10 min | 15 min | 15 min |
| max_wal_size | 2 GB | 4 GB | 8 GB |
A complete example pulling these settings together:
-- Example postgresql.conf for a dedicated 32GB server (OLTP)
-- shared_buffers = 8GB
-- effective_cache_size = 24GB
-- work_mem = 32MB
-- maintenance_work_mem = 1GB
-- max_connections = 200 -- use pgbouncer for more
-- checkpoint_timeout = 15min
-- max_wal_size = 4GB
-- checkpoint_completion_target = 0.9
-- wal_buffers = 64MB
-- bgwriter_delay = 200ms
-- bgwriter_lru_maxpages = 100
-- autovacuum_max_workers = 4
-- autovacuum_vacuum_scale_factor = 0.02
-- autovacuum_analyze_scale_factor = 0.01
The architecture is the map. The configuration is the tuning. Once you understand that shared_buffers is the cache, work_mem controls spills, the bgwriter prevents backend stalls, and autovacuum keeps bloat under control, every tuning decision becomes intuitive.