Load Balancers, Reverse Proxies & API Gateways
The Three Gatekeepers
Between your users and your backend, three critical components control how traffic flows: Reverse Proxies, API Gateways, and Load Balancers.
They sound similar, and they do overlap, but each solves a distinct problem. Knowing when to use which is fundamental to designing resilient systems.
Let's explore each one, starting from the simplest concept and building up.
Reverse Proxy
A reverse proxy sits in front of your backend and acts on behalf of the server. Clients never talk directly to your origin.
In the diagram, the client sends HTTPS, but the proxy forwards plain HTTP internally. That's SSL termination - the proxy handles encryption so your app doesn't have to.
What it does:
• SSL termination - decrypt HTTPS at the edge
• Caching - serve repeated responses without hitting origin
• Compression - gzip/brotli responses
• Hide origin - clients never see your real server IPs
Examples: Nginx, Caddy, HAProxy, Cloudflare. Even a CDN is essentially a globally distributed reverse proxy.
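To make the caching role concrete, here is a minimal sketch of a caching reverse proxy in Python. It is an illustration only, not how Nginx or Cloudflare are implemented: the class name, the `fetch_from_origin` callable, and the TTL parameter are all hypothetical, and real proxies also respect Cache-Control headers.

```python
import time

class CachingReverseProxy:
    """Illustrative sketch: serve repeated responses from a cache
    so the origin server never sees them (names are hypothetical)."""

    def __init__(self, fetch_from_origin, ttl_seconds=60):
        self.fetch = fetch_from_origin   # callable: path -> response body
        self.ttl = ttl_seconds
        self.cache = {}                  # path -> (body, expires_at)

    def handle(self, path):
        entry = self.cache.get(path)
        if entry and entry[1] > time.time():
            return entry[0]              # cache hit: origin is never contacted
        body = self.fetch(path)          # cache miss: go to origin
        self.cache[path] = (body, time.time() + self.ttl)
        return body
```

The client only ever talks to `handle()`; whether the origin was involved is invisible to it, which is exactly the "acts on behalf of the server" property described above.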
API Gateway
An API Gateway is a specialized reverse proxy built for APIs. It adds application-aware features on top of basic proxying.
Notice how requests to /users, /orders, and /pay each route to different backend services.
What it adds over a reverse proxy:
• Authentication - validate JWTs, API keys, OAuth
• Rate limiting - e.g. 100 req/min per API key
• Request transformation - reshape payloads between client and service
• Service routing - route by path, header, or method to different backends
Examples: Kong, AWS API Gateway, Apigee. This is the front door to your microservices - it keeps cross-cutting concerns out of business logic.
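Two of those cross-cutting concerns, service routing and per-key rate limiting, can be sketched together. This is a toy model under stated assumptions (a sliding 60-second window, prefix-based routes, made-up service names), not Kong's or AWS API Gateway's actual behavior.

```python
import time
from collections import defaultdict

class ApiGateway:
    """Toy gateway: route by path prefix, rate-limit per API key.
    Routes and limits here are illustrative placeholders."""

    def __init__(self, routes, limit_per_min=100):
        self.routes = routes              # path prefix -> backend service
        self.limit = limit_per_min
        self.hits = defaultdict(list)     # api_key -> request timestamps

    def handle(self, api_key, path):
        now = time.time()
        # Keep only requests from the last 60 seconds (sliding window)
        window = [t for t in self.hits[api_key] if now - t < 60]
        if len(window) >= self.limit:
            return (429, "rate limit exceeded")
        window.append(now)
        self.hits[api_key] = window
        # Route by longest-prefix-style match over configured paths
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                return (200, backend)     # forward to the matched service
        return (404, "no route")
```

Because the limit check happens before routing, a misbehaving client is rejected without any backend service doing work, which is the point of keeping these concerns at the gateway.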
Load Balancer: Round Robin
Now the Load Balancer. While a reverse proxy can sit in front of one server, a load balancer distributes traffic across multiple servers. It's how you scale horizontally.
The simplest strategy is Round Robin: requests cycle through servers in order - 1 → 2 → 3 → 1 → 2 → 3.
Best for: Stateless services with similar server specs
Weakness: Ignores server load. A slow request on App #1 doesn't stop new traffic from arriving.
Weighted Round Robin is a variant where beefier servers get a larger share. A server with weight 3 gets three requests for every one sent to a weight-1 server.
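Both rotations fit in a few lines of Python. This is a sketch with placeholder server names; real load balancers interleave weighted picks more smoothly, but the distribution per rotation is the same.

```python
import itertools

def round_robin(servers):
    """Cycle through servers in fixed order: 1 -> 2 -> 3 -> 1 -> ..."""
    return itertools.cycle(servers)

def weighted_round_robin(servers_with_weights):
    """Naive weighted variant: repeat each server by its weight, then cycle.
    A weight-3 server receives three requests per rotation for every
    one sent to a weight-1 server."""
    expanded = [s for s, w in servers_with_weights for _ in range(w)]
    return itertools.cycle(expanded)
```

Note what is missing: neither function looks at the servers at all, which is precisely the weakness called out above - a slow or overloaded server keeps receiving its full share.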
Least Connections
Smarter than round robin. The load balancer tracks active connections per server and routes new requests to the one with the fewest.
The badges show this clearly - App #2 has only 3 active connections vs 12 and 7, so every new request goes there.
Best for: Long-lived connections (WebSocket, database pools, streaming)
Why it wins: Adapts to real load, not just request count
This is what many production setups use when connections have varying durations. AWS ALB offers a "least outstanding requests" algorithm, a close variant of this (its default remains round robin).
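The bookkeeping is simple enough to sketch directly. Server names mirror the diagram but are placeholders; a real balancer would also decrement counts when connections close unexpectedly.

```python
class LeastConnectionsBalancer:
    """Route each new request to the server with the fewest
    active connections right now."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open connections

    def acquire(self):
        """Pick the least-loaded server and count the new connection."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection finishes, so counts reflect real load."""
        self.active[server] -= 1
```

The `release()` call is what makes this adaptive: long-lived connections keep their server's count high, steering new traffic elsewhere, which round robin cannot do.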
Health Checks & Failover
The load balancer periodically pings each server with a health check - typically GET /health every 10-30 seconds.
In the diagram, App #2 fails its health check, turns red, and gets pulled from rotation. Traffic automatically reroutes to the remaining healthy servers.
Green dot = healthy, passing health checks
Red dot = unhealthy, removed from pool
This is how systems stay available during deployments, crashes, or scaling events, with no manual intervention needed.
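The pull-from-rotation logic can be sketched as follows. The `probe` callable stands in for the periodic GET /health request; the consecutive-failure threshold is an assumption (real balancers let you configure both the interval and the threshold).

```python
class HealthCheckedPool:
    """Sketch: mark a server unhealthy after N consecutive failed
    health checks and hand out only healthy servers."""

    def __init__(self, servers, probe, failure_threshold=3):
        self.servers = servers
        self.probe = probe                     # callable: server -> bool
        self.threshold = failure_threshold
        self.failures = {s: 0 for s in servers}

    def run_checks(self):
        """One health-check sweep (would run every 10-30 seconds)."""
        for s in self.servers:
            self.failures[s] = 0 if self.probe(s) else self.failures[s] + 1

    def healthy(self):
        """Servers still in rotation; traffic goes only to these."""
        return [s for s in self.servers if self.failures[s] < self.threshold]
```

Requiring several consecutive failures before eviction avoids flapping on a single dropped probe, and a later successful check resets the counter, so recovered servers rejoin the pool automatically.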
Sticky Sessions
Sometimes you need the same user to hit the same server, usually because session state is stored in server memory.
Here, User A (blue) always goes to App #1, and User B (green) always goes to App #2. The LB uses a cookie or IP hash to maintain that affinity.
Trade-off: If that server dies, the session is lost. That's why stateless architectures (JWTs, Redis sessions) are preferred.
Use sticky sessions as a temporary solution or when migrating from a stateful app. The goal should always be stateless servers behind the LB.
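Hash-based affinity, one of the two mechanisms mentioned above, is a one-liner. The client identifier could be a cookie value or an IP address; this naive modulo scheme also illustrates the trade-off, since removing a server from the pool remaps most clients (consistent hashing reduces that churn).

```python
import hashlib

def sticky_server(client_id, servers):
    """Map a client identifier (cookie value or IP) to a server.
    The same client always lands on the same server while the
    pool is unchanged - naive modulo hashing, for illustration."""
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```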
Layer 4 vs Layer 7
This is the most important distinction in load balancing.
Layer 4 (transport): routes based on IP + port. Sees TCP/UDP packets, not HTTP. Fast but blind to content. Example: AWS NLB.
Layer 7 (application): routes based on URL path, headers, cookies. Can split traffic by content. Examples: AWS ALB, Nginx.
In the diagram, the L7 LB reads the request path and routes /images to App #1, /api to App #2, and /static to App #3. Each path lands on its own dedicated service.
L7 is what you want for microservices, API gateways, and anything where routing depends on the content of the request.
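The routing decision from the diagram is only possible at L7, because it requires parsing the HTTP request. A minimal sketch (backend names follow the diagram; the fallback parameter is an assumption):

```python
def l7_route(path, path_routes, default=None):
    """L7 decision: pick a backend by inspecting the HTTP request path.
    An L4 balancer cannot make this choice - it only ever sees
    IP addresses and ports, never the URL."""
    for prefix, backend in path_routes.items():
        if path.startswith(prefix):
            return backend
    return default
```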
Putting It All Together
These three components often overlap, and in production you typically layer them:
- Reverse Proxy (Nginx/Cloudflare) handles SSL, caching, compression at the edge
- API Gateway (Kong/AWS) handles auth, rate limiting, and routes to services
- Load Balancer (ALB/NLB) distributes traffic within each service cluster
- Health checks automatically pull unhealthy instances from rotation
A common stack:
Client → Cloudflare (RP) → Kong (API GW) → ALB (LB) → App Servers
Knowing the distinction between these layers, and when each one is needed, is the foundation of designing resilient, scalable backend infrastructure.