Load Balancers, Reverse Proxies & API Gateways
The Three Gatekeepers
Between your users and your backend, three critical components control how traffic flows: Reverse Proxies, API Gateways, and Load Balancers.
They sound similar, and they do overlap, but each solves a distinct problem. Knowing when to use which is fundamental to designing resilient systems.
Let's explore each one, starting from the simplest concept and building up.
Reverse Proxy
A reverse proxy sits in front of your backend and acts on behalf of the server. Clients never talk directly to your origin.
In the diagram, the client sends HTTPS, but the proxy forwards plain HTTP internally. That's SSL termination - the proxy handles encryption so your app doesn't have to.
What it does:
• SSL termination - decrypt HTTPS at the edge
• Caching - serve repeated responses without hitting origin
• Compression - gzip/brotli responses
• Hide origin - clients never see your real server IPs
Examples: Nginx, Caddy, HAProxy, Cloudflare. Even a CDN is essentially a globally distributed reverse proxy.
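To make the caching role concrete, here is a minimal sketch of a caching reverse proxy in Python. It is an illustration only, not how Nginx or Cloudflare are implemented: the class name, the `fetch_from_origin` callable, and the TTL parameter are all hypothetical, and real proxies also respect Cache-Control headers.

```python
import time

class CachingReverseProxy:
    """Illustrative sketch: serve repeated responses from a cache
    so the origin server never sees them (names are hypothetical)."""

    def __init__(self, fetch_from_origin, ttl_seconds=60):
        self.fetch = fetch_from_origin   # callable: path -> response body
        self.ttl = ttl_seconds
        self.cache = {}                  # path -> (body, expires_at)

    def handle(self, path):
        entry = self.cache.get(path)
        if entry and entry[1] > time.time():
            return entry[0]              # cache hit: origin is never contacted
        body = self.fetch(path)          # cache miss: go to origin
        self.cache[path] = (body, time.time() + self.ttl)
        return body
```

The client only ever talks to `handle()`; whether the origin was involved is invisible to it, which is exactly the "acts on behalf of the server" property described above.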
API Gateway
An API Gateway is a specialized reverse proxy built for APIs. It adds application-aware features on top of basic proxying.
Notice how requests to /users, /orders, and /pay each route to different backend services.
What it adds over a reverse proxy:
• Authentication - validate JWTs, API keys, OAuth
• Rate limiting - e.g. 100 req/min per API key
• Request transformation - reshape payloads between client and service
• Service routing - route by path, header, or method to different backends
Examples: Kong, AWS API Gateway, Apigee. This is the front door to your microservices - it keeps cross-cutting concerns out of business logic.
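Two of those cross-cutting concerns, service routing and per-key rate limiting, can be sketched together. This is a toy model under stated assumptions (a sliding 60-second window, prefix-based routes, made-up service names), not Kong's or AWS API Gateway's actual behavior.

```python
import time
from collections import defaultdict

class ApiGateway:
    """Toy gateway: route by path prefix, rate-limit per API key.
    Routes and limits here are illustrative placeholders."""

    def __init__(self, routes, limit_per_min=100):
        self.routes = routes              # path prefix -> backend service
        self.limit = limit_per_min
        self.hits = defaultdict(list)     # api_key -> request timestamps

    def handle(self, api_key, path):
        now = time.time()
        # Keep only requests from the last 60 seconds (sliding window)
        window = [t for t in self.hits[api_key] if now - t < 60]
        if len(window) >= self.limit:
            return (429, "rate limit exceeded")
        window.append(now)
        self.hits[api_key] = window
        # Route by longest-prefix-style match over configured paths
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                return (200, backend)     # forward to the matched service
        return (404, "no route")
```

Because the limit check happens before routing, a misbehaving client is rejected without any backend service doing work, which is the point of keeping these concerns at the gateway.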
Load Balancer: Round Robin
Now the Load Balancer. While a reverse proxy can sit in front of one server, a load balancer distributes traffic across multiple servers. It's how you scale horizontally.
The simplest strategy is Round Robin: requests cycle through servers in order - 1 → 2 → 3 → 1 → 2 → 3.
Best for: Stateless services with similar server specs
Weakness: Ignores server load. A slow request on App #1 doesn't stop new traffic from arriving.
Weighted Round Robin is a variant where beefier servers get a larger share. A server with weight 3 gets three requests for every one sent to a weight-1 server.
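Both rotations fit in a few lines of Python. This is a sketch with placeholder server names; real load balancers interleave weighted picks more smoothly, but the distribution per rotation is the same.

```python
import itertools

def round_robin(servers):
    """Cycle through servers in fixed order: 1 -> 2 -> 3 -> 1 -> ..."""
    return itertools.cycle(servers)

def weighted_round_robin(servers_with_weights):
    """Naive weighted variant: repeat each server by its weight, then cycle.
    A weight-3 server receives three requests per rotation for every
    one sent to a weight-1 server."""
    expanded = [s for s, w in servers_with_weights for _ in range(w)]
    return itertools.cycle(expanded)
```

Note what is missing: neither function looks at the servers at all, which is precisely the weakness called out above - a slow or overloaded server keeps receiving its full share.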
Least Connections
Smarter than round robin. The load balancer tracks active connections per server and routes new requests to the one with the fewest.
The badges show this clearly - App #2 has only 3 active connections vs 12 and 7, so every new request goes there.
Best for: Long-lived connections (WebSocket, database pools, streaming)
Why it wins: Adapts to real load, not just request count
This is what many production setups use when connections have varying durations. AWS ALB offers a "least outstanding requests" algorithm, a close variant of this (its default remains round robin).
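The bookkeeping is simple enough to sketch directly. Server names mirror the diagram but are placeholders; a real balancer would also decrement counts when connections close unexpectedly.

```python
class LeastConnectionsBalancer:
    """Route each new request to the server with the fewest
    active connections right now."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open connections

    def acquire(self):
        """Pick the least-loaded server and count the new connection."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection finishes, so counts reflect real load."""
        self.active[server] -= 1
```

The `release()` call is what makes this adaptive: long-lived connections keep their server's count high, steering new traffic elsewhere, which round robin cannot do.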
Health Checks & Failover
The load balancer periodically pings each server with a health check - typically GET /health every 10-30 seconds.
In the diagram, App #2 fails its health check, turns red, and gets pulled from rotation. Traffic automatically reroutes to the remaining healthy servers.
Green dot = healthy, passing health checks
Red dot = unhealthy, removed from pool
This is how systems stay available during deployments, crashes, or scaling events, with no manual intervention needed.
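The pull-from-rotation logic can be sketched as follows. The `probe` callable stands in for the periodic GET /health request; the consecutive-failure threshold is an assumption (real balancers let you configure both the interval and the threshold).

```python
class HealthCheckedPool:
    """Sketch: mark a server unhealthy after N consecutive failed
    health checks and hand out only healthy servers."""

    def __init__(self, servers, probe, failure_threshold=3):
        self.servers = servers
        self.probe = probe                     # callable: server -> bool
        self.threshold = failure_threshold
        self.failures = {s: 0 for s in servers}

    def run_checks(self):
        """One health-check sweep (would run every 10-30 seconds)."""
        for s in self.servers:
            self.failures[s] = 0 if self.probe(s) else self.failures[s] + 1

    def healthy(self):
        """Servers still in rotation; traffic goes only to these."""
        return [s for s in self.servers if self.failures[s] < self.threshold]
```

Requiring several consecutive failures before eviction avoids flapping on a single dropped probe, and a later successful check resets the counter, so recovered servers rejoin the pool automatically.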
Sticky Sessions
Sometimes you need the same user to hit the same server, usually because session state is stored in server memory.
Here, User A (blue) always goes to App #1, and User B (green) always goes to App #2. The LB uses a cookie or IP hash to maintain that affinity.
Trade-off: If that server dies, the session is lost. That's why stateless architectures (JWTs, Redis sessions) are preferred.
Use sticky sessions as a temporary solution or when migrating from a stateful app. The goal should always be stateless servers behind the LB.
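Hash-based affinity, one of the two mechanisms mentioned above, is a one-liner. The client identifier could be a cookie value or an IP address; this naive modulo scheme also illustrates the trade-off, since removing a server from the pool remaps most clients (consistent hashing reduces that churn).

```python
import hashlib

def sticky_server(client_id, servers):
    """Map a client identifier (cookie value or IP) to a server.
    The same client always lands on the same server while the
    pool is unchanged - naive modulo hashing, for illustration."""
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```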
Layer 4 vs Layer 7
This is the most important distinction in load balancing.
Layer 4 (transport): routes based on IP + port. Sees TCP/UDP packets, not HTTP. Fast but blind to content. Example: AWS NLB.
Layer 7 (application): routes based on URL path, headers, cookies. Can split traffic by content. Examples: AWS ALB, Nginx.
In the diagram, the L7 LB reads the request path and routes /images to App #1, /api to App #2, and /static to App #3. Each path lands on its own dedicated service.
L7 is what you want for microservices, API gateways, and anything where routing depends on the content of the request.
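The routing decision from the diagram is only possible at L7, because it requires parsing the HTTP request. A minimal sketch (backend names follow the diagram; the fallback parameter is an assumption):

```python
def l7_route(path, path_routes, default=None):
    """L7 decision: pick a backend by inspecting the HTTP request path.
    An L4 balancer cannot make this choice - it only ever sees
    IP addresses and ports, never the URL."""
    for prefix, backend in path_routes.items():
        if path.startswith(prefix):
            return backend
    return default
```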
Putting It All Together
These three components often overlap, and in production you typically layer them:
- Reverse Proxy (Nginx/Cloudflare) handles SSL, caching, compression at the edge
- API Gateway (Kong/AWS) handles auth, rate limiting, and routes to services
- Load Balancer (ALB/NLB) distributes traffic within each service cluster
- Health checks automatically pull unhealthy instances from rotation
A common stack:
Client → Cloudflare (RP) → Kong (API GW) → ALB (LB) → App Servers
Knowing the distinction between these layers, and when each one is needed, is the foundation of designing resilient, scalable backend infrastructure.