dmai/blog

System Design

Caching - Strategies, Patterns & Common Problems

Mar 20, 2026 · 15 min read

“There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton

At its core, caching means keeping a copy of data somewhere faster to reach so you skip the expensive fetch next time. The concept is straightforward, but doing it well in a distributed system is surprisingly tricky.

When most people think “cache,” they think Redis. Redis is a solid choice, but caching exists at nearly every layer. Browsers cache. CDNs cache. Your application process can hold data in memory. Even databases run their own internal cache layers under the hood.

This article walks through where caching happens, how different cache patterns work, and the problems that come with it.

Where to Cache

External Caching

[Diagram: App → Redis → PostgreSQL]
An external cache is a dedicated service like Redis or Memcached that sits between your application and the database. Your app servers connect to it over the network and use it to hold frequently requested data so the database doesn't get hammered on every read.

The big advantage is that all of your app servers share a single cache. Built-in eviction policies like LRU and time-based expiry via TTL keep memory usage in check automatically.

If you're dealing with high traffic, an external cache is usually the first lever to pull. Get that working, then consider adding CDN or client-side caching on top.

CDN (Content Delivery Network)

[Diagram: Edge nodes in London, Tokyo, São Paulo, and Sydney; a user's request goes to the nearest edge instead of the origin]

A CDN places copies of your content on servers spread across the globe. When a user makes a request, it gets served from whichever edge node is physically closest rather than traveling all the way back to your origin.

Providers like Cloudflare, Fastly, and Akamai have grown well beyond static file hosting. They can cache API responses, render HTML at the edge, and even enforce security rules before traffic reaches your infrastructure. That said, the highest-impact use case is still serving images, videos, and other static media.

How it works:

1. A user requests an image from your app.
2. The request goes to the nearest CDN edge server.
3. If the image is cached there, it is returned immediately.
4. If not, the CDN fetches it from your origin server, stores it, and returns it.
5. Future users in that region get the image instantly from the CDN.

To put it in numbers: a request from India to a Virginia-based origin adds roughly 250–300ms of round-trip latency. Serve that same asset from a nearby CDN node and you're looking at 20–40ms.

CDNs can do a lot more these days, but for most teams the biggest latency savings still come from offloading static media delivery.

Client-Side Caching

[Diagram: Device with a local cache, syncing with the server]

Client-side caching keeps data on the requester's own device so it never has to cross the network again. In practice that could be a browser's HTTP cache, localStorage, or a mobile app's on-device storage.

It also applies at the library level. A Redis client, for instance, caches cluster metadata locally so it knows which shard owns which slot. That lets it route commands to the correct node without an extra round trip every time.

The tradeoff is control. You can't easily invalidate what's sitting on someone's phone or browser. Strava, for example, stores your run history locally so the app works offline and syncs when connectivity returns. Your browser reusing an image it already downloaded is the same idea.

In-Process Caching

[Diagram: App logic inside a server process (Node.js / JVM) reads from an in-memory Map/HashMap; PostgreSQL sits across the network]

Your servers already have memory sitting there. Instead of reaching out to Redis or the database every time, you can stash frequently needed data right inside the application process itself.

If the same handful of values keep getting requested, a HashMap or a local LRU cache inside the process eliminates the lookup entirely. Reading from process memory is faster than even Redis because there's zero network overhead.

This works well for small, frequently accessed values that rarely change:

  • Configuration values
  • Feature flags
  • Small reference datasets
  • Hot keys
  • Rate limiting counters
  • Precomputed values

The obvious downside is isolation. Every app instance maintains its own copy, so nothing is shared across servers. If one instance invalidates a key, the rest still serve the old value until their own copy expires.

Best suited for small, slow-changing values you read constantly. It won't replace Redis, but it makes a great first line of defense before you even hit the network.
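As a rough sketch of the idea, Python's built-in functools.lru_cache gives you a bounded in-process cache in one line. The load_feature_flag_from_db function here is a hypothetical stand-in for a real database query:

```python
import functools

# Hypothetical stand-in for a real database query.
def load_feature_flag_from_db(name):
    return {"dark_mode": True, "new_checkout": False}.get(name, False)

@functools.lru_cache(maxsize=1024)
def get_feature_flag(name):
    # The first call per name hits the "database"; later calls are
    # served from this process's memory with no network round trip.
    return load_feature_flag_from_db(name)
```

Remember that each server process keeps its own copy, so a flag flipped in the database won't be visible here until the entry is evicted or the process restarts.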

Cache Architectures

Where you put the cache is only half the story. The other half is how reads and writes flow through it. Each pattern makes a different tradeoff between speed, consistency, and complexity.

Cache-Aside (Lazy Loading)

[Diagram: App checks the cache first, falling back to the DB on a miss]

By far the most widely used pattern. If you look at how most production services talk to Redis, this is what they're doing.

How it works:

1. Application checks the cache.
2. If the data is there, return it.
3. If not, fetch from the database, store it in the cache, and return it.

Because data only enters the cache after it's actually requested, you avoid filling memory with things nobody needs. The tradeoff: every first request for a key pays an extra round trip.

If you only remember one caching pattern, make it cache-aside.
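The three steps above can be sketched in a few lines. Plain dicts stand in for Redis and the database here; in production the cache lookup and write would be network calls:

```python
# In-memory stand-ins for Redis and the database (assumptions for this sketch).
cache = {}
database = {"user:1": {"name": "Ada"}}

def get_user(key):
    # 1. Check the cache.
    value = cache.get(key)
    if value is not None:
        return value                # 2. Hit: return immediately.
    # 3. Miss: fetch from the database, populate the cache, return.
    value = database[key]
    cache[key] = value
    return value
```

In a real setup you'd also attach a TTL when populating the cache so stale entries eventually refresh on their own.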

Write-Through Caching

[Diagram: App writes to the cache; the cache synchronously writes to the DB]

In this pattern the app sends every write to the cache, and the cache synchronously persists it to the database before acknowledging. Nothing returns to the caller until both stores are updated.

This isn't something Redis does out of the box. You typically need a caching library or a thin wrapper in your application code that coordinates the dual write.

Writes are slower since the caller blocks on two operations. You also risk filling the cache with entries that nobody ever reads again.

There's also the dual-write problem: if the cache write succeeds but the database write fails (or the other way around), you end up with inconsistent state.

Reach for this when your reads absolutely need the latest value and you can afford the extra write latency.
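A minimal sketch of the coordinating wrapper, again with dicts standing in for the two stores. The rollback shown is a crude way to handle the dual-write problem; real implementations typically retry or queue the failed write instead:

```python
# Stand-ins for the cache and the database (assumptions for this sketch).
cache = {}
database = {}

def write_through(key, value):
    # Write to the cache, then synchronously persist to the database.
    # The caller is only acknowledged once both stores are updated.
    cache[key] = value
    try:
        database[key] = value        # stand-in for the real DB write
    except Exception:
        cache.pop(key, None)         # crude rollback if the DB write fails
        raise
```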

Write-Behind (Write-Back) Caching

[Diagram: App writes to the cache; the cache asynchronously flushes to the DB]

Write-behind flips the durability guarantee. The app writes to the cache and returns immediately. The cache then flushes changes to the database asynchronously in the background, often in batches.

You get blazing-fast writes, but with a real risk: if the cache goes down before flushing, those writes are gone. Only use this where losing a small window of data is tolerable.

Analytics pipelines, event counters, and metrics collectors are classic fits. High write volume, eventual consistency is fine.
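A toy version of the flush loop, using a queue and a background thread to decouple the acknowledged write from the database write. Real systems batch the flush and add retry logic; both are omitted here:

```python
import queue
import threading

cache = {}
database = {}
pending = queue.Queue()

def write_behind(key, value):
    # Acknowledge as soon as the cache is updated...
    cache[key] = value
    pending.put((key, value))        # ...and flush to the DB later.

def flush_worker():
    while True:
        key, value = pending.get()
        database[key] = value        # batched in real systems
        pending.task_done()

# Daemon thread: anything still queued when the process dies is lost,
# which is exactly the durability risk described above.
threading.Thread(target=flush_worker, daemon=True).start()
```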

Read-Through Caching

[Diagram: App reads from the cache; on a miss, the cache fetches from the DB]

Read-through puts the cache in charge of fetching. Your application only ever talks to the cache. On a miss, the cache itself queries the database, stores the result, and hands it back.

Think of it as the mirror image of write-through. One manages reads, the other manages writes, and many systems pair them together so the application never touches the database directly for either path.

It keeps caching logic out of your app code, but you need a library or service that knows how to talk to the database. CDNs work exactly this way - on a miss they pull from origin, store it, and serve it. For app-level caching with Redis, though, cache-aside is far more common in practice.
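The key difference from cache-aside is who owns the fetch, which a small sketch makes concrete. The loader function is a hypothetical stand-in for a real database query, and it lives inside the cache layer rather than the application:

```python
class ReadThroughCache:
    """The cache owns the database fetch; the app only ever calls get()."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader        # knows how to query the database

    def get(self, key):
        if key not in self._store:
            # Miss: the cache itself fetches and stores the result.
            self._store[key] = self._loader(key)
        return self._store[key]

# Hypothetical loader standing in for a real database query.
user_cache = ReadThroughCache(loader=lambda key: f"row-for-{key}")
```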

Cache Eviction Policies

Memory is finite. When the cache fills up, something has to go. Eviction policies are the rules that decide what gets dropped.

LRU (Least Recently Used)

Drops whichever entry hasn't been touched the longest. Under the hood it typically uses a doubly-linked list plus a hash map so removals and promotions happen in O(1). It's the default in Redis and most caching libraries because recent access is a surprisingly good predictor of future access.
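The linked-list-plus-hash-map structure maps neatly onto Python's OrderedDict, which keeps insertion order and supports O(1) reordering. A minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # doubly-linked list + hash map internally

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)            # promote to most-recently-used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)     # evict the least-recently-used
```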

LFU (Least Frequently Used)

Tracks how often each key is accessed and removes the one with the lowest count. Exact frequency tracking can be expensive, so many systems use an approximate version. LFU shines when popularity is stable over time, like a trending video that stays hot for hours.

FIFO (First In First Out)

A simple queue: the oldest entry gets evicted first, regardless of how often it's still being read. Easy to implement, but it can throw out hot data for no good reason. You rarely see FIFO in production caches for that reason.

TTL (Time To Live)

Strictly speaking, TTL isn't an eviction policy. It's a per-key expiration timer that automatically removes stale entries. Most setups layer TTL on top of LRU or LFU to get both freshness and memory control. Essential for anything that must eventually refresh, like API responses or session tokens.
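A bare-bones sketch of per-key expiry, checking the deadline lazily on read. Redis combines this kind of lazy check with a background sweep so untouched keys don't linger forever:

```python
import time

class TTLCache:
    def __init__(self):
        self._data = {}              # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]      # lazy expiry: drop the stale entry on read
            return None
        return value
```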

Common Caching Problems

Speed comes with strings attached. Each of these failure modes has taken down production systems, and knowing them helps you design around them before they bite.

Cache Stampede (Thundering Herd)

[Diagram: Many users hit an expired key at once and every request falls through to the DB]

Picture a popular key expiring while hundreds of requests are in flight. Every single one misses the cache and hits the database simultaneously. What should be one query becomes hundreds, and the sudden spike can bring the database to its knees.

Say your homepage feed is cached with a 60-second TTL. At the moment it expires, every concurrent request falls through to the database at once. Under heavy traffic that burst alone can trigger cascading failures downstream.

How to handle it:

→Request coalescing (single flight): Allow only one request to rebuild the cache while others wait for the result.
→Cache warming: Refresh popular keys proactively before they expire. This only helps when using TTL-based expiration.
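Request coalescing can be sketched with a per-key lock: the first thread to miss rebuilds the entry, and every other thread blocks until the fresh value lands in the cache. The names here (get_with_single_flight, the dict-based cache) are illustrative, not a library API:

```python
import threading

cache = {}
_locks_guard = threading.Lock()
_key_locks = {}

def _lock_for(key):
    # One lock per key, created lazily under a global guard.
    with _locks_guard:
        return _key_locks.setdefault(key, threading.Lock())

def get_with_single_flight(key, rebuild):
    value = cache.get(key)
    if value is not None:
        return value
    # Only one thread rebuilds; the rest block here, then see the fresh value.
    with _lock_for(key):
        value = cache.get(key)       # re-check after acquiring the lock
        if value is None:
            value = rebuild()        # the single expensive DB query
            cache[key] = value
    return value
```

With this in place, a thousand simultaneous misses on one key still produce exactly one database query.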

Cache Consistency

This is what happens when the cache and the database disagree. Since most apps write to the database first and the cache second, there's always a brief window where the cache still holds stale data.

A user changes their profile picture. The database gets the new image, but the cache is still serving the old one. Everyone else sees the outdated photo until the cached entry expires or gets explicitly invalidated.

How to handle it:

→Cache invalidation on writes: Delete the cache entry after updating the database so it gets repopulated with fresh data.
→Short TTLs for stale tolerance: Let slightly stale data live temporarily if eventual consistency is acceptable.
→Accept eventual consistency: For feeds, metrics, and analytics, a short delay is usually fine.
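The invalidation-on-write approach boils down to a strict ordering: update the database, then delete (rather than update) the cached entry. Dicts stand in for both stores in this sketch:

```python
# Stand-ins for the cache and the database (assumptions for this sketch).
cache = {"user:1:avatar": "old.png"}
database = {"user:1:avatar": "old.png"}

def update_avatar(user_id, url):
    key = f"user:{user_id}:avatar"
    database[key] = url              # 1. write the database first
    cache.pop(key, None)             # 2. delete, don't update, the cache entry
    # The next read misses and repopulates the cache from the database,
    # which avoids racing concurrent writers over the cached value.
```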

Hot Keys

Sometimes one key gets orders of magnitude more traffic than everything else. Your overall hit rate looks fine, but that single key is hammering one Redis shard so hard it becomes the bottleneck for the whole system.

Think of a social platform where a celebrity goes viral. The cache key for their profile might see millions of reads per second, all routed to the same shard. Everything else in the cluster is healthy, but that one node is on fire.

How to handle it:

→Replicate hot keys: Store the same value on multiple cache nodes and load balance reads across them.
→Add a local fallback cache: Keep extremely hot values in-process to avoid pounding Redis.
→Apply rate limiting: Slow down abusive traffic patterns on specific keys.
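Hot-key replication is often done by suffixing the key so each copy hashes to a different shard, then picking a replica at random on read. A toy sketch with a dict standing in for the cluster (in Redis Cluster, the distinct suffixes would land the copies on different nodes):

```python
import random

NUM_REPLICAS = 4   # assumption: spread each hot key across 4 logical copies

def replica_keys(key):
    return [f"{key}#r{i}" for i in range(NUM_REPLICAS)]

def write_hot_key(store, key, value):
    # Write every replica so a read from any copy sees the value.
    for rk in replica_keys(key):
        store[rk] = value

def read_hot_key(store, key):
    # Each read picks a random replica, spreading load across shards.
    return store.get(random.choice(replica_keys(key)))
```

The cost is write amplification and the chance of briefly inconsistent replicas mid-update, which is usually acceptable for read-heavy hot keys.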


© 2026 dmai/blog Engineer Notes. All rights reserved.