

A database query takes 40ms. That is fast for a single query. But if ten thousand users per second each trigger that same query and the underlying data has not changed, your database is doing the same 40ms of work ten thousand times a second - and returning the same result every time. That is not a performance problem. It is a design problem.

A cache stores the result of an expensive operation so the next request can skip the operation entirely. The database sees one query. The cache answers the other 9,999.

This sounds simple. The complexity is in deciding when a cached answer is no longer trustworthy.

Where Caches Live

Before talking about strategies, it helps to recognize that “the cache” is not one thing. Caches exist at every layer of a system, and the fix for a slow application often lies at a different layer than the one everyone assumes.

Browser cache: your browser stores HTML, CSS, JavaScript, and images on local disk. The second visit to a website loads assets from disk in milliseconds, not from the network. Controlled by Cache-Control response headers from the server.

CDN edge cache: geographically distributed servers store copies of content close to users. A user in Mumbai fetches a photo from a server 20ms away, not from a server in Virginia. Most large-scale read problems for public content are solved at this layer, not the application layer.

Application cache: an in-memory store - almost always Redis or Memcached - that sits between your application servers and your database. This is what engineers usually mean when they say “we added caching.”

Database buffer pool: PostgreSQL’s shared buffer pool and MySQL’s InnoDB buffer pool are caches built into the database itself. When people say “the query is fast because the data is in memory,” this is usually what they mean - the data pages are already cached in RAM by the database engine. Queries that repeatedly access the same rows benefit automatically, without any application-level caching.

Most performance problems that look like “we need caching” are actually “we need to understand which cache layer is failing us.”

Four Patterns, Four Trade-offs

Cache-Aside (Lazy Loading)

The most common pattern. The application explicitly manages the cache.

Read path: check the cache first. On a hit, return immediately. On a miss, query the database, store the result in cache, then return it.

Write path: write to the database, then delete the cache entry for that key. The next read will re-populate from the fresh database value.

import json

# `redis` and `db` are assumed to be already-configured client objects.
def get_user(user_id):
    key = f"user:{user_id}"
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)  # cache miss
    redis.setex(key, 3600, json.dumps(user))  # cache for 1 hour
    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    redis.delete(f"user:{user_id}")  # invalidate; next read gets fresh data

The appeal: only data that is actually requested gets cached. If a record is never read, it never occupies cache space. Cache restarts are graceful - the cache warms up organically as requests come in.

The risk: every cache miss hits the database. If many requests arrive simultaneously for a cold key, all of them see a miss and all go to the database at once. This is the thundering herd problem, discussed in detail below.

Write-Through

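On every write, update both the database and the cache in the same operation, before the write returns. The two updates are not truly atomic (they are separate systems), so the write path has to handle the case where one succeeds and the other fails.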

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    redis.set(f"user:{user_id}", json.dumps(data))  # update the cache right after the database write

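The appeal: the cache is updated as part of every write, so a read that follows a write finds the new data and no separate invalidation logic is needed. One caveat: two concurrent writers can apply their cache updates in a different order than their database updates, so strict consistency still needs care.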

The cost: every write is two writes instead of one - the database write plus the cache write. Write performance is slightly slower. You also cache data that may never be read - if you write a record that nobody queries for the next hour, you spent cache space for nothing. A TTL on cached entries cleans up the cold data over time.

Write-through is the right choice when you absolutely cannot serve stale data: a pricing service, a permission check, anything where showing the user outdated information causes a real problem.

Write-Back (Write-Behind)

Write to the cache first. Return the response to the user immediately. Flush to the database asynchronously, in batches.

This is how operating systems make file writes feel fast: writes go to an in-memory buffer and are periodically flushed to disk. The write feels instant; the actual disk write happens later in the background.

The appeal: dramatically faster writes. A counter that increments on every page view - if you wrote that to the database on every increment, you would be doing millions of database writes per day. Write-back accumulates the increments in Redis and flushes the total every few seconds. The database sees one write per flush interval, not one per page view.

The risk: data in the cache that has not been flushed to the database is at risk. If the cache server crashes before the flush, those writes are lost. This is acceptable for like counters and analytics (losing a few events in a crash is fine), and completely unacceptable for financial transactions.
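A minimal sketch of the page-view counter described above. The names WriteBackCounter and persist are hypothetical, and plain in-memory dictionaries stand in for Redis and the database so the example runs on its own. In a real system a timer or background worker would call flush() every few seconds.

```python
import threading
from collections import defaultdict


class WriteBackCounter:
    """Buffers increments in memory and persists the totals in one batch."""

    def __init__(self, flush_fn):
        self._pending = defaultdict(int)  # increments not yet persisted
        self._lock = threading.Lock()
        self._flush_fn = flush_fn  # hypothetical: writes {key: delta} to the database

    def incr(self, key, amount=1):
        # Fast path: touches memory only, never the database.
        with self._lock:
            self._pending[key] += amount

    def flush(self):
        # Swap the buffer out under the lock, then write outside it.
        with self._lock:
            batch, self._pending = self._pending, defaultdict(int)
        if batch:
            self._flush_fn(dict(batch))
        return len(batch)  # number of keys written in this flush


database = defaultdict(int)  # stand-in for a real table of counters


def persist(batch):
    for key, delta in batch.items():
        database[key] += delta


counter = WriteBackCounter(persist)
for _ in range(1000):
    counter.incr("views:home")
counter.flush()  # one database write covers 1,000 page views
```

Anything still sitting in the buffer when the process dies is gone, which is exactly the risk described above.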

Write-Around

Skip the cache on writes entirely. Writes go directly to the database. The cache is only populated when data is subsequently read.

This is the right pattern for data that is written once and rarely read, or for bulk data imports where caching every incoming record would evict useful data from the cache. Log ingestion pipelines, batch imports, archival operations - all are write-around candidates.
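A rough sketch of write-around during a bulk import, with dictionaries standing in for the database and the cache. The function names are hypothetical. Dropping any older cached copy on write is an assumption added here so that a previously cached key cannot go stale; the pattern itself only requires that writes do not populate the cache.

```python
database = {}  # stand-in for the primary store
cache = {}     # stand-in for Redis or Memcached


def write_around(key, value):
    # Writes go to the database only. Any older cached copy is dropped
    # (an assumption of this sketch) so it cannot serve stale data.
    database[key] = value
    cache.pop(key, None)


def read(key):
    # Reads populate the cache lazily, the same way cache-aside does.
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


for i in range(10_000):
    write_around(f"log:{i}", {"line": i})  # the import never touches the cache

first = read("log:42")  # miss: loaded from the database, then cached
```

After the import, the cache holds only the one record that was actually read.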

Cache Invalidation: The Hard Part

Phil Karlton’s observation that “there are only two hard things in computer science: cache invalidation and naming things” is not a joke. It captures a genuine asymmetry: making a cache fast is easy; keeping it correct is hard.

The fundamental problem: a cache is a copy of data. The original can change. The copy does not automatically know it has changed. Every strategy below is a different answer to the question of how to decide when the copy is no longer trustworthy.

TTL (Time-to-Live): set an expiry on every cached entry. When it expires, the cache evicts it and the next read goes to the database.

TTL is simple and works well when serving slightly stale data for a short period is acceptable - a product catalog, a public news feed, a user’s follower count. It is a poor fit when freshness is critical - a bank balance, a seat availability counter, a permission check.

Choosing the right TTL is a judgment call. Too short: the cache provides little benefit and the database is hit frequently. Too long: users see stale data after updates. There is no universal answer; it depends on how often data changes and how much staleness the application can tolerate.
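To make the expiry behavior concrete, here is a small in-memory sketch. TTLCache is a hypothetical class, not a Redis API, and the injectable clock exists only so the example can step through time without sleeping.

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.set("catalog", ["widget"], ttl_seconds=300)
fresh = cache.get("catalog")    # inside the 5-minute window: served from cache
now[0] = 301.0
expired = cache.get("catalog")  # past the window: a miss, so the caller reloads
```

With a 300-second TTL, a key that is read constantly costs roughly one database read every five minutes, and an update can stay invisible for up to five minutes. Halving the TTL halves the staleness and doubles the reload rate.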

Event-driven invalidation: instead of waiting for a TTL, explicitly delete or update the cache entry the moment the underlying data changes.

def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = %s WHERE id = %s", new_price, product_id)
    redis.delete(f"product:{product_id}")           # invalidate single product
    redis.delete("product_catalog_homepage")         # invalidate any cached list that included this product

This works well when writes are infrequent and come through a known code path. The weakness: if data is modified outside your application (a migration script, a manual database edit, an external system writing directly to the database), the cache is not notified. The stale entry stays until its TTL expires.

Write-through as automatic invalidation: if every write also updates the cache, the cache never holds stale data. This is “invalidation” in the sense that you do not need to delete stale entries - you always replace them on write.

The Thundering Herd Problem

Picture a cached key with a 5-minute TTL: the homepage product list. At peak traffic, 5,000 users per second request this page. The cache is serving all of them from its stored copy.

The TTL expires. In the next 50 milliseconds, thousands of requests all see a cache miss and all go to the database. The database that was handling zero queries for this key is now handling thousands simultaneously. This overwhelms the database connection pool. Queries time out. The application throws errors. The frontend retries. More queries. The database falls over.

This is the thundering herd (also called a cache stampede). It is not hypothetical. High-traffic systems hit this regularly after cache invalidation events or while a restarted cache is still warming up.

Four practical solutions:

Mutex on first miss: when a cache miss occurs, the first request acquires a Redis lock (SET lockkey 1 NX EX 10 - set if not exists, expire in 10 seconds). That request goes to the database and populates the cache. All other concurrent requests for the same key see the lock is held, wait briefly, then read from the freshly populated cache. Only one database query happens regardless of how many requests arrived simultaneously.

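import json
import time

def get_with_lock(key, fetch_fn, ttl=3600, max_wait=5.0):
    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait
    while True:
        cached = redis.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX EX: only one caller gets the lock; it auto-expires after 10 seconds
        if redis.set(lock_key, "1", nx=True, ex=10):
            try:
                result = fetch_fn()
                redis.setex(key, ttl, json.dumps(result))
                return result
            finally:
                # Caveat: if fetch_fn() outlives the 10-second expiry, this can delete
                # a lock that another caller now holds. Storing a unique token and
                # checking it before deleting avoids that.
                redis.delete(lock_key)
        if time.monotonic() >= deadline:
            return fetch_fn()  # waited long enough: fall back to the database
        time.sleep(0.05)  # another caller is rebuilding the value; retry shortly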

Probabilistic early expiration (XFetch): instead of waiting for TTL to hit zero, start probabilistically refreshing the cache before it expires. The closer the key is to expiry, the higher the probability that any given request triggers a background refresh. Popular keys stay warm continuously; the cache is never actually empty when traffic is high.
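One common formulation of the XFetch decision rule, as a sketch: refresh when now - delta * beta * ln(rand) >= expiry, where delta is how long the last recomputation took. The function name and parameters are hypothetical, and the now and rng arguments are there only to make the example deterministic.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0,
                         now=None, rng=random.random):
    """Return True if this request should refresh the key before it expires.

    expires_at: absolute expiry time of the cached entry, in seconds
    recompute_seconds: how long the last recomputation took
    beta: above 1.0 refreshes earlier, below 1.0 refreshes later
    """
    now = time.time() if now is None else now
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and log() is defined and <= 0.
    jitter = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + jitter >= expires_at


# Same random draw, different distance from expiry:
far = should_refresh_early(expires_at=100.0, recompute_seconds=2.0,
                           now=10.0, rng=lambda: 0.5)
near = should_refresh_early(expires_at=100.0, recompute_seconds=2.0,
                            now=99.0, rng=lambda: 0.5)
```

Because the jitter scales with the recompute time, slow-to-rebuild keys start refreshing earlier than cheap ones. Each request rolls independently, so under heavy traffic some request is very likely to refresh the key shortly before it would have expired.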

Stale-while-revalidate: serve the stale cached value immediately, while triggering a background refresh. The user gets a fast response that may be slightly out of date, bounded by the staleness window you allow. The cache refreshes asynchronously. The HTTP directive Cache-Control: stale-while-revalidate=60 tells caches that support it, including many CDNs, that they may keep serving an expired response for up to 60 seconds while they revalidate in the background. For application-level caching, the same idea applies: return the old value now, refresh the cache in a background thread.
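An application-level sketch of the same idea using a background thread. SWRCache and its methods are hypothetical names, the store is an in-memory dictionary rather than Redis, and wait_for_refresh exists only so the example can observe the refresh deterministically.

```python
import threading
import time


class SWRCache:
    def __init__(self, fetch_fn, fresh_seconds, clock=time.monotonic):
        self._fetch_fn = fetch_fn
        self._fresh_seconds = fresh_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}     # key -> (value, fetched_at)
        self._refreshing = {}  # key -> thread of the in-flight refresh

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fetched_at = entry
                stale = self._clock() - fetched_at >= self._fresh_seconds
                if stale and key not in self._refreshing:
                    worker = threading.Thread(
                        target=self._refresh, args=(key,), daemon=True)
                    self._refreshing[key] = worker
                    worker.start()
                return value  # fresh or stale, answer immediately
        # Nothing cached yet: this caller has to wait for the fetch.
        # (Concurrent cold misses are not coalesced in this sketch.)
        self._refresh(key)
        with self._lock:
            return self._entries[key][0]

    def _refresh(self, key):
        try:
            value = self._fetch_fn(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def wait_for_refresh(self, key):
        with self._lock:
            worker = self._refreshing.get(key)
        if worker is not None:
            worker.join()


now = [0.0]  # fake clock, in seconds
calls = []


def load(key):
    calls.append(key)
    return f"{key}-v{len(calls)}"


cache = SWRCache(load, fresh_seconds=60, clock=lambda: now[0])
first = cache.get("home")    # cold: blocks on the fetch
now[0] = 61.0
second = cache.get("home")   # stale: old value returned at once, refresh starts
cache.wait_for_refresh("home")
third = cache.get("home")    # the refreshed value
```

Only the very first caller ever waits on the database. Everyone after that gets an immediate answer, at the cost of occasionally seeing the previous value.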

Request coalescing: if a thousand requests arrive for a cold key simultaneously, funnel them into a single upstream database request. All pending requests wait for the one upstream call to complete, then share its result. The DataLoader library (used heavily in GraphQL resolvers) does a version of this within a single request: it batches loads and deduplicates repeated loads of the same key.
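A thread-based sketch of coalescing inside one process. The SingleFlight name is borrowed from the idea of Go's singleflight package; the class here is a hypothetical illustration, not a port of that library. The first caller for a key becomes the leader and runs the query; everyone else waits on an event and shares the result.

```python
import threading
import time


class _Call:
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesces concurrent calls for the same key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> _Call shared by the leader and followers

    def do(self, key, fetch_fn):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.result = fetch_fn()
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()  # follower: wait for the leader's result
        if call.error is not None:
            raise call.error
        return call.result


upstream_calls = []


def slow_query():
    upstream_calls.append(1)
    time.sleep(0.2)  # simulate a slow database query
    return {"products": ["a", "b"]}


flight = SingleFlight()
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(flight.do("homepage", slow_query)))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All twenty callers get a result, and the upstream query runs once because they all arrive while the first call is still in flight. Across multiple application servers the same effect needs a shared lock, such as the Redis mutex above.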

Redis Data Structures for Practical Caching

Redis is more than a key-value store. Its data structures enable patterns that a plain hash map cannot support.

Atomic counters for high-write metrics: INCR likes:post-42 is an atomic operation - no two concurrent increments interfere. Send every like through Redis INCR, then batch-flush the totals to the database periodically. The database sees one write per counter per flush interval, not one per like. This is the write-back pattern applied to like counters on high-traffic platforms.

Sorted sets for leaderboards: ZINCRBY leaderboard 1 user:42 atomically increments user 42’s score and keeps the set ranked. ZRANGE leaderboard 0 9 REV WITHSCORES returns the top 10 players in a single O(log n + 10) operation (the REV option needs Redis 6.2 or later; older versions use ZREVRANGE). No GROUP BY, no ORDER BY, no database round-trip.

SET with NX for distributed locks: SET lockkey value NX EX 30 sets the key only if it does not already exist, with a 30-second automatic expiry, in one atomic step. (The older SETNX command cannot attach an expiry in the same command.) This is the primitive for preventing thundering herd stampedes and for coordinating distributed systems. If two application servers both try to execute the same scheduled job, the lock ensures only one succeeds.

Hash for partial updates: instead of serializing a full user object as a JSON string, store individual fields: HSET user:42 name "Megha" email "meg@example.com" subscription "pro". HGET user:42 name retrieves a single field without deserializing the entire object. HINCRBY user:42 login_count 1 increments one field atomically. Useful when different parts of your application need to update different fields of the same cached entity.

When Caching Makes Things Worse

The instinct to cache everything is a common mistake. Caching is not free.

Caching writes is usually wrong. Write-back requires infrastructure to guarantee eventual flush. Write-through adds a cache write to every database write. Cache-aside on the write path adds complexity without benefit. Unless writes are the bottleneck (they rarely are - reads dominate most applications), focus caching on reads.

Caching correctness-critical operations is dangerous. A booking system that reads seat availability from the cache and makes transactional decisions based on it will double-book seats. The rule: use the cache for display and for tolerable staleness; always read from the primary database for decisions that must be correct. “Show the user how many seats are left” can use cache. “Deduct this seat from inventory when the user clicks Book” must go to the database.
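A minimal sketch of the "Book" path, using sqlite3 so it runs on its own. The table and function names are hypothetical. The availability check and the decrement happen in a single UPDATE inside the database, so a stale cached count can never cause a double booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, seats_left INTEGER NOT NULL)")
conn.execute("INSERT INTO events (id, seats_left) VALUES (1, 1)")
conn.commit()


def book_seat(conn, event_id):
    """Try to reserve one seat; return True only if the database granted it."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE events SET seats_left = seats_left - 1 "
            "WHERE id = ? AND seats_left > 0",
            (event_id,),
        )
    # rowcount is 1 if a seat was available and decremented, 0 otherwise.
    return cursor.rowcount == 1


first = book_seat(conn, 1)   # takes the last seat
second = book_seat(conn, 1)  # no seats left, so the UPDATE matches no row
```

The cached seat count can still drive the "3 seats left" label on the page. It just never decides whether a booking succeeds.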

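Too many cached keys with no TTL fill your cache and cause useful data to be evicted. Every cached key should have a TTL, even if it is long. Redis can evict keys automatically when memory runs out, but only if maxmemory and an eviction policy such as allkeys-lru are configured; the default policy, noeviction, rejects writes instead. Intentional TTLs give you control either way.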
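To see how cold keys push out useful ones, here is a toy LRU cache. LRUCache is a hypothetical class for illustration; Redis's own LRU is an approximation based on sampling, so this shows the principle rather than Redis's exact behavior.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self._capacity = capacity
        self._entries = OrderedDict()  # least recently used first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used key


cache = LRUCache(capacity=2)
cache.set("homepage", "hot data")
cache.set("report:2019", "read once")
cache.set("report:2020", "read once")  # cache is full: "homepage" is evicted
```

Two keys that will never be read again just cost the cache its most valuable entry. A short TTL on the report keys would have let them expire on their own instead of competing for space.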


Pattern | Write path | Read freshness | Risk | Best for
Cache-aside | Normal (DB only) | Eventual (TTL or invalidation) | Thundering herd on cold cache | Standard read-heavy workloads
Write-through | Double (cache + DB) | Strong (always current after write) | Wasted space for unread data | Must never show stale data
Write-back | Fast (cache first, async flush) | Eventual (flush interval lag) | Data loss if cache fails before flush | High-write metrics where some loss is acceptable
Write-around | Normal (DB only, skip cache) | Strong (always DB on first read) | No cache benefit on recent writes | Write-once data, bulk imports
