The Hot Partition Problem - When Everyone Writes to the Same Key // Megha Bose

Helpful context:

Databases & Indexes - The Structures That Make Queries Fast

It is Black Friday. The time is 8:59 AM Eastern. A product page for a 79-dollar gaming headset is loaded 100,000 times in the next sixty seconds because a popular influencer mentioned it in a tweet. The database partition holding that product record receives traffic it has never seen in its operating lifetime. The node’s CPU hits 100%. Disk IOPS - Input/Output Operations Per Second, the number of read and write requests a storage device can physically handle in one second - saturate at their hardware ceiling. Response latencies climb from 2 milliseconds to 30 seconds. The load balancer starts timing out requests. The product page goes blank.

The entire e-commerce platform is showing errors. Not because the system ran out of capacity overall - twenty other partitions are sitting idle. One partition is on fire.

This is the hot partition problem. A quick note on terminology: a shard (also called a partition) is a horizontal slice of a database - instead of storing all data on one machine, you split it across many machines, each holding a subset. Sharding is the act of splitting data this way. The machine holding a given piece of data is determined by a shard key - typically a hash of some identifier like a user ID or product ID.

More capacity does not fix a hot partition. Adding shards spreads keys across more machines, but a single key still lives on exactly one shard. If the traffic is concentrated on one key, every request for it lands on the same machine no matter how many shards you add.
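To make that concrete, here is a minimal simulation. The routing function (SHA-256 modulo the shard count) and the synthetic traffic mix are assumptions chosen for illustration, not any particular database's scheme.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard by hashing it (hypothetical routing scheme)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(requests, num_shards: int) -> Counter:
    """Count how many requests land on each shard."""
    return Counter(shard_for(key, num_shards) for key in requests)


# Synthetic traffic: 10,000 requests for one hot key, plus one request
# each for 1,000 ordinary keys.
requests = ["product:viral-item"] * 10_000 + [f"product:{i}" for i in range(1_000)]

for n in (4, 40):
    loads = shard_loads(requests, n)
    hot = shard_for("product:viral-item", n)
    print(f"{n} shards: hot shard serves {loads[hot]} of {len(requests)} requests")
```

With ten times as many shards, the ordinary keys thin out, but the shard that owns the hot key still serves at least all 10,000 of its requests.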

What Creates Hot Partitions

Power-law distributions in access patterns are the fundamental cause. Hash-based sharding spreads keys evenly across shards, but that produces even load only if traffic is spread evenly across keys. In almost every real system it is not. User-generated content follows a power law: a tiny fraction of content accounts for the vast majority of reads. A single tweet from Beyoncé can reach hundreds of millions of impressions; a typical tweet gets a handful. The shard holding Beyoncé’s data absorbs traffic proportional to her popularity, not to the total number of users.

Celebrity and viral content is the acute version. An account with 50 million followers posts something. That post’s shard absorbs a spike orders of magnitude larger than any typical post. The shard has no way to adapt - it is a fixed slice of the keyspace, and all requests for that content must go through it.

Sequential and time-based partition keys create a different failure mode: monotonic hot partitions. If your shard key is a timestamp or auto-incrementing integer, all new writes go to the “latest” partition - always the same one. This is one of the most common operational mistakes in systems using Bigtable, HBase, or DynamoDB with a poorly chosen key. The last partition is always hot; all prior partitions are cold and idle. You get the worst of both worlds: wasted capacity and a saturated write path.
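The sketch below contrasts the two behaviours. The partition count, the one-day range width, and the SHA-256 prefix are all assumptions made for the example; real systems choose split points and hash functions differently.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8        # hypothetical cluster size
SECONDS_PER_DAY = 86_400


def range_partition(ts: int) -> int:
    """Range partitioning: partition i owns day i; the last owns everything later."""
    return min(ts // SECONDS_PER_DAY, NUM_PARTITIONS - 1)


def prefixed_partition(event_id: str) -> int:
    """Hashed prefix: derive the partition from a hash of a non-temporal ID."""
    digest = hashlib.sha256(event_id.encode()).digest()
    return digest[0] % NUM_PARTITIONS


# 1,000 writes arriving within the same minute on day 3.
now = 3 * SECONDS_PER_DAY + 1_000
writes = [(now + i % 60, f"event-{i}") for i in range(1_000)]

by_range = Counter(range_partition(ts) for ts, _ in writes)
by_prefix = Counter(prefixed_partition(eid) for _, eid in writes)

print("timestamp key:", dict(by_range))                    # one partition takes every write
print("hashed prefix:", dict(sorted(by_prefix.items())))   # writes spread across partitions
```

The price of the hashed prefix is that a time-range query can no longer read one contiguous range; it has to visit every partition.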

Thundering herd at cache miss: a thundering herd is what happens when many processes or requests are all waiting for the same event and then all wake up simultaneously when it occurs. The classic example: a cached item expires. Thousands of concurrent requests all simultaneously see an empty cache and all rush to fetch the data from the underlying database at the same time. The spike is brief but can be catastrophic. This is structurally the same problem as viral content - concentrated traffic on one destination - triggered by a timer rather than popularity.

The Real Story: Beyoncé and Twitter’s Architecture

To understand the Beyoncé problem, you first need to understand how Twitter’s timeline system worked.

When you open Twitter, you see a feed of tweets from people you follow. The naïve way to build this is to query all the tweets from all accounts you follow and sort them by time - but this is slow, especially for users who follow thousands of accounts. Twitter’s solution was to precompute each user’s timeline feed. When someone you follow posts a tweet, Twitter immediately writes a copy of that tweet ID into your precomputed feed.

This approach is called fan-out - the idea is that one write event “fans out” into many downstream writes, like a single signal branching into multiple outputs. For most accounts this is cheap: an account with 500 followers triggers 500 background writes per tweet. Manageable.

For Beyoncé, who had tens of millions of followers, a single tweet triggered tens of millions of write operations - all of them hitting the fan-out workers simultaneously. Write amplification is the term for this phenomenon: one logical write (Beyoncé posts a tweet) causes many more physical writes (tens of millions of copies written into follower timelines). The amplification factor here was roughly equal to the follower count.

The fan-out workers fell behind. The queue grew. Timelines stopped updating. The system designed for scale was overwhelmed by a single event.

Twitter, Reddit during major news events, and Steam during major game launches (the Cyberpunk 2077 launch in December 2020 reportedly caused Steam to show errors for hours) have all hit variations of this problem. The pattern is always the same: one item, one event, one moment - concentrated load that infrastructure assumed would be spread.

Detecting Hot Partitions

You cannot fix what you cannot see. The mistake is monitoring only aggregate cluster metrics. A healthy cluster-level average hides a saturated individual partition.

Per-partition metrics are required: request rate, CPU utilization, read/write throughput, and latency - all broken down by partition or shard. The diagnostic signatures:

One partition’s throughput is 10x the median, while others are under their provisioned capacity
P99 latency for requests to one shard is 500ms while others are 5ms
One Kafka partition’s consumer lag grows continuously while neighboring partitions stay current

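Most managed services expose some of this natively, but the granularity varies. DynamoDB’s standard CloudWatch metrics, such as ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits, are reported per table and per global secondary index rather than per partition; CloudWatch Contributor Insights for DynamoDB fills the gap by reporting the most accessed and most throttled keys. Kafka exposes per-partition consumer lag through JMX - Java Management Extensions, a standard interface for monitoring and managing Java applications. Redis Cluster’s CLUSTER INFO reports only cluster-wide state; for finer detail, CLUSTER COUNTKEYSINSLOT returns the number of keys in a slot, and redis-cli --hotkeys samples the most frequently accessed keys when an LFU eviction policy is enabled.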
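Once per-partition numbers are available, the first signature above reduces to a simple check. This is a sketch: the function name, the sample rates, and the default 10x factor are illustrative assumptions, and a real alert would also look at duration and provisioned limits.

```python
from statistics import median


def find_hot_partitions(throughput, factor=10.0):
    """Return partitions whose throughput is at least `factor` times the median."""
    baseline = median(throughput.values())
    if baseline <= 0:
        return []
    return sorted(p for p, rate in throughput.items() if rate >= factor * baseline)


# Hypothetical per-partition request rates (requests per second).
sample = {"p0": 120.0, "p1": 95.0, "p2": 110.0, "p3": 4_800.0, "p4": 105.0}

cluster_average = sum(sample.values()) / len(sample)
print(f"cluster average: {cluster_average:.0f} req/s")   # a single number hides the skew
print("hot partitions:", find_hot_partitions(sample))
```

The median is used as the baseline rather than the mean because the hot partition itself drags the mean upward.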

Solution 1: Key Salting

The most direct fix for a hot key is to artificially spread it across multiple virtual shards by appending a random suffix.

For a hot key like product:viral-item, instead of routing all reads and writes to one location, distribute across N:

product:viral-item:0
product:viral-item:1
...
product:viral-item:N-1

Pick the shard suffix randomly on write. On read, you need to retrieve data from all N sub-keys and combine the results - this is called scatter-gather: you scatter the query outward to multiple locations in parallel, then gather (merge) the responses back into a single answer. The hot key’s write traffic is now distributed N ways. This works when the value can be merged, as with counters or append-only lists.

The cost of scatter-gather is that reads become N parallel requests instead of one, plus the work of merging. For write-heavy hot keys, this is a good trade. For read-heavy keys it is not: every read still touches all N sub-keys, so per-shard read load does not drop. The usual variant for read-heavy keys inverts the scheme - write the same value to all N sub-keys and read from one chosen at random. N is tunable - size it to the key’s actual hotness rather than picking one fixed number for every key.
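A minimal sketch of the write-heavy case, using a counter because counters merge trivially. The SaltedCounter class and its in-memory dictionary are stand-ins invented for this example; a real implementation would issue the sub-key reads to the database in parallel.

```python
import random
from collections import defaultdict
from typing import Optional


class SaltedCounter:
    """Spread increments for one hot key across N sub-keys (in-memory stand-in)."""

    def __init__(self, num_salts: int, seed: Optional[int] = None) -> None:
        self.num_salts = num_salts
        self._store = defaultdict(int)      # stands in for the database
        self._rng = random.Random(seed)

    def _salted(self, key: str, salt: int) -> str:
        return f"{key}:{salt}"

    def increment(self, key: str, amount: int = 1) -> None:
        # Write path: pick one sub-key at random, so writes spread N ways.
        salt = self._rng.randrange(self.num_salts)
        self._store[self._salted(key, salt)] += amount

    def per_salt(self, key: str):
        # Scatter: one lookup per sub-key.
        return [self._store.get(self._salted(key, s), 0) for s in range(self.num_salts)]

    def read(self, key: str) -> int:
        # Gather: merge the partial counts into a single answer.
        return sum(self.per_salt(key))


counter = SaltedCounter(num_salts=4, seed=42)
for _ in range(10_000):
    counter.increment("product:viral-item")

print(counter.read("product:viral-item"))       # total across all sub-keys
print(counter.per_salt("product:viral-item"))   # each sub-key holds a share
```

No single sub-key sees more than a fraction of the writes, and the total is only available by reading all of them.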

Solution 2: DynamoDB Adaptive Capacity - and Why It Is Not Enough

DynamoDB has a feature called adaptive capacity that automatically redistributes provisioned throughput from cold partitions to hot ones. It detects usage imbalance and shifts capacity in response. For moderate, sustained hot partitions - where one item legitimately needs more throughput than its partition was allocated - this buys significant relief without application changes.

The limitation is a hard per-partition ceiling. A DynamoDB partition can serve at most 3,000 read capacity units and 1,000 write capacity units per second, regardless of how much capacity you have provisioned for the table overall. Adaptive capacity can reallocate throughput only up to that ceiling. If a viral event drives traffic a hundred times above it, adaptive capacity cannot help. The partition is saturated. You need application-level solutions.

DynamoDB’s recommendation for truly hot items is to add a random suffix to the partition key (key salting) and use DAX - DynamoDB Accelerator, an in-memory caching layer purpose-built for DynamoDB. DAX sits in front of DynamoDB, absorbs read traffic for hot keys in memory, and dramatically reduces how often requests reach DynamoDB itself.

Solution 3: Caching as a Hot Partition Shield

For read-heavy hot partitions, placing a caching layer in front eliminates most of the load. The idea is simple: the first time a hot item is requested, fetch it from the database and store a copy in a fast in-memory cache. Every subsequent request gets the cached copy instead of hitting the database.

Amazon ElastiCache running Redis is a common choice for this. A single Redis node can serve on the order of hundreds of thousands of simple reads per second, while a single DynamoDB partition tops out at 3,000 read capacity units per second. A cache hit rate of 99% means the underlying partition sees only 1% of the original read traffic.

The complexity this introduces is cache invalidation - keeping the cached copy synchronized with the source of truth when the underlying data changes. The options:

TTL-based expiry: TTL stands for Time To Live - a timer attached to a cache entry. When it expires, the next request fetches fresh data from the database. Simple to implement; accepts that reads may briefly return stale data. Fine for product view counts; problematic for inventory levels.
Write-through caching: update the cache at the same time as the underlying store, on every write. Keeps the cache current but adds latency to every write operation.
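Pub/sub invalidation: pub/sub (publish/subscribe) is a messaging pattern where one component publishes a notification to a channel, and all subscribers to that channel receive it. When an item changes, publish an invalidation event; every cache node subscribed to that channel evicts its copy, and the next read re-fetches it. More moving parts, but the staleness window shrinks from the TTL to the message propagation delay. It does not disappear: a lost or delayed message leaves a stale entry behind, so a TTL is usually kept as a backstop.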
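The first and third options can be sketched together in a few lines. This is an in-process illustration under stated assumptions: the TTLCache class, its loader callback, and the injectable clock are invented for the example and are not the API of Redis or ElastiCache.

```python
import time


class TTLCache:
    """Read-through cache with TTL-based expiry (in-process sketch)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # fetches from the source of truth on a miss
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be exercised in tests
        self._entries = {}           # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: go to the database and refresh the cache.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Explicit eviction, e.g. on receiving a pub/sub invalidation event."""
        self._entries.pop(key, None)


db_reads = []


def load_product(key):
    db_reads.append(key)             # each call is one read on the hot partition
    return {"id": key, "price": 79}


cache = TTLCache(load_product, ttl_seconds=30)
for _ in range(1_000):
    cache.get("product:viral-item")

print(len(db_reads), "database read(s),", cache.hits, "cache hits")
```

Note that this version does nothing about many callers missing at the same instant; each of them would call the loader.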

The cold start problem: when a new viral item appears, there is no cache entry yet. The first wave of requests all miss the cache simultaneously and all hammer the underlying partition together. This is the thundering herd again. Request coalescing (next section) is the fix.

Solution 4: Request Coalescing

When many concurrent requests arrive for the same key within a short window, coalescing deduplicates them. The first request fetches from the underlying store; all subsequent requests for the same key, arriving while the first fetch is still in flight, wait for that first request’s result and share it.

This is also called request collapsing or thundering herd protection. It converts N simultaneous partition reads into one (per process doing the coalescing), regardless of concurrency level. Varnish (a widely-used HTTP cache) does this at the HTTP layer: concurrent misses for the same object wait on a single backend fetch, and its “grace mode” complements this by serving a slightly stale object while one background fetch refreshes it. Application-level implementations use a shared promise: the first goroutine (or thread) to see a cache miss “owns” the fetch and registers a future result; subsequent goroutines subscribe to the same future and receive the result when it arrives.

This is particularly effective for the cold-start problem on newly viral content.
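The shared-promise idea can be sketched with threads and a Future. The SingleFlight class below is a hypothetical helper written for this example (similar in spirit to Go's singleflight package), and the sleeping fetch function stands in for a slow database read.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent calls for the same key into one underlying fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}         # key -> Future shared by all waiters

    def do(self, key, fetch):
        with self._lock:
            future = self._in_flight.get(key)
            owner = future is None
            if owner:                # first caller registers the shared future
                future = Future()
                self._in_flight[key] = future
        if owner:
            try:
                future.set_result(fetch(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        return future.result()       # waiters block here until the owner finishes


calls = []


def slow_fetch(key):
    calls.append(key)                # one entry per real read on the backing store
    time.sleep(0.3)                  # simulate a slow database read
    return f"value-for-{key}"


flight = SingleFlight()
barrier = threading.Barrier(20)
results = []


def worker():
    barrier.wait()                   # release all threads at once to mimic a burst
    results.append(flight.do("product:viral-item", slow_fetch))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), "callers,", len(calls), "backing read(s)")
```

Coalescing only applies to callers that arrive while a fetch is in flight; once it completes, the next miss starts a new fetch, which is why this is normally paired with a cache.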

Solution 5: Write Buffering with a Queue

For write-heavy hot partitions, buffering writes through a message queue decouples the write rate from the application’s request rate. Instead of every request writing directly to the hot partition, requests write to a queue. A consumer drains the queue at a controlled rate and batch-writes to the database.

Amazon SQS - Simple Queue Service, a managed message queue - is a common choice for this. The application drops writes into SQS, which holds them durably until a consumer is ready to process them. The database sees a steady write stream instead of a spike.

This does not eliminate the writes; it converts a burst into a controlled rate, and batching gives the consumer a chance to combine several updates to the same key into one write. The cost is latency: the data is not written immediately, so a read right after a write may return the old value. Acceptable for view counts and follower counts; not acceptable for inventory levels where selling more than you have in stock is a real business problem.
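A sketch of the consumer side, assuming the buffered writes are counter increments that can be merged safely. Python's in-process queue.Queue stands in for a durable queue such as SQS, and the batch size of 500 is an arbitrary choice for the example.

```python
import queue
from collections import Counter


def drain_batch(q, max_batch):
    """Pull up to max_batch pending increments and merge them per key."""
    merged = Counter()
    for _ in range(max_batch):
        try:
            key, amount = q.get_nowait()
        except queue.Empty:
            break
        merged[key] += amount
    return merged


write_queue = queue.Queue()   # in-process stand-in for a durable message queue
database = Counter()          # stand-in for the hot partition
db_writes = 0

# Burst: 5,000 view-count increments arrive for the same item.
for _ in range(5_000):
    write_queue.put(("views:viral-item", 1))

# Consumer: each loop iteration represents one tick of a rate-limited worker.
while not write_queue.empty():
    batch = drain_batch(write_queue, max_batch=500)
    for key, total in batch.items():
        database[key] += total    # one database write per key per batch
        db_writes += 1

print(database["views:viral-item"], "views recorded in", db_writes, "database writes")
```

Merging like this is only valid for commutative updates such as increments; writes that overwrite a value need ordering guarantees the sketch does not provide.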

Real-World Architectures

Twitter’s hybrid fan-out: Twitter resolved the celebrity problem by setting a follower threshold (roughly 10,000 followers in early versions of the system). Accounts below the threshold use write-time fan-out: a post is pushed into each follower’s precomputed timeline at post time. Accounts above the threshold are handled differently - their posts are not fanned out on write. Instead, they are pulled and merged into each requesting user’s feed at read time. This bounds write amplification for high-follower accounts. The fan-out workers are no longer asked to write to tens of millions of timelines for a single post.
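The hybrid can be sketched in a few dozen lines. This is an illustration of the idea, not Twitter's implementation: the Feed class, its in-memory dictionaries, and the tiny threshold used in the demo are all assumptions.

```python
from collections import defaultdict


class Feed:
    """Hybrid fan-out: push for ordinary accounts, pull for high-follower ones."""

    def __init__(self, threshold=10_000):
        self.threshold = threshold
        self.followers = defaultdict(set)    # author -> follower ids
        self.following = defaultdict(set)    # user -> authors followed
        self.timelines = defaultdict(list)   # user -> precomputed (ts, author, text)
        self.posts = defaultdict(list)       # author -> own posts, for read-time pull
        self.fanout_writes = 0

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, ts, text):
        item = (ts, author, text)
        self.posts[author].append(item)
        if self.is_celebrity(author):
            return                           # no fan-out: merged in at read time
        for follower in self.followers[author]:
            self.timelines[follower].append(item)
            self.fanout_writes += 1

    def read_timeline(self, user):
        items = set(self.timelines[user])    # a set avoids duplicates if an author
        for author in self.following[user]:  # crossed the threshold between posts
            if self.is_celebrity(author):
                items.update(self.posts[author])
        return sorted(items, reverse=True)   # newest first


feed = Feed(threshold=3)                     # tiny threshold so the demo stays small
for user in ("ann", "bob", "cat"):
    feed.follow(user, "star")                # "star" reaches the threshold
feed.follow("ann", "dave")                   # "dave" stays below it

feed.post("dave", 1, "hello")                # fanned out on write
feed.post("star", 2, "new album")            # skipped on write, pulled on read
print(feed.fanout_writes)
print(feed.read_timeline("ann"))
```

The write cost for "star" is constant regardless of follower count; the cost moves to every read by a follower, which now has to merge in that account's recent posts.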

Redis Cluster and hot slots: Redis Cluster divides its entire keyspace into 16,384 hash slots. Every key is mapped to one of these slots (CRC16 of the key, modulo 16,384), and each slot is served by exactly one primary node. If several high-traffic keys map to slots on the same node, that node is hot. Slots can be moved to other nodes to separate hot keys from one another, but Redis Cluster does not do this automatically, and a single key can never be split across nodes - for one hot key, the application must use key salting and perform scatter-gather at the application layer.
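The slot mapping is simple enough to reproduce client-side, which helps when checking where salted keys will land. The sketch below follows the published rule (CRC16 in its XMODEM variant, modulo 16,384, with hash tags); the function name is my own, and it relies on Python's binascii.crc_hqx with an initial value of 0 computing that CRC variant.

```python
import binascii

NUM_SLOTS = 16_384


def key_hash_slot(key: str) -> int:
    """Compute a Redis Cluster hash slot: CRC16 (XMODEM) of the key, mod 16384.

    If the key contains a non-empty {hash tag}, only the tag is hashed.
    """
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:                  # the tag must be non-empty
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


# Salted sub-keys hash independently, so they land in different slots.
salted = [f"product:viral-item:{i}" for i in range(8)]
print(sorted({key_hash_slot(k) for k in salted}))

# Wrapping the base key in a hash tag pins every sub-key to one slot,
# which defeats the purpose of salting.
tagged = [f"{{product:viral-item}}:{i}" for i in range(8)]
print({key_hash_slot(k) for k in tagged})
```

Different slots are necessary but not sufficient: two slots can still be served by the same node, so it is worth checking the slot-to-node assignment as well.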

Consistent hashing and virtual nodes: Many distributed databases use a technique called consistent hashing to distribute keys across nodes. Imagine the keyspace arranged in a circle - a hash ring. Each node is placed at a position on that ring, and a key is routed to the first node found moving clockwise from the key’s own position. The advantage over simple modular hashing is that when you add or remove a node, only the keys in the affected segment need to be moved - not all keys.

Virtual nodes extend this: instead of each physical node owning one segment of the ring, each physical node is assigned multiple smaller segments scattered around the ring. This improves distribution, especially when nodes have different capacities. But consistent hashing and virtual nodes distribute keyspace uniformly - they do not distribute access patterns uniformly. A hot key on consistent hashing is still a hot key; you have just made it cheaper to add more nodes around it.
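A compact sketch of a ring with virtual nodes shows both properties: few keys move when a node joins, and a hot key still has exactly one owner. The HashRing class, the SHA-256 positions, and the choice of 100 virtual nodes per physical node are assumptions for illustration.

```python
import bisect
import hashlib


def _position(label: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring where each physical node owns many virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping around.
        index = bisect.bisect_left(self._positions, _position(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(10_000)]
before = HashRing(["n1", "n2", "n3", "n4"])
after = HashRing(["n1", "n2", "n3", "n4", "n5"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a node")
print("hot key lives on:", after.node_for("product:viral-item"))
```

Every key that moves goes to the new node, and the rest stay put. The hot key ends up on one node either way, so all of its traffic still converges there.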

When Solutions Make Things Worse

Caching hot data introduces cache invalidation complexity that can become its own source of bugs. A cache that is out of sync with the source of truth is worse than no cache in systems where correctness matters - you end up with users seeing different versions of the same data depending on which cache node their request hit.

Key salting adds application complexity: every write must choose a shard suffix, every read must scatter-gather across all suffixes. The number of suffixes becomes a configuration parameter you must tune per key type. Changing it after the fact requires a data migration.

Write buffering via a queue means your system is no longer strongly consistent by default. A client that writes and then immediately reads may read stale data. This is acceptable in many contexts but must be a deliberate design choice, not an accidental one.

Multi-Region Complications

In a multi-region deployment, hot partitions are complicated by cross-region replication lag. If a viral item is written in us-east-1 and read in eu-west-1, the replication lag means eu-west-1 may serve stale data during peak traffic. The hot partition problem combines with the eventual consistency problem.

Global tables in DynamoDB provide multi-active replication - reads and writes in any region, with last-writer-wins conflict resolution (when two regions write to the same key at the same time, the write with the later timestamp wins). This spreads read traffic across regions and reduces per-region hot partition severity for reads. But it does not eliminate the hot key problem - it distributes it geographically, and because every write is replicated to every region, a write-hot key is hot in all of them.

Summary

Cause	Solution	Trade-off
Viral/celebrity content reads	Read-through cache (Redis/ElastiCache)	Cache invalidation complexity
Viral content write spikes	Key salting + scatter-gather reads	N reads per logical read
Sequential write keys	Pre-split partitions, hashed or random key prefix	Range scans must query every prefix
Thundering herd on cache miss	Request coalescing	Implementation complexity
Write-heavy spikes	Queue-based write buffering (SQS)	Eventual consistency
Sustained moderate hot key	DynamoDB adaptive capacity	Physical ceiling still exists

Read Next:

Feed Design - Push vs Pull and the Cost of Freshness