Feed Design - Push vs Pull and the Cost of Freshness // Megha Bose

Helpful context:

The Hot Partition Problem - When Everyone Writes to the Same Key

Katy Perry posts a tweet. She has 100 million followers. In what sense should all 100 million people see it in their feeds within seconds? And more importantly - should the system even try to make that happen?

This question turns out to be the central design decision in social feed architecture. The answer shapes the entire system: storage requirements, write amplification, read latency, consistency guarantees, and how your infrastructure behaves when a celebrity does something newsworthy at 8 PM on a Tuesday.

The Geometry of the Problem

Consider two variables: the number of people a user follows, and the number of followers a user has.

For a typical user following 500 accounts: loading their feed naively means querying 500 separate data locations, fetching recent posts from each, and merging the results by timestamp. Doing this as 500 sequential round trips would be extremely slow. The alternative is a scatter-gather query - you fire all 500 requests in parallel simultaneously (the scatter), then wait for all responses and merge them into one sorted result (the gather). Even with scatter-gather, multiplied by millions of concurrent feed loads, this is not a system that survives real traffic.

For a celebrity with 100 million followers posting once: if the system writes that post into each follower’s precomputed feed at post time, that is 100 million writes triggered by a single action. At one microsecond per write (an optimistic estimate), that is 100 seconds of serial writes. In a parallel pipeline it is still an enormous fan-out operation - one input event branching into millions of downstream writes - that consumes significant infrastructure for a single event.

These two problems define the design space. You can solve the read problem by precomputing feeds at write time. You can solve the write problem by computing feeds at read time. You cannot fully solve both simultaneously - the tradeoff is structural.

Push: Fan-Out on Write

The push model precomputes a user’s feed when content is created. When a user posts, the system immediately writes a reference to that post into each follower’s stored feed. By the time a follower loads their feed, the work is already done - it is a single key lookup.

Redis and sorted sets

Redis is an in-memory data store - it keeps all its data in RAM rather than on disk, which makes reads and writes extremely fast (microseconds rather than milliseconds). One of Redis’s built-in data structures is the sorted set: a collection where each entry has a value (like a post ID) and a numeric score (like a timestamp). Entries are automatically kept sorted by score, so you can efficiently ask “give me the 20 most recent posts” without scanning everything.

A precomputed feed stored in Redis looks like: ZADD feed:{user_id} {timestamp} {post_id} - adding a post ID with its timestamp as the score. Reading the most recent posts is ZREVRANGEBYSCORE - a range query by score in descending order. To stop memory growing unboundedly, platforms cap feed depth (keeping only the most recent 1,000 posts) and trim older entries with ZREMRANGEBYRANK after each insertion.

Cassandra as the durable backing store

Redis is fast but expensive: RAM costs far more than disk, so you cannot keep every user’s full feed history in Redis forever. Cold users - those who haven’t logged in for days or weeks - have their feeds evicted from Redis.

This is where a backing store comes in: a durable, disk-based database that holds the canonical, long-term copy of the data, as opposed to the cache (Redis) which holds a fast but temporary copy. Cassandra fills this role in many feed systems. Cassandra is a distributed database designed for high write throughput and time-series data - it stores data on disk across many nodes and is built for durability. A Cassandra table with partition key user_id and clustering key post_timestamp DESC (the combination of these two is the compound primary key - together they uniquely identify a row and control physical ordering on disk) supports efficient time-range scans. Cassandra’s wide rows - the ability for a single logical row to contain a very large number of columns - handle large fan-outs naturally: a user’s entire timeline fits in one partition.

When an inactive user logs back in, the system rebuilds their feed from Cassandra. This cold-start rebuild is the cost of keeping Redis memory usage bounded.

When push works well: follower counts are in the thousands to low hundreds of thousands, posts are infrequent relative to reads, and the user base is active (precomputed feeds are actually read, not just stored and forgotten).

Where push fails:

Write amplification is the fundamental problem. A user with 10 million followers generates 10 million writes per post. Storage cost grows as followers multiplied by posts - every post is duplicated across every follower’s feed. Deleting a post means removing it from potentially millions of feed entries. A user unfollowing someone requires cleaning up past feed entries retroactively - or accepting that stale entries will be filtered at read time.

The inactive user problem is subtle but real. Users who have not logged in for months still have feeds being updated. Storage accumulates for feeds no one will read.

Pull: Fan-Out on Read

The pull model computes a user’s feed on demand. When a user loads their feed, the system queries the post store for recent posts from each account they follow, merges and sorts the results in application memory, and returns the result.

When pull works well: users follow few accounts, the write path must be simple, or the content changes frequently enough that precomputing is wasteful.

Where pull fails:

Read latency grows with the number of followed accounts. A user following 2,000 accounts requires touching 2,000 data locations per feed load - either as 2,000 sequential round trips or one large scatter-gather query. For most social platforms where median follow counts are in the hundreds, this is a difficult path to make fast at scale.

Real-time ranking is expensive. A ranked feed (Instagram’s non-chronological ordering, Twitter’s algorithmic feed) requires running ML scoring at read time against a fresh set of candidates. At feed load time, with millions of concurrent users, you need to fetch more candidates than you will show, score them, and return the top N - all within the latency budget of a single page load.

Twitter’s Architecture Evolution

Twitter started with a push-based model. Fan-out workers - background processes whose job is to consume post events and distribute them to follower feeds - wrote each new tweet into every follower’s precomputed timeline. Timelines were stored in a Redis cluster: multiple Redis nodes working together to hold more data than any single machine could hold in RAM, with the keyspace split across nodes.

For most users this worked: reads were fast, timelines were fresh, the system was conceptually simple.

Then Beyoncé announced her pregnancy at the 2013 Grammy Awards.

The fan-out workers were designed to handle ordinary write amplification. A user with 100,000 followers posting a tweet was significant - 100,000 Redis writes - but manageable. Beyoncé had 58 million followers at the time, and the post happened in real-time during a live broadcast, meaning millions of users were actively refreshing Twitter simultaneously.

The fan-out queue grew faster than the workers could drain it. Timelines fell behind. Users saw stale feeds. The system did not crash, but it visibly lagged.

The solution Twitter eventually shipped is the hybrid model, which is now the standard architecture across large social platforms.

The Hybrid Model

The hybrid model draws a line at a follower threshold. Accounts below the threshold (typical users) use push fan-out on write. Accounts above the threshold (celebrities, major media accounts) are excluded from the push fan-out entirely.

When a user loads their feed, the system:

Loads the user’s precomputed feed (populated via push fan-out from non-celebrity accounts they follow)
For each celebrity account the user follows, issues a pull query against that celebrity’s post store
Merges the two streams, sorted by timestamp (or ranking score, for an algorithmic feed)

This bounds both failure modes. Write amplification is bounded: no fan-out for high-follower accounts, regardless of follower count. Read latency is bounded: pull only applies to a small number of celebrity accounts, not to every followed account.

The system must classify accounts dynamically. A user who gains a million followers becomes a celebrity in the system’s view. Instagram uses a similar model, with the split point at roughly 100,000 followers and real-time adjustment based on actual traffic patterns.

Kafka as the Fan-Out Backbone

The fan-out operation is naturally event-driven: something happens (a user posts), and that event needs to trigger downstream work (updating follower feeds). This is a good fit for a message queue.

Kafka is a distributed messaging system. Producers write events to named channels called topics. A Kafka topic is just a durable, ordered log of events - think of it as a queue that is written to disk and can hold millions of entries. Multiple consumer processes can read from the same topic independently, each tracking where they are in the log.

In a feed system: when user A posts, a post event is written to a Kafka topic. A fleet of fan-out workers reads from that topic and writes to each follower’s Redis sorted set (for active users) and Cassandra row (for durability).

Kafka provides two important properties here. First, durability: if fan-out workers fall behind during a celebrity post spike, the event stays safely in Kafka until a worker can process it - nothing is lost. Second, backpressure: rather than the post event overwhelming workers directly, Kafka acts as a buffer. Workers drain the queue at whatever rate they can sustain. The fan-out lag is visible through Kafka consumer lag metrics - how many events are in the topic waiting to be processed - and bounded. The post itself is written immediately to the author’s post store; the fan-out is best-effort asynchronous - it will happen, just not necessarily within milliseconds.

ML-Ranked Feeds

Chronological feeds are straightforward to build. Ranked feeds - Instagram’s current default, Twitter’s algorithmic feed, Facebook’s News Feed - are a fundamentally different beast.

Ranking requires, at minimum: candidate retrieval (fetch recent posts from followed accounts), feature computation (engagement signals, recency, relationship strength, post-type preferences), and scoring (running candidates through a ranking model). This pipeline must complete within feed load latency constraints - typically under 200ms.

Most platforms solve this by pre-computing ranked candidates periodically (every few minutes) and caching the ranked feed. The feed you see may be a few minutes stale, but the ranking computation did not happen in your request’s critical path. When you scroll past the precomputed candidates, the system fetches and ranks more - this is why infinite scroll loads the first page instantly and occasionally shows a spinner.

Pagination: Why Offset-Based Is Wrong

Feeds are paginated - you load 20 posts, then 20 more as you scroll. There are two ways to implement this.

Offset-based pagination uses a numeric position: “skip the first 400 results, then return the next 20.” In SQL this looks like LIMIT 20 OFFSET 400. The problem is that the database still has to count through those 400 rows to know where to start - it is not a free operation. At offset 400 this is tolerable. At offset 10,000 on a large table, it is slow. Worse: if new posts arrive while you are paginating, all positions shift. The post that was at position 400 is now at position 401. Page 2 may contain items you already saw on page 1, or skip items entirely.

Cursor-based pagination replaces the position with a pointer to the last item you saw. Instead of “skip 400 rows,” the query says “give me posts older than this specific timestamp” - WHERE post_timestamp < cursor_timestamp LIMIT 20. This is a direct index lookup: the database goes straight to that timestamp in the index and returns the next 20 entries. It is fast regardless of how deep into the feed you are. It is also stable: new posts arriving at the top do not affect where your cursor points. Both Redis (ZREVRANGEBYSCORE with the cursor timestamp as the upper bound) and Cassandra (WHERE post_timestamp < cursor LIMIT 20) support cursor-based queries natively.

The Decentralized Future

Bluesky’s AT Protocol and Mastodon’s ActivityPub represent a different model: decentralized feeds where the feed algorithm is itself replaceable. In AT Protocol, feeds are computed by external feed generators - third-party services that define their own ranking and selection logic. A user can choose their feed algorithm the same way they choose their podcast app.

This shifts the architectural problem: instead of one company’s infrastructure handling all fan-out, the protocol standardizes how posts are published and discovered, and the computation is distributed. The hot partition problem does not go away - it just changes who owns it.

ML-ranked feeds are also under pressure. The recommendation that maximizes engagement can systematically surface outrage and division - this is a well-documented effect. Chronological and user-controlled algorithmic feeds are being re-explored not just for technical reasons but for product and societal ones.

Summary

Model	Read Cost	Write Cost	Storage	Best For
Push (fan-out on write)	O(1) single lookup	O(followers) writes	High - every post duplicated	Low-follower-count users
Pull (fan-out on read)	O(following) scatter-gather	O(1)	Low - one copy	Low-following-count users
Hybrid	O(celebrity count) pull + O(1) precomputed	O(non-celebrity followers)	Medium	Large-scale social platforms

Read Next:

Search & ElasticSearch - Indexing the World So It Can Be Found