Prerequisite: Networking: IP, TCP, HTTP · Databases & Indexes

System design is the discipline of turning requirements into an architecture. It does not have right answers in the mathematical sense - it has trade-offs. The same feature at different scales requires a different design, and the same design at different team sizes requires different operational trade-offs. This post covers the mental framework for approaching a design problem and the standard building blocks that appear in almost every large system.

How to Approach a Design Problem

A design session without structure produces local optimizations without a coherent whole. A reliable sequence:

1. Clarify requirements. Separate functional requirements (what the system does) from non-functional requirements (how fast, how available, how consistent). “Build a URL shortener” leaves open: how many redirects per second? Are custom aliases required? How long are links valid? What happens on link expiry - 404 or a redirect to a landing page?

2. Estimate scale. Back-of-envelope calculations identify which components will be stressed. They do not need to be precise - order of magnitude is enough.

3. Design the data model. What entities exist? What are the read and write patterns? The data model drives storage technology choice.

4. Design the API. What do clients call, and with what parameters? This forces you to think about what operations the system supports before you commit to internal designs.

5. Design components. Work top-down from client to storage. Identify where each request goes and why.

6. Identify bottlenecks. Where does the design break as scale grows? What fails first - storage, compute, network? Address each bottleneck with a known pattern.

Back-of-Envelope Calculations

Numbers worth memorizing:

  • 1 million requests/day ≈ 12 requests/second (divide by 86,400)
  • A typical web request payload: 1–10 KB
  • A database row: 100 bytes to 1 KB
  • SSD random read: ~100 microseconds; RAM: ~100 nanoseconds; network round trip (same datacenter): ~500 microseconds

For a URL shortener handling 100 million redirects per day: 100M / 86,400 ≈ 1,200 requests/second on average. If you provision for 2–3× average, target ~3,000 RPS. A single Nginx instance handles tens of thousands of RPS; a single PostgreSQL read replica handles thousands. At 1,200 RPS, a single server is fine - you are designing for failure tolerance and growth, not raw throughput.

Storage: suppose writes grow to 100M new URLs/day. 100M × 365 days × 500 bytes/record ≈ 18 TB/year. That fits on a single large database server but grows fast - plan for partitioning or archival.
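
The arithmetic is worth scripting once so the constants stay honest. A quick Python check of the figures above (the 500-byte record size and the 100M/day rates are the assumptions from the estimate):

    # Sanity-check the URL shortener estimates.
    SECONDS_PER_DAY = 86_400

    reads_per_day = 100_000_000
    avg_rps = reads_per_day / SECONDS_PER_DAY
    print(f"average: {avg_rps:,.0f} RPS")                    # ~1,157 RPS
    print(f"provisioned at 2.5x: {avg_rps * 2.5:,.0f} RPS")  # ~2,894 RPS

    new_urls_per_day = 100_000_000
    bytes_per_record = 500  # assumed average record size
    tb_per_year = new_urls_per_day * 365 * bytes_per_record / 1e12
    print(f"storage: {tb_per_year:.1f} TB/year")             # ~18.3 TB/year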

Horizontal vs Vertical Scaling

Vertical scaling: Add more resources to one machine. Simple operationally, but has a ceiling and is a single point of failure (SPOF) if not paired with redundancy.

Horizontal scaling: Add more machines. Nearly unbounded, but requires the application to be stateless - a request routed to any server must produce the same result. State (sessions, in-progress computations) must live in an external store, not in server memory.

Designing services to be stateless is one of the highest-leverage architecture decisions. It costs almost nothing early and enables every horizontal scaling strategy later.
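
A minimal sketch of the difference in Python, assuming Redis as the external session store (the key naming and the one-hour TTL are illustrative):

    import redis  # pip install redis

    # Stateful (does not scale out): sessions live in this process's memory,
    # so a request routed to a different server cannot find them.
    local_sessions: dict[str, str] = {}

    # Stateless: sessions live in Redis, so any server can handle any request.
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def save_session(session_id: str, user_id: str) -> None:
        r.set(f"session:{session_id}", user_id, ex=3600)  # TTL evicts abandoned sessions

    def load_session(session_id: str) -> str | None:
        return r.get(f"session:{session_id}")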

Caching Layers

Caching is the most effective tool for reducing latency and shielding databases from load. There are multiple levels:

Client-side cache: Browser caches HTTP responses using Cache-Control and ETag headers. Zero server cost on cache hit.
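
The mechanism in miniature: the browser echoes a previously seen ETag back in the If-None-Match request header, and a match lets the server answer 304 with no body. A framework-agnostic Python sketch (the respond function is a hypothetical handler, not any framework's API):

    import hashlib

    def respond(body: bytes, if_none_match: str | None):
        """Return (status, headers, body) for a cacheable response."""
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        headers = {
            "Cache-Control": "max-age=300",  # browser may reuse this for 5 minutes
            "ETag": etag,
        }
        if if_none_match == etag:
            return 304, headers, b""  # client's copy is current: send no body
        return 200, headers, body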

CDN: Geographically distributed edge servers cache static assets (images, JavaScript, CSS) and sometimes dynamic responses. A user in Tokyo hits an edge node 10ms away rather than an origin server in us-east-1 200ms away.

Reverse proxy cache: Nginx or Varnish in front of your application servers. Caches at the HTTP layer.

Application cache (Redis/Memcached): Your application code explicitly reads from and writes to the cache. Flexible - you decide what to cache and how to key it. Redis adds persistence, pub/sub, and data structures; Memcached is simpler and faster for pure caching.
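
The usual application-cache pattern is cache-aside: check the cache, fall back to the database on a miss, then populate the cache. A Python sketch using the redis client; query_db is a hypothetical stand-in for your data access layer:

    import json
    import redis

    r = redis.Redis(decode_responses=True)

    def query_db(user_id: int) -> dict:
        return {"id": user_id, "name": "..."}  # hypothetical database call

    def get_user(user_id: int) -> dict:
        key = f"user:{user_id}"
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)  # cache hit: no database work
        user = query_db(user_id)
        r.set(key, json.dumps(user), ex=300)  # populate with a 5-minute TTL
        return user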

Database query cache: Some databases cache query results internally. Less useful than application-level caching because you have less control.

Cache Invalidation Strategies

Getting data into a cache is easy. Keeping it correct is the hard part.

  • TTL (time to live): Data expires after N seconds. Simple, but stale data is possible up to TTL.
  • Write-through: Write to cache and database together. Cache is always current; adds write latency.
  • Write-back (write-behind): Write to cache immediately, persist to database asynchronously. Lowest write latency; risk of data loss on cache failure.
  • Write-around: Write to database, skip cache. Cache is populated on next read miss. Useful for data that is written once and rarely read.

The right strategy depends on how much staleness can be tolerated and on how often writes happen relative to reads.
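
Two of the strategies side by side, as a Python sketch (db_write is a hypothetical stand-in for the real persistence call):

    import redis

    r = redis.Redis(decode_responses=True)

    def db_write(key: str, value: str) -> None:
        pass  # hypothetical: the real database write goes here

    def write_through(key: str, value: str) -> None:
        db_write(key, value)
        r.set(key, value)  # cache is always current; the write pays for both

    def write_around(key: str, value: str) -> None:
        db_write(key, value)
        r.delete(key)  # skip the cache, but drop any stale copy so the
                       # next read miss repopulates it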

Load Balancers

A load balancer distributes incoming requests across a pool of servers. Common algorithms:

  • Round-robin: Each server gets the next request in turn. Simple, works when requests are uniform.
  • Least connections: Route to the server with the fewest active connections. Better for variable-length requests.
  • Consistent hashing: Route requests for the same key to the same server (see the sketch below). Used for cache affinity; when a server is added or removed, only a small fraction of keys move - unlike modulo hashing, which remaps nearly every key.
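
A minimal consistent-hash ring in Python - production implementations add virtual nodes so load spreads evenly, omitted here for brevity:

    import bisect
    import hashlib

    class HashRing:
        def __init__(self, servers: list[str]):
            # Place each server at a point on a circle of hash values.
            self.ring = sorted((self._hash(s), s) for s in servers)
            self.points = [p for p, _ in self.ring]

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def server_for(self, key: str) -> str:
            # Walk clockwise to the first server at or after the key's point.
            i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
            return self.ring[i][1]

    ring = HashRing(["cache-1", "cache-2", "cache-3"])
    print(ring.server_for("user:42"))  # the same key always lands on the same server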

Load balancers also perform health checks, removing unhealthy servers from the pool automatically. They can terminate TLS, so backend servers handle plaintext. Layer 4 load balancers operate on TCP (fast, protocol-agnostic); Layer 7 load balancers inspect HTTP (can route on path, headers, or body content).

API Gateway

An API gateway is a single entry point for all client traffic. It handles cross-cutting concerns that would otherwise be duplicated in every service: authentication, rate limiting, request logging, routing. Clients call the gateway; the gateway calls the appropriate service.

This pattern simplifies clients - they need one hostname and one auth mechanism - and centralizes policies. The trade-off: the gateway becomes a critical path component and a potential bottleneck.
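
Reduced to a sketch, a gateway is a routing table plus policy checks. In this Python sketch, check_token, too_many_requests, and forward are stubs standing in for real auth, rate-limiting, and proxying code:

    SERVICES = {"/users": "http://user-svc:8080", "/orders": "http://order-svc:8080"}

    def check_token(token: str) -> bool:
        return token == "secret"  # stub: real gateways validate JWTs, API keys, etc.

    def too_many_requests(client_ip: str) -> bool:
        return False  # stub: real gateways track per-client request rates

    def forward(backend: str, path: str):
        return 200, f"proxied to {backend}{path}"  # stub for the actual proxy call

    def handle(path: str, token: str, client_ip: str):
        if not check_token(token):
            return 401, "unauthorized"
        if too_many_requests(client_ip):
            return 429, "slow down"
        for prefix, backend in SERVICES.items():
            if path.startswith(prefix):
                return forward(backend, path)
        return 404, "no such route"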

Asynchronous Processing

Some work does not need to happen in the request/response cycle. Sending a confirmation email, generating a thumbnail, processing a payment with a retry loop - these are better done asynchronously:

  1. Client sends a request.
  2. Server queues the work and returns a job ID immediately.
  3. A worker picks up the job, executes it, and stores the result.
  4. Client polls for status or receives a push notification.

This pattern keeps API latency low regardless of how long the background work takes. It also enables retries and rate limiting on the work queue without affecting the client.
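
The shape of the pattern in Python, using an in-process queue for brevity - a real system uses a broker such as RabbitMQ or Redis so jobs survive restarts:

    import queue
    import threading
    import uuid

    jobs: queue.Queue = queue.Queue()
    results: dict[str, str] = {}

    def submit(payload: str) -> str:
        """API handler: enqueue the work and return immediately."""
        job_id = str(uuid.uuid4())
        jobs.put((job_id, payload))
        return job_id  # the client polls for status with this ID

    def worker() -> None:
        while True:
            job_id, payload = jobs.get()
            results[job_id] = payload.upper()  # stand-in for the real work
            jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()
    job_id = submit("resize image 1234")
    jobs.join()  # in real life the client polls instead of blocking
    print(results[job_id])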

Database Choice

Choosing storage technology is a consequential early decision. The main categories:

  • Relational (PostgreSQL, MySQL): Strong consistency, flexible queries, ACID transactions. Default choice until a specific need forces otherwise.
  • Document (MongoDB, Firestore): Schema-flexible, good for hierarchical data. Loses joins and cross-document transactions.
  • Key-value (Redis, DynamoDB): O(1) lookup by key. No joins, no queries by non-key fields. Excellent for caches, sessions, leaderboards.
  • Time-series (InfluxDB, TimescaleDB): Optimized for append-heavy workloads with time-based queries. Used for metrics, monitoring, IoT.
  • Search (Elasticsearch): Full-text search and aggregations. Not a primary store - synced from the primary database.

Start with a relational database. Only diverge when you have a concrete reason: need for extreme write throughput (time-series), flexible schema evolution (document), or sub-millisecond key lookups at scale (key-value).

Examples

Design a URL shortener:

Requirements: Shorten URLs, redirect on visit, custom aliases optional, links never expire. Scale: 1,000 writes/day, 100M reads/day.

Components:

  • API server: POST /shorten returns a short code; GET /{code} returns a 301 redirect.
  • Database: maps short code → long URL. PostgreSQL handles this easily at this scale.
  • Cache: Redis caches popular codes. 100M reads/day to the database without caching would work but is wasteful - most reads are for a small fraction of popular links.

Short code generation: Hash the long URL (MD5, take the first 7 hex characters - only 16^7 ≈ 268M distinct values, so collisions must be detected and handled) or encode a distributed counter in base62. The counter is simpler and avoids collisions entirely; a sketch follows.
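
A Python sketch of the counter approach - here a plain integer stands in for the distributed counter:

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def encode(n: int) -> str:
        """Encode a counter value as a base62 short code."""
        if n == 0:
            return ALPHABET[0]
        digits = []
        while n:
            n, rem = divmod(n, 62)
            digits.append(ALPHABET[rem])
        return "".join(reversed(digits))

    # 62**7 - 1 = 3,521,614,606,207: about 3.5 trillion codes fit in 7 characters.
    print(encode(125))        # "21"
    print(encode(62**7 - 1))  # "ZZZZZZZ", the largest 7-character code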

Bottleneck at 10B reads/day: Add more read replicas or shard the mapping table by code prefix. Move cache to a distributed Redis cluster.

Estimate Twitter’s storage:

500M tweets/day × 280 bytes/tweet ≈ 140 GB/day of tweet text. Add metadata (timestamps, user IDs, engagement counts): ~500 bytes/tweet → 250 GB/day. Over ten years: ~500 TB of tweet text, roughly 1 PB with metadata. Images and video are orders of magnitude larger - Twitter stores media on object storage (S3-compatible) and only stores URLs in the tweet database.
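
The same estimate as a script:

    tweets_per_day = 500_000_000
    text_gb_per_day = tweets_per_day * 280 / 1e9  # ≈ 140 GB/day
    full_gb_per_day = tweets_per_day * 500 / 1e9  # ≈ 250 GB/day with metadata
    ten_years = 3650  # days
    print(f"text: {text_gb_per_day:.0f} GB/day, "
          f"{text_gb_per_day * ten_years / 1e3:.0f} TB over ten years")
    print(f"with metadata: {full_gb_per_day * ten_years / 1e3:.0f} TB over ten years")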


Read Next: Latency, Throughput, & Queues · Consistency & CAP Theorem · Message Queues & Event-Driven Architecture