

A user in Mumbai opens Instagram and their feed appears in under a second. The engineers who built Instagram are in California. The servers that store those photos are in us-east-1, roughly 13,000 kilometers away. At the speed of light in fiber (roughly 200,000 km/s), the round trip from Mumbai to Virginia takes about 130ms for the signal alone - before the server has done any work. A page that makes 20 asset requests one after another would spend over 2.5 seconds just on physics.

Instagram does not solve this with faster servers. It solves it by moving the data physically closer to the user before the user asks for it.

That is what a CDN does.

The Latency Ceiling

Before getting into CDN mechanics, it is worth sitting with the core constraint: latency is bounded by physics.

A round-trip from Singapore to a US-east server is roughly 180ms just for the signal. That is the floor - you cannot go faster than light through fiber. No amount of better code, faster CPUs, or more RAM overcomes it. The only way to reduce this latency is to move the data geographically closer to where it is needed.
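
The floor is easy to check with arithmetic. The sketch below assumes a straight fiber path and the 200,000 km/s figure quoted earlier; real routes are longer and add switching delay, so measured round trips come out higher. The function names are illustrative.

```python
FIBER_SPEED_KM_PER_S = 200_000  # approximate speed of light in fiber, as quoted in the text


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip time in milliseconds over a straight fiber path."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


def sequential_cost_ms(distance_km: float, requests: int) -> float:
    """Propagation delay paid when the requests are made one after another."""
    return round_trip_ms(distance_km) * requests


if __name__ == "__main__":
    # Mumbai to Virginia, using the 13,000 km figure from the introduction.
    print(round_trip_ms(13_000))
    # Twenty asset requests issued back to back over the same distance.
    print(sequential_cost_ms(13_000, 20))
```

Nothing in this calculation depends on the server. Distance is the only input that can be changed, which is the whole argument for a CDN.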

This is not a new insight. Amazon figured out that customers in Germany were experiencing slow loads from US servers. The solution was not to buy better hardware in the US. It was to build a datacenter in Germany. A CDN is the automated, cached version of this: instead of building full datacenters with application logic everywhere, you put lightweight caching servers everywhere and let them absorb the bulk of the traffic.

What a CDN Is

A CDN is a globally distributed network of servers called Points of Presence (PoPs) or edge nodes, located in cities around the world. Instead of every user fetching content from your origin server (your application server in one datacenter), they fetch from the nearest PoP. A user in Mumbai gets content from the PoP in Mumbai or Singapore. A user in Frankfurt gets it from Frankfurt.

Cloudflare operates roughly 300 PoPs in over 100 countries. AWS CloudFront has over 400 edge locations. A request from almost anywhere in the world hits a PoP within 20-50ms instead of 150-300ms for a far origin.

The key mechanism: PoPs cache copies of content. The first time a user in Mumbai requests profile-photo-42.jpg, the PoP does not have it - it fetches the image from your origin server, caches it locally, and returns it to the user. Every subsequent request for that image from any user in Mumbai (or nearby) is served from the PoP’s local cache. Your origin never sees those requests.

How a Request Actually Travels

Walk through what happens when a user requests https://static.instagram.com/photo-abc.jpg:

The request is steered to a nearby PoP. Many CDNs use anycast routing: many PoPs announce the same IP address range, and the internet’s BGP routing protocol steers each request toward the topologically closest PoP. Other CDNs do the steering in DNS, answering each lookup with the address of a PoP near the resolver. Either way, the user does not choose; the network chooses for them.

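The TCP handshake completes over a 20ms round trip instead of a 150ms one, and the TLS negotiation happens against the same nearby PoP. A new HTTPS connection needs one round trip for TCP plus one or two for TLS (one with TLS 1.3, two with TLS 1.2), so saving about 130ms per round trip cuts roughly 260-390ms from the initial connection.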

The PoP checks its local cache. If the image is there and not expired: return it immediately. This is a cache hit. The origin server is not involved at all.

On a cache miss: the PoP makes a request upstream - either to a regional cache (a mid-tier PoP) or directly to your origin. The origin returns the image with cache headers. The PoP stores it and returns it to the user. All future requests to this PoP for this URL are cache hits.

For popular content (a viral post, a celebrity’s photo, a product on a sale page), the cache hit rate approaches 99%. Your origin server handles the remaining 1% - effectively 100x less traffic than without a CDN.
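
The hit and miss paths above can be sketched as a small pull-through cache. This is a toy model of a single PoP, not any CDN's real API: the `EdgeCache` class, the `fetch_from_origin` callback, and the TTL it returns are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy pull-through cache for one PoP (illustrative only)."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], Tuple[bytes, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin  # returns (body, ttl_seconds)
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        entry = self._store.get(url)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1    # cache hit: the origin is not contacted
            return entry[0]
        self.misses += 1      # missing or expired: fetch upstream and store
        body, ttl = self._fetch(url)
        self._store[url] = (body, self._clock() + ttl)
        return body


if __name__ == "__main__":
    def origin(url: str) -> Tuple[bytes, float]:
        return b"image-bytes", 60.0  # hypothetical origin: fixed body, 60 s TTL

    pop = EdgeCache(origin)
    for _ in range(100):
        pop.get("/photo-abc.jpg")
    print(pop.hits, pop.misses)
```

A real PoP also bounds its storage, evicts cold entries, and may ask a regional cache before the origin. The sketch keeps only the hit, miss, and expiry logic.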

Cache-Control Headers: The Contract

The origin server controls how long PoPs cache content through HTTP response headers. Getting this right is the core of CDN configuration - the CDN respects your headers. If you do not tell it to cache, it often will not.

Cache-Control: public, max-age=31536000, immutable

public means any shared cache (CDN, proxy) can store this, not just the user’s browser. max-age=31536000 is one year in seconds. immutable tells the browser not to revalidate during a refresh - a hint that the content will never change. This is the right header for versioned static assets (JavaScript bundles, fonts, images with a hash in the filename like logo-a3f9c1.png).

Cache-Control: no-cache

Confusingly, no-cache does not mean “do not cache.” It means: cache it, but always revalidate before serving. The CDN sends a conditional request to the origin (If-None-Match or If-Modified-Since). If the origin responds 304 Not Modified, the CDN serves its cached copy without re-downloading. This is good for content that changes occasionally but you want users to always see the current version.
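
The origin's side of that revalidation can be sketched in a few lines. This assumes the ETag is derived from a hash of the body, which is one common choice rather than a requirement, and it compares a single ETag exactly. Real servers also handle lists of ETags and weak validators. The function names are illustrative.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Quoted ETag derived from the body content (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes,
                   if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        # The cached copy is still current: answer 304 and send no body.
        return 304, headers, b""
    return 200, headers, body
```

The saving is in the body. A 304 still costs a round trip to the origin, but it carries only headers.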

Cache-Control: no-store

This means never cache. Use it for sensitive, user-specific responses: bank statements, health records, anything that must not be stored on shared infrastructure.

Cache-Control: private, max-age=3600

private tells CDN nodes not to cache this response - only the user’s browser can cache it. Use this for personalized responses (a user’s dashboard, their settings page) where caching on a shared PoP would serve one user’s private data to another.
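
To make the four cases concrete, here is a minimal sketch of how a shared cache might classify a response from its Cache-Control header. It covers only the directives discussed in this section. The policy labels it returns are invented for illustration, and real CDNs layer their own defaults and overrides on top.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into lowercase directives and optional arguments."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_policy(value: str) -> str:
    """Classify what a shared cache (a PoP) may do with the response."""
    d = parse_cache_control(value)
    if "no-store" in d or "private" in d:
        return "do-not-store"
    if "no-cache" in d:
        return "store-but-revalidate"
    max_age = d.get("max-age")
    if max_age is not None and max_age.isdigit():
        return f"fresh-for-{int(max_age)}s"
    return "no-explicit-freshness"


if __name__ == "__main__":
    for header in ("public, max-age=31536000, immutable",
                   "no-cache", "no-store", "private, max-age=3600"):
        print(header, "->", shared_cache_policy(header))
```

The order of the checks matters: `private, max-age=3600` must be rejected by the PoP even though it carries a freshness lifetime, because that lifetime is meant for the browser only.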

Origin Pull vs Origin Push

There are two models for getting content onto PoP caches:

Origin pull (the default): PoPs are lazy. They only fetch from the origin when they get a cache miss. Content propagates to PoPs on demand. A photo stored in us-east-1 only reaches the Mumbai PoP the first time a Mumbai user requests it. After that first miss, all Mumbai users get it from cache.

This works well for most content. If nobody in São Paulo ever requests a particular file, it wastes no space in the São Paulo PoP.

Origin push: you proactively push content to all PoPs before requests arrive. Netflix uses a variant of this with extraordinary commitment: it physically ships Open Connect Appliances - servers it has designed and manufactured - directly to ISPs and internet exchanges. These appliances sit inside Telstra’s network, inside Comcast’s network, inside Airtel’s network. At night, Netflix pre-loads the most-watched content in each region onto these appliances. When a user in Bangalore streams a show the next day, the video bytes travel from a server inside Airtel’s network, not across the ocean from AWS us-east-1.

Netflix has effectively become its own last-mile CDN by owning the hardware inside ISPs. This is extreme, but it explains how Netflix can serve 15 petabytes of data per day without AWS bandwidth bills that would destroy its margins.
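
The nightly pre-load amounts to a selection problem: which titles go onto an appliance with limited disk? The sketch below uses a simple greedy rule, most-watched first, skipping anything that no longer fits. This is not Netflix's actual algorithm, which is not described here. The function name, inputs, and numbers are hypothetical.

```python
from typing import Dict, List


def plan_preload(view_counts: Dict[str, int],
                 sizes_gb: Dict[str, float],
                 capacity_gb: float) -> List[str]:
    """Pick titles to push to one appliance: most-watched first, while they fit."""
    chosen: List[str] = []
    used = 0.0
    for title in sorted(view_counts, key=lambda t: view_counts[t], reverse=True):
        size = sizes_gb[title]
        if used + size <= capacity_gb:
            chosen.append(title)
            used += size
    return chosen


if __name__ == "__main__":
    views = {"show-a": 100, "show-b": 50, "show-c": 10}    # hypothetical regional counts
    sizes = {"show-a": 6.0, "show-b": 5.0, "show-c": 3.0}  # hypothetical sizes in GB
    print(plan_preload(views, sizes, capacity_gb=10.0))
```

The contrast with origin pull is that this decision is made before any request arrives, from yesterday's demand rather than from a cache miss.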

Video Delivery: The Adaptive Bitrate Problem

Serving a video file is fundamentally different from serving an image. An image has a fixed size. A video is a stream - and the user’s bandwidth changes while they are watching. You cannot serve 4K video to a user on a flaky 4G connection; you will buffer constantly. But you also do not want to serve 480p to a user on gigabit fiber.

The solution is adaptive bitrate streaming: the video is encoded at multiple quality levels and cut into short segments. The player measures download speed continuously and picks the right quality for each segment.

HLS (HTTP Live Streaming), developed by Apple, and DASH (Dynamic Adaptive Streaming over HTTP), the open standard, both use this mechanism. When you upload a video to YouTube, the platform encodes it at roughly 8 quality levels (240p through 4K) and cuts each into 2-10 second segments. The CDN stores all of these segments. The player downloads a manifest file (.m3u8 for HLS, .mpd for DASH) that describes all available variants:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480
480p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
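
A player's first job is to turn that text into a list of variants. The sketch below is a minimal parser for master playlists shaped like the one above. It reads only the BANDWIDTH and RESOLUTION attributes and is not a full HLS implementation. `Variant` and `parse_master_playlist` are names chosen for this example.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Variant(NamedTuple):
    bandwidth: int   # bits per second declared in the manifest
    resolution: str  # e.g. "1280x720"; empty if not declared
    uri: str         # path to that variant's media playlist


# Attribute lists look like KEY=value,KEY="quoted, value"
_ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text: str) -> List[Variant]:
    """Extract the variants from an HLS master playlist, lowest bandwidth first."""
    variants: List[Variant] = []
    pending: Optional[Dict[str, str]] = None  # attributes of the latest STREAM-INF tag
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in _ATTR.findall(attrs)}
        elif line and not line.startswith("#") and pending is not None:
            # The first non-comment line after the tag is that variant's URI.
            variants.append(Variant(int(pending["BANDWIDTH"]),
                                    pending.get("RESOLUTION", ""), line))
            pending = None
    return sorted(variants, key=lambda v: v.bandwidth)
```

Sorting by bandwidth gives the player an ordered ladder to step up and down.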

The player picks a starting quality based on an initial bandwidth estimate. After each 2-second segment downloads, it measures how fast the download was and adjusts. If the 2-second segment downloaded in 0.5 seconds - four times playback speed - the player can afford higher quality. If a segment took 3 seconds to download, the player steps down to avoid buffering.

This algorithm runs continuously, every segment. The quality shifts you see on YouTube as you move from WiFi to mobile data are this in action. Every segment request is an ordinary HTTP GET, which CDNs cache like any other file. Manifest files have a short TTL (so live streams can update them), but segments are immutable once encoded and can be cached for years.
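
The per-segment decision can be sketched as a throughput rule: measure how fast the last segment arrived, then pick the highest bitrate that fits within a safety margin. The 0.8 safety factor is an assumption for illustration. Production players also take buffer level and recent history into account.

```python
from typing import Sequence


def measured_throughput_bps(segment_bytes: int, download_seconds: float) -> float:
    """Throughput observed while downloading one segment, in bits per second."""
    return segment_bytes * 8 / download_seconds


def choose_bitrate(ladder_bps: Sequence[int], throughput_bps: float,
                   safety: float = 0.8) -> int:
    """Highest bitrate that fits within a fraction of the measured throughput.

    Falls back to the lowest rung when even that does not fit.
    """
    budget = throughput_bps * safety
    affordable = [b for b in sorted(ladder_bps) if b <= budget]
    return affordable[-1] if affordable else min(ladder_bps)


if __name__ == "__main__":
    ladder = [800_000, 2_800_000, 5_000_000]  # the three variants from the manifest
    segment = 2_800_000 * 2 // 8              # bytes in a 2-second segment at 2.8 Mbps
    print(choose_bitrate(ladder, measured_throughput_bps(segment, 0.5)))  # fast download
    print(choose_bitrate(ladder, measured_throughput_bps(segment, 3.0)))  # slow download
```

With the 2.8 Mbps segment arriving in 0.5 seconds, the rule steps up to the 5 Mbps variant. At 3 seconds it steps down to 800 kbps.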

This also answers a common interview question: yes, YouTube literally stores separate encoded copies for each resolution. A 1-hour video at 8 quality levels is 8 full copies, each cut into roughly 1800 two-second segments. The storage cost is real and significant. The adaptive streaming experience it enables is what makes it acceptable.
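
The segment count and a rough storage figure follow from simple arithmetic. The sketch treats each rendition's bitrate as constant, which real encoders do not guarantee, so the sizes are estimates. The 5 Mbps figure is borrowed from the example manifest rather than from YouTube's actual encoding ladder.

```python
def segment_count(duration_s: int, segment_s: int) -> int:
    """Segments per rendition, counting a final partial segment as one."""
    return -(-duration_s // segment_s)  # ceiling division


def rendition_size_gb(bitrate_bps: int, duration_s: int) -> float:
    """Approximate size of one rendition in gigabytes at a constant bitrate."""
    return bitrate_bps * duration_s / 8 / 1e9


if __name__ == "__main__":
    per_rendition = segment_count(3600, 2)
    print(per_rendition)                     # segments in one rendition of a 1-hour video
    print(per_rendition * 8)                 # segment files across 8 quality levels
    print(rendition_size_gb(5_000_000, 3600))  # one hour at 5 Mbps, in GB
```

For a one-hour video that is 1800 segments per rendition and 14,400 segment files across 8 renditions, each one an independently cacheable HTTP object.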

Image Delivery at Scale

Static images have their own optimization pipeline before reaching users.

Format selection: JPEG is the baseline, but AVIF can achieve files roughly 50% smaller at equivalent visual quality, with WebP in between. Modern CDNs detect the browser’s capabilities from the Accept request header and transparently serve the best format:

# Browser sends:
Accept: image/avif,image/webp,image/apng,image/*,*/*;q=0.8

# CDN serves AVIF to Chrome, WebP to older Safari, JPEG to IE11

The origin stores one master image. The CDN encodes and caches the right format for each browser type on first request. Subsequent requests for the same format hit the cache.
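
The negotiation step can be sketched as a function from the Accept header to an output format. This version requires a format to be listed explicitly, so a bare `image/*` wildcard is not taken as proof of AVIF or WebP support. That is a design choice made for this sketch, and individual CDNs may decide differently. The function names are illustrative.

```python
from typing import Set

PREFERENCE = ("image/avif", "image/webp")  # best first; JPEG is the fallback


def accepted_types(accept_header: str) -> Set[str]:
    """Media types listed in an Accept header with a non-zero quality value."""
    types: Set[str] = set()
    for item in accept_header.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for p in params:
            if p.startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not accepted"
        if media_type and q > 0:
            types.add(media_type.lower())
    return types


def pick_image_format(accept_header: str) -> str:
    """Choose the format to serve, falling back to JPEG."""
    accepted = accepted_types(accept_header)
    for candidate in PREFERENCE:
        if candidate in accepted:
            return candidate
    return "image/jpeg"
```

Because the same URL can now produce different bytes, the cache has to store one entry per chosen format, for example by varying on the Accept header. Otherwise a browser without AVIF support could be handed a cached AVIF.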

Responsive sizing: a 4000x3000 photo served to a phone displaying it at 400x300 carries 100 times the pixels the screen can show, so roughly 99% of the pixel data is wasted. CDNs can resize on demand: image.jpg?w=400&h=300&fit=crop. The resized version is cached at the edge - the first request generates it, every subsequent request serves from cache.

Progressive loading: JPEG supports progressive encoding. Instead of loading the image top-to-bottom as a stripe, a progressive JPEG loads as a blurry full image first, then sharpens in passes. Instagram uses low-quality image placeholders (LQIP): a tiny 20x20 blurred version is encoded directly into the HTML as a base64 data URI, displayed instantly while the full image loads in the background.
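
Embedding the placeholder is just base64 encoding. The sketch below assumes the tiny blurred image has already been produced by some image pipeline and arrives as raw bytes. The helper names and the markup it emits are illustrative, not Instagram's actual implementation.

```python
import base64
import html


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode image bytes as a data URI that can be inlined in HTML."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def placeholder_img_tag(placeholder_bytes: bytes, full_src: str) -> str:
    """Build an <img> that shows the inline placeholder and names the full image.

    The data-full-src attribute is a made-up convention; page script would
    swap it into src once the full image has loaded.
    """
    return (f'<img src="{to_data_uri(placeholder_bytes)}" '
            f'data-full-src="{html.escape(full_src, quote=True)}">')
```

The placeholder costs no extra request because it travels inside the HTML. That only pays off while it stays tiny, since base64 inflates the bytes by about a third.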

This is why Netflix can show you a thousand thumbnails the moment you hover over the progress bar. Each thumbnail is tiny, aggressively compressed, and already cached at your nearest PoP - a server 20ms away, not 200ms.

CDN Invalidation: Updating Cached Content

If you update a file and the CDN has a cached version with a long TTL, users see the old file until the cache expires. This is the CDN invalidation problem.

The best solution is to make the problem impossible: content-addressed filenames. Include a hash of the file content in the filename: styles-a3f9c1.css. When the CSS changes, the hash changes, the filename changes, and the CDN treats it as a new file and caches it fresh. The old file quietly expires without any cache purge needed. Modern frontend build tools (Webpack, Vite, Next.js) support this, and most emit hashed filenames in production builds by default.
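
A content-addressed name takes only a few lines to produce. The sketch uses SHA-256 truncated to six hex characters to mirror the styles-a3f9c1.css example. The hash function and its length are assumptions here, and build tools make their own choices.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_chars: int = 6) -> str:
    """Return the filename with a content hash inserted before the extension."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return str(path.with_name(f"{path.stem}-{digest}{path.suffix}"))


if __name__ == "__main__":
    print(hashed_name("styles.css", b"body { color: black; }"))
    print(hashed_name("styles.css", b"body { color: navy; }"))  # new content, new name
```

Identical content always maps to the same name, so an unchanged file keeps its cache entries across deploys. Only files that actually changed get new URLs.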

When you cannot use content-addressed names - for HTML pages, API responses, or canonical URLs - CDN providers expose an API to explicitly purge cached copies. CloudFront’s CreateInvalidation API removes entries from all edge nodes globally. Propagation is not instant: it typically takes from tens of seconds to a few minutes. Use it for emergencies: a wrong price on a product page, a broken image at a stable URL.
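
As a sketch of such a purge, the helper below builds the arguments for CloudFront's CreateInvalidation call in the shape boto3's `create_invalidation` expects. Only the payload builder runs here. The actual call is shown in a comment because it needs boto3 and AWS credentials. The distribution ID and path are placeholders.

```python
import time
from typing import Any, Dict, Iterable, Optional


def invalidation_request(distribution_id: str, paths: Iterable[str],
                         caller_reference: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for CloudFront's CreateInvalidation call."""
    items = sorted(set(paths))
    if not items or any(not p.startswith("/") for p in items):
        raise ValueError("provide at least one path, each starting with '/'")
    return {
        "DistributionId": distribution_id,
        "InvalidationBatch": {
            "Paths": {"Quantity": len(items), "Items": items},
            # Must be unique per request; a timestamp is one simple choice.
            "CallerReference": caller_reference or str(time.time()),
        },
    }


# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("cloudfront").create_invalidation(
#       **invalidation_request("EDFDVBD6EXAMPLE", ["/products/42.html"]))
```

Deduplicating the paths before sending keeps the request small, which matters if the provider limits or bills invalidation paths.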

Why the API Latency Is Different in Different Countries

When someone reports that “the API responds in 200ms in India but 1.5 seconds in the US,” the diagnosis almost always involves geography: where the origin sits relative to each user, and how much of the response is served from a CDN.

Personalized API responses (user-specific data, database-driven content that differs per user) cannot be cached on a shared CDN - they are different for every user. Those requests must reach the origin server. If the origin is in us-east-1 and the user is in Singapore, every API call travels to Virginia and back: 180ms just for the signal, plus server processing time.

The solutions, in order of increasing complexity:

  • CDN for static assets: ensure JS, CSS, images, and fonts are served from a CDN. Dynamic API calls will still travel to origin, but the page can render before those calls return.
  • API edge caching: for API responses that are the same for many users (a public news feed, product catalog, search results), add cache headers and let the CDN cache them.
  • Edge functions: run API logic at the CDN edge (Cloudflare Workers, Vercel Edge Functions). The function executes in the PoP closest to the user. Stateless logic can finish in a few milliseconds there, so the user pays only the short round trip to the nearby PoP.
  • Multi-region deployment: run a copy of your application and database in multiple regions. Users in Singapore hit the Singapore region; users in Virginia hit us-east-1. This solves the latency problem at the cost of managing data consistency across regions - see the Multi-Region Architecture post for what that entails.

| Concept | What it means in practice |
| --- | --- |
| PoP (Point of Presence) | CDN server in a city; absorbs requests without touching your origin |
| Anycast | Many PoPs announce the same IP range; BGP routes each user to the topologically nearest one |
| Cache hit | PoP serves from local storage; origin never contacted |
| Cache-Control: public | CDN is allowed to cache this response |
| Cache-Control: no-cache | Cache but always revalidate; useful for semi-static content |
| Cache-Control: no-store | Never cache; for private user data |
| Origin pull | PoPs fetch content on first miss - the default |
| Origin push | Pre-load PoPs before demand; Netflix Open Connect is the extreme version |
| HLS / DASH | Video segmented into 2-10s chunks; player picks quality per segment |
| Manifest file | .m3u8 or .mpd; index of quality variants; player downloads it first |
| Adaptive bitrate | Player adjusts quality each segment based on measured download speed |
| Content-addressed names | Hash in filename makes CDN invalidation unnecessary |
