The Live Web - Real-Time Data Without Polling Forever
Helpful context:
- Networking - How Packets Find Their Way Across the Internet
- REST APIs - Resources, Verbs, and the Architecture of the Web
A live sports scoreboard ticks from 2-1 to 2-2 while you’re watching. A ChatGPT response streams word by word as it’s generated. A Figma canvas shows your teammate’s cursor moving in real time. None of these involved you hitting refresh. The server decided to send you data, and your browser received it. How?
The answer depends on which decade you’re in. Each era of the live web invented a progressively less embarrassing hack, until we got the two clean solutions we use today.
The Problem with HTTP’s Mental Model
HTTP was designed for documents. The model is request-response: you ask, the server answers, the connection closes. This works magnificently for loading a webpage. It fails immediately when you need the server to tell you something - when the server is the one with new information.
The core constraint: in HTTP, the client must initiate every exchange. The server has no mechanism to reach out unprompted. Every real-time feature built on HTTP before 2011 was an attempt to fake server-initiated communication using client-initiated requests.
The Polling Era: Necessary and Terrible
Simple polling: every N seconds, the client sends a fresh HTTP request. “Anything new?” The server responds immediately: usually “no,” occasionally “yes, here’s the update.” At one request per second across 50,000 concurrent users, that’s 50,000 HTTP requests per second, the vast majority returning nothing useful. The application server burns CPU and connections processing empty responses. This is how early email clients worked - and why they felt slow.
Long-polling was the first genuine improvement. The client sends an HTTP request, but instead of responding immediately, the server holds the connection open until it has data to send. When something happens, the server responds. The client immediately opens a new long-poll request. This cuts down on empty responses but keeps server threads or file descriptors occupied for the duration of each hold. Under load, this becomes a resource exhaustion problem.
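The hold-until-data behaviour can be sketched in a few lines of asyncio. This is a minimal in-process illustration, not a production server: `LongPollChannel`, `publish`, and `poll` are hypothetical names, and the `cursor` argument (how many messages the client has already seen) is an assumption made for the example.

```python
import asyncio


class LongPollChannel:
    """In-memory message log that lets pollers wait for new entries."""

    def __init__(self):
        self._messages = []
        self._changed = asyncio.Condition()

    async def publish(self, message):
        # Append the message and wake every request that is currently held open.
        async with self._changed:
            self._messages.append(message)
            self._changed.notify_all()

    async def poll(self, cursor, timeout):
        """Return (new_messages, new_cursor), holding for up to `timeout` seconds."""
        async with self._changed:
            try:
                # Hold the "request" open until something newer than cursor exists.
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: len(self._messages) > cursor),
                    timeout,
                )
            except asyncio.TimeoutError:
                pass  # nothing happened; answer with an empty list
            return self._messages[cursor:], len(self._messages)
```

A client would call `poll` in a loop, passing back the cursor it received each time. Every held call still occupies a coroutine and a socket on the server, which is the resource cost described above.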
COMET (coined around 2006) was the umbrella term for these techniques - streaming HTML, persistent XHR, multi-part responses. Browser vendors hadn’t standardized anything, so implementations were fragile and browser-specific. It worked, but it was an infrastructure tax everyone paid to get a feature that felt like it should be native.
Server-Sent Events: The Underrated Solution
SSE (Server-Sent Events), standardized in the HTML5 spec in 2011, is the simplest possible server push mechanism. The client makes one normal HTTP request. The server responds with Content-Type: text/event-stream. The response never ends. The server writes newline-delimited text events down the open connection whenever it has something to say.
The wire format is deliberately minimal:
id: 1
event: score-update
data: {"home": 3, "away": 2, "minute": 67}

id: 2
event: score-update
data: {"home": 3, "away": 3, "minute": 89}

Each event ends with a blank line. Fields: data (required), event (optional event type), id (for resumption), retry (reconnect delay in milliseconds). The browser’s EventSource API handles all the plumbing:
const source = new EventSource('/api/match/87654/live');

source.addEventListener('score-update', (event) => {
  const payload = JSON.parse(event.data);
  updateScoreboard(payload);
});

source.addEventListener('match-end', () => {
  source.close();
});

source.onerror = (err) => {
  // EventSource reconnects automatically, sending Last-Event-ID
  // so the server can replay missed events
};
The automatic reconnection is one of SSE’s best features - you get it for free without any client-side retry logic. When the connection drops, EventSource reconnects and sends a Last-Event-ID header with the last event ID it received. Your server can use this to replay events from that point, which gives you resumable delivery as long as the server keeps a buffer or log of recent events to replay from.
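On the server side, replay needs two pieces: a serializer for the wire format and a store of recent events keyed by ID. The sketch below is one way to do it, assuming integer IDs and a bounded in-memory buffer; `format_sse` and `ReplayBuffer` are hypothetical names, and a real deployment would likely keep the buffer somewhere shared between servers.

```python
from collections import deque
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be split across several data: lines.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


class ReplayBuffer:
    """Keeps the most recent events so reconnecting clients can catch up."""

    def __init__(self, capacity: int = 1000):
        self._events = deque(maxlen=capacity)  # (id, event, data) tuples
        self._next_id = 1

    def append(self, event: str, data: str) -> str:
        """Store an event and return the frame to write to live connections."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, event, data))
        return format_sse(data, event=event, event_id=str(event_id))

    def replay_after(self, last_event_id: Optional[str]) -> list:
        """Frames the client missed, given its Last-Event-ID header value."""
        last = int(last_event_id) if last_event_id else 0
        return [
            format_sse(data, event=event, event_id=str(eid))
            for eid, event, data in self._events
            if eid > last
        ]
```

Events older than the buffer's capacity cannot be replayed, so the capacity bounds how long a client can be offline and still catch up.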
SSE is unidirectional: server to client only. That’s a constraint worth thinking about before dismissing it. A huge proportion of “real-time” features only need data to flow one way: live feeds, notifications, progress indicators, dashboards, streaming AI responses. ChatGPT’s streaming output is SSE. GitHub Copilot’s inline suggestions are SSE. Vercel’s build log streaming is SSE.
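SSE is HTTP. It passes through most proxies, CDNs, and load balancers without special handling, although a proxy that buffers responses (Nginx does by default) must have buffering turned off for the stream, or events arrive in delayed batches. It uses HTTP/2’s multiplexing naturally - multiple SSE streams on one connection - whereas over HTTP/1.1 each stream occupies one of the browser’s limited per-host connections. It’s compressible and observable with standard HTTP tooling. These are real operational advantages that WebSockets don’t share.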
WebSockets: Full-Duplex Over a Persistent Connection
WebSockets (RFC 6455, 2011) provide genuine bidirectional communication - both client and server can send frames at any time without waiting for the other. The connection begins as HTTP and then upgrades:
GET /chat HTTP/1.1
Host: chat.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The server responds with 101 Switching Protocols. From this point, the connection speaks the WebSocket framing protocol rather than HTTP. Data travels in frames: small binary units carrying text, binary data, ping/pong heartbeats, or close signals.
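The 101 response also carries a Sec-WebSocket-Accept header, which proves the server understood the handshake. RFC 6455 defines it as the base64-encoded SHA-1 of the client's key concatenated with a fixed GUID. A small sketch of that computation follows; the function name `compute_accept` is made up for illustration.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the key in the request above, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`, the same value as the worked example in the RFC.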
const ws = new WebSocket('wss://chat.example.com/rooms/general');

ws.onopen = () => {
  ws.send(JSON.stringify({ type: 'join', user: 'megha' }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  appendMessage(message);
};

ws.onclose = (event) => {
  // event.code and event.reason explain why
  if (event.code !== 1000) { // 1000 = normal closure
    scheduleReconnect();
  }
};
ws:// is unencrypted. wss:// is TLS-wrapped - always use wss:// in production. Unlike SSE, WebSockets have no automatic reconnection. Your code must detect closure and re-establish the connection, implement exponential backoff, and handle the case where the server is temporarily unavailable.
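The backoff calculation itself is small. Here is a sketch of exponential backoff with "full jitter" (a random delay between zero and the exponential ceiling); the function name and the default base and cap values are assumptions for illustration, not recommendations from any particular library.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    # Double the ceiling on every attempt, but never exceed the cap.
    # The exponent is clamped so very large attempt counts cannot overflow.
    ceiling = min(cap, base * 2 ** min(attempt, 32))
    # Full jitter: a random point below the ceiling, so clients that were
    # disconnected at the same moment do not all reconnect at the same moment.
    return rng.uniform(0, ceiling)
```

A reconnect loop would sleep for `backoff_delay(attempt)` after each failed attempt and reset `attempt` to zero once a connection succeeds.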
Binary frames are a meaningful advantage for certain workloads. Sending game state, audio/video data, or compressed payloads over WebSockets is more efficient than base64-encoding binary data into JSON for SSE.
The Connection Lifecycle and Heartbeats
WebSocket connections are persistent by design, but networks are not. Firewalls and NAT devices drop idle connections after timeouts (typically 60 - 300 seconds). A WebSocket that sends no data for 2 minutes may find its connection silently dropped, with neither side knowing.
The fix is heartbeats. The WebSocket spec defines ping/pong control frames for this purpose: the server sends a ping frame, and the client must respond with a pong. If no pong is received within a timeout, the server closes the connection. Browsers answer pings automatically but expose no JavaScript API for sending them, so many applications also send their own application-level heartbeat messages. Either way, heartbeats surface dead clients - a user who closed their laptop mid-session - so the server can clean up the associated state.
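# FastAPI WebSocket with heartbeat
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def heartbeat(websocket: WebSocket):
    # Application-level heartbeat: a text message every 30 seconds,
    # not a protocol-level ping frame.
    while True:
        await asyncio.sleep(30)
        try:
            await websocket.send_text("ping")
        except Exception:
            break  # the send failed, so the connection is gone

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    heartbeat_task = asyncio.create_task(heartbeat(websocket))
    try:
        while True:
            data = await websocket.receive_text()
            # handle_message is the application's own handler (not shown)
            await handle_message(data, websocket)
    except WebSocketDisconnect:
        pass  # client went away; fall through to cleanup
    finally:
        heartbeat_task.cancel()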
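Sending heartbeats is half of the job; the other half is noticing when replies stop. One possible way to track that is a table of last-reply timestamps, sketched below. `HeartbeatMonitor` and its methods are hypothetical names, and the injectable `clock` parameter exists only to make the logic easy to test.

```python
import time


class HeartbeatMonitor:
    """Tracks when each connection last answered a heartbeat."""

    def __init__(self, timeout: float = 60.0, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_seen = {}  # connection id -> time of last reply

    def register(self, conn_id):
        # A new connection counts as alive from the moment it is accepted.
        self._last_seen[conn_id] = self._clock()

    def record_pong(self, conn_id):
        # Ignore replies from connections that were already cleaned up.
        if conn_id in self._last_seen:
            self._last_seen[conn_id] = self._clock()

    def unregister(self, conn_id):
        self._last_seen.pop(conn_id, None)

    def expired(self) -> list:
        """Connection ids that have been silent for longer than the timeout."""
        now = self._clock()
        return [c for c, seen in self._last_seen.items() if now - seen > self._timeout]
```

A periodic task would call `expired()`, close each listed connection, and then `unregister` it so the associated state can be released.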
The Stateful Problem: Load Balancers and WebSockets
This is the most important architectural challenge with WebSockets, and the reason many teams reach for pub/sub infrastructure even before they need it.
An HTTP request is stateless. Any server in a fleet can handle any request - round-robin load balancing works perfectly. A WebSocket connection is stateful. The connection persists on a specific server process, and state accumulates there: room membership, message history, user presence. When user A on server-1 sends a message that user B on server-2 needs to receive, server-1 has no way to reach B’s connection directly.
Sticky sessions are the naive fix: configure the load balancer to route a given user to the same server every time (using a session cookie or IP hash). This works until a server dies - all its connections are dropped and clients must reconnect, potentially to different servers. It also causes uneven load distribution: if a particular server got unlucky and has more active connections, it stays overloaded.
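To make the IP-hash variant concrete, here is a rough sketch of how a balancer might map a client to a server. `pick_server` is a hypothetical function, and real load balancers use their own hashing schemes; the point is only that the mapping is a pure function of the client key and the current server list.

```python
import hashlib


def pick_server(client_key: str, servers: list) -> str:
    """Map a client (by IP or session cookie value) to one server in the list."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a number, then wrap to the list size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Calling `pick_server` twice with the same key and list returns the same server, which is what makes the session sticky. When a server dies and is removed from the list, the client is mapped to a surviving server, and whatever in-memory state the dead one held is gone.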
Pub/Sub backend is the proper solution. Each server process subscribes to a shared message bus (Redis Pub/Sub, Kafka). When server-1 needs to deliver a message to all users in a chat room, it publishes the message to the room’s channel. Every server subscribed to that channel delivers it to their locally connected clients in that room.
User A (server-1) → sends message
Server-1 → PUBLISH room:general "{message}"
Redis → fans out to all subscribers
Server-1 → delivers to A's other connections in room
Server-2 → delivers to B
Server-3 → delivers to C, D
This is how Slack and Discord operate at scale. The individual WebSocket servers are stateless with respect to message routing - they just need to know which local connections are in which rooms.
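The fan-out flow can be simulated in memory to see why no server needs to know about any other. In this sketch `InMemoryBus` stands in for Redis Pub/Sub and each client connection is modelled as a plain list acting as an inbox; both classes and all method names are assumptions for illustration, not any real client library's API.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for a message bus: every subscriber of a channel gets every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(channel, message)


class GatewayServer:
    """One WebSocket server process; it only knows its own local connections."""

    def __init__(self, name, bus):
        self.name = name
        self._bus = bus
        self._rooms = defaultdict(dict)  # room -> {user: inbox}

    def join(self, room, user):
        # Subscribe to the room's channel the first time a local user joins it.
        if room not in self._rooms:
            self._bus.subscribe(f"room:{room}", self._deliver)
        inbox = []  # models the user's open connection
        self._rooms[room][user] = inbox
        return inbox

    def send(self, room, message):
        # Never deliver directly: publish, and let every server (including
        # this one) deliver to its own local connections.
        self._bus.publish(f"room:{room}", message)

    def _deliver(self, channel, message):
        room = channel.split(":", 1)[1]
        for inbox in self._rooms[room].values():
            inbox.append(message)
```

With three servers sharing one bus, a `send` on server-1 reaches users who joined the room on server-2 and server-3, matching the diagram above.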
Real Systems: Slack, Discord, ChatGPT
Slack uses WebSockets for real-time message delivery. The desktop client maintains a persistent WebSocket connection to a Slack gateway server. When you send a message, it goes over the WebSocket. When someone else sends a message to a channel you’re in, Slack’s backend publishes to a topic for your user and the gateway server delivers it over your WebSocket. Slack has written about using Kafka as the pub/sub backbone for this fan-out.
Discord serves millions of concurrent users and has published details of its WebSocket architecture: each gateway server can maintain hundreds of thousands of connections. Users connect to a shard (a specific gateway server) determined by their user ID, which ensures consistent routing. Discord’s gateway runs on Elixir (on the Erlang VM), and the team has written about moving performance-critical pieces to Rust.
ChatGPT uses SSE for streaming responses, not WebSockets. When you submit a prompt, the page makes an HTTP POST to the API. Because EventSource can only issue GET requests, the client reads the response stream itself (for example with fetch) rather than through EventSource; the wire format is the same. The response is Content-Type: text/event-stream. As the model generates tokens, they’re streamed as SSE events. This is a one-way server push scenario - exactly what SSE is designed for. There’s no need for a persistent bidirectional connection. The simplicity is a feature.
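Any client that can read an HTTP response line by line can consume such a stream without EventSource. The parser below is a simplified sketch of the event-stream rules: it handles data, event, and id fields plus comment lines, and ignores retry. `parse_sse` is a hypothetical name, and the input is assumed to be an iterable of already-decoded text lines.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event_type, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if it collected any data.
            if data:
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = "message", []  # the last id carries over
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value
```

Multiple data lines within one event are joined with newlines, and an event with no explicit type is reported as `message`, mirroring what EventSource does in the browser.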
Choosing Between WebSocket and SSE
The decision framework is simpler than most articles make it:
Does the client need to send data to the server over the real-time channel? If yes, WebSocket. If no, seriously consider SSE first.
| Scenario | Recommendation |
|---|---|
| Live feed / notifications / news ticker | SSE |
| Streaming AI responses | SSE |
| Progress indicators for background jobs | SSE |
| Live dashboards and monitoring | SSE |
| Chat (messages flow both ways) | WebSocket |
| Collaborative editing (Figma, Google Docs) | WebSocket |
| Multiplayer games | WebSocket (or WebRTC) |
| Financial trading terminals | WebSocket |
SSE’s advantages are often undersold. It is ordinary HTTP, so proxies and load balancers such as Nginx, Cloudflare, and AWS ALB carry it without protocol-specific support; the main thing to check is that response buffering is disabled for the stream. WebSocket’s Upgrade header requires explicit proxy support, and some corporate firewalls block non-HTTP connections. SSE reconnects automatically; WebSocket doesn’t. SSE is simpler to debug (it’s just text over HTTP).
The main advantage of WebSocket is bidirectionality and binary frame support. If you need the client to send data in real-time - and most truly interactive applications do - WebSocket is the right choice.
Scaling Beyond One Server
Both SSE and WebSocket connections are long-lived, which creates two scaling challenges: connection count per server, and message fan-out.
Connection count: A modern server with non-blocking I/O (Node.js, Python asyncio, Go, Rust) can handle tens of thousands of concurrent connections per instance. The bottleneck is memory per connection, not CPU. Each WebSocket connection needs state - authentication info, subscriptions, send buffer. At 100KB per connection, 100,000 connections need 10GB of RAM. Horizontal scaling of gateway servers distributes this.
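The memory arithmetic is worth keeping as a small calculator when sizing a fleet. This sketch uses decimal units (1 GB = 1,000,000 KB) to match the figures above; both function names and the per-server RAM figure in the usage note are illustrative assumptions.

```python
import math


def gateway_memory_gb(connections: int, kb_per_connection: float) -> float:
    """Total RAM for connection state, in decimal gigabytes."""
    return connections * kb_per_connection / 1_000_000


def servers_needed(connections: int, kb_per_connection: float, ram_gb_per_server: float) -> int:
    """Gateway servers required if each can devote ram_gb_per_server to connections."""
    total_gb = gateway_memory_gb(connections, kb_per_connection)
    return math.ceil(total_gb / ram_gb_per_server)
```

`gateway_memory_gb(100_000, 100)` gives 10.0, and with a hypothetical 4 GB budget per server, `servers_needed(100_000, 100, 4)` gives 3.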
Fan-out with Kafka: For high-throughput scenarios (broadcast to millions, or high-frequency updates), Redis Pub/Sub can become a bottleneck. Kafka is more appropriate: it’s persistent (messages are durable, not lost if a consumer is slow), partitioned (parallel consumption), and handles millions of messages per second. The tradeoff is operational complexity - Kafka is not a simple Redis PUBLISH.
AWS API Gateway’s WebSocket API provides a managed WebSocket service. The connection state and routing are handled by AWS; you implement Lambda functions that handle $connect, $disconnect, and message routes. AWS handles the connection scaling; you pay per message and per connection-minute. This removes the operational burden of managing WebSocket servers but introduces cold start latency for Lambda and per-invocation costs that can become significant at scale.
The Critique: WebSockets and the Stateful Tax
WebSockets impose a stateful tax on your architecture that doesn’t disappear - it just moves around. If you don’t use sticky sessions, you need pub/sub. If you use Redis Pub/Sub, you’ve added a dependency whose failure brings down real-time features for all users. If you use Kafka, you’ve added a system that requires a team to operate.
SSE is genuinely underused. The industry has a bias toward WebSockets because bidirectionality feels more “real-time,” even in applications where clients only need to receive data. Choosing WebSocket when SSE suffices adds connection management complexity (reconnection logic), requires more elaborate server infrastructure, and provides no user-visible benefit.
The other critique: WebSocket messages are not automatically persisted or replayed. If a client disconnects and reconnects, it misses events during the gap unless you implement event replay in your backend. SSE has the id and Last-Event-ID mechanism for this, but WebSocket has no equivalent - you implement it yourself or lose messages.
Future Outlook
WebTransport is a new protocol spec built on HTTP/3/QUIC that provides unreliable (datagram) and reliable (stream) bidirectional communication in the browser. It combines WebSocket’s bidirectionality with HTTP/3’s connection properties - no head-of-line blocking, 0-RTT reconnection, no upgrade handshake. The specification is still moving through standardization; Chrome ships it, and cross-browser support is maturing. For game-like applications and real-time collaboration at low latency, WebTransport may eventually replace WebSockets.
At the infrastructure layer, managed real-time services (Ably, Pusher, Liveblocks) abstract away WebSocket server management. You publish events to an API; they handle connection management, fan-out, and reliability. The tradeoff is cost and lock-in, but for teams that don’t want to operate real-time infrastructure, the operational savings are substantial.
| Concept | Key Point |
|---|---|
| HTTP polling | Simple; wasteful; client always initiates |
| Long-polling | Holds connection until data available; resource-intensive at scale |
| SSE | HTTP persistent stream; server-to-client only; auto-reconnect with Last-Event-ID |
| WebSocket | Full-duplex; client and server both send; no auto-reconnect |
| SSE use cases | Live feeds, streaming AI, dashboards, notifications |
| WebSocket use cases | Chat, collaborative editing, gaming, trading |
| Heartbeats | Detect dead connections; required for WebSockets in production |
| Sticky sessions | Routes user to same server; breaks on failure; naive solution |
| Pub/Sub backend | Redis or Kafka fan-out; servers are stateless; correct production pattern |
| ChatGPT streaming | Uses SSE, not WebSocket - one-way push |
| WebTransport | QUIC-based; unreliable datagrams + reliable streams; future of low-latency browser comms |