gRPC - When REST Is Not Enough // Megha Bose

REST is the default choice for APIs. It is simple, it works over plain HTTP, every language has client libraries, and browsers can speak it natively. For public-facing APIs and communication between loosely coupled services, REST is usually right. But inside a large system, where dozens of internal services call each other thousands of times per second, REST starts to show its costs: the overhead of text encoding, the absence of a schema contract, the lack of streaming, and the round-trip expense of HTTP/1.1. gRPC was built to address all of these at once.

What REST Gets Wrong for Internal Services

REST APIs usually send data as JSON: human-readable text. Parsing JSON is fast enough for most purposes, but at scale - millions of inter-service calls per minute - the CPU time spent serializing and deserializing adds up. JSON carries no schema of its own: if a service renames a field, clients break at runtime, not at compile time. Its type system is also loose: there is a single number type, so a value might arrive as an integer or a float, a nullable field might be missing or explicitly null, and neither the sender nor the receiver knows in advance unless a schema such as OpenAPI is layered on top.
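The typing ambiguity is easy to reproduce with nothing but a JSON parser. The sketch below uses Python's standard json module; the two payloads and their field names (balance, nickname) are made up for illustration.

```python
import json

# Two payloads a sender might plausibly emit for the "same" user record.
payload_a = '{"id": 7, "balance": 10, "nickname": null}'
payload_b = '{"id": 7, "balance": 10.0}'

a = json.loads(payload_a)
b = json.loads(payload_b)

# The numeric type depends on how the sender happened to format the number.
print(type(a["balance"]).__name__, type(b["balance"]).__name__)

# "Present but null" and "absent" are different documents that a receiver
# usually has to treat as the same thing.
print("nickname" in a, a.get("nickname"))
print("nickname" in b, b.get("nickname"))
```

Nothing in either payload tells the receiver which form to expect, so every consumer ends up writing its own defensive checks.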

HTTP/1.1, which many REST APIs still use between services, suffers from head-of-line blocking: in practice a connection carries only one outstanding request at a time (pipelining exists but is rarely enabled). Browsers work around this by opening multiple connections, but microservices that do the same pay for extra TCP handshakes and slow-start on each one. HTTP/2 addresses this with multiplexing, but REST stacks do not always have it enabled end to end.

Finally, REST is request-response: one request, one response, done. If you want to stream a large response or have the server push events, you bolt something on top - chunked transfer, Server-Sent Events, WebSockets - each with its own complexity.

gRPC addresses all of this systematically.

What gRPC Is

gRPC is a Remote Procedure Call framework. The idea is older than REST: instead of thinking about “resources” and “HTTP verbs,” you think about “calling a function on another machine.” The network is hidden behind what looks like a local function call.

gRPC was developed at Google, open-sourced in 2015, and is built on two technologies: Protocol Buffers for serialization and HTTP/2 for transport.

You define your API in a .proto file:

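syntax = "proto3";

// The API surface: one unary call and one server-streaming call.
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User);
}

message GetUserRequest {
  int64 user_id = 1;
}

// Left empty here; filters or paging options would go in this message.
message ListUsersRequest {}

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}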

The protoc compiler, together with a gRPC plugin or equivalent build tooling for the target language, takes this definition and generates client and server stub code - Go, Python, Java, Rust, C++, Node.js, and many others are supported. The generated client code has a GetUser method you call much like a local function. The generated server code has an interface you implement. Serialization and HTTP/2 framing are handled for you, although network failures and latency still surface as errors and deadlines that the caller has to handle.

Protocol Buffers: Schema-First Binary Serialization

Protocol Buffers (protobuf) is Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. Unlike JSON, it is binary and schema-first.

Each field has a number (the = 1, = 2 in the example), and that number is what actually goes on the wire. Field names are not serialized - only numbers and a wire type. This makes the wire format compact. It also enables backward compatibility: if you add a new field to a message, old clients that don’t know about it simply skip the unknown field number. If you remove a field, old clients that expected it get the default value (zero, empty string, etc.). As long as you never renumber a field or reuse a retired field number for something else (the reserved keyword exists to prevent exactly this), old and new code can interoperate.
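To make the "unknown fields are skipped" behaviour concrete, here is a minimal hand-written decoder for the User message, assuming only the two wire types that message needs (varint and length-delimited). It is a sketch of the wire format, not how the protobuf library is implemented; the extra field 4 in the sample bytes is hypothetical.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7

# Field numbers the "old" client knows about, taken from the User message.
KNOWN_FIELDS = {1: "id", 2: "name", 3: "email"}

def decode_user(buf):
    """Decode a User, skipping any field number not in KNOWN_FIELDS."""
    user = {"id": 0, "name": "", "email": ""}  # defaults for absent fields
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in KNOWN_FIELDS:
            if isinstance(value, bytes):
                value = value.decode("utf-8")
            user[KNOWN_FIELDS[field_number]] = value
        # Unknown field numbers are consumed and dropped here; real libraries
        # may instead retain them so they survive re-serialization.
    return user

# A newer sender: id=42 (field 1), name="Ada" (field 2), plus a hypothetical
# field 4 that the old client has never heard of. No email was sent.
wire = bytes([0x08, 0x2A, 0x12, 0x03, 0x41, 0x64, 0x61, 0x20, 0x07])
print(decode_user(wire))
```

The decoder never needs to know what field 4 means: the wire type alone tells it how many bytes to skip, and the missing email falls back to its default.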

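The binary format is generally smaller and faster to parse than text formats. The figures most often quoted, messages 3-10x smaller and 20-100x faster to parse, come from Google’s original protobuf documentation and compare against XML; against JSON the gains are typically more modest and depend heavily on the shape of the data and the libraries involved, so measure your own payloads. For services making millions of calls per minute, even a modest per-message saving is meaningful CPU and bandwidth.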
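A rough feel for the size difference can be had by hand-encoding the User message and comparing it with the JSON for the same record. This is a minimal sketch assuming non-default field values (proto3 encoders omit zero and empty fields, which this one does not bother to do); the sample record is invented.

```python
import json

def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_user(user_id, name, email):
    """Hand-encode User: field 1 as a varint, fields 2 and 3 as strings."""
    out = bytearray()
    out += encode_varint((1 << 3) | 0) + encode_varint(user_id)
    for field_number, text in ((2, name), (3, email)):
        data = text.encode("utf-8")
        out += encode_varint((field_number << 3) | 2)  # tag: number + wire type
        out += encode_varint(len(data)) + data
    return bytes(out)

record = {"id": 42, "name": "Ada", "email": "ada@example.com"}
as_json = json.dumps(record).encode("utf-8")
as_proto = encode_user(record["id"], record["name"], record["email"])
print(len(as_json), len(as_proto))
```

For a small, string-heavy record like this one the saving comes almost entirely from dropping field names and punctuation; messages dominated by numbers or repeated fields tend to shrink more.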

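The schema is also a contract. If a service renames a field or changes its type, client code that still uses the old shape fails to compile once the stubs are regenerated (at least in statically typed languages), instead of surfacing as a mysterious null pointer exception in production. Note that proto3 has no required fields: a field the client leaves unset simply arrives as its default value, so presence checks still belong in application-level validation.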

HTTP/2: What It Unlocks

gRPC uses HTTP/2 as its transport, and this is not incidental. HTTP/2’s properties are precisely what make gRPC’s capabilities possible.

Multiplexing: multiple RPC calls can be in flight simultaneously over a single TCP connection. Unlike HTTP/1.1’s one-request-at-a-time constraint, HTTP/2 assigns each request to a numbered stream. Response frames for different streams interleave on the wire. A slow response on one call does not block fast responses on others.

Header compression: HTTP/1.1 sends the full set of headers (content-type, authorization, etc.) on every request. HTTP/2 compresses headers using HPACK and can refer back to previously sent headers. For gRPC calls that send the same authorization token on every request, this saves significant bytes.

Binary framing: HTTP/2 frames are binary, not text. The HTTP/1.1 parsing problem (headers are free-form text with complex rules) is gone. This is both faster and less error-prone.

Full-duplex streaming: a single HTTP/2 stream can carry frames in both directions simultaneously. This enables gRPC’s streaming modes in a way HTTP/1.1 cannot.
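The binary framing and multiplexing described above can be sketched in a few lines. The example builds DATA frames with the fixed 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID), interleaves two streams on one byte sequence, and reassembles them. It deliberately omits HEADERS frames, flow control, and connection setup, and the payloads are invented.

```python
import struct

DATA = 0x0        # frame type that carries request/response bodies
END_STREAM = 0x1  # flag: this is the last frame on the stream

def make_frame(stream_id, payload, flags=0):
    """Prefix payload with the fixed 9-byte HTTP/2 frame header."""
    length = struct.pack(">I", len(payload))[1:]  # low 3 bytes = 24-bit length
    rest = struct.pack(">BBI", DATA, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload

def read_frames(buf):
    """Yield (stream_id, flags, payload) for each frame in buf."""
    pos = 0
    while pos < len(buf):
        length = int.from_bytes(buf[pos:pos + 3], "big")
        flags = buf[pos + 4]
        stream_id = int.from_bytes(buf[pos + 5:pos + 9], "big") & 0x7FFFFFFF
        yield stream_id, flags, buf[pos + 9:pos + 9 + length]
        pos += 9 + length

def demultiplex(buf):
    """Reassemble each stream's body; return streams in completion order."""
    partial, completed = {}, []
    for stream_id, flags, payload in read_frames(buf):
        partial[stream_id] = partial.get(stream_id, b"") + payload
        if flags & END_STREAM:
            completed.append((stream_id, partial.pop(stream_id)))
    return completed

# Two calls share one connection; stream 3 finishes while stream 1 is still open.
connection = (
    make_frame(1, b"slow-part-1 ")
    + make_frame(3, b"fast-reply", END_STREAM)
    + make_frame(1, b"slow-part-2", END_STREAM)
)
for stream_id, body in demultiplex(connection):
    print(stream_id, body)
```

Because every frame names its stream, the receiver can hand the fast reply to its caller without waiting for the slow one to finish.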

Four Communication Patterns

This is where gRPC genuinely surpasses REST. While REST is locked into request-response, gRPC supports four interaction modes defined in the .proto file:

Unary RPC: one request, one response. Equivalent to a REST call. The standard case.

Server streaming: the client sends one request; the server sends back a stream of responses. Useful for: sending a large dataset row by row without buffering it all first, streaming live prices or events, returning paginated results without pagination tokens.

rpc ListAllOrders (ListOrdersRequest) returns (stream Order);

Client streaming: the client sends a stream of messages; the server responds once at the end. Useful for: uploading a large file in chunks, sending a batch of sensor readings, aggregating many inputs into one output.

rpc UploadChunks (stream Chunk) returns (UploadResult);

Bidirectional streaming: both sides send a stream of messages simultaneously. Useful for: chat applications, real-time collaborative editing, game state synchronization, live call transcription where you send audio and receive transcript fragments concurrently.

rpc Chat (stream ChatMessage) returns (stream ChatMessage);

REST requires bolting on WebSockets or SSE to approximate even the server-streaming case. gRPC offers all four patterns natively, with the same code generation and the same connection management.
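The four shapes map naturally onto plain functions and iterators. The sketch below uses no gRPC library at all; it only mirrors the signatures, with in-memory stand-in data and hypothetical handler names, to show which side of each call is a single value and which is a stream.

```python
from typing import Iterable, Iterator

ORDERS = ["order-1", "order-2", "order-3"]  # stand-in data

def get_order(request: int) -> str:
    """Unary: one request in, one response out."""
    return ORDERS[request]

def list_all_orders(request: None) -> Iterator[str]:
    """Server streaming: one request in, responses yielded one at a time."""
    for order in ORDERS:
        yield order

def upload_chunks(requests: Iterable[bytes]) -> int:
    """Client streaming: consume the whole request stream, answer once."""
    return sum(len(chunk) for chunk in requests)

def chat(requests: Iterable[str]) -> Iterator[str]:
    """Bidirectional streaming: respond while requests are still arriving."""
    for message in requests:
        yield f"echo: {message}"

print(get_order(0))
print(list(list_all_orders(None)))
print(upload_chunks(iter([b"abc", b"de"])))
print(list(chat(iter(["hi", "bye"]))))
```

The echo handler answers one-for-one only to keep the example short; in a real bidirectional stream the two directions are independent, and either side may send any number of messages at any time.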

Deadlines, Cancellation, and Interceptors

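gRPC has first-class support for deadlines: a client can attach a deadline to any call, meaning “if this is not done by time T, give up.” The deadline can propagate through the entire call chain: some language implementations forward it automatically when the incoming call’s context is passed to outgoing calls, while others require you to pass the remaining time along explicitly. If service A calls B which calls C, and A’s deadline expires, the cancellation propagates to C. This is how you stop downstream services from continuing to work on a request whose caller has already given up, and how you keep one slow dependency from tying up resources across the entire system.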
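The key idea is that every hop shares one absolute deadline rather than starting a fresh timeout of its own. The sketch below simulates that with integer milliseconds and made-up per-service costs; it does not use the gRPC library, and the names (call, COSTS_MS) are hypothetical.

```python
class DeadlineExceeded(Exception):
    """Raised when a hop cannot finish before the shared deadline."""

# Simulated work per service, in milliseconds (illustrative numbers).
COSTS_MS = {"A": 100, "B": 200, "C": 500}

def call(service, downstream, deadline_ms, now_ms=0):
    """Do this service's work, then call the next hop with the SAME deadline.

    Returns the simulated time at which the last hop finished.
    """
    if now_ms >= deadline_ms:
        raise DeadlineExceeded(f"{service}: deadline passed before work started")
    now_ms += COSTS_MS[service]
    if now_ms > deadline_ms:
        raise DeadlineExceeded(f"{service}: deadline passed during work")
    if downstream:
        return call(downstream[0], downstream[1:], deadline_ms, now_ms)
    return now_ms

print(call("A", ["B", "C"], deadline_ms=1000))  # the whole chain fits
try:
    call("A", ["B", "C"], deadline_ms=250)      # B overruns, C is never invoked
except DeadlineExceeded as exc:
    print(exc)
```

In the second call C never starts, which is the point: work that cannot possibly be used is not attempted.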

Cancellation is the corollary: a client can cancel an in-flight RPC. The server is notified and can stop doing work. For streaming RPCs, this is especially important: a server streaming a large dataset can stop generating data the moment the client disconnects.
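A server-streaming handler that honours cancellation can be modelled as a generator that checks a flag between rows. This is a plain-Python sketch using threading.Event as the cancellation signal; in the Python gRPC library the analogous check would be made against the call's context object (for example context.is_active()), which is not shown here.

```python
import threading

def stream_rows(total, cancelled):
    """Yield rows until done or until the caller signals cancellation."""
    for row in range(total):
        if cancelled.is_set():
            return  # stop producing; a real handler would also release resources
        yield row

cancelled = threading.Event()
received = []
for row in stream_rows(1_000_000, cancelled):
    received.append(row)
    if len(received) == 3:
        cancelled.set()  # the client loses interest after three rows

print(received)
```

The server was prepared to produce a million rows but stops as soon as the flag is set, instead of generating data nobody will read.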

Interceptors are the gRPC equivalent of middleware. They wrap every RPC call on the client or server side, allowing you to inject cross-cutting concerns: authentication, logging, metrics, distributed tracing, retry logic. An auth interceptor on the server verifies the bearer token on every incoming call. A tracing interceptor propagates trace IDs across service boundaries. These are defined once and applied everywhere.
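The interceptor pattern itself is just function wrapping. Below is a minimal sketch in plain Python, without the gRPC library: handlers take (method, request, metadata), and each interceptor returns a wrapped handler. All names here (chain, logging_interceptor, auth_interceptor, the token value) are hypothetical.

```python
def logging_interceptor(log):
    """Record the start and end of every call in the given list."""
    def wrap(handler):
        def wrapped(method, request, metadata):
            log.append(f"-> {method}")
            response = handler(method, request, metadata)
            log.append(f"<- {method}")
            return response
        return wrapped
    return wrap

def auth_interceptor(valid_token):
    """Reject calls whose metadata lacks the expected bearer token."""
    def wrap(handler):
        def wrapped(method, request, metadata):
            if metadata.get("authorization") != f"Bearer {valid_token}":
                raise PermissionError("unauthenticated")
            return handler(method, request, metadata)
        return wrapped
    return wrap

def chain(handler, *interceptors):
    """Wrap handler so the first interceptor listed runs outermost."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler

def get_user(method, request, metadata):
    """The business logic, unaware of logging or authentication."""
    return {"id": request["user_id"], "name": "Ada"}

log = []
server = chain(get_user, logging_interceptor(log), auth_interceptor("s3cret"))
reply = server("/UserService/GetUser", {"user_id": 42},
               {"authorization": "Bearer s3cret"})
print(reply, log)
```

The handler contains no logging or authentication code; both concerns are defined once and applied to whatever handler is passed through chain.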

gRPC vs REST: When to Use Which

Dimension	REST	gRPC
Schema enforcement	None by default (optional: OpenAPI)	Strong (protobuf required)
Serialization	JSON (text, verbose)	Protobuf (binary, compact)
Transport	HTTP/1.1 (usually)	HTTP/2 (always)
Streaming	Bolted on (SSE, WebSockets)	Native (4 modes)
Browser support	Native	Limited (grpc-web proxy needed)
Debugging	Easy (readable JSON, curl)	Harder (binary protocol, needs grpcurl)
Code generation	Optional	Required
Ecosystem	Enormous	Large but smaller

REST wins for: public APIs where consumers are unknown, browser clients (gRPC requires a translation proxy in browsers), APIs where human readability matters for debugging, teams without a good protobuf workflow.

gRPC wins for: internal microservice communication where both client and server are under your control, high-throughput low-latency calls, streaming data, polyglot environments where you want type-safe generated clients in multiple languages, and anywhere you want the contract encoded and verified automatically.

Where gRPC Is Used in Production

Google built gRPC as the open-source successor to Stubby, its internal RPC system, and uses it in many of its products and public cloud APIs. Netflix uses it internally. Square, Dropbox, Cisco, and many other large tech companies have adopted it for their internal microservices. Kubernetes uses gRPC to talk to etcd and for plugin interfaces such as the Container Runtime Interface (CRI). The Envoy proxy, which forms the data plane of service meshes such as Istio, speaks gRPC natively and uses it for its xDS (configuration) protocol.

The pattern that emerges in practice: REST for the external-facing API layer (because public clients need it), gRPC for internal service-to-service communication (because the performance and contract guarantees matter, and you control both ends).
