Classical Cryptography - Perfect Secrecy Before Computers Made It Practical // Megha Bose

Helpful context:

Modular Arithmetic - Arithmetic That Wraps Around

Cryptography begins with a simple goal - send a message that only the intended recipient can read. The history of classical cryptography is a cycle of clever constructions and clever attacks. Each cipher seemed secure until someone found a systematic way to break it, demanding something stronger. Caesar shifts gave way to substitution ciphers, which gave way to polyalphabetic ciphers, each generation closing the gap that the previous one left open - until the next attacker arrived.

Shannon’s 1949 paper ended this cycle by asking the right question: what does it actually mean for a cipher to be secure, in a mathematical sense that admits no clever attacks at all? Not “no known attack” or “no efficient attack” - no attack, period, against an adversary with unlimited time and computing power. The answer, which Shannon called perfect secrecy, is beautiful and deeply disappointing. Perfect security is achievable. It is also essentially impractical. Understanding why forces you to understand what cryptography can and cannot do.

The story goes from ad hoc constructions to a precise definition, and from that definition to an impossibility result that reshapes what we ask of cryptography entirely.

The Syntax of Encryption

Before asking whether a cipher is secure, we need a precise description of what a cipher even is. An encryption scheme consists of three algorithms:

Gen: a randomized key generation algorithm. On input a security parameter $1^n$, it outputs a key $k \leftarrow \text{Gen}(1^n)$. The security parameter controls the key length and overall security level.
Enc: an encryption algorithm. On input a key $k$ and message $m$, it outputs a ciphertext $c \leftarrow \text{Enc}_k(m)$. Enc may be randomized - two encryptions of the same message under the same key need not produce the same ciphertext.
Dec: a deterministic decryption algorithm. On input a key $k$ and ciphertext $c$, it outputs $m := \text{Dec}_k(c)$.

Correctness. For every key $k$ output by Gen and every message $m$ in the message space $\mathcal{M}$: $$\text{Dec}_k(\text{Enc}_k(m)) = m$$

The three spaces are the message space $\mathcal{M}$, the key space $\mathcal{K}$, and the ciphertext space $\mathcal{C}$. A scheme is fully specified by these spaces and the three algorithms.

This formal syntax matters because it separates three things that are easy to conflate: how keys are chosen, how messages are encrypted, and what security means. Correctness is just a sanity check - it says decryption undoes encryption. Security is a separate question entirely.

Classical Ciphers

Caesar cipher. Shift every letter forward by 3: A becomes D, B becomes E, and so on. There is exactly one key. Broken by inspection - just shift back.

Shift cipher. Generalize: the key is $k \in \{0, 1, \ldots, 25\}$ and encryption is $c_i = (m_i + k) \bmod 26$. Key space size: 26. A brute-force attacker tries all 26 shifts and finds the one that produces readable English in seconds.

Monoalphabetic substitution cipher. The key is a bijection $\sigma: \{A, \ldots, Z\} \to \{A, \ldots, Z\}$. Every plaintext letter is replaced by its image under $\sigma$. Key space: $26! \approx 4 \times 10^{26}$. This is enormous - brute force is completely out of the question.

It is also completely broken by frequency analysis. The letter ‘e’ appears in roughly 13% of English text. Whatever $\sigma(e)$ is, it will appear in roughly 13% of the ciphertext. The most common ciphertext letter is almost certainly $\sigma(e)$; the second most common is almost certainly $\sigma(t)$; and so on. A few common bigrams (‘th’, ‘he’, ‘in’) collapse any remaining ambiguity. The whole key is recoverable from a few hundred characters of ciphertext.

The lesson is sharp: a large key space is necessary for security but nowhere near sufficient. Structure in the plaintext - the non-uniform distribution of letters and letter combinations - leaks through a substitution cipher no matter how many keys exist. The key space measures how many ciphers there are; it says nothing about whether those ciphers are distinguishable from each other.

Vigenère cipher. The key is a word of length $t$, interpreted as a sequence of shifts $(k_0, k_1, \ldots, k_{t-1})$. Position $i$ in the message is shifted by $k_{i \bmod t}$. For example, key “KEY” = $(10, 4, 24)$ shifts positions $0, 3, 6, \ldots$ by 10, positions $1, 4, 7, \ldots$ by 4, and positions $2, 5, 8, \ldots$ by 24.

This improves on monoalphabetic substitution: each plaintext letter can map to multiple ciphertext letters, so raw letter frequencies are obscured. The weakness is periodicity. Positions $i$ and $i + t$ are encrypted with the same shift, so the ciphertext has period $t$.

Kasiski’s attack exploits this directly. Repeated substrings in the ciphertext are likely to arise when the same plaintext substring is encrypted with the same key substring - which happens with period $t$. The distances between repeated substrings are multiples of $t$, so their GCD reveals $t$. Once $t$ is known, the ciphertext decomposes into $t$ independent Caesar ciphers, each breakable by frequency analysis in isolation.

The pattern of failure is the same as before: reuse creates structure. When the same key material is applied to multiple positions, correlations survive encryption.

Shannon’s Perfect Security

After a century of ciphers failing in hindsight, Shannon asked a different question: rather than build a cipher and hope no one finds an attack, can we define what it means to be immune to all possible attacks, and then prove a cipher meets that definition?

The key insight is to think probabilistically. The message $M$ is a random variable with some prior distribution over $\mathcal{M}$ - whatever the attacker believes before seeing the ciphertext. The ciphertext $C$ is a random variable determined by $M$ and the random key $K$. Perfect secrecy says: observing $C$ does not change what the attacker knows about $M$.

Definition (Perfect Secrecy). An encryption scheme $(Gen, Enc, Dec)$ over message space $\mathcal{M}$ is perfectly secret if for every distribution over $\mathcal{M}$, every message $m \in \mathcal{M}$, and every ciphertext $c \in \mathcal{C}$: $$\Pr[M = m \mid C = c] = \Pr[M = m]$$

The posterior equals the prior. The ciphertext provides zero information about the plaintext - not merely a little, but none at all. An adversary who intercepts $c$ knows exactly as much about $m$ as one who never saw the ciphertext.

An equivalent formulation that removes the choice of message distribution: for all $m, m' \in \mathcal{M}$ and all $c \in \mathcal{C}$: $$\Pr[\text{Enc}_k(m) = c] = \Pr[\text{Enc}_k(m') = c]$$

Every message produces every ciphertext with the same probability. No statistical test of any kind, applied to $c$ alone, can favor any message over any other. This holds even against adversaries with unlimited computational power - there is no computation that extracts information which is not there.

The One-Time Pad

The one-time pad is the canonical perfectly secret cipher, invented (in various forms) in the early 20th century and given its security proof by Shannon.

Construction. Let $\mathcal{M} = \mathcal{K} = \mathcal{C} = \{0,1\}^n$.

$\text{Gen}$: sample $k \leftarrow \{0,1\}^n$ uniformly at random.
$\text{Enc}_k(m) = m \oplus k$
$\text{Dec}_k(c) = c \oplus k$

Correctness: $\text{Dec}_k(\text{Enc}_k(m)) = (m \oplus k) \oplus k = m$, since XOR is its own inverse.

Theorem. The OTP is perfectly secret.

Proof. Fix any $m, c \in \{0,1\}^n$. There is exactly one key $k = m \oplus c$ such that $\text{Enc}_k(m) = m \oplus k = c$. Since $k$ is chosen uniformly over $\{0,1\}^n$, this key is selected with probability $2^{-n}$. Therefore: $$\Pr[\text{Enc}_k(m) = c] = 2^{-n}$$ for every $m$ and every $c$. The probability does not depend on $m$, so every message produces every ciphertext with equal probability. By the equivalent definition of perfect secrecy, the OTP is perfectly secret. $\square$

The XOR with a uniform key is the perfect shuffler. For any fixed message $m$, as $k$ ranges over all of $\{0,1\}^n$, the ciphertext $m \oplus k$ also ranges over all of $\{0,1\}^n$ with uniform distribution. Every ciphertext is equally reachable from every message. There is nothing for an adversary to learn.

Perfect Security Equivalents

Shannon proved that for encryption schemes where $|\mathcal{M}| = |\mathcal{C}| = |\mathcal{K}|$, the following conditions are equivalent:

(Shannon’s definition) $\Pr[M = m \mid C = c] = \Pr[M = m]$ for all $m \in \mathcal{M}$, $c \in \mathcal{C}$, and all distributions over $\mathcal{M}$.
(Indistinguishability) For all $m_0, m_1 \in \mathcal{M}$ and all $c \in \mathcal{C}$: $\Pr[\text{Enc}_k(m_0) = c] = \Pr[\text{Enc}_k(m_1) = c]$.
(Key variety) For each pair $(m, c) \in \mathcal{M} \times \mathcal{C}$, the number of keys $k \in \mathcal{K}$ such that $\text{Enc}_k(m) = c$ is exactly $|\mathcal{K}| / |\mathcal{C}|$.

Each formulation captures a different intuition. Formulation (1) is the Bayesian view: no posterior update. Formulation (2) is the experimental view: no test distinguishes which message was sent. Formulation (3) is the combinatorial view: the encryption function is balanced - every message-ciphertext pair is realized by the same number of keys. These are the same statement at different levels of abstraction.

The Limits of Perfect Security

Shannon’s lower bound. If a scheme over $\mathcal{M}$ is perfectly secret, then $|\mathcal{K}| \geq |\mathcal{M}|$. The key space must be at least as large as the message space - which means the key must be at least as long as the message.

Proof sketch. Fix any ciphertext $c \in \mathcal{C}$. The set of messages that $c$ can decrypt to is $\{\text{Dec}_k(c) : k \in \mathcal{K}\}$, which has at most $|\mathcal{K}|$ elements. For perfect secrecy, every message $m \in \mathcal{M}$ must have positive probability given $c$ - otherwise the adversary seeing $c$ would know that $m$ was not sent, violating indistinguishability. So every message must be achievable from $c$, which requires $|\mathcal{K}| \geq |\mathcal{M}|$. $\square$

This is a hard information-theoretic barrier. No clever design avoids it. Any perfectly secret cipher communicating $n$-bit messages requires $n$-bit keys.

The key reuse attack. The OTP requires a fresh uniformly random key for every message. If the same key is used twice: $$c_1 \oplus c_2 = (m_1 \oplus k) \oplus (m_2 \oplus k) = m_1 \oplus m_2$$

The attacker who intercepts both ciphertexts immediately obtains $m_1 \oplus m_2$. For English text, this is devastating: the XOR of two English strings carries enough redundancy that both messages can be recovered by a short crib-dragging attack. The Venona project broke Soviet OTP-encrypted traffic from World War II precisely this way - Soviet operators reused key material under pressure, and American cryptanalysts recovered thousands of messages decades later.

The full picture. Perfect secrecy requires keys as long as messages (impractical for large data), true randomness (genuinely hard to generate and securely store), and single use (every message needs a new key). Perhaps most crucially: the key must be distributed to both parties before communication begins, through some secure channel. But establishing a shared secret is the very problem cryptography is supposed to solve. Perfect secrecy defers the problem rather than solving it.

Symmetric vs. Public Key: The Preview

Every cipher discussed here is a symmetric key scheme: both parties share a secret key established through some out-of-band channel. The key distribution problem is assumed away. In practice this assumption is often untenable - two parties who have never met cannot share a secret key without some prior infrastructure.

Public key cryptography, developed by Diffie, Hellman, Rivest, Shamir, and Adleman in the 1970s, eliminates this assumption. Encryption and decryption use different keys: a public key that anyone can use to encrypt, and a private key that only the recipient holds for decryption. This seems paradoxical - if encrypting is public, how can decrypting remain private? - but it is achievable based on the assumed computational hardness of certain mathematical problems.

The tradeoff is fundamental. Perfect secrecy is information-theoretic: it holds against adversaries with unlimited power, with no assumptions. Computational security is weaker: it holds only against adversaries bounded by realistic running times, and only if certain computational problems are hard. But it escapes the key-length barrier and the key-distribution problem. Making this tradeoff precise - what does “computationally secure” actually mean? - is the next step.

Summary

Cipher	Key space	How it breaks
Caesar / Shift	$\leq 26$	Brute force
Monoalphabetic substitution	$26!$	Frequency analysis
Vigenère	$26^t$	Kasiski attack (periodicity)
OTP	$2^n$	Secure - but key length = message length, single use

Concept	Statement
Perfect secrecy	Ciphertext is independent of plaintext
OTP security	Unique key per ciphertext; one per message suffices
Shannon’s lower bound	$
Key reuse attack	$c_1 \oplus c_2 = m_1 \oplus m_2$

Read next: