Digital Signatures & ZKP - Proving You Know Without Revealing What
A MAC authenticates a message, but only to someone who shares the key. If Alice sends Bob a MAC-tagged message, Bob knows the message is from someone who knows the key - but he cannot prove this to anyone else, because he could have computed the tag himself. Digital signatures solve this: a signature is verifiable by anyone who knows the public key, cannot be repudiated by the signer, and cannot be forged by anyone who lacks the secret key. Beyond signatures, cryptography has developed increasingly powerful primitives that let parties prove things about secret values without revealing the secrets themselves: zero-knowledge proofs, oblivious transfer, and secure multi-party computation. This post covers the landscape from signatures to these advanced protocols.
Helpful context:
- Public Key Cryptography - Secrets Without Meeting in Advance
- Message Authentication & Hashing - Proof That No One Tampered With This
Digital Signatures
Syntax. A signature scheme is a triple $(\text{Gen}, \text{Sign}, \text{Vrfy})$:
- $\text{Gen}(1^n) \to (pk, sk)$: key generation.
- $\text{Sign}_{sk}(m) \to \sigma$: signing.
- $\text{Vrfy}_{pk}(m, \sigma) \to \{0,1\}$: verification.
- Correctness: $\text{Vrfy}{pk}(m, \text{Sign}{sk}(m)) = 1$ always.
Security (EUF-CMA). Adversary gets $pk$ and a signing oracle $\text{Sign}{sk}(\cdot)$. It wins by producing $(m^, \sigma^)$ where $m^*$ was not signed, and $\text{Vrfy}{pk}(m^, \sigma^) = 1$. Security requires negligible advantage for all PPT adversaries.
Key difference from MAC: verification uses $pk$, which is public. Anyone can verify; only the holder of $sk$ can sign.
Textbook RSA Signatures and Its Attacks
Textbook RSA signature: $\text{Sign}{sk}(m) = m^d \bmod N$, $\text{Vrfy}{pk}(m, \sigma) = [\sigma^e \equiv m \pmod{N}]$.
Attacks:
- No-message forgery. Choose any $\sigma$, compute $m = \sigma^e \bmod N$. This is a valid $(m, \sigma)$ pair - the adversary forged a signature on $m$ without any signing queries.
- Message multiplication. Given $\sigma_1 = \text{Sign}(m_1)$ and $\sigma_2 = \text{Sign}(m_2)$: $\sigma_1 \sigma_2 \bmod N$ is a valid signature on $m_1 m_2 \bmod N$. Multiplicative malleability creates signatures on never-signed messages.
Both attacks are elementary and require no computation beyond modular arithmetic. Textbook RSA signatures are completely insecure.
Hash-and-Sign
The fix: hash the message before signing. $$\text{Sign}{sk}(m) = H(m)^d \bmod N, \quad \text{Vrfy}{pk}(m, \sigma) = [\sigma^e \equiv H(m) \pmod{N}]$$
If $H$ is a collision-resistant hash function:
- The no-message attack produces $m = H^{-1}(\sigma^e)$, which requires inverting $H$ - hard.
- Multiplication of signatures on $H(m_1)$ and $H(m_2)$ gives a signature on $H(m_1) \cdot H(m_2) \bmod N$, which is not $H(m)$ for any obvious $m$ - collision resistance makes finding such $m$ hard.
More carefully: in the random oracle model, RSA-PKCS1-v1.5 signatures and RSA-PSS are proved EUF-CMA secure. RSA-PSS (Probabilistic Signature Scheme) is the standardized scheme.
Generic hash-and-sign. If $\Pi$ is a signature scheme secure for messages in $\{0,1\}^n$ and $H: \{0,1\}^* \to \{0,1\}^n$ is collision-resistant, then sign $H(m)$ instead of $m$. Security reduction: a forger for the composed scheme implies either a forger for $\Pi$ on $n$-bit messages (with the same signature query translated through $H$) or a collision in $H$.
Impossibility Sources
Cannot sign without a key. Any secure signature scheme requires a secret key. If the signer and verifier share no secret information, an adversary with access to the verification algorithm can simply simulate it to find a valid forgery.
No deterministic EUF-CMA-secure signature scheme. Actually this is FALSE - RSA-based deterministic schemes can be secure. But a key limitation: the signer must know the secret key even to produce consistent signatures; without the key, no forger can produce valid signatures, by definition.
Signature schemes cannot provide perfect security. Because $pk$ is public and the verification algorithm is public, an unbounded adversary can always find $sk$ by exhaustive search over all key pairs. Signatures are computationally secure only.
Oblivious Transfer
Protocol (1-out-of-2 OT). Sender has two messages $m_0, m_1$. Receiver has a choice bit $b$. At the end:
- Receiver learns $m_b$ and nothing about $m_{1-b}$.
- Sender learns nothing about $b$.
Why this is important. OT is a foundational primitive: it is complete for secure computation (any two-party function can be computed using OT as the only primitive). It is also impossible to achieve without computational assumptions - information-theoretically, the sender can always learn something about $b$.
ElGamal-based OT. Receiver generates two public keys $pk_0, pk_1$, one of which is a “valid” key (receiver knows $sk_b$) and the other is an “indistinguishable garbage” key (receiver doesn’t know its secret key). Receiver sends both. Sender encrypts $m_i$ under $pk_i$. Receiver can decrypt only $m_b$.
Secret Sharing
Problem. Store a secret $s$ so that any $t$ out of $n$ parties can reconstruct it, but any $t-1$ parties learn nothing.
Shamir’s $(t,n)$-secret sharing (1979). Interpret $s$ as an element of $\mathbb{F}q$ (prime field). Choose a random polynomial $f(x)$ of degree $t-1$ with $f(0) = s$: $$f(x) = s + a_1 x + a_2 x^2 + \cdots + a{t-1} x^{t-1}, \quad a_i \leftarrow \mathbb{F}_q$$ Give party $i$ the share $f(i)$. Any $t$ parties can reconstruct $f$ by polynomial interpolation (Lagrange interpolation on $t$ points of a degree-$t-1$ polynomial). Any $t-1$ parties see a polynomial with $t-1$ constraints and one free coefficient - consistent with any value of $f(0)$.
Perfect secrecy of secret sharing. Shamir’s scheme is information-theoretically secure: the shares of any $t-1$ parties are identically distributed regardless of $s$. No computational assumption is needed.
General access structures. A $(t,n)$-threshold is a special case. A general access structure defines a collection $\mathcal{A}$ of “authorized sets” (subsets of parties that can reconstruct) and “forbidden sets” (subsets that must learn nothing). A secret sharing scheme for $\mathcal{A}$ ensures: any authorized set can reconstruct $s$; any forbidden set learns nothing. Monotone span programs characterize which access structures have efficient secret sharing schemes.
Zero-Knowledge Proofs
Goal. Prover $P$ wants to convince verifier $V$ that it knows a secret $x$ satisfying some relation $R(x, y)$ (e.g., $y = g^x$ in a group, i.e., $P$ knows the discrete log of $y$) - without revealing $x$.
Three properties:
- Completeness. If $P$ knows $x$, $V$ accepts (with high probability).
- Soundness. If $P$ does not know $x$, $V$ accepts with negligible probability.
- Zero-knowledge. There exists a simulator $S$ that, without knowing $x$, produces a transcript indistinguishable from a real interaction between $P$ and $V$.
The zero-knowledge property says: the verifier learns nothing from the interaction beyond the fact that the statement is true. The simulator can produce fake transcripts that look real, proving that the interaction cannot have transmitted $x$ (or anything else about it).
Schnorr protocol (ZKP for discrete log). To prove knowledge of $x$ with $y = g^x$:
- $P$ samples $r \leftarrow \mathbb{Z}_q$, sends commitment $c = g^r$.
- $V$ sends challenge $e \leftarrow \mathbb{Z}_q$.
- $P$ sends response $s = r + ex \bmod q$.
- $V$ accepts if $g^s = c \cdot y^e$.
Completeness: $g^s = g^{r+ex} = g^r \cdot (g^x)^e = c \cdot y^e$. Soundness: if two valid responses $s, s'$ to the same commitment $c$ but different challenges $e \neq e'$ are given, then $x = (s - s')/(e - e')$ - extractable. Zero-knowledge: simulator chooses random $e, s$, computes $c = g^s \cdot y^{-e}$; produces $(c, e, s)$ - a valid-looking transcript without knowing $x$.
Noise-Based Encryption (LWE)
Learning With Errors (LWE). Given many samples $(a_i, b_i)$ where $a_i \leftarrow \mathbb{Z}_q^n$ and $b_i = \langle a_i, s \rangle + e_i \bmod q$ (with small noise $e_i$), find $s$.
LWE is believed hard even for quantum computers. This is the basis of post-quantum cryptography. Regev (2005) showed that LWE is as hard as worst-case lattice problems (e.g., GapSVP). LWE-based encryption (Regev’s scheme) and key exchange (Kyber, the NIST-selected post-quantum standard) are built on this hardness.
Multi-Party Computation
Goal. $n$ parties each have a private input $x_i$. They want to jointly compute $f(x_1, \ldots, x_n)$ without any party revealing its input to any other party.
TTP (Trusted Third Party). The trivial solution: everyone sends their input to a trusted third party; TTP computes $f$ and sends the result. Secure, but requires a trustworthy party that doesn’t exist in practice.
Private MPC. Without a TTP, can the parties compute $f$ themselves? Yes, in general:
Yao’s garbled circuits (1986). For two parties: the function $f$ is represented as a boolean circuit. One party “garbles” the circuit (encrypts gate truth tables with double keys) and sends it to the other. The other party obliviously evaluates the circuit using OT to receive the right keys for its inputs. At the end, only the output is learned. This achieves secure computation for any two-party function.
GMW protocol. Generalizes to $n$ parties using secret sharing and OT. Each wire value is additively shared among all parties; gate computation uses local XOR (free) and OT for AND gates.
Security: semi-honest (parties follow the protocol honestly but try to extract information) vs. malicious (parties can deviate arbitrarily). Malicious security requires additional techniques (commitments, ZKPs of correct execution).
Complexity: P, NP, BPP, and ZKP
The complexity classes relevant to cryptography:
- P: problems solvable in deterministic polynomial time.
- NP: problems where a polynomial-length witness can be verified in polynomial time. Most hard problems in cryptography (factoring, DLP) are in NP but not known to be in P.
- BPP (Bounded-error Probabilistic Polynomial time): problems solvable by a randomized polynomial-time algorithm with error probability $\leq 1/3$. Primality testing (before AKS, with Miller-Rabin) is in BPP; it is now known to be in P.
- BPNP = MA: problems where a probabilistic polynomial-time verifier can check a witness. Merlin-Arthur games.
- IP (Interactive Proofs): problems where an interactive protocol between a PPT verifier and an unbounded prover convinces the verifier. $\text{IP} = \text{PSPACE}$.
- ZKP (as complexity class): the class of languages with zero-knowledge proofs. Under certain assumptions, every NP language has a ZKP (via Blum’s three-coloring ZKP: any NP-complete problem’s ZKP gives ZKPs for all of NP). This means: if you can prove you know a satisfying assignment for any boolean formula, you can prove knowledge of any NP witness in zero knowledge.
The hierarchy: $\text{P} \subseteq \text{BPP} \subseteq \text{NP} \subseteq \text{IP} = \text{PSPACE}$. The class ZKP (languages with ZKPs) contains NP under computational assumptions (OWF).
Summary
| Concept | Key idea |
|---|---|
| Digital signature (EUF-CMA) | Only $sk$-holder can sign; anyone with $pk$ can verify |
| Textbook RSA signature | Broken: no-message forgery, multiplicative malleability |
| Hash-and-sign | Sign $H(m)$; collision resistance prevents forgery on related messages |
| Oblivious transfer | Receiver gets $m_b$ without revealing $b$; complete for MPC |
| Shamir secret sharing | Degree-$t-1$ polynomial; any $t$ shares reconstruct; any $t-1$ learn nothing |
| Zero-knowledge proof | Completeness + soundness + ZK (simulator without witness produces valid transcript) |
| Schnorr protocol | ZKP for discrete log; commitment-challenge-response |
| LWE / post-quantum | Hard even for quantum computers; basis of NIST PQC standards |
| MPC | Compute $f(x_1,\ldots,x_n)$ without revealing inputs; Yao’s garbled circuits |
| ZKP as complexity class | All NP has ZKP under OWF assumption; IP = PSPACE |
Read next: