Digital Signatures & ZKP - Proving You Know Without Revealing What // Megha Bose

A MAC authenticates a message, but only to someone who shares the key. If Alice sends Bob a MAC-tagged message, Bob knows the message is from someone who knows the key - but he cannot prove this to anyone else, because he could have computed the tag himself. Digital signatures solve this: a signature is verifiable by anyone who knows the public key, cannot be repudiated by the signer, and cannot be forged by anyone who lacks the secret key. Beyond signatures, cryptography has developed increasingly powerful primitives that let parties prove things about secret values without revealing the secrets themselves: zero-knowledge proofs, oblivious transfer, and secure multi-party computation. This post covers the landscape from signatures to these advanced protocols.

Helpful context:

Digital Signatures

Syntax. A signature scheme is a triple $(\text{Gen}, \text{Sign}, \text{Vrfy})$:

$\text{Gen}(1^n) \to (pk, sk)$: key generation.
$\text{Sign}_{sk}(m) \to \sigma$: signing.
$\text{Vrfy}_{pk}(m, \sigma) \to \{0,1\}$: verification.
Correctness: $\text{Vrfy}{pk}(m, \text{Sign}{sk}(m)) = 1$ always.

Security (EUF-CMA). Adversary gets $pk$ and a signing oracle $\text{Sign}{sk}(\cdot)$. It wins by producing $(m^, \sigma^)$ where $m^*$ was not signed, and $\text{Vrfy}{pk}(m^, \sigma^) = 1$. Security requires negligible advantage for all PPT adversaries.

Key difference from MAC: verification uses $pk$, which is public. Anyone can verify; only the holder of $sk$ can sign.

Textbook RSA Signatures and Its Attacks

Textbook RSA signature: $\text{Sign}{sk}(m) = m^d \bmod N$, $\text{Vrfy}{pk}(m, \sigma) = [\sigma^e \equiv m \pmod{N}]$.

Attacks:

No-message forgery. Choose any $\sigma$, compute $m = \sigma^e \bmod N$. This is a valid $(m, \sigma)$ pair - the adversary forged a signature on $m$ without any signing queries.
Message multiplication. Given $\sigma_1 = \text{Sign}(m_1)$ and $\sigma_2 = \text{Sign}(m_2)$: $\sigma_1 \sigma_2 \bmod N$ is a valid signature on $m_1 m_2 \bmod N$. Multiplicative malleability creates signatures on never-signed messages.

Both attacks are elementary and require no computation beyond modular arithmetic. Textbook RSA signatures are completely insecure.

Hash-and-Sign

The fix: hash the message before signing. $$\text{Sign}{sk}(m) = H(m)^d \bmod N, \quad \text{Vrfy}{pk}(m, \sigma) = [\sigma^e \equiv H(m) \pmod{N}]$$

If $H$ is a collision-resistant hash function:

The no-message attack produces $m = H^{-1}(\sigma^e)$, which requires inverting $H$ - hard.
Multiplication of signatures on $H(m_1)$ and $H(m_2)$ gives a signature on $H(m_1) \cdot H(m_2) \bmod N$, which is not $H(m)$ for any obvious $m$ - collision resistance makes finding such $m$ hard.

More carefully: in the random oracle model, RSA-PKCS1-v1.5 signatures and RSA-PSS are proved EUF-CMA secure. RSA-PSS (Probabilistic Signature Scheme) is the standardized scheme.

Generic hash-and-sign. If $\Pi$ is a signature scheme secure for messages in $\{0,1\}^n$ and $H: \{0,1\}^* \to \{0,1\}^n$ is collision-resistant, then sign $H(m)$ instead of $m$. Security reduction: a forger for the composed scheme implies either a forger for $\Pi$ on $n$-bit messages (with the same signature query translated through $H$) or a collision in $H$.

Impossibility Sources

Cannot sign without a key. Any secure signature scheme requires a secret key. If the signer and verifier share no secret information, an adversary with access to the verification algorithm can simply simulate it to find a valid forgery.

No deterministic EUF-CMA-secure signature scheme. Actually this is FALSE - RSA-based deterministic schemes can be secure. But a key limitation: the signer must know the secret key even to produce consistent signatures; without the key, no forger can produce valid signatures, by definition.

Signature schemes cannot provide perfect security. Because $pk$ is public and the verification algorithm is public, an unbounded adversary can always find $sk$ by exhaustive search over all key pairs. Signatures are computationally secure only.

Oblivious Transfer

Protocol (1-out-of-2 OT). Sender has two messages $m_0, m_1$. Receiver has a choice bit $b$. At the end:

Receiver learns $m_b$ and nothing about $m_{1-b}$.
Sender learns nothing about $b$.

Why this is important. OT is a foundational primitive: it is complete for secure computation (any two-party function can be computed using OT as the only primitive). It is also impossible to achieve without computational assumptions - information-theoretically, the sender can always learn something about $b$.

ElGamal-based OT. Receiver generates two public keys $pk_0, pk_1$, one of which is a “valid” key (receiver knows $sk_b$) and the other is an “indistinguishable garbage” key (receiver doesn’t know its secret key). Receiver sends both. Sender encrypts $m_i$ under $pk_i$. Receiver can decrypt only $m_b$.

Problem. Store a secret $s$ so that any $t$ out of $n$ parties can reconstruct it, but any $t-1$ parties learn nothing.

Shamir’s $(t,n)$-secret sharing (1979). Interpret $s$ as an element of $\mathbb{F}q$ (prime field). Choose a random polynomial $f(x)$ of degree $t-1$ with $f(0) = s$: $$f(x) = s + a_1 x + a_2 x^2 + \cdots + a{t-1} x^{t-1}, \quad a_i \leftarrow \mathbb{F}_q$$ Give party $i$ the share $f(i)$. Any $t$ parties can reconstruct $f$ by polynomial interpolation (Lagrange interpolation on $t$ points of a degree-$t-1$ polynomial). Any $t-1$ parties see a polynomial with $t-1$ constraints and one free coefficient - consistent with any value of $f(0)$.

Perfect secrecy of secret sharing. Shamir’s scheme is information-theoretically secure: the shares of any $t-1$ parties are identically distributed regardless of $s$. No computational assumption is needed.

General access structures. A $(t,n)$-threshold is a special case. A general access structure defines a collection $\mathcal{A}$ of “authorized sets” (subsets of parties that can reconstruct) and “forbidden sets” (subsets that must learn nothing). A secret sharing scheme for $\mathcal{A}$ ensures: any authorized set can reconstruct $s$; any forbidden set learns nothing. Monotone span programs characterize which access structures have efficient secret sharing schemes.

Zero-Knowledge Proofs

Goal. Prover $P$ wants to convince verifier $V$ that it knows a secret $x$ satisfying some relation $R(x, y)$ (e.g., $y = g^x$ in a group, i.e., $P$ knows the discrete log of $y$) - without revealing $x$.

Three properties:

Completeness. If $P$ knows $x$, $V$ accepts (with high probability).
Soundness. If $P$ does not know $x$, $V$ accepts with negligible probability.
Zero-knowledge. There exists a simulator $S$ that, without knowing $x$, produces a transcript indistinguishable from a real interaction between $P$ and $V$.

The zero-knowledge property says: the verifier learns nothing from the interaction beyond the fact that the statement is true. The simulator can produce fake transcripts that look real, proving that the interaction cannot have transmitted $x$ (or anything else about it).

Schnorr protocol (ZKP for discrete log). To prove knowledge of $x$ with $y = g^x$:

$P$ samples $r \leftarrow \mathbb{Z}_q$, sends commitment $c = g^r$.
$V$ sends challenge $e \leftarrow \mathbb{Z}_q$.
$P$ sends response $s = r + ex \bmod q$.
$V$ accepts if $g^s = c \cdot y^e$.

Completeness: $g^s = g^{r+ex} = g^r \cdot (g^x)^e = c \cdot y^e$. Soundness: if two valid responses $s, s'$ to the same commitment $c$ but different challenges $e \neq e'$ are given, then $x = (s - s')/(e - e')$ - extractable. Zero-knowledge: simulator chooses random $e, s$, computes $c = g^s \cdot y^{-e}$; produces $(c, e, s)$ - a valid-looking transcript without knowing $x$.

Noise-Based Encryption (LWE)

Learning With Errors (LWE). Given many samples $(a_i, b_i)$ where $a_i \leftarrow \mathbb{Z}_q^n$ and $b_i = \langle a_i, s \rangle + e_i \bmod q$ (with small noise $e_i$), find $s$.

LWE is believed hard even for quantum computers. This is the basis of post-quantum cryptography. Regev (2005) showed that LWE is as hard as worst-case lattice problems (e.g., GapSVP). LWE-based encryption (Regev’s scheme) and key exchange (Kyber, the NIST-selected post-quantum standard) are built on this hardness.

Multi-Party Computation

Goal. $n$ parties each have a private input $x_i$. They want to jointly compute $f(x_1, \ldots, x_n)$ without any party revealing its input to any other party.

TTP (Trusted Third Party). The trivial solution: everyone sends their input to a trusted third party; TTP computes $f$ and sends the result. Secure, but requires a trustworthy party that doesn’t exist in practice.

Private MPC. Without a TTP, can the parties compute $f$ themselves? Yes, in general:

Yao’s garbled circuits (1986). For two parties: the function $f$ is represented as a boolean circuit. One party “garbles” the circuit (encrypts gate truth tables with double keys) and sends it to the other. The other party obliviously evaluates the circuit using OT to receive the right keys for its inputs. At the end, only the output is learned. This achieves secure computation for any two-party function.

GMW protocol. Generalizes to $n$ parties using secret sharing and OT. Each wire value is additively shared among all parties; gate computation uses local XOR (free) and OT for AND gates.

Security: semi-honest (parties follow the protocol honestly but try to extract information) vs. malicious (parties can deviate arbitrarily). Malicious security requires additional techniques (commitments, ZKPs of correct execution).

Complexity: P, NP, BPP, and ZKP

The complexity classes relevant to cryptography:

P: problems solvable in deterministic polynomial time.
NP: problems where a polynomial-length witness can be verified in polynomial time. Most hard problems in cryptography (factoring, DLP) are in NP but not known to be in P.
BPP (Bounded-error Probabilistic Polynomial time): problems solvable by a randomized polynomial-time algorithm with error probability $\leq 1/3$. Primality testing (before AKS, with Miller-Rabin) is in BPP; it is now known to be in P.
BPNP = MA: problems where a probabilistic polynomial-time verifier can check a witness. Merlin-Arthur games.
IP (Interactive Proofs): problems where an interactive protocol between a PPT verifier and an unbounded prover convinces the verifier. $\text{IP} = \text{PSPACE}$.
ZKP (as complexity class): the class of languages with zero-knowledge proofs. Under certain assumptions, every NP language has a ZKP (via Blum’s three-coloring ZKP: any NP-complete problem’s ZKP gives ZKPs for all of NP). This means: if you can prove you know a satisfying assignment for any boolean formula, you can prove knowledge of any NP witness in zero knowledge.

The hierarchy: $\text{P} \subseteq \text{BPP} \subseteq \text{NP} \subseteq \text{IP} = \text{PSPACE}$. The class ZKP (languages with ZKPs) contains NP under computational assumptions (OWF).

Summary

Concept	Key idea
Digital signature (EUF-CMA)	Only $sk$-holder can sign; anyone with $pk$ can verify
Textbook RSA signature	Broken: no-message forgery, multiplicative malleability
Hash-and-sign	Sign $H(m)$; collision resistance prevents forgery on related messages
Oblivious transfer	Receiver gets $m_b$ without revealing $b$; complete for MPC
Shamir secret sharing	Degree-$t-1$ polynomial; any $t$ shares reconstruct; any $t-1$ learn nothing
Zero-knowledge proof	Completeness + soundness + ZK (simulator without witness produces valid transcript)
Schnorr protocol	ZKP for discrete log; commitment-challenge-response
LWE / post-quantum	Hard even for quantum computers; basis of NIST PQC standards
MPC	Compute $f(x_1,\ldots,x_n)$ without revealing inputs; Yao’s garbled circuits
ZKP as complexity class	All NP has ZKP under OWF assumption; IP = PSPACE

Read next: