P, NP, and the Hardest Problems - The Question That Keeps Theorists Up at Night // Megha Bose

Helpful context:

P vs NP is one of the seven Millennium Prize Problems - a list assembled by the Clay Mathematics Institute in the year 2000, each carrying a one-million-dollar prize for a solution. Of the seven, six remain unsolved.

Personally, I suspect anyone capable of actually solving this wouldn’t be doing it for the million dollars - the kind of conviction it takes to crack a problem this deep tends to be its own reward.

Almost every cryptographer believes P $\neq$ NP - that encrypted data is genuinely safe from fast decryption, not just safe from decryption methods we happen to know today. Almost every algorithm researcher believes P $\neq$ NP - that the hard problems on their desks won’t suddenly become easy. Decades of intense effort by the sharpest minds in mathematics and computer science have produced no proof either way.

It is the most important open problem in mathematics and computer science. Here is why.

Decidability Was Only the Beginning

The previous post asked: which problems have any algorithmic solution at all? Many don’t - the halting problem, Rice’s theorem, and most problems in the uncountable universe of all possible questions.

Complexity theory asks a sharper question: among problems that are decidable, which ones are efficiently solvable? “Efficiently” means something precise: polynomial time.

We say a TM runs in time $f(n)$ if on every input of length $n$, it halts within $f(n)$ steps. This gives a measurement that’s robust across different reasonable models of computation - multi-tape TMs, random access machines, and real computers are all polynomially equivalent (simulating each other with at most polynomial overhead).

P: Problems You Can Actually Solve

P is the class of problems solvable in polynomial time - $O(n^k)$ steps for some fixed constant $k$, where $n$ is the input size.

$$\mathbf{P} = \bigcup_{k \geq 0} \text{TIME}(n^k)$$

Why polynomial? Two reasons. First, polynomial time is closed under composition: if you run a polynomial algorithm as a subroutine a polynomial number of times, you still get polynomial time. Second, polynomial time is model-independent: the same class is defined by TMs, RAM machines, and real hardware. Exponential time, by contrast, is extremely sensitive to constant factors - whether it’s $2^n$ or $2^{100n}$ matters enormously in practice.

Examples of problems in P:

Sorting: $O(n \log n)$
Shortest paths in a graph: $O(m \log n)$ with Dijkstra
Primality testing: $O(\log^6 n)$ with the AKS algorithm (proved in 2002 - determining whether a number is prime took decades to put in P)
Linear programming: polynomial via the ellipsoid method

These are the “tractable” problems. Given enough data, these algorithms finish in reasonable time.

NP: Problems Where You Can Check an Answer

NP stands for Nondeterministic Polynomial - not “Not Polynomial,” as many people assume. It is the class of problems where, if someone hands you a proposed solution, you can verify it in polynomial time.

The intuition: checking someone else’s work is far easier than doing the work yourself. A concrete example makes this vivid.

The Traveling Salesman Problem (TSP): Given a set of cities and the road distance between every pair, find the shortest route that visits every city exactly once and returns to the starting city. With 10 cities there are already $10!/2 \approx 1.8$ million possible routes to check. With 30 cities, more routes than grains of sand on Earth. With 100 cities, more than atoms in the observable universe. No computer can enumerate them all. Yet if someone hands you a specific tour and claims it covers fewer than $B$ miles total, you can verify this in seconds - just add up the distances. Verification takes $O(n)$ time. Finding the optimum seems to take exponential time. This gap - easy to check, hard to find - is exactly what NP captures.

The “Nondeterministic” in the name formalizes the same idea: a nondeterministic Turing machine can “guess” the right answer instantly and then verify it in polynomial time. A real machine has no magic guessing - it must actually search.

Formally: $L \in \mathbf{NP}$ if there exists a polynomial-time verifier $V$ such that

$$w \in L \iff \text{there exists a certificate } c \text{ with } |c| \leq \text{poly}(|w|) \text{ and } V(w, c) \text{ accepts}$$

The certificate is the “proof” or “witness” - the proposed solution that the verifier checks.

Examples:

3-SAT: can you satisfy a Boolean formula where each clause has 3 literals? Certificate: an assignment of true/false to variables. Verification: plug it in and check, $O(n)$.
Graph coloring: can you color a graph with 3 colors so no adjacent vertices share a color? Certificate: the coloring. Verification: check all edges, $O(m)$.
Hamiltonian path: does a graph have a path visiting every vertex exactly once? Certificate: the path. Verification: check adjacency and that each vertex appears once, $O(n)$.
Clique: does a graph have a group of $k$ vertices all pairwise connected? Certificate: the $k$ vertices. Verification: check all $\binom{k}{2}$ pairs, $O(n^2)$.

For every one of these, finding the solution seems to require trying exponentially many possibilities. But checking a proposed solution is fast.

The equivalent formulation using nondeterminism: NP is the class of languages decidable in polynomial time on a nondeterministic TM - a machine that can “guess” the right answer at each step and then verify it. The two definitions are equivalent.

The P vs NP Question

We know $\mathbf{P} \subseteq \mathbf{NP}$: if you can solve a problem in polynomial time, you can certainly verify a solution in polynomial time (just re-solve it). The question is whether this containment is strict.

$$\mathbf{P} \stackrel{?}{=} \mathbf{NP}$$

If P = NP: every problem where a solution can be checked efficiently can also be found efficiently. The consequences would be staggering. Current public-key cryptography (RSA, elliptic curves) relies on problems believed to be in NP but not in P - if P = NP, these systems are broken. Mathematical proofs could be found automatically: any theorem whose proof can be verified in polynomial time could be discovered in polynomial time. Protein folding, drug design, scheduling, packing - all would become tractable overnight. The structure of the intellectual world would change.

If P $\neq$ NP: there exist problems that are easy to verify but genuinely hard to solve - no polynomial-time shortcut exists. Encryption is safe. Hard problems stay hard. This is the world virtually every researcher believes we live in.

Discomfort check. Why can’t we just try all possible solutions? If verification is polynomial, we can check all $2^n$ possible Boolean assignments in $2^n \times O(n)$ time - exponential. Is exponential time really so much worse than polynomial? For $n = 1000$, a polynomial algorithm takes roughly $10^9$ steps (a few seconds). An exponential algorithm takes $2^{1000} \approx 10^{301}$ steps - more than the number of atoms in the observable universe, even if each step takes a femtosecond. For $n = 10{,}000$ the gap is $10^{1000}$ times larger. Yes, exponential time is catastrophically worse than polynomial. The P vs NP question is asking whether a fundamentally smarter approach exists - not whether our current brute-force approach is slow.

NP-Hard and NP-Complete

Some problems in NP are the hardest problems in NP: if you could solve any one of them in polynomial time, you could solve every problem in NP in polynomial time.

A problem $X$ is NP-hard if every problem in NP reduces to $X$ in polynomial time. A problem is NP-complete if it is NP-hard and also in NP.

“Reduces to” means there is a polynomial-time translation: given any instance of any NP problem, you can convert it into an instance of $X$ such that $X$’s answer gives you the original answer. If you can solve $X$ efficiently, you can solve everything in NP efficiently. So $X$ is at least as hard as every NP problem at once.

Why “reduces”? The name is counterintuitive - it sounds like we’re making a problem simpler, but that’s not what it means here. “A reduces to B” means: I reduce my workload by offloading A to B. If I had a black box that solved B, I could use it to solve A (by translating instances). So A is “reduced” to B in the sense that A is no longer my problem - B handles it. The direction is: A reduces to B means A is no harder than B. When we prove that X is NP-hard, we show that known hard problems (like 3-SAT) reduce to X - meaning X must be capable of solving them, so X is at least that hard. We’re going from hard to hard. The “easy” problem being reduced is easy only relative to X, because X can simulate it.

NP-complete problems are the hardest problems in NP. If any one NP-complete problem has a polynomial-time algorithm, then P = NP and every NP problem becomes tractable. Conversely, if P $\neq$ NP, no NP-complete problem has a polynomial-time algorithm.

The remarkable fact - proved by Cook and Karp in the early 1970s - is that thousands of problems from scheduling, graph theory, biology, and cryptography are all NP-complete. They form a web of reductions: 3-SAT reduces to Graph Coloring reduces to Clique reduces to TSP. Solve any one efficiently and you solve all of them. NP-complete problems are all secretly the same problem wearing different clothes.

The canonical NP-complete problems include:

3-SAT: Boolean satisfiability with 3-literal clauses
Traveling Salesman Problem (TSP): does a tour of all cities exist within budget $B$?
Graph Coloring: can a graph be colored with $k$ colors?
Knapsack: can you fill a knapsack to a target value without exceeding weight capacity?
Clique, Independent Set, Vertex Cover: various graph selection problems
Scheduling: can jobs be assigned to machines meeting all deadlines?

These problems are all interconnected: a polynomial algorithm for any one of them would immediately give polynomial algorithms for all of them. That’s what “NP-complete” means.

How Do You Prove a Problem Is NP-Complete?

To prove a new problem $X$ is NP-complete, you need two things.

Step 1 - Show $X$ is in NP. Describe a certificate (a proposed solution someone might hand you) and a polynomial-time verifier that checks it. For example, to show TSP is in NP: the certificate is a specific tour, and the verifier adds up the edge weights and checks they sum to at most $B$. That’s $O(n)$ work. Done.

Step 2 - Show $X$ is NP-hard. Pick a problem already known to be NP-complete (usually 3-SAT, or something structurally closer to $X$) and give a polynomial-time reduction from it to $X$. You’re showing: if I could solve $X$, I could also solve 3-SAT. Since 3-SAT is already NP-hard, $X$ must be at least that hard.

The tricky part is always the reduction - you need to design a translation that converts 3-SAT instances into $X$ instances faithfully. The reduction for Graph Coloring, for instance, converts each Boolean variable into a gadget of vertices and each clause into a set of edges that force the gadget to be consistently colored iff the clause is satisfied. These constructions can be clever.

The first NP-complete problem had no existing NP-complete problem to reduce from. Stephen Cook’s 1971 theorem proved SAT is NP-complete by a completely different argument: he showed that the execution of any polynomial-time verifier on any input can be encoded as a Boolean formula, so satisfying the formula is equivalent to the verifier accepting. Every NP problem’s verification trace becomes a SAT instance. That’s what put the first problem on the map. After Cook, everything else only required finding reductions.

NP-Hard Problems Beyond NP

Not all NP-hard problems are NP-complete. NP-complete means NP-hard and in NP. There are NP-hard problems that aren’t in NP at all - they’re not just hard, they’re in a completely different tier of difficulty.

Optimization versions of NP-complete problems. NP is defined for yes/no decision problems. The decision version of TSP - “does a tour of cost $\leq B$ exist?” - is NP-complete. The optimization version - “what is the minimum cost tour?” - is NP-hard, but it’s not a decision problem, so it can’t be “in NP” by definition. Most real-world problems are optimization problems, not decision problems, so they land here: NP-hard but not NP-complete.

Quantified Boolean Formulas (QBF). Take a Boolean formula and allow both $\exists$ (“there exists an assignment…") and $\forall$ (“for all assignments…") quantifiers. Deciding whether such a formula is true is PSPACE-complete - believed to be strictly harder than any NP problem. Yet every NP problem reduces to it, so it’s NP-hard. This is the class of problems like: “does a winning strategy exist for player 1 in this game?” - where the $\forall$ captures the opponent’s best play.

Generalized board games. Determining whether the current player has a winning strategy in a generalized $n \times n$ chess or Go position is EXPTIME-complete - exponentially harder than NP. These are NP-hard problems that would stay hard even if P = NP.

The halting problem. Undecidable - no Turing machine can solve it. Yet every NP problem reduces to it, making it NP-hard. It’s NP-hard in the way that a god is “at least as strong as a human”: technically true, but the comparison undersells the gap. Problems like these are NP-hard because NP is the floor of the comparison, not because they’re merely NP-hard.

The Landscape

P is inside NP - anything solvable quickly can also be verified quickly. NP-complete problems live at the hard edge of NP, where NP and NP-hard overlap. NP-hard extends beyond NP to include problems like the halting problem - undecidable, yet at least as hard as every NP problem.

What NP-Hardness Means in Practice

When you prove your problem is NP-hard, it’s not a defeat - it’s information. You know:

Don’t keep looking for a fast exact algorithm for the worst case. It (probably) doesn’t exist.
Instead: approximation algorithms (solutions guaranteed to be within some factor of optimal), heuristics (fast algorithms that work well on typical instances), special structure (maybe your instances are sparse, or have bounded treewidth, which makes them tractable), or exponential algorithms with good constants (branch and bound, integer programming solvers).

For example, vertex cover is NP-hard, but there’s a simple 2-approximation: repeatedly take both endpoints of any unmatched edge. It runs in polynomial time and always produces a cover at most twice the optimal size.

Summary

Class	Informal definition	Examples
P	Solvable in polynomial time	Sorting, shortest paths, primality
NP	Solution verifiable in polynomial time	3-SAT, TSP, graph coloring, clique
NP-hard	At least as hard as every NP problem	HALT (not in NP), TSP (in NP)
NP-complete	NP-hard and in NP	3-SAT, clique, vertex cover, knapsack

P is contained in NP. NP-complete problems are the hardest in NP. Whether P = NP is the central open question. The known evidence strongly suggests P $\neq$ NP, but no proof exists.

Read next: