SAT - The Problem at the Heart of NP // Megha Bose

Helpful context:

In 2016, mathematicians Marijn Heule, Oliver Kullmann, and Victor Marek solved the Boolean Pythagorean triples problem using a SAT solver. The problem: can you color every positive integer either red or blue such that no Pythagorean triple $(a, b, c)$ with $a^2 + b^2 = c^2$ is monochromatic (all the same color)? They answered no, and their proof required checking a Boolean formula encoded across 200 terabytes of clauses. The proof file is longer than any human could read in a lifetime.

SAT solvers - programs for deciding Boolean satisfiability - are NP-complete in theory. Yet they routinely solve instances with millions of variables in seconds. The gap between “NP-complete in the worst case” and “fast in practice” is enormous for SAT, and understanding that gap is one of the most important stories in modern computer science.

Boolean Satisfiability

A Boolean formula in Conjunctive Normal Form (CNF) is an AND of clauses, where each clause is an OR of literals. A literal is a variable $x_i$ or its negation $\neg x_i$.

$$\varphi = (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee x_2) \wedge (x_2 \vee \neg x_3)$$

SAT asks: is there an assignment of true/false to the variables making $\varphi$ true? For this formula, try $x_1 = 1, x_2 = 1, x_3 = 0$: first clause - $1 \vee 0 \vee 0 = 1$, second clause - $0 \vee 1 = 1$, third clause - $1 \vee 1 = 1$. Yes, it’s satisfiable.

Each clause rules out exactly one assignment (the one that falsifies all its literals). Satisfying the formula means finding an assignment that survives every constraint simultaneously.

SAT is NP-complete (Cook-Levin theorem). The general question - given a CNF formula, does a satisfying assignment exist? - is at least as hard as every problem in NP. But the theory gets richer when you restrict the clause size.

2-SAT Is in P

If every clause has exactly 2 literals, something remarkable happens: the problem becomes solvable in linear time.

The key insight: every 2-clause $(a \vee b)$ is equivalent to two implications: $\neg a \Rightarrow b$ and $\neg b \Rightarrow a$. Build an implication graph with a node for every literal (each variable $x_i$ contributes both $x_i$ and $\neg x_i$) and a directed edge for each implication.

Now run Strongly Connected Components (SCC) on the implication graph. Two nodes in the same SCC can each be derived from the other - so if $x_i$ and $\neg x_i$ are in the same SCC, we’d need $x_i$ to imply $\neg x_i$ and $\neg x_i$ to imply $x_i$ simultaneously. That’s a contradiction: the formula is unsatisfiable.

If no variable and its negation share an SCC, the formula is satisfiable. The assignment is: for each variable, assign the value corresponding to whichever of $x_i$ or $\neg x_i$ appears in the SCC with the higher topological order (appears later in the topological sort of the SCC DAG).

This whole procedure runs in $O(n + m)$ time, where $n$ is the number of variables and $m$ is the number of clauses.

The jump from 2 to 3 literals per clause is one of the sharpest dichotomies in all of complexity theory: 2-SAT is in P, 3-SAT is NP-complete.

The DPLL Algorithm

For 3-SAT and beyond, we can’t guarantee polynomial time. The first systematic algorithm was DPLL (Davis, Putnam, Logemann, Loveland, 1962), which remains the conceptual backbone of modern SAT solving.

DPLL is backtracking search, enhanced with two propagation rules:

Unit propagation. If a clause has exactly one unset literal, that literal must be set to true. (Setting it false would falsify the clause.) This forced assignment can trigger more forced assignments - a cascade of implications that rapidly restricts the search space.

Pure literal elimination. If a literal $x_i$ appears only positively in all remaining clauses (never as $\neg x_i$), set $x_i = 1$. Setting it true satisfies all clauses containing it and harms nothing. Similarly for literals appearing only negatively.

The algorithm: pick an unset variable, branch on its value (try true, then false), propagate forced assignments using the two rules above, detect conflicts (a clause becomes empty - all its literals are falsified), and backtrack.

This is still exponential in the worst case - random 3-SAT near the phase transition (a specific clause-to-variable ratio around 4.27) can require exponential search even for the best solvers. But for structured instances, it’s often fast.

CDCL: The Modern Revolution

Conflict-Driven Clause Learning (CDCL) is the innovation that turned SAT from a research curiosity into an industrial tool.

When DPLL hits a conflict, it backtracks to the most recent branch point and tries the other value. CDCL does something smarter: it analyzes the conflict to learn why it happened, then adds a new clause to the formula that prevents the same conflict from arising again.

Here’s the process:

Step 1: Build the implication graph. As forced assignments cascade from variable decisions, track what caused each assignment: “variable $x_3$ was forced to true because clause $(x_3 \vee \neg x_1 \vee \neg x_7)$ became unit after $x_1 = 1$ and $x_7 = 1$ were set.” This creates a directed graph of implication relationships.

Step 2: Identify the Unique Implication Point (UIP). When a conflict is reached (a clause is entirely falsified), analyze the implication graph. Find the “last” decision variable that dominates all paths to the conflict - this is the UIP.

Step 3: Learn a clause. Cut the implication graph at the UIP to extract a new clause. This clause is logically implied by the original formula (it’s not adding new information), but it directly encodes the reason for the conflict in a compact form. The solver learns it and adds it to its clause database.

Step 4: Non-chronological backjumping. Instead of backtracking just one level, jump back to the second-highest decision level among the learned clause’s literals. This is often many levels back - you might skip over thousands of useless search branches in one jump.

Discomfort check. CDCL keeps adding new clauses as it searches. Doesn’t the formula grow and become harder to solve? The learned clauses are redundant - they are logically implied by the original formula, so they don’t change what assignments are satisfying. But they actively prune the search space: every future search branch that would have led to the same conflict is immediately cut off by unit propagation on the learned clause. The formula grows, but the search tree shrinks, often dramatically. In practice, solvers prune their clause databases periodically (keeping only the most “useful” learned clauses), managing memory while preserving the most powerful constraints.

Learned clauses accumulate into a rich set of constraints that encode the solver’s knowledge about which partial assignments lead to dead ends. This is essentially the solver building a compressed record of everything it has already tried.

Why SAT Solvers Are Fast in Practice

The theory says 3-SAT is NP-complete - no polynomial-time algorithm exists (assuming P $\neq$ NP). How are SAT solvers solving industrial instances with millions of variables in minutes?

The key is that worst-case hardness is rare in practice. Industrial SAT instances - from hardware verification, software analysis, planning - have structure. They tend to have:

Small effective treewidth: the clauses form a graph with a hierarchical structure that limits how much of the formula interacts with itself
Relatively few clauses per variable: the “underdetermined” regime where many satisfying assignments exist
Local structure: variables relevant to each other tend to appear in nearby clauses

The absolute hardest instances are random 3-SAT formulas at the phase transition - a specific ratio of clauses to variables (around 4.27 for 3-SAT) where the probability of satisfiability drops sharply from nearly 1 to nearly 0. These formulae have no structure to exploit, and CDCL provides little advantage over exhaustive search.

Industrial instances sit far from the phase transition. Solvers exploit their structure through clause learning, smart variable selection heuristics (VSIDS: prioritize variables that appeared in recent conflicts), and periodic restarts (clearing the search state but keeping learned clauses, escaping unproductive search regions).

Applications

Hardware verification. Intel, AMD, and every major chip manufacturer use SAT solvers to verify that a circuit satisfies its specification. The circuit’s logic is encoded as a CNF formula: gates become clauses, and checking whether any input produces an incorrect output is a satisfiability query. A satisfying assignment is a counterexample (a bug); unsatisfiability means the circuit is correct. The 2016 Pythagorean triples proof was a mathematical curiosity; hardware verification is how SAT saves billions of dollars in chip respins.

Software verification. Tools like Microsoft’s Z3 (an SMT solver - SAT extended with arithmetic and data structures) check software for buffer overflows, assertion violations, and security vulnerabilities. Developers write assertions; the tool checks whether any input can violate them.

AI planning. Finding a sequence of actions that achieves a goal - robot planning, logistics optimization - encodes naturally as SAT: variables represent which actions happen at which times, clauses encode preconditions and effects.

Cryptanalysis. Certain attacks on hash functions and symmetric ciphers reduce to SAT instances. The SAT formulation captures the cipher’s structure compactly, and CDCL can sometimes find collisions faster than brute force for weakened versions of real ciphers.

Summary

Variant	Complexity	Algorithm	Key idea
2-SAT	P ($O(n + m)$)	SCC on implication graph	$(a \vee b) \Leftrightarrow (\neg a \Rightarrow b)$
3-SAT	NP-complete	DPLL / CDCL	Backtracking with clause learning
MAX-SAT	NP-hard to optimize	Approximation (7/8 for random)	Randomized rounding
DPLL	Exponential worst case	Unit propagation + backtracking	Forced assignments prune search
CDCL	Exponential worst case	Conflict analysis + learned clauses	Non-chronological backjumping

Read next:

Space Complexity - When Memory, Not Time, Is the Bottleneck