Helpful context:


A standard 9×9 Sudoku grid has 81 cells. Each cell can hold a digit from 1 to 9. A brute-force solver that tries every possible assignment would explore $9^{81} \approx 2 \times 10^{77}$ grids. For reference, the observable universe contains about $10^{80}$ atoms. You would run out of universe before you ran out of grids to check.

A backtracking solver with constraint propagation solves any Sudoku in milliseconds.

The difference is not hardware. It’s the recognition that most grids can be ruled out immediately - not after checking them, but before. That’s the central idea behind constraint satisfaction.


The Framework: CSPs

A Constraint Satisfaction Problem (CSP) has three components:

  • Variables $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$: the things we’re trying to assign values to.
  • Domains $\mathcal{D} = \{D_1, D_2, \ldots, D_n\}$: the finite set of values each variable may take.
  • Constraints $\mathcal{C}$: relationships that valid assignments must satisfy.

A solution is a complete assignment - one value for each variable - that satisfies every constraint.

For Sudoku: variables are the 81 cells, domains are $\{1, \ldots, 9\}$, constraints are “all-different” within each row, column, and 3×3 box.

For map coloring: variables are regions, domains are $\{\text{red}, \text{green}, \text{blue}\}$, constraints are “adjacent regions must differ.”

The CSP framework is powerful because the same algorithm - backtracking with constraint propagation - works for all of these problems.


The basic algorithm is depth-first search over the space of partial assignments.

Backtrack(assignment, csp):
    if assignment is complete: return assignment
    X = SelectUnassignedVariable(csp, assignment)
    for v in OrderDomainValues(X, assignment, csp):
        if v is consistent with assignment:
            assignment[X] = v
            result = Backtrack(assignment, csp)
            if result != failure: return result
            delete assignment[X]          // undo - backtrack
    return failure

At each step: pick an unassigned variable, try each value in its domain, recurse. If a contradiction is reached (some constraint is violated), undo the last assignment and try the next value. If all values fail, backtrack further.

Without any pruning, this explores at worst $\prod_i |D_i|$ complete assignments - exponential in the number of variables. The key question is how much we can prune before getting there.


Arc Consistency: Propagating Constraints Forward

The most powerful pruning technique is arc consistency. The idea: don’t just check constraints after assignment - use them to reduce domains before you even branch.

An arc $(X_i, X_j)$ is arc-consistent if for every value $v$ in $D_i$, there exists at least one value $w$ in $D_j$ such that assigning $X_i = v$ and $X_j = w$ satisfies the constraint between them. If a value $v \in D_i$ has no valid “support” in $D_j$, we can delete $v$ from $D_i$ right now - it can never be part of a solution.

The AC-3 algorithm enforces arc consistency by processing a worklist of arcs:

AC3(csp):
    queue = all arcs (Xi, Xj) in csp
    while queue not empty:
        (Xi, Xj) = dequeue
        if Revise(csp, Xi, Xj):
            if D[Xi] is empty: return false    // no solution
            for each Xk in Neighbors(Xi) \ {Xj}:
                enqueue (Xk, Xi)               // Xi's domain shrank - recheck
    return true

Revise(csp, Xi, Xj):
    revised = false
    for each v in D[Xi]:
        if no w in D[Xj] satisfies constraint(Xi=v, Xj=w):
            remove v from D[Xi]
            revised = true
    return revised

Whenever a domain shrinks, the neighbors of that variable might have values that are now unsupported - so we re-add those arcs to the queue.

Time complexity. Let $e$ = number of constraints (arcs), $d = \max_i |D_i|$. Each arc $(X_i, X_j)$ is re-added to the queue at most $d$ times (once per value removed from $D_i$). Each Revise call takes $O(d^2)$. Total: $O(e d^3)$.


Forward Checking: A Lighter Touch

Arc consistency is powerful but expensive to run at every step. A cheaper alternative: forward checking.

After assigning $X_i = v$, scan each unassigned neighbor $X_j$. Remove from $D_j$ all values incompatible with $v$. If any $D_j$ becomes empty, backtrack immediately without recursing further.

Forward checking is strictly weaker than AC-3 - it only checks one step ahead, not through chains of constraints. But it’s much cheaper per step and often sufficient in practice.


Variable Ordering: Choose the Most Constrained Variable First

The order in which you assign variables matters enormously.

Minimum Remaining Values (MRV): Pick the variable with the fewest legal values remaining in its domain. This is also called “fail-first” - you tackle the hardest variable first, which means you discover dead ends earlier and prune larger subtrees.

Formally: $$X^* = \arg\min_{X \text{ unassigned}} |D_X|$$

If a variable has only one legal value left, assign it next - there’s no choice to make. If a variable has zero legal values, backtrack immediately.

Degree heuristic (tie-breaking). When MRV ties, choose the variable involved in the most constraints on other unassigned variables. More constraints means more pruning potential.


N-Queens: A Backtracking Trace

Place $n$ queens on an $n \times n$ chessboard so that no two queens share a row, column, or diagonal.

Variables: $Q_1, Q_2, \ldots, Q_n$ where $Q_i$ is the column of the queen in row $i$. Domain: $\{1, \ldots, n\}$. Constraints: $Q_i \neq Q_j$ and $|Q_i - Q_j| \neq |i - j|$ for all $i \neq j$.

For $n = 4$: start with $Q_1 = 1$ (column 1, row 1). Forward checking removes column 1 and the diagonals from remaining rows. Try $Q_2 = 3$. Forward checking removes columns 1, 3, and diagonals - row 3 now has only column 1 available, which is immediately eliminated. $Q_2 = 3$ fails. Try $Q_2 = 4$. This works. Continue with MRV: row 3 now has only column 2 available. Assign $Q_3 = 2$. Row 4 has only column 4, which conflicts with $Q_2 = 4$. Backtrack. Continue…

For $n = 8$: there are 92 solutions out of $8^8 \approx 16.7$ million naive assignments. Backtracking with forward checking finds all 92 in milliseconds.


Apply the CSP framework to Sudoku.

Variables: 81 cells. Domain of each cell: $\{1, \ldots, 9\}$, minus any digit already given. Constraints: all-different within each row, column, and box.

The solving strategy:

  1. Run AC-3 to propagate all initial clues. Many cells collapse to singletons.
  2. Apply MRV: pick the cell with fewest remaining candidates.
  3. Assign a value, run forward checking, recurse.
  4. If any domain empties, backtrack.

For most newspaper-level Sudokus, step 1 alone solves the puzzle - AC-3 propagates clues until every cell has exactly one candidate. “Expert” puzzles require a few backtracks. The hardest known Sudokus require deeper search, but even those are solved in milliseconds because the search space has been so aggressively pruned.

Discomfort check. Backtracking is just brute force with early stopping - is it actually efficient? For worst-case inputs, no - the time is still exponential. But for well-structured problems like Sudoku, the constraint propagation prunes the search space so aggressively that almost nothing is left to search. The key insight is that “consistency” is not a binary property: even a partial assignment violates most of the space. When you combine MRV (pick the most constrained variable first) with forward checking (prune immediately on conflict), you typically reduce the effective branching factor from 9 to something close to 1 for most cells. The worst case is exponential; the practical case is near-linear.


Graph Coloring and Computational Complexity

Graph coloring is the CSP version of map coloring: given a graph, assign colors to vertices so that adjacent vertices have different colors, using as few colors as possible.

Map coloring is a special case: the “graph” is the adjacency graph of regions. The famous four-color theorem says every planar graph is 4-colorable. But determining the chromatic number of a general graph (minimum colors needed) is NP-complete - there is no known polynomial-time algorithm, and most researchers believe none exists.

This gives us a preview of computational complexity. CSPs range widely in difficulty. 2-coloring (is the graph bipartite?) can be solved in $O(V + E)$ by BFS. 3-coloring is NP-complete. Sudoku (9-coloring a specific constraint structure) is NP-complete in general, but the fixed $9 \times 9$ size makes it tractable in practice.


Summary

Technique What it does Complexity
Basic backtracking Depth-first search over partial assignments $O(d^n)$ worst case
Forward checking Prune domains after each assignment Reduces branches significantly
AC-3 arc consistency Propagate constraints through the whole graph $O(ed^3)$ per invocation
MRV variable ordering Pick most constrained variable first (fail-first) No extra asymptotic cost
Degree heuristic Break MRV ties by constraint count No extra asymptotic cost

The lesson of constraint satisfaction is that constraints are not obstacles - they’re information. Every constraint you propagate forward is a branch of the search tree you never have to explore. The art is figuring out how to extract that information cheaply and early.


Read next: