Helpful context:


In 1936, a 24-year-old Alan Turing was trying to solve a foundational problem in mathematics. David Hilbert had asked in 1928: is there a mechanical procedure that could, given any mathematical statement, decide whether it is provable? The Entscheidungsproblem - the decision problem. Could mathematics itself be automated?

To answer the question, Turing first had to define what “mechanical procedure” means. He invented an abstract machine as simple as he could make it: no graphics, no keyboard, no network. Just a tape, a read/write head, and a finite set of rules. He proved this bare machine could compute anything computable. Then he proved it couldn’t decide the Entscheidungsproblem - answering Hilbert’s question with a decisive no.

We are still using Turing’s definition of “computation” today. Every time someone says “this problem is unsolvable” or “this algorithm runs in polynomial time,” they are implicitly talking about Turing machines.


The Machine

Imagine an infinitely long tape divided into cells. Each cell holds a symbol - either a character from some alphabet, or a special blank symbol (written $\sqcup$). A read/write head sits on one cell at a time. A finite control (like a DFA, but now with more authority) is in one of finitely many states.

At each step, the machine:

  1. Reads the symbol under the head
  2. Writes a (possibly different) symbol to that cell
  3. Moves the head one cell left or one cell right
  4. Transitions to a new state

That’s the entire machine. Nothing else. Formally, a Turing machine is a 7-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, q_\text{accept}, q_\text{reject})$ where $\Gamma$ is the tape alphabet (including the blank), and $\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function that reads a state and tape symbol and outputs a new state, a symbol to write, and a direction to move.

The machine starts with its input written on the tape, head at the leftmost character, in state $q_0$. It halts when it reaches $q_\text{accept}$ (accept) or $q_\text{reject}$ (reject). If it never reaches either, it runs forever.

A configuration captures everything about the machine’s current situation: the current state, the entire tape contents, and the head position. You can think of the computation as a sequence of configurations, each yielding the next by one transition step.


Why This Is More Powerful Than a PDA

A pushdown automaton has a stack - but a stack only lets you access the top. The TM has a full read/write tape it can traverse in both directions. That two-way access is what makes the difference.

Consider $\{a^n b^n c^n : n \geq 0\}$ - equal numbers of $a$’s, $b$’s, and $c$’s. We proved in the previous post that no pushdown automaton can recognize this. A Turing machine can. Here’s how:

  1. Scan right across the tape. If you see an $a$, cross it off (write $X$). Continue right, skip any crossed-off symbols, and cross off the first uncrossed $b$. Continue right, skip crossed-off symbols, cross off the first uncrossed $c$.
  2. Return the head to the left end of the tape.
  3. Repeat until all symbols are crossed off, then accept - or reject if the counts don’t match up.

The machine can do this because it can move the head left and right freely, using the tape as working memory. A PDA can match $a$’s against $b$’s with its stack, but it can’t also match against $c$’s - the stack is gone after matching $a$’s with $b$’s.


The Church-Turing Thesis

We now have three models: DFAs (recognizing regular languages), PDAs (context-free languages), and Turing machines. Each is strictly more powerful than the previous. Is there something more powerful than a TM?

The Church-Turing thesis says: any function that can be computed by any reasonable physical process can be computed by a Turing machine.

This is not a theorem - it can’t be, because “reasonable physical process” is not a mathematical term. It’s a thesis about the relationship between the formal model and physical reality. But it has been tested against every model of computation anyone has proposed:

  • Lambda calculus (Church’s formalism, same year as Turing’s paper)
  • Register machines
  • Cellular automata
  • RAM models
  • Quantum computers

Every one of these models computes exactly the same set of functions as Turing machines (with quantum computers computing the same functions, sometimes faster - more on this below). No one has ever proposed an intuitively computable function that a TM can’t compute.

The practical implication: to show a problem is undecidable (has no algorithmic solution), it suffices to show no Turing machine can decide it. You don’t need to analyze every conceivable computing device.

Discomfort check. The Church-Turing thesis says a Turing machine can compute anything computable. But quantum computers can do things classical computers struggle with - Shor’s algorithm factors integers exponentially faster than any known classical algorithm. Doesn’t this refute the thesis?

No. Quantum computers don’t compute new functions - they compute the same functions, just sometimes faster. Shor’s algorithm factors integers, which a classical TM can also factor (just not in polynomial time). The Church-Turing thesis is about what can be computed, not about how fast. The stronger “Extended” Church-Turing thesis - that polynomial-time computation is equivalent across models - is actually threatened by quantum computing, but that’s a different claim.


Variants: More Tapes, More Nondeterminism, Same Power

Once you have one TM, you can ask: what if we make it fancier?

Multi-tape Turing machines have $k$ tapes, each with its own head, operating simultaneously. This is much more convenient for algorithm design - you can use one tape for input, one for scratch work, one for output. But you can simulate any $k$-tape TM with a single-tape TM, with at most an $O(t^2)$ slowdown (where $t$ is the number of steps). So multi-tape TMs recognize the same languages; they just do it faster.

Nondeterministic Turing machines (NTMs) allow the transition function to produce a set of possible next configurations rather than just one - the machine branches into all possibilities simultaneously, accepting if any branch accepts. This is a huge conceptual extension. But again, you can simulate any NTM with a deterministic TM, by doing a breadth-first search over the computation tree. The simulation may be exponentially slower, but it halts whenever the NTM halts. Same languages, different time.

Probabilistic TMs flip coins at each step. Same languages again.

All these variants are equivalent in terms of what they can recognize. This robustness is one reason the Turing machine model is so compelling - the definition doesn’t depend on the specific choices you make about the hardware.


The Universal Turing Machine

Here is the idea that changed everything: you can write a Turing machine that takes another Turing machine’s description as input and simulates it.

This is the Universal Turing Machine (UTM). Given as input an encoding of a TM $M$ and a string $w$, the UTM simulates $M$ running on $w$: it accepts if $M$ accepts, rejects if $M$ rejects, and loops if $M$ loops.

The UTM is the theoretical basis of stored-program computers. Before this idea, the conventional view was that the “program” was the hardware (you physically reconfigured the machine to change what it computed). Turing’s insight was that the program could be data - the machine reads a description of what to compute, then computes it. Every modern computer, from your phone to a supercomputer, is a physical realization of a Universal Turing Machine.

Discomfort check: what does “program as input” actually mean?

When the text says “a Turing machine takes another Turing machine as input,” it is easy to picture something circular - like running a .py file inside another Python interpreter. But the mechanism is simpler and more concrete than that.

A Turing machine is fully described by its transition table: a finite list of rules of the form “if in state $q$ reading symbol $s$, then write $s'$, move left/right, go to state $q'$.” That table is just a string of symbols. You can write it out as text - say, a numbered list of rows. That text is a valid input string, and any Turing machine can read it off its tape the way it reads any other input.

So “giving a TM as input” means: encode the transition table as a string and put that string on the tape. The UTM reads this encoding, rebuilds the described machine’s rules in its own working memory, and then steps through what that machine would do on the rest of the tape - character by character, state by state, as if it were the described machine running.

A tiny example. Say you have a very simple machine $M$ with just two rules:

  • If in state $q_0$ reading 0, accept.
  • If in state $q_0$ reading 1, reject.

You encode $M$ as the string (q0,0,ACCEPT)(q0,1,REJECT) and put it on the UTM’s tape, followed by a separator # and the actual input - say 0. The tape looks like:

(q0,0,ACCEPT)(q0,1,REJECT) # 0

The UTM now runs. It first reads everything before #, learning the two rules. Then it moves to the input section, sees 0, and looks up: “my simulated machine is in state $q_0$ and reading 0 - rule says ACCEPT.” So the UTM accepts. If the input were 1, it would look up the second rule and reject. The UTM itself never “knew” anything about accepting or rejecting on 0 vs 1 - it learned that behavior by reading $M$’s description.

This is exactly what happens when you run python script.py. The Python interpreter (UTM) reads script.py (the encoding of $M$), then processes sys.argv (the input $w$). The interpreter has no built-in knowledge of what your script does - it learns by reading the file.

Is this the same as programs we write on computers? Yes, exactly. A Python script is a text description of what a virtual machine should do. Python’s interpreter reads that text and executes it. Your CPU is a physical UTM: it reads machine-code instructions (an encoding of “what to compute”) from memory and carries them out. The theoretical result from 1936 is the conceptual foundation of the stored-program computer - the idea that the program and the data live in the same place, that code is just a special kind of data, and that one general-purpose machine can run any program you hand it.

The UTM also enables the diagonalization argument that makes the halting problem undecidable - which is the subject of the next post.


Summary

Model Memory Closure under complement Example language decidable
Finite automaton None (finite states) Yes $\{w : w \text{ contains } 01\}$
Pushdown automaton Stack No $\{0^n 1^n : n \geq 0\}$
Turing machine Infinite tape Yes (if decidable) $\{a^n b^n c^n : n \geq 0\}$

The Turing machine is our universal model of computation. Variants - multi-tape, nondeterministic, probabilistic - are all equivalent in what they can recognize, though they differ in efficiency. The Church-Turing thesis says that nothing more powerful (in terms of what can be computed) is possible for any physical device.


Read next: