Automata Theory - What Machines Can and Cannot Compute // Megha Bose

Helpful context:

Consider an elevator. It has a small number of things it needs to remember: which floor it is on, whether the door is open or closed, whether a button has been pressed. It does not need to remember the entire history of every button press ever made - just the current situation. Given its current situation and a new input (a button press, a door sensor), it transitions to a new situation. That is all it does, over and over.

This is a finite automaton. Not an abstraction - an actual description of how the machine works. Automata theory takes this idea seriously and asks: what can a machine with finitely many “situations” (states) and a fixed set of responses (transitions) actually compute? What problems can it solve, and what problems are fundamentally beyond it?

The answer turns out to be surprisingly clean. Finite automata recognize exactly the regular languages - a class of languages defined by simple patterns. And the theory of these machines reveals deep connections between computation, logic, algebra, and the structure of pattern itself.

Setting Up the Language

Before talking about machines, we need to talk about what machines operate on.

An alphabet $\Sigma$ is any finite nonempty set of symbols. Examples: $\{0,1\}$ (binary), $\{a,b,\ldots,z\}$ (lowercase English), $\{(,)\}$ (parentheses).

A string over $\Sigma$ is a finite sequence of symbols from $\Sigma$. The empty string $\varepsilon$ has length zero. The set of all strings over $\Sigma$ is written $\Sigma^*$.

A language over $\Sigma$ is any subset $L \subseteq \Sigma^*$. Languages are the objects we want machines to recognize. A machine recognizes language $L$ if it accepts exactly the strings in $L$ and rejects everything else.

The natural question: which languages can be recognized by a simple, memory-bounded machine?

Deterministic Finite Automata

A deterministic finite automaton (DFA) - also called a finite state machine (FSM) or finite state automaton (FSA) - is the simplest model of computation: a machine with a fixed finite set of states, reading an input string one symbol at a time, transitioning between states based on the current symbol.

Definition. A DFA is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where:

$Q$ is a finite set of states
$\Sigma$ is the input alphabet
$\delta: Q \times \Sigma \to Q$ is the transition function
$q_0 \in Q$ is the start state
$F \subseteq Q$ is the set of accepting states

The machine reads the input string left to right, starting in $q_0$. On each symbol $a$ it moves from state $q$ to $\delta(q,a)$. After reading the full input it accepts if it is in some state in $F$, and rejects otherwise.

Designing a DFA

Design comes down to identifying what the machine needs to remember. The states encode all the “relevant history” - the information about what has been read so far that might affect future acceptance.

Example. Design a DFA over $\{0,1\}$ that accepts all strings ending in “01”.

Three states suffice - we only need to track whether the last character was “0” ($q_1$), whether the last two were “01” ($q_2$, accepting), or neither ($q_0$). From $q_2$: a new “0” goes back to $q_1$; a “1” resets to $q_0$.

The transition function is fully defined: every state has exactly one transition for each input symbol. This is the deterministic in DFA - no ambiguity, no choices.

Proving a DFA Correct: State Invariants

Designing a DFA is one thing; proving it correct is another. The technique mirrors loop invariants in programming: assign each state $q$ a predicate $P(q)$ describing what it means to arrive in $q$ after reading some prefix. Then verify three things:

The start state satisfies its predicate on $\varepsilon$.
Every transition is sound: if the DFA is in state $q$ satisfying $P(q)$ and reads symbol $a$, then $\delta(q,a)$ satisfies $P(\delta(q,a))$.
A state is accepting iff its predicate implies membership in $L$.

For the “ends in 01” DFA: $P(q_0)$ = “prefix doesn’t end in 0”, $P(q_1)$ = “prefix ends in 0 but not 01”, $P(q_2)$ = “prefix ends in 01”. Checking each transition case-by-case - e.g., from $q_1$ reading “1”: the new prefix ends in “01”, so we move to $q_2$, whose predicate is satisfied - completes the proof. The invariant holds at every step, so the DFA is correct.

Operations on Languages

Languages are sets of strings, so we can define operations on them.

Union: $L_1 \cup L_2 = \{w \mid w \in L_1 \text{ or } w \in L_2\}$.

Concatenation: $L_1 \circ L_2 = \{xy \mid x \in L_1, y \in L_2\}$. The set of all strings formed by concatenating a string from $L_1$ with a string from $L_2$.

Kleene star: $L^* = \{x_1 x_2 \cdots x_k \mid k \geq 0, x_i \in L\}$. Zero or more concatenations of strings from $L$; includes $\varepsilon$.

These three operations are the building blocks of everything that follows.

Regular Expressions

Regular expressions give a compact way to describe languages using union, concatenation, and Kleene star directly.

Definition. The regular expressions over $\Sigma$ are defined inductively:

$\emptyset$ is a regex denoting the empty language
$\varepsilon$ is a regex denoting $\{\varepsilon\}$
$a$ for any $a \in \Sigma$ is a regex denoting $\{a\}$
If $R_1$ and $R_2$ are regexes: $(R_1 \cup R_2)$, $(R_1 \circ R_2)$, $(R_1^*)$ are regexes

The language described by a regex is built up from atoms by the three operations.

Example. The regex $(0 \cup 1)^* 01$ describes all binary strings ending in “01”: take any binary string (the $(0\cup 1)^*$ part), then append “01”.

The power of regular expressions: they describe exactly the same class of languages that DFAs recognize. This is not obvious - but it is true, and the proof goes through the nondeterministic model.

Nondeterministic Finite Automata

A nondeterministic finite automaton (NFA) extends the DFA by allowing the transition function to return a set of states - and by allowing $\varepsilon$-transitions that consume no input. The machine accepts if any sequence of choices leads to an accepting state.

Definition. An NFA is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where:

$\delta: Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q)$ maps to subsets of $Q$

Nondeterminism does not add computational power - every NFA can be converted to an equivalent DFA - but it makes machines far easier to design. Think of it as “guessing” the right path.

Example. The same language - binary strings ending in “01” - has a much simpler NFA:

The NFA says: loop in $p_0$ on anything (you haven’t started the suffix yet), then nondeterministically guess that the “0” you just read is the start of the final “01”, move to $p_1$, then read “1” and accept. The machine “guesses” when the suffix starts - and accepts if any guess works.

This is three transitions. The DFA needed six. For more complex patterns, the gap can be exponential.

NFA to DFA: Subset Construction

Theorem. Every NFA can be converted to an equivalent DFA.

Proof sketch. The DFA simulates all possible states the NFA could be in simultaneously - its states are subsets of the NFA state set. A subset is accepting if it contains any NFA accepting state. This may produce up to $2^{|Q|}$ states, though in practice only reachable subsets appear.

Example. Applying subset construction to the NFA above (no $\varepsilon$-transitions, so $E(S) = S$):

DFA State	on 0	on 1
$\{p_0\}$	$\{p_0, p_1\}$	$\{p_0\}$
$\{p_0, p_1\}$	$\{p_0, p_1\}$	$\{p_0, p_2\}$ ✓
$\{p_0, p_2\}$ ✓	$\{p_0, p_1\}$	$\{p_0\}$

(✓ = accepting, since the subset contains $p_2$.)

The resulting DFA - rename $A = \{p_0\}$, $B = \{p_0,p_1\}$, $C = \{p_0,p_2\}$:

This DFA is structurally the same as the one we designed by hand.

Regular Languages

A language is regular if some DFA (equivalently, some NFA, equivalently, some regular expression) recognizes it.

The equivalence of DFA, NFA, and regex is one of the foundational results of automata theory:

$$\text{DFA} \equiv \text{NFA} \equiv \text{Regular Expressions}$$

Each formalism can simulate the others. The proof chain is: DFA → regex (state elimination), regex → NFA (Thompson’s construction), NFA → DFA (subset construction).

Closure Properties

Regular languages are closed under the basic operations, which means applying these operations to regular languages always produces a regular language.

Operation	Closure	Method
Union $L_1 \cup L_2$	✓	Product construction or NFA union
Concatenation $L_1 \circ L_2$	✓	NFA: connect $F_1$ to $q_{0_2}$ via $\varepsilon$
Kleene star $L^*$	✓	NFA: add $\varepsilon$-loop from $F$ back to $q_0$
Complement $\overline{L}$	✓	DFA: swap accept/reject states
Intersection $L_1 \cap L_2$	✓	Product construction (or De Morgan: $\overline{\bar{L_1} \cup \bar{L_2}}$)
Difference $L_1 \setminus L_2$	✓	$L_1 \cap \overline{L_2}$
Reverse $L^R$	✓	Reverse all transitions, swap start/accept
Homomorphism	✓	Apply symbol-by-symbol

The Pumping Lemma: What Regular Languages Cannot Do

The closure properties tell us regular languages are robust. But they cannot recognize everything. The pumping lemma gives a necessary condition for regularity, used to prove languages are not regular.

Lemma (Pumping Lemma). If $L$ is a regular language, then there exists a pumping length $p \geq 1$ such that for every string $s \in L$ with $|s| \geq p$, we can write $s = xyz$ satisfying:

$xy^i z \in L$ for all $i \geq 0$
$|y| \geq 1$
$|xy| \leq p$

Why this must be true. A DFA with $p$ states reading a string of length $\geq p$ must revisit some state (pigeonhole). The input consumed between those two visits - call it $y$ - can be repeated or removed without affecting the outcome, so $xy^iz$ is always accepted.

Using the lemma to prove non-regularity. Suppose $L = \{0^n 1^n \mid n \geq 0\}$ - strings of $n$ zeros followed by $n$ ones. Claim: $L$ is not regular.

Proof by contradiction. Assume $L$ is regular with pumping length $p$. Take $s = 0^p 1^p \in L$. Since $|s| = 2p \geq p$, write $s = xyz$ with $|xy| \leq p$ and $|y| \geq 1$. Since $|xy| \leq p$, the substring $xy$ lies entirely within the first $p$ zeros, so $y = 0^k$ for some $k \geq 1$. Then $xy^2z = 0^{p+k}1^p$. But this has more zeros than ones, so $xy^2z \notin L$. Contradiction. $\square$

The key step: the constraint $|xy| \leq p$ forces $y$ to consist only of zeros, and pumping $y$ destroys the equal-count balance.

Myhill-Nerode: A Stronger Tool

Strings $x, y \in \Sigma^*$ are $L$-distinguishable if there exists $z$ such that exactly one of $xz, yz$ is in $L$. Any DFA for $L$ must put distinguishable strings in different states - so the minimum number of states is at least the number of pairwise $L$-distinguishable strings.

Theorem (Myhill-Nerode). $L$ is regular if and only if the indistinguishability relation $x \sim_L y$ (meaning: for all $z$, $xz \in L \iff yz \in L$) has finitely many equivalence classes. The number of classes equals the minimum DFA state count.

Example. For $L = \{0^n1^n\}$: the strings $0, 0^2, 0^3, \ldots$ are pairwise distinguishable. For $i \neq j$, use $z = 1^i$ as distinguisher: $0^i \cdot 1^i \in L$ but $0^j \cdot 1^i \notin L$. Infinitely many classes, so $L$ is not regular.

Myhill-Nerode is strictly stronger than the pumping lemma: it gives a necessary and sufficient condition for regularity. It also directly yields the minimal DFA.

Closure Arguments

If $L$ were regular, then applying a closure-preserving operation with another regular language must produce a regular language. A contradiction in the result proves $L$ is not regular.

Example. $B = \{w \mid w \text{ has equal numbers of 0s and 1s}\}$ is not regular. Assume $B$ is regular. Then $B \cap 0^{\ast}1^{\ast}$ is regular (intersection of two regular languages). But $B \cap 0^{\ast}1^{\ast} = \{0^n1^n\}$, which the pumping lemma shows is not regular. Contradiction. $\square$

Closure arguments work best when the target language is complicated but maps cleanly onto a known non-regular one.

What You Actually Do With an Automaton

Modeling something as a finite automaton is the first step. The second step is running algorithms on it to answer questions. The core problems:

Membership. Does string $w$ belong to $L(M)$? Simulate the DFA: follow transitions symbol by symbol from $q_0$, check if you land in an accepting state. $O(|w|)$. This is what regex search does on every run.

Emptiness. Does $M$ accept anything at all? Run BFS/DFS from the start state - if any accepting state is reachable, the language is non-empty. $O(|Q| + |\delta|)$. Useful in verification: model “bad behaviors” as an automaton, and emptiness means none can occur.

Equivalence. Do two automata $M_1$ and $M_2$ accept exactly the same language? Minimize both and check if the minimal DFAs are isomorphic. Decidable and efficient. Used to check if two token rules in a compiler are redundant.

Minimization. Find the smallest DFA for a given language. Hopcroft’s algorithm in $O(n \log n)$. Used in hardware design (fewer flip-flops) and compiler front-ends (faster lexers).

Intersection / Union / Complement. Construct a new automaton from two existing ones. Product construction runs both machines in parallel - states are pairs $(q_1, q_2)$, accepting when both accept (intersection) or either accepts (union). Complement just flips accepting and rejecting states on a complete DFA. These let you compose specifications.

Language containment. Is $L(A) \subseteq L(B)$? Check whether $L(A) \cap \overline{L(B)} = \emptyset$. This is model checking in disguise: “does my system ever produce output that violates the spec?”

The through-line: automata are graphs, and most of these problems reduce to graph reachability, graph isomorphism, or graph product construction.

Converting Between Representations

From \ To	DFA	NFA	Regex
DFA	-	trivial (DFA is an NFA)	state elimination
NFA	subset construction	-	state elimination
Regex	via NFA	Thompson’s construction	-

State elimination converts a DFA/NFA to a regex by removing states one at a time and collapsing paths into regex labels on the remaining edges.

Thompson’s construction converts a regex to an NFA inductively on the regex structure. A regex of size $n$ produces an NFA with at most $2n$ states.

Minimal DFA. There is a unique minimum-state DFA for any regular language, obtained by merging states that no string can distinguish. Hopcroft’s algorithm finds it in $O(n \log n)$.

Connections to Other Fields

Finite automata show up, under different names, across almost every quantitative discipline.

Sequential digital circuits. Every synchronous digital circuit with $n$ bits of state is a finite automaton with $2^n$ states. The theory of automata is the mathematical foundation of digital hardware design. State minimization in automata directly corresponds to reducing the number of flip-flops in a circuit.

Lexical analysis. Every compiler’s front end - the part that breaks source code into tokens - is a DFA. Regular expressions describe the token patterns (identifiers, numbers, keywords), and tools like lex and flex automatically convert them to optimized DFAs. The regex you write in grep or Python is compiled to an NFA and then simulated.

Natural language processing. Morphological analysis is often modeled with finite-state transducers - automata that produce output as they consume input. Many linguistic phenomena (regular morphology, shallow parsing) are within the regular languages, though full natural language is not.

Networks and protocols. Communication protocols are state machines: a TCP connection is in state SYN_SENT, SYN_RECEIVED, ESTABLISHED, etc., and transitions between them on network events. Formal verification of protocol correctness is done by model-checking, which is automata theory scaled to enormous state spaces.

Reinforcement learning: Markov Decision Processes. An MDP is a probabilistic generalization of a DFA. It adds transition probabilities $p(s' \mid s, a)$ and a reward signal $r(s, a)$; strip those away and an MDP collapses to a DFA. A policy $\pi: S \to A$ is the same structural object as a transition function $\delta: Q \times \Sigma \to Q$. The question just changes from does this string get accepted? to what sequence of actions maximizes expected reward?

Wider view. A DFA, a Markov chain, and an MDP all share one structural property: the next state depends only on the current state (and possibly a current input or action), never on the full history of how you arrived there. This is the Markov property. In a DFA it is implicit and deterministic - $\delta(q, a)$ depends on $q$ and $a$, nothing else, which is exactly why the states are described as encoding all “relevant history”: once you know the current state, knowing the sequence of symbols that led there adds nothing. In a Markov chain the same property holds but transitions are probabilistic: $P(X_{n+1} = j \mid X_n = i)$ depends only on $i$, not on $X_0, \ldots, X_{n-1}$. An MDP adds an agent who chooses an action $a$ at each step, so the transition is $p(s' \mid s, a)$ - still Markov, but now the agent’s choice is part of the “input.” The three form a clean hierarchy: a DFA is a special case of a Markov chain where all transition probabilities are 0 or 1 and the symbol read selects which; a Markov chain is a special case of an MDP where there is no agent and the environment picks the transition. The Markov property is what makes all three tractable: the current state is a sufficient statistic for everything that follows, which is why dynamic programming works for MDPs (Bellman equations), stationary distributions exist for Markov chains, and state minimization is well-defined for DFAs.

The recurring pattern: whenever a system has a finite memory and processes a sequence of inputs, automata theory is the right lens. The theory tells you what the system can compute, when two systems are equivalent, and how to minimize the representation.

Summary

Concept	Key Definition
DFA	$(Q, \Sigma, \delta, q_0, F)$; $\delta: Q \times \Sigma \to Q$ deterministic
NFA	$\delta: Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q)$; accepts if any path accepts
Regex	Built from $\emptyset, \varepsilon, a$ by $\cup$, $\circ$, $^*$
Regular language	Recognized by DFA $\equiv$ NFA $\equiv$ regex
Subset construction	NFA $\to$ DFA; states are subsets of NFA states
Closure	Regular languages closed under $\cup$, $\circ$, $^*$, $\cap$, complement, reverse
Pumping lemma	If regular, long strings can be “pumped”; used to prove non-regularity
Minimal DFA	Unique minimum-state DFA via Myhill-Nerode / state minimization

Regular languages are the simplest infinite class of languages with a clean algebraic theory. They are not the whole story - context-free grammars, Turing machines, and complexity classes extend the picture - but they are the right place to start, and their theory is essentially complete.

Read next: