Fast Fourier Transform (FFT)
Prerequisite: Divide & Conquer
The Discrete Fourier Transform (DFT) decomposes a sequence of $N$ numbers into its constituent frequencies. Computing it naively costs $O(N^2)$. The Cooley-Tukey Fast Fourier Transform reduces this to $O(N \log N)$ by exploiting the symmetry of complex roots of unity. First published in 1965, it is routinely called the most important numerical algorithm of the 20th century - enabling real-time audio processing, fast polynomial multiplication, and modern communications.
The Discrete Fourier Transform
Given a sequence $x[0], x[1], \ldots, x[N-1]$, its DFT is the sequence $X[0], \ldots, X[N-1]$ defined by:
$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1$$
Write $\omega_N = e^{-2\pi i / N}$ (a primitive $N$-th root of unity). Then $X[k] = \sum_{n=0}^{N-1} x[n] \, \omega_N^{kn}$.
The inverse DFT recovers $x$ from $X$:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{2\pi i k n / N}$$
Naive computation: each of the $N$ outputs requires a sum of $N$ terms - $O(N^2)$ multiplications and additions.
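As a concrete reference point, here is a minimal Python sketch that transcribes the definition directly (the name `dft_naive` is ours, chosen for illustration):

```python
import cmath

def dft_naive(x):
    """Direct O(N^2) evaluation of the DFT definition: one length-N sum per output."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]
```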
Cooley-Tukey Divide and Conquer
Assume $N = 2^p$ (a power of two). Split the input into even-indexed and odd-indexed elements:
$$X[k] = \sum_{n=0}^{N/2-1} x[2n] \, \omega_N^{k \cdot 2n} + \sum_{n=0}^{N/2-1} x[2n+1] \, \omega_N^{k(2n+1)}$$
$$= \sum_{n=0}^{N/2-1} x[2n] \, \omega_{N/2}^{kn} + \omega_N^k \sum_{n=0}^{N/2-1} x[2n+1] \, \omega_{N/2}^{kn}$$
$$= E[k] + \omega_N^k \cdot O[k]$$
where $E$ and $O$ are the DFTs of the even and odd subsequences respectively - each of length $N/2$. Using the periodicity of DFT outputs ($E[k + N/2] = E[k]$) and the anti-symmetry of twiddle factors ($\omega_N^{k+N/2} = -\omega_N^k$):
$$X[k] = E[k] + \omega_N^k \cdot O[k]$$
$$X[k + N/2] = E[k] - \omega_N^k \cdot O[k]$$
This is the butterfly operation: two outputs computed from two inputs and one twiddle factor.
Recurrence and complexity:
$$T(N) = 2T(N/2) + O(N) \implies T(N) = O(N \log N)$$
by the Master Theorem ($a = b = 2$, $f(n) = O(n)$, case 2).
Pseudocode (Recursive)
FFT(x[0..N-1]):
    if N == 1: return x
    even = FFT(x[0], x[2], x[4], ..., x[N-2])
    odd  = FFT(x[1], x[3], x[5], ..., x[N-1])
    for k = 0 to N/2 - 1:
        t = exp(-2*pi*i*k/N) * odd[k]
        X[k] = even[k] + t
        X[k+N/2] = even[k] - t
    return X
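A direct Python transcription of this pseudocode, as a sketch (the name `fft_recursive` is ours; it assumes the input length is a power of two):

```python
import cmath

def fft_recursive(x):
    """Cooley-Tukey radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_recursive(x[0::2])   # E[k]: DFT of even-indexed elements
    odd = fft_recursive(x[1::2])    # O[k]: DFT of odd-indexed elements
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor * O[k]
        X[k] = even[k] + t                              # butterfly: X[k]
        X[k + N // 2] = even[k] - t                     # butterfly: X[k + N/2]
    return X
```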
In-Place Iterative FFT and Bit-Reversal
A recursive FFT makes $O(N)$ function calls and allocates and copies subarrays at every level of the recursion. The standard optimisation is the in-place Cooley-Tukey algorithm:
- Bit-reverse permutation. Rearrange $x$ so that the element at index $n$ moves to index $\text{rev}(n)$, where $\text{rev}(n)$ is the $p$-bit reversal of $n$. This places the input in the order that the recursion would access it at the leaves.
- Iterative butterfly passes. Perform $p = \log_2 N$ passes of butterfly operations with increasing stride, bottom-up.
IterativeFFT(x):
    BitReversePermutation(x)
    for s = 1 to log2(N):
        m = 2^s
        omega_m = exp(-2*pi*i / m)
        for k = 0 to N-1 step m:
            omega = 1
            for j = 0 to m/2 - 1:
                t = omega * x[k + j + m/2]
                u = x[k + j]
                x[k + j] = u + t
                x[k + j + m/2] = u - t
                omega *= omega_m
This runs in-place with $O(1)$ extra space (beyond the input array).
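A Python sketch of both steps (names `bit_reverse_permute` and `fft_iterative` are ours; the bit-reversal uses the standard incrementing-reversed-counter trick):

```python
import cmath

def bit_reverse_permute(a):
    """Swap a[i] and a[rev(i)] in place, where rev is the p-bit reversal."""
    N = len(a)
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:          # increment j in reversed-bit order
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:               # swap each pair exactly once
            a[i], a[j] = a[j], a[i]

def fft_iterative(a):
    """In-place radix-2 FFT of a list of complex numbers; len(a) = 2^p."""
    N = len(a)
    bit_reverse_permute(a)
    p = N.bit_length() - 1      # p = log2(N)
    for s in range(1, p + 1):
        m = 1 << s              # butterfly span for this pass
        omega_m = cmath.exp(-2j * cmath.pi / m)
        for k in range(0, N, m):
            omega = 1.0
            for j in range(m // 2):
                t = omega * a[k + j + m // 2]
                u = a[k + j]
                a[k + j] = u + t
                a[k + j + m // 2] = u - t
                omega *= omega_m
```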
Inverse FFT
The inverse DFT is:
$$x[n] = \frac{1}{N} \overline{\text{FFT}\!\left(\overline{X}\right)}[n]$$
Algorithm: conjugate $X$, run the forward FFT, conjugate the result, divide by $N$. This reuses the same FFT code with no additional implementation.
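In code, the conjugation trick is a few lines on top of any forward FFT (here reusing `fft_recursive` from the sketch above):

```python
def ifft(X):
    """Inverse FFT via conjugation: conj(FFT(conj(X))) / N."""
    N = len(X)
    y = fft_recursive([v.conjugate() for v in X])  # forward FFT of conj(X)
    return [v.conjugate() / N for v in y]          # conjugate back, scale by 1/N
```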
The Convolution Theorem
The circular convolution of two length-$N$ sequences $a$ and $b$ is:
$$(a \star b)[n] = \sum_{k=0}^{N-1} a[k] \, b[(n - k) \bmod N]$$
The convolution theorem states:
$$\mathcal{F}(a \star b) = \mathcal{F}(a) \cdot \mathcal{F}(b)$$
(pointwise multiplication in the frequency domain). Therefore:
$$a \star b = \mathcal{F}^{-1}\!\left(\mathcal{F}(a) \cdot \mathcal{F}(b)\right)$$
Computing this via FFT takes $O(N \log N)$ instead of $O(N^2)$ for the direct sum.
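A quick numerical check of the theorem, using NumPy's FFT routines (the example values are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])

# Circular convolution via the convolution theorem: F^{-1}(F(a) * F(b)).
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

# Direct O(N^2) circular convolution for comparison.
N = len(a)
direct = np.array([sum(a[k] * b[(n - k) % N] for k in range(N))
                   for n in range(N)])

assert np.allclose(via_fft, direct)
```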
Polynomial multiplication. Two degree-$(n-1)$ polynomials $A(x)$ and $B(x)$ are represented as coefficient vectors. Their product $C = A \cdot B$ is the convolution of those vectors. The FFT-based algorithm (sketched in code after the cost breakdown):
- Pad both to length $\geq 2n$ (to avoid circular aliasing): $O(n)$.
- FFT both: $O(n \log n)$.
- Pointwise multiply: $O(n)$.
- Inverse FFT: $O(n \log n)$.
Total: $O(n \log n)$ versus $O(n^2)$ for the schoolbook method.
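A sketch of the whole pipeline, built on the `fft_recursive` and `ifft` helpers above (the name `multiply_polynomials` is ours; rounding assumes integer coefficients):

```python
def multiply_polynomials(A, B):
    """Multiply coefficient vectors A and B via FFT convolution."""
    n = len(A) + len(B) - 1                # number of product coefficients
    size = 1
    while size < n:                        # pad to the next power of two
        size *= 2
    fa = fft_recursive(list(A) + [0] * (size - len(A)))
    fb = fft_recursive(list(B) + [0] * (size - len(B)))
    fc = [u * v for u, v in zip(fa, fb)]   # pointwise multiply in frequency domain
    c = ifft(fc)
    return [round(v.real) for v in c[:n]]  # recover integer coefficients

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
assert multiply_polynomials([1, 2], [3, 4]) == [3, 10, 8]
```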
Number Theoretic Transform (NTT)
For exact integer arithmetic (no floating-point rounding), replace $\mathbb{C}$ with $\mathbb{Z}_p$ for a suitable prime $p$. The NTT is the DFT over $\mathbb{Z}_p$, using a primitive $N$-th root of unity modulo $p$ in place of $e^{-2\pi i/N}$.
Requirements: $p \equiv 1 \pmod{N}$ (so $N$-th roots of unity exist in $\mathbb{Z}_p$). Popular choice: $p = 998244353 = 119 \cdot 2^{23} + 1$, which supports NTT up to size $2^{23}$.
NTT enables exact polynomial multiplication and large-integer multiplication in competitive programming and cryptographic implementations.
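A minimal NTT sketch in Python, mirroring the iterative FFT above; it assumes $p = 998244353$ with primitive root $g = 3$ (a well-known pair), and exact modular arithmetic replaces the complex twiddle factors:

```python
MOD = 998244353        # 119 * 2^23 + 1, prime
G = 3                  # primitive root modulo MOD

def ntt(a, invert=False):
    """In-place iterative NTT over Z_p; len(a) must be a power of two <= 2^23."""
    n = len(a)
    j = 0
    for i in range(1, n):                  # bit-reverse permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # primitive length-th root of unity mod p (or its inverse for the INTT)
        w_len = pow(G, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        for k in range(0, n, length):
            w = 1
            for j in range(length // 2):   # butterfly, exactly as in the FFT
                u = a[k + j]
                t = w * a[k + j + length // 2] % MOD
                a[k + j] = (u + t) % MOD
                a[k + j + length // 2] = (u - t) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:                             # divide by n via Fermat inverse
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
```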
2D FFT
For images (2D signals), apply the 1D FFT to every row, then to every column. This costs $O(MN \log(MN))$ for an $M \times N$ image. Convolution with a filter (e.g. Gaussian blur) becomes pointwise multiplication of their 2D DFTs, enabling efficient spatial filtering.
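The row-column decomposition is easy to verify with NumPy (the small random array stands in for an image):

```python
import numpy as np

img = np.random.rand(4, 8)   # small stand-in for an M x N image

# 1D FFT along every row, then along every column...
rows_then_cols = np.fft.fft(np.fft.fft(img, axis=1), axis=0)

# ...agrees with the library's 2D FFT.
assert np.allclose(rows_then_cols, np.fft.fft2(img))
```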
Examples
Fast polynomial multiplication. Multiplying two degree-$10^5$ polynomials naively takes $10^{10}$ operations. FFT-based multiplication takes about $2 \times 10^5 \times 17 \approx 3.4 \times 10^6$ operations - three orders of magnitude faster. This is used in cryptographic libraries (e.g. lattice-based cryptography where polynomial rings are central) and competitive programming problems involving large convolutions.
Audio processing. A digital equaliser boosts or attenuates frequency bands. The standard implementation: take the FFT of a short audio frame, multiply by the desired frequency-domain filter, inverse FFT. A 1024-point FFT on a modern CPU takes roughly 10 microseconds - well within the real-time budget for 44.1 kHz audio.
Image filtering. Gaussian blur via direct 2D convolution on a $4000 \times 3000$ image with a $101 \times 101$ kernel costs $\sim 1.2 \times 10^{11}$ multiplications. FFT convolution needs only a few passes of $N \log_2 N \approx 3 \times 10^8$ operations for $N = 1.2 \times 10^7$ pixels - a speedup of roughly two orders of magnitude, which is why image editors and computer vision pipelines use FFT-based convolution for large kernels.
Read Next: Signal Processing