Prerequisite: Divide & Conquer


The Discrete Fourier Transform (DFT) decomposes a sequence of $N$ numbers into its constituent frequencies. Computing it naively costs $O(N^2)$. The Cooley-Tukey Fast Fourier Transform reduces this to $O(N \log N)$ by exploiting the symmetry of complex roots of unity. First published in 1965, it is routinely called the most important numerical algorithm of the 20th century - enabling real-time audio processing, fast polynomial multiplication, and modern communications.

The Discrete Fourier Transform

Given a sequence $x[0], x[1], \ldots, x[N-1]$, its DFT is the sequence $X[0], \ldots, X[N-1]$ defined by:

$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1$$

Write $\omega_N = e^{-2\pi i / N}$ (a primitive $N$-th root of unity). Then $X[k] = \sum_{n=0}^{N-1} x[n] \, \omega_N^{kn}$.

The inverse DFT recovers $x$ from $X$:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{2\pi i k n / N}$$

Naive computation: each of the $N$ outputs requires a sum of $N$ terms - $O(N^2)$ multiplications and additions.
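The definition translates directly into code. A minimal Python sketch of the naive algorithm (`dft_naive` is an illustrative name, not a library routine):

```python
import cmath

def dft_naive(x):
    """Direct O(N^2) evaluation of the DFT definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

# The DFT of an impulse [1, 0, 0, 0] is flat: X[k] = 1 for every k.
X = dft_naive([1, 0, 0, 0])
```

The two nested loops (one explicit, one inside `sum`) are exactly where the $O(N^2)$ cost comes from.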

Cooley-Tukey Divide and Conquer

Assume $N = 2^p$ (a power of two). Split the input into even-indexed and odd-indexed elements:

$$X[k] = \sum_{n=0}^{N/2-1} x[2n] \, \omega_N^{k \cdot 2n} + \sum_{n=0}^{N/2-1} x[2n+1] \, \omega_N^{k(2n+1)}$$

$$= \sum_{n=0}^{N/2-1} x[2n] \, \omega_{N/2}^{kn} + \omega_N^k \sum_{n=0}^{N/2-1} x[2n+1] \, \omega_{N/2}^{kn}$$

$$= E[k] + \omega_N^k \cdot O[k]$$

where $E$ and $O$ are the DFTs of the even and odd subsequences respectively - each of length $N/2$. Using the periodicity of DFT outputs ($E[k + N/2] = E[k]$) and the anti-symmetry of twiddle factors ($\omega_N^{k+N/2} = -\omega_N^k$):

$$X[k] = E[k] + \omega_N^k \cdot O[k]$$

$$X[k + N/2] = E[k] - \omega_N^k \cdot O[k]$$

This is the butterfly operation: two outputs computed from two inputs and one twiddle factor.

Recurrence and complexity:

$$T(N) = 2T(N/2) + O(N) \implies T(N) = O(N \log N)$$

by the Master Theorem ($a = b = 2$, $f(n) = O(n)$, case 2).

Pseudocode (Recursive)

FFT(x[0..N-1]):
  if N == 1: return x
  even = FFT(x[0], x[2], x[4], ..., x[N-2])
  odd  = FFT(x[1], x[3], x[5], ..., x[N-1])
  for k = 0 to N/2 - 1:
    t        = exp(-2*pi*i*k/N) * odd[k]
    X[k]     = even[k] + t
    X[k+N/2] = even[k] - t
  return X
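The pseudocode maps almost line-for-line onto Python. A sketch, assuming `len(x)` is a power of two (no input validation):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])   # E = DFT of the even-indexed elements
    odd = fft(x[1::2])    # O = DFT of the odd-indexed elements
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor times O[k]
        X[k] = even[k] + t            # butterfly: X[k]
        X[k + N // 2] = even[k] - t   # butterfly: X[k + N/2]
    return X
```

For `[1, 2, 3, 4]` this yields `[10, -2+2j, -2, -2-2j]`, matching the naive DFT.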

In-Place Iterative FFT and Bit-Reversal

The recursive FFT makes $O(N)$ function calls and allocates temporary sub-arrays at every level of the recursion, costing $O(N \log N)$ extra memory traffic. The standard optimisation is the in-place Cooley-Tukey algorithm:

  1. Bit-reverse permutation. Rearrange $x$ so that element at index $n$ moves to index $\text{rev}(n)$, where $\text{rev}(n)$ is the $p$-bit reversal of $n$. This places the input in the order that the recursion would access it at the leaves.

  2. Iterative butterfly passes. Perform $p = \log_2 N$ passes of butterfly operations with increasing stride, bottom-up.

IterativeFFT(x):
  N = length(x)
  BitReversePermutation(x)
  for s = 1 to log2(N):
    m = 2^s
    omega_m = exp(-2*pi*i / m)
    for k = 0 to N-1 step m:
      omega = 1
      for j = 0 to m/2 - 1:
        t          = omega * x[k + j + m/2]
        u          = x[k + j]
        x[k + j]        = u + t
        x[k + j + m/2]  = u - t
        omega *= omega_m

This runs in-place with $O(1)$ extra space (beyond the input array).
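A runnable Python sketch of the in-place algorithm, including the bit-reversal permutation (input length assumed a power of two; the input list is modified in place):

```python
import cmath

def fft_inplace(x):
    """In-place iterative radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    # Bit-reverse permutation: swap each element with its bit-reversed index once.
    j = 0
    for n in range(1, N):
        bit = N >> 1
        while j & bit:       # increment j in bit-reversed order
            j ^= bit
            bit >>= 1
        j |= bit
        if n < j:
            x[n], x[j] = x[j], x[n]
    # log2(N) bottom-up butterfly passes with doubling sub-transform size m.
    m = 2
    while m <= N:
        omega_m = cmath.exp(-2j * cmath.pi / m)
        for k in range(0, N, m):
            omega = 1.0
            for i in range(m // 2):
                t = omega * x[k + i + m // 2]
                u = x[k + i]
                x[k + i] = u + t
                x[k + i + m // 2] = u - t
                omega *= omega_m
        m *= 2
    return x
```

Repeatedly multiplying `omega` by `omega_m` accumulates a little floating-point drift; production code often precomputes twiddle tables instead.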

Inverse FFT

The inverse DFT is:

$$x[n] = \frac{1}{N} \overline{\text{FFT}\!\left(\overline{X}\right)}[n]$$

Algorithm: conjugate $X$, run the forward FFT, conjugate the result, divide by $N$. This reuses the same FFT code with no additional implementation.
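The conjugation trick in Python, a sketch reusing a minimal recursive `fft` (power-of-two lengths assumed):

```python
import cmath

def fft(x):
    # Minimal recursive radix-2 FFT (len(x) a power of two).
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + t[k] for k in range(N // 2)] + \
           [even[k] - t[k] for k in range(N // 2)]

def ifft(X):
    """Inverse FFT by conjugation: conjugate, forward FFT, conjugate, scale by 1/N."""
    N = len(X)
    return [v.conjugate() / N for v in fft([v.conjugate() for v in X])]
```

Round-tripping any input through `ifft(fft(x))` recovers `x` up to floating-point error.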

The Convolution Theorem

The circular convolution of two length-$N$ sequences $a$ and $b$ is:

$$(a \star b)[n] = \sum_{k=0}^{N-1} a[k] \, b[(n - k) \bmod N]$$

The convolution theorem states:

$$\mathcal{F}(a \star b) = \mathcal{F}(a) \cdot \mathcal{F}(b)$$

(pointwise multiplication in the frequency domain). Therefore:

$$a \star b = \mathcal{F}^{-1}\!\left(\mathcal{F}(a) \cdot \mathcal{F}(b)\right)$$

Computing this via FFT takes $O(N \log N)$ instead of $O(N^2)$ for the direct sum.
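The theorem can be checked numerically. A self-contained Python sketch (`circular_convolve` is an illustrative name; a minimal recursive FFT is included):

```python
import cmath

def fft(x):
    # Minimal recursive radix-2 FFT (len(x) a power of two).
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + t[k] for k in range(N // 2)] + \
           [even[k] - t[k] for k in range(N // 2)]

def ifft(X):
    N = len(X)
    return [v.conjugate() / N for v in fft([v.conjugate() for v in X])]

def circular_convolve(a, b):
    """Circular convolution via the convolution theorem: IFFT(FFT(a) * FFT(b))."""
    fa, fb = fft(a), fft(b)
    return [v.real for v in ifft([x * y for x, y in zip(fa, fb)])]
```

For real inputs the imaginary parts of the result are pure rounding noise, so taking `.real` is safe.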

Polynomial multiplication. Two degree-$(n-1)$ polynomials $A(x)$ and $B(x)$ are represented as coefficient vectors. Their product $C = A \cdot B$ is their convolution. The FFT-based algorithm:

  1. Pad both to length $\geq 2n - 1$, in practice the next power of two (to avoid circular aliasing): $O(n)$.
  2. FFT both: $O(n \log n)$.
  3. Pointwise multiply: $O(n)$.
  4. Inverse FFT: $O(n \log n)$.

Total: $O(n \log n)$ versus $O(n^2)$ for the schoolbook method.
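The four steps above as a Python sketch, assuming integer coefficients so the result can be rounded back from floating point (a minimal FFT pair is included for self-containment):

```python
import cmath

def fft(x):
    # Minimal recursive radix-2 FFT (len(x) a power of two).
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + t[k] for k in range(N // 2)] + \
           [even[k] - t[k] for k in range(N // 2)]

def ifft(X):
    N = len(X)
    return [v.conjugate() / N for v in fft([v.conjugate() for v in X])]

def poly_mul(a, b):
    """Multiply coefficient vectors a, b via FFT: pad, transform, multiply, invert."""
    size = len(a) + len(b) - 1   # number of coefficients in the product
    n = 1
    while n < size:              # next power of two >= size avoids circular aliasing
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    c = ifft([x * y for x, y in zip(fa, fb)])
    return [round(v.real) for v in c[:size]]   # round off floating-point error
```

For example, $(1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3$, i.e. `poly_mul([1, 2, 3], [4, 5])` gives `[4, 13, 22, 15]`.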

Number Theoretic Transform (NTT)

For exact integer arithmetic (no floating-point rounding), replace $\mathbb{C}$ with $\mathbb{Z}_p$ for a suitable prime $p$. The NTT is the DFT over $\mathbb{Z}_p$, using a primitive $N$-th root of unity modulo $p$ in place of $e^{-2\pi i/N}$.

Requirements: $p \equiv 1 \pmod{N}$ (so $N$-th roots of unity exist in $\mathbb{Z}_p$). Popular choice: $p = 998244353 = 119 \cdot 2^{23} + 1$, which supports NTT up to size $2^{23}$.

NTT enables exact polynomial multiplication and large-integer multiplication in competitive programming and cryptographic implementations.
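A Python sketch of an iterative NTT with the popular prime, assuming the well-known fact that $g = 3$ is a primitive root of $998244353$ (input length a power of two dividing $p - 1$):

```python
def ntt(a, invert=False, p=998244353, g=3):
    """Number theoretic transform over Z_p; len(a) a power of two dividing p-1."""
    a = list(a)
    n = len(a)
    # Bit-reverse permutation, as in the complex in-place FFT.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    m = 2
    while m <= n:
        w_m = pow(g, (p - 1) // m, p)      # principal m-th root of unity mod p
        if invert:
            w_m = pow(w_m, p - 2, p)       # inverse root for the inverse transform
        for k in range(0, n, m):
            w = 1
            for i in range(m // 2):
                t = w * a[k + i + m // 2] % p
                u = a[k + i]
                a[k + i] = (u + t) % p
                a[k + i + m // 2] = (u - t) % p
                w = w * w_m % p
        m *= 2
    if invert:
        inv_n = pow(n, p - 2, p)           # 1/n mod p via Fermat's little theorem
        a = [v * inv_n % p for v in a]
    return a
```

Exact polynomial multiplication then follows the same pad/transform/multiply/invert recipe as the complex FFT, with all arithmetic modulo $p$.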

2D FFT

For images (2D signals), apply the 1D FFT to every row, then to every column. This costs $O(MN \log(MN))$ for an $M \times N$ image. Convolution with a filter (e.g. Gaussian blur) becomes pointwise multiplication of their 2D DFTs, enabling efficient spatial filtering.
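The row-then-column decomposition in Python (a sketch; both dimensions assumed powers of two, with a minimal 1D FFT included):

```python
import cmath

def fft(x):
    # Minimal recursive radix-2 FFT (len(x) a power of two).
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + t[k] for k in range(N // 2)] + \
           [even[k] - t[k] for k in range(N // 2)]

def fft2(img):
    """2D FFT: 1D FFT of every row, then of every column."""
    rows = [fft(row) for row in img]
    cols = [fft(list(col)) for col in zip(*rows)]   # transpose, FFT the columns
    return [list(row) for row in zip(*cols)]        # transpose back
```

For a constant image all energy lands in the DC bin `X[0][0]`, a quick sanity check.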

Examples

Fast polynomial multiplication. Multiplying two degree-$10^5$ polynomials naively takes $10^{10}$ operations. FFT-based multiplication takes about $2 \times 10^5 \times 17 \approx 3.4 \times 10^6$ operations - three orders of magnitude faster. This is used in cryptographic libraries (e.g. lattice-based cryptography where polynomial rings are central) and competitive programming problems involving large convolutions.

Audio processing. A digital equaliser boosts or attenuates frequency bands. The standard implementation: take the FFT of a short audio frame, multiply by the desired frequency-domain filter, inverse FFT. A 1024-point FFT on a modern CPU takes roughly 10 microseconds - well within the real-time budget for 44.1 kHz audio.

Image filtering. Gaussian blur via direct 2D convolution on a $4000 \times 3000$ image with a $101 \times 101$ kernel costs $\sim 1.2 \times 10^{11}$ multiplications. Via FFT convolution it costs on the order of $N \log N \approx 3 \times 10^8$ operations - roughly 400x faster, which is why image editors and computer vision pipelines use FFT-based convolution for large kernels.


Read Next: Signal Processing