Fast Fourier Transform - Quadratic to Log-Linear by Symmetry // Megha Bose

Helpful context:

Divide & Conquer - Small Savings That Compound

In 1965, James Cooley and John Tukey published the FFT. It was later discovered that Gauss had the same algorithm in 1805 but never published it. The FFT reduced a computation from $O(n^2)$ to $O(n \log n)$ - for audio processing of $n = 10^6$ samples, that’s from $10^{12}$ operations to $2 \times 10^7$. A factor of fifty thousand.

It enabled modern telecommunications, image compression (JPEG uses it), audio processing (MP3), and radar. It has been called “the most important algorithm of the 20th century.”

The Discrete Fourier Transform

Given a sequence of $n$ numbers $x[0], x[1], \ldots, x[n-1]$ (say, audio samples over time), the DFT computes $n$ frequency components:

$$X[k] = \sum_{j=0}^{n-1} x[j] e^{-2\pi i jk/n}, \qquad k = 0, 1, \ldots, n-1$$

Each $X[k]$ tells you how much of the frequency $k/n$ cycles per sample is present in the signal. Compute all $n$ of them naively and you need $n$ multiplications per output: $O(n^2)$ total.

For $n = 10^6$: $10^{12}$ multiplications. At $10^9$ operations per second, that’s 1000 seconds. Not useful for real-time audio.

The Key Insight: Redundancy in the DFT Matrix

Write $\omega = e^{-2\pi i/n}$ (the primitive $n$-th root of unity). Then $X[k] = \sum_{j=0}^{n-1} x[j] \omega^{jk}$.

The values $\omega^{jk}$ are not all distinct. Because $\omega^n = 1$, the sequence $1, \omega, \omega^2, \ldots, \omega^{n-1}$ cycles. A huge fraction of the computation in the naive DFT is redundant - you’re computing the same complex exponentials over and over.

Cooley-Tukey exploits this. Assume $n$ is a power of 2. Split $x$ into its even-indexed and odd-indexed elements:

$$X[k] = \sum_{j=0}^{n/2-1} x[2j] \omega_n^{2jk} + \sum_{j=0}^{n/2-1} x[2j+1] \omega_n^{(2j+1)k}$$

$$= \sum_{j=0}^{n/2-1} x[2j] \omega_{n/2}^{jk} + \omega_n^k \sum_{j=0}^{n/2-1} x[2j+1] \omega_{n/2}^{jk}$$

$$= E[k] + \omega_n^k \cdot O[k]$$

where $E$ is the DFT of the even-indexed elements and $O$ is the DFT of the odd-indexed elements - each a DFT of size $n/2$.

Now the trick: $E[k]$ and $O[k]$ are periodic with period $n/2$. And $\omega_n^{k + n/2} = -\omega_n^k$. So:

$$X[k] = E[k] + \omega_n^k \cdot O[k]$$ $$X[k + n/2] = E[k] - \omega_n^k \cdot O[k]$$

Two outputs from two half-size DFTs and one complex multiplication. This is the butterfly operation.

Divide and Conquer

The recursion:

$$T(n) = 2T(n/2) + O(n)$$

This is the same recurrence as merge sort. By the master theorem: $T(n) = O(n \log n)$.

For $n = 10^6$: instead of $10^{12}$ operations, we do $10^6 \cdot 20 = 2 \times 10^7$. Real-time audio processing becomes possible.

FFT(x[0..n-1]):
    if n == 1: return x
    even = FFT(x[0], x[2], x[4], ..., x[n-2])
    odd  = FFT(x[1], x[3], x[5], ..., x[n-1])
    for k = 0 to n/2 - 1:
        t        = exp(-2*pi*i*k/n) * odd[k]
        X[k]     = even[k] + t
        X[k+n/2] = even[k] - t
    return X

The butterfly network: $\log_2 n$ stages, each doing $n/2$ butterfly operations. The total structure is a complete binary tree of DFT computations.

Polynomial Multiplication

Here is the application that makes FFT indispensable beyond signal processing.

You have two polynomials of degree $n$:

$$A(x) = a_0 + a_1 x + \cdots + a_n x^n$$ $$B(x) = b_0 + b_1 x + \cdots + b_n x^n$$

Their product $C = A \cdot B$ has degree $2n$. Computing the $2n+1$ coefficients of $C$ by the schoolbook method takes $O(n^2)$ multiplications.

FFT does it in $O(n \log n)$ via the evaluation-interpolation strategy:

Evaluate $A$ and $B$ at $2n+1$ points simultaneously (using FFT): $O(n \log n)$.
Pointwise multiply: $A(x_i) \cdot B(x_i) = C(x_i)$ for each point. $O(n)$.
Interpolate to recover the coefficients of $C$ (using inverse FFT): $O(n \log n)$.

The natural evaluation points are the $n$-th roots of unity - exactly what the DFT computes. That’s why FFT and polynomial multiplication are so tightly linked.

This isn’t just theoretical. Multiplying two large integers is polynomial multiplication over base 10 (or base $2^{32}$). Computer algebra systems, cryptographic libraries, and competitive programming all use FFT-based multiplication for large numbers.

Discomfort check. The FFT only works for $n$ a power of 2. What if your signal has a different length?

Zero-pad to the next power of 2. An audio frame of 1000 samples becomes 1024. The extra zeros don’t affect the signal; they just make the algorithm applicable. For sizes that are highly composite (products of small primes), there are analogous “mixed-radix” FFT algorithms. For prime $n$, Bluestein’s algorithm or Rader’s algorithm handle the case, though they’re rarely needed in practice because designers choose power-of-2 sizes.

The Inverse FFT

To recover the signal $x$ from its frequency representation $X$, the inverse DFT is:

$$x[j] = \frac{1}{n} \sum_{k=0}^{n-1} X[k] e^{2\pi i jk/n}$$

Notice: the inverse DFT is almost identical to the forward DFT - just conjugate the exponentials and divide by $n$. In code: conjugate the input, run the forward FFT, conjugate the output, divide by $n$. No separate algorithm needed.

Applications

Audio and image compression. MP3 uses the modified DCT (closely related to the DFT) to convert audio samples to frequency coefficients. JPEG uses the DCT on $8 \times 8$ pixel blocks. Both throw away high-frequency components (imperceptible to humans) for compression. The FFT makes this fast enough to run on consumer hardware.

Signal filtering. To convolve a signal with a filter kernel (e.g., applying an equalizer or a blur), computing the convolution directly is $O(nm)$. Via FFT, it’s $O(n \log n)$: transform both to frequency domain, pointwise multiply, transform back. This is the convolution theorem: convolution in time = multiplication in frequency.

Number-theoretic transform (NTT). For cryptography - particularly lattice-based cryptography - you need exact integer arithmetic without floating-point errors. Replace complex exponentials with roots of unity modulo a prime $p$. Same $O(n \log n)$ algorithm, exact integer results. The prime $p = 998244353 = 119 \cdot 2^{23} + 1$ is popular because it supports NTTs up to size $2^{23}$.

Solving PDEs. Many differential equations (heat equation, wave equation, Schrödinger equation) become algebraic in the frequency domain. Spectral methods solve them by transforming to frequency domain, solving algebraically, and transforming back - all via FFT.

Summary

Concept	Detail
DFT definition	$X[k] = \sum_{j=0}^{n-1} x[j] e^{-2\pi i jk/n}$
Naive DFT cost	$O(n^2)$
FFT cost	$O(n \log n)$
Key trick	Split even/odd indices, recurse, combine with butterfly
Butterfly operation	$X[k] = E[k] + \omega^k O[k]$; $X[k+n/2] = E[k] - \omega^k O[k]$
Polynomial multiplication	FFT evaluate + pointwise multiply + IFFT interpolate = $O(n \log n)$
Requires	$n$ a power of 2 (or zero-pad)
Inverse FFT	Conjugate, forward FFT, conjugate, divide by $n$

Read next:

Expectation-Maximization - Alternating Between Guessing and Fitting