Fourier Analysis
Prerequisite:
Motivation: Decomposing Signals into Frequencies
Every signal - a sound wave, an image row, an electrical current - can be thought of as a superposition of pure sinusoidal oscillations. Fourier analysis makes this precise. Instead of asking “what is the value of the signal at time $t$?”, we ask “how much of each frequency is present?”. The answer lives in the frequency domain, and the round trip between domains is governed by the Fourier transform.
This perspective is not merely convenient. It turns convolution (a global, expensive operation in the time domain) into pointwise multiplication in the frequency domain, and it reveals conserved quantities like energy that are not obvious from the time-domain representation.
Fourier Series: Periodic Functions
Definition
Let $f: \mathbb{R} \to \mathbb{R}$ be a $T$-periodic function (i.e., $f(x+T) = f(x)$ for all $x$). Let $\omega_0 = 2\pi/T$ be the fundamental angular frequency. The Fourier series of $f$ is
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^\infty \left(a_n \cos(n\omega_0 x) + b_n \sin(n\omega_0 x)\right)$$
where the Fourier coefficients are
$$a_n = \frac{2}{T}\int_0^T f(x)\cos(n\omega_0 x)\,dx, \quad n = 0, 1, 2, \ldots$$
$$b_n = \frac{2}{T}\int_0^T f(x)\sin(n\omega_0 x)\,dx, \quad n = 1, 2, \ldots$$
The formula for $a_0$ gives twice the mean value of $f$ over one period; the factor of $1/2$ in the series corrects for this.
Complex Exponential Form
Using Euler’s identity $e^{i\theta} = \cos\theta + i\sin\theta$, we can unify the real form. Define the complex Fourier coefficients
$$c_n = \frac{1}{T}\int_0^T f(x)\,e^{-in\omega_0 x}\,dx, \quad n \in \mathbb{Z}$$
Then
$$f(x) = \sum_{n=-\infty}^\infty c_n e^{in\omega_0 x}$$
Note $c_n = \frac{1}{2}(a_n - ib_n)$ for $n > 0$, $c_0 = a_0/2$, and $c_{-n} = \overline{c_n}$ when $f$ is real-valued. The complex form is cleaner for analysis and leads directly to the DFT.
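As a quick numerical sanity check, the coefficients $c_n$ of a unit square wave (period $T = 1$, values $\pm 1$; an assumed example, not taken from the text) can be approximated with a midpoint rule. For odd $n$ they should come out near $-2i/(\pi n)$, and $c_{-n} = \overline{c_n}$ since the wave is real-valued:

```python
import numpy as np

# Complex Fourier coefficients of a square wave with period T = 1:
# f(x) = +1 on [0, 1/2), -1 on [1/2, 1).  Analytically c_n = -2i/(pi n)
# for odd n and c_n = 0 for even nonzero n (worked out by hand for this
# illustrative example).
N = 100_000                      # midpoint-rule sample count
x = (np.arange(N) + 0.5) / N     # midpoints avoid the jump at x = 1/2
f = np.where(x < 0.5, 1.0, -1.0)

def c(n):
    """Approximate c_n = (1/T) * integral of f(x) e^{-2 pi i n x} dx."""
    return np.mean(f * np.exp(-2j * np.pi * n * x))

print(c(1), c(2))                # c_1 near -2i/pi, c_2 near 0
```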
Spectrum of a square wave (first 5 harmonics; only odd $n$ appear, with $|c_n| = \frac{2}{\pi |n|}$):

```
 |c_n|
 0.64 |          #   #
      |          #   #
 0.21 |      #   #   #   #
 0.13 |  #   #   #   #   #   #
      +--+---+---+---+---+---+--> n
        -5  -3  -1   1   3   5
```
Convergence: Dirichlet Conditions
Theorem (Dirichlet). If $f$ is $T$-periodic, has finitely many discontinuities and finitely many local extrema on $[0,T]$, and $\int_0^T |f(x)|\,dx < \infty$, then the Fourier series converges to $f(x)$ at every point of continuity, and to $\frac{1}{2}[f(x^+) + f(x^-)]$ at each jump discontinuity.
Gibbs Phenomenon. At a jump discontinuity, the partial sums of the Fourier series overshoot by approximately $9\%$ of the jump size, regardless of how many terms are included. This overshoot does not vanish as more terms are added; it only narrows. This is a fundamental obstacle to approximating discontinuous signals uniformly by Fourier series.
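The overshoot can be seen directly by evaluating partial sums near a jump. A minimal sketch, assuming the square-wave series $\sum_{\text{odd } n \le M} \frac{4}{\pi n}\sin(2\pi n x)$, whose peak value should approach $\frac{2}{\pi}\,\mathrm{Si}(\pi) \approx 1.179$ (an overshoot of about $9\%$ of the jump size $2$):

```python
import numpy as np

# Partial sums S_M of the square-wave series  sum over odd n <= M of
# (4/(pi n)) sin(2 pi n x).  Near the jump at x = 0 the peak approaches
# (2/pi)*Si(pi) ~ 1.179; it narrows with M but does not shrink.
x = np.linspace(0.0, 0.1, 20_001)      # dense grid just right of the jump

def partial_sum(M):
    S = np.zeros_like(x)
    for n in range(1, M + 1, 2):       # odd harmonics only
        S += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * x)
    return S

peaks = [partial_sum(M).max() for M in (9, 99, 999)]
print(peaks)                           # each close to ~1.18
```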
The Fourier Transform: Aperiodic Functions
For functions defined on all of $\mathbb{R}$ that are not periodic, we take $T \to \infty$. The sum over discrete frequencies becomes an integral over a continuum.
Definition
Definition. For $f \in L^1(\mathbb{R})$, the Fourier transform of $f$ is
$$\hat{f}(\xi) = \int_{-\infty}^\infty f(x)\,e^{-2\pi i \xi x}\,dx$$
and the inverse Fourier transform is
$$f(x) = \int_{-\infty}^\infty \hat{f}(\xi)\,e^{2\pi i \xi x}\,d\xi$$
Here $\xi$ is the frequency in Hz (cycles per unit). An equivalent convention uses angular frequency $\omega = 2\pi\xi$; many physics and engineering texts use this, changing the factors of $2\pi$ in the exponent.
Key Properties
| Property | Statement |
|---|---|
| Linearity | $\widehat{\alpha f + \beta g} = \alpha\hat{f} + \beta\hat{g}$ |
| Time shift | $\widehat{f(x-x_0)}(\xi) = e^{-2\pi i \xi x_0}\hat{f}(\xi)$ |
| Frequency shift | $\widehat{e^{2\pi i \xi_0 x}f(x)}(\xi) = \hat{f}(\xi - \xi_0)$ |
| Scaling | $\widehat{f(ax)}(\xi) = \frac{1}{\lvert a\rvert}\,\hat{f}\!\left(\tfrac{\xi}{a}\right)$ |
| Duality | $\widehat{\hat{f}}(\xi) = f(-\xi)$ |
| Differentiation | $\widehat{f'}(\xi) = 2\pi i\xi\,\hat{f}(\xi)$ |
The scaling property captures an important trade-off: compressing a signal in time stretches its spectrum, and vice versa.
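These identities have exact discrete analogues that are easy to test. A sketch of the time-shift property using NumPy's FFT: a circular shift by $m$ samples multiplies the DFT by $e^{-2\pi i k m/N}$ (the test signal here is an arbitrary assumption):

```python
import numpy as np

# Discrete analogue of the time-shift property: circularly shifting a
# sampled signal by m samples multiplies its DFT by exp(-2 pi i k m / N).
rng = np.random.default_rng(0)
N, m = 64, 5
x = rng.standard_normal(N)

lhs = np.fft.fft(np.roll(x, m))                   # DFT of shifted signal
k = np.arange(N)
rhs = np.exp(-2j * np.pi * k * m / N) * np.fft.fft(x)

print(np.max(np.abs(lhs - rhs)))                  # close to machine precision
```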
Parseval’s Theorem
Theorem (Parseval/Plancherel). For $f, g \in L^2(\mathbb{R})$,
$$\int_{-\infty}^\infty f(x)\overline{g(x)}\,dx = \int_{-\infty}^\infty \hat{f}(\xi)\overline{\hat{g}(\xi)}\,d\xi$$
In particular, setting $f = g$: $\|f\|_2^2 = \|\hat{f}\|_2^2$. The Fourier transform is an isometry on $L^2(\mathbb{R})$: it preserves the total energy of a signal.
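The same energy identity holds for sampled signals (with a $1/N$ factor under NumPy's unnormalized DFT convention), which makes it easy to verify numerically:

```python
import numpy as np

# Discrete Parseval check: sum |x[n]|^2 equals (1/N) sum |X[k]|^2
# under NumPy's unnormalized DFT convention.
rng = np.random.default_rng(1)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)
print(energy_time, energy_freq)      # equal up to roundoff
```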
Convolution Theorem
Definition. The convolution of $f$ and $g$ is $(f * g)(x) = \int_{-\infty}^\infty f(t)g(x-t)\,dt$.
Theorem (Convolution). $\widehat{f * g}(\xi) = \hat{f}(\xi)\cdot\hat{g}(\xi)$.
Equivalently, $\widehat{f \cdot g}(\xi) = (\hat{f} * \hat{g})(\xi)$. Multiplication and convolution are dual operations under the Fourier transform. This is why applying a linear filter (convolution with a kernel) is equivalent to pointwise multiplication in the frequency domain - the foundation of frequency-domain filtering.
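This duality is the basis of FFT convolution. A sketch, assuming zero-padding to the full linear-convolution length (the helper name `fft_convolve` is ours):

```python
import numpy as np

# FFT-based convolution: zero-pad both sequences to the full linear-
# convolution length, multiply spectra pointwise, then invert.
def fft_convolve(f, g):
    n = len(f) + len(g) - 1          # full linear-convolution length
    F = np.fft.fft(f, n)             # fft(x, n) zero-pads x to length n
    G = np.fft.fft(g, n)
    return np.fft.ifft(F * G).real

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
print(fft_convolve(f, g))            # matches np.convolve(f, g)
```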
Riemann-Lebesgue Lemma
Theorem (Riemann-Lebesgue). If $f \in L^1(\mathbb{R})$, then $\hat{f}(\xi) \to 0$ as $|\xi| \to \infty$.
Intuitively, rapidly oscillating exponentials average out against any integrable function. This means every physical signal has a Fourier transform that decays at high frequencies.
The Uncertainty Principle
Theorem (Heisenberg Uncertainty Principle). Let $f \in L^2(\mathbb{R})$ with $\|f\|_2 = 1$. Define the spread in position $\sigma_x^2 = \int x^2 |f(x)|^2\,dx$ and the spread in frequency $\sigma_\xi^2 = \int \xi^2 |\hat{f}(\xi)|^2\,d\xi$. Then
$$\sigma_x \cdot \sigma_\xi \geq \frac{1}{4\pi}$$
Equality holds if and only if $f$ is a Gaussian: $f(x) = C e^{-\alpha x^2}$. You cannot simultaneously localize a signal in both time and frequency - this is not a limitation of measurement but a mathematical fact about functions and their Fourier transforms.
Time-frequency trade-off:

```
Narrow in time                    Wide in time
(sharp impulse)                   (slow oscillation)
  |                                 |
  |   ##                            | # # # # # # # #
  +-----------> t                   +-----------> t

Wide spectrum                     Narrow spectrum
  |                                 |
  |###########                      |     ###
  +-----------> xi                  +-----------> xi
```
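The equality case can be checked numerically. A sketch with $\alpha = 1$, using the closed-form transform of the Gaussian under the convention above; the grids and step sizes are arbitrary choices:

```python
import numpy as np

# With alpha = 1, f(x) = (2/pi)^{1/4} e^{-x^2} has unit L2 norm, and
# under the 2*pi-in-the-exponent convention its transform is
# fhat(xi) = (2/pi)^{1/4} sqrt(pi) e^{-pi^2 xi^2}.  The spreads should
# satisfy sigma_x * sigma_xi = 1/(4*pi) exactly (equality case).
dx = 1e-3
x = np.arange(-10, 10, dx)
f = (2 / np.pi) ** 0.25 * np.exp(-x ** 2)

dxi = 1e-4
xi = np.arange(-3, 3, dxi)
fhat = (2 / np.pi) ** 0.25 * np.sqrt(np.pi) * np.exp(-np.pi ** 2 * xi ** 2)

# Riemann-sum approximations of the two variances
sigma_x = np.sqrt(np.sum(x ** 2 * f ** 2) * dx)      # should be 1/2
sigma_xi = np.sqrt(np.sum(xi ** 2 * fhat ** 2) * dxi)  # should be 1/(2*pi)
print(sigma_x * sigma_xi, 1 / (4 * np.pi))
```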
Discrete Fourier Transform (DFT)
In practice, signals are sampled at discrete time points. Given a finite sequence $x[0], x[1], \ldots, x[N-1]$, the Discrete Fourier Transform is
$$X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi i kn/N}, \quad k = 0, 1, \ldots, N-1$$
and the inverse DFT is
$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{2\pi i kn/N}$$
The DFT matrix $F$ has entries $F_{kn} = e^{-2\pi i kn/N}$; the normalized matrix $F/\sqrt{N}$ is unitary, and the inverse DFT is multiplication by $\frac{1}{N}\overline{F}$. Parseval’s theorem holds in discrete form: $\sum_n |x[n]|^2 = \frac{1}{N}\sum_k |X[k]|^2$.
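A direct implementation of the definition, checked against NumPy's FFT (which uses the same sign and normalization convention):

```python
import numpy as np

# Naive O(N^2) DFT straight from the definition, via the DFT matrix
# F_{kn} = exp(-2 pi i k n / N).
def dft(x):
    N = len(x)
    n = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix
    return F @ x

x = np.random.default_rng(2).standard_normal(8)
print(np.max(np.abs(dft(x) - np.fft.fft(x))))      # close to machine precision
```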
Fast Fourier Transform (FFT)
Naive computation of the DFT takes $O(N^2)$ operations. The Cooley-Tukey FFT algorithm reduces this to $O(N \log N)$ by exploiting the recursive structure of the DFT when $N$ is a power of 2.
Idea. Split the DFT of length $N$ into two DFTs of length $N/2$:
$$X[k] = \underbrace{\sum_{n=0}^{N/2-1} x[2n]\,e^{-2\pi i k(2n)/N}}_{\text{DFT of even-indexed terms}} + e^{-2\pi ik/N}\underbrace{\sum_{n=0}^{N/2-1} x[2n+1]\,e^{-2\pi i k(2n)/N}}_{\text{DFT of odd-indexed terms}}$$
The factor $W_N^k = e^{-2\pi ik/N}$ is called the twiddle factor. Applying this recursion $\log_2 N$ times yields the butterfly diagram: a network of roughly $\frac{N}{2}\log_2 N$ complex multiplications.
FFT butterfly (N = 4, inputs in bit-reversed order; $W = e^{-2\pi i/4}$, and $E$, $O$ are the length-2 DFTs of the even- and odd-indexed samples):

```
x[0] --(+)--> E[0] ---------> X[0] = E[0] + O[0]
x[2] --(-)--> E[1] ---------> X[1] = E[1] + W*O[1]
x[1] --(+)--> O[0] ---------> X[2] = E[0] - O[0]
x[3] --(-)--> O[1] ---------> X[3] = E[1] - W*O[1]
```
For $N = 10^6$, the FFT needs roughly $50{,}000\times$ fewer operations than the naive DFT. This single algorithm is responsible for real-time audio processing, radar, MRI reconstruction, and more.
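The even/odd recursion translates almost line-for-line into code. A minimal radix-2 sketch (assumes $N$ is a power of 2; not optimized), checked against NumPy:

```python
import numpy as np

# Minimal recursive radix-2 Cooley-Tukey FFT, following the even/odd
# split: X[k] = E[k] + W^k O[k] and X[k + N/2] = E[k] - W^k O[k].
def fft(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    E = fft(x[0::2])                                  # even-indexed DFT
    O = fft(x[1::2])                                  # odd-indexed DFT
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors
    return np.concatenate([E + W * O, E - W * O])

x = np.random.default_rng(3).standard_normal(16)
print(np.max(np.abs(fft(x) - np.fft.fft(x))))         # close to machine precision
```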
Applications and CS Perspective
Filtering. A low-pass filter zeroes out high-frequency components of $\hat{f}$. In the time domain this is convolution with a kernel; in the frequency domain it is pointwise multiplication. The FFT makes this fast: compute FFT, multiply, inverse FFT.
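A sketch of this pipeline, assuming an illustrative 5 Hz + 120 Hz test signal and a 50 Hz cutoff (both arbitrary choices):

```python
import numpy as np

# Frequency-domain low-pass filtering: FFT, zero the bins above a
# cutoff, inverse FFT.
fs = 1000                                    # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

X = np.fft.fft(signal)
freqs = np.fft.fftfreq(len(signal), d=1 / fs)
X[np.abs(freqs) > 50] = 0                    # zero everything above 50 Hz
filtered = np.fft.ifft(X).real               # the 120 Hz component is gone

print(np.max(np.abs(filtered - np.sin(2 * np.pi * 5 * t))))
```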
Solving PDEs. The Fourier transform converts a PDE (like the heat equation $u_t = u_{xx}$) into an ODE in frequency space: $\hat{u}_t(\xi, t) = -(2\pi\xi)^2 \hat{u}(\xi, t)$, which is solved by $\hat{u}(\xi, t) = \hat{u}_0(\xi),e^{-(2\pi\xi)^2 t}$.
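A sketch of this spectral solve on a periodic grid; the initial condition and time step are illustrative assumptions, chosen so the answer can be compared against the exact mode-by-mode decay:

```python
import numpy as np

# Spectral solve of the heat equation u_t = u_xx on a periodic domain:
# each Fourier mode decays by exp(-(2 pi xi)^2 t).
L, N, t = 1.0, 256, 0.001
x = np.arange(N) * L / N
u0 = np.sin(2 * np.pi * x) + 0.3 * np.sin(2 * np.pi * 7 * x)

xi = np.fft.fftfreq(N, d=L / N)              # frequencies in cycles/unit
u_hat = np.fft.fft(u0) * np.exp(-(2 * np.pi * xi) ** 2 * t)
u = np.fft.ifft(u_hat).real

# Exact solution: the k-th sine mode decays by exp(-(2 pi k)^2 t)
exact = (np.exp(-(2 * np.pi) ** 2 * t) * np.sin(2 * np.pi * x)
         + 0.3 * np.exp(-(2 * np.pi * 7) ** 2 * t) * np.sin(2 * np.pi * 7 * x))
print(np.max(np.abs(u - exact)))
```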
Attention Mechanisms. The output of dot-product attention, $\mathrm{softmax}(QK^T/\sqrt{d})\,V$, can be loosely interpreted as a learnable spectral weighting: the query-key product selects which components of the values to amplify. Linear-attention approximations explicitly use kernel approximations related to random Fourier features.
GPU FFT. Libraries like cuFFT implement the Cooley-Tukey algorithm on GPU grids, processing batches of transforms simultaneously. This is critical for training diffusion models (noise schedules in frequency space), image-to-image networks, and spectral graph networks.
Audio and Image Processing. MP3 uses the modified discrete cosine transform (MDCT), a close relative of the DFT. JPEG uses the DCT on $8\times 8$ blocks. Mel-frequency cepstral coefficients (MFCCs) for speech recognition are computed from the log-magnitude DFT spectrum filtered by mel-scale banks.
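The energy-compaction idea behind JPEG can be seen on a single block. A sketch using SciPy's type-II DCT on an illustrative smooth ramp (not real image data):

```python
import numpy as np
from scipy.fft import dctn, idctn

# JPEG-style 8x8 block transform: the 2-D type-II DCT concentrates a
# smooth block's energy in a few low-frequency coefficients.
block = np.add.outer(np.arange(8.0), np.arange(8.0))   # smooth ramp i + j

coeffs = dctn(block, norm='ortho')
recon = idctn(coeffs, norm='ortho')                    # exact round trip

# Fraction of (absolute) coefficient mass in the top-left 2x2 corner
low = np.abs(coeffs[:2, :2]).sum()
total = np.abs(coeffs).sum()
print(low / total)                                     # close to 1
```

JPEG quantizes these coefficients, discarding most of the high-frequency ones, which is why smooth regions compress so well.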
Read Next: