Helpful context:


In 1822, Joseph Fourier made a claim that shocked the mathematical world: any function - no matter how complicated, no matter how discontinuous - can be written as a sum of sines and cosines.

This sounded absurd. A sum of smooth, infinitely differentiable functions should be smooth. How could a discontinuous function - one that jumps - be expressed as a sum of sine waves that never jump? The claim was disputed for decades by some of the most capable mathematicians of the 19th century.

The careful working-out of exactly what “any function” means and exactly what “written as” means consumed a century of mathematics. It produced, as necessary byproducts, the modern theory of integration (Lebesgue), the notion of convergence for functions (pointwise vs. $L^2$ vs. uniform), and much of real analysis. All of this, to make Fourier’s claim precise and correct.

The core idea turned out to be correct enough to be the foundation of signal processing, data compression, quantum mechanics, and the analysis of partial differential equations. Almost everywhere you encounter a wave or a frequency, you find Fourier.


Section 1: Why Sines and Cosines?

Before we can answer this question, consider what the alternatives are. A signal can be decomposed into many different bases: step functions, wavelets, polynomial splines, random features. Why sines and cosines specifically?

Three separate answers converge to the same conclusion.

Before asking how to decompose functions into sines and cosines, ask why you would want to.

First answer: eigenfunctions of differentiation. The answer lies in differential equations. Many physical systems satisfy equations of the form:

$$\frac{d^2 y}{dt^2} + \omega^2 y = f(t)$$

(a harmonic oscillator driven by a force $f$), or more generally, equations involving derivatives of $y$. If $f(t)$ is complicated, solving this looks hard.

But suppose $f(t) = A\sin(\nu t)$ for some specific frequency $\nu$. Then you can guess that $y$ also oscillates at frequency $\nu$, substitute, and reduce the problem to algebra.

The reason this strategy works is deeper than it first appears. Sines and cosines are eigenfunctions of the differentiation operator $d/dx$:

$$\frac{d}{dx}\sin(\omega x) = \omega\cos(\omega x), \qquad \frac{d^2}{dx^2}\sin(\omega x) = -\omega^2\sin(\omega x).$$

Differentiating $\sin(\omega x)$ twice returns $\sin(\omega x)$, multiplied by the constant $-\omega^2$. This means: in a system governed by a differential equation, each sinusoidal frequency component $\sin(\omega x)$ evolves independently - governed by a scalar equation involving only that frequency, not entangled with other frequencies.

If you can decompose any input $f(t)$ into sinusoidal components, you can analyze each frequency independently, then reassemble. The differential equation becomes: analyze each frequency separately, solve the simple scalar equation for that frequency, sum the responses. This is the superposition principle for linear systems, and sinusoids are the natural language for it.

Second answer: translation invariance. A linear, translation-invariant system treats the same input the same way regardless of when it arrives. The eigenfunctions of any such system are complex exponentials - this is a theorem. Since most physical systems have translation symmetry (a circuit or filter does not care whether you start the signal at $t = 0$ or $t = 1000$), sinusoids are the natural basis.

Third answer: they are orthogonal. The sinusoids form an orthogonal basis for periodic functions. Orthogonal bases make coefficient extraction easy (inner products). Non-orthogonal bases require solving a system of equations. This computational convenience is a genuine advantage. The orthogonality is not an accident - it is a consequence of the translation invariance and the group structure of the circle.


Section 2: Trigonometric Identities as Fourier Facts

Many identities from trigonometry are Fourier facts in disguise.

The product-to-sum identity $\cos(mx)\cos(nx) = \frac{1}{2}[\cos((m-n)x) + \cos((m+n)x)]$ is the basis for the orthogonality of the cosines: integrating over $[-\pi, \pi]$, the right side integrates to zero unless $m = n$. Every orthogonality relation follows from this.

Parseval’s theorem $\sum_{n=-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int|f|^2 dx$ is the Pythagorean theorem in function space. For a pure sinusoid $f(x) = \cos(kx)$: $|c_k|^2 + |c_{-k}|^2 = 1/4 + 1/4 = 1/2$, and $\frac{1}{2\pi}\int_{-\pi}^{\pi}\cos^2(kx) dx = 1/2$. They match.

Multiplication by $e^{ikx}$ shifts all Fourier coefficients by $k$: if $f$ has coefficients $\{c_n\}$, then $f(x)e^{ikx}$ has coefficients $c_{n-k}$. This is the frequency-shift property. The angle addition formula $e^{i(\alpha+\beta)} = e^{i\alpha}e^{i\beta}$ is the same fact: adding arguments corresponds to shifting frequency.


Section 3: Orthogonality - The Conceptual Key

Here is the frame that makes everything else click.

You know how to decompose a vector $\mathbf{v} \in \mathbb{R}^n$ in terms of an orthonormal basis $\{e_1, \ldots, e_n\}$:

$$\mathbf{v} = \sum_{k=1}^n (\mathbf{v} \cdot e_k) e_k.$$

The coefficient of $e_k$ is the dot product $\mathbf{v} \cdot e_k$. Orthogonality of the basis ($e_j \cdot e_k = 0$ for $j \neq k$) is what makes this work: dotting $\mathbf{v}$ with $e_k$ isolates the $k$-th component.

Now define an inner product on functions. For $f, g: [-\pi, \pi] \to \mathbb{R}$:

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x) dx.$$

This satisfies the axioms of an inner product: linearity in $f$, symmetry in $f$ and $g$, and $\langle f, f\rangle \geq 0$ with equality only if $f = 0$ (in an $L^2$ sense). Functions form a (infinite-dimensional) vector space; this integral is the notion of inner product on that space.

The claim: sinusoidal functions $\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}$ are mutually orthogonal under this inner product. Verify:

$$\int_{-\pi}^{\pi} \cos(mx)\cos(nx) dx = \begin{cases} 0 & m \neq n \\ \pi & m = n \neq 0 \\ 2\pi & m = n = 0 \end{cases}$$

$$\int_{-\pi}^{\pi} \sin(mx)\sin(nx) dx = \begin{cases} 0 & m \neq n \\ \pi & m = n \neq 0 \end{cases}$$

$$\int_{-\pi}^{\pi} \cos(mx)\sin(nx) dx = 0 \quad \text{for all } m, n.$$

These are the orthogonality relations. They follow from product-to-sum identities: $\cos(mx)\cos(nx) = \frac{1}{2}[\cos((m-n)x) + \cos((m+n)x)]$. Integrate over $[-\pi, \pi]$: the integral of $\cos(kx)$ over a full period is zero whenever $k \neq 0$. The integral is $2\pi$ when $k = 0$.

Fourier’s idea: the sinusoidal functions form an orthogonal “basis” for periodic functions, exactly as $\{e_1, \ldots, e_n\}$ is a basis for $\mathbb{R}^n$. Finding Fourier coefficients is exactly “projecting onto basis elements” - the same operation as taking dot products. This is not an analogy; it is the same mathematical operation.

Discomfort check. How does the integral formula for Fourier coefficients extract the right value? Suppose $f(x) = \sum_n (a_n \cos nx + b_n \sin nx)$. Multiply both sides by $\cos(mx)$ and integrate from $-\pi$ to $\pi$. On the right, every term $a_n \int \cos(nx)\cos(mx) dx$ and $b_n \int \sin(nx)\cos(mx) dx$ vanishes by orthogonality - except $a_m \int \cos^2(mx) dx = a_m\pi$. So the integral on the left equals $a_m\pi$, giving $a_m = (1/\pi)\int f(x)\cos(mx) dx$. The orthogonality of cosines is the entire mechanism. Without it, multiplying by $\cos(mx)$ and integrating would not isolate $a_m$ - you would pick up contributions from every other coefficient.


Section 4: What the Coefficients Tell You

Before computing examples, pause to interpret what the Fourier coefficients actually measure.

The $n$-th complex Fourier coefficient is $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx} dx$. Think of $e^{inx}$ as an oscillation at frequency $n$ (cycles per $2\pi$). The coefficient $c_n$ is the inner product of $f$ with this oscillation.

If $f$ has a large component oscillating at frequency $n$, the integral $\int f(x)e^{-inx} dx$ will be large (the two functions are “in sync” and reinforce). If $f$ has little content at frequency $n$, the integral will be small (positive and negative contributions cancel by orthogonality).

The magnitude $|c_n|$ measures the amplitude of the frequency-$n$ component. The argument $\arg(c_n)$ measures the phase - where in the oscillation cycle the component peaks.

The spectrum of $f$ is the plot of $|c_n|$ vs. $n$. For the square wave: only odd harmonics $n = 1, 3, 5, \ldots$ have nonzero coefficients, with $|c_n| = 2/(n\pi)$. For a pure sinusoid $f(x) = \sin(kx)$: only $c_k$ and $c_{-k}$ are nonzero. For a random noise signal: all coefficients are roughly the same magnitude.

Parseval’s theorem for series: $\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2 dx = \sum_{n=-\infty}^{\infty}|c_n|^2$. Total energy in the signal equals total energy across all frequency components. No energy is created or destroyed by the Fourier decomposition.


Section 5: Fourier Series

A function $f$ with period $2\pi$ can be written as:

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n\cos(nx) + b_n\sin(nx)\right).$$

The Fourier coefficients are:

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx) dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx) dx.$$

The $a_0/2$ term is the average value of $f$ over one period ($a_0 = (1/\pi)\int_{-\pi}^{\pi} f(x) dx$, so $a_0/2$ is the mean).

Example: the square wave. Define $f(x) = 1$ for $0 < x < \pi$ and $f(x) = -1$ for $-\pi < x < 0$, extended periodically. This is an odd function (antisymmetric: $f(-x) = -f(x)$), so all cosine coefficients vanish: $a_n = 0$ (cosine is even, odd times even integrates to zero).

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx) dx = \frac{2}{\pi}\int_0^{\pi} \sin(nx) dx = \frac{2}{\pi}\left[-\frac{\cos(nx)}{n}\right]_0^{\pi} = \frac{2}{n\pi}(1 - \cos(n\pi)).$$

When $n$ is even: $\cos(n\pi) = 1$, so $b_n = 0$. When $n$ is odd: $\cos(n\pi) = -1$, so $b_n = 4/(n\pi)$.

$$f(x) = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \frac{\sin 7x}{7} + \cdots\right).$$

The square wave is built entirely from odd harmonics with amplitudes decaying as $1/n$. Adding more terms sharpens the approximation, but near the jump discontinuity something surprising happens - more on this in Section 8.

Example: the sawtooth wave. $f(x) = x$ on $(-\pi, \pi)$, extended periodically. Also odd, so $a_n = 0$.

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin(nx) dx = \frac{2(-1)^{n+1}}{n}.$$

$$f(x) = 2\left(\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \cdots\right).$$

All harmonics appear, with amplitudes decaying as $1/n$ and alternating signs. These are the standard examples to build intuition with.


Section 6: Complex Form - Cleaner Notation

The real Fourier series separates cosines and sines. Using Euler’s formula $e^{inx} = \cos(nx) + i\sin(nx)$, we can combine them:

$$\cos(nx) = \frac{e^{inx} + e^{-inx}}{2}, \qquad \sin(nx) = \frac{e^{inx} - e^{-inx}}{2i}.$$

Substituting into the Fourier series and rearranging, all the real and imaginary pieces combine into:

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$$

where the sum runs over all integers (positive, negative, and zero). The coefficients are:

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx} dx.$$

The complex coefficients relate to the real ones by $c_n = (a_n - ib_n)/2$ for $n > 0$, $c_{-n} = (a_n + ib_n)/2$ for $n > 0$, and $c_0 = a_0/2$. For real $f$: $c_{-n} = \overline{c_n}$.

The complex form is more compact and often easier to compute with. The complex exponentials $\{e^{inx}\}$ are orthonormal under $\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)} dx$:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{imx}\overline{e^{inx}} dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(m-n)x} dx = \delta_{mn}.$$

This is the same orthogonality, repackaged.

Generalization to period $T$. For a function periodic with period $T$, let $\omega_0 = 2\pi/T$ be the fundamental frequency. The Fourier series is:

$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{in\omega_0 t}, \qquad c_n = \frac{1}{T}\int_0^T f(t)e^{-in\omega_0 t} dt.$$

The frequencies present are $0, \pm\omega_0, \pm 2\omega_0, \pm 3\omega_0, \ldots$ - discrete multiples of the fundamental.


Section 7: The Fourier Transform for Non-Periodic Functions

What if the function is not periodic at all?

Consider a function with period $T$. As $T \to \infty$, the fundamental frequency $\omega_0 = 2\pi/T \to 0$, and the discrete frequencies $n\omega_0$ become densely packed. The sum $\sum c_n e^{in\omega_0 t}$ becomes an integral over a continuous variable $\omega$. Carrying through this limit precisely:

$$F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t} dt.$$

This is the Fourier transform of $f(t)$. The output $F(\omega)$ is a complex-valued function of frequency $\omega$.

The inverse: if you know $F(\omega)$, you can recover $f(t)$:

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega) e^{i\omega t} d\omega.$$

The pair $(f, F)$ are Fourier transform partners. Each determines the other. $F(\omega)$ tells you how much of frequency $\omega$ is present in $f(t)$: specifically, $|F(\omega)|$ is the amplitude of the sinusoidal component at frequency $\omega$, and $\arg(F(\omega))$ is its phase.

Discomfort check. What does “frequency domain” mean physically? A signal $f(t)$ describes amplitude as a function of time. The Fourier transform $F(\omega)$ describes the same signal as a function of frequency $\omega$ (radians per second). $|F(\omega)|^2$ is the power spectral density - how much energy the signal carries at each frequency. A pure tone $f(t) = \cos(\omega_0 t)$ has a Fourier transform that is nonzero only at $\omega = \pm\omega_0$ (in the idealized sense of a delta function). A broadband signal like white noise has $|F(\omega)|^2$ roughly constant across all frequencies. The Fourier transform reveals the frequency structure of the signal.


Section 8: Key Properties and Their Meaning

The Fourier transform is a machine with predictable behavior. Each property of the machine translates an operation in one domain into an operation in the other.

Linearity. $\mathcal{F}\{af + bg\} = a\mathcal{F}\{f\} + b\mathcal{F}\{g\}$. Fourier respects superposition.

Time shift. $\mathcal{F}\{f(t - t_0)\}(\omega) = e^{-i\omega t_0} F(\omega)$. Shifting in time multiplies the transform by $e^{-i\omega t_0}$, changing the phase of each frequency component but not the amplitude. A delayed signal has the same frequency content but different phase.

Frequency shift (modulation). $\mathcal{F}\{e^{i\omega_0 t} f(t)\}(\omega) = F(\omega - \omega_0)$. Multiplying by a complex exponential in time shifts the spectrum in frequency. This is AM modulation: multiplying by $\cos(\omega_c t) = (e^{i\omega_c t} + e^{-i\omega_c t})/2$ shifts the spectrum to be centered at $\pm\omega_c$, allowing different signals to occupy different frequency bands without interference.

Differentiation. $\mathcal{F}\{f'(t)\}(\omega) = i\omega F(\omega)$. Differentiation in time becomes multiplication by $i\omega$ in frequency. This is why the Fourier transform turns differential equations into algebraic ones: $f'' + \omega^2 f = g$ becomes $-\omega^2 F + \omega^2 F = G$, which is solved immediately.

Scaling. $\mathcal{F}\{f(at)\}(\omega) = \frac{1}{|a|}F(\omega/a)$. Compressing in time ($a > 1$) stretches in frequency. This is the uncertainty principle in embryonic form: you cannot simultaneously make a function narrow in both time and frequency.

Convolution theorem. The convolution of $f$ and $g$ is:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) d\tau.$$

The convolution theorem states:

$$\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}.$$

Convolution in the time domain becomes pointwise multiplication in the frequency domain. This is the most important property. Convolution is a global operation ($O(N^2)$ for length-$N$ sequences). Multiplication is local and cheap. By transforming to the frequency domain, multiplying, and transforming back, you compute convolution in $O(N\log N)$ - this is why FFT-based convolution is used in audio processing, neural networks, and image filtering.

Parseval’s theorem. The Fourier transform preserves total energy:

$$\int_{-\infty}^{\infty} |f(t)|^2 dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |F(\omega)|^2 d\omega.$$

The energy in the time-domain representation equals the energy in the frequency-domain representation. The Fourier transform is an isometry of $L^2$ - it preserves the norm. Geometrically, it is a unitary transformation of the infinite-dimensional function space $L^2(\mathbb{R})$.


Section 9: The Discrete Fourier Transform

Computers operate on finite sequences, not continuous functions. For a sequence $x[0], x[1], \ldots, x[N-1]$, the Discrete Fourier Transform (DFT) is:

$$X[k] = \sum_{n=0}^{N-1} x[n] e^{-i2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1.$$

The inverse DFT recovers the sequence:

$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k] e^{i2\pi kn/N}.$$

The DFT takes $N$ complex numbers in, returns $N$ complex numbers. $X[k]$ measures how much of the $k$-th discrete frequency is present. The discrete frequencies correspond to oscillations with periods $N, N/2, N/3, \ldots, 1$ samples.

All properties of the continuous Fourier transform have DFT analogs: shift (circular shift of the sequence corresponds to phase multiplication), convolution (circular convolution in the sequence corresponds to pointwise multiplication in the DFT), and Parseval ($\sum |x[n]|^2 = (1/N) \sum |X[k]|^2$).

The DFT matrix. Write $\omega_N = e^{-i2\pi/N}$. The DFT can be written as a matrix multiplication:

$$\begin{pmatrix} X[0] \\ X[1] \\ \vdots \\ X[N-1] \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_N & \omega_N^2 & \cdots & \omega_N^{N-1} \\ \vdots & & & & \vdots \\ 1 & \omega_N^{N-1} & \omega_N^{2(N-1)} & \cdots & \omega_N^{(N-1)^2} \end{pmatrix} \begin{pmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{pmatrix}.$$

Naively, this matrix-vector product requires $O(N^2)$ multiplications.


Section 10: The DFT and Circular Convolution

The DFT has a subtle but important property: it treats the input sequence as periodic with period $N$. The sample $x[N]$ is identified with $x[0]$, $x[N+1]$ with $x[1]$, and so on.

This means the convolution that corresponds to multiplication in the DFT domain is circular convolution, not the linear convolution you might expect:

$$(x \circledast y)[n] = \sum_{k=0}^{N-1} x[k] y[(n-k) \bmod N].$$

In circular convolution, the sequence wraps around at the boundaries. This is different from linear convolution $\sum_k x[k]y[n-k]$ (which is finite and does not wrap).

Why this matters in practice. If you want to use the DFT (and FFT) to compute linear convolution, you must zero-pad: append enough zeros to both $x$ and $y$ to prevent wrap-around aliasing. If $x$ has length $L$ and $y$ has length $M$, their linear convolution has length $L + M - 1$. Zero-pad both to length $N \geq L + M - 1$ before computing the DFT. The circular convolution of the zero-padded sequences equals the linear convolution.

This zero-padding requirement is the reason FFT-based convolution (which computes linear convolution via DFT) has complexity $O(N\log N)$ rather than $O(N)$: you must pad to the next power of 2 (for efficient FFT), compute, then trim. But for large $N$, the $O(N\log N)$ cost still beats the $O(N^2)$ direct convolution.


Section 11: The Fast Fourier Transform

The Fast Fourier Transform (FFT) computes the DFT in $O(N\log N)$ rather than $O(N^2)$ by exploiting structure in the DFT matrix.

The key observation (Cooley-Tukey, 1965): the DFT of a length-$N$ sequence can be computed from the DFTs of its even-indexed and odd-indexed subsequences, each of length $N/2$.

Write:

$$X[k] = \sum_{n=0}^{N-1} x[n]\omega_N^{kn} = \sum_{n=0}^{N/2-1} x[2n]\omega_N^{2kn} + \omega_N^k \sum_{n=0}^{N/2-1} x[2n+1]\omega_N^{2kn}.$$

Since $\omega_N^2 = e^{-i2\pi \cdot 2/N} = e^{-i2\pi/(N/2)} = \omega_{N/2}$:

$$X[k] = \underbrace{\sum_{n=0}^{N/2-1} x[2n]\omega_{N/2}^{kn}}_{E[k]} + \omega_N^k \underbrace{\sum_{n=0}^{N/2-1} x[2n+1]\omega_{N/2}^{kn}}_{O[k]}.$$

Both $E[k]$ and $O[k]$ are DFTs of length $N/2$. And because $E[k]$ and $O[k]$ are periodic with period $N/2$:

$$X[k] = E[k \bmod N/2] + \omega_N^k O[k \bmod N/2].$$

This gives $X[k]$ for all $k = 0, \ldots, N-1$ from the two half-size DFTs. Apply the same split recursively to $E$ and $O$. The recurrence for the number of operations is $T(N) = 2T(N/2) + O(N)$, which solves to $T(N) = O(N\log N)$.

The practical difference is enormous. For $N = 10^6$ samples: $O(N^2)$ is $10^{12}$ operations; $O(N\log N)$ is roughly $2 \times 10^7$. A difference of five orders of magnitude. The FFT is one of the most important algorithms ever devised - it transformed signal processing from a theoretical subject into a practical one.


Section 12: The Gibbs Phenomenon

Take the square wave’s Fourier series and compute partial sums:

$$S_N(x) = \frac{4}{\pi}\sum_{k=0}^{N-1} \frac{\sin((2k+1)x)}{2k+1}.$$

At points where $f$ is continuous, $S_N(x) \to f(x)$ as $N \to \infty$. But near the jump discontinuity at $x = 0$: for every $N$, the partial sum $S_N$ overshoots the correct value by approximately $9%$ of the jump height.

Specifically, $S_N$ reaches a maximum near $x = \pi/N$ of approximately $\frac{4}{\pi}\int_0^{\pi} \frac{\sin t}{t} dt \approx 1.179$, compared to the true value of $1$. An overshoot of about $17.9%$ of the half-jump (or $8.9%$ of the full jump). This $\approx 9%$ overshoot is the Gibbs phenomenon. It does not shrink as $N$ increases - it just moves closer to the discontinuity.

Why does this happen? The Fourier series of the square wave has coefficients $b_n \sim 4/(n\pi)$, decaying slowly as $1/n$. When you truncate at $N$ terms, you abruptly cut off high-frequency components. In the frequency domain, this is multiplication by a rectangular window (1 for $n \leq N$, 0 for $n > N$). Multiplication in frequency corresponds to convolution with the Fourier transform of the rectangular window - the Dirichlet kernel, which has oscillatory “ringing” near its edges. These oscillations produce the Gibbs overshoot.

Discomfort check. Is the Gibbs phenomenon a failure of Fourier analysis? No - it is a property of $L^2$ convergence near discontinuities. The series does converge: the $L^2$ error $\int |f - S_N|^2 dx \to 0$ as $N \to \infty$. But $L^2$ convergence does not imply pointwise convergence everywhere, and it does not imply uniform convergence near jumps. The Gibbs overshoot concentrates into a region of width $O(1/N)$ around the discontinuity, and that region contributes negligible $L^2$ error. The convergence is real; the jump behavior is also real.

In practice, the Gibbs phenomenon limits the usefulness of rectangular windows in signal processing. Engineers use smoother windows (Hann, Hamming, Kaiser-Bessel) that taper to zero at the edges, reducing the overshoot at the cost of slightly worse frequency resolution.


Section 13: Convergence - Three Different Things

Fourier’s original claim was that any function equals its Fourier series. The 19th century showed this requires care: “equals” can mean several different things.

Pointwise convergence. $S_N(x) \to f(x)$ for each fixed $x$. This holds at every point where $f$ is continuous. At a jump discontinuity, the series converges to the average of the left and right limits: $[f(x^-) + f(x^+)]/2$. This is called the Dirichlet condition for convergence: if $f$ is piecewise smooth (smooth except at isolated jump discontinuities), the Fourier series converges pointwise everywhere, with the average-of-limits behavior at jumps.

$L^2$ convergence (mean-square). $\int_{-\pi}^{\pi} |f(x) - S_N(x)|^2 dx \to 0$ as $N \to \infty$. This holds for all functions with finite energy ($\int|f|^2 < \infty$). The Gibbs overshoot exists pointwise but contributes zero error in $L^2$ as $N \to \infty$. This is the natural notion for energy-bearing signals.

Uniform convergence. $\max_x |f(x) - S_N(x)| \to 0$. The approximation is simultaneously good everywhere. This holds when $f$ is continuous and its Fourier coefficients decay fast enough - for instance, when $f$ is continuously differentiable. It fails for functions with jump discontinuities (the Gibbs phenomenon).

These convergences are genuinely different:

  • $L^2$ convergence does not imply pointwise convergence.
  • Pointwise convergence does not imply uniform convergence.
  • Uniform convergence implies pointwise and $L^2$ convergence.

How fast do Fourier coefficients decay? The smoothness of $f$ determines how fast $|c_n|$ decays as $|n| \to \infty$. The connection is:

  • If $f$ has a jump discontinuity: $|c_n| = O(1/n)$ (slow decay; $L^2$ convergence but not uniform).
  • If $f$ is continuous: $|c_n| = o(1/n)$ (by the Riemann-Lebesgue lemma).
  • If $f$ is $k$-times continuously differentiable: $|c_n| = O(1/n^k)$.
  • If $f$ is $C^\infty$ (infinitely differentiable): $|c_n|$ decays faster than any power of $n$.
  • If $f$ is analytic (holomorphic on a strip around the real axis): $|c_n|$ decays exponentially.

This decay rate determines both the convergence rate and the compression efficiency of Fourier representations.

Working out these distinctions precisely is what drove the development of Lebesgue integration and the foundations of modern analysis.


Section 14: Applications

Audio compression (MP3). The human auditory system is not uniformly sensitive to all frequencies, nor to all phases. MP3 encodes audio using a modified DCT (Discrete Cosine Transform, a real variant of the DFT) that decomposes short segments of audio into frequency components. More bits are allocated to frequency bands where the ear is sensitive; components masked by louder nearby frequencies receive fewer bits. The result: files roughly 10 times smaller than uncompressed audio, with little perceived quality loss. The entire scheme depends on the efficiency of frequency-domain representation.

Image compression (JPEG). An image is divided into $8 \times 8$ pixel blocks. Each block is transformed using the 2D DCT. Low-frequency coefficients (large-scale variations in brightness) are retained with high precision. High-frequency coefficients (fine texture) are quantized more aggressively. After quantization, most high-frequency coefficients are zero - the representation is sparse. JPEG2000 extends this using wavelets instead of DCT, achieving better quality especially at low bitrates.

Solving PDEs. The heat equation $\partial_t u = \partial_{xx} u$ on a periodic domain is solved by Fourier transforming in space. Each Fourier mode evolves independently: $\hat{u}(k, t) = \hat{u}(k, 0)e^{-k^2 t}$. High-frequency modes decay faster than low-frequency ones - this explains why heat smooths sharp temperature discontinuities rapidly. The spatial PDE becomes a family of ODEs, one per frequency - the entire power of Fourier’s method.

Quantum mechanics. The state of a quantum particle is a complex wave function $\psi(x)$. The probability of finding the particle between $x$ and $x + dx$ is $|\psi(x)|^2 dx$. The momentum-space wave function is $\tilde{\psi}(p) = \frac{1}{\sqrt{2\pi\hbar}}\int \psi(x)e^{-ipx/\hbar} dx$ - the Fourier transform of $\psi$. Position and momentum are Fourier transform partners. The spread in position $\Delta x$ and the spread in momentum $\Delta p$ satisfy $\Delta x \cdot \Delta p \geq \hbar/2$ - the Heisenberg uncertainty principle, which is a theorem about Fourier transforms and the uncertainty principle for functions (see Wavelet Transform post). This is not a claim about measurement disturbance; it is a mathematical property of any wave-like representation.


Section 15: Worked Calculations

Let us compute a few Fourier transforms explicitly to build familiarity.

Transform 1: The rectangular pulse. Let $f(t) = 1$ for $|t| \leq T/2$ and $f(t) = 0$ otherwise.

$$F(\omega) = \int_{-T/2}^{T/2} e^{-i\omega t} dt = \left[\frac{e^{-i\omega t}}{-i\omega}\right]_{-T/2}^{T/2} = \frac{e^{-i\omega T/2} - e^{i\omega T/2}}{-i\omega} = \frac{2\sin(\omega T/2)}{\omega} = T \text{sinc}\left(\frac{\omega T}{2\pi}\right)\cdot\frac{2\pi}{T}.$$

More cleanly: $F(\omega) = T\cdot\text{sinc}(\omega T / (2\pi))$ where $\text{sinc}(x) = \sin(\pi x)/(\pi x)$.

The rectangular pulse in time gives a sinc function in frequency. The sinc function has zeros at $\omega = 2\pi k/T$ for nonzero integers $k$. The main lobe extends from $-2\pi/T$ to $2\pi/T$ in frequency. As $T$ decreases (shorter pulse), the main lobe widens - consistent with the uncertainty principle. As $T \to 0$, the pulse becomes a delta function and $F(\omega) \to 1$ (flat spectrum).

Transform 2: The Gaussian. Let $f(t) = e^{-t^2/(2\sigma^2)}$.

$$F(\omega) = \int_{-\infty}^{\infty} e^{-t^2/(2\sigma^2)} e^{-i\omega t} dt.$$

Complete the square in the exponent: $-t^2/(2\sigma^2) - i\omega t = -(t + i\omega\sigma^2)^2/(2\sigma^2) - \omega^2\sigma^2/2$. So:

$$F(\omega) = e^{-\omega^2\sigma^2/2}\int_{-\infty}^{\infty} e^{-(t+i\omega\sigma^2)^2/(2\sigma^2)} dt = e^{-\omega^2\sigma^2/2} \cdot \sigma\sqrt{2\pi}.$$

The Fourier transform of a Gaussian is a Gaussian! The time-domain width is $\sigma$; the frequency-domain width is $1/\sigma$. Their product is $1$ (up to constants) - achieving the uncertainty bound.

Transform 3: The exponential decay. Let $f(t) = e^{-at}u(t)$ for $a > 0$ (a one-sided decaying exponential).

$$F(\omega) = \int_0^{\infty} e^{-at}e^{-i\omega t} dt = \int_0^{\infty} e^{-(a+i\omega)t} dt = \frac{1}{a + i\omega}.$$

The magnitude is $|F(\omega)| = 1/\sqrt{a^2 + \omega^2}$ - a Lorentzian shape, peaked at $\omega = 0$ and decaying as $1/|\omega|$ for large $|\omega|$. The half-power bandwidth is $2a$: faster decay (larger $a$) means broader frequency content, consistent with the uncertainty principle.


Section 16: Parseval and $L^2$ Theory

Parseval’s theorem deserves more attention than it typically receives.

The Fourier coefficients $\{c_n\}$ are coordinates of $f$ in the orthonormal basis $\{e^{inx}/\sqrt{2\pi}\}$ of $L^2([-\pi, \pi])$. Parseval’s theorem says:

$$|f|_{L^2}^2 = \sum_{n=-\infty}^{\infty} |c_n|^2.$$

This is exactly the discrete version of Parseval’s theorem, and it is the statement that the $L^2$ norm is preserved when you expand in an orthonormal basis - the Pythagorean theorem in infinite dimensions.

Completeness. Parseval’s theorem also shows that the exponentials $\{e^{inx}\}$ form a complete orthonormal system: if $\langle f, e^{inx}\rangle = 0$ for all $n$, then $|f|_{L^2} = 0$, meaning $f = 0$ (almost everywhere). No nonzero function is orthogonal to all sinusoids. The sinusoids span $L^2$.

The space $\ell^2$. The map $f \mapsto (c_n)_{n \in \mathbb{Z}}$ is an isometric isomorphism from $L^2([-\pi, \pi])$ to $\ell^2(\mathbb{Z})$ (square-summable sequences). This is Parseval’s theorem as a structural statement: Fourier analysis is just choosing coordinates in an infinite-dimensional Hilbert space, exactly as choosing a basis in $\mathbb{R}^n$.

Every fact about Fourier series can be restated in terms of coordinates in $\ell^2$, and vice versa. The abstract theory of Hilbert spaces and their bases - developed in the early 20th century - was built precisely to capture this structure.


Section 17: The Uncertainty Principle

The Fourier transform has an irreducible tension between time and frequency: a function cannot be simultaneously narrow in both time and frequency.

Theorem. For any function $f$ (with suitable integrability conditions), the time spread $\Delta t$ (standard deviation of $|f(t)|^2$) and frequency spread $\Delta\omega$ (standard deviation of $|F(\omega)|^2$) satisfy:

$$\Delta t \cdot \Delta\omega \geq \frac{1}{2}.$$

This is the Heisenberg uncertainty principle - or rather, the Heisenberg uncertainty principle is the quantum mechanical interpretation of this mathematical fact.

The extremal function - the one achieving equality - is the Gaussian $f(t) = e^{-t^2/2}$. Its Fourier transform is also a Gaussian: $F(\omega) = \sqrt{2\pi}e^{-\omega^2/2}$. The Gaussian has minimum time-bandwidth product. This is why Gaussian windows and Gaussian envelopes appear so often in optimal signal processing.

The uncertainty principle has a signal processing interpretation: if you want to know what frequencies are present in a short burst of signal, you cannot know them precisely - the brevity of the burst spreads its energy across a range of frequencies. If you want precise frequency information, you must observe for a long time. This tradeoff is fundamental and inescapable.


Summary

Concept Content
Why sinusoids? Eigenfunctions of $d/dx$; each frequency evolves independently in linear systems
Inner product on functions $\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)g(x) dx$; functions form a vector space
Orthogonality $\int \cos(mx)\cos(nx) dx = \pi\delta_{mn}$; sinusoids are orthogonal basis elements
Fourier coefficients Projection onto basis elements: $a_n = (1/\pi)\int f\cos(nx) dx$
Fourier series $f(x) = a_0/2 + \sum (a_n\cos nx + b_n\sin nx)$; works for periodic $f$
Complex form $f(x) = \sum c_n e^{inx}$; $c_n = \frac{1}{2\pi}\int f(x)e^{-inx} dx$
Fourier transform $F(\omega) = \int f(t)e^{-i\omega t} dt$; continuous frequency representation
Inverse transform $f(t) = \frac{1}{2\pi}\int F(\omega)e^{i\omega t} d\omega$
Convolution theorem $\mathcal{F}\{f*g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$; turns convolution into multiplication
Parseval $\int |f|^2 dt = \frac{1}{2\pi}\int |F|^2 d\omega$; energy preserved
DFT $X[k] = \sum_n x[n]e^{-i2\pi kn/N}$; discrete version for finite sequences
FFT DFT in $O(N\log N)$ by divide-and-conquer; one of the most important algorithms
Gibbs phenomenon $\approx 9%$ overshoot near jumps; persists; consequence of $L^2$ vs. pointwise convergence
$L^2$ convergence Energy error goes to zero; natural for signals; does not imply pointwise
Uncertainty principle $\Delta t \cdot \Delta\omega \geq 1/2$; cannot concentrate in both time and frequency

Carrying Intuition Forward

Three intuitions from this post belong in your permanent toolkit.

Frequency decomposition is basis change. The Fourier transform is not a mysterious operation - it is a change of basis in a function space. The basis elements are the complex exponentials $e^{i\omega t}$. Fourier coefficients are inner products (projections). Everything that is true about change of basis in $\mathbb{R}^n$ - like Parseval’s theorem as the Pythagorean theorem - has a Fourier analog.

Convolution becomes multiplication. Any time you see a linear, translation-invariant operation in the time domain - filtering, blurring, correlation, moving average - you can convert it to pointwise multiplication in the frequency domain via the Fourier transform. This is the fundamental computational trick underlying signal processing, image processing, and the efficient training of neural networks with convolutional layers.

Frequency tells you what; wavelets tell you when. The Fourier transform resolves what frequencies are present. The short-time Fourier transform (spectrogram) gives both time and frequency, but with fixed resolution. Wavelets, covered in the next post, give adaptive resolution - coarse frequency analysis for low-frequency content, fine time analysis for high-frequency transients. Understanding Fourier first makes the limitation and the wavelet solution both clear.


Read next: