Measure Theory
Why Measure Theory?
The Riemann integral, familiar from calculus, is built by partitioning the domain into intervals and summing rectangles. This construction works well for continuous functions but fails in surprising ways. Consider the indicator function of the rationals on $[0,1]$, $f(x) = \mathbf{1}_{\mathbb{Q}}(x)$, which equals 1 on rationals and 0 on irrationals. Every subinterval of positive length contains both rationals and irrationals, so for every partition the upper Riemann sum is 1 and the lower sum is 0. The Riemann integral therefore does not exist, yet intuitively $f$ should integrate to 0, since the rationals are countable and hence a negligible set.
Measure theory resolves this by partitioning the range rather than the domain, measuring the pre-image of each value. This produces the Lebesgue integral, which handles a vastly larger class of functions and supports interchange of limits and integrals under mild conditions that Riemann theory cannot guarantee.
$\sigma$-Algebras
Definition. Let $\Omega$ be a set. A collection $\mathcal{F} \subseteq 2^\Omega$ is a $\sigma$-algebra if:
- $\Omega \in \mathcal{F}$,
- $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$ (closed under complements),
- $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{n=1}^\infty A_n \in \mathcal{F}$ (closed under countable unions).
The pair $(\Omega, \mathcal{F})$ is called a measurable space. Elements of $\mathcal{F}$ are called measurable sets.
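On a finite set the three axioms can be checked mechanically, which makes the definition easy to experiment with. The following is a minimal sketch under that finiteness assumption; the helper names `is_sigma_algebra` and `generate_sigma_algebra` are illustrative and not taken from any library.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three axioms for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:
        return False
    # Closed under complements.
    if any(omega - a not in family for a in family):
        return False
    # The family is finite, so closure under countable unions
    # reduces to closure under pairwise unions.
    return all(a | b in family for a, b in combinations(family, 2))

def generate_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on a finite omega that contains the generators."""
    omega = frozenset(omega)
    family = {omega, frozenset()} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:  # nothing new was produced: the family is closed
            return family
        family |= new

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}, {2}])
print(sorted(sorted(a) for a in sigma))
print(is_sigma_algebra(omega, sigma))
```

Here the generated family has the atoms $\{1\}$, $\{2\}$, $\{3,4\}$, so it consists of the $2^3 = 8$ unions of atoms. The same closure procedure does not carry over to infinite sets such as $\mathbb{R}$, where the Borel $\sigma$-algebra cannot be listed explicitly.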
The smallest $\sigma$-algebra on $\mathbb{R}$ containing all open sets is the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$. It contains open sets, closed sets, countable intersections of open sets ($G_\delta$ sets), countable unions of closed sets ($F_\sigma$ sets), and much more. Nearly every set encountered in analysis is Borel measurable.
Measures
Definition. A measure on $(\Omega, \mathcal{F})$ is a function $\mu: \mathcal{F} \to [0, \infty]$ satisfying:
- $\mu(\emptyset) = 0$,
- Countable additivity: for pairwise disjoint $A_1, A_2, \ldots \in \mathcal{F}$,
$$\mu\!\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n).$$
The triple $(\Omega, \mathcal{F}, \mu)$ is a measure space. If $\mu(\Omega) = 1$ then $\mu$ is a probability measure, conventionally written $P$, and $(\Omega, \mathcal{F}, P)$ is a probability space. The Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ assigns to each interval its length: $\lambda([a,b]) = b - a$.
Important consequences of countable additivity include continuity from below: if $A_n \uparrow A$ (i.e., $A_n \subseteq A_{n+1}$ and $\bigcup A_n = A$), then $\mu(A_n) \uparrow \mu(A)$; and continuity from above: if $A_n \downarrow A$ and $\mu(A_1) < \infty$, then $\mu(A_n) \downarrow \mu(A)$.
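A concrete check of both statements with Lebesgue measure may help; the particular sets below are chosen purely for illustration. For continuity from below, take $A_n = [0, 1 - 1/n]$, so that $A_n \uparrow [0,1)$ and
$$\lambda(A_n) = 1 - \frac{1}{n} \uparrow 1 = \lambda([0,1)).$$
The finiteness hypothesis in continuity from above cannot be dropped. With $B_n = [n, \infty)$ we have $B_n \downarrow \emptyset$, yet
$$\lambda(B_n) = \infty \text{ for every } n, \qquad \lambda(\emptyset) = 0.$$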
Measurable Functions
Definition. Let $(\Omega, \mathcal{F})$ and $(\Omega', \mathcal{F}')$ be measurable spaces. A function $f: \Omega \to \Omega'$ is measurable if for every $B \in \mathcal{F}'$, the pre-image $f^{-1}(B) \in \mathcal{F}$.
For real-valued functions $f: \Omega \to \mathbb{R}$, measurability with respect to $\mathcal{B}(\mathbb{R})$ is equivalent to $\{x : f(x) > a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$. Continuous functions between topological spaces with their Borel $\sigma$-algebras are always measurable. In a probability space, measurable functions are called random variables.
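For finite measurable spaces the pre-image condition can be tested directly. The sketch below assumes both $\sigma$-algebras are given as explicit lists of sets; the names `preimage`, `is_measurable`, `block_indicator`, and `parity` are hypothetical helpers introduced for this example.

```python
def preimage(f, domain, target_set):
    """Return f^{-1}(target_set) as a subset of the finite domain."""
    return frozenset(x for x in domain if f(x) in target_set)

def is_measurable(f, domain, sigma_domain, sigma_codomain):
    """Check that the pre-image of every set in sigma_codomain lies in sigma_domain."""
    sigma_domain = {frozenset(a) for a in sigma_domain}
    return all(preimage(f, domain, b) in sigma_domain for b in sigma_codomain)

omega = {1, 2, 3, 4}
# Sigma-algebra on omega generated by the partition {1, 2} | {3, 4}.
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
# Power set of the codomain {0, 1}.
F_prime = [set(), {0}, {1}, {0, 1}]

def block_indicator(x):
    # Constant on each block of the partition.
    return 1 if x in {3, 4} else 0

def parity(x):
    # Separates 1 from 2, which F cannot distinguish.
    return x % 2

print(is_measurable(block_indicator, omega, F, F_prime))  # True
print(is_measurable(parity, omega, F, F_prime))           # False
```

The second function fails because the pre-image of $\{1\}$ is $\{1, 3\}$, which is not in $\mathcal{F}$. Measurability therefore depends on the $\sigma$-algebra as much as on the function.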
The Lebesgue Integral
The construction proceeds in three stages.
Stage 1: Simple functions. A function $\phi: \Omega \to \mathbb{R}$ is simple if it is measurable and takes only finitely many values. Any non-negative simple function can be written as
$$\phi = \sum_{i=1}^k a_i \mathbf{1}_{A_i}$$
where $a_i \geq 0$ and $A_i \in \mathcal{F}$ are disjoint. Its integral is
$$\int \phi \, d\mu = \sum_{i=1}^k a_i \mu(A_i).$$
Stage 2: Non-negative measurable functions. For $f \geq 0$ measurable, define
$$\int f \, d\mu = \sup\left\{ \int \phi \, d\mu : 0 \leq \phi \leq f,\; \phi \text{ simple} \right\}.$$
Stage 3: General measurable functions. Write $f = f^+ - f^-$ where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. If at least one of $\int f^+ d\mu$, $\int f^- d\mu$ is finite, set
$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu.$$
If both are finite, $f$ is integrable ($f \in L^1(\mu)$).
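Stages 1 and 2 can be made concrete with dyadic simple functions. The sketch below assumes the specific choice $f(x) = x^2$ on $[0,1]$ with Lebesgue measure and uses $\phi_n = \lfloor 2^n f \rfloor / 2^n$, a standard increasing sequence of simple functions below $f$. The level sets of this $f$ are intervals, so their measure is available in closed form; the function name is illustrative.

```python
import math

def dyadic_simple_integral(n):
    """Integral of phi_n = floor(2^n f) / 2^n for f(x) = x^2 on [0, 1].

    phi_n takes the value k / 2^n on the set {k / 2^n <= f < (k + 1) / 2^n},
    so its integral is the sum of value * measure over the dyadic levels.
    """
    levels = 2 ** n
    total = 0.0
    for k in range(levels):
        lo, hi = k / levels, (k + 1) / levels
        # Lebesgue measure of {x in [0, 1] : lo <= x^2 < hi}.
        measure = math.sqrt(hi) - math.sqrt(lo)
        total += lo * measure
    return total

values = [dyadic_simple_integral(n) for n in range(1, 11)]
for n, value in enumerate(values, start=1):
    print(n, value)
```

The printed values increase with $n$ and approach $1/3$ from below, in line with the supremum in Stage 2. Note that the partition is taken over the range of $f$, not over the domain.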
Monotone Convergence Theorem
Theorem (MCT). Let $f_n: \Omega \to [0, \infty]$ be measurable with $f_n \uparrow f$ pointwise. Then
$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
Proof sketch. Since $f_n \leq f$, we have $\int f_n \, d\mu \leq \int f \, d\mu$ for all $n$, so $\lim_n \int f_n \, d\mu \leq \int f \, d\mu$. For the reverse inequality, fix $\epsilon \in (0,1)$ and a simple $0 \leq \phi \leq f$. Define $E_n = \{x : f_n(x) \geq (1-\epsilon)\phi(x)\}$. Since $f_n \uparrow f \geq \phi$, we have $E_n \uparrow \Omega$. Then
$$\int f_n \, d\mu \geq \int_{E_n} f_n \, d\mu \geq (1-\epsilon)\int_{E_n} \phi \, d\mu.$$
By continuity of measure from below, $\int_{E_n} \phi \, d\mu \to \int \phi \, d\mu$, so $\lim_n \int f_n \, d\mu \geq (1-\epsilon)\int \phi \, d\mu$. Since $\epsilon$ and $\phi$ are arbitrary, $\lim_n \int f_n \, d\mu \geq \int f \, d\mu$. $\square$
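As a worked illustration of the theorem (the function is chosen here for convenience), let $f(x) = x^{-1/2}$ on $(0,1]$ with Lebesgue measure and truncate it at height $n$: $f_n = \min(f, n)$. Then $f_n \uparrow f$ pointwise, and since $f \geq n$ exactly on $(0, n^{-2}]$,
$$\int f_n \, d\lambda = n \cdot \frac{1}{n^2} + \int_{1/n^2}^{1} x^{-1/2} \, dx = \frac{1}{n} + \left(2 - \frac{2}{n}\right) = 2 - \frac{1}{n}.$$
Letting $n \to \infty$ gives
$$\int f \, d\lambda = \lim_{n \to \infty} \left(2 - \frac{1}{n}\right) = 2,$$
even though $f$ is unbounded.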
Dominated Convergence Theorem
Theorem (DCT). Let $f_n$ be measurable with $f_n \to f$ pointwise, and suppose there exists $g \in L^1(\mu)$ with $|f_n| \leq g$ for all $n$. Then $f \in L^1(\mu)$ and
$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
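The dominating function $g$ is the essential hypothesis; without it the conclusion can fail. A counterexample: $f_n = \mathbf{1}_{[n, n+1]}$ on $\mathbb{R}$ with Lebesgue measure has $f_n \to 0$ pointwise but $\int f_n \, d\lambda = 1$ for all $n$, so the integrals do not converge to $\int 0 \, d\lambda = 0$. No integrable $g$ can dominate every $f_n$, since such a $g$ would satisfy $g \geq 1$ on $[1, \infty)$.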
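The contrast between a dominated sequence and the escaping bump can be seen numerically. The sketch below is only an illustration: it approximates the integrals with a midpoint rule (a Riemann-type quadrature that is adequate for these piecewise-smooth integrands), and the dominated sequence $f_n(x) = x^n$ on $[0,1]$ with $g = 1$ is an example chosen here. All function names are hypothetical.

```python
def midpoint_integral(f, a, b, cells=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / cells
    return h * sum(f(a + (i + 0.5) * h) for i in range(cells))

def power(n):
    # f_n(x) = x^n on [0, 1]: dominated by g = 1, and the pointwise limit
    # is 0 on [0, 1) and 1 at x = 1, which integrates to 0.
    return lambda x: x ** n

def escaping_bump(n):
    # f_n = indicator of [n, n + 1]: converges to 0 pointwise,
    # but no integrable g dominates every f_n.
    return lambda x: 1.0 if n <= x <= n + 1 else 0.0

dominated = [midpoint_integral(power(n), 0.0, 1.0) for n in (1, 10, 100)]
escaping = [midpoint_integral(escaping_bump(n), 0.0, 20.0) for n in (1, 5, 10)]

print(dominated)  # approximately 1/2, 1/11, 1/101: tends to 0
print(escaping)   # stays near 1 for every n
```

In the first case the integrals follow the limit function, as DCT predicts. In the second the mass moves off to infinity, and the limit of the integrals differs from the integral of the limit.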
Fubini-Tonelli Theorem
Theorem. Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ be $\sigma$-finite measure spaces and $f: \Omega_1 \times \Omega_2 \to \mathbb{R}$ measurable.
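- (Tonelli) If $f \geq 0$, both iterated integrals equal the integral with respect to the product measure $\mu_1 \otimes \mu_2$ (possibly $+\infty$), regardless of the order of integration.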
- (Fubini) If $f \in L^1(\mu_1 \otimes \mu_2)$, then
$$\int f \, d(\mu_1 \otimes \mu_2) = \int\!\!\int f(x,y) \, d\mu_2(y) \, d\mu_1(x) = \int\!\!\int f(x,y) \, d\mu_1(x) \, d\mu_2(y).$$
The $\sigma$-finiteness condition is essential; the theorem fails for general measures.
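The integrability hypothesis in the Fubini part also cannot be dropped, even when both measures are $\sigma$-finite. A classical illustration, sketched below, uses counting measure on $\{1, 2, \ldots\}$ in each factor and a signed array with $+1$ on the diagonal and $-1$ just below it, so that $\sum |a(m,n)| = \infty$. The function names are illustrative. Each inner sum has only finitely many non-zero terms and is computed exactly, and the outer sums are truncated at an arbitrary cutoff `N`, which loses nothing here because all later inner sums vanish.

```python
def a(m, n):
    """Signed array on {1, 2, ...} x {1, 2, ...}: +1 on the diagonal, -1 just below it."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(m):
    # For fixed m, a(m, n) vanishes unless n is m or m - 1,
    # so summing n up to m + 1 captures the whole row.
    return sum(a(m, n) for n in range(1, m + 2))

def col_sum(n):
    # For fixed n, a(m, n) vanishes unless m is n or n + 1.
    return sum(a(m, n) for m in range(1, n + 3))

N = 200
sum_over_n_first = sum(row_sum(m) for m in range(1, N + 1))
sum_over_m_first = sum(col_sum(n) for n in range(1, N + 1))

print(sum_over_n_first)  # 1
print(sum_over_m_first)  # 0
```

The two iterated sums disagree, which is consistent with the theorem: the array is neither non-negative nor absolutely summable, so neither Tonelli nor Fubini applies.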
Radon-Nikodym Theorem
Definition. A measure $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, if $\mu(A) = 0 \Rightarrow \nu(A) = 0$.
Theorem (Radon-Nikodym). If $\nu \ll \mu$ and both are $\sigma$-finite, there exists a non-negative measurable function $f$ such that
$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{F}.$$
The function $f = d\nu/d\mu$ is the Radon-Nikodym derivative or density of $\nu$ with respect to $\mu$, unique $\mu$-almost everywhere. In probability, if $P \ll \lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, then $dP/d\lambda$ is exactly the probability density function. Radon-Nikodym thus gives rigorous grounding to the notion of a pdf and underpins the definition of conditional expectation in full generality.
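On a finite set with every subset measurable, the Radon-Nikodym derivative reduces to a ratio of point masses, which gives a quick sanity check of the defining identity. The sketch below assumes two hypothetical measures `mu` and `nu` on $\{a, b, c\}$ and uses exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Two measures on the finite set {a, b, c}, specified by their point masses.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0)}

def density(nu, mu):
    """Pointwise ratio nu({w}) / mu({w}); requires nu << mu."""
    for w in mu:
        if mu[w] == 0 and nu[w] != 0:
            raise ValueError("nu is not absolutely continuous with respect to mu")
    # On mu-null points the density is arbitrary; 0 is chosen here.
    return {w: (nu[w] / mu[w] if mu[w] != 0 else Fraction(0)) for w in mu}

def integrate(f, measure, subset):
    """Integral of f over subset with respect to a measure given by point masses."""
    return sum(f[w] * measure[w] for w in subset)

f = density(nu, mu)
points = list(mu)
subsets = [set(c) for r in range(len(points) + 1) for c in combinations(points, r)]
# Defining identity: nu(A) equals the integral of f over A with respect to mu.
print(all(integrate(f, mu, A) == sum(nu[w] for w in A) for A in subsets))
print(f)
```

The freedom to pick any value on $\mu$-null points is the finite-set version of uniqueness only up to $\mu$-almost everywhere equality.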
Examples
Lebesgue vs Riemann on $\mathbb{Q}$. Let $f = \mathbf{1}_{\mathbb{Q}}$ on $[0,1]$. The Lebesgue integral gives
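$$\int_0^1 \mathbf{1}_{\mathbb{Q}} \, d\lambda = \lambda(\mathbb{Q} \cap [0,1]) = 0,$$
since $\mathbb{Q}$ is countable and $\lambda(\{q\}) = 0$ for each rational $q$, so countable additivity gives $\lambda(\mathbb{Q} \cap [0,1]) = \sum_{q} \lambda(\{q\}) = 0$. The Riemann integral does not exist. Since every Riemann integrable function on a bounded interval is also Lebesgue integrable with the same value, this example shows that the Lebesgue integral strictly extends the Riemann integral.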
Measure-theoretic probability. The strong law of large numbers states that for i.i.d. $X_1, X_2, \ldots$ with $E[|X_1|] < \infty$,
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{\text{a.s.}} E[X_1].$$
The qualifier "almost surely" means convergence on a set of probability 1, a concept that requires a measure space. Standard proofs rely on measure-theoretic tools such as the Borel-Cantelli lemma and the convergence theorems above. Without measure theory, even stating the strong law precisely is difficult.
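A simulation cannot prove an almost-sure statement, but it shows what the law describes along one sample path. The sketch below assumes i.i.d. uniform draws on $[0,1]$, so $E[X_1] = 1/2$, and uses a fixed seed so the run is reproducible; the function name is illustrative.

```python
import random

def running_means(n_samples, seed=0):
    """Running averages (X_1 + ... + X_n) / n along one simulated sample path."""
    rng = random.Random(seed)  # local generator, so the path is reproducible
    total = 0.0
    means = []
    for i in range(1, n_samples + 1):
        total += rng.random()  # one uniform draw on [0, 1)
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 1_000, 100_000):
    # Distance between the running mean and the expectation 1/2.
    print(n, abs(means[n - 1] - 0.5))
```

Each seed corresponds to one point $\omega$ of the underlying probability space. The strong law says that the set of $\omega$ for which the running means converge to $1/2$ has probability 1.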
Conditional expectation as Radon-Nikodym derivative. For an integrable random variable $X$ on $(\Omega, \mathcal{F}, P)$ and a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, the conditional expectation $E[X | \mathcal{G}]$ is defined as the $\mathcal{G}$-measurable function, unique up to $P$-null sets, such that $\int_G E[X|\mathcal{G}] \, dP = \int_G X \, dP$ for all $G \in \mathcal{G}$. For $X \geq 0$, existence follows from Radon-Nikodym applied to the measure $A \mapsto \int_A X \, dP$ restricted to $\mathcal{G}$, which is absolutely continuous with respect to $P$ restricted to $\mathcal{G}$; the general case follows by treating $X^+$ and $X^-$ separately.
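When $\mathcal{G}$ is generated by a finite partition into blocks of positive probability, the conditional expectation is the probability-weighted average of $X$ over each block. The sketch below assumes a fair die with $X$ the face value and a hypothetical partition $\{1,2\} \mid \{3,4,5,6\}$, and checks the defining identity on every $G \in \mathcal{G}$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}   # fair die
X = {w: Fraction(w) for w in omega}      # X is the face value
# Atoms of G; each is assumed to have positive probability.
blocks = [frozenset({1, 2}), frozenset({3, 4, 5, 6})]

def conditional_expectation(X, P, blocks):
    """E[X | G] for G generated by a finite partition: block-wise weighted average."""
    cond = {}
    for block in blocks:
        mass = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            cond[w] = average  # constant on the block, hence G-measurable
    return cond

def integral(f, P, event):
    """Integral of f over the event with respect to P."""
    return sum(f[w] * P[w] for w in event)

cond = conditional_expectation(X, P, blocks)
# Every set in G is a union of blocks (including the empty union).
events = [frozenset().union(*c)
          for r in range(len(blocks) + 1)
          for c in combinations(blocks, r)]
print(cond)
print(all(integral(cond, P, G) == integral(X, P, G) for G in events))
```

On the block $\{1,2\}$ the value is $3/2$ and on $\{3,4,5,6\}$ it is $9/2$. These are the ratios $\int_B X \, dP / P(B)$, which is the discrete form of the Radon-Nikodym derivative in the existence argument.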