Random Variables
A random variable is not random, and it is not a variable in the algebraic sense. It is a function - a precise, deterministic mapping from outcomes to numbers. This formalism is what lets us do calculus with uncertainty.
Formal Definition
Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra of events, and $P$ is a probability measure. A random variable is a measurable function
$$X: \Omega \to \mathbb{R}$$
such that for every Borel set $B \subseteq \mathbb{R}$, the preimage $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$.
The measurability condition ensures that $P(X \in B)$ is well-defined for all reasonable sets $B$. In practice, “reasonable” means Borel sets - the $\sigma$-algebra generated by open intervals.
Discrete vs Continuous Random Variables
A random variable $X$ is discrete if its range is countable. The distribution is characterised by the probability mass function (PMF):
$$p_X(x) = P(X = x), \quad \sum_{x} p_X(x) = 1$$
A random variable $X$ is continuous if there exists a non-negative function $f_X: \mathbb{R} \to [0, \infty)$, the probability density function (PDF), such that for all $a \leq b$:
$$P(a \leq X \leq b) = \int_a^b f_X(x)\, dx, \quad \int_{-\infty}^{\infty} f_X(x)\, dx = 1$$
Note that $f_X(x)$ is not a probability - it can exceed 1. Only integrals of $f_X$ yield probabilities.
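A density exceeding 1 is nothing exotic: Uniform$(0, \tfrac{1}{2})$ has $f(x) = 2$ on its support, yet the total mass is still 1, as a crude Riemann sum confirms (a quick standard-library sketch):

```python
def f(x):
    # PDF of Uniform(0, 0.5): the density is 2, which exceeds 1.
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

# Crude left-endpoint Riemann sum over [0, 1] as a sanity check
# that the density still integrates to 1.
n = 100_000
dx = 1.0 / n
total = sum(f(i * dx) * dx for i in range(n))
print(round(total, 3))  # 1.0
```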
The Cumulative Distribution Function
The CDF of a random variable $X$ is defined for all $x \in \mathbb{R}$ as:
$$F_X(x) = P(X \leq x)$$
Every CDF satisfies four properties:
- Monotonicity: $x \leq y \Rightarrow F_X(x) \leq F_X(y)$
- Right-continuity: $\lim_{h \downarrow 0} F_X(x+h) = F_X(x)$
- Limit at $-\infty$: $\lim_{x \to -\infty} F_X(x) = 0$
- Limit at $+\infty$: $\lim_{x \to \infty} F_X(x) = 1$
For continuous RVs, $f_X(x) = F_X'(x)$ wherever the derivative exists. For discrete RVs, $F_X$ is a staircase function with jumps of size $p_X(x)$ at each mass point.
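The PDF–CDF relationship is easy to verify numerically. Below, a central difference approximates $F_X'(x)$ for an Exponential distribution and recovers the PDF (a sketch using only the standard library; the rate $\lambda = 2$ is an arbitrary choice):

```python
import math

lam = 2.0  # arbitrary rate parameter for this check

def F(x):
    """CDF of Exponential(lam): 1 - e^(-lam * x) for x >= 0."""
    return 1 - math.exp(-lam * x)

def f(x):
    """PDF of Exponential(lam): lam * e^(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x)

# A central difference approximates F'(x); it should agree with f(x).
h, x = 1e-6, 0.7
deriv = (F(x + h) - F(x - h)) / (2 * h)
print(abs(deriv - f(x)) < 1e-5)  # True
```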
Indicator Random Variables
For any event $A \in \mathcal{F}$, the indicator random variable is:
$$\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
Indicators are deceptively powerful. They convert set operations into algebra: $\mathbf{1}_{A \cap B} = \mathbf{1}_A \cdot \mathbf{1}_B$, and $\mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \cdot \mathbf{1}_B$. Their expectation equals the probability of the event: $E[\mathbf{1}_A] = P(A)$.
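These identities can be checked exhaustively on a small sample space. A sketch using a fair die (the events $A$ and $B$ are arbitrary choices for illustration):

```python
from fractions import Fraction

# Finite sample space: one roll of a fair six-sided die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is at least 4

def ind(S):
    """Indicator of the event S, as a function on omega."""
    return lambda w: 1 if w in S else 0

one_A, one_B = ind(A), ind(B)

# Set operations become algebra on indicators.
assert all(one_A(w) * one_B(w) == ind(A & B)(w) for w in omega)
assert all(one_A(w) + one_B(w) - one_A(w) * one_B(w) == ind(A | B)(w)
           for w in omega)

# E[1_A] = P(A): the expectation of the indicator is the event's probability.
E_ind_A = sum(one_A(w) * P[w] for w in omega)
print(E_ind_A)  # 1/2
```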
Transformations of Random Variables
If $Y = g(X)$ and $g$ is strictly monotone and differentiable, the change of variables formula gives:
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$$
The Jacobian factor $|(g^{-1})'(y)|$ corrects for how $g$ stretches or compresses the density. For example, if $X \sim \text{Uniform}(0,1)$ and $Y = -\log X$, then $g^{-1}(y) = e^{-y}$ and $|(g^{-1})'(y)| = e^{-y}$, giving $f_Y(y) = e^{-y}$ for $y > 0$ - an Exponential$(1)$ distribution.
For non-monotone $g$, sum over all branches: $f_Y(y) = \sum_k f_X(g_k^{-1}(y)) |(g_k^{-1})'(y)|$.
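The $Y = -\log X$ example above is easy to check by simulation (a minimal standard-library sketch; `1 - random.random()` keeps the draw strictly positive so the log is always defined):

```python
import math
import random

random.seed(0)
n = 200_000
# If U ~ Uniform(0,1), then -log(U) ~ Exponential(1).
# 1 - random.random() lies in (0, 1], so log never sees 0.
samples = [-math.log(1.0 - random.random()) for _ in range(n)]

# Exponential(1) has mean 1 and variance 1; the sample moments
# should land close to both.
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
print(round(mean, 2), round(var, 2))
```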
Common Discrete Distributions
Bernoulli$(p)$: Models a single trial with success probability $p \in [0,1]$. $$P(X = 1) = p, \quad P(X = 0) = 1-p, \quad E[X] = p, \quad \text{Var}(X) = p(1-p)$$
Binomial$(n,p)$: Number of successes in $n$ independent Bernoulli$(p)$ trials. $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0,1,\ldots,n$$ $E[X] = np$, $\text{Var}(X) = np(1-p)$.
Geometric$(p)$: Number of trials until the first success. $$P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots$$ $E[X] = 1/p$, $\text{Var}(X) = (1-p)/p^2$.
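As a quick check of the Geometric formulas, the PMF can be summed numerically, truncating the series once the tail is negligible (the value $p = 0.3$ is an arbitrary choice):

```python
p = 0.3
K = 200  # truncation depth; (1 - p)**K is astronomically small here
pmf = [(1 - p) ** (k - 1) * p for k in range(1, K + 1)]

total = sum(pmf)                                          # should be ~1
mean = sum(k * q for k, q in zip(range(1, K + 1), pmf))   # should be ~1/p
print(round(total, 6), round(mean, 6))  # 1.0 3.333333
```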
Poisson$(\lambda)$: Limit of Binomial$(n, \lambda/n)$ as $n \to \infty$.
Derivation: With $p = \lambda/n$: $$\binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^k}\,\frac{\lambda^k}{k!}\left(1-\frac{\lambda}{n}\right)^n\left(1-\frac{\lambda}{n}\right)^{-k}$$
As $n \to \infty$: the first factor $\to 1$, $(1-\lambda/n)^n \to e^{-\lambda}$, and $(1-\lambda/n)^{-k} \to 1$. Therefore: $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$ $E[X] = \lambda$, $\text{Var}(X) = \lambda$.
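The limit is visible numerically: for large $n$, the Binomial$(n, \lambda/n)$ PMF is already very close to the Poisson PMF. A standard-library sketch (the values of $\lambda$, $n$, and $k$ are arbitrary choices):

```python
import math

lam, n, k = 3.0, 10_000, 4
p = lam / n

# Binomial(n, lam/n) probability of exactly k successes.
binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
# Poisson(lam) probability of k events.
poisson = lam**k * math.exp(-lam) / math.factorial(k)

# The two probabilities agree to roughly O(1/n).
print(abs(binom - poisson) < 1e-3)  # True
```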
Common Continuous Distributions
Uniform$(a,b)$: Equal density on an interval. $$f_X(x) = \frac{1}{b-a}, \quad a \leq x \leq b$$ $E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$.
Exponential$(\lambda)$: Time between Poisson events. $$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$ $E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$.
Memoryless property: For $s, t \geq 0$, $$P(X > s + t \mid X > s) = \frac{P(X > s+t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)$$
The exponential distribution is the unique continuous distribution with this property.
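A simulation makes the memoryless property concrete: among draws that have already survived past $s$, the additional waiting time looks like a fresh Exponential (a sketch with standard library only; $\lambda$, $s$, and $t$ are arbitrary choices):

```python
import math
import random

random.seed(1)
lam, s, t = 1.5, 0.4, 0.8
# Exponential(lam) via inverse transform; 1 - random.random() is in (0, 1].
draws = [-math.log(1.0 - random.random()) / lam for _ in range(500_000)]

# Conditional survival past s + t, given survival past s ...
survivors = [x for x in draws if x > s]
cond = sum(1 for x in survivors if x > s + t) / len(survivors)
# ... versus unconditional survival past t.
uncond = sum(1 for x in draws if x > t) / len(draws)

target = math.exp(-lam * t)  # theoretical P(X > t)
print(abs(cond - target) < 0.01, abs(uncond - target) < 0.01)  # True True
```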
Normal $\mathcal{N}(\mu, \sigma^2)$: $$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
$E[X] = \mu$, $\text{Var}(X) = \sigma^2$. The standard normal has $\mu = 0$, $\sigma^2 = 1$, written $Z \sim \mathcal{N}(0,1)$. If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim \mathcal{N}(0,1)$.
The normal’s ubiquity is explained by the Central Limit Theorem: appropriately standardized sums of independent, finite-variance random variables converge in distribution to a normal, regardless of the original distribution.
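A classic illustration: the sum of twelve independent Uniform$(0,1)$ draws has mean 6 and variance 1, and its distribution is already close to $\mathcal{N}(6, 1)$. A small simulation sketch:

```python
import random
import statistics

random.seed(42)
# Each Uniform(0,1) has mean 1/2 and variance 1/12, so a sum of
# 12 of them has mean 6 and variance 1.
sums = [sum(random.random() for _ in range(12)) for _ in range(100_000)]

mean = statistics.fmean(sums)
sd = statistics.pstdev(sums)
# For a normal, about 68% of the mass lies within one sd of the mean.
within = sum(1 for x in sums if abs(x - 6) < 1) / len(sums)
print(round(mean, 1), round(sd, 1), round(within, 2))
```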