A random variable is not random, and it is not a variable in the algebraic sense. It is a function - a precise, deterministic mapping from outcomes to numbers. This formalism is what lets us do calculus with uncertainty.

Formal Definition

Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra of events, and $P$ is a probability measure. A random variable is a measurable function

$$X: \Omega \to \mathbb{R}$$

such that for every Borel set $B \subseteq \mathbb{R}$, the preimage $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$.

The measurability condition ensures that $P(X \in B)$ is well-defined for all reasonable sets $B$. In practice, “reasonable” means Borel sets - the $\sigma$-algebra generated by open intervals.
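
To make the abstraction concrete, here is a minimal sketch in Python (the two-coin sample space and the helper names `omega`, `X`, and `prob_X_in` are illustrative, not standard). On a finite sample space equipped with its power set as the $\sigma$-algebra, every function is measurable, and $P(X \in B)$ is literally the measure of the preimage:

```python
import itertools

# Sample space for two fair coin flips; each outcome is equally likely.
omega = list(itertools.product("HT", repeat=2))  # [('H','H'), ('H','T'), ...]
P = {w: 0.25 for w in omega}                     # probability measure on Omega

# The random variable "number of heads" is a deterministic function Omega -> R.
def X(w):
    return w.count("H")

# P(X in B) is, by definition, the measure of the preimage X^{-1}(B).
def prob_X_in(B):
    return sum(P[w] for w in omega if X(w) in B)

print(prob_X_in({1}))     # 0.5 - P(X = 1)
print(prob_X_in({0, 2}))  # 0.5 - P(X in {0, 2})
```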

Discrete vs Continuous Random Variables

A random variable $X$ is discrete if its range is countable. The distribution is characterised by the probability mass function (PMF):

$$p_X(x) = P(X = x), \quad \sum_{x} p_X(x) = 1$$

A random variable $X$ is continuous if there exists a non-negative function $f_X: \mathbb{R} \to [0, \infty)$, the probability density function (PDF), such that for all $a \leq b$:

$$P(a \leq X \leq b) = \int_a^b f_X(x)\, dx, \quad \int_{-\infty}^{\infty} f_X(x)\, dx = 1$$

Note that $f_X(x)$ is not a probability - it can exceed 1. Only integrals of $f_X$ yield probabilities.
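
A quick numerical illustration of this point (a sketch using SciPy; the choice of Uniform$(0, 0.5)$ is arbitrary): its density equals 2 everywhere on the support, yet every probability computed from it is still at most 1.

```python
from scipy.stats import uniform

# Uniform(0, 0.5): density 1/(b-a) = 2 on the support, greater than 1,
# yet every probability it produces is still at most 1.
U = uniform(loc=0, scale=0.5)    # scipy parametrises as [loc, loc + scale]
print(U.pdf(0.25))               # 2.0 - a density value, not a probability
print(U.cdf(0.5) - U.cdf(0.0))   # 1.0 - total integral over the support
print(U.cdf(0.25) - U.cdf(0.0))  # 0.5 - P(0 <= X <= 0.25)
```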

The Cumulative Distribution Function

The CDF of a random variable $X$ is defined for all $x \in \mathbb{R}$ as:

$$F_X(x) = P(X \leq x)$$

Every CDF satisfies four properties:

  1. Monotonicity: $x \leq y \Rightarrow F_X(x) \leq F_X(y)$
  2. Right-continuity: $\lim_{h \downarrow 0} F_X(x+h) = F_X(x)$
  3. Limit at $-\infty$: $\lim_{x \to -\infty} F_X(x) = 0$
  4. Limit at $+\infty$: $\lim_{x \to \infty} F_X(x) = 1$

For continuous RVs, $f_X(x) = F_X'(x)$ wherever the derivative exists. For discrete RVs, $F_X$ is a staircase function with jumps of size $p_X(x)$ at each mass point.
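
Both behaviours can be checked numerically. The sketch below (SciPy; the particular distributions and the step size `h` are arbitrary choices) confirms that the jumps of a Binomial CDF equal its PMF, and that a central difference of an Exponential CDF recovers its PDF:

```python
from scipy.stats import binom, expon

# Discrete: the Binomial(3, 0.5) CDF is a staircase; each jump equals the PMF.
B = binom(3, 0.5)
for k in range(4):
    jump = B.cdf(k) - B.cdf(k - 1)   # size of the jump at the mass point k
    print(k, round(jump, 4), B.pmf(k))

# Continuous: a central difference of the CDF recovers the PDF.
E = expon()                          # Exponential(1)
x, h = 1.3, 1e-6
print((E.cdf(x + h) - E.cdf(x - h)) / (2 * h), E.pdf(x))  # both ~0.2725
```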

Indicator Random Variables

For any event $A \in \mathcal{F}$, the indicator random variable is:

$$\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$

Indicators are deceptively powerful. They convert set operations into algebra: $\mathbf{1}_{A \cap B} = \mathbf{1}_A \cdot \mathbf{1}_B$, and $\mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \cdot \mathbf{1}_B$. Their expectation equals the probability of the event: $E[\mathbf{1}_A] = P(A)$.
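
A short Monte Carlo sketch (NumPy; the specific events, seed, and sample size are arbitrary) illustrates both facts: the sample mean of an indicator estimates the probability of its event, and the union identity above reproduces inclusion-exclusion:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)            # omega drawn uniformly from (0, 1)

# Events A = {u < 0.5} and B = {0.3 < u < 0.8} as 0/1 indicator arrays.
ind_A = (u < 0.5).astype(float)
ind_B = ((u > 0.3) & (u < 0.8)).astype(float)

# E[1_A] = P(A): the sample mean of an indicator estimates the probability.
print(ind_A.mean())                      # ~0.5
# The identity 1_{A∪B} = 1_A + 1_B - 1_A·1_B gives inclusion-exclusion.
ind_union = ind_A + ind_B - ind_A * ind_B
print(ind_union.mean())                  # ~0.8, since A ∪ B = {u < 0.8}
```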

Transformations of Random Variables

If $Y = g(X)$ and $g$ is strictly monotone and differentiable, the change of variables formula gives:

$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$$

The Jacobian factor $|(g^{-1})'(y)|$ corrects for how $g$ stretches or compresses the density. For example, if $X \sim \text{Uniform}(0,1)$ and $Y = -\log X$, then $g^{-1}(y) = e^{-y}$ and $|(g^{-1})'(y)| = e^{-y}$, giving $f_Y(y) = e^{-y}$ for $y > 0$ - an Exponential$(1)$ distribution.

For non-monotone $g$, sum over all branches: $f_Y(y) = \sum_k f_X(g_k^{-1}(y)) |(g_k^{-1})'(y)|$.
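
For the monotone example above, simulation agrees with the formula. A sketch (NumPy/SciPy; the seed and sample size are arbitrary): draw $X \sim \text{Uniform}(0,1)$, set $Y = -\log X$, and compare against Exponential$(1)$:

```python
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(1)
x = rng.uniform(size=100_000)   # X ~ Uniform(0, 1)
y = -np.log(x)                  # Y = g(X) = -log X

# The change-of-variables formula predicts Y ~ Exponential(1).
print(y.mean(), y.var())        # both ~1: E[Y] = Var(Y) = 1
print(kstest(y, expon().cdf))   # large p-value: consistent with Exp(1)
```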

Common Discrete Distributions

Bernoulli$(p)$: Models a single trial with success probability $p \in [0,1]$. $$P(X = 1) = p, \quad P(X = 0) = 1-p, \quad E[X] = p, \quad \text{Var}(X) = p(1-p)$$

Binomial$(n,p)$: Number of successes in $n$ independent Bernoulli$(p)$ trials. $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0,1,\ldots,n$$ $E[X] = np$, $\text{Var}(X) = np(1-p)$.

Geometric$(p)$: Number of trials up to and including the first success. $$P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots$$ $E[X] = 1/p$, $\text{Var}(X) = (1-p)/p^2$.
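
These moment formulas can be sanity-checked against SciPy's closed forms (a sketch; note that `scipy.stats.geom` uses the same trials-up-to-first-success convention as above):

```python
from scipy.stats import bernoulli, binom, geom

p, n = 0.3, 10
print(bernoulli(p).mean(), bernoulli(p).var())  # 0.3, 0.21  = p, p(1-p)
print(binom(n, p).mean(), binom(n, p).var())    # 3.0, 2.1   = np, np(1-p)
print(geom(p).mean(), geom(p).var())            # 3.33, 7.78 = 1/p, (1-p)/p^2
```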

Poisson$(\lambda)$: Limit of Binomial$(n, \lambda/n)$ as $n \to \infty$.

Derivation: With $p = \lambda/n$: $$\binom{n}{k}\left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{k!\, n^k}\, \lambda^k \left(1-\frac{\lambda}{n}\right)^n \left(1-\frac{\lambda}{n}\right)^{-k}$$

As $n \to \infty$: the first factor $\to 1$, $(1-\lambda/n)^n \to e^{-\lambda}$, and $(1-\lambda/n)^{-k} \to 1$. Therefore: $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$ $E[X] = \lambda$, $\text{Var}(X) = \lambda$.
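
The convergence is fast enough to observe numerically (a sketch using SciPy; the choices $\lambda = 2$ and $k = 3$ are arbitrary):

```python
from scipy.stats import binom, poisson

lam, k = 2.0, 3
# Binomial(n, lambda/n) at k approaches the Poisson(lambda) pmf as n grows.
for n in (10, 100, 1000, 10_000):
    print(n, binom(n, lam / n).pmf(k))
print("limit:", poisson(lam).pmf(k))   # ~0.1804
```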

Common Continuous Distributions

Uniform$(a,b)$: Equal density on an interval. $$f_X(x) = \frac{1}{b-a}, \quad a \leq x \leq b$$ $E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$.

Exponential$(\lambda)$: Time between Poisson events. $$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$ $E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$.

Memoryless property: For $s, t \geq 0$, $$P(X > s + t \mid X > s) = \frac{P(X > s+t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)$$

The exponential distribution is the unique continuous distribution with this property.
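
A simulation sketch of the memoryless property (NumPy; the rate, $s$, $t$, and seed are arbitrary; note that NumPy parametrises `exponential` by the scale $1/\lambda$, not the rate):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, s, t = 1.5, 0.4, 0.7
x = rng.exponential(scale=1 / lam, size=1_000_000)  # X ~ Exponential(rate=1.5)

lhs = (x > s + t).mean() / (x > s).mean()  # P(X > s + t | X > s)
rhs = (x > t).mean()                       # P(X > t)
print(lhs, rhs, np.exp(-lam * t))          # all ~0.35
```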

Normal $\mathcal{N}(\mu, \sigma^2)$: $$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$E[X] = \mu$, $\text{Var}(X) = \sigma^2$. The standard normal has $\mu = 0$, $\sigma^2 = 1$, written $Z \sim \mathcal{N}(0,1)$. If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim \mathcal{N}(0,1)$.

The normal’s ubiquity is explained by the Central Limit Theorem: sums of independent, finite-variance random variables, once centred and scaled, converge in distribution to a normal, regardless of the summands’ original distribution.
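
A quick demonstration (a sketch using NumPy/SciPy; the Uniform$(0,1)$ summands, $n = 50$, and the sample size are arbitrary choices): standardized sums of uniforms already match standard normal tail probabilities closely.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 50, 200_000
sums = rng.uniform(size=(reps, n)).sum(axis=1)  # sums of n Uniform(0,1) draws
z = (sums - n * 0.5) / np.sqrt(n / 12)          # centre and scale: mu=1/2, var=1/12

# Tail probabilities of the standardized sums track the standard normal.
for c in (1.0, 2.0):
    print((z > c).mean(), 1 - norm.cdf(c))      # ~0.159 and ~0.0228
```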

