Prerequisite: Matrix Math

Overview

The Kalman filter is an optimal recursive Bayesian estimator for linear dynamical systems with Gaussian noise. Given a sequence of noisy measurements, it maintains a probability distribution over the hidden state and refines that distribution as each new measurement arrives. The elegance of the algorithm is that this distribution remains Gaussian throughout, so it can be represented exactly with just a mean vector and a covariance matrix - no sampling, no numerical integration.

It was developed by Rudolf Kálmán in 1960 and first deployed in the Apollo Guidance Computer. Today it underpins GPS/INS sensor fusion, radar tracking, drone stabilisation, and financial time-series smoothing.

The State Space Model

The Kalman filter assumes two linear models: a transition model describing how the hidden state evolves, and an observation model describing how measurements are generated from the state.

$$x_t = F x_{t-1} + B u_t + w_t, \quad w_t \sim \mathcal{N}(0, Q)$$

$$z_t = H x_t + v_t, \quad v_t \sim \mathcal{N}(0, R)$$

Here $x_t \in \mathbb{R}^n$ is the hidden state, $z_t \in \mathbb{R}^m$ is the observation, $u_t$ is an optional control input, $F$ is the state-transition matrix, $H$ is the observation matrix, $Q$ is the process noise covariance, and $R$ is the measurement noise covariance. Both noise terms are independent of each other and of the state.
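The two model equations can be simulated directly. A minimal sketch, assuming a 2-D constant-velocity system (position and velocity) with illustrative values for $F$, $H$, $Q$, and $R$ that are not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (assumed model)
H = np.array([[1.0, 0.0]])              # observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.5]])                   # measurement noise covariance

x = np.array([0.0, 1.0])                # true initial state
states, observations = [], []
for _ in range(50):
    w = rng.multivariate_normal(np.zeros(2), Q)   # process noise w_t
    x = F @ x + w                                 # x_t = F x_{t-1} + w_t
    v = rng.multivariate_normal(np.zeros(1), R)   # measurement noise v_t
    z = H @ x + v                                 # z_t = H x_t + v_t
    states.append(x)
    observations.append(z)

states = np.array(states)
observations = np.array(observations)
print(states.shape, observations.shape)   # (50, 2) (50, 1)
```

The `observations` array is what a filter would actually see; the `states` array is the hidden truth it tries to recover.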

Predict and Update

Starting from a Gaussian belief with mean $\hat{x}_{t-1|t-1}$ and covariance $P_{t-1|t-1}$ at time $t-1$, the filter cycles through two steps.

Predict Step

$$\hat{x}_{t|t-1} = F\hat{x}_{t-1|t-1} + B u_t$$

$$P_{t|t-1} = F P_{t-1|t-1} F^T + Q$$

This propagates the prior distribution forward through the dynamics. The covariance grows because process noise $Q$ adds uncertainty.
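The predict step is two lines of linear algebra. A sketch, again assuming the illustrative constant-velocity matrices (not from the text):

```python
import numpy as np

def predict(x_est, P, F, Q, B=None, u=None):
    """Propagate the Gaussian belief one step through the dynamics."""
    x_pred = F @ x_est
    if B is not None and u is not None:
        x_pred = x_pred + B @ u        # optional control input
    P_pred = F @ P @ F.T + Q           # uncertainty grows by Q
    return x_pred, P_pred

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # assumed constant-velocity model
Q = 0.01 * np.eye(2)
x_est = np.array([0.0, 1.0])
P = np.eye(2)

x_pred, P_pred = predict(x_est, P, F, Q)
print(x_pred)                            # [1. 1.]
print(np.trace(P_pred) > np.trace(P))    # True: covariance grew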

Update Step

When measurement $z_t$ arrives, we compute the Kalman gain:

$$K_t = P_{t|t-1} H^T \left(H P_{t|t-1} H^T + R\right)^{-1}$$

$K_t$ balances two sources of information: the predicted state (trusted more when $R$ is large) and the measurement (trusted more when $P_{t|t-1}$ is large).

The updated state estimate and covariance are:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \underbrace{\left(z_t - H\hat{x}_{t|t-1}\right)}_{\text{innovation}}$$

$$P_{t|t} = (I - K_t H) P_{t|t-1}$$

The innovation $z_t - H\hat{x}_{t|t-1}$ is the residual between the observed and predicted measurement. The update step is equivalent to conditioning the joint Gaussian $(x_t, z_t)$ on $z_t$.
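Putting the gain, innovation, and covariance update together, the update step can be sketched as follows. The numeric inputs are illustrative assumptions, not values from the text:

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Condition the predicted Gaussian on measurement z."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    innovation = z - H @ x_pred              # observed minus predicted
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x_pred = np.array([1.0, 1.0])                       # assumed predicted state
P_pred = np.array([[2.0, 1.0], [1.0, 1.02]])        # assumed predicted cov

x_new, P_new = update(x_pred, P_pred, np.array([1.4]), H, R)
print(np.trace(P_new) < np.trace(P_pred))   # True: measurement reduces uncertainty
```

The estimate moves part of the way toward the measurement (by the fraction the gain dictates), and the covariance shrinks, mirroring how the predict step grew it.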

Derivation as MAP/MMSE Estimator

The Kalman filter is simultaneously the Maximum A Posteriori (MAP) and Minimum Mean Squared Error (MMSE) estimator for linear-Gaussian systems.

The joint distribution of state and measurement is:

$$\begin{pmatrix} x_t \\ z_t \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \hat{x}_{t|t-1} \\ H\hat{x}_{t|t-1} \end{pmatrix},\; \begin{pmatrix} P_{t|t-1} & P_{t|t-1}H^T \\ HP_{t|t-1} & HP_{t|t-1}H^T + R \end{pmatrix} \right)$$

Applying the Gaussian conditional formula $p(x \mid z) = \mathcal{N}(\mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z),\; \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})$ immediately yields the update equations above. Since the posterior is Gaussian, MAP $=$ mean $=$ MMSE estimate.

Steady-State Kalman Filter

For time-invariant systems ($F$, $H$, $Q$, $R$ constant), the Riccati recursion for $P_{t|t-1}$ converges to a fixed point $P_\infty$ satisfying the discrete algebraic Riccati equation:

$$P_\infty = F\left(P_\infty - P_\infty H^T(HP_\infty H^T + R)^{-1} H P_\infty\right)F^T + Q$$

Once $P_\infty$ is known, the steady-state Kalman gain $K_\infty = P_\infty H^T(HP_\infty H^T + R)^{-1}$ is a fixed matrix, and the update step requires only matrix-vector products - no matrix inversion at runtime. This is the form used in embedded systems and real-time controllers.
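One way to obtain $P_\infty$ is simply to iterate the Riccati recursion until it stops changing (`scipy.linalg.solve_discrete_are` solves the same equation directly). A sketch with the illustrative constant-velocity matrices assumed earlier:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # assumed time-invariant model
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])

# Iterate the Riccati recursion for P_{t|t-1} to its fixed point.
P = np.eye(2)
for _ in range(500):
    S = H @ P @ H.T + R
    P = F @ (P - P @ H.T @ np.linalg.inv(S) @ H @ P) @ F.T + Q

# Fixed steady-state gain: no runtime matrix inversion needed thereafter.
K_inf = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

# Check that P satisfies the discrete algebraic Riccati equation.
residual = F @ (P - P @ H.T @ np.linalg.inv(H @ P @ H.T + R) @ H @ P) @ F.T + Q - P
print(np.abs(residual).max() < 1e-8)   # True
```

Once `K_inf` is precomputed, each filter step is just `x = F @ x; x += K_inf @ (z - H @ x)`, which is why this form suits embedded targets.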

Extended Kalman Filter

When the dynamics or observation model are nonlinear - $x_t = f(x_{t-1}, u_t) + w_t$ and $z_t = h(x_t) + v_t$ - the EKF linearises around the current estimate using Jacobians:

$$F_t = \frac{\partial f}{\partial x}\bigg|_{\hat{x}_{t-1|t-1}}, \quad H_t = \frac{\partial h}{\partial x}\bigg|_{\hat{x}_{t|t-1}}$$

The same predict-update equations apply with $F_t$ and $H_t$ replacing the constant matrices. The EKF is a first-order approximation and can diverge if nonlinearities are severe.
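A minimal EKF sketch for a scalar system; the nonlinear model $f(x) = x + 0.1\sin x$, $h(x) = x^2$ and the noise levels are illustrative assumptions, not from the text:

```python
import numpy as np

def f(x):  return x + 0.1 * np.sin(x)      # assumed nonlinear dynamics
def h(x):  return x ** 2                   # assumed nonlinear observation

def F_jac(x):  return np.array([[1.0 + 0.1 * np.cos(x[0])]])   # df/dx
def H_jac(x):  return np.array([[2.0 * x[0]]])                 # dh/dx

Q = np.array([[0.01]])
R = np.array([[0.1]])

def ekf_step(x_est, P, z):
    # Predict: propagate the mean through f, the covariance through F_t.
    Ft = F_jac(x_est)
    x_pred = f(x_est)
    P_pred = Ft @ P @ Ft.T + Q
    # Update: linearise h around the predicted state.
    Ht = H_jac(x_pred)
    S = Ht @ P_pred @ Ht.T + R
    K = P_pred @ Ht.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(1) - K @ Ht) @ P_pred
    return x_new, P_new

x_est, P = np.array([1.0]), np.array([[1.0]])
x_est, P = ekf_step(x_est, P, z=np.array([1.2]))
print(x_est.shape, P.shape)   # (1,) (1, 1)
```

Note the structure is identical to the linear filter; only the mean propagation uses the true nonlinear functions, while the covariances use the Jacobians.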

Unscented Kalman Filter

The Unscented Kalman Filter avoids linearisation by propagating a deterministic set of sigma points through the true nonlinear function. Given state mean $\hat{x}$ and covariance $P$, the $2n+1$ sigma points are:

$$\mathcal{X}_0 = \hat{x}, \quad \mathcal{X}_i = \hat{x} \pm \left(\sqrt{(n+\lambda)P}\right)_i, \quad i = 1,\ldots,n$$

where $\lambda$ is a scaling parameter. The transformed points are recombined with weights to yield the predicted mean and covariance. For Gaussian inputs the unscented transform matches the true mean to third order (and the covariance to at least second order), at the same $O(n^3)$ cost as the EKF but with substantially better accuracy on mildly nonlinear systems.
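The sigma-point construction and the weighted recombination can be sketched directly; the choice $\lambda = 1$ and the simple weighting scheme below follow the basic unscented transform and are assumptions for illustration:

```python
import numpy as np

def sigma_points(mean, cov, lam=1.0):
    """Generate the 2n+1 sigma points for a Gaussian (mean, cov)."""
    n = len(mean)
    sqrt_mat = np.linalg.cholesky((n + lam) * cov)   # matrix square root
    pts = [mean]
    for i in range(n):
        pts.append(mean + sqrt_mat[:, i])
        pts.append(mean - sqrt_mat[:, i])
    return np.array(pts)                             # shape (2n+1, n)

def unscented_transform(pts, func, lam=1.0):
    """Push sigma points through func and recombine mean/covariance."""
    n = pts.shape[1]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w[0] = lam / (n + lam)
    y = np.array([func(p) for p in pts])
    mean = w @ y
    diff = y - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov

mean = np.array([1.0, 2.0])
cov = np.array([[0.5, 0.1], [0.1, 0.3]])
pts = sigma_points(mean, cov)
m, c = unscented_transform(pts, lambda x: x)   # identity transform
print(np.allclose(m, mean), np.allclose(c, cov))   # True True
```

A useful sanity check: under the identity transform the recombined mean and covariance reproduce the inputs exactly, since the sigma points encode them without approximation.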

Particle Filter

For highly nonlinear or non-Gaussian systems, particle filters represent the belief as a weighted set of $N$ samples (particles) $\{x_t^{(i)}, w_t^{(i)}\}_{i=1}^N$. Each particle is propagated through the dynamics, then reweighted by the likelihood of the observation:

$$w_t^{(i)} \propto p(z_t \mid x_t^{(i)}) \, w_{t-1}^{(i)}$$

Periodic resampling discards low-weight particles. The particle filter converges to the true posterior as $N \to \infty$ but scales poorly with state dimension - the curse of dimensionality means you need exponentially many particles in high dimensions.
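A minimal bootstrap particle filter sketch for a scalar system; the dynamics, Gaussian likelihood, particle count, and resampling threshold are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000  # number of particles (assumed)

def pf_step(particles, weights, z, q_std=0.1, r_std=0.5):
    # Propagate each particle through assumed nonlinear dynamics plus noise.
    particles = particles + np.sin(particles) + rng.normal(0, q_std, N)
    # Reweight by the measurement likelihood p(z | x^(i)), here Gaussian.
    lik = np.exp(-0.5 * ((z - particles) / r_std) ** 2)
    weights = weights * lik
    weights /= weights.sum()
    # Resample when the effective sample size drops too low.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights

particles = rng.normal(0.0, 1.0, N)
weights = np.full(N, 1.0 / N)
particles, weights = pf_step(particles, weights, z=0.3)
estimate = np.sum(weights * particles)    # weighted posterior mean
print(abs(weights.sum() - 1.0) < 1e-9)    # True
```

The effective-sample-size check implements the "periodic resampling" mentioned above: resampling only when weights degenerate avoids discarding diversity unnecessarily.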

Complexity

| Method | Per-step cost | Storage |
| --- | --- | --- |
| Kalman filter | $O(n^3 + m^3)$ | $O(n^2)$ |
| Steady-state KF | $O(n^2)$ | $O(n^2)$ |
| EKF / UKF | $O(n^3)$ | $O(n^2)$ |
| Particle filter | $O(N \cdot n)$ | $O(Nn)$ |

Examples

GPS + IMU sensor fusion. GPS provides absolute position at low frequency (1–10 Hz) with large noise $R$. An IMU provides high-frequency acceleration readings that can be integrated to predict position (small $Q$ between GPS updates). The Kalman filter fuses them: the predict step integrates IMU measurements, and the update step corrects drift using GPS.

Object tracking in computer vision. A constant-velocity model sets $F$ to propagate position and velocity forward. Bounding-box detections from a neural network serve as measurements. The Kalman filter handles missing detections (no update step) and noisy detections gracefully. SORT (Simple Online and Realtime Tracking) is built on this principle.
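A sketch of the constant-velocity tracking loop; this keeps only a 2-D position/velocity state (SORT also tracks box scale and aspect ratio), and all matrix values and detections are illustrative assumptions:

```python
import numpy as np

dt = 1.0
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])   # [pos; vel] propagation
H = np.hstack([np.eye(2), np.zeros((2, 2))])    # detections observe position
Q = 0.01 * np.eye(4)
R = 1.0 * np.eye(2)

x, P = np.zeros(4), 10.0 * np.eye(4)            # uncertain initial belief
for det in [np.array([1.0, 0.5]), None, np.array([3.1, 1.4])]:
    # Predict every frame, whether or not a detection arrived.
    x, P = F @ x, F @ P @ F.T + Q
    if det is None:
        continue                                # missed detection: no update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (det - H @ x)
    P = (np.eye(4) - K @ H) @ P
print(x.shape)   # (4,)
```

The `None` frame shows the graceful handling of missed detections: the state coasts on the motion model and its covariance grows, so the next detection is weighted more heavily.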

Financial time-series filtering. Estimating the “true” price of an asset from tick data contaminated by bid-ask bounce. The state is the efficient price, the observation is the transaction price, and $R$ captures microstructure noise. The filter separates the latent price process from noise.


Read Next: Robotics / Tracking