Kalman Filtering - Optimal Estimation When Sensors Lie
Helpful context:
- Probability Distributions - The Shapes That Randomness Takes
- Vectors, Matrices & Linear Systems - The Language of Simultaneous Equations
In 1960, Rudolf Kálmán published a paper that enabled the Apollo moon landing. The Apollo navigation computer used Kálmán’s filter to track the spacecraft’s position from noisy sensor data. It works by fusing a prediction (from physics) with a measurement (from sensors), weighting each by its uncertainty. The result is always better than either alone.
That’s the whole idea: you have two imperfect sources of information. Physics tells you where you should be. Sensors tell you where you appear to be. The Kalman filter takes the weighted average, assigning more trust to whichever is more reliable.
The Setup
A system evolves over time. You can’t observe the state directly - only noisy measurements of it.
State transition model - how the hidden state evolves:
$$x_t = F x_{t-1} + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$
Here $x_t$ is the hidden state (say, position and velocity of a spacecraft), $F$ is the transition matrix (physics: how the state changes over one time step), $u_t$ is a known control input (thruster firing), and $w_t$ is process noise with covariance $Q$ (model uncertainty, wind, etc.).
Measurement model - what you observe:
$$y_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$
Here $y_t$ is your measurement, $H$ maps the state to measurement space, and $v_t$ is measurement noise with covariance $R$ (sensor error).
Both noises are Gaussian. This is the key assumption. Under it, everything stays Gaussian, which means you can track the full probability distribution over the hidden state using just two numbers: a mean vector $\hat{x}$ and a covariance matrix $P$.
The Kalman Filter Cycle
The filter alternates between two steps: predict and update.
Predict
Use the dynamics to project the current estimate forward in time:
$$\hat{x}{t \mid t-1} = F \hat{x}{t-1 \mid t-1} + B u_t$$
$$P_{t \mid t-1} = F P_{t-1 \mid t-1} F^T + Q$$
The predicted state mean propagates through the physics. The predicted covariance grows: process noise $Q$ adds uncertainty because you don’t know the exact dynamics.
Update
A new measurement $y_t$ arrives. Compute the innovation - the gap between what you observed and what you predicted:
$$\text{innovation} = y_t - H \hat{x}_{t \mid t-1}$$
Compute the Kalman gain:
$$K_t = P_{t \mid t-1} H^T \left(H P_{t \mid t-1} H^T + R\right)^{-1}$$
Update the estimate and covariance:
$$\hat{x}{t \mid t} = \hat{x}{t \mid t-1} + K_t \left(y_t - H \hat{x}_{t \mid t-1}\right)$$
$$P_{t \mid t} = (I - K_t H) P_{t \mid t-1}$$
The Kalman Gain: How Much to Trust the Measurement
The Kalman gain $K_t = P_{t \mid t-1} H^T (H P_{t \mid t-1} H^T + R)^{-1}$ looks complex but has a clean interpretation.
The numerator $P_{t \mid t-1} H^T$ captures the predicted state uncertainty projected into measurement space. The denominator $H P_{t \mid t-1} H^T + R$ is the total measurement uncertainty: prediction uncertainty plus sensor noise.
- If measurement noise $R$ is large (unreliable sensors): $K_t$ is small. The filter trusts the prediction more.
- If process noise $Q$ is large (unreliable physics): $P_{t \mid t-1}$ is large, so $K_t$ is large. The filter trusts the measurement more.
- If $R \to 0$ (perfect sensors): $K_t \to H^{-1}$ - the filter ignores the prediction and takes the measurement as truth.
- If $P \to 0$ (prediction is certain): $K_t \to 0$ - the filter ignores the measurement.
The gain is exactly the weight that minimizes the posterior variance. You’re always doing the optimal fusion of two uncertain sources.
Why This Is Exact Bayesian Inference
If the prior and the likelihood are both Gaussian, the posterior is Gaussian - and its mean and covariance are exactly what the Kalman update computes.
The predicted state distribution $p(x_t \mid y_1, \ldots, y_{t-1}) = \mathcal{N}(\hat{x}{t \mid t-1}, P{t \mid t-1})$ is the prior. The measurement likelihood is $p(y_t \mid x_t) = \mathcal{N}(Hx_t, R)$. Bayes' theorem gives:
$$p(x_t \mid y_1, \ldots, y_t) \propto p(y_t \mid x_t) \cdot p(x_t \mid y_1, \ldots, y_{t-1})$$
Both factors are Gaussian, so the posterior is Gaussian. Computing its mean and covariance by completing the square in the exponent gives exactly the Kalman update equations. The filter is not an approximation - it’s the exact Bayesian posterior.
Discomfort check. The Kalman filter only works for linear systems with Gaussian noise. The real world is neither. What then?
Three generalizations:
Extended Kalman Filter (EKF): linearize nonlinear functions around the current estimate using Jacobians. Fast, but the approximation can be poor for highly nonlinear systems and the filter can diverge.
Unscented Kalman Filter (UKF): instead of linearizing, propagate a carefully chosen set of “sigma points” through the true nonlinear function and refit a Gaussian. More accurate than EKF for mildly nonlinear systems, same computational cost.
Particle filter: represent the distribution as a set of weighted samples. Non-parametric - no Gaussian assumption needed. Handles any noise distribution, any nonlinearity. Cost scales with number of particles and degrades in high dimensions (curse of dimensionality). Used in robotics localization and computer vision.
Practical Complexity
Per time step: the dominant cost is the matrix inversion in the Kalman gain, $O(m^3)$ where $m$ is the measurement dimension, plus the matrix multiplications $O(n^2 m)$ where $n$ is the state dimension.
For a time-invariant system (constant $F$, $H$, $Q$, $R$), the covariance eventually converges to a steady state $P_\infty$ satisfying the discrete algebraic Riccati equation. Once you’ve pre-computed $P_\infty$ and the corresponding steady-state gain $K_\infty$, the runtime update reduces to matrix-vector products - no inversion needed at each step. This is the form used in embedded navigation systems.
Applications
GPS navigation. GPS measurements have large, slow-varying errors. An inertial measurement unit (IMU) provides high-frequency acceleration readings that predict position between GPS fixes. The Kalman filter fuses them: predict with IMU, update with GPS. The result has less noise than either alone.
Robotics (SLAM). Simultaneous localization and mapping: the robot estimates its own position while building a map of landmarks. Both the robot pose and landmark positions are in the state vector. The Kalman filter (or its nonlinear variants) maintains a joint distribution over all of them.
Object tracking. A constant-velocity model propagates bounding box positions in video. Detections from a neural network are noisy measurements. The Kalman filter smooths the track and handles frames where detection fails. SORT (Simple Online and Realtime Tracking), a widely used tracking algorithm, is built on this exactly.
Estimating volatility. The hidden state is the true volatility of an asset. Observable data is the transaction price, which mixes true price and microstructure noise. A Kalman filter on a state-space model separates the signal from the noise.
Summary
| Concept | Detail |
|---|---|
| State model | $x_t = Fx_{t-1} + Bu_t + w_t$, $w_t \sim \mathcal{N}(0, Q)$ |
| Measurement model | $y_t = Hx_t + v_t$, $v_t \sim \mathcal{N}(0, R)$ |
| Predict: mean | $\hat{x}{t \mid t-1} = F\hat{x}{t-1 \mid t-1} + Bu_t$ |
| Predict: covariance | $P_{t \mid t-1} = FP_{t-1 \mid t-1}F^T + Q$ |
| Kalman gain | $K_t = P_{t \mid t-1}H^T(HP_{t \mid t-1}H^T + R)^{-1}$ |
| Update: mean | $\hat{x}{t \mid t} = \hat{x}{t \mid t-1} + K_t(y_t - H\hat{x}_{t \mid t-1})$ |
| Update: covariance | $P_{t \mid t} = (I - K_tH)P_{t \mid t-1}$ |
| Why it works | Exact Bayesian posterior for Gaussian prior + Gaussian likelihood |
| Extensions | EKF (linearize), UKF (sigma points), particle filter (samples) |
Read next: