Helpful context:


Model predictive control is a control strategy that solves an optimization problem at every time step. Rather than precomputing a fixed policy and applying it open-loop, MPC uses a model of the system to simulate the future, optimizes over a finite window, executes the first control action, then discards the rest of the solution and repeats from the new state. The controller is not a static map from state to input - it is a running optimization engine.

This receding horizon idea reconciles two things that seem in tension: the computational tractability of short-horizon optimization and the long-term performance of infinite-horizon control. Solving an infinite-horizon problem exactly is generally intractable, and precomputed finite-horizon policies may not stabilize the system. MPC sidesteps both problems by repeatedly solving a short problem, but incorporating new state information at each step - turning open-loop planning into closed-loop feedback.

The consequence is a controller that handles constraints natively, adapts to model mismatch and disturbances automatically, and has a clean stability theory grounded in Lyapunov functions and dynamic programming. It is the dominant control paradigm for constrained systems in industrial process control, autonomous vehicles, aerospace, and robotics.


The Setup: Dynamical Systems

The underlying object is a discrete-time dynamical system: $$x_{t+1} = f(x_t, u_t)$$ where $x_t \in \mathbb{R}^n$ is the state at time $t$ and $u_t \in \mathbb{R}^m$ is the control input. The state captures everything about the system relevant to future behavior; the control input is what the controller chooses.

The most well-studied case is the linear system: $$x_{t+1} = Ax_t + Bu_t$$ with $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$. Linearity allows the entire future state trajectory to be expressed as a linear function of the initial state and the control sequence, which makes the optimization tractable.

Both state and input are typically constrained. State constraints $x_t \in \mathcal{X}$ encode safety requirements, physical limits, or operating ranges. Input constraints $u_t \in \mathcal{U}$ encode actuator limits - an electric motor has a maximum torque, a valve has a maximum flow rate. These sets are usually polytopes: $\mathcal{X} = \{x \mid C_x x \leq d_x\}$, $\mathcal{U} = \{u \mid C_u u \leq d_u\}$.

The objective is to minimize a cost accumulated over time. The natural infinite-horizon formulation is: $$\min_{\{u_t\}_{t \geq 0}} \sum_{t=0}^{\infty} \ell(x_t, u_t)$$ The standard choice is the quadratic stage cost $\ell(x,u) = x^T Q x + u^T R u$ with $Q \succeq 0$ and $R \succ 0$. The $Q$ matrix penalizes state deviation; $R \succ 0$ ensures the input is penalized, which also guarantees the optimization is strictly convex in $u$. The unconstrained infinite-horizon linear-quadratic problem has a closed-form solution via the discrete-time algebraic Riccati equation (DARE) - this is the LQR controller. MPC generalizes this to the constrained case.


The Receding Horizon Principle

Instead of solving the infinite-horizon problem, MPC solves a finite-horizon approximation of length $N$ - the prediction horizon - at each time step.

At time $t$, given the current state $x_t$, solve: $$\min_{u_t, \ldots, u_{t+N-1}} \sum_{k=0}^{N-1} \ell(x_{t+k}, u_{t+k}) + V_f(x_{t+N})$$ subject to: $$x_{t+k+1} = f(x_{t+k}, u_{t+k}), \quad k = 0, \ldots, N-1$$ $$x_{t+k} \in \mathcal{X}, \quad u_{t+k} \in \mathcal{U}, \quad k = 0, \ldots, N-1$$ $$x_{t+N} \in \mathcal{X}_f$$

Let $(u_t^{\ast}, \ldots, u_{t+N-1}^{\ast})$ be the optimal solution. Apply only the first input: $u_t \leftarrow u_t^{\ast}$. Observe $x_{t+1}$, shift the window to $[t+1, t+N]$, and repeat.

Three components of the formulation need careful design. The prediction horizon $N$ trades off computation against approximation quality: larger $N$ gives a better approximation to the infinite-horizon cost but requires solving a larger optimization problem. The terminal cost $V_f : \mathbb{R}^n \to \mathbb{R}$ approximates the tail cost $\sum_{k=N}^{\infty} \ell(x_{t+k}, u_{t+k})$ - it must be chosen to ensure stability. The terminal set $\mathcal{X}_f \subseteq \mathcal{X}$ constrains where the state can be at the end of the horizon, serving as both a stability mechanism and a recursive feasibility guarantee.

The key insight is that re-solving the optimization at each step with the new measured state $x_{t+1}$ provides feedback. Disturbances and model errors that push the state off the predicted trajectory are corrected automatically - the optimization simply starts from the actual current state rather than the predicted one. This is why MPC is robust to model mismatch, even though the model used in the optimization may not perfectly match reality.


Linear MPC and QP

For linear systems with quadratic cost and polytopic constraints, the optimization at each step reduces to a quadratic program (QP). Let $U = (u_t, \ldots, u_{t+N-1}) \in \mathbb{R}^{mN}$ be the stacked control sequence. Because $x_{t+k}$ is a linear function of $x_t$ and $(u_t, \ldots, u_{t+k-1})$, the cost and constraints can all be written in terms of $U$ and the current state $x_t$:

$$\min_{U} \frac{1}{2} U^T H U + q^T U \quad \text{subject to} \quad GU \leq w + Ex_t$$

where $H \succ 0$ (inheriting strict convexity from $R \succ 0$), and the matrices $H$, $G$, $w$, $E$ are computed once offline from $A$, $B$, $Q$, $R$, $N$ and the constraint matrices. The right-hand side $w + Ex_t$ depends on the current state, so the QP is re-solved with a different $q$ and $w + Ex_t$ at each time step, but the structure and $H$, $G$ remain fixed.

QPs are convex and solvable in polynomial time. Modern active-set and interior-point solvers - OSQP, qpOASES, ECOS - exploit the fixed sparsity structure and warm-starting (initializing the solver from the previous solution, shifted by one step) to achieve millisecond solve times, making real-time control at kHz rates feasible for moderate $n$ and $N$.

Explicit MPC is an alternative that precomputes the optimal control law $u^{\ast}(x)$ offline for all $x$ in some compact set. The solution to the parametric QP is piecewise affine (PWA) in $x$: the state space partitions into polyhedral regions, and on each region the optimal control is an affine function $u^{\ast}(x) = K_i x + k_i$. At runtime, the controller evaluates a point location query (which region is $x$ in?) and applies the corresponding affine law - no online optimization required. The tradeoff: the number of regions can grow exponentially in the state dimension, making explicit MPC practical only for low-dimensional systems ($n \lesssim 5$).


Stability

The receding horizon strategy is not obviously stable. An optimal finite-horizon policy can drive the system toward the constraint boundary without any regard for what happens after the horizon ends. The fundamental question is: under what conditions does the closed-loop system $x_{t+1} = f(x_t, u^{\ast}_t(x_t))$ converge to the origin?

The standard sufficient conditions are:

  1. Terminal constraint: $x_{t+N} \in \mathcal{X}_f$, where $\mathcal{X}_f$ is a positively invariant set for a local stabilizing controller $\kappa : \mathcal{X}_f \to \mathcal{U}$ - meaning $f(x, \kappa(x)) \in \mathcal{X}_f$ for all $x \in \mathcal{X}_f$.

  2. Terminal cost: $V_f$ is a control Lyapunov function on $\mathcal{X}_f$ satisfying: $$V_f(f(x, \kappa(x))) - V_f(x) \leq -\ell(x, \kappa(x)) \quad \forall x \in \mathcal{X}_f$$

The idea is that $V_f$ upper-bounds the tail cost under the local controller $\kappa$. Under these conditions, the optimal value function $V_N^{\ast}(x_t)$ - the minimum cost achievable from $x_t$ over the horizon - is a Lyapunov function for the closed-loop system: $V_N^{\ast}(x_{t+1}) \leq V_N^{\ast}(x_t) - \ell(x_t, u_t^{\ast})$. Since $\ell \geq 0$ with $\ell(x,u) = 0$ only at the origin, $V_N^{\ast}$ decreases at every step and the system converges.

For linear systems, $\mathcal{X}_f$ is typically taken as the maximal positively invariant set of the LQR-controlled system subject to the constraints, and $V_f(x) = x^T P x$ where $P$ solves the DARE - the infinite-horizon LQR cost. This is the canonical stabilizing terminal cost.

Stability is also achievable without explicit terminal constraints, provided $N$ is large enough. The condition is cost controllability: there exists a finite-horizon feasible trajectory from any $x \in \mathcal{X}$ with cost bounded by a class-$\mathcal{K}$ function of $|x|$. In practice this means choosing $N$ large relative to the system’s transient timescale.


Handling Constraints - the Core Advantage

The defining advantage of MPC over classical controllers is native constraint handling. LQR, PID, and pole-placement controllers are designed for unconstrained systems; constraints are handled ad hoc (clamping, anti-windup), with no guarantee that the closed-loop system respects them. MPC incorporates constraints directly into the optimization - the optimizer cannot return a solution that violates hard constraints by construction.

The constraint types MPC handles include:

  • Input saturation: $u_{\min} \leq u_t \leq u_{\max}$, the most common actuator limit.
  • State constraints: $x_t \in \mathcal{X}$, encoding safety bounds, operating envelopes, or physical limits (temperature, pressure, angle).
  • Output constraints: $y_t = Cx_t \in \mathcal{Y}$, when the constrained quantity is an output rather than the full state.
  • Mixed state-input constraints: $Ex_t + Fu_t \leq g$, coupling state and input.

Soft constraints relax hard constraints by introducing slack variables $\epsilon \geq 0$ and adding a penalty $\rho |\epsilon|^2$ to the cost: $$Ex_t + Fu_t \leq g + \epsilon, \quad \epsilon \geq 0$$ This preserves feasibility of the QP even when the hard constraint set would be empty, at the cost of possible constraint violation.

Recursive feasibility - the guarantee that if the optimization is feasible at time $t$ it remains feasible at $t+1$ - follows from the terminal constraint design. If $x_{t+N} \in \mathcal{X}_f$ and $\mathcal{X}_f$ is positively invariant, then the shifted sequence $(u_{t+1}^{\ast}, \ldots, u_{t+N-1}^{\ast}, \kappa(x_{t+N}))$ is feasible at time $t+1$. Recursive feasibility and stability are the two key properties of a well-posed MPC formulation, and the terminal constraint simultaneously delivers both.


Nonlinear MPC (NMPC)

For nonlinear systems $x_{t+1} = f(x_t, u_t)$, the optimization at each step is a nonlinear program (NLP) - generally nonconvex, with multiple local minima and no polynomial-time guarantee. The state trajectory enters the cost and constraints nonlinearly, so the QP structure of linear MPC is lost.

Standard numerical methods for NMPC:

  • Sequential quadratic programming (SQP): Iteratively linearize $f$ around the current iterate, solve the resulting QP, update. Convergence is local and requires a good initial guess - warm-starting from the previous solution is essential.
  • Interior point methods (IPM): Solve the KKT conditions of the NLP directly using a barrier formulation. More robust to nonconvexity than active-set methods but typically slower per iteration.
  • Real-time iteration (RTI): Perform a single SQP step per time step - linearize once, solve the QP, apply, repeat. Sacrifices convergence to a local optimum for guaranteed execution time. Surprisingly effective in practice because the solution changes slowly between time steps.

The stability theory for NMPC mirrors the linear case: a terminal cost satisfying the decrease condition and a positively invariant terminal set are sufficient, with the same Lyapunov argument. The difficulty is constructing $V_f$ and $\mathcal{X}_f$ for a nonlinear system - this typically requires local linearization (use the LQR cost and LQR-controlled invariant set for the linearized system near the origin, valid in a neighborhood).

Applications where NMPC is standard: autonomous vehicle trajectory optimization, chemical reactor temperature control (highly nonlinear dynamics), aerospace attitude control, and humanoid robot whole-body control.


MPC and Reinforcement Learning

MPC is planning with a model: at each step, simulate the future under the model, optimize over actions, act. This is precisely what model-based reinforcement learning does, making the two frameworks deeply connected.

The connection to dynamic programming is exact. The optimal value function $V^{\ast}(x)$ for the infinite-horizon problem satisfies the Bellman equation: $$V^{\ast}(x) = \min_u \left[ \ell(x,u) + V^{\ast}(f(x,u)) \right]$$ MPC approximates this by truncating at horizon $N$ and replacing the tail with $V_f \approx V^{\ast}$. If $V_f = V^{\ast}$ exactly, MPC recovers the globally optimal policy regardless of $N$. In practice $V_f$ is an approximation, and larger $N$ reduces the approximation error.

Learned models: given data $\{(x_t, u_t, x_{t+1})\}$, fit a model $\hat{f}$ - linear regression, Gaussian process, neural network - and use it in the MPC optimization. This is the core of learning-based MPC. The model error introduces a mismatch between the optimization and reality, which can be bounded using robust MPC techniques (tube MPC, stochastic MPC).

MPC as a safety filter for RL: a learned policy $\pi_\theta$ may be performant on average but occasionally violate safety constraints. MPC can act as a safety layer: at each step, check whether applying $\pi_\theta(x_t)$ leads to a safe trajectory; if not, substitute the MPC action. This recovers constraint satisfaction guarantees without retraining the policy.

Value function approximation: in model-based RL, the learned value function $\hat{V}$ is a natural candidate for the terminal cost $V_f$ in MPC. This closes the loop between the two frameworks - RL learns a better tail-cost approximation, MPC uses it to improve finite-horizon planning, and the improved policy generates better data for RL.


Summary

Concept Key idea
Prediction horizon $N$ Window over which the optimization runs; trades computation for approximation quality
Receding horizon Apply first action only; re-solve at next step with new state measurement
Linear MPC Reduces to QP; fixed structure enables fast solvers and warm-starting
Explicit MPC Precompute piecewise-affine law offline; avoids online optimization at cost of exponential regions
Terminal constraint $\mathcal{X}_f$ Positively invariant set; ensures stability and recursive feasibility
Terminal cost $V_f$ Control Lyapunov function approximating tail cost; decrease condition is key
Recursive feasibility Feasibility at $t$ implies feasibility at $t+1$ via shifted trajectory
NMPC Nonlinear system $\to$ NLP; SQP, IPM, or RTI; same stability theory applies

Read next: