The Legendre Transform - Swapping Variables, Preserving Structure // Megha Bose

The Legendre transform is one of those ideas that appears in multiple places under different names and you only realise it’s the same thing after being confused by all of them separately. It shows up in classical mechanics as the passage from Lagrangian to Hamiltonian. In thermodynamics as the relationship between thermodynamic potentials. In convex analysis as the convex conjugate. In machine learning as a tool in variational inference and exponential families. The underlying operation is the same in all of these; the notation just changes.

The Basic Idea

Start with a function $f : \mathbb{R} \to \mathbb{R}$ that is convex and differentiable. The Legendre transform produces a new function $f^\ast$, taking values in $\mathbb{R} \cup \{+\infty\}$, defined by:

$$f^\ast(p) = \sup_{x \in \mathbb{R}} \left[ px - f(x) \right]$$

The input to $f^\ast$ is $p$, and for each value of $p$ you take the supremum over all $x$ of the expression $px - f(x)$.
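As a minimal numerical sketch of this definition, the supremum can be approximated by a maximum over a finite grid of $x$ values. The helper name `legendre_transform`, the grid `xs`, and the test function $f(x) = x^2$ are illustrative choices, not part of the definition. For this $f$ the bracket $px - x^2$ peaks at $x = p/2$, so the expected value is $p^2/4$.

```python
import numpy as np

def legendre_transform(f, p, xs):
    """Approximate f*(p) = sup_x [p*x - f(x)] by a maximum over the grid xs."""
    return float(np.max(p * xs - f(xs)))

def f(x):
    return x**2

# Hypothetical grid: it has to be wide enough to contain the maximiser x = p/2.
xs = np.linspace(-10.0, 10.0, 200_001)

for p in (-2.0, 0.0, 3.0):
    # Second and third columns should agree up to grid resolution.
    print(p, legendre_transform(f, p, xs), p**2 / 4)
```

A grid maximum only approximates the supremum from below, and it silently fails if the true maximiser lies outside the grid, so the grid width is an assumption to check for each $p$.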

Before unpacking what this means, notice what it does to units and variables. If $f$ takes $x$ and returns a value, then $f^\ast$ takes $p$ (which has units of $f'(x)$, i.e. the derivative of $f$ with respect to $x$) and returns a value. The transform swaps the role of a variable and its derivative. This is the core of it.

Geometric Intuition

Think of the graph of $f(x)$. A line with slope $p$ is $y = px + c$ for some intercept $c$. Among all such lines, which one is tangent to the graph of $f$ from below?

The tangent line with slope $p$ touches the curve at the point $x_0$ where $f'(x_0) = p$. Its equation is:

$$y = f(x_0) + p(x - x_0) = px - (px_0 - f(x_0)) = px - f^\ast(p)$$

So $f^\ast(p)$ is the negative of the $y$-intercept of the tangent line to $f$ with slope $p$.
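The "tangent from below" picture can be checked numerically. The sketch below, which assumes the illustrative function $f(x) = x^2$ and a hypothetical grid `xs`, computes the intercept in two ways: as the largest $c$ for which the line $y = px + c$ stays under the graph, and as the intercept of the tangent at the point where $f'(x_0) = p$.

```python
import numpy as np

def f(x):
    return x**2

# Hypothetical grid, wide enough to contain the tangency point x0 = p/2.
xs = np.linspace(-10.0, 10.0, 200_001)

def largest_intercept_below(p):
    """Largest c with p*x + c <= f(x) on the grid, i.e. c = min_x [f(x) - p*x]."""
    return float(np.min(f(xs) - p * xs))

def tangent_intercept(p):
    """y-intercept of the tangent line at x0, where f'(x0) = 2*x0 = p."""
    x0 = p / 2.0
    return f(x0) - p * x0

for p in (-3.0, 0.5, 4.0):
    # Both columns approximate -f*(p) = -p**2 / 4.
    print(p, largest_intercept_below(p), tangent_intercept(p))
```

The two numbers agree because raising a line of fixed slope until it first touches a convex graph is the same as finding its tangent.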

Equivalently: the graph of $f$ can be described either as the set of points $(x, f(x))$, or as the envelope of all its tangent lines. The Legendre transform encodes the second description. Instead of saying “the function value at $x$”, you say “the intercept of the tangent line with slope $p$”. These two descriptions contain the same information (when $f$ is convex).

This duality - between a convex function and its family of supporting hyperplanes - is the geometric heart of the transform.

Finding the Transform

For a convex differentiable $f$, the supremum in $f^\ast(p) = \sup_x [px - f(x)]$ is achieved at the point $x^\ast$ where the derivative is zero:

$$\frac{d}{dx}[px - f(x)] = p - f'(x) = 0 \implies f'(x^\ast) = p$$

So $x^\ast$ is the value where $f'$ equals $p$. If $f$ is strictly convex, then $f'$ is strictly increasing and therefore invertible on its range, so for $p$ in that range $x^\ast = (f')^{-1}(p)$, and:

$$f^\ast(p) = p \cdot (f')^{-1}(p) - f\left((f')^{-1}(p)\right)$$

Example. Take $f(x) = \frac{1}{2}x^2$. Then $f'(x) = x$, so $x^\ast = p$. Then:

$$f^\ast(p) = p \cdot p - \frac{1}{2}p^2 = \frac{1}{2}p^2$$

The Legendre transform of $\frac{1}{2}x^2$ is itself. This is a special property of the quadratic - it’s self-dual under the transform.

Example. Take $f(x) = e^x$. Then $f'(x) = e^x = p$ gives $x^\ast = \ln p$. Then:

$$f^\ast(p) = p \ln p - e^{\ln p} = p \ln p - p$$

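So the convex conjugate of the exponential is $p \ln p - p$ for $p > 0$. At $p = 0$ the supremum is $0$ (approached as $x \to -\infty$ but never attained), and for $p < 0$ it is $+\infty$.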
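The recipe "solve $f'(x^\ast) = p$, then evaluate $px^\ast - f(x^\ast)$" can be sketched in code. The version below is an illustration rather than a general-purpose routine: it inverts $f'$ by bisection, assumes $f'$ is increasing on a bracket `[lo, hi]` that contains the solution, and uses the hypothetical helper name `conjugate_via_stationary_point`.

```python
import math

def conjugate_via_stationary_point(f, df, p, lo, hi, iters=200):
    """Return p*x - f(x) at the x solving df(x) = p, located by bisection.

    Assumes df is increasing on [lo, hi] and df(lo) <= p <= df(hi).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if df(mid) < p:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    return p * x_star - f(x_star)

# f(x) = exp(x), so f'(x) = exp(x) as well.
for p in (0.5, 1.0, 3.0):
    numeric = conjugate_via_stationary_point(math.exp, math.exp, p, -50.0, 50.0)
    closed_form = p * math.log(p) - p
    print(p, numeric, closed_form)
```

The numeric and closed-form columns should match to floating-point accuracy for $p > 0$. For $p \le 0$ there is no stationary point, so this approach does not apply.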

Key Properties

Involution. For a closed convex function $f$, applying the Legendre transform twice recovers $f$:

$$(f^\ast)^\ast = f$$

This is the content of the Fenchel-Moreau theorem. The transform is an involution on the space of closed convex functions: it’s its own inverse. This is what makes it a genuine duality rather than just a one-way operation.
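A rough numerical check of the involution applies the grid version of the transform twice. The sketch assumes the illustrative function $f(x) = x^2$, and hypothetical grids `xs` and `ps` chosen so that the slope grid covers $f'(x) = 2x$ for every $x$ in `xs`.

```python
import numpy as np

def conjugate_on_grid(values, grid, dual_grid):
    """For each d in dual_grid, return max_k [d * grid[k] - values[k]]."""
    return np.max(np.outer(dual_grid, grid) - values[None, :], axis=1)

# Hypothetical grids. The slope grid covers f'(x) = 2x for every x in xs.
xs = np.linspace(-5.0, 5.0, 1001)
ps = np.linspace(-10.0, 10.0, 1001)

f_vals = xs**2
f_star = conjugate_on_grid(f_vals, xs, ps)        # approximates p**2 / 4
f_star_star = conjugate_on_grid(f_star, ps, xs)   # should reproduce x**2

# Largest pointwise gap between f and its double transform on the grid.
print(np.max(np.abs(f_star_star - f_vals)))
```

If the slope grid were too narrow, the second transform would undershoot $f$ near the ends of `xs`, which is a grid artefact rather than a failure of the theorem.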

Young’s inequality. For any $x$ and $p$:

$$px \leq f(x) + f^\ast(p)$$

with equality iff $p = f'(x)$ (equivalently, $x = (f^\ast)'(p)$). This follows directly from the definition of the supremum. In thermodynamics this is the statement that the Gibbs inequality is saturated at equilibrium; in optimisation it appears as a bound on inner products.
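To spell out the one-line argument, using only the definition of $f^\ast$: for any fixed $x$, the supremum is at least the value of the bracket at that $x$, so

$$f^\ast(p) = \sup_{y} \left[ py - f(y) \right] \geq px - f(x) \quad \Longrightarrow \quad px \leq f(x) + f^\ast(p)$$

For the self-dual quadratic $f(x) = \frac{1}{2}x^2$ this reads $px \leq \frac{1}{2}x^2 + \frac{1}{2}p^2$, which is just $(x - p)^2 \geq 0$, with equality exactly when $p = x = f'(x)$.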

Convexity. $f^\ast$ is always convex, regardless of whether $f$ is. For each fixed $x$, the expression $px - f(x)$ is affine in $p$, and a pointwise supremum of affine functions is always convex.

Scaling and translation. If $g(x) = f(\alpha x)$ with $\alpha \neq 0$, then $g^\ast(p) = f^\ast(p/\alpha)$. If $g(x) = f(x) + cx$, then $g^\ast(p) = f^\ast(p - c)$: adding a linear term to $f$ translates its conjugate. Both follow directly from the definition.
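As a short worked derivation of both rules: for the scaling rule, substitute $y = \alpha x$ (assuming $\alpha \neq 0$, so that $y$ ranges over all of $\mathbb{R}$ as $x$ does):

$$g^\ast(p) = \sup_x \left[ px - f(\alpha x) \right] = \sup_y \left[ \frac{p}{\alpha} y - f(y) \right] = f^\ast\!\left(\frac{p}{\alpha}\right)$$

For the linear-term rule, collect the terms in $x$:

$$g^\ast(p) = \sup_x \left[ px - f(x) - cx \right] = \sup_x \left[ (p - c)x - f(x) \right] = f^\ast(p - c)$$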

The Generalisation: Convex Conjugate

The same definition works for functions $f : \mathbb{R}^n \to \mathbb{R}$:

$$f^\ast(p) = \sup_{x \in \mathbb{R}^n} \left[ \langle p, x \rangle - f(x) \right]$$

where $\langle p, x \rangle = p^\top x$ is the inner product and $p \in \mathbb{R}^n$. This is called the Fenchel conjugate or convex conjugate.

The function $f^\ast$ need not be differentiable even if $f$ is, and the domain of $f^\ast$ (the set of $p$ for which the supremum is finite) can be a strict subset of $\mathbb{R}^n$.
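A one-dimensional illustration of the restricted domain, chosen here as an example, is $f(x) = \sqrt{1 + x^2}$. It is smooth on all of $\mathbb{R}$, but its slopes stay strictly between $-1$ and $1$. A hand computation gives $f^\ast(p) = -\sqrt{1 - p^2}$ for $|p| \leq 1$ and $+\infty$ otherwise. The sketch below, with the hypothetical helper `grid_sup`, shows the grid maximum settling for $p = 0.6$ and growing with the grid width for $p = 1.5$.

```python
import numpy as np

def grid_sup(p, half_width, n=200_001):
    """Maximum of p*x - sqrt(1 + x**2) over a grid on [-half_width, half_width]."""
    xs = np.linspace(-half_width, half_width, n)
    return float(np.max(p * xs - np.sqrt(1.0 + xs**2)))

for p in (0.6, 1.5):
    # For p = 0.6 the values settle near -sqrt(1 - 0.36) = -0.8.
    # For p = 1.5 they keep growing as the grid widens: the supremum is infinite.
    print(p, [grid_sup(p, w) for w in (10.0, 100.0, 1000.0)])
```

The same function also shows the loss of smoothness: the derivative of $-\sqrt{1 - p^2}$ blows up as $p \to \pm 1$, at the boundary of the domain.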

In Classical Mechanics

The Lagrangian $L(q, \dot{q})$ of a mechanical system is a function of position $q$ and velocity $\dot{q}$. The Hamiltonian $H(q, p)$ is its Legendre transform with respect to $\dot{q}$:

$$p = \frac{\partial L}{\partial \dot{q}}, \qquad H(q, p) = p\dot{q} - L(q, \dot{q})$$

where $\dot{q}$ on the right is expressed in terms of $p$ by inverting the first equation.

The Legendre transform swaps the velocity variable $\dot{q}$ for the momentum variable $p = \partial L / \partial \dot{q}$. Newton’s equations of motion, written in Lagrangian form as second-order ODEs in $q$, become Hamilton’s equations:

$$\dot{q} = \frac{\partial H}{\partial p}, \qquad \dot{p} = -\frac{\partial H}{\partial q}$$

These are first-order, and the symmetry between $q$ and $p$ in the Hamiltonian formulation is what makes it the natural home for symplectic geometry, canonical transformations, and eventually quantum mechanics.
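A standard worked example, with the mass $m$ and potential energy $V(q)$ introduced here for illustration, is a single particle with $L(q, \dot{q}) = \frac{1}{2} m \dot{q}^2 - V(q)$. The momentum is $p = \partial L / \partial \dot{q} = m\dot{q}$, so $\dot{q} = p/m$, and

$$H(q, p) = p \cdot \frac{p}{m} - \left( \frac{p^2}{2m} - V(q) \right) = \frac{p^2}{2m} + V(q)$$

Hamilton's equations then give $\dot{q} = p/m$ and $\dot{p} = -V'(q)$, which combine to $m\ddot{q} = -V'(q)$, i.e. Newton's second law. The kinetic term is the self-dual quadratic again, rescaled by $m$.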

In Thermodynamics

The internal energy $U(S, V, N)$ depends on entropy $S$, volume $V$, and particle number $N$. Temperature is defined as $T = \partial U / \partial S$. The Helmholtz free energy is the Legendre transform of $U$ with respect to $S$, written in the physicists' sign convention (it is the negative of the $\sup[px - f]$ form used above):

$$F(T, V, N) = U - TS$$

The variable $S$ is swapped for its conjugate $T$. Similarly:

Enthalpy $H = U + PV$ swaps volume $V$ for pressure $P = -\partial U / \partial V$.
Gibbs free energy $G = U - TS + PV$ performs both transforms.

The family of thermodynamic potentials is exactly the family of Legendre transforms of $U$ in its various arguments. They contain the same information but make different variables natural as controls, which is why you use $F$ when temperature is fixed (e.g. isothermal processes) and $G$ when both temperature and pressure are fixed.
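The change of natural variables is easiest to see in differentials. As a sketch, write the chemical potential as $\mu_c = \partial U / \partial N$ (a symbol introduced here, not used elsewhere in this post). Then

$$dU = T\,dS - P\,dV + \mu_c\,dN$$

and differentiating $F = U - TS$ gives

$$dF = dU - T\,dS - S\,dT = -S\,dT - P\,dV + \mu_c\,dN$$

The $T\,dS$ term has been traded for $-S\,dT$: $F$ varies naturally with $T$ rather than $S$, and the swapped-out variable is recovered as $S = -\partial F / \partial T$.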

In Convex Optimisation and ML

In optimisation, the convex conjugate appears in duality theory. The Fenchel dual of the problem $\min_x f(x) + g(x)$ involves $f^\ast$ and $g^\ast$, and under mild conditions strong duality holds: the primal and dual optimal values coincide.
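One common way to write the dual (sign conventions for the dual variable differ between texts) starts from Young's inequality applied to each term: $f(x) \geq \langle p, x \rangle - f^\ast(p)$ and $g(x) \geq -\langle p, x \rangle - g^\ast(-p)$. Adding the two, the inner products cancel, and since the bound holds for every $x$ and every $p$:

$$\inf_x \left[ f(x) + g(x) \right] \geq \sup_p \left[ -f^\ast(p) - g^\ast(-p) \right]$$

This inequality is weak duality. Strong duality is the statement that, under suitable conditions, it holds with equality.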

In machine learning, the log-partition function of an exponential family distribution:

$$A(\eta) = \log \int \exp(\eta^\top T(x)) \, d\nu(x)$$

(with sufficient statistic $T(x)$ and base measure $\nu$) is convex in the natural parameter $\eta$. Its convex conjugate is:

$$A^\ast(\mu) = \sup_\eta \left[ \eta^\top \mu - A(\eta) \right]$$

which, for mean parameters $\mu$ in the interior of the set of realisable means, equals the negative entropy $-H[p]$ of the exponential-family distribution with mean parameters $\mu$. The transform connects the natural parameterisation of an exponential family (via $\eta$) to the mean parameterisation (via $\mu = \mathbb{E}[T(x)]$), and the two are related by:

$$\mu = \nabla A(\eta), \qquad \eta = \nabla A^\ast(\mu)$$

This duality between natural and mean parameters is what makes belief propagation and variational inference work cleanly for exponential families.
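The Bernoulli distribution gives a small concrete check, taking $T(x) = x$ on $\{0, 1\}$ with counting measure as the base measure (an example chosen here for illustration). Then $A(\eta) = \log(1 + e^\eta)$, the mean parameter is the sigmoid of $\eta$, and the conjugate should equal $\mu \log \mu + (1 - \mu) \log(1 - \mu)$, the negative entropy. The function names below are illustrative.

```python
import math

def log_partition(eta):
    """A(eta) = log(1 + exp(eta)) for a Bernoulli variable."""
    return math.log1p(math.exp(eta))

def mean_from_natural(eta):
    """mu = dA/d(eta), the sigmoid of eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def natural_from_mean(mu):
    """eta = dA*/d(mu), the logit of mu."""
    return math.log(mu / (1.0 - mu))

def neg_entropy(mu):
    """Negative Shannon entropy (in nats) of a Bernoulli with mean mu."""
    return mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)

eta = 0.7
mu = mean_from_natural(eta)

# The supremum defining A*(mu) is attained at the eta with dA/d(eta) = mu.
conjugate_value = eta * mu - log_partition(eta)

print(conjugate_value, neg_entropy(mu))   # the two values should agree
print(natural_from_mean(mu), eta)         # the logit undoes the sigmoid
```

The sigmoid and logit are the two gradient maps $\nabla A$ and $\nabla A^\ast$ for this family, which is why they are inverses of each other.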

Summary

| Context | Function | Transform | Variable swap |
| --- | --- | --- | --- |
| Convex analysis | $f(x)$ | $f^\ast(p) = \sup_x[px - f(x)]$ | $x \leftrightarrow p = f'(x)$ |
| Classical mechanics | $L(q, \dot{q})$ | $H(q, p)$ | $\dot{q} \leftrightarrow p = \partial L/\partial \dot{q}$ |
| Thermodynamics | $U(S, V)$ | $F(T, V)$ | $S \leftrightarrow T = \partial U/\partial S$ |
| Exponential families | $A(\eta)$ | $A^\ast(\mu)$ | $\eta \leftrightarrow \mu = \nabla A(\eta)$ |

The pattern in every case: you have a convex function of some variable, you take its derivative to get a conjugate variable, and the Legendre transform gives you a new function of the conjugate variable that contains the same information but makes the conjugate variable natural. The involution $(f^\ast)^\ast = f$ is what makes this a genuine duality rather than a one-way change of variables.
