Prerequisite:


Vector calculus extends differentiation and integration from scalar functions to vector fields - functions that assign a vector to every point in space. The classical operators (divergence, curl, Laplacian) measure different aspects of how a vector field spreads, rotates, or concentrates. Their integral theorems (Green’s, Stokes', Divergence) are not just beautiful mathematics: they appear in normalizing flows, graph neural networks, and the score functions used in diffusion models.

The Operators: Gradient, Divergence, and Curl

Throughout, let $F = (F_1, F_2, F_3): \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth vector field and $f: \mathbb{R}^3 \to \mathbb{R}$ a smooth scalar field.

Gradient

The gradient of $f$ is the vector field pointing in the direction of steepest increase:

$$\nabla f = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z}\right)$$

Already discussed in detail in Gradients & Partial Derivatives; included here for completeness of the operator triad.

Divergence

The divergence of $F$ measures the net outward flux per unit volume at a point - how much the field “spreads out”:

$$\nabla \cdot F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}$$

If $\nabla \cdot F(p) > 0$, the field has a source at $p$ (fluid flowing outward). If $\nabla \cdot F(p) < 0$, it has a sink. If $\nabla \cdot F = 0$ everywhere, $F$ is called divergence-free or incompressible.

Positive divergence          Zero divergence
(source at center):          (field has no sources):

   \  |  /                     -->  -->  -->
    \ | /                      -->  -->  -->
  ---  *  ---                  -->  -->  -->
    / | \                      -->  -->  -->
   /  |  \
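These pictures can be checked numerically. The sketch below (assuming NumPy is available) approximates the divergence with central finite differences; the radial field $F = (x, y, z)$ is a source everywhere with $\nabla \cdot F = 3$, while a constant field has zero divergence.

```python
import numpy as np

def divergence(F, p, h=1e-5):
    """Central-difference approximation of div F at point p."""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        # dF_i / dx_i via central differences
        total += (F(p + e)[i] - F(p - e)[i]) / (2 * h)
    return total

radial = lambda p: p                            # div = 3 everywhere
constant = lambda p: np.array([1.0, 0.0, 0.0])  # div = 0 everywhere

print(divergence(radial, [1.0, 2.0, -1.0]))    # ~3.0
print(divergence(constant, [0.5, 0.5, 0.5]))   # ~0.0
```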

Curl

The curl of $F$ measures the infinitesimal rotation of the field around a point:

$$\nabla \times F = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\quad \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\quad \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)$$

The magnitude $|\nabla \times F|$ equals twice the local angular speed of rotation (for a rigid rotation with angular velocity $\omega$, $\nabla \times F = 2\omega$); the direction gives the axis (by the right-hand rule). A field with $\nabla \times F = 0$ everywhere is irrotational; on a simply connected domain it is also conservative, i.e., the gradient of a scalar potential, $F = \nabla \phi$.

Key identities (valid for $C^2$ fields):

  • $\nabla \times (\nabla f) = 0$ - gradient fields are irrotational
  • $\nabla \cdot (\nabla \times F) = 0$ - curl fields are divergence-free
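Both identities can be verified symbolically. A minimal check with SymPy (assuming it is installed), using the component formulas for curl and divergence given above:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def curl(F):
    F1, F2, F3 = F
    return [sp.diff(F3, y) - sp.diff(F2, z),
            sp.diff(F1, z) - sp.diff(F3, x),
            sp.diff(F2, x) - sp.diff(F1, y)]

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

# An arbitrary smooth scalar field and its gradient.
f = sp.sin(x) * y**2 + sp.exp(z) * x
grad_f = [sp.diff(f, v) for v in (x, y, z)]

print([sp.simplify(c) for c in curl(grad_f)])   # [0, 0, 0]

# An arbitrary vector field: div(curl F) vanishes too.
F = [x**2 * y, sp.cos(z) * x, y * z**2]
print(sp.simplify(div(curl(F))))                # 0
```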

The Laplacian

The Laplacian of $f$ is the divergence of its gradient:

$$\nabla^2 f = \Delta f = \nabla \cdot (\nabla f) = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}$$

(The definition extends naturally to $\mathbb{R}^n$.) The Laplacian measures how much $f(x)$ differs from its average over a small sphere around $x$: $\Delta f(x) \approx \frac{2n}{r^2}\left(\bar{f}_r(x) - f(x)\right)$ as $r \to 0$, where $\bar{f}_r(x)$ is the mean of $f$ on a sphere of radius $r$ centered at $x$.
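A quick numerical illustration of this mean-value property (a sketch assuming NumPy), using $f(x, y) = x^2 + y^2$ in $\mathbb{R}^2$, for which $\Delta f = 4$ everywhere:

```python
import numpy as np

def sphere_mean_laplacian(f, x, r=1e-2, m=10_000):
    """Estimate Delta f(x) as (2n / r^2) * (sphere mean of f - f(x)); here n = 2."""
    theta = 2 * np.pi * np.arange(m) / m
    pts = x + r * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    mean_f = np.mean([f(p) for p in pts])
    n = 2
    return (2 * n / r**2) * (mean_f - f(x))

f = lambda p: p[0]**2 + p[1]**2   # Delta f = 4 everywhere
print(sphere_mean_laplacian(f, np.array([0.3, -0.7])))   # ~4.0
```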

Functions satisfying $\Delta f = 0$ are harmonic (e.g., electrostatic potentials in charge-free regions). Eigenfunctions satisfying $\Delta f = \lambda f$ arise when separating variables in the heat equation and in the spectral analysis of graph Laplacians.

The vector Laplacian acts component-wise: $\Delta F = (\Delta F_1, \Delta F_2, \Delta F_3)$.

Line Integrals

Given a smooth curve $C$ parametrized by $\gamma: [a, b] \to \mathbb{R}^3$, the line integral of $F$ along $C$ is

$$\int_C F \cdot dr = \int_a^b F(\gamma(t)) \cdot \gamma'(t)\, dt$$

Physically, this computes the work done by force field $F$ on a particle moving along $C$. If $F = \nabla \phi$ is conservative, then $\int_C F \cdot dr = \phi(\gamma(b)) - \phi(\gamma(a))$ - the fundamental theorem for line integrals - and in particular the integral depends only on the endpoints, not the path.
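The following sketch (assuming NumPy) approximates line integrals with the midpoint rule and illustrates path independence for the conservative field $F = \nabla \phi$ with $\phi(x, y, z) = xyz$:

```python
import numpy as np

def line_integral(F, gamma, dgamma, m=20_000):
    """Midpoint-rule approximation of the line integral of F along gamma([0, 1])."""
    t = (np.arange(m) + 0.5) / m
    return sum(np.dot(F(gamma(ti)), dgamma(ti)) for ti in t) / m

# F = grad(phi) for phi(x, y, z) = x*y*z.
F = lambda p: np.array([p[1] * p[2], p[0] * p[2], p[0] * p[1]])

# Two different paths from (0, 0, 0) to (1, 1, 1).
straight  = lambda t: np.array([t, t, t])
dstraight = lambda t: np.array([1.0, 1.0, 1.0])
curved    = lambda t: np.array([t, t**2, t**3])
dcurved   = lambda t: np.array([1.0, 2 * t, 3 * t**2])

# Both equal phi(1,1,1) - phi(0,0,0) = 1: the integral is path-independent.
print(line_integral(F, straight, dstraight))  # ~1.0
print(line_integral(F, curved, dcurved))      # ~1.0
```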

The Integral Theorems

The three classical theorems of vector calculus relate integrals over a region to integrals over its boundary, and can be viewed as higher-dimensional versions of the fundamental theorem of calculus.

Green’s Theorem

Theorem (Green). Let $D \subset \mathbb{R}^2$ be a simply connected domain with piecewise-smooth boundary $\partial D$ oriented counterclockwise. Let $P, Q: \overline{D} \to \mathbb{R}$ be $C^1$. Then

$$\oint_{\partial D} P\,dx + Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA$$

The left side integrates around the boundary; the right side integrates the two-dimensional curl over the interior. Setting $P = -y/2$, $Q = x/2$ gives the area formula $A = \frac{1}{2}\oint_{\partial D}(x\,dy - y\,dx)$, used in computer graphics and computational geometry.
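For a polygon, the boundary integral in the area formula reduces to a sum over edges, giving the shoelace formula. A minimal sketch:

```python
def polygon_area(vertices):
    """Signed polygon area from Green's theorem:
    A = (1/2) * sum_i (x_i * y_{i+1} - x_{i+1} * y_i)  (the shoelace formula).
    Positive when vertices are listed counterclockwise."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return s / 2

# Unit square, counterclockwise vertex order.
print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
```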

Stokes' Theorem

Theorem (Stokes). Let $S \subset \mathbb{R}^3$ be a smooth oriented surface with boundary $\partial S$ (a closed curve) oriented consistently. Let $F$ be a $C^1$ vector field on an open set containing $S$. Then

$$\oint_{\partial S} F \cdot dr = \iint_S (\nabla \times F) \cdot dS$$

where $dS = \mathbf{n}\,dA$ with $\mathbf{n}$ the unit normal consistent with the orientation of $\partial S$. Stokes' theorem says the circulation of $F$ around the boundary of $S$ equals the total curl flux through $S$. Green’s theorem is the special case where $S$ is a planar region.
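As a concrete check (a numerical sketch assuming NumPy): take $F = (-y, x, 0)$ and let $S$ be the upper unit hemisphere, whose boundary is the unit circle in the plane $z = 0$. Both sides of Stokes' theorem come out to $2\pi$:

```python
import numpy as np

# Boundary side: circulation around gamma(t) = (cos t, sin t, 0).
# F(gamma(t)) . gamma'(t) = sin^2 t + cos^2 t, integrated over [0, 2*pi].
m = 100_000
t = 2 * np.pi * (np.arange(m) + 0.5) / m
circulation = np.sum(np.sin(t)**2 + np.cos(t)**2) * (2 * np.pi / m)

# Surface side: curl F = (0, 0, 2). Parametrize the hemisphere by spherical
# angles; the outward normal is the radial direction, so
# (curl F) . n = 2 cos(phi) and dA = sin(phi) dphi dtheta.
k = 1_000
phi = (np.arange(k) + 0.5) / k * (np.pi / 2)
flux = np.mean(2 * np.cos(phi) * np.sin(phi)) * (np.pi / 2) * (2 * np.pi)

print(circulation, flux)   # both ~6.2832 = 2*pi
```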

Divergence Theorem (Gauss’s Theorem)

Theorem (Gauss–Ostrogradsky). Let $V \subset \mathbb{R}^3$ be a bounded region with piecewise-smooth boundary $\partial V$ oriented with outward normal $\mathbf{n}$. Let $F$ be a $C^1$ vector field on an open set containing $\overline{V}$. Then

$$\oiint_{\partial V} F \cdot dS = \iiint_V \nabla \cdot F\, dV$$

The surface integral counts the total outward flux; the volume integral sums all sources and sinks. This theorem is fundamental in physics: it derives Gauss’s law (electrostatics), the continuity equation for fluid flow, and the heat equation from local differential statements.
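A quick numerical check (sketch assuming NumPy) for $F = (x^2, y^2, z^2)$ on the unit cube: $\nabla \cdot F = 2x + 2y + 2z$ integrates to 3, matching the outward flux through the six faces.

```python
import numpy as np

m = 64
c = (np.arange(m) + 0.5) / m                      # midpoint grid on [0, 1]
X, Y, Z = np.meshgrid(c, c, c, indexing="ij")

# Volume side: each cell has volume 1/m^3, so the mean over the grid
# approximates the integral of div F over the unit cube.
volume_integral = np.mean(2 * X + 2 * Y + 2 * Z)

# Surface side: on the face x = 1 the outward flux density is F_1 = 1;
# on x = 0 it is -F_1 = 0. The y and z face pairs contribute identically
# by symmetry, so the total outward flux is 3 * (1 - 0).
flux = 3 * (1.0 - 0.0)

print(volume_integral, flux)   # ~3.0 and 3.0
```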

Unified View via Differential Forms

All three theorems are instances of a single master theorem. A differential $k$-form $\omega$ is a quantity that can be integrated over an oriented $k$-dimensional manifold. The exterior derivative $d$ maps $k$-forms to $(k+1)$-forms and satisfies $d^2 = 0$.

Generalized Stokes' Theorem. For a smooth $k$-form $\omega$ on a compact oriented manifold $M$ with boundary $\partial M$:

$$\int_{\partial M} \omega = \int_M d\omega$$

This subsumes all three classical theorems. In $\mathbb{R}^3$: the gradient corresponds to $d$ on 0-forms, the curl to $d$ on 1-forms, and the divergence to $d$ on 2-forms. The identities $\nabla \times \nabla f = 0$ and $\nabla \cdot \nabla \times F = 0$ are both $d^2 = 0$.

ML Relevance: Laplacian, Divergence, and Score Functions

Laplacian in Graph Neural Networks

For a graph $G = (V, E)$ with adjacency matrix $A$ and degree matrix $D$, the graph Laplacian is $L = D - A$. Its entries are

$$L_{ij} = \begin{cases} \deg(i) & i = j \\ -1 & (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

$L$ is positive semidefinite, and $(x^T L x) = \sum_{(i,j) \in E}(x_i - x_j)^2$ measures the total variation of a signal $x$ on the graph. This is the discrete analogue of the continuous Dirichlet energy $\int |\nabla f|^2 dV$. Graph neural networks use $L$ (or the normalized Laplacian $\tilde{L} = D^{-1/2} L D^{-1/2}$) to aggregate neighbor information; spectral GNNs operate in the eigenbasis of $L$.

Laplacian regularization adds $\lambda x^T L x$ as a penalty to enforce smoothness of predictions over the graph - predictions on connected nodes should be similar.
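A small concrete example (assuming NumPy) on a path graph with four nodes, confirming that the quadratic form equals the edge-wise sum of squared differences:

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                 # graph Laplacian

x = np.array([1.0, 2.0, 4.0, 4.0])        # a signal on the nodes

quad = x @ L @ x                          # Dirichlet energy via L
edges = [(0, 1), (1, 2), (2, 3)]
dirichlet = sum((x[i] - x[j])**2 for i, j in edges)

print(quad, dirichlet)                    # 5.0 5.0
print(np.linalg.eigvalsh(L).min())        # ~0: L is positive semidefinite
```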

Divergence in Normalizing Flows

A normalizing flow is a diffeomorphism $f_\theta: \mathbb{R}^n \to \mathbb{R}^n$ that transforms a simple base distribution $p_Z$ into a complex data distribution $p_X$. The change-of-variables formula gives

$$\log p_X(x) = \log p_Z(f_\theta^{-1}(x)) + \log |\det J_{f_\theta^{-1}}(x)|$$

Computing $\log |\det J|$ costs $O(n^3)$ in general. Continuous normalizing flows (CNFs, from the FFJORD paper) model the flow as the solution to an ODE $\dot{x}(t) = v_\theta(x(t), t)$. By the instantaneous change-of-variables formula (Liouville’s theorem):

$$\frac{d \log p(x(t))}{dt} = -\nabla \cdot v_\theta(x(t), t)$$

The log-density changes at the rate of the negative divergence of the velocity field. An exact divergence requires $n$ separate derivative evaluations, but Hutchinson’s trace estimator gives an unbiased stochastic estimate of $\nabla \cdot v_\theta$ at the cost of a single vector-Jacobian product (roughly one backward pass), making CNFs scalable.
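Hutchinson's estimator uses $\nabla \cdot v = \operatorname{tr}(J_v) = \mathbb{E}_\epsilon[\epsilon^T J_v \epsilon]$ for Rademacher (or Gaussian) $\epsilon$. The sketch below (assuming NumPy) approximates $\epsilon^T J_v \epsilon$ with a finite difference; an autodiff framework would use a vector-Jacobian product instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_divergence(v, x, n_samples=10_000, h=1e-5):
    """Estimate div v(x) = tr(J_v(x)) as an average of eps^T J eps.

    eps^T J eps is the directional derivative of eps^T v(x) along eps,
    approximated here with a central finite difference.
    """
    n = x.shape[0]
    total = 0.0
    for _ in range(n_samples):
        eps = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe
        total += eps @ (v(x + h * eps) - v(x - h * eps)) / (2 * h)
    return total / n_samples

# Linear field v(x) = W x has exact divergence tr(W).
n = 10
W = rng.normal(size=(n, n))
v = lambda x: W @ x
x0 = rng.normal(size=n)

print(hutchinson_divergence(v, x0), np.trace(W))   # close to each other
```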

Score Functions in Diffusion Models

A score function is $s_\theta(x) \approx \nabla_x \log p(x)$ - the gradient of the log-density with respect to the data point $x$. Diffusion models (DDPM, Score SDE) learn this score function via denoising score matching:

$$\mathcal{L} = \mathbb{E}_{x, \tilde{x}}\left[\left\|s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log p(\tilde{x} \mid x)\right\|^2\right]$$

where $\tilde{x} = x + \sigma \epsilon$ is a noisy version of $x$. At sampling time, Langevin dynamics iterates $x_{t+1} = x_t + \frac{\epsilon}{2} s_\theta(x_t) + \sqrt{\epsilon}\, z_t$ to produce samples from $p(x)$. The score function is exactly the gradient field that points toward regions of high probability mass - a continuous vector calculus object.
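A self-contained illustration (assuming NumPy): Langevin dynamics with the exact score of a 1-D Gaussian $\mathcal{N}(3, 0.5^2)$ recovers its mean and standard deviation. A diffusion model would substitute the learned $s_\theta$ for the closed-form score.

```python
import numpy as np

rng = np.random.default_rng(42)

mu, sigma = 3.0, 0.5
score = lambda x: -(x - mu) / sigma**2     # exact score of N(mu, sigma^2)

# Langevin iteration: x <- x + (eps/2) * score(x) + sqrt(eps) * z
eps, n_steps, n_chains = 0.01, 2_000, 5_000
x = rng.normal(size=n_chains)              # chains start far from the target
for _ in range(n_steps):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.normal(size=n_chains)

print(x.mean(), x.std())   # ~3.0, ~0.5 (up to small discretization bias)
```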

Laplacian in Signal Processing

The graph Fourier transform uses eigenvectors of $L$ as a basis. Low-frequency eigenvectors (small eigenvalues) vary slowly over the graph; high-frequency ones vary rapidly. Graph convolutions in the spectral domain are pointwise multiplications in this basis, generalizing the classical convolution theorem. The Laplacian smoothing operation $x \mapsto (I - \alpha L)x$ (for $\alpha$ small) damps high frequencies and is the discrete heat equation run for one step.
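The damping is easy to see on a cycle graph, whose Laplacian eigenvectors are discrete Fourier modes (a sketch assuming NumPy):

```python
import numpy as np

# Cycle graph on n nodes.
n = 64
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1
A[idx, (idx - 1) % n] = 1
L = np.diag(A.sum(axis=1)) - A

t = 2 * np.pi * idx / n
x = np.sin(t) + np.sin(16 * t)       # low-frequency + high-frequency signal

alpha = 0.2
y = x.copy()
for _ in range(10):                  # ten steps of x -> (I - alpha L) x
    y = y - alpha * (L @ y)

# Fourier coefficients of the two modes (discrete orthogonality).
low = lambda s: (2 / n) * (s @ np.sin(t))
high = lambda s: (2 / n) * (s @ np.sin(16 * t))

print(low(x), high(x))   # both ~1.0 before smoothing
print(low(y), high(y))   # low barely damped, high nearly gone
```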

In continuous signal processing, the Laplacian appears in the heat equation $\partial_t u = \Delta u$ and the wave equation $\partial_{tt} u = \Delta u$. Its spectrum (eigenvalues of $-\Delta$ on a domain with boundary conditions) determines the natural frequencies of vibration - the same mathematical structure underlies both physical simulation and spectral graph methods in GNNs.


Read Next: