
What Is a Differential Equation?

A differential equation is an equation relating an unknown function to one or more of its derivatives. Rather than asking “what number satisfies this equation?”, we ask “what function satisfies this equation?”

Definition. An ordinary differential equation (ODE) of order $n$ is an equation of the form

$$F\!\left(x,\, y,\, y',\, y'',\, \ldots,\, y^{(n)}\right) = 0$$

where $y = y(x)$ is the unknown function and $y^{(k)}$ denotes its $k$-th derivative with respect to $x$.

Order is the highest derivative appearing. Linearity means the equation is linear in $y$ and all its derivatives: no products like $y \cdot y'$ and no nonlinear terms like $\sin(y)$. A linear ODE has the form

$$a_n(x)\,y^{(n)} + a_{n-1}(x)\,y^{(n-1)} + \cdots + a_1(x)\,y' + a_0(x)\,y = g(x).$$

When $g(x) = 0$ the equation is homogeneous; otherwise it is inhomogeneous (or nonhomogeneous).


First-Order ODEs

Separable Equations

A first-order ODE is separable if it can be written as $\frac{dy}{dx} = g(x)\,h(y)$. Separating variables:

$$\frac{dy}{h(y)} = g(x)\,dx \implies \int \frac{dy}{h(y)} = \int g(x)\,dx.$$

Example. $y' = xy$ gives $\int \frac{dy}{y} = \int x\,dx$, so $\ln|y| = \frac{x^2}{2} + C$, yielding $y = Ae^{x^2/2}$ with $A = \pm e^C$ (taking $A = 0$ recovers the trivial solution $y \equiv 0$).
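As a quick sanity check, the solution can be verified numerically by finite differences; the constant $A = 2$ and the sample points below are arbitrary choices:

```python
import math

# Finite-difference check of the worked example: y = A e^{x^2/2}
# solves y' = x*y for any constant A (A = 2 is an arbitrary choice).
A = 2.0
y = lambda x: A * math.exp(x**2 / 2)

h = 1e-6
for x in [0.0, 0.5, 1.0, 1.7]:
    dydx = (y(x + h) - y(x - h)) / (2 * h)   # central difference ≈ y'(x)
    assert abs(dydx - x * y(x)) < 1e-4       # matches the ODE y' = x*y
```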

Linear First-Order Equations

The standard form is

$$y' + P(x),y = Q(x).$$

Theorem (Integrating Factor). Multiply both sides by $\mu(x) = e^{\int P(x)\,dx}$. Then the left side becomes $\frac{d}{dx}[\mu(x)\,y]$, and the general solution is

$$y = \frac{1}{\mu(x)}\left(\int \mu(x)\,Q(x)\,dx + C\right).$$

The integrating factor works because $\mu' = P\mu$, so $(\mu y)' = \mu y' + \mu' y = \mu(y' + Py) = \mu Q$.
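The formula can be exercised numerically. The sketch below (helper names `integral` and `solve_linear` are ours, with an assumed example $P = 2$, $Q = 4$) compares the quadrature-based formula against the closed form $y = 2 + (y_0 - 2)e^{-2x}$:

```python
import math

# Numeric sketch of the integrating-factor formula for y' + P(x) y = Q(x).
# The helper names and the example P, Q are assumptions for illustration.
P = lambda x: 2.0
Q = lambda x: 4.0

def integral(f, a, b, n=500):
    # composite trapezoid rule for the integral of f over [a, b]
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def solve_linear(x, y0, x0=0.0):
    # y(x) = (1/mu(x)) * ( y0 + ∫_{x0}^{x} mu(t) Q(t) dt ),  mu(t) = e^{∫ P}
    mu = lambda t: math.exp(integral(P, x0, t))
    return (y0 + integral(lambda t: mu(t) * Q(t), x0, x)) / mu(x)

# Closed form for this example: y = 2 + (y0 - 2) e^{-2x}
y0 = 5.0
for xv in [0.3, 1.0]:
    exact = 2 + (y0 - 2) * math.exp(-2 * xv)
    assert abs(solve_linear(xv, y0) - exact) < 1e-3
```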

Exact Equations

An equation $M(x,y)\,dx + N(x,y)\,dy = 0$ is exact if $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$. In that case there exists a potential function $F(x,y)$ with $F_x = M$ and $F_y = N$, and the solution is $F(x,y) = C$.
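A small numeric check of the criterion, on the assumed example $M = 2xy$, $N = x^2$ with potential $F = x^2 y$:

```python
# Numeric sanity check of exactness, using the assumed example
# M = 2xy, N = x^2, with candidate potential F = x^2 y.
M = lambda x, y: 2 * x * y
N = lambda x, y: x ** 2
F = lambda x, y: x ** 2 * y

h = 1e-6
for (x, y) in [(0.7, -1.2), (2.0, 0.5)]:
    My = (M(x, y + h) - M(x, y - h)) / (2 * h)   # ∂M/∂y
    Nx = (N(x + h, y) - N(x - h, y)) / (2 * h)   # ∂N/∂x
    assert abs(My - Nx) < 1e-5                   # exact: M_y = N_x = 2x
    assert abs((F(x + h, y) - F(x - h, y)) / (2 * h) - M(x, y)) < 1e-4  # F_x = M
    assert abs((F(x, y + h) - F(x, y - h)) / (2 * h) - N(x, y)) < 1e-4  # F_y = N
```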


Second-Order Linear ODEs with Constant Coefficients

Consider

$$ay'' + by' + cy = 0, \quad a \neq 0.$$

Substitute the trial solution $y = e^{rx}$ to get the characteristic equation

$$ar^2 + br + c = 0.$$

The discriminant $\Delta = b^2 - 4ac$ determines the form of the general solution.

Case 1: Two distinct real roots $r_1 \neq r_2$ ($\Delta > 0$).

$$y = C_1 e^{r_1 x} + C_2 e^{r_2 x}.$$

Case 2: Repeated root $r_1 = r_2 = r = -b/(2a)$ ($\Delta = 0$).

$$y = (C_1 + C_2 x)\,e^{rx}.$$

The factor $x$ is needed because $e^{rx}$ alone gives only one linearly independent solution.

Case 3: Complex conjugate roots $r = \alpha \pm i\beta$ ($\Delta < 0$).

$$y = e^{\alpha x}(C_1\cos\beta x + C_2\sin\beta x).$$

This follows from Euler’s formula $e^{i\beta x} = \cos\beta x + i\sin\beta x$.
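The three cases can be wrapped in a small root classifier; the function name and return convention below are illustrative only, and the exact-zero discriminant test is fine for the hand-picked integer coefficients used here:

```python
import math

# Sketch: classify a y'' + b y' + c y = 0 by its characteristic roots.
def classify(a, b, c):
    disc = b * b - 4 * a * c
    if disc > 0:                                  # Case 1: distinct real roots
        r1 = (-b + math.sqrt(disc)) / (2 * a)
        r2 = (-b - math.sqrt(disc)) / (2 * a)
        return ("distinct real", r1, r2)
    if disc == 0:                                 # Case 2: repeated root
        r = -b / (2 * a)
        return ("repeated", r, r)
    alpha = -b / (2 * a)                          # Case 3: alpha ± i*beta
    beta = math.sqrt(-disc) / (2 * a)
    return ("complex", alpha, beta)

assert classify(1, -3, 2) == ("distinct real", 2.0, 1.0)  # y'' - 3y' + 2y = 0
assert classify(1, 2, 1) == ("repeated", -1.0, -1.0)      # y'' + 2y' + y = 0
assert classify(1, 0, 4) == ("complex", 0.0, 2.0)         # y'' + 4y = 0
```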

Inhomogeneous Equations: Method of Undetermined Coefficients

For $ay'' + by' + cy = g(x)$ with $g$ a polynomial, exponential, or sinusoid, guess a particular solution $y_p$ of the same “family” as $g$ and solve for coefficients. The general solution is $y = y_h + y_p$ where $y_h$ is the homogeneous solution.

Superposition Principle. If $y_1$ and $y_2$ are solutions to the homogeneous equation, then $C_1 y_1 + C_2 y_2$ is also a solution. Any two linearly independent solutions form a basis for the solution space.

Variation of Parameters

For arbitrary $g(x)$ (with the equation normalized so the coefficient of $y''$ is 1), if $\{y_1, y_2\}$ is a fundamental set of solutions to the homogeneous equation, a particular solution is

$$y_p = -y_1 \int \frac{y_2\,g}{W}\,dx + y_2 \int \frac{y_1\,g}{W}\,dx$$

where $W = y_1 y_2' - y_2 y_1'$ is the Wronskian.

Theorem (Linear Independence via Wronskian). Two solutions $y_1, y_2$ are linearly independent if and only if $W(y_1, y_2)(x) \neq 0$ for some (equivalently, for all) $x$ in the interval.
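The Wronskian test is easy to check numerically. For the assumed pair $y_1 = e^x$, $y_2 = e^{2x}$ (solutions of $y'' - 3y' + 2y = 0$), $W = e^{3x}$ never vanishes:

```python
import math

# Numeric Wronskian W = y1*y2' - y2*y1' via central differences,
# for the assumed fundamental set y1 = e^x, y2 = e^{2x}.
y1 = lambda x: math.exp(x)
y2 = lambda x: math.exp(2 * x)

def wronskian(y1, y2, x, h=1e-6):
    d1 = (y1(x + h) - y1(x - h)) / (2 * h)
    d2 = (y2(x + h) - y2(x - h)) / (2 * h)
    return y1(x) * d2 - y2(x) * d1

# W = e^{3x} is nonzero everywhere, so {e^x, e^{2x}} is linearly independent.
for x in [0.0, 1.0]:
    assert abs(wronskian(y1, y2, x) - math.exp(3 * x)) < 1e-4
```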


Systems of ODEs

A first-order linear system has the form $\dot{\mathbf{x}} = A\mathbf{x}$ where $\mathbf{x}(t) \in \mathbb{R}^n$ and $A$ is an $n \times n$ matrix. With initial condition $\mathbf{x}(0) = \mathbf{x}_0$, the solution is

$$\mathbf{x}(t) = e^{At}\,\mathbf{x}_0$$

where the matrix exponential is defined by the power series

$$e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots$$

Computing the Matrix Exponential

If $A$ is diagonalizable with $A = P\Lambda P^{-1}$ (where $\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$), then

$$e^{At} = P\,e^{\Lambda t}\,P^{-1} = P\operatorname{diag}(e^{\lambda_1 t}, \ldots, e^{\lambda_n t})P^{-1}.$$

Each eigenvalue $\lambda_k$ contributes a mode $e^{\lambda_k t}$ to the solution.
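A quick check that the eigendecomposition formula agrees with the defining power series, on an assumed $2 \times 2$ example:

```python
import numpy as np

# Compute e^{At} two ways for an assumed diagonalizable example matrix.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2
t = 0.5

# Eigendecomposition route: e^{At} = P diag(e^{lambda_k t}) P^{-1}
lam, P = np.linalg.eig(A)
expm_eig = (P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P)).real

# Truncated defining power series: I + At + (At)^2/2! + ...
expm_series = np.eye(2)
term = np.eye(2)
for k in range(1, 30):
    term = term @ (A * t) / k
    expm_series += term

assert np.allclose(expm_eig, expm_series, atol=1e-10)
```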

Stability

The stability of $\dot{\mathbf{x}} = A\mathbf{x}$ at the origin is determined entirely by the eigenvalues of $A$:

  • All $\operatorname{Re}(\lambda_k) < 0$: asymptotically stable (solutions decay to zero).
  • Some $\operatorname{Re}(\lambda_k) > 0$: unstable (solutions grow).
  • All $\operatorname{Re}(\lambda_k) \leq 0$, and every eigenvalue with $\operatorname{Re}(\lambda_k) = 0$ is semisimple (geometric multiplicity equals algebraic multiplicity): neutrally stable.
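These rules translate directly into a few lines of code; the function name is assumed, and the tolerance-based version below ignores the repeated-eigenvalue caveat on the imaginary axis:

```python
import numpy as np

# Sketch: classify stability of x' = A x from the real parts of eig(A).
def stability(A, tol=1e-12):
    re = np.linalg.eigvals(A).real
    if np.all(re < -tol):
        return "asymptotically stable"
    if np.any(re > tol):
        return "unstable"
    # Simplification: assumes eigenvalues on the imaginary axis are semisimple.
    return "neutrally stable"

assert stability(np.array([[0.0, 1.0], [-2.0, -3.0]])) == "asymptotically stable"
assert stability(np.array([[1.0, 0.0], [0.0, -1.0]])) == "unstable"
assert stability(np.array([[0.0, 1.0], [-1.0, 0.0]])) == "neutrally stable"
```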

Existence and Uniqueness: Picard-Lindelöf Theorem

Theorem (Picard-Lindelöf). Consider the initial value problem

$$y' = f(x, y), \quad y(x_0) = y_0.$$

If $f$ is continuous on a rectangle $R = \{\,|x - x_0| \leq a,\ |y - y_0| \leq b\,\}$ and satisfies a Lipschitz condition in $y$, that is, there exists $L > 0$ such that

$$|f(x, y_1) - f(x, y_2)| \leq L\,|y_1 - y_2|$$

for all $(x, y_1), (x, y_2) \in R$, then there exists $h > 0$ such that the IVP has a unique solution on $|x - x_0| \leq h$.

The proof constructs the solution as the limit of Picard iterates $y_{n+1}(x) = y_0 + \int_{x_0}^x f(t, y_n(t))\,dt$, which converge by the Banach fixed-point theorem.
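The iterates can be computed numerically. For $y' = y$, $y(0) = 1$, the $n$-th iterate is exactly the degree-$n$ Taylor polynomial of $e^x$; the grid-based sketch below (helper name assumed) approximates each integral with the trapezoid rule:

```python
import math

# Picard iteration for y' = y, y(0) = 1 (exact solution e^x) on a grid.
def picard_step(yn, xs):
    # y_{n+1}(x) = 1 + cumulative trapezoid integral of y_n from 0 to x
    vals = [yn(x) for x in xs]
    out = [1.0]
    for i in range(1, len(xs)):
        out.append(out[-1] + 0.5 * (vals[i - 1] + vals[i]) * (xs[i] - xs[i - 1]))
    table = dict(zip(xs, out))
    return lambda x: table[x]   # iterate is only defined on the grid

xs = [i / 1000 for i in range(1001)]   # grid on [0, 1]
y = lambda x: 1.0                      # y_0: the constant initial guess
for _ in range(12):
    y = picard_step(y, xs)

assert abs(y(1.0) - math.e) < 1e-4     # 12 iterates ≈ degree-12 Taylor poly
```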


Phase Plane Analysis

For a 2D autonomous system $\dot{x} = f(x,y)$, $\dot{y} = g(x,y)$, fixed points (equilibria) satisfy $f = g = 0$. Near a fixed point, linearize to get $\dot{\mathbf{u}} = J\mathbf{u}$ where $J$ is the Jacobian. The eigenvalues of $J$ classify the fixed point:

| Eigenvalues | Type | Stability |
| --- | --- | --- |
| $\lambda_1, \lambda_2 < 0$ real | Stable node | Stable |
| $\lambda_1, \lambda_2 > 0$ real | Unstable node | Unstable |
| $\lambda_1 < 0 < \lambda_2$ | Saddle | Unstable |
| $\alpha \pm i\beta$, $\alpha < 0$ | Stable spiral | Stable |
| $\alpha \pm i\beta$, $\alpha > 0$ | Unstable spiral | Unstable |
| $\pm i\beta$ | Center | Neutrally stable |

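The classification table translates into a small classifier; the function name is assumed, and degenerate or zero-eigenvalue cases are lumped together:

```python
import numpy as np

# Sketch: fixed-point type from the eigenvalues of the 2x2 Jacobian J.
def classify_fixed_point(J, tol=1e-12):
    l1, l2 = np.linalg.eigvals(J)
    if abs(l1.imag) > tol:                 # complex pair alpha ± i*beta
        a = l1.real
        if a < -tol:
            return "stable spiral"
        if a > tol:
            return "unstable spiral"
        return "center"
    l1, l2 = sorted([l1.real, l2.real])    # both eigenvalues real
    if l2 < 0:
        return "stable node"
    if l1 > 0:
        return "unstable node"
    if l1 < 0 < l2:
        return "saddle"
    return "degenerate"                    # zero eigenvalue: table doesn't apply

assert classify_fixed_point(np.array([[-1.0, 0.0], [0.0, -2.0]])) == "stable node"
assert classify_fixed_point(np.array([[2.0, 0.0], [0.0, -1.0]])) == "saddle"
assert classify_fixed_point(np.array([[0.0, 1.0], [-4.0, 0.0]])) == "center"
```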
Phase Portrait: Simple Harmonic Oscillator

The simple harmonic oscillator $\ddot{x} + \omega^2 x = 0$ is equivalent to the system $\dot{x} = v$, $\dot{v} = -\omega^2 x$. The eigenvalues are $\pm i\omega$, so the origin is a center.

        v (velocity)
        |
    .   |   .
  .     |     .
 .      |      .
 .------+------. --> x (position)
 .      |      .
  .     |     .
    .   |   .
        |

Trajectories are closed ellipses around the origin: level curves of the conserved energy $E = \tfrac{1}{2}v^2 + \tfrac{1}{2}\omega^2 x^2$.

Each closed orbit corresponds to a conserved energy. This is the hallmark of a center.
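Along the exact solution $x = \cos\omega t$, $v = -\omega\sin\omega t$ (unit amplitude assumed), the energy is constant, which is exactly what keeps each orbit on its ellipse:

```python
import math

# Energy conservation along the exact SHO solution (omega = 3 assumed).
omega = 3.0
x = lambda t: math.cos(omega * t)
v = lambda t: -omega * math.sin(omega * t)
E = lambda t: 0.5 * v(t) ** 2 + 0.5 * omega ** 2 * x(t) ** 2

# E = omega^2 / 2 for unit amplitude, at every time t.
for t in [0.0, 0.4, 1.3, 2.7]:
    assert abs(E(t) - 0.5 * omega ** 2) < 1e-12
```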


Examples

Recurrent Neural Networks. An RNN with hidden state $h_t$ updated by $h_{t+1} = \sigma(Wh_t + Ux_t + b)$ is a discrete-time dynamical system. For a residual-style update $h_{t+1} = h_t + \epsilon\, f(h_t, x_t)$, letting the step size $\epsilon \to 0$ recovers a continuous-time ODE $\dot{h} = f(h, x)$.

Neural ODEs (Chen et al., 2018) make this connection explicit: define the network’s hidden state evolution by an ODE $\frac{d\mathbf{h}}{dt} = f_\theta(\mathbf{h}(t), t)$ and use a black-box ODE solver to compute the output. This gives continuous-depth models with $O(1)$ memory during training via the adjoint method.
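A minimal, assumption-laden sketch of that idea: a linear "layer" $f_\theta(h) = Wh$ evolved with a fixed-step Euler solver. Real implementations such as torchdiffeq use adaptive solvers and the adjoint method; everything named below is ours:

```python
import numpy as np

# Continuous-depth forward pass: evolve h by dh/dt = f_theta(h) from t=0 to t=1.
W = np.array([[0.0, -1.0],
              [1.0, 0.0]])            # a linear "layer": rotation generator
f_theta = lambda h, t: W @ h

def odeint_euler(f, h0, t0, t1, n=20000):
    # fixed-step forward Euler: h <- h + dt * f(h, t)
    h, t = h0.copy(), t0
    dt = (t1 - t0) / n
    for _ in range(n):
        h = h + dt * f(h, t)
        t += dt
    return h

h0 = np.array([1.0, 0.0])
out = odeint_euler(f_theta, h0, 0.0, 1.0)
# For this linear f_theta the exact output is e^{W} h0 = (cos 1, sin 1).
assert np.allclose(out, [np.cos(1.0), np.sin(1.0)], atol=1e-3)
```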

Numerical Integration. Euler's method discretizes $y' = f(x,y)$ as $y_{n+1} = y_n + h\,f(x_n, y_n)$, with global error $O(h)$. The fourth-order Runge-Kutta (RK4) method achieves $O(h^4)$ accuracy with four function evaluations per step. These methods are the building blocks of ODE-based deep learning libraries such as torchdiffeq and diffrax.
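Both methods fit in a few lines; comparing them on $y' = y$ over $[0, 1]$ (exact answer $e$) shows the order gap. Helper names are ours:

```python
import math

def euler(f, y0, h, n):
    # forward Euler: y <- y + h * f(x, y)
    y, x = y0, 0.0
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return y

def rk4(f, y0, h, n):
    # classical fourth-order Runge-Kutta
    y, x = y0, 0.0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y

f = lambda x, y: y
err_euler = abs(euler(f, 1.0, 0.01, 100) - math.e)
err_rk4 = abs(rk4(f, 1.0, 0.01, 100) - math.e)
assert err_euler < 0.02      # O(h): roughly h*e/2 ≈ 0.0136 at h = 0.01
assert err_rk4 < 1e-8        # O(h^4): far below Euler at the same step size

# Halving h cuts Euler's error about 2x, consistent with first-order accuracy.
err_euler2 = abs(euler(f, 1.0, 0.005, 200) - math.e)
assert 1.8 < err_euler / err_euler2 < 2.2
```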

The eigenvalue-stability relationship is not just mathematics: the gradient vanishing/exploding problem in RNNs is precisely the statement that eigenvalues of the recurrence Jacobian with $|\lambda| \neq 1$ cause exponential decay or growth over long sequences.

