Limits & Continuity
The limit is the engine of calculus. Derivatives are limits of difference quotients. Integrals are limits of sums. Continuity is defined in terms of limits. If you understand limits rigorously - not just intuitively, but formally - the rest of calculus becomes a series of careful applications of one central idea.
The Informal Idea
Consider $f(x) = \frac{x^2 - 1}{x - 1}$. This function is not defined at $x = 1$ - the denominator is zero there. But factor the numerator: $\frac{(x-1)(x+1)}{x-1} = x + 1$ for all $x \neq 1$.
Near $x = 1$, $f(x)$ gets close to $2$. Not at $x = 1$ - the function doesn’t exist there - but approaching it. We say:
$$\lim_{x \to 1} f(x) = 2.$$
The key idea: the limit asks what value $f(x)$ approaches as $x$ approaches $a$, not what $f(a)$ is. The function’s value at the point is irrelevant - or might not even exist.
This is conceptually clean but mathematically imprecise. “Gets close” and “approaches” are informal. A rigorous definition must pin down exactly how close is close enough, and in what order the choices are made.
The Formal $\varepsilon$-$\delta$ Definition
Definition. We say $\lim_{x \to a} f(x) = L$ if:
$$\forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon.$$
Read this carefully. It is a game between two players:
- The challenger picks any $\varepsilon > 0$ - their tolerance. They are demanding that $f(x)$ be within $\varepsilon$ of $L$.
- You must respond with a $\delta > 0$ - a neighborhood of $a$ - such that whenever $x$ is within $\delta$ of $a$ (but $x \neq a$), the output $f(x)$ is within $\varepsilon$ of $L$.
If you can always win this game, the limit is $L$.
The condition $0 < |x - a|$ is crucial: it says $x \neq a$. The limit does not care about $f(a)$.
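To make the game concrete, here is a minimal numerical sanity check in Python. Sampling finitely many points can only provide evidence, never a proof, and the helper `wins_epsilon_delta` and its grid strategy are illustrative choices, not part of the definition:

```python
def wins_epsilon_delta(f, a, L, eps, delta, n=10_000):
    """Spot-check |f(x) - L| < eps for sampled x with 0 < |x - a| < delta."""
    for i in range(1, n + 1):
        h = delta * i / (n + 1)        # offsets in (0, delta), excluding 0
        for x in (a - h, a + h):       # approach a from both sides
            if abs(f(x) - L) >= eps:
                return False           # challenger wins at this x
    return True                        # delta survived the sampled challenge

# The challenger picks eps; we respond with delta = eps (which works here,
# since f(x) = x + 1 for x != 1, so |f(x) - 2| = |x - 1|).
f = lambda x: (x**2 - 1) / (x - 1)     # undefined at x = 1; limit is 2
for eps in (0.1, 0.01, 0.001):
    print(eps, wins_epsilon_delta(f, 1.0, 2.0, eps, delta=eps))
```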
Visualizing the Definition
    f(x)
    |
L+ε +-----------+-----+-------+
    |           |xxxxx|       |
L   +           |  *  |       +-- f(x) must stay in this band
    |           |xxxxx|       |
L-ε +-----------+-----+-------+
    |           |     |
    +-----------+--+--+--------- x
               a-δ a a+δ

For any x in (a-δ, a+δ), x ≠ a,
f(x) must land in the band (L-ε, L+ε).
The $\varepsilon$ controls the vertical band; the $\delta$ controls the horizontal window. The definition says: for any vertical tolerance, you can find a horizontal window that works.
Worked Example: $\lim_{x \to 2} x^2 = 4$
Claim: $\lim_{x \to 2} x^2 = 4$.
Proof. Let $\varepsilon > 0$ be given. We need $\delta > 0$ such that $0 < |x - 2| < \delta \implies |x^2 - 4| < \varepsilon$.
Factor: $|x^2 - 4| = |x - 2| \cdot |x + 2|$.
We need to bound $|x + 2|$. Assume $\delta \leq 1$ (we will choose $\delta$ to be at most 1). Then $|x - 2| < 1$ implies $1 < x < 3$, so $3 < x + 2 < 5$, and thus $|x + 2| < 5$.
Therefore:
$$|x^2 - 4| = |x - 2| \cdot |x + 2| < 5|x - 2|.$$
We want this to be less than $\varepsilon$, so we need $|x - 2| < \varepsilon/5$.
Choose $\delta = \min\left(1, \frac{\varepsilon}{5}\right)$. Then for $0 < |x - 2| < \delta$:
$$|x^2 - 4| < 5 \cdot \delta \leq 5 \cdot \frac{\varepsilon}{5} = \varepsilon. \quad \blacksquare$$
This is the template: bound the output difference in terms of the input difference, then work backwards to find $\delta$.
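As a quick check on the proof's choice $\delta = \min(1, \varepsilon/5)$ - again sampling as evidence, not proof; the helper below is an illustrative sketch:

```python
def worst_error(eps, n=100_000):
    """Largest sampled |x^2 - 4| over 0 < |x - 2| < delta, with the proof's delta."""
    delta = min(1.0, eps / 5.0)
    worst = 0.0
    for i in range(1, n + 1):
        h = delta * i / (n + 1)
        for x in (2.0 - h, 2.0 + h):
            worst = max(worst, abs(x * x - 4.0))
    return worst

for eps in (1.0, 0.1, 1e-4):
    print(f"eps={eps}: worst |x^2 - 4| = {worst_error(eps):.3g}")
# Each worst-case error comes in under eps, as the proof predicts.
```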
One-Sided Limits and Limits at Infinity
Sometimes the limits from the left and from the right differ. Define:
$$\lim_{x \to a^+} f(x) = L \quad \text{means} \quad \forall \varepsilon > 0,\ \exists \delta > 0: 0 < x - a < \delta \implies |f(x) - L| < \varepsilon.$$
The left-sided limit $\lim_{x \to a^-}$ replaces $0 < x - a < \delta$ with $0 < a - x < \delta$.
The two-sided limit exists if and only if both one-sided limits exist and are equal.
For limits at infinity:
$$\lim_{x \to \infty} f(x) = L \quad \text{means} \quad \forall \varepsilon > 0,\ \exists M > 0: x > M \implies |f(x) - L| < \varepsilon.$$
Here, the “neighborhood” is a half-line rather than an interval. The structure of the game is identical.
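The game at infinity can be played the same way: given $\varepsilon$, search for a threshold $M$. The doubling search below is a heuristic sketch, not a verification:

```python
def find_threshold(f, L, eps, x_max=1e12):
    """Heuristically find M with |f(x) - L| < eps at spot-checked x > M."""
    M = 1.0
    while M < x_max:
        if all(abs(f(M * k) - L) < eps for k in (1, 2, 10, 100)):
            return M
        M *= 2                          # push the threshold further out
    return None

f = lambda x: (2 * x + 1) / x           # limit 2 as x -> infinity; error is 1/x
for eps in (0.1, 0.001):
    print(eps, find_threshold(f, 2.0, eps))   # M grows roughly like 1/eps
```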
Limit Laws
Computing limits via $\varepsilon$-$\delta$ from scratch each time would be tedious. The following laws allow limits to be computed algebraically once the basic limits are established.
Theorem (Algebra of Limits). If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$, then:
- $\lim_{x \to a} [f(x) + g(x)] = L + M$
- $\lim_{x \to a} [f(x) \cdot g(x)] = L \cdot M$
- $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{M}$, provided $M \neq 0$
- $\lim_{x \to a} [c \cdot f(x)] = c \cdot L$ for any constant $c$
- If $h$ is continuous at $L$, then $\lim_{x \to a} h(f(x)) = h(L)$
Each of these has a proof via the $\varepsilon$-$\delta$ definition. The sum law is the simplest: if you can get $f(x)$ within $\varepsilon/2$ of $L$ and $g(x)$ within $\varepsilon/2$ of $M$, their sum is within $\varepsilon$ of $L + M$ (triangle inequality).
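Written out: with $\delta_f$ and $\delta_g$ achieving the $\varepsilon/2$ bounds for $f$ and $g$, set $\delta = \min(\delta_f, \delta_g)$; then for $0 < |x - a| < \delta$:

$$|(f(x) + g(x)) - (L + M)| \leq |f(x) - L| + |g(x) - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$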
The Squeeze Theorem
Sometimes a function is hard to evaluate directly but can be sandwiched between two simpler functions.
Theorem (Squeeze Theorem). Let $g(x) \leq f(x) \leq h(x)$ for all $x$ near $a$ (except possibly at $a$). If $\lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L$, then $\lim_{x \to a} f(x) = L$.
Proof. Let $\varepsilon > 0$. Since $\lim_{x \to a} g(x) = L$, there exists $\delta_1 > 0$ such that $0 < |x - a| < \delta_1 \implies |g(x) - L| < \varepsilon$. Similarly, there exists $\delta_2 > 0$ for $h$. Let $\delta = \min(\delta_1, \delta_2)$.
For $0 < |x - a| < \delta$:
$$L - \varepsilon < g(x) \leq f(x) \leq h(x) < L + \varepsilon,$$
so $|f(x) - L| < \varepsilon$. $\blacksquare$
Canonical example: $\lim_{x \to 0} x^2 \sin(1/x) = 0$.
Since $|\sin(1/x)| \leq 1$ always, we have $-x^2 \leq x^2 \sin(1/x) \leq x^2$. Both bounds go to $0$ as $x \to 0$, so the squeeze theorem gives the limit.
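A numeric illustration of the sandwich (sampled points only, as a sanity check):

```python
import math

# Confirm -x^2 <= x^2 sin(1/x) <= x^2 at sample points approaching 0.
for k in range(1, 8):
    x = 10.0 ** (-k)
    mid = x * x * math.sin(1.0 / x)
    assert -x * x <= mid <= x * x                  # the sandwich holds
    print(f"x=1e-{k}: x^2*sin(1/x) = {mid: .3e}, bound x^2 = {x*x:.0e}")
```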
Continuity
A function can have a limit at $a$ without equaling $f(a)$ there (or $f(a)$ might not exist). Continuity ties the limit to the function value.
Definition. $f$ is continuous at $a$ if:
$$\lim_{x \to a} f(x) = f(a).$$
This implicitly requires three things: $f(a)$ is defined, the limit exists, and they are equal.
Unfolding the limit definition: $f$ is continuous at $a$ if
$$\forall \varepsilon > 0,\ \exists \delta > 0: |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon.$$
(Note: the condition is now $|x - a| < \delta$, not $0 < |x - a| < \delta$, because we include $x = a$.)
$f$ is continuous on an interval if it is continuous at every point of that interval.
Types of Discontinuities
Not all discontinuities are alike. There are three standard types:
Removable discontinuity: The limit exists but either $f(a)$ is undefined or $f(a) \neq \lim_{x \to a} f(x)$. Example: $f(x) = \frac{x^2 - 1}{x - 1}$ at $x = 1$. The limit is $2$; we could “remove” the discontinuity by defining $f(1) = 2$.
Jump discontinuity: Both one-sided limits exist but are unequal. Example: the sign function $\text{sgn}(x)$ at $x = 0$: $\lim_{x \to 0^+} \text{sgn}(x) = 1$ and $\lim_{x \to 0^-} \text{sgn}(x) = -1$.
Essential discontinuity: At least one one-sided limit fails to exist (is $\pm\infty$ or oscillates without settling). Example: $\sin(1/x)$ at $x = 0$ oscillates infinitely and has no limit.
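One-sided limits can be probed numerically by approaching from each side; a small sketch for the sign function, with ad hoc step sizes:

```python
def sgn(x):
    return (x > 0) - (x < 0)           # True/False subtract to 1, 0, or -1

# Approach 0 from each side; the probes settle at different values.
print("from the right:", [sgn(10.0 ** (-k)) for k in range(1, 6)])   # all  1
print("from the left: ", [sgn(-10.0 ** (-k)) for k in range(1, 6)])  # all -1
```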
Uniform Continuity
The continuity definition allows $\delta$ to depend on both $\varepsilon$ and $a$. A stronger property requires $\delta$ to work uniformly across the entire domain.
Definition. $f$ is uniformly continuous on $D$ if:
$$\forall \varepsilon > 0,\ \exists \delta > 0: \forall x, y \in D,\ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon.$$
The single $\delta$ must work for all points simultaneously. This is strictly stronger: $f(x) = x^2$ is continuous on $\mathbb{R}$ but not uniformly continuous (near large $x$, you need a smaller $\delta$ for the same $\varepsilon$). On a closed bounded interval like $[0, 1]$, every continuous function is uniformly continuous - that is the Heine-Cantor theorem.
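To see the failure of uniformity concretely: for $f(x) = x^2$ and a fixed $\varepsilon$, the largest workable $\delta$ at a base point $a \geq 0$ solves $(a + \delta)^2 - a^2 = \varepsilon$, giving $\delta = \sqrt{a^2 + \varepsilon} - a$, which shrinks to $0$ as $a$ grows. A short sketch:

```python
import math

eps = 1.0
# Largest delta that works at base point a for f(x) = x^2
# (the worst case in the window is x = a + delta).
for a in (0, 1, 10, 100, 1000):
    delta = math.sqrt(a * a + eps) - a
    print(f"a={a:5d}: largest workable delta = {delta:.2e}")
# delta -> 0: no single delta serves every a, so x^2 is not uniformly continuous on R.
```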
Continuity of Standard Functions
- Polynomials are continuous everywhere (follows from the limit laws).
- Rational functions $p(x)/q(x)$ are continuous wherever $q(x) \neq 0$.
- $\sin x$ and $\cos x$ are continuous everywhere (requires a careful geometric argument showing $|\sin x - \sin a| \leq |x - a|$).
- $e^x$ and $\ln x$ are continuous on their domains.
- Compositions of continuous functions are continuous (from the composition law in the algebra of limits above).
The Intermediate Value Theorem
Theorem (IVT). Let $f$ be continuous on $[a, b]$. If $f(a) \neq f(b)$, then for every value $k$ strictly between $f(a)$ and $f(b)$, there exists $c \in (a, b)$ such that $f(c) = k$.
Informally: a continuous function cannot jump from one value to another without passing through every value in between. You cannot travel from the ground floor to the third floor in a continuous elevator without passing through the second floor.
The proof requires the completeness of $\mathbb{R}$ - on the rational numbers, the IVT fails. (The function $f(x) = x^2 - 2$ is continuous on $[1, 2]$ over $\mathbb{Q}$, satisfies $f(1) < 0 < f(2)$, but has no rational root.)
Applications:
- Proving that $x^5 - x + 1 = 0$ has a root in $[-2, -1]$: evaluate $f(-2) = -29 < 0 < 1 = f(-1)$ and apply the IVT. (Note the endpoints of $[-1, 1]$ give no sign change: $f(-1) = f(1) = 1$.)
- Bisection method for numerical root-finding (see the sketch after this list).
- In topology: IVT is a special case of the connectedness of $[a, b]$ - continuous images of connected sets are connected.
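A minimal bisection sketch for the quintic above. The IVT is what certifies that a sign change brackets a root; the tolerance is an illustrative choice:

```python
def bisect(f, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a root of continuous f, given a sign change."""
    assert f(lo) * f(hi) < 0, "IVT hypothesis: f must change sign on [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:        # sign change persists in the left half
            hi = mid
        else:                          # otherwise the root is in the right half
            lo = mid
    return (lo + hi) / 2.0

f = lambda x: x**5 - x + 1
root = bisect(f, -2.0, -1.0)           # f(-2) = -29 < 0 < 1 = f(-1)
print(root, f(root))                   # root near -1.1673, f(root) near 0
```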
The Extreme Value Theorem
Theorem (EVT). Let $f$ be continuous on a closed, bounded interval $[a, b]$. Then $f$ attains its maximum and minimum on $[a, b]$: there exist $c, d \in [a, b]$ such that $f(c) \leq f(x) \leq f(d)$ for all $x \in [a, b]$.
All three hypotheses are necessary:
- Closed interval: $f(x) = x$ on $(0, 1)$ has no maximum.
- Bounded interval: $f(x) = x$ on $[0, \infty)$ has no maximum.
- Continuity: define $f(x) = x$ for $x \in [0, 1)$ and $f(1) = 0$; the domain is closed and bounded, but the jump at $1$ means $f$ attains no maximum.
The proof uses the Bolzano-Weierstrass theorem: a bounded sequence of reals has a convergent subsequence.
The CS Perspective: Continuity in Optimization
In machine learning, you optimize a loss function $\mathcal{L}(\theta)$ over parameters $\theta$. Several things depend critically on continuity and related properties:
Numerical stability. Floating-point arithmetic does not represent real numbers exactly. A function that changes rapidly (large derivative, or a discontinuity nearby) can produce catastrophically wrong outputs when inputs carry small rounding errors. Algorithms that numerically evaluate limiting forms - say, $\frac{\sin x}{x}$ at $x = 0$, or $\frac{1 - \cos x}{x^2}$ near $x = 0$ - must handle the $0/0$ form and use algebraically equivalent expressions that avoid catastrophic cancellation.
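For instance, the naive form of $\frac{1 - \cos x}{x^2}$ (true limit $\frac{1}{2}$) loses digits near $0$ because $1 - \cos x$ subtracts nearly equal numbers; the identity $1 - \cos x = 2\sin^2(x/2)$ gives an equivalent form with no subtraction. A small demonstration sketch:

```python
import math

def naive(x):
    return (1.0 - math.cos(x)) / (x * x)    # cancellation in 1 - cos x

def stable(x):
    s = math.sin(x / 2.0) / (x / 2.0)       # uses 1 - cos x = 2 sin^2(x/2)
    return 0.5 * s * s

for k in (1, 4, 6, 8):
    x = 10.0 ** (-k)
    print(f"x=1e-{k}: naive={naive(x):.16f}  stable={stable(x):.16f}")
# Both should approach 0.5; the naive form degrades and collapses to 0.0 by x = 1e-8.
```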
Optimization landscapes. Gradient descent requires a loss surface that is at least continuous, ideally differentiable, ideally with well-behaved curvature. ReLU networks introduce kinks (non-differentiable points), but they are continuous. The IVT guarantees that if a network’s output is positive somewhere and negative somewhere, it must cross zero - relevant for binary classification.
Lipschitz continuity. A function $f$ is $K$-Lipschitz if $|f(x) - f(y)| \leq K|x - y|$ for all $x, y$. This is uniform continuity with an explicit rate. Lipschitz conditions appear everywhere in ML theory: in convergence proofs for gradient descent, in generalization bounds (via Rademacher complexity), and in defining stable training dynamics. Spectral normalization in GANs is precisely a technique to enforce a Lipschitz constraint on the discriminator.
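A crude empirical probe: random sampling lower-bounds the best Lipschitz constant but cannot certify one. The functions and sample counts below are illustrative:

```python
import math
import random

def empirical_lipschitz(f, lo, hi, trials=100_000, seed=0):
    """Lower-bound the best Lipschitz constant of f on [lo, hi] by sampling pairs."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

print(empirical_lipschitz(math.sin, -10, 10))        # near 1: sin is 1-Lipschitz
print(empirical_lipschitz(lambda x: x * x, 0, 10))   # near 20 = sup of |2x| on [0, 10]
```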
Floating-point “limits.” In finite precision arithmetic, you cannot take limits in the mathematical sense - the process terminates. But the $\varepsilon$-$\delta$ language maps directly to algorithm design: the $\varepsilon$ becomes a numerical tolerance; the $\delta$ becomes a condition on input variation. Understanding what the mathematical limit is tells you what the numerical computation should converge to and how to check whether it has.
With limits and continuity in hand, we have the vocabulary to define the derivative precisely - not as “the slope of the tangent line” (a circular phrase) but as the limit of a ratio of differences.