Calculus was invented twice, independently, within a decade, by two of the most formidable intellects in the history of science. It then spent roughly 150 years on philosophically shaky ground before mathematicians finally gave it the rigorous foundation it deserved. That arc - from intuition to controversy to rigor - is worth tracing, because it is also the arc of understanding that each student recapitulates when learning the subject seriously.

The Problem That Demanded a New Language

Two ancient problems drove the development of calculus, and they look different but turn out to be secretly the same - the Fundamental Theorem of Calculus is the precise statement of that sameness.

The first is the tangent problem: given a curve, what is the slope of the line that just touches it at a point? For a straight line this is trivial. For a curve like $y = x^2$, it is not obvious - a tangent line at the point $(1, 1)$ has some slope, but how do you find it exactly?

The second is the area problem: given a curve, what is the area enclosed beneath it? Rectangles can approximate it. But for an exact answer, you need to push the approximation to some kind of limit.

These problems were not merely theoretical curiosities. They were the mathematical core of questions about planetary motion (Kepler needed areas swept by orbital paths), projectile trajectories (Galileo needed instantaneous velocity), and the behavior of lenses and mirrors. The 17th century had urgent practical reasons to want answers.

Ancient Roots: The Method of Exhaustion

The Greeks were not without resources. Archimedes (c. 287–212 BCE) developed the method of exhaustion - a technique for computing areas and volumes by approximating them with polygons or other known shapes, and proving that the approximation can be made arbitrarily close to the true value.

To compute the area of a circle of radius $r$, Archimedes inscribed and circumscribed regular polygons with increasing numbers of sides. An inscribed $n$-gon is strictly inside the circle; a circumscribed $n$-gon strictly contains it. As $n$ increases, both polygons close in on the circle. Archimedes proved:

$$\frac{223}{71} < \pi < \frac{22}{7}$$

using 96-sided polygons. This is remarkable not for the digits but for the method: he was explicitly computing a limit, pinning a quantity between upper and lower bounds that converge. He lacked the language to say “limit” - that would take two millennia - but the logic is there.
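
For the curious, here is a minimal Python sketch of the same squeeze. It uses the standard doubling recursion for the semiperimeters of the circumscribed and inscribed polygons around a unit circle - a modern reformulation; Archimedes worked with explicit rational bounds rather than floating point - starting from hexagons and doubling four times to reach 96 sides:

```python
import math

def archimedes_pi_bounds(doublings=4):
    """Squeeze pi between the semiperimeters of inscribed (lower bound) and
    circumscribed (upper bound) regular polygons around a unit circle,
    starting from hexagons and repeatedly doubling the number of sides."""
    n = 6
    lower = 3.0                 # inscribed hexagon: semiperimeter 3
    upper = 2 * math.sqrt(3)    # circumscribed hexagon: semiperimeter 2*sqrt(3)
    for _ in range(doublings):
        n *= 2
        upper = 2 * upper * lower / (upper + lower)  # harmonic mean
        lower = math.sqrt(upper * lower)             # geometric mean
        print(f"{n:>3}-gon: {lower:.6f} < pi < {upper:.6f}")

archimedes_pi_bounds()
# The final line squeezes pi between roughly 3.14103 and 3.14272 using 96 sides,
# consistent with Archimedes' bounds 223/71 < pi < 22/7.
```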

Archimedes also computed the area under a parabolic arc, the volume of a sphere, and the surface area of a sphere. In each case, the method was the same: approximate with simple shapes, bound from above and below, argue that the error can be made as small as desired. This is integration, essentially, without the formalism.

The 17th Century: Two Inventors, One Discovery

By the 1660s and 1670s, mathematics had accumulated enough - analytic geometry from Descartes, algebraic notation from Viète, Fermat’s method of finding maxima - that a full account of tangents and areas was within reach. Two people reached it.

Isaac Newton (1642–1727) developed what he called the method of fluxions between roughly 1665 and 1671, during a period of isolation while Cambridge was closed for plague. Newton thought of variables as fluents - quantities flowing continuously in time - and their rates of change as fluxions, denoted $\dot{x}$. Differentiation took a fluent $x$ to its fluxion $\dot{x}$; integration was the inverse problem of recovering the fluent from its fluxion.

Newton’s approach was geometric and rooted in physical intuition. He was motivated by mechanics and thought of the calculus as a tool for physics, which it immediately became in his hands: the Principia Mathematica (1687) is calculus applied to gravitation, though Newton cast its arguments mostly in the language of classical geometry rather than in his new calculus.

Gottfried Wilhelm Leibniz (1646–1716) arrived independently, working intensively on these problems from around 1673 to 1676. Leibniz was a philosopher and diplomat as well as a mathematician, and he brought to calculus an extraordinary gift for notation. He introduced $dy/dx$ for the derivative - thinking of $dy$ and $dx$ as infinitesimal differences - and the elongated S, $\int$, for the integral, as a sum of infinitely many infinitesimally thin strips. He wrote $\int y \, dx$ meaning “sum all the $y \cdot dx$ strips.”

Leibniz published first, in 1684 and 1686. Newton’s work on the calculus circulated privately and appeared in print only decades later. Newton could claim priority of discovery, Leibniz priority of publication - and each knew exactly where he stood.

The Priority Dispute

The argument over who invented calculus became one of the most damaging controversies in the history of science. Newton accused Leibniz of plagiarism after seeing early correspondence; Leibniz denied it; their respective supporters took sides; the Royal Society (which Newton effectively controlled) declared for Newton in a report that was, in retrospect, deeply biased.

The historical consensus today is that the discoveries were genuinely independent. Leibniz’s notes show him working out the fundamental ideas himself, with a different approach and different notation. The tragedy is that the dispute poisoned relations between British and Continental mathematics for over a century. British mathematicians stubbornly clung to Newton’s dot notation and fell behind, while Leibniz’s notation spread across Europe and proved vastly more powerful.

Why Leibniz’s Notation Won

This is not a trivial point. Notation shapes thought. Leibniz’s $\frac{dy}{dx}$ notation has concrete advantages:

  1. It looks like a fraction. The chain rule becomes $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$, which seems to “cancel” like fractions - a heuristic that, while not a proof, guides correct calculation (see the worked example just after this list).

  2. The integral and derivative are visually linked. $\int f(x) \, dx$ - the $dx$ reminds you which variable you integrate over, and it echoes the $dx$ in $\frac{dy}{dx}$.

  3. It generalizes. When calculus extended to multiple variables, Leibniz’s notation extended cleanly: $\frac{\partial f}{\partial x}$, $\iint f \, dx \, dy$. Newton’s dot notation does not generalize gracefully.
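
To see the cancellation heuristic from point 1 in action, take $y = \sin(x^2)$ and set $u = x^2$, so that $y = \sin u$. Then

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \cos(u) \cdot 2x = 2x \cos(x^2),$$

exactly as if the $du$’s had cancelled. The notation does the bookkeeping for you.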

Newton’s $\dot{x}$ is still used in physics, especially for time derivatives. But for mathematical analysis, Leibniz’s notation is standard everywhere.

Berkeley’s Critique: Ghosts of Departed Quantities

In 1734, the philosopher and bishop George Berkeley published The Analyst, a pointed critique of Newtonian calculus directed at scientists who accepted religious mysteries less readily than mathematical ones. His target was the logical status of infinitesimals.

When Newton computed the derivative of $x^2$, he would consider $(x + o)^2 = x^2 + 2xo + o^2$, subtract $x^2$, divide by $o$ to get $2x + o$, and then discard the $o$ term on the grounds that $o$ is “infinitely small.” But Berkeley asked: is $o$ zero or not? If it is zero, you cannot divide by it. If it is not zero, you cannot discard it. He called these discarded quantities “ghosts of departed quantities” - present when convenient, absent when inconvenient.
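
Laid out as a single computation, the sleight of hand is easy to see:

$$\frac{(x + o)^2 - x^2}{o} = \frac{2xo + o^2}{o} = 2x + o \;\longrightarrow\; 2x.$$

The division requires $o \neq 0$; the final step, discarding $o$, treats it as if it were $0$.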

Berkeley was right that there was a logical gap. Infinitesimals were not numbers in any rigorous sense. They behaved like zero when added but like nonzero when used as denominators. The calculus worked - its predictions were spectacularly accurate - but nobody could say precisely why it worked.

This is not unusual in the history of mathematics. Complex numbers were used for centuries before anyone gave them a rigorous definition. Dirac’s delta function was used in physics for decades before Laurent Schwartz made it precise with distribution theory. The practice can outrun the foundations. But eventually the foundations catch up.

The 19th Century: Rigorization

The project of putting calculus on a firm logical footing took most of the 19th century and required building a careful theory of the real numbers themselves.

Augustin-Louis Cauchy (1789–1857) made the first serious attempt at rigorization. He defined the limit not in terms of infinitesimals but in terms of convergence: a sequence $a_n$ converges to $L$ if, for any desired precision, the terms eventually stay within that precision of $L$. He defined continuity, derivatives, and integrals in terms of limits, eliminating infinitesimals from the definitions. His formulations were not yet fully rigorous by modern standards - he still relied on informal geometric intuitions in places - but the structure was there.

Karl Weierstrass (1815–1897) completed the project. His $\varepsilon$-$\delta$ formalism gave a purely algebraic definition of limits that made no appeal to motion, intuition, or infinitesimals:

$$\lim_{x \to a} f(x) = L \quad \text{means} \quad \forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon.$$

This definition is purely logical. Given any challenge ($\varepsilon$ is the challenger’s tolerance), you must produce a response ($\delta$ is your neighborhood). No infinitesimals. No motion. No intuition required - only the ability to verify an inequality.
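
As a concrete example, to verify $\lim_{x \to 3} (2x + 1) = 7$: given any $\varepsilon > 0$, choose $\delta = \varepsilon / 2$. Then $0 < |x - 3| < \delta$ implies $|(2x + 1) - 7| = 2|x - 3| < 2\delta = \varepsilon$, which is exactly what the definition demands.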

Weierstrass also produced pathological examples that clarified what the definitions were for. He constructed a function that is continuous everywhere but differentiable nowhere - something that intuition says should be impossible (how can a smooth-looking curve have no tangent anywhere?) but which is perfectly consistent. Without $\varepsilon$-$\delta$ definitions, you cannot even state the result precisely, let alone prove it.

Bernhard Riemann (1826–1866) formalized the integral. The Riemann integral of $f$ over $[a, b]$ is defined as the limit of sums $\sum f(x_i^\ast) \Delta x_i$ as the partition becomes finer, provided this limit exists. This gives a precise meaning to “area under a curve” and specifies exactly which functions are integrable (essentially, those with a “small” set of discontinuities).
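
A few lines of Python make the definition concrete. This is a minimal numerical sketch; the helper riemann_sum and the choice of midpoints as the sample points $x_i^\ast$ are illustrative, not canonical:

```python
def riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal subintervals,
    sampling f at the midpoint of each subinterval (one choice of x_i*)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(lambda x: x**2, 0.0, 1.0, n))
# As the partition refines, the sums approach 1/3, the exact area under x^2 on [0, 1].
```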

Later, Henri Lebesgue (1875–1941) developed a more powerful integral theory that handles many functions the Riemann integral cannot, and which is the standard foundation for modern probability theory and functional analysis.

The Bolzano-Weierstrass theorem - that every bounded sequence of real numbers has a convergent subsequence - turns out to be a key lemma underlying many of the most important theorems of analysis, including the Extreme Value Theorem.

What “Foundations” Mean and Why They Matter

It might seem pedantic to spend decades arguing about whether infinitesimals are numbers. The calculus worked fine without settling the question. So why bother?

The reason is that mathematics without foundations is a collection of recipes. Recipes work until they don’t. The $\varepsilon$-$\delta$ formalism is not just philosophical tidiness - it is what lets you:

  • Know exactly when a theorem applies and when it does not
  • Detect errors that intuition misses (Weierstrass’s nowhere-differentiable function being the canonical example)
  • Extend the results to new settings (infinite-dimensional function spaces, stochastic processes, manifolds)

In machine learning, this matters concretely. The convergence of gradient descent depends on properties like Lipschitz continuity and convexity - both defined in terms of $\varepsilon$-$\delta$ style inequalities. The theory of generalization in learning uses measure theory, which is built on Lebesgue’s integral. If you want to understand why an optimizer converges, or why a network generalizes, you need the foundations.
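
As a small illustration of the Lipschitz condition at work - a sketch assuming only NumPy, with the quadratic objective and the step size $1/L$ chosen for convenience rather than taken from any particular library - gradient descent on a convex quadratic converges when the step size respects the Lipschitz constant of the gradient:

```python
import numpy as np

# f(w) = 0.5 * w @ A @ w is convex; its gradient A @ w is Lipschitz with
# constant L = largest eigenvalue of A. The classical analysis guarantees
# convergence of gradient descent for step sizes up to 1/L.
A = np.diag([1.0, 10.0])
L = float(np.max(np.linalg.eigvalsh(A)))   # Lipschitz constant of the gradient
w = np.array([5.0, 5.0])
for _ in range(200):
    w = w - (1.0 / L) * (A @ w)            # one gradient step of size 1/L
print(w)                                   # tends to the minimizer [0, 0]
```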

The Connection to the Real Numbers

Underpinning all of this is the real number system itself. The rational numbers $\mathbb{Q}$ are not enough for calculus: the sequence $3, 3.1, 3.14, 3.141, 3.1415, \ldots$ consists of rationals converging to $\pi$, which is not rational. If your number system has “holes,” limits cannot always be taken.

The real numbers $\mathbb{R}$ are constructed to fill those holes - they form a complete ordered field, meaning every Cauchy sequence (one whose terms eventually stay arbitrarily close to one another) converges, and every nonempty set bounded above has a least upper bound (the completeness axiom). This completeness is what makes the Intermediate Value Theorem, the Extreme Value Theorem, and ultimately all of calculus work. The constructions of $\mathbb{R}$ from $\mathbb{Q}$ - via Dedekind cuts or equivalence classes of Cauchy sequences - are among the foundational achievements of 19th-century mathematics.


The path from Archimedes' polygons to Weierstrass’s $\varepsilon$-$\delta$ definitions spans two thousand years. What we have now is not just a collection of techniques for computing slopes and areas. It is a precisely defined, fully rigorous theory, built on clear definitions, with theorems that say exactly what they mean and proofs that establish them beyond doubt. That is where we pick up next.

