History of Calculus - Two Centuries of Arguments About Infinity // Megha Bose

Helpful context:

Functions & Mappings - One Input, One Output, No Exceptions

There is a kind of mathematics that was not invented so much as discovered - pulled out of the physical world by people who could not bear not knowing. Calculus is that mathematics. It was not created in a vacuum by geniuses with too much free time. It was wrested into existence by the urgency of real questions: How fast is a falling stone moving at this exact instant? What is the area enclosed by this curve? Where will this planet be next week? These questions had been pressing on human civilization for thousands of years before anyone had the tools to answer them.

This post is not primarily about the formulas of calculus. It is about the hunger that produced them.

Part 1 - Before Calculus: Ancient Wrestlings

The Fundamental Human Impulse

Long before any formalism, before any Greek philosopher wrote a single theorem, human beings were trying to measure things that resisted measurement.

Consider the practical problem: you are an Egyptian scribe in 1650 BCE. A farmer wants to know how much grain a cylindrical silo will hold. You can measure the height. You can walk around the base. But the area of a circle - that curved, elusive thing - does not submit to a measuring stick the way a rectangle does. Circles are not rectangles. Curves are not straight lines. And yet they occupy space, and that space can be measured, if only you are clever enough.

The Rhind Mathematical Papyrus, written around 1650 BCE, contains a formula for the area of a circle: take the diameter, subtract one-ninth of it, and square the result. This gives $\left(\frac{8}{9}d\right)^2$, which implies $\pi \approx 3.16$ - accurate to better than one percent. Egyptian scribes did not derive this from first principles. They discovered it by doing, by measuring many circles and noticing the pattern. They did not know why it worked. But they knew it worked.
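To see where the 3.16 comes from, set the scribes' rule equal to the modern area formula $\frac{\pi d^2}{4}$. The symbol $\pi_{\text{Rhind}}$ is just a label used here for the constant the rule implies; it is not notation from the papyrus.

$$\left(\frac{8}{9}d\right)^2 = \frac{64}{81}d^2 = \pi_{\text{Rhind}} \cdot \frac{d^2}{4} \quad\Longrightarrow\quad \pi_{\text{Rhind}} = \frac{256}{81} \approx 3.1605$$

That is about 0.6 percent above the true value.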

This is the fundamental psychological impulse that leads to calculus: to measure irregular shapes, you break them into regular pieces you can measure. You approximate. You accumulate. You add up many small familiar things to get one large unfamiliar thing. The Egyptians were doing this intuitively. The Greeks turned it into a method.

1.1 - Archimedes and the Method of Exhaustion (~250 BCE)

Archimedes of Syracuse (c. 287-212 BCE) is one of the most extraordinary scientists who has ever lived. He invented war machines to defend his city. He discovered the principle of the lever. He understood buoyancy. And in his mathematical work, he was, in some sense, doing calculus - nearly two thousand years before Newton or Leibniz were born.

His problem: find the area of a circle.

You cannot measure the area of a circle directly. It has no corners, no straight edges, no natural decomposition into rectangles. But here is what you can do: draw a square inside the circle. Its area is easy to compute. The square is too small - it misses the curved bits. So draw a regular hexagon inside instead. Better. Then a regular 12-gon. Better still. Then a 24-gon. The polygon is getting closer and closer to the circle, “exhausting” the space between polygon and circle. Its area approaches the area of the circle from below.

Now draw a square outside the circle - a circumscribed polygon that contains the circle. This gives an upper bound. Do the same thing: hexagon, 12-gon, 24-gon, 48-gon, 96-gon. The circumscribed polygon’s area shrinks toward the circle’s area from above.

You now have two sequences: one creeping upward toward the circle’s area, one creeping downward. The circle’s area is trapped between them. As you add more sides, the gap between the two sequences becomes as small as you like.

Archimedes carried this out with 96-sided polygons and concluded:

$$\frac{223}{71} < \pi < \frac{22}{7}$$

This is not primarily remarkable for the precision (though the precision is remarkable). What matters is the logic. Archimedes was not guessing. He was proving that the circle’s area could be trapped to any desired accuracy. He was computing a limit - without having the word “limit,” without having the machinery of limits - through pure logical force.
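The squeeze can be replayed in a few lines of Python. This is a modern sketch, not a reconstruction of Archimedes' hand computation: he worked with perimeters and careful rational bounds on square roots, while the code below uses floating point and a mean-based doubling recurrence. The function name `archimedes_bounds` and its variables are made up for this illustration.

```python
import math

def archimedes_bounds(doublings=4):
    """Bounds on pi from inscribed and circumscribed regular polygons.

    Starts from hexagons around a circle of radius 1 and doubles the number
    of sides each round. `inner` and `outer` are the half-perimeters of the
    inscribed and circumscribed polygons, so inner < pi < outer.
    """
    sides = 6
    inner = 3.0                 # inscribed hexagon: 6 sides of length 1, halved
    outer = 2.0 * math.sqrt(3)  # circumscribed hexagon: half of its perimeter
    for _ in range(doublings):
        outer = 2.0 * inner * outer / (inner + outer)  # harmonic mean
        inner = math.sqrt(inner * outer)               # geometric mean
        sides *= 2
    return sides, inner, outer

sides, lower, upper = archimedes_bounds()
print(sides, lower, upper)
```

Four doublings take the hexagon to 96 sides, and the resulting bounds sit just inside $\frac{223}{71}$ and $\frac{22}{7}$. Raising `doublings` tightens the trap further, which is the whole point of the method.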

He applied the same method to an astonishing range of problems. He found the area under a parabola by filling it with triangles. He found the volume of a sphere. The surface area of a sphere. The area traced by a spiral. In every case, the strategy was the same: approximate with known shapes, bound from above and below, prove the error can be made arbitrarily small.

This is proto-integration. This is, in spirit if not in formalism, what a modern calculus student does when they write $\int_a^b f(x)dx$ and think about it as the limit of rectangular approximations. Archimedes was doing exactly that, in the only language available to him: the language of ancient Greek geometry.

There is something deeply moving about this. A man in Syracuse, in the third century BCE, sitting with compass and straightedge, was thinking thoughts that would not be fully formalized for another two thousand years. He had the right idea. He lacked the notation.

Discomfort check. You might be wondering: isn’t this cheating somehow? How can you “exhaust” a circle with polygons when there are always curved bits left over? The answer is that the leftover bits are the point, not a loophole: the method works by showing they can be made smaller than any amount you name. The polygons never equal the circle. But the limit of the polygon areas as the number of sides grows without bound is the circle’s area. Archimedes could not say it that way - the formal machinery of limits did not exist. But the logical structure is the same. The 19th century would invent the language; Archimedes had already discovered the idea.

Also in this tradition: Eudoxus of Cnidus (c. 408-355 BCE), who first systematized the “method of exhaustion,” and Bonaventura Cavalieri (1598-1647), who proposed thinking of areas as composed of “indivisibles” - infinitely many infinitely thin lines. Cavalieri’s principle: if two solids have cross-sections of equal area at every height, they have equal volumes. This is exactly what modern integration formalizes. Cavalieri was slicing. He just could not say precisely what an infinitely thin slice was.

1.2 - Zeno’s Crisis: The Paradoxes of Motion (~450 BCE)

Before we can calculate motion, we must grapple with the fact that motion itself is philosophically bewildering.

Zeno of Elea posed his paradoxes around 450 BCE, and they have never entirely left us. The most famous:

The Arrow. An arrow is flying through the air. At any single instant in time, the arrow occupies a definite position. It is not moving at that instant - it has no time in which to move. It simply is somewhere. But the flight consists of nothing but instants. If the arrow is motionless at every instant, how is it ever moving at all?

Achilles and the Tortoise. Achilles gives a tortoise a head start in a race. Before Achilles reaches where the tortoise started, the tortoise has moved forward. Before Achilles reaches that new position, the tortoise has moved again. At every step, the tortoise has moved. There are infinitely many steps. Can Achilles ever catch up?

These are not mere puzzles. Zeno was pointing at something genuinely difficult: the relationship between the continuous and the discrete, between the infinite and the finite, between an instant and an interval. These are exactly the questions that calculus must answer.

The resolution to Achilles: yes, infinitely many steps can sum to a finite total. $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1$. An infinite series can converge. But to say this rigorously - to actually prove that infinitely many positive quantities can sum to something finite - required the theory of limits that would not arrive for another two thousand years.
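The claim can be checked with a short calculation. Write $s_n$ (a label introduced here) for the sum of the first $n$ terms:

$$s_n = \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} = 1 - \frac{1}{2^n}$$

Each partial sum falls short of 1 by exactly $\frac{1}{2^n}$, and that shortfall can be made smaller than any positive tolerance by taking $n$ large enough. That last clause is the part that needed a theory of limits.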

The resolution to the arrow is deeper still. It requires understanding what velocity is - not as a vague notion of “how fast something is moving,” but as a precise mathematical object: the derivative of position with respect to time. An arrow has a velocity at an instant not because it moves during that instant (instants have no duration) but because its position function has a well-defined slope at that point.

Notice what Zeno’s paradoxes reveal: human beings were already thinking about infinitely small pieces and infinitely many steps in 450 BCE. They could not resolve the questions. But they were asking them. The hunger was there two thousand years before the tools.

1.3 - The Accumulation Impulse

There is a kind of practical wisdom that accumulates in any civilization dealing with land, water, grain, and tax. Long before anyone wrote down a theorem, builders and surveyors were using intuitive versions of integration.

The key insight - one so natural that children grasp it without being taught - is this: if you want to know a whole, you can add up its parts. The area of a field can be estimated by dividing it into strips and measuring each strip. The volume of a jug can be estimated by pouring it out into smaller known containers. The distance traveled by a moving object can be estimated by adding up many small distances.

This is exactly what a Riemann sum does. When a student in a calculus class approximates $\int_0^1 x^2dx$ by dividing the interval into $n$ strips and summing $f(x_i)\Delta x$, they are doing what Egyptian land surveyors did empirically - just with the additional clarity of knowing what happens as the strips get thinner.
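Here is a minimal sketch of that strip-summing in Python. It assumes equal-width strips sampled at their right endpoints, which is only one of several reasonable choices; the names `riemann_sum` and `square` are invented for this example.

```python
def riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal-width strips.

    Uses the right endpoint of each strip as the sample point x_i.
    """
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

def square(x):
    return x * x

for n in (10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 1.0, n))
```

As the strips get thinner, the printed values shrink toward $\frac{1}{3}$, the exact value of the integral.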

The genius of calculus is not in inventing this impulse. It is in making it precise: asking exactly what happens in the limit, and proving that the limit exists and gives the right answer.

Part 2 - The 17th Century: The Problems Reach Crisis

2.1 - A New Kind of Problem

By the late 1630s, something had changed in European science. Galileo had shown that falling objects accelerate uniformly - their positions trace a parabola, their velocities grow linearly. Kepler had described the planets' elliptical orbits mathematically. Descartes had invented coordinate geometry, making it possible to describe curves algebraically. The machinery was assembling.

But there was a gap. A very large gap.

Everyone could now write down the position of a moving object as a function of time. Galileo knew that a ball dropped from rest falls $s = \frac{1}{2}g t^2$ meters in $t$ seconds. But how fast is it moving at the exact instant $t = 3$? Not on average over some interval - at that specific instant?

And the reverse: if you know the velocity of an object at every instant, how do you find how far it has traveled? These seemed like distinct questions. No one suspected they were secretly the same question.

There was also the purely geometric problem: given a curve like $y = x^3$, what is the slope of the line tangent to the curve at the point $(2, 8)$? The tangent line grazes the curve at one point. Its slope describes the curve’s “steepness” at that point. Finding it seemed to require dividing by zero.
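The obstacle is easy to reproduce. The sketch below computes slopes of secant lines through $(2, 8)$ and a nearby point on $y = x^3$, then tries the "obvious" move of letting the two points coincide. The helper names `secant_slope` and `cube` are made up for this illustration.

```python
def secant_slope(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def cube(x):
    return x ** 3

for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(cube, 2.0, h))

try:
    secant_slope(cube, 2.0, 0.0)
except ZeroDivisionError:
    print("h = 0 asks for 0/0, which is exactly the obstacle")
```

The secant slopes settle toward 12 as `h` shrinks, but the formula itself refuses to say anything at `h = 0`.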

These problems converged in the 17th century with the force of inevitability. The scientific revolution had made them urgent. The mathematical machinery had made them tractable. Someone was going to solve them.

2.2 - Fermat and the Tangent Line (1630s)

Pierre de Fermat (1607-1665) is best known today for a margin note about a theorem he claimed to have proved but didn’t write down. But in his own time, he was one of the most formidable mathematicians in Europe - and in the 1630s, he developed a method for finding maxima and minima of curves that was, in retrospect, the first recognizable appearance of the derivative.

His reasoning: at a maximum or minimum of a curve, the tangent line is horizontal. Its slope is zero. So if you can find the slope of the tangent, you can find the extremes.

His method: suppose you want the slope of the tangent to $f(x) = x^2$ at the point $(x, x^2)$. Take a nearby point: $(x + e, (x+e)^2)$. Compute the slope of the secant line through these two points:

$$\frac{(x+e)^2 - x^2}{e} = \frac{x^2 + 2xe + e^2 - x^2}{e} = 2x + e$$

Now - and here is the philosophically dubious step that Fermat performed without apology - set $e = 0$. The slope is $2x$.

Fermat had no rigorous justification for this. He divided by $e$ (treating it as nonzero), then set $e = 0$ (treating it as zero). The same $e$, playing two incompatible roles. And yet the answer is correct. The derivative of $x^2$ is $2x$, just as any calculus student learns today.

Fermat could not explain why this worked. He did not have the concept of a limit. He had an excellent intuition and a willingness to follow it. The rigor would come two centuries later.
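The same recipe handles the extremum problems Fermat had in mind. As an illustration (the function and the symbol $b$ are chosen here, not taken from the text above), split a segment of length $b$ into pieces $x$ and $b - x$ and ask when the product $f(x) = x(b - x)$ is largest:

$$\frac{f(x+e) - f(x)}{e} = \frac{(x+e)(b-x-e) - x(b-x)}{e} = \frac{be - 2xe - e^2}{e} = b - 2x - e$$

Setting $e = 0$ and asking for a horizontal tangent gives $b - 2x = 0$, so $x = \frac{b}{2}$: the product is largest when the segment is cut in half. The answer is right, and the step that produces it is exactly as unjustified as before.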

Discomfort check. This procedure - dividing by $e$, then setting $e = 0$ - should make you uncomfortable. It is logically suspicious. You cannot divide by zero, so $e$ must be nonzero when you divide. But then you cannot set it to zero afterward. This tension is real, and it is exactly what Bishop Berkeley would later attack with devastating precision. The resolution - the concept of a limit - was not available to Fermat. The 17th century proceeded by ignoring the tension and getting correct answers. The 19th century came back and fixed the foundations. This is normal in mathematics. The practice often precedes the rigor by decades or centuries.

2.3 - Newton: Motion, Fluxions, and the Language of the Universe (1665-1666)

In the summer of 1665, the Great Plague came to Cambridge. The university closed. Isaac Newton, 22 years old, returned to his mother’s farm at Woolsthorpe. Over the next two years - working in near-total isolation - he essentially invented calculus, began working out his theory of universal gravitation, and laid the foundations of his optics.

These are arguably the most productive two years in the history of science.

Newton’s calculus was rooted in physics from the start. He thought about quantities that flow - position, temperature, density - and he called them fluents. The rate at which a fluent changes, he called its fluxion, written $\dot{x}$. Position is a fluent; velocity is its fluxion. Velocity is a fluent; acceleration is its fluxion.

To find the fluxion of $x^2$, Newton reasoned as follows. In a “moment” of time $o$ (a very small quantity), $x$ becomes $x + \dot{x} o$. So $x^2$ becomes $(x + \dot{x} o)^2 = x^2 + 2x\dot{x} o + (\dot{x} o)^2$. The change in $x^2$ is $2x\dot{x} o + (\dot{x} o)^2$. Divide by $o$: the rate of change is $2x\dot{x} + \dot{x}^2 o$. Now let $o$ “vanish” - the term with $o$ disappears, leaving $2x\dot{x}$.

This is exactly Fermat’s procedure, dressed differently. $o$ plays the role of $e$: it starts nonzero (to permit division) and ends zero (to get the final answer). The logical gap is identical.
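To see that the recipe is mechanical, here is the same computation for $x^3$. It is a sketch in Newton's notation, not a quotation from his manuscripts:

$$(x + \dot{x} o)^3 - x^3 = 3x^2\dot{x} o + 3x\dot{x}^2 o^2 + \dot{x}^3 o^3$$

Dividing by $o$ gives $3x^2\dot{x} + 3x\dot{x}^2 o + \dot{x}^3 o^2$, and letting $o$ vanish leaves $3x^2\dot{x}$. With $\dot{x} = 1$ and $x = 2$ this is $12$, the slope of the tangent to $y = x^3$ at $(2, 8)$.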

But Newton did something neither Fermat nor anyone before him had done: he connected the tangent problem to the area problem.

The Fundamental Theorem of Calculus - the central discovery of the entire subject - is this:

Finding the area under a curve and finding the slope of the tangent to a curve are inverse operations.

Differentiation undoes integration. Integration undoes differentiation. These two ancient problems, which had seemed completely unrelated, were secretly mirrors of each other.
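A numerical experiment makes the mirror visible. The sketch below builds an area function by adding up thin strips under the cosine curve, then measures the slope of that area function and compares it with the original curve. The choice of cosine, the strip count, and the helper names `area_function`, `slope`, and `A` are all assumptions made for this illustration.

```python
import math

def area_function(f, a, x, n=20_000):
    """Area under f from a to x, via a midpoint Riemann sum with n strips."""
    dx = (x - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def slope(g, x, h=1e-4):
    """Central-difference estimate of the slope of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)

def A(x):
    # Accumulated area under cos from 0 to x.
    return area_function(math.cos, 0.0, x)

x0 = 1.0
print(slope(A, x0), math.cos(x0))  # the two numbers should nearly agree
```

The slope of the accumulated area at `x0` comes back as the height of the original curve at `x0`. Agreement to several decimal places is evidence, of course, not a proof of the theorem.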

Newton could prove this, and he put it to work with devastating effect. In the Principia Mathematica (1687), he derived Kepler’s laws of planetary motion from his law of gravitation, presenting the arguments largely in geometric dress but drawing on the same limiting ideas. He showed that an inverse-square gravitational force produces exactly elliptical orbits. He calculated the tides, the shape of the Earth, the precession of the equinoxes. Calculus was not abstract mathematics - it was the language in which the universe was written.

Newton did not publish his calculus methods for decades. He shared them in letters and private manuscripts, but he held back from publication. His reasons were partly caution, partly discomfort with the philosophical foundations he knew were shaky. This delay would have consequences.

2.4 - Leibniz: The Sum of Infinitely Many Infinitely Thin Strips (1674-1684)

Gottfried Wilhelm Leibniz (1646-1716) was not a physicist. He was a philosopher, a diplomat, a theologian, and one of the most intellectually ambitious people who ever lived. He wanted to build a universal logical calculus - a system in which all philosophical disputes could be resolved by calculation. He never achieved this dream. But in pursuing it, he invented calculus.

Leibniz came to the problems of tangents and areas from a different direction than Newton, and - crucially - with a different gift: an extraordinary instinct for notation.

His approach to integration was deeply geometric and intuitive. He imagined the area under a curve as the sum of infinitely many infinitely thin vertical strips, each of width $dx$ and height $f(x)$. The area of each strip is $f(x)dx$. The total area is their sum. He wrote this sum with an elongated S - the integral sign $\int$ - standing for the Latin summa, meaning “sum.”

$$\int_a^b f(x)dx$$

This is not just notation. It is a picture. The $\int$ says “sum up.” The $f(x)$ says “of these heights.” The $dx$ says “over infinitely thin widths.” The entire expression is a direct transcription of the geometric idea.

For derivatives, Leibniz wrote $\frac{dy}{dx}$ - the ratio of an infinitely small change in $y$ to an infinitely small change in $x$. Again, this is a picture. The derivative is the slope of the tangent; slope is rise over run; $dy$ is the infinitely small rise; $dx$ is the infinitely small run.

Leibniz published his work in 1684 and 1686. Newton’s work had been circulating privately since the 1660s but was not published until much later. Both men had developed calculus independently. Modern historians are confident of this: Leibniz’s notebooks show him working out the ideas himself, in his own language, with his own approach.

What followed was one of the most bitter controversies in the history of science.

2.5 - The Priority Dispute: A Tragedy in Two Nations

The Newton-Leibniz priority dispute was not a polite academic disagreement. It was a war.

Newton accused Leibniz of plagiarism. Leibniz denied it. The Royal Society, which Newton effectively controlled, commissioned an investigation and - in a report that Newton himself largely wrote - declared for Newton. Continental mathematicians, who had been working with Leibniz’s methods and notation, were outraged. The two camps stopped talking to each other.

The tragedy had a concrete cost. British mathematicians, loyal to Newton, continued using his dot notation for roughly another century. Leibniz’s notation - $\frac{dy}{dx}$, $\int f\,dx$, $\frac{d^2y}{dx^2}$ - spread across continental Europe and proved vastly more powerful. French mathematicians like Cauchy, Laplace, and Lagrange built the next century of mathematics in Leibniz’s language. British mathematics stagnated.

The notation mattered. When you write $\frac{dy}{dx}$, the chain rule looks like:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$

This is not a proof - you cannot literally cancel the $du$’s like fractions - but it is an extraordinarily good mnemonic that guides correct calculation. Newton’s dot notation offers no such guidance for multivariable and compositional problems.
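The mnemonic can at least be tested numerically. In the sketch below, the composite $y = \sin(x^2)$ is split as $u = x^2$ and $y = \sin(u)$; the example functions, the sample point, and the helper names are all chosen here for illustration.

```python
import math

def slope(g, x, h=1e-6):
    """Central-difference estimate of the slope of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)

def inner(x):      # u = x^2
    return x * x

def outer(u):      # y = sin(u)
    return math.sin(u)

def composite(x):  # y = sin(x^2)
    return outer(inner(x))

x0 = 0.7
dy_dx = slope(composite, x0)
dy_du = slope(outer, inner(x0))  # slope of the outer function, taken at u = x0^2
du_dx = slope(inner, x0)
print(dy_dx, dy_du * du_dx)      # the two numbers should nearly agree
```

Note where `dy_du` is evaluated: at $u = x_0^2$, not at $x_0$. The fraction notation quietly keeps track of that for you.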

Leibniz won the notation war. We use his notation today. The $\int$ sign, the $dx$, the $\frac{d}{dx}$ - these are Leibniz’s. Every calculus student inherits them.

2.6 - The Ghost Problem: What Are Infinitesimals?

Both Newton and Leibniz used “infinitely small” quantities freely. Newton called them “moments” or “evanescent quantities.” Leibniz called them $dx$ and $dy$. Both men knew, in their private moments, that there was something philosophically slippery about these objects. Both pressed on anyway, because the answers they got were correct.

In 1734, the philosopher and Anglican bishop George Berkeley published The Analyst: A Discourse Addressed to an Infidel Mathematician. It was ostensibly a defense of Christian faith against scientists who scorned religious mysteries while accepting mathematical ones. But the mathematical content was devastating.

Berkeley targeted the logical structure of Newton’s method directly. When Newton computed the derivative of $x^2$:

1. He expanded $(x + o)^2 = x^2 + 2xo + o^2$
2. He divided by $o$ to get $2x + o$
3. He then set $o = 0$ to obtain $2x$

Berkeley asked: is $o$ zero or is it not? If it is zero, you cannot divide by it in step 2 - division by zero is meaningless. If it is not zero, you cannot discard it in step 3 - you have simply made an error. He wrote:

“And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities, nor Quantities infinitely small, nor yet nothing. May we not call them the Ghosts of departed Quantities?”

“Ghosts of departed quantities.” It is one of the best phrases in the history of mathematical criticism. Present when needed, absent when inconvenient. Berkeley was right. The foundations of calculus were, as of 1734, not rigorous. The answers were correct - spectacularly, reliably correct - but the logical justification was a fiction.

This was not unusual. Complex numbers were used productively for more than two centuries before anyone gave them a rigorous definition. Dirac’s delta function was used by physicists for decades before Laurent Schwartz put it on firm ground in the 1940s. Mathematics often practices in advance of foundations. But eventually the foundations must be built.

The foundations of calculus would take another century to arrive.

Part 3 - Making It Rigorous: The 19th Century

3.1 - Cauchy and the End of Infinitesimals (1820s)

Augustin-Louis Cauchy (1789-1857) was a prolific and sometimes difficult man - he drove colleagues to despair with his output - but he did something no one before him had managed: he gave calculus a foundation that did not rest on infinitesimals.

His key move was to define limits without appealing to “infinitely small” quantities or to a vague sense of “approaching zero.” Instead, he described convergence in terms of ordinary arithmetic inequalities.

A sequence $a_n$ converges to $L$, Cauchy said, if: for any tolerance you name, the terms of the sequence eventually stay within that tolerance of $L$. Not infinitely small tolerance - any tolerance, however small. You name it; the sequence eventually satisfies it.

This is a different kind of statement. It contains no infinite or infinitesimal quantities. It makes a claim about ordinary inequalities that can be verified by ordinary arithmetic.
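The "you name it, the sequence satisfies it" game can be played directly. The sketch below uses the partial sums from Zeno's series, $s_n = 1 - \frac{1}{2^n}$, whose limit is 1; the function names are invented for this example, and the search relies on the gap shrinking steadily, which holds for this particular sequence.

```python
def partial_sum(n):
    """Sum of 1/2 + 1/4 + ... + 1/2**n, which equals 1 - 1/2**n."""
    return 1.0 - 0.5 ** n

def first_index_within(tolerance):
    """Smallest n at which partial_sum(n) is within `tolerance` of the limit 1.

    The gap 1/2**n only shrinks as n grows, so every later term is
    within the tolerance as well.
    """
    n = 1
    while abs(partial_sum(n) - 1.0) >= tolerance:
        n += 1
    return n

for tolerance in (0.1, 0.01, 0.001):
    print(tolerance, first_index_within(tolerance))
```

Every tolerance gets an answer, and the answer is always an ordinary whole number. Nothing infinite or infinitesimal appears anywhere in the exchange.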

Cauchy then defined derivatives and integrals in terms of limits. The derivative of $f$ at $x$ is the limit of the difference quotient:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

There are no infinitesimals here. $h$ is an ordinary real number that we allow to approach zero. The limit - defined precisely as Cauchy defined it - either exists or it doesn’t.

The integral he defined as the limit of Riemann-style sums as the partition becomes finer. Again, no infinitesimals - just limits.

Cauchy’s work was not yet perfectly rigorous by later standards; he still relied on some geometric intuitions. But the architecture was there. The ghost of departed quantities had been exorcised.

Discomfort check. The $\varepsilon$-$\delta$ definition of a limit, which you will meet in the next post, is famously hard to digest the first time through. Here is a reframe. The definition is Cauchy’s answer to Berkeley’s challenge. Berkeley said: you cannot have a quantity that is “almost zero but not zero” - that is incoherent. Cauchy replied: you are right, I will not use such quantities. Instead, I will say this - and then he wrote down the $\varepsilon$-$\delta$ definition. The abstraction is not arbitrary bureaucracy. It is the precise minimum required to make the logic airtight. Every quantifier is there to close a specific loophole. When you feel the discomfort of $\forall \varepsilon > 0$, $\exists \delta > 0$, you are feeling the pressure of Berkeley’s challenge being answered one word at a time.

Karl Weierstrass (1815-1897) refined Cauchy’s work into the completely rigorous $\varepsilon$-$\delta$ formalism that appears in every analysis textbook today. He also constructed pathological examples that proved why the rigor was necessary - functions continuous everywhere but differentiable nowhere, limits that behaved strangely at boundaries. Without precise definitions, you cannot even state these results. With them, you can prove them.

Bernhard Riemann (1826-1866) formalized the integral: the Riemann integral of $f$ on $[a,b]$ is the limit of sums $\sum f(x_i^*)\Delta x_i$ as the partition is refined, provided this limit exists. This gives a precise definition of “area under a curve” and specifies exactly which functions are integrable.

3.2 - Vindication: Non-Standard Analysis (1960s)

Here is a remarkable coda.

Abraham Robinson (1918-1974), a mathematician and logician, proved in the 1960s that infinitesimals can be made rigorous - but it requires tools from mathematical logic that did not exist in the 17th or even 19th century. His non-standard analysis constructs a number system (the hyperreals) that contains genuine infinitely small quantities - numbers larger than zero but smaller than any positive real number.

In this framework, when Leibniz wrote $dx$ and meant “an infinitely small change in $x$,” he was not, in principle, talking nonsense. He was talking about a hyperreal. The problem was not that infinitesimals are incoherent - they are not. The problem was that the 17th century lacked the logical machinery to construct them rigorously.

So Cauchy and Weierstrass were not proving that Leibniz was wrong to think about infinitely small slices. They were providing one rigorous route through the conceptual landscape. Robinson showed there was another route, one that vindicated the original intuition more directly.

The lesson: mathematical intuition is often right, even when it runs ahead of the rigor. The work of the 19th and 20th centuries was not to correct Newton and Leibniz’s instincts - those instincts were largely sound - but to build foundations solid enough to support them.

Part 4 - Why This History Matters for Learning Calculus

Your Confusion Is 2,000 Years Old

Here is something worth knowing: every confusion you will feel learning calculus is a confusion that brilliant people felt before you, sometimes for centuries.

When limits feel arbitrary - that is Fermat’s unease with setting $e = 0$ after dividing by it. The discomfort is appropriate; the procedure is genuinely subtle.

When infinitesimals feel incoherent - that is Bishop Berkeley’s challenge. It took the entire 19th century to resolve. You are in good company.

When $\varepsilon$-$\delta$ definitions feel like bureaucratic overkill - that is the feeling of everyone who first encounters Cauchy’s rigorization of something that seemed to be working fine. The bureaucracy is there because the foundations without it were, as Berkeley correctly noted, a logical mess.

None of these confusions are signs that you are bad at mathematics. They are signs that you are paying attention to the right things.

What the Formulas Actually Mean

When you see:

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

you are looking at what Fermat did in the 1630s and Newton did in 1665, translated into Cauchy’s language. The fraction is Fermat’s secant-line slope. The limit is Cauchy’s rigorous replacement for “set $h = 0$ at the end.” The whole expression is Newton’s fluxion, made precise.

When you see:

$$\int_a^b f(x)dx$$

you are looking at what Archimedes did with polygons around 250 BCE and Leibniz did with infinitely thin strips in 1674, translated into Riemann’s language. The $\int$ is Leibniz’s elongated S for “sum.” The $dx$ is Leibniz’s infinitely thin strip width, made rigorous by Riemann’s partition limit. The whole expression is Archimedes' method of exhaustion, made precise.

When you see the Fundamental Theorem of Calculus:

$$\frac{d}{dx} \int_a^x f(t) dt = f(x)$$

you are seeing Newton’s great discovery: that the tangent problem and the area problem are inverses. This was not obvious. This was not inevitable. This is the insight that unified two millennia of separate struggles into a single subject.

The ε-δ Definitions Are Not the Enemy

There is a temptation, when first encountering $\varepsilon$-$\delta$ proofs, to view them as pedantic gatekeeping - unnecessary formalism that obscures clear ideas. This is a mistake.

The formalism is Cauchy’s answer to Berkeley’s challenge. Without it, calculus is a collection of procedures that happen to give correct answers for reasons that nobody can clearly state. With it, calculus is a subject where every theorem means exactly what it says, where you know exactly when a result applies and when it does not, where the foundations support the entire structure without hidden cracks.

In machine learning, this matters concretely. Standard convergence guarantees for gradient descent assume that the gradient is Lipschitz continuous - an $\varepsilon$-$\delta$ style condition. The theory of generalization in learning rests on measure-theoretic probability, whose central tool is Lebesgue’s integral. If you want to understand why an optimization algorithm converges, or why a neural network generalizes, you need the foundations that the 19th century worked so hard to build.
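A toy case shows what such a condition buys you. The sketch below runs gradient descent on $f(x) = x^2$, whose gradient $2x$ is Lipschitz with constant $L = 2$. The function, the step sizes, and the name `gradient_descent` are all chosen here for illustration; real convergence theorems cover far more general settings than this one-line quadratic.

```python
def gradient_descent(step_size, steps=50, x0=1.0):
    """Minimise f(x) = x**2 by repeatedly stepping against the gradient 2*x.

    The gradient 2*x is Lipschitz with constant L = 2: gradients at two
    points differ by at most 2 times the distance between the points.
    """
    x = x0
    for _ in range(steps):
        x = x - step_size * 2.0 * x
    return x

print(gradient_descent(0.4))  # step below 2/L = 1: iterates shrink toward 0
print(gradient_descent(1.1))  # step above 2/L = 1: iterates blow up
```

For this function each step multiplies $x$ by $1 - 2 \cdot \text{step\_size}$, so the threshold at $\frac{2}{L} = 1$ can be read off directly. The Lipschitz constant is what tells you, in advance, how large a step is safe.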

Part 5 - What Is Coming Next

The next several posts trace the formal development that this historical arc was building toward.

Limits are first - Cauchy’s great achievement, the replacement for infinitesimals. You will see the $\varepsilon$-$\delta$ definition and learn to use it. It will be uncomfortable at first. That is appropriate; it took a century of effort to arrive at.

Derivatives come next - Fermat’s and Newton’s problem made precise. You will see the limit definition of the derivative, understand it geometrically, and develop the rules (chain rule, product rule, etc.) that make it computable.

Integration follows - Archimedes' idea, formalized by Riemann. You will see what a Riemann sum is and understand the definition of the definite integral as a limit.

The Fundamental Theorem of Calculus ties it together - Newton’s unifying discovery that differentiation and integration are inverse operations.

Through all of this: remember where these concepts came from. They are not abstract game-pieces invented by bored professors. They are the distillation of two thousand years of human beings trying to answer questions about motion, area, and change. They are the language the physical world writes itself in. When they feel difficult, remember that the difficulty is real, and that every great mind that encountered these ideas found them difficult, and that the difficulty is worth enduring.

Archimedes, exhausting a circle with polygons in a Sicilian workshop, had no way of knowing that his method would become the foundation of the technology behind every digital image, every climate simulation, every medical scanner in existence. He was just trying to find the area of a circle. That turned out to be enough.
