

Here is a machine. You put a number in. The machine does something to it. Out comes exactly one number.

You put in $3$. Out comes $9$. You put in $-3$. Out comes $9$. You put in $5$. Out comes $25$. You put in $0$. Out comes $0$.

The machine is squaring. For every input, there is exactly one output. There is no input you can give this machine that produces two different answers, or no answer at all. That reliability - one input, always exactly one output - is the entire essence of a function.

Everything else in this post is an elaboration of that sentence.

But do not let the simplicity fool you. That one sentence, made precise, is the foundation on which all of calculus is built. Every major idea in calculus - how fast something changes, what it accumulates over an interval, what value it approaches at a point - is a statement about some function $f$, and none of those ideas can even be stated until you know precisely what kind of object $f$ is.

This post builds that foundation. It will not feel glamorous. It will feel like definitions, like careful distinctions between things that seem almost identical. But every distinction made here will pay off later, in situations where whether an equation has a solution comes down to a structural property of a function.


Section 1: What Makes Something a Function?

Before any definitions, let us examine some examples and ask: is this a function?

Example A. You are given a person’s name. The output is their phone number.

This is NOT a function. One person can have multiple phone numbers (a work number and a cell number). The same input produces multiple outputs. Functions do not allow this.

Example B. You are given a phone number. The output is the person it belongs to.

This also fails to be a function, though the collision now comes from the other direction: multiple people can share a phone number (a family plan). Again, one input, multiple outputs. Forbidden.

Example C. You are given a date (a specific day of a specific year). The output is the high temperature in Delhi on that day.

This IS a function. For any specific day of a specific year, the high temperature was exactly one number. There is no ambiguity. (The temperature might be hard to measure precisely, but conceptually, it was one value.)

Example D. You are given a positive real number $x$. The output is a number $y$ satisfying $y^2 = x$.

This is NOT a function. Every positive $x$ has two square roots: $\sqrt{x}$ and $-\sqrt{x}$. If $x = 9$, the output could be $3$ or $-3$. One input, two possible outputs. Not a function.

But here is something important: you can FIX example D by making a choice. If you declare that the output is always the positive square root, then for every positive $x$ there is exactly one output. Now it is a function. We call it $f(x) = \sqrt{x}$ and we mean the positive root. The choice we made - restricting to positive outputs - is baked into the definition of $f$.

This fixing-by-choosing is not a trick. It is a genuine mathematical action: restricting which outputs are allowed. We will see this again when we study inverse trig functions, where similar choices must be made deliberately.

The key test. Look at the input-output pairs. If any input appears with two different outputs, or some allowed input has no output at all, it is not a function. Those are the only two ways to fail.
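The key test is mechanical enough to run on a finite list of pairs. The sketch below uses a hypothetical helper, `is_function` (our own name, not a library routine), and checks both ways a relation can fail: an input with two different outputs, or an allowed input with no output.

```python
def is_function(pairs, domain):
    """Return True if `pairs` assigns exactly one output to every element of `domain`."""
    outputs = {}
    for x, y in pairs:
        if x not in domain:
            return False  # input outside the declared domain
        if x in outputs and outputs[x] != y:
            return False  # same input, two different outputs
        outputs[x] = y
    # Every allowed input must have received an output.
    return set(outputs) == set(domain)


domain = {0, 3, -3, 5}
squaring = [(x, x * x) for x in domain]
print(is_function(squaring, domain))      # True

# Example D's relation y^2 = x at the input 9: both 3 and -3 are outputs.
square_roots = [(9, 3), (9, -3)]
print(is_function(square_roots, {9}))     # False
```

Choosing only the positive root, `[(9, 3)]`, turns the same check back to `True`, which is exactly the fix described for Example D.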


Section 2: The Formal Definition

We have enough intuition now to absorb a formal definition.

Definition. A function $f$ from a set $A$ to a set $B$, written $f: A \to B$, is a rule that assigns to each element of $A$ exactly one element of $B$.

The set $A$ is called the domain of $f$. The set $B$ is called the codomain of $f$.

For each $x \in A$, the element of $B$ assigned to $x$ is written $f(x)$ (read: “$f$ of $x$”) and called the image of $x$ under $f$.

Let us unpack each piece.

The domain. This is the set of valid inputs. The function $f(x) = \sqrt{x}$ on the real numbers has domain $[0, \infty)$ because you cannot take the square root of a negative real number and get a real output. The function $g(x) = \frac{1}{x}$ has domain $\mathbb{R} \setminus \{0\}$ - all reals except zero - because division by zero is undefined.

The domain is not optional decoration. It is part of the definition of the function. Two functions with the same formula but different domains are different functions.

The codomain. This is the set that outputs are declared to live in. It is an upper bound, a container. The codomain of $f(x) = x^2$ might be declared to be $\mathbb{R}$ even though the outputs are never negative. The codomain is where you say the outputs live; it does not have to be exactly the set of all outputs.

The range (or image). The range of $f$ is the set of elements of $B$ that are actually hit by something in $A$:

$$\text{range}(f) = \{f(x) : x \in A\} = \{b \in B : b = f(x) \text{ for some } x \in A\}.$$

For $f(x) = x^2$ with domain $\mathbb{R}$ and codomain $\mathbb{R}$, the range is $[0, \infty)$ - only nonnegative reals are ever produced.

Discomfort check. Why are codomain and range different things? Why not just say the codomain is the set of actual outputs?

Because sometimes you want to declare where outputs live before you know exactly which outputs appear. When you write $f: \mathbb{R} \to \mathbb{R}$, you are saying both inputs and outputs are real numbers - a type declaration. The codomain is part of the function’s type signature. The range is a computed property of the function. Both matter in different situations. In the context of inverse functions and surjectivity (Section 5), the difference becomes critical.


Section 3: Visualizing Functions

There are two useful ways to visualize a function.

Arrow diagrams. Draw the domain elements on the left and codomain elements on the right. Each element on the left gets exactly one arrow pointing to its image on the right.

Arrow diagram for f(x) = x² from domain A = {1, 2, 3, 4} to codomain B = {1, 4, 9, 16}: 1 → 1, 2 → 4, 3 → 9, 4 → 16. Every input has exactly one arrow.

The function condition is: every element on the left has exactly one arrow coming out of it. Not zero arrows (every input must have an output). Not two arrows (the output must be unique).

Graphs. For functions $f: \mathbb{R} \to \mathbb{R}$, draw the set of points $\{(x, f(x)) : x \in \mathbb{R}\}$ in the plane.

The vertical line test follows immediately from the definition: a curve in the plane is the graph of a function if and only if every vertical line $x = c$ intersects the curve at most once. A vertical line $x = c$ represents the input $c$; if it crosses the curve twice, there are two outputs for that input, violating the function property.

f(x) = x² passes the vertical line test: every vertical line meets the curve exactly once.

A unit circle fails the vertical line test: the line x = 0 meets it at two points, (0,1) and (0,−1). So a circle is not the graph of a function.


Section 4: Injective Functions (One-to-One)

Now we get to the first important structural property a function can have.

Definition. A function $f: A \to B$ is injective (or one-to-one) if distinct inputs produce distinct outputs:

$$x_1 \neq x_2 \implies f(x_1) \neq f(x_2).$$

Equivalently (the contrapositive, often more useful in proofs): $f(x_1) = f(x_2) \implies x_1 = x_2$.

An injective function never collides. Different inputs always produce different outputs. You can always trace back from an output to its unique input.

Examples.

  • $f(x) = 2x$ on $\mathbb{R}$ is injective. If $2x_1 = 2x_2$, then $x_1 = x_2$. Different inputs produce different outputs. The doubling machine is collision-free.

  • $f(x) = x^2$ on $\mathbb{R}$ is NOT injective. The inputs $3$ and $-3$ both produce output $9$. Collision.

  • $f(x) = x^2$ on $[0, \infty)$ IS injective. On the nonnegative reals, no two different inputs produce the same square.

This illustrates a critical point: injectivity depends on the domain. By restricting the domain, you can make a non-injective function injective. This is not a trick - it is a fundamental technique. Inverse trig functions are defined precisely by this restriction.
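On a finite sample of inputs, collisions can be hunted for directly. In this sketch `is_injective` is a hypothetical helper, and small lists of integers stand in for $\mathbb{R}$ and $[0, \infty)$. A finite check can expose a collision, but it cannot by itself prove injectivity on an infinite domain.

```python
def is_injective(f, domain):
    """Return True if no two distinct inputs in `domain` share an output under f."""
    seen = {}  # output -> the input that first produced it
    for x in domain:
        y = f(x)
        if y in seen and seen[y] != x:
            return False  # collision: two different inputs, same output
        seen[y] = x
    return True


def square(x):
    return x * x


def double(x):
    return 2 * x


print(is_injective(square, [-3, -2, -1, 0, 1, 2, 3]))  # False: 3 and -3 both give 9
print(is_injective(square, [0, 1, 2, 3]))              # True once negatives are excluded
print(is_injective(double, [-3, -2, -1, 0, 1, 2, 3]))  # True
```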

The horizontal line test. For $f: \mathbb{R} \to \mathbb{R}$, $f$ is injective if and only if every horizontal line $y = c$ intersects the graph at most once. If a horizontal line hits the graph twice, those two intersection points are two inputs with the same output - a collision.

f(x) = x² on all of ℝ: NOT injective. The horizontal line y = 9 meets the curve at x = −3 and x = 3 - two different inputs, same output.

f(x) = x² on [0, ∞): injective. The same horizontal line y = 9 now meets the curve only at x = 3. Restricting the domain eliminates the collision.


Section 5: Surjective Functions (Onto)

Definition. A function $f: A \to B$ is surjective (or onto) if every element of the codomain is hit by at least one element of the domain:

$$\forall b \in B,\; \exists a \in A \text{ such that } f(a) = b.$$

A surjective function exhausts its codomain. No element of $B$ is left unreached. The range equals the codomain.

Examples.

  • $f(x) = x^2$ with domain $\mathbb{R}$ and codomain $\mathbb{R}$ is NOT surjective. The element $-1$ in the codomain is never produced - you cannot square a real number and get $-1$.

  • $f(x) = x^2$ with domain $\mathbb{R}$ and codomain $[0, \infty)$ IS surjective. Every nonnegative real $b$ is the square of something (namely $\sqrt{b}$). The range now equals the codomain.

  • $f(x) = 2x + 1$ with domain and codomain $\mathbb{R}$ is surjective. Given any $b \in \mathbb{R}$, the input $x = (b-1)/2$ gives $f(x) = b$.

This illustrates the other critical point: surjectivity depends on the codomain. You can make a non-surjective function surjective by shrinking the codomain to match the range. This is why the distinction between range and codomain matters.
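The same finite-sample idea makes the role of the codomain concrete: the range is computed from the function, while the codomain is supplied separately. `image` and `is_surjective` are hypothetical helpers, and the sketch assumes every value $f(x)$ already lies in the declared codomain, so the test reduces to checking that the codomain is contained in the range.

```python
def image(f, domain):
    """The range of f: the set {f(x) : x in domain}."""
    return {f(x) for x in domain}


def is_surjective(f, domain, codomain):
    """True if every element of `codomain` equals f(x) for some x in `domain`.

    Assumes every f(x) already lies in `codomain`, so this amounts to range == codomain.
    """
    return set(codomain) <= image(f, domain)


def square(x):
    return x * x


domain = {-2, -1, 0, 1, 2}
big_codomain = {-4, -3, -2, -1, 0, 1, 2, 3, 4}
small_codomain = {0, 1, 4}

print(image(square, domain))                          # {0, 1, 4}
print(is_surjective(square, domain, big_codomain))    # False: -1 is never hit
print(is_surjective(square, domain, small_codomain))  # True: codomain shrunk to the range
```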

Discomfort check. You might be wondering: if you can always make a function surjective just by shrinking the codomain, why bother with the concept? The answer is that in mathematics you often specify the codomain before you know the range. A function $f: \mathbb{R}^n \to \mathbb{R}^m$ is declared to map $\mathbb{R}^n$ into $\mathbb{R}^m$ - that is the type contract. Asking whether it is surjective is asking whether it fills all of $\mathbb{R}^m$ or only part of it. In linear algebra, this becomes the question of whether a matrix transformation’s image equals the whole target space. In systems of equations, it becomes the question of whether a solution exists for every right-hand side.


Section 6: Bijective Functions and Why They Matter Most

Definition. A function is bijective (or a bijection or a one-to-one correspondence) if it is both injective and surjective.

A bijection is a perfect pairing. Every element of $A$ is matched with exactly one element of $B$, and every element of $B$ is matched with exactly one element of $A$. No element on either side is left over or paired with more than one partner.

This might sound like a technicality. It is not. Bijections are the reason we can say two sets have the same size.

If you have a room full of chairs and people, you do not need to count either to know whether there are the same number. Just ask each person to sit down. If every chair is occupied and every person is seated, there is a bijection between people and chairs, and the sets are the same size. If some chairs are empty, more chairs than people. If someone is standing, more people than chairs.

Cantor used exactly this idea to define what it means for infinite sets to have the same cardinality. The natural numbers $\mathbb{N}$ and the integers $\mathbb{Z}$ have the same cardinality because there is a bijection between them:

$$0 \leftrightarrow 0,\quad 1 \leftrightarrow 1,\quad 2 \leftrightarrow -1,\quad 3 \leftrightarrow 2,\quad 4 \leftrightarrow -2,\; \ldots$$

Every integer is paired with exactly one natural number, and vice versa. This is genuinely surprising: $\mathbb{Z}$ looks bigger than $\mathbb{N}$ (it has negative numbers) but they are the same infinite size.
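The pairing follows a simple pattern: odd naturals go to the positive integers, even naturals to zero and the negatives. A minimal sketch, with hypothetical function names, writes the pairing and its reverse and checks that each undoes the other on an initial stretch of values. A finite check illustrates the pattern; it is not a proof.

```python
def nat_to_int(n):
    """Pair n = 0, 1, 2, 3, 4, ... with 0, 1, -1, 2, -2, ..."""
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)


def int_to_nat(z):
    """The reverse pairing: positive z -> odd naturals, zero and negatives -> even naturals."""
    return 2 * z - 1 if z > 0 else -2 * z


print([nat_to_int(n) for n in range(7)])  # [0, 1, -1, 2, -2, 3, -3]

# Each map undoes the other on every value checked.
print(all(int_to_nat(nat_to_int(n)) == n for n in range(10_000)))
print(all(nat_to_int(int_to_nat(z)) == z for z in range(-5_000, 5_000)))
```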

The real numbers $\mathbb{R}$, on the other hand, are strictly larger than $\mathbb{N}$. No bijection can exist. Cantor proved this with his diagonal argument. The concept of different “sizes” of infinity - which feels absurd until you think carefully about what bijection means - rests entirely on this function-theoretic foundation.

For us, the immediate payoff of bijections is inverses.


Section 7: Inverse Functions

If $f: A \to B$ is a bijection, then for every $b \in B$, there is exactly one $a \in A$ with $f(a) = b$. This lets us define a new function $f^{-1}: B \to A$ by:

$$f^{-1}(b) = \text{the unique } a \in A \text{ such that } f(a) = b.$$

This is the inverse function of $f$. It runs the machine in reverse.

Why bijection is required. Consider what goes wrong otherwise.

  • If $f$ is not injective: two inputs $a_1 \neq a_2$ map to the same output $b$. When we try to define $f^{-1}(b)$, we have two choices. $f^{-1}$ is not a function.

  • If $f$ is not surjective: some $b \in B$ is not hit by anything. $f^{-1}(b)$ has no value. $f^{-1}$ is not defined on all of $B$.

So both conditions are necessary. Bijection is not an extra requirement - it is exactly the right requirement for inverses to exist.

Verification. If $f^{-1}$ is the inverse of $f$, then:

$$f^{-1}(f(a)) = a \quad \text{for all } a \in A,$$ $$f(f^{-1}(b)) = b \quad \text{for all } b \in B.$$

These two equations say: composing $f$ and $f^{-1}$ in either order gives the identity. The identity function on a set $S$ is the function $\text{id}_S(x) = x$ that leaves every element unchanged.
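For a finite function stored as a dictionary, building $f^{-1}$ is literally swapping each pair, and the two failure modes above become the two error checks. `inverse` is a hypothetical helper, and the sketch assumes every value of `f` lies in `codomain`. The last two lines test the round-trip equations $f^{-1}(f(a)) = a$ and $f(f^{-1}(b)) = b$.

```python
def inverse(f, codomain):
    """Return f^{-1} as a dict, given f as a dict {a: f(a)}.

    Assumes every value of f lies in `codomain`. Raises ValueError when f is
    not injective (two inputs share an output) or not surjective (some element
    of the codomain is never hit).
    """
    inv = {}
    for a, b in f.items():
        if b in inv:
            raise ValueError(f"not injective: {inv[b]} and {a} both map to {b}")
        inv[b] = a
    missing = set(codomain) - set(inv)
    if missing:
        raise ValueError(f"not surjective: nothing maps to {sorted(missing)}")
    return inv


f = {1: 3, 2: 5, 3: 7}          # f(x) = 2x + 1 on {1, 2, 3}
f_inv = inverse(f, {3, 5, 7})
print(f_inv)                                    # {3: 1, 5: 2, 7: 3}
print(all(f_inv[f[a]] == a for a in f))         # f^{-1}(f(a)) = a
print(all(f[f_inv[b]] == b for b in f_inv))     # f(f^{-1}(b)) = b
```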

Examples.

$f(x) = 2x + 1$ on $\mathbb{R}$ is a bijection. Its inverse: solve $y = 2x + 1$ for $x$, giving $x = (y-1)/2$. So $f^{-1}(y) = (y-1)/2$.

$f(x) = e^x$ on $\mathbb{R}$ has range $(0, \infty)$, all the positive reals. It is injective (the exponential is strictly increasing). With its codomain taken to be this range, it is a bijection $\mathbb{R} \to (0, \infty)$. Its inverse is the natural logarithm: $f^{-1}(y) = \ln y$, defined on $(0, \infty)$.

$f(x) = \sin x$ on $\mathbb{R}$ is NOT injective (it repeats every $2\pi$) and NOT surjective onto $\mathbb{R}$ (outputs are between $-1$ and $1$). To define $\arcsin$, we restrict the domain to $[-\pi/2, \pi/2]$, on which $\sin$ is bijective onto $[-1, 1]$. Then $\arcsin: [-1,1] \to [-\pi/2, \pi/2]$ is well-defined.
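Python's standard `math` module follows these conventions, which makes the restrictions easy to see. `math.log` undoes `math.exp` on all of $\mathbb{R}$ (up to floating-point rounding), while `math.asin` always returns a value in $[-\pi/2, \pi/2]$, so it recovers $x$ from $\sin x$ only when $x$ lies in that interval.

```python
import math

# exp and log are inverse bijections R <-> (0, inf): each round trip returns the input.
for x in [-2.0, 0.0, 1.5]:
    print(x, math.log(math.exp(x)))

# math.asin returns values in [-pi/2, pi/2], the restricted domain chosen for sin.
print(math.asin(math.sin(1.0)))  # about 1.0, since 1.0 lies in [-pi/2, pi/2]
print(math.asin(math.sin(3.0)))  # about 0.1416 (= pi - 3), not 3.0
```

The second result is the unique point of $[-\pi/2, \pi/2]$ with the same sine as $3$; the inverse has no way to know which of the infinitely many preimages you started from.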

e^x (blue) and ln(x) (orange) are inverses: each is the reflection of the other across y = x (gray). Every point (a, b) on one curve appears as (b, a) on the other.

The choices we make when restricting domains to build inverses are not random. They are fixed by convention to produce the most useful inverse. The interval $[-\pi/2, \pi/2]$ contains $0$, and on it $\sin$ is increasing and takes every value in $[-1, 1]$ exactly once, so $\arcsin$ comes out continuous and increasing with $\arcsin(0) = 0$. Any other restricted interval would give a different function. Understanding which choice was made - and why - is part of understanding the function.

Discomfort check. Notation trap: $f^{-1}$ means inverse function. It does NOT mean $1/f(x)$. These are completely different things. The inverse of $f(x) = 2x$ is $f^{-1}(x) = x/2$ (running the machine backwards). The reciprocal is $(f(x))^{-1} = 1/(2x)$ (taking the reciprocal of the output). These agree only by accident if at all. For trigonometric functions: $\sin^{-1}(x) = \arcsin(x)$ is the inverse function. It is not $1/\sin(x)$ (which is $\csc(x)$). Context always determines which meaning applies, but in calculus, $f^{-1}$ almost always means inverse function.


Section 8: Composition of Functions

You have two machines. The output of the first machine feeds directly into the second machine as input. This is composition.

Definition. Given $f: A \to B$ and $g: B \to C$, the composition $g \circ f: A \to C$ is defined by:

$$(g \circ f)(x) = g(f(x)).$$

You apply $f$ first, then $g$.

Composition diagram: f sends x in A to f(x) in B, then g sends f(x) to g(f(x)) in C; the single arrow from A straight to C is g ∘ f.

The order matters: $g \circ f$ means “first $f$, then $g$.” Although the symbols are written left to right, the rightmost function is applied first. This is a common source of confusion.

Example. Let $f(x) = x^2$ and $g(x) = \sin x$.

$(g \circ f)(x) = g(f(x)) = g(x^2) = \sin(x^2).$ $(f \circ g)(x) = f(g(x)) = f(\sin x) = (\sin x)^2 = \sin^2 x.$

These are completely different functions. $\sin(x^2) \neq \sin^2(x)$. Composition is generally not commutative.

Composition and inverses. If $f: A \to B$ is a bijection with inverse $f^{-1}: B \to A$, then:

$$f^{-1} \circ f = \text{id}_A, \quad f \circ f^{-1} = \text{id}_B.$$

This is not just a formula. It is the definition of what inverse means: composing $f$ and its inverse (in either order) does nothing; it leaves every element in place.

Associativity. Composition is associative: $(h \circ g) \circ f = h \circ (g \circ f)$. You can check this: both sides compute $h(g(f(x)))$. This means you can drop parentheses and write $h \circ g \circ f$. The order of the functions still matters ($f$ is applied first, then $g$, then $h$), but how you group them does not.
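A short sketch of composition as an operation on functions; `compose` is a hypothetical helper, not a standard library function. It shows $\sin(x^2)$ and $\sin^2 x$ giving different values at the same input, and both groupings of a triple composition giving the same value.

```python
import math


def compose(g, f):
    """Return g ∘ f, the function x -> g(f(x)): apply f first, then g."""
    return lambda x: g(f(x))


def square(x):
    return x * x


def add_one(t):
    return t + 1


sin_of_square = compose(math.sin, square)   # x -> sin(x^2)
square_of_sin = compose(square, math.sin)   # x -> (sin x)^2

x = 2.0
print(sin_of_square(x))  # sin(4), about -0.7568
print(square_of_sin(x))  # sin(2)^2, about 0.8268

# Associativity: both groupings compute add_one(sin(square(x))).
left = compose(compose(add_one, math.sin), square)
right = compose(add_one, compose(math.sin, square))
print(left(x) == right(x))  # True
```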


Section 9: Five Function Families

These are the major function families that come up everywhere in calculus and analysis.

Polynomials

A polynomial of degree $n$ is:

$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad a_n \neq 0.$$

Domain: all of $\mathbb{R}$. No gaps, no jumps, no sharp corners - polynomials are the most well-behaved functions there are.

The degree controls the behavior: for large $|x|$, the leading term dominates, so $p(x) \approx a_n x^n$ in the sense that $p(x)/(a_n x^n) \to 1$ as $|x| \to \infty$. A degree-5 polynomial has at most 5 real roots (counted with multiplicity), so its graph meets the $x$-axis at most 5 times. A degree-2 polynomial (parabola) has at most 2 real roots.
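The leading-term behavior can be watched numerically. The polynomial below is an arbitrary example chosen for illustration: the ratio $p(x)/(a_n x^n)$ is far from 1 near the origin but drifts toward 1 as $|x|$ grows.

```python
def p(x):
    """A sample degree-3 polynomial: 2x^3 - 5x^2 + 7x - 100 (leading term 2x^3)."""
    return 2 * x**3 - 5 * x**2 + 7 * x - 100


for x in [1, 10, 100, 1000, -1000]:
    ratio = p(x) / (2 * x**3)  # compare p with its leading term a_n x^n
    print(x, round(ratio, 4))
```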

Why they matter: polynomials are the simplest functions to work with, and many complicated functions can be closely approximated by polynomials near any given point.

Polynomials of degrees 1 through 4 near the origin. Low degree: gentle curves. Higher degree: sharper turns, more crossings possible.

Exponential Functions

$$f(x) = a^x, \quad a > 0, \; a \neq 1.$$

Domain: all of $\mathbb{R}$. Range: $(0, \infty)$. The function is always positive.

For $a > 1$: increasing. The larger $a$ is, the faster it grows. For $0 < a < 1$: decreasing.

The most important base is $e \approx 2.71828$, which comes up naturally across calculus and analysis.

Why exponentials grow so shockingly fast. A linear function $f(x) = 100x$ grows by adding $100$ for each unit increase in $x$. An exponential $f(x) = 2^x$ grows by doubling for each unit increase. By $x = 10$, the linear function is $1000$. The exponential is $1024$. By $x = 100$: linear gives $10000$, exponential gives $2^{100} \approx 10^{30}$. Exponential growth eventually dominates any polynomial, no matter how high the degree.
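The arithmetic above is easy to reproduce with exact integers. The sketch also includes a hypothetical helper, `first_overtake`, that searches for the point where $2^n$ passes a high-degree polynomial such as $n^{10}$. For that degree the search stops at $n = 59$, and from there on the exponential stays ahead.

```python
# Exact integer arithmetic: 100x adds 100 per step, 2**x doubles per step.
for x in [10, 100]:
    print(x, 100 * x, 2 ** x)


def first_overtake(degree, start=2):
    """Hypothetical helper: smallest integer n >= start with 2**n > n**degree."""
    n = start
    while 2 ** n <= n ** degree:
        n += 1
    return n


print(first_overtake(10))  # 59
```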

e^x (blue) vs x² (orange). The parabola is above the exponential only for negative x below about −0.70; for every x ≥ 0 the exponential is already larger, and the gap widens without bound.

Logarithmic Functions

The logarithm $\log_a$ is the inverse of $a^x$:

$$y = \log_a x \iff a^y = x.$$

Domain: $(0, \infty)$. Range: all of $\mathbb{R}$. The natural logarithm $\ln x = \log_e x$ is the inverse of $e^x$.

Key identities, valid for $x, y > 0$ (all follow from the inverse relationship with the exponential):

$$\ln(xy) = \ln x + \ln y, \quad \ln(x/y) = \ln x - \ln y, \quad \ln(x^r) = r\ln x.$$

These identities turn multiplication into addition - a profound simplification that made logarithms invaluable for computation before calculators. Log tables were exactly this in practice: printed books listing $\log_{10}(x)$ for thousands of values of $x$. To multiply two large numbers, you looked up their logs, added them, then looked up the antilog to recover the product. John Napier published the first tables of logarithms in 1614 (Henry Briggs soon recast them in base 10), and such tables remained standard tools for astronomers, navigators, and engineers for over 350 years.

Log table in action. Say you want to compute $23.4 \times 47.8$ by hand - no calculator.

Step 1. Look up the logs in the table: $$\log_{10}(23.4) \approx 1.3692 \qquad \log_{10}(47.8) \approx 1.6794$$

Step 2. Add them: $$1.3692 + 1.6794 = 3.0486$$

Step 3. Look up the antilog - that is, find $x$ such that $\log_{10}(x) = 3.0486$: $$x = 10^{3.0486} \approx 1118.4$$

The actual answer is $23.4 \times 47.8 = 1118.52$. One addition replaced one multiplication, and the error is less than 0.02%. For a navigator computing a ship’s position in 1750, this was the difference between an hour of arithmetic and a few minutes with a book.
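The three steps can be replayed in a few lines of Python, with `round(..., 4)` standing in for a four-place printed table (an assumption about the table's precision).

```python
import math

a, b = 23.4, 47.8

# Step 1: "look up" each log, rounded to 4 decimals like a printed table.
log_a = round(math.log10(a), 4)   # 1.3692
log_b = round(math.log10(b), 4)   # 1.6794

# Step 2: add the logs.
total = log_a + log_b             # 3.0486, up to float rounding

# Step 3: take the antilog and compare with the true product.
estimate = 10 ** total
exact = a * b
rel_error = abs(estimate - exact) / exact
print(round(estimate, 1), round(exact, 2), f"{rel_error:.4%}")
```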

Why logarithms grow so slowly. While $e^x$ doubles with every increase of $\ln 2 \approx 0.693$ in $x$, the logarithm increases by only $\ln 2$ each time $x$ doubles. The function $\ln x$ grows without bound (it has no horizontal asymptote) but does so glacially: $\ln(10^{100}) \approx 230$, and a billion is only about 21 in natural log. This is why, in complexity theory, $O(\log n)$ algorithms are considered extremely fast: the work grows by only a constant amount each time the input size doubles.
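The figures above are easy to check, and the loop shows the doubling behavior directly: $\ln(2x) - \ln x$ comes out as $\ln 2$ no matter how large $x$ is.

```python
import math

print(math.log(10 ** 100))   # about 230.26
print(math.log(1e9))         # about 20.72

# Doubling the input adds ln 2 to the output, whatever the input is.
for x in [1.0, 1e3, 1e9]:
    print(x, math.log(2 * x) - math.log(x))  # each about 0.6931
```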

Trigonometric Functions

The sine and cosine are defined via the unit circle: for a point at angle $\theta$ from the positive $x$-axis on the unit circle (radius 1, center at origin), $\cos\theta$ is the $x$-coordinate and $\sin\theta$ is the $y$-coordinate.

$$\sin^2\theta + \cos^2\theta = 1. \quad \text{(Pythagorean identity)}$$

Domain of $\sin$ and $\cos$: all of $\mathbb{R}$. Range: $[-1, 1]$. Period $2\pi$: they repeat every $2\pi$.

The tangent: $\tan\theta = \sin\theta / \cos\theta$, undefined when $\cos\theta = 0$ (at $\theta = \pm\pi/2, \pm 3\pi/2, \ldots$).

Why they matter everywhere: any periodic phenomenon (sound, light, electromagnetic waves, alternating current, oscillations in springs and circuits) is modeled by sines and cosines. Fourier’s theorem says that essentially any reasonably well-behaved periodic function can be written as a sum of sines and cosines. Trigonometric functions are not a narrow topic - they are the language of everything that repeats.

sin(x) over two full periods. The vertical markers show x = −2π, 0, and 2π - the function repeats identically on each interval of length 2π.

Wait - polynomials AND sines and cosines can both represent any function? Are there more of these?

Yes, and this is one of the most surprising structural facts in mathematics. Several completely different families of functions each have the property that “any reasonable function” can be built from them:

  • Polynomials (Taylor/Weierstrass): any sufficiently nice function can be approximated arbitrarily closely by polynomials near a point, and any continuous function on a closed interval can be uniformly approximated by polynomials everywhere on it.
  • Sines and cosines (Fourier): any reasonably well-behaved periodic function can be written as a sum of sines and cosines of different frequencies.
  • Wavelets: any reasonable function can be decomposed into localized oscillatory “bumps” at different scales - the basis behind JPEG 2000 image compression.
  • Exponentials (Laplace transform): functions can be represented as combinations of $e^{st}$ for various $s$, which turns problems about change over time into algebra.

These are not contradictory - they are different choices of “basis” for the same space of functions. The key insight is that functions themselves form a kind of vector space: you can add two functions, or scale one by a constant. Just as any vector in $\mathbb{R}^3$ can be expressed in many different coordinate systems, any function can be expressed in many different “bases” of simpler building blocks.

Why do these particular families work? Each one is, in the appropriate sense, a complete set - no function is left unreachable, and the building blocks do not overlap redundantly. Polynomials are complete because they can approximate any shape by adding enough terms. Sines and cosines are complete because any periodic shape is made of oscillations at different frequencies. The choice of which basis to use depends entirely on the problem: Fourier is natural for anything involving periodicity or frequency; polynomials are natural for local approximation; wavelets are natural for signals that change character over time.

This is why these families keep reappearing across all of mathematics and its applications - they are not just convenient tools, they are genuinely fundamental building blocks of function space.

Rational Functions

A rational function is the ratio of two polynomials:

$$f(x) = \frac{p(x)}{q(x)}, \quad q(x) \neq 0.$$

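Domain: all $x$ where $q(x) \neq 0$. At a zero $a$ of $q$, cancel every common factor of $(x - a)$ from numerator and denominator. If a factor of $(x - a)$ still remains in the denominator, the function has a vertical asymptote at $a$; if the denominator’s factors of $(x - a)$ cancel completely, it has a removable discontinuity (a hole) at $a$.

These are important because when computing limits (and later, derivatives), many of the indeterminate forms $\frac{0}{0}$ come from rational functions where numerator and denominator both vanish at the same point $a$. Both then contain the factor $(x - a)$, and cancelling it resolves the $\frac{0}{0}$ form: the limit exists (as a finite number) exactly when, after all common factors are cancelled, the denominator no longer vanishes at $a$.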
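Two small rational functions make the hole-versus-asymptote distinction visible; both are illustrative examples. In $(x^2 - 1)/(x - 1)$ the factor $(x - 1)$ cancels completely, so values near $x = 1$ settle toward $2$. In $x/x^2$ one factor of $x$ survives in the denominator, so values near $0$ blow up. Calling either function exactly at the excluded point raises `ZeroDivisionError`, Python's way of saying that point is not in the domain.

```python
def hole(x):
    """(x^2 - 1)/(x - 1): the common factor (x - 1) cancels, leaving x + 1."""
    return (x**2 - 1) / (x - 1)


def pole(x):
    """x / x^2: after cancelling one factor of x, an x remains in the denominator."""
    return x / x**2


for h in [1e-1, 1e-3, 1e-6]:
    print(h, hole(1 + h), pole(h))  # hole -> 2, pole grows like 1/h
```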


Section 10: Functions of Multiple Variables

So far every function takes one real number as input and produces one real number. This is $f: \mathbb{R} \to \mathbb{R}$.

But the world is multidimensional. The temperature in a room depends on position $(x, y, z)$ and time $t$ - four inputs, one output. The trajectory of a particle gives position $(x(t), y(t), z(t))$ from one input $t$ - one input, three outputs. A neural network takes millions of parameters as input and produces a scalar loss.

The function concept extends naturally:

Scalar field. $f: \mathbb{R}^n \to \mathbb{R}$. Multiple inputs, one output. Examples: loss function in machine learning $L: \mathbb{R}^p \to \mathbb{R}$; temperature distribution $T: \mathbb{R}^3 \to \mathbb{R}$.

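Vector-valued function. $f: \mathbb{R}^n \to \mathbb{R}^m$. Multiple inputs, multiple outputs. Example: a neural network layer $W: \mathbb{R}^n \to \mathbb{R}^m$. When $m = n$ and each output is pictured as an arrow attached to its input point, $f$ is called a vector field, such as the velocity at each point in a fluid, $v: \mathbb{R}^3 \to \mathbb{R}^3$.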

Parametric curve. $\gamma: \mathbb{R} \to \mathbb{R}^n$. One input (a parameter, often time), multiple outputs (coordinates). $\gamma(t) = (\cos t, \sin t)$ traces the unit circle.
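The three shapes of function can be written as ordinary Python functions whose numbers of arguments and returned values match the signatures above. The temperature formula and the rotation field are made-up examples for illustration only.

```python
import math


def temperature(x, y, z):
    """Scalar field R^3 -> R (a made-up formula): three inputs, one output."""
    return 20.0 - 0.5 * z + 0.1 * x * y


def rotate(x, y):
    """Vector field R^2 -> R^2: attaches to each point its 90-degree rotation."""
    return (-y, x)


def gamma(t):
    """Parametric curve R -> R^2 tracing the unit circle."""
    return (math.cos(t), math.sin(t))


for t in [0.0, 1.0, 2.5]:
    px, py = gamma(t)
    print(t, px**2 + py**2)  # about 1 each time: the point lies on the unit circle
```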

All the concepts we developed - domain, codomain, injectivity, surjectivity, bijectivity, composition - extend to these general settings without change. The definitions care about inputs and outputs, not how many dimensions they live in.

This generalization is not just notation. When we study multivariate calculus, the derivative of $f: \mathbb{R}^n \to \mathbb{R}^m$ becomes a matrix (the Jacobian) rather than a number. But the intuition - the derivative measures how outputs change when inputs change - is exactly the same. Understanding it well for $f: \mathbb{R} \to \mathbb{R}$ first makes the general case much easier to absorb.


Section 11: Why This Foundation Matters

Here is something students often feel: this chapter is just definitions. It is all terminology. Nothing is being computed. Nothing interesting is happening.

That feeling is understandable. And it is a signal that you have not yet seen what breaks without these definitions.

Limits. $\lim_{x \to a} f(x)$ asks: what does $f$ do near $a$? If you do not know precisely what $f$ is - its domain, its rule - you cannot answer this question. The domain also tells you what you are allowed to use: for $f(x) = x/x$ on $\mathbb{R} \setminus \{0\}$, the value $f(0)$ does not exist, yet $\lim_{x \to 0} f(x) = 1$, because $f(x) = 1$ at every point of the domain near $0$.

Derivatives. The derivative is defined as:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$$

This definition requires $f$ to be a function, requires $x$ to be in the domain of $f$, and requires $x + h$ to be in the domain for small $h$. The derivative of a function at a point of its domain is the limit of a ratio of function values. Without a precise notion of function, this formula is undefined.
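The difference quotient can be evaluated directly. This sketch uses $f(x) = x^2$ at $x = 3$, where the quotient simplifies to $6 + h$ and so approaches $f'(3) = 6$; `difference_quotient` is a hypothetical helper. The last lines show the domain requirement: for $\sqrt{x}$ at $x = 0$, a negative $h$ pushes $x + h$ outside the domain, and `math.sqrt` raises `ValueError`. In floating point, making $h$ extremely small eventually adds rounding error, so this is an illustration rather than a way to compute exact derivatives.

```python
import math


def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h: the ratio whose limit as h -> 0 is f'(x)."""
    return (f(x + h) - f(x)) / h


def square(t):
    return t * t


for h in [1e-1, 1e-3, 1e-5]:
    # Algebraically this equals 6 + h, so it approaches f'(3) = 6.
    print(h, difference_quotient(square, 3.0, h))

# x + h must stay in the domain: sqrt is undefined at -0.001.
try:
    difference_quotient(math.sqrt, 0.0, -1e-3)
except ValueError as err:
    print("x + h left the domain of sqrt:", err)
```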

Inverse functions and solving equations. If you want to solve $e^x = 5$, you apply $\ln$ to both sides: $x = \ln 5$. This works because $\ln$ is the inverse function of $e^x$. Knowing that the inverse exists and is well-defined requires knowing that $e^x$ is bijective onto $(0, \infty)$.

Composition and the chain rule. The chain rule says: the derivative of $g \circ f$ at $x$ is $g'(f(x)) \cdot f'(x)$. This requires understanding what $g \circ f$ means and being able to identify when a complicated function is a composition of simpler ones.
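A numerical spot check of the chain rule, assuming a forward difference with a small fixed $h$ is an adequate stand-in for the limit. The composite is $g \circ f$ with $f(t) = t^2$ and $g = \sin$, so the chain rule predicts $\cos(x^2) \cdot 2x$; `numeric_derivative` is a hypothetical helper.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    """Forward-difference estimate of f'(x); an approximation, not the exact limit."""
    return (f(x + h) - f(x)) / h


def composite(t):
    """g(f(t)) with f(t) = t^2 and g = sin."""
    return math.sin(t * t)


x = 1.3
estimate = numeric_derivative(composite, x)
chain_rule = math.cos(x * x) * (2 * x)   # g'(f(x)) * f'(x)
print(estimate, chain_rule)              # agree to about five decimal places
```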

Every major theorem of calculus presupposes a well-defined function. Every technique of calculus - u-substitution, integration by parts, implicit differentiation - is a manipulation of functions according to the rules we are laying down now. The definitions in this post are not just formalities. They are the grammar of the language you are about to speak.

Discomfort check. It is normal to feel like you do not truly understand something until you have used it. The concepts of this post - especially injectivity, surjectivity, and bijectivity - may feel abstract right now. That is fine. The intuitions are: injective means no collisions, surjective means nothing in the codomain is missed, bijective means a perfect pairing that allows inversion. These become concrete as they appear in specific theorems and problems.


Section 12: The Rigorous Underpinning

For those who want the formal set-theoretic foundation on which all of this rests.

Functions as Sets of Pairs

In set theory, a function $f: A \to B$ is defined as a set of ordered pairs:

$$f \subseteq A \times B$$

with the property that for every $a \in A$, there is exactly one $b \in B$ such that $(a, b) \in f$.

This reduces the notion of function to purely set-theoretic terms. “Exactly one $b$” encodes the single-valued property. “$f \subseteq A \times B$” means every pair has its first component in $A$ and second in $B$.

Under this definition, the function $f(x) = x^2$ on $\{1, 2, 3\}$ is the set $\{(1,1), (2,4), (3,9)\}$. The “rule” is just the list of pairs.

This definition is necessary for mathematics to be fully rigorous. Concepts like “rule” and “machine” are informal; set-theoretic pairs are not.

Formal Definition of Composition

Given $f: A \to B$ and $g: B \to C$, the composition $g \circ f$ is the function:

$$g \circ f = \{(a, c) \in A \times C : \exists b \in B, (a,b) \in f \text{ and } (b,c) \in g\}.$$

Verify: for each $a \in A$, there is a unique $b$ with $(a,b) \in f$ (since $f$ is a function), and a unique $c$ with $(b,c) \in g$ (since $g$ is a function), so there is a unique $(a,c)$ in the composition. The composition is a function.
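When the functions are finite, the set-of-pairs formula can be executed as written. `compose_relations` is a hypothetical helper: it collects every $(a, c)$ for which some middle element $b$ links a pair of $f$ to a pair of $g$.

```python
def compose_relations(g, f):
    """g ∘ f = {(a, c) : there is b with (a, b) in f and (b, c) in g}."""
    return {(a, c) for (a, b1) in f for (b2, c) in g if b1 == b2}


f = {(1, 1), (2, 4), (3, 9)}      # x -> x^2 on {1, 2, 3}
g = {(1, 2), (4, 5), (9, 10)}     # y -> y + 1 on {1, 4, 9}
print(sorted(compose_relations(g, f)))  # [(1, 2), (2, 5), (3, 10)]
```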

Injectivity and Surjectivity, Precisely

$f: A \to B$ is injective if and only if: for all $a_1, a_2 \in A$, $f(a_1) = f(a_2) \implies a_1 = a_2$.

$f: A \to B$ is surjective if and only if: for all $b \in B$, there exists $a \in A$ with $f(a) = b$.

Theorem. $f: A \to B$ has an inverse function $f^{-1}: B \to A$ if and only if $f$ is bijective.

Proof sketch. ($\Rightarrow$) If $f^{-1}$ exists: if $f(a_1) = f(a_2) = b$, apply $f^{-1}$ to both sides: $f^{-1}(b) = a_1$ and $f^{-1}(b) = a_2$, so $a_1 = a_2$ (injective). For any $b \in B$, set $a = f^{-1}(b)$; then $f(a) = f(f^{-1}(b)) = b$ (surjective).

($\Leftarrow$) If $f$ is bijective: for each $b \in B$, surjectivity gives at least one $a$ with $f(a) = b$; injectivity gives at most one. So exactly one such $a$ exists. Define $f^{-1}(b) = a$. This is a well-defined function. $\blacksquare$

Cardinality and Bijections

Two sets $A$ and $B$ have the same cardinality (written $|A| = |B|$) if and only if there exists a bijection $f: A \to B$.

For finite sets, this agrees with counting. For infinite sets, it is the only definition that makes sense.

$|\mathbb{N}| = |\mathbb{Z}|$ because $n \mapsto \lceil n/2 \rceil \cdot (-1)^{n+1}$ is a bijection from $\mathbb{N} = \{0, 1, 2, \ldots\}$ onto $\mathbb{Z}$: it sends $0, 1, 2, 3, 4, \ldots$ to $0, 1, -1, 2, -2, \ldots$, the same pairing listed in Section 6.

$|\mathbb{R}| > |\mathbb{N}|$: Cantor’s diagonal argument shows no surjection $\mathbb{N} \to \mathbb{R}$ can exist. The reals are uncountably infinite.

This is not just a curiosity. The reason calculus works - the reason limits, derivatives, and integrals can be made rigorous - is that the real numbers form a complete ordered field: every nonempty set of reals that is bounded above has a least upper bound. The rationals are also dense, but they have gaps (there is no rational $\sqrt{2}$), and completeness is exactly what fills them. The uncountability of $\mathbb{R}$ is one consequence of this completeness. Functions defined on the reals have structure that functions on the rationals lack.


Summary

| Concept | Definition | Why It Matters |
|---|---|---|
| Function $f: A \to B$ | Rule: each $a \in A$ gets exactly one $f(a) \in B$ | Language for all mathematical relationships |
| Domain | Set of valid inputs | Determines where the function lives; change the domain and you change the function |
| Codomain | Declared container for outputs | Part of the type signature; distinct from range |
| Range | Actual set of outputs $\{f(a) : a \in A\}$ | Tells you what values are achievable |
| Injective | Different inputs give different outputs | Necessary for an inverse to exist; no collisions |
| Surjective | Every codomain element is hit | Necessary for an inverse to exist; nothing left out |
| Bijective | Both injective and surjective | Exactly the condition for an inverse to exist |
| Inverse $f^{-1}$ | Runs $f$ backward | Exists precisely when $f$ is bijective |
| Composition $g \circ f$ | Apply $f$ first, then $g$ | Foundation of the chain rule |

The questions worth having ready for any function: what is its domain? Is it injective? Is it surjective? What is its range? What does its inverse look like?

