Searching Algorithms - Twenty Steps Through a Million Entries // Megha Bose

Helpful context:

A phone book has 1 million entries. You want to find a name.

Linear search: start at page one, read every name until you find it. In the worst case, 1 million comparisons.

Binary search: open to the middle. If the name you want comes before the middle, discard the second half. Repeat. With 1 million entries you need at most 20 comparisons.

That’s a 50,000-fold speedup for exactly the same task. Not from better hardware. Not from a faster language. From a different approach to the question.

This gap - $O(n)$ versus $O(\log n)$ - is one of the most consequential in all of computing. But binary search is more than a lookup tool. It’s a general reasoning strategy, and that’s what this post is really about.

Linear Search: When You Have No Choice

If the array is unsorted and you need to find a specific element, you have no choice but to check each element in turn. Linear search is optimal for this case - you can’t do better than $O(n)$ without preprocessing.

Expected cost: $n/2$ comparisons on average if the element is present (you find it halfway through, on average). Worst case: $n$ comparisons (element is last or absent).

Linear search has practical advantages for small arrays. For $n \leq 16$ or so, the constant factor overhead of binary search (computing midpoints, branch prediction misses) can make linear search faster in wall-clock time. This is why real sorting libraries use insertion sort for small subarrays.

Binary Search: The Invariant That Makes It Work

Binary search works on a sorted array. The core idea is maintaining an invariant: the answer, if it exists, is always somewhere in the current search range $[\text{lo}, \text{hi}]$.

At each step, you look at the midpoint. Based on what you see, you can eliminate half the remaining range while preserving the invariant. Repeat until the range collapses to a single element (or until you confirm the element is absent).

Here is a careful implementation:

BinarySearch(A, target):
    lo = 0
    hi = n - 1
    while lo <= hi:
        mid = lo + (hi - lo) / 2
        if A[mid] == target:
            return mid
        elif A[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Two things in this code deserve attention.

The midpoint formula. We write lo + (hi - lo) / 2, not (lo + hi) / 2. These are mathematically equivalent, but (lo + hi) can overflow when both lo and hi are large integers. In 32-bit arithmetic with lo = hi = 2^30, their sum overflows. The safe form keeps intermediate values bounded by hi. This bug appeared in Java’s standard library for nearly a decade before being fixed.

Off-by-one. The condition is lo <= hi, not lo < hi. The update to lo is mid + 1, not mid. If you use lo = mid when A[mid] < target, the loop can run forever when lo + 1 == hi and A[lo] < target < A[hi]. Getting binary search exactly right is famously difficult - a 2006 study found that most published implementations contained bugs.

Discomfort check. Binary search is one of the simplest algorithms to describe and one of the hardest to implement correctly. The off-by-one errors are subtle because there are two interacting choices (how to update lo vs hi, and when the loop terminates) that must be mutually consistent. The safest approach: pick a specific invariant and maintain it mechanically. The invariant “the target is always in $A[\text{lo} \ldots \text{hi}]$” tells you exactly what the updates must be: if A[mid] < target, the target can’t be at or left of mid, so set lo = mid + 1. If A[mid] > target, the target can’t be at or right of mid, so set hi = mid - 1. The loop terminates when lo > hi, meaning the range is empty and the target is absent.

Each iteration halves the remaining range. After $k$ iterations, the range has size $n / 2^k$. The loop terminates when this is less than 1, i.e., $k = O(\log n)$ iterations. Total comparisons: $O(\log n)$.

Here is binary search on [2, 5, 8, 12, 16, 23, 38, 56, 72, 91] looking for 23:

flowchart TD A["lo=0, hi=9, mid=4\nA[4]=16 < 23 → move lo right\nlo = 5"] B["lo=5, hi=9, mid=7\nA[7]=56 > 23 → move hi left\nhi = 6"] C["lo=5, hi=6, mid=5\nA[5]=23 = 23 → found at index 5"] A --> B --> C

Variants: lower_bound and upper_bound

Sometimes you don’t want to find one occurrence of a value - you want all of them, or you want to know where to insert a value to maintain sorted order.

lower_bound: the first index where A[i] >= target. (The leftmost position where target could be inserted.)

upper_bound: the first index where A[i] > target. (The rightmost position where target could be inserted.)

The range [lower_bound(target), upper_bound(target)) gives all occurrences of target in the sorted array. For example, searching for 3 in [1, 3, 3, 3, 5, 7]:

flowchart LR A["[0] 1"] --- B["[1] 3\n↑ lower_bound"] --- C["[2] 3"] --- D["[3] 3"] --- E["[4] 5\n↑ upper_bound"] --- F["[5] 7"]

LowerBound(A, target):
    lo = 0
    hi = n          // note: n, not n-1
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if A[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo       // lo == hi, the answer

Here hi is initialized to n (one past the end) because the answer could be “insert at the end.” The invariant is: all elements before lo are strictly less than target, all elements from hi onward are $\geq$ target. When lo == hi, that’s the answer.

Binary Search on the Answer: The Real Power

Here is where binary search becomes a design technique rather than just a lookup tool.

Many optimization problems have this structure: you want to find the minimum value $x$ such that some condition $f(x)$ is satisfied, where $f$ is monotone - once it becomes true, it stays true for all larger values. Instead of searching in an array, you search over the space of possible answers.

The template:

BinarySearchAnswer(lo, hi, predicate):
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if predicate(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

This finds the smallest value in $[\text{lo}, \text{hi}]$ for which predicate returns true.

Example: integer square root. Find the largest integer $m$ such that $m^2 \leq n$, without floating-point arithmetic. Define predicate(x) = (x*x > n). Binary search over $[0, n]$. The answer is lo - 1 when the loop finishes. Time: $O(\log n)$.

Example: shipping packages. You have a list of package weights and must ship all of them in $D$ days, with a daily weight limit of $c$. What’s the minimum $c$ that works? Define predicate(c) = "can we ship everything in D days with capacity c?". This predicate is monotone: if capacity $c$ works, then $c+1$ also works. Binary search over capacities in $[\max(\text{weights}), \sum \text{weights}]$. Time: $O(n \log S)$ where $S = \sum \text{weights}$.

Example: finding a zero. Given a monotone function $f$ (strictly increasing), find the value $x$ where $f(x) = 0$. Binary search between a known negative value and a known positive value. Each step halves the interval. This is bisection, which engineers use to solve equations that have no closed-form solution.

Why does bisection take $O(\log(1/\varepsilon))$ steps? Say the initial interval has width $L$ (the distance between your known negative and positive points). After one step, the width is $L/2$. After two steps, $L/4$. After $k$ steps, $L/2^k$. You stop when this width is at most $\varepsilon$ - at that point, you know the root to within $\varepsilon$. So you need:

$$\frac{L}{2^k} \leq \varepsilon \implies 2^k \geq \frac{L}{\varepsilon} \implies k \geq \log_2\left(\frac{L}{\varepsilon}\right)$$

$L$ is just a constant (the initial bracket size), so this is $O(\log(1/\varepsilon))$. To gain one extra decimal digit of precision - to make $\varepsilon$ ten times smaller - you need $\log_2(10) \approx 3.32$ more steps. Precision comes at a fixed cost per digit.

The insight is that binary search works on any monotone predicate over any ordered domain - not just sorted arrays. This generalization makes it a fundamental problem-solving tool.

Newton’s Method: Faster Root-Finding

Bisection is reliable but slow - it always halves the interval, giving $O(\log(1/\varepsilon))$ iterations to reach precision $\varepsilon$. Newton’s method converges much faster by using the slope of the function to make a smarter guess.

The intuition: you are standing at point $x_n$ on the curve, above the x-axis. You want to know where the curve hits zero. You don’t know the full shape of the curve, but you do know two things right where you’re standing: the height $f(x_n)$ and the slope $f'(x_n)$. If the curve were a straight line - just a ramp - it would hit zero at exactly $x_n - f(x_n)/f'(x_n)$: you walk horizontally by the amount “height divided by slope.” The curve is not actually a straight line, but near any smooth point it is approximately straight, so this estimate lands you much closer to the root. You repeat from there.

Why does this converge so fast? The approximation error - how much the curve deviates from the straight line - is proportional to the square of your distance from the root. So when you are 0.1 away, the error in your tangent-line guess is around 0.01. When you are 0.01 away, the error drops to 0.0001. Each step squares the error - the number of correct decimal digits doubles every iteration. This is called quadratic convergence.

Given a function $f$ and current estimate $x_n$, the tangent line at $(x_n, f(x_n))$ has slope $f'(x_n)$ and crosses zero at:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

Example: compute $\sqrt{2}$. Let $f(x) = x^2 - 2$, $f'(x) = 2x$. Then $x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{x_n + 2/x_n}{2}$. Starting from $x_0 = 1$: $x_1 = 1.5$, $x_2 = 1.4167$, $x_3 = 1.41422$, $x_4 = 1.41421356\ldots$ - correct to 8 places in 4 steps. Notice how the error roughly squares each time: $0.414 \to 0.086 \to 0.002 \to 0.000001$.

When Newton’s method fails. Quadratic convergence requires the initial guess to be close to the root, $f'(x^*) \neq 0$ (no horizontal tangent at the root), and $f$ to be twice differentiable near the root. If these fail: the iteration may overshoot and oscillate, or converge to the wrong root. Safeguards: combine bisection (for guaranteed convergence) with Newton steps (for speed) - this is how robust root-finders like Brent’s method work.

Interpolation Search: O(log log n) for Uniform Data

What if you know the data is uniformly distributed? Instead of always probing the midpoint, you could probe proportionally - estimate where the target is based on its value relative to the endpoints.

$$\text{mid} = \text{lo} + \left\lfloor \frac{(\text{target} - A[\text{lo}]) \cdot (\text{hi} - \text{lo})}{A[\text{hi}] - A[\text{lo}]} \right\rfloor$$

For truly uniform data, this estimate is close to the target’s actual position, and the search converges in $O(\log \log n)$ expected comparisons - significantly fewer than binary search.

The catch: on non-uniform data, this estimate can be wildly wrong. Worst case is $O(n)$ (imagine data clustered near one end). In practice, interpolation search is used for large sorted tables of integers where the distribution is approximately uniform (timestamps, sequential IDs). For general data, stick with binary search.

Exponential Search: When n Is Unknown

Sometimes you’re searching an array so large that you don’t know its size, or you want to find the right range before binary searching.

Start at index 1, then 2, then 4, then 8, doubling until you overshoot the target or reach the end. Then binary search within the last range.

ExponentialSearch(A, target):
    if A[0] == target: return 0
    i = 1
    while i < n and A[i] < target:
        i = i * 2
    return BinarySearch(A, i/2, min(i, n-1), target)

If the target is at index $k$, the doubling phase takes $O(\log k)$ steps to find the right range, and binary search within that range also takes $O(\log k)$. Total: $O(\log k)$, which is $O(\log n)$ in the worst case.

This is useful when the target is near the beginning of a large or unbounded sorted sequence - you pay $O(\log k)$ instead of $O(\log n)$.

Search and the Lower Bound

We showed earlier that comparison-based sorting requires $\Omega(n \log n)$. A similar argument applies to searching.

Given a sorted array of $n$ elements, any comparison-based search must make at least $\Omega(\log n)$ comparisons in the worst case. The argument: there are $n + 1$ possible outcomes (the element is not there, or it’s at one of $n$ positions). Each comparison has two outcomes. To distinguish $n + 1$ cases, you need at least $\log_2(n+1)$ comparisons.

Binary search achieves this bound exactly: it’s optimal for searching sorted arrays.

Search in Trees and Hash Tables

Arrays aren’t the only place you search.

Balanced BST search also runs in $O(\log n)$ - you follow the BST property at each node to determine which subtree to recurse into. The comparison count is the same as binary search, but each comparison involves a pointer dereference that may cause a cache miss. For static data, a sorted array with binary search is typically 2-5 times faster than a balanced BST because of cache behavior.

Hash table lookup is $O(1)$ expected - no searching at all, just compute the hash and go to the bucket. When you don’t need ordered operations or range queries, hash tables dominate.

The choice: if the data never changes, use a sorted array with binary search. If you need insert/delete, use a balanced BST for ordered operations or a hash table for unordered ones.

Summary

Algorithm	Time	Preconditions	Notes
Linear search	$O(n)$	None	Optimal without preprocessing
Binary search	$O(\log n)$	Sorted array	Famously tricky off-by-ones
Binary search on answer	$O(\log R \cdot T_f)$	Monotone predicate	Generalization for optimization
Interpolation search	$O(\log \log n)$ expected	Sorted, uniform	$O(n)$ worst case
Exponential search	$O(\log k)$	Sorted	For unknown $n$, target at position $k$
BST search	$O(\log n)$	Balanced BST	Cache misses in practice
Hash table lookup	$O(1)$ expected	Hash table built	Unordered only
Bisection	$O(\log(1/\varepsilon))$	Continuous monotone $f$, sign change	Linear convergence; always works
Newton’s method	$O(\log \log(1/\varepsilon))$	Smooth $f$, good initial guess	Quadratic convergence; can fail

Read next:

Amortized Analysis - The True Cost of a Sequence