Searching Algorithms & Binary Search
Searching is among the most fundamental computational tasks. The gap between $O(n)$ linear search and $O(\log n)$ binary search - a factor of $\log n$, which is about 30 for $n = 10^9$ - often determines whether a solution is feasible. More importantly, binary search is not just an algorithm for sorted arrays; it is a general reasoning framework for any problem with a monotone structure.
Linear Search
Linear search scans the array from left to right until a match is found. Time: $O(n)$ worst case. No preconditions on the data. It is optimal when the array is unsorted and cannot be preprocessed. For small $n$ or cache-friendly sequential access, it can outperform binary search in practice due to simpler branch prediction.
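As a baseline for the variants below, a minimal Python sketch of linear search:

```python
def linear_search(a, target):
    """Scan left to right; return the first matching index, or -1."""
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1
```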
Binary Search: Formal Invariant
Binary search requires a sorted array. The core idea: maintain a range $[\texttt{lo}, \texttt{hi}]$ guaranteed to contain the answer, and halve it at each step.
The standard implementation:
BinarySearch(A, target):
    lo = 0, hi = n - 1
    while lo <= hi:
        mid = lo + (hi - lo) / 2      // overflow-safe
        if A[mid] == target: return mid
        elif A[mid] < target: lo = mid + 1
        else: hi = mid - 1
    return -1                         // not found
Overflow pitfall: computing mid = (lo + hi) / 2 overflows when lo + hi > INT_MAX. The correct form is mid = lo + (hi - lo) / 2, which keeps intermediate values bounded by hi.
Loop invariant: at every iteration, if the target exists in A, it lies within A[lo..hi]. Establishing this invariant and proving it is maintained by each branch is the cleanest way to reason about binary search correctness.
Each iteration halves the search space. Starting from $n$ elements, after $k$ iterations the range has $\lceil n/2^k \rceil$ elements. When $n/2^k < 1$, i.e., $k > \log_2 n$, the loop terminates. Total comparisons: $O(\log n)$.
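The pseudocode above translates directly to Python. Python integers never overflow, but the `lo + (hi - lo) // 2` form is kept to mirror the idiom required in fixed-width languages:

```python
def binary_search(a, target):
    """Classic binary search over a sorted list; returns an index or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        # Invariant: if target is in a, it lies within a[lo..hi].
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```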
lower_bound and upper_bound
Two essential variants find boundary positions rather than exact matches:
lower_bound: the first index where A[i] >= target.
upper_bound: the first index where A[i] > target.
LowerBound(A, target):
    lo = 0, hi = n                    // hi is one past the end
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if A[mid] < target: lo = mid + 1
        else: hi = mid
    return lo                         // lo == hi, the answer
Note that hi is initialized to n (not n-1) so the answer can be “insert at the end.” The invariant is: all elements before lo are < target, all elements from hi onward are >= target.
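Both variants in Python; upper_bound differs only in making the comparison strict. These match the behavior of the standard library's `bisect.bisect_left` and `bisect.bisect_right`:

```python
def lower_bound(a, target):
    """First index i with a[i] >= target (returns len(a) if none)."""
    lo, hi = 0, len(a)            # hi is one past the end
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(a, target):
    """First index i with a[i] > target; same loop, strict comparison."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

The count of elements equal to `target` is `upper_bound(a, t) - lower_bound(a, t)`.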
Binary Search on the Answer
The most powerful generalisation: instead of searching for a value in an array, search over the answer space using a monotone predicate.
Formal setup: Let $f : \mathbb{Z} \to \{\text{false}, \text{true}\}$ be a predicate that is monotone - once it becomes true, it stays true. Binary search finds the smallest value where $f$ is true:
BinarySearchAnswer(lo, hi, predicate):
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if predicate(mid): hi = mid
        else: lo = mid + 1
    return lo
This template works for any problem that can be phrased as: “find the minimum $x$ such that $f(x)$ holds,” where $f$ is monotone. Time: $O(\log(\text{range}) \cdot T_f)$ where $T_f$ is the cost of evaluating the predicate.
Example: minimum capacity to ship packages in $D$ days. Let $f(c)$ = “capacity $c$ is sufficient.” As $c$ increases, $f$ eventually flips from false to true. Binary search over $c \in [\max(\text{weights}), \sum \text{weights}]$.
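A sketch of the shipping example in Python. The predicate greedily counts how many days a given capacity needs; the outer loop is the BinarySearchAnswer template verbatim:

```python
def min_ship_capacity(weights, days):
    """Smallest capacity that ships all packages, in order, within `days`."""
    def feasible(cap):
        # Greedy: fill each day until the next package would overflow.
        used, load = 1, 0
        for w in weights:
            if load + w > cap:
                used, load = used + 1, 0
            load += w
        return used <= days

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```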
Interpolation Search
For uniformly distributed sorted integers, interpolation search estimates the position of the target proportionally:
$$\text{mid} = lo + \left\lfloor \frac{(\text{target} - A[lo]) \cdot (hi - lo)}{A[hi] - A[lo]} \right\rfloor$$
This probe lands near the target for uniform data. Expected comparisons: $O(\log \log n)$. Worst case (non-uniform data): $O(n)$. Useful for large sorted tables of integers or timestamps where the distribution is approximately uniform.
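A Python sketch; the guard `a[lo] <= target <= a[hi]` doubles as the termination check, and the equal-endpoints case avoids division by zero:

```python
def interpolation_search(a, target):
    """Proportional probing; assumes a is sorted and roughly uniform."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:          # all remaining values equal
            mid = lo
        else:
            # Probe proportionally to where target falls in [a[lo], a[hi]].
            mid = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```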
Exponential Search
For unbounded (infinite or very large) sorted arrays where $n$ is unknown:
ExponentialSearch(A, target):
    if A[0] == target: return 0
    i = 1
    while A[i] < target: i *= 2   // double until overshoot
    return BinarySearch(A[i/2 .. i], target)
Exponential search first finds a range $[2^{k-1}, 2^k]$ containing the target in $O(\log n)$ steps, then binary searches within that range in another $O(\log n)$. Total: $O(\log n)$. It is optimal when $n$ is unknown and the target is near the beginning of the sequence.
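In Python, working over an ordinary finite list means the doubling phase must also stop at the array boundary; the bracketing-then-binary-search structure is otherwise unchanged:

```python
def exponential_search(a, target):
    """Double to bracket the target, then binary search within [i/2, i]."""
    if not a:
        return -1
    if a[0] == target:
        return 0
    i = 1
    while i < len(a) and a[i] < target:
        i *= 2                          # double until overshoot or end
    lo, hi = i // 2, min(i, len(a) - 1)
    while lo <= hi:                     # standard binary search
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```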
Ternary Search for Unimodal Functions
Ternary search finds the maximum of a unimodal function $f$ (one that increases then decreases) on a real interval $[lo, hi]$:
TernarySearch(lo, hi, f, eps):
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2): lo = m1
        else: hi = m2
    return (lo + hi) / 2
Each iteration reduces the interval by $1/3$. After $k$ iterations, width is $(2/3)^k \cdot (hi - lo)$. To reach precision $\varepsilon$: $k = O(\log_{3/2}(1/\varepsilon)) = O(\log(1/\varepsilon))$ iterations. Note: ternary search on integers (discrete unimodal) is subtly different - binary search on the derivative is often cleaner.
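A direct Python rendering, shown here maximizing a simple quadratic (an illustrative unimodal function, not from the text):

```python
def ternary_search(lo, hi, f, eps=1e-9):
    """Return the argmax of a unimodal f on the real interval [lo, hi]."""
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1        # the maximum cannot lie in [lo, m1)
        else:
            hi = m2        # the maximum cannot lie in (m2, hi]
    return (lo + hi) / 2
```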
Binary Search in a Rotated Sorted Array
A sorted array rotated at some pivot: [4, 5, 6, 7, 0, 1, 2]. Standard binary search fails, but the structure is preserved: at least one half is always sorted.
SearchRotated(A, target):
    lo = 0, hi = n - 1
    while lo <= hi:
        mid = lo + (hi - lo) / 2
        if A[mid] == target: return mid
        if A[lo] <= A[mid]:           // left half is sorted
            if A[lo] <= target < A[mid]: hi = mid - 1
            else: lo = mid + 1
        else:                         // right half is sorted
            if A[mid] < target <= A[hi]: lo = mid + 1
            else: hi = mid - 1
    return -1
At each step, determine which half is sorted. If the target lies within the sorted half, recurse there; otherwise recurse into the unsorted half. Still $O(\log n)$.
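In Python, where the chained comparisons from the pseudocode are valid syntax:

```python
def search_rotated(a, target):
    """Binary search in a sorted array rotated at an unknown pivot."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        if a[lo] <= a[mid]:                  # left half is sorted
            if a[lo] <= target < a[mid]:
                hi = mid - 1                 # target inside sorted left half
            else:
                lo = mid + 1
        else:                                # right half is sorted
            if a[mid] < target <= a[hi]:
                lo = mid + 1                 # target inside sorted right half
            else:
                hi = mid - 1
    return -1
```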
Examples
Example 1 - Square root via binary search on reals: To compute $\lfloor \sqrt{n} \rfloor$ without floating-point, binary search over integers $[0, n]$ with predicate $f(x) = (x^2 \leq n)$. This runs in $O(\log n)$ comparisons with integer arithmetic only.
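A sketch of Example 1, phrased with the LowerBound-style half-open loop: find the first $x$ with $x^2 > n$, then return one less.

```python
def isqrt_binary(n):
    """Floor of sqrt(n) for n >= 0, integer arithmetic only."""
    lo, hi = 0, n + 1                 # half-open candidate range [0, n+1)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if mid * mid <= n:
            lo = mid + 1              # mid is still a valid root candidate
        else:
            hi = mid                  # mid overshoots
    return lo - 1                     # largest x with x*x <= n
```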
Example 2 - Peak-finding in $O(\log n)$: Given an array where no two adjacent elements are equal, find any local peak (element greater than both neighbors). At the midpoint, if A[mid] < A[mid+1], the right side has a peak; otherwise the left side does. Binary search gives $O(\log n)$.
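Example 2 as Python; the comparison against `a[mid+1]` plays the role of the monotone predicate:

```python
def find_peak(a):
    """Index of any local peak in a nonempty list where no two adjacent
    elements are equal (boundaries count as negative infinity)."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < a[mid + 1]:
            lo = mid + 1      # ascending here: a peak exists to the right
        else:
            hi = mid          # descending here: a peak is at mid or left
    return lo
```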
Example 3 - git bisect: git bisect performs binary search over commit history. Given a “good” and “bad” commit, it checks the midpoint commit - $O(\log n)$ builds to isolate a regression among $n$ commits. This is binary search on the answer with predicate $f(\text{commit}) = $ “bug is present.”
Find-in-BST vs. Array Binary Search
Both run in $O(\log n)$ comparisons, but they differ in:
- Memory layout: array binary search benefits from cache prefetching; BST pointer-chasing causes cache misses.
- Flexibility: BSTs support dynamic insert/delete in $O(\log n)$; arrays require $O(n)$ to insert.
- Locality: for read-heavy workloads on static data, a sorted array with binary search typically outperforms a BST by $2$–$5\times$ in wall-clock time.
The choice depends on whether the data changes frequently. Static data: sorted array. Dynamic data: balanced BST or skip list.