Searching Algorithms & Binary Search
Searching is among the most fundamental computational tasks. The gap between $O(n)$ linear search and $O(\log n)$ binary search - a factor of $\log n$, which is about 30 for $n = 10^9$ - often determines whether a solution is feasible. More importantly, binary search is not just an algorithm for sorted arrays; it is a general reasoning framework for any problem with a monotone structure.
Linear Search
Linear search scans the array from left to right until a match is found. Time: $O(n)$ worst case. No preconditions on the data. It is optimal when the array is unsorted and cannot be preprocessed. For small $n$ or cache-friendly sequential access, it can outperform binary search in practice due to simpler branch prediction.
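As a baseline for the variants below, a minimal Python sketch of linear search:

```python
def linear_search(a, target):
    """Scan left to right; return the first matching index, or -1."""
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1
```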
Binary Search: Formal Invariant
Binary search requires a sorted array. The core idea: maintain a range $[\texttt{lo}, \texttt{hi}]$ guaranteed to contain the answer, and halve it at each step.
The standard implementation:
BinarySearch(A, target):
    lo = 0, hi = n - 1
    while lo <= hi:
        mid = lo + (hi - lo) / 2      // overflow-safe
        if A[mid] == target: return mid
        elif A[mid] < target: lo = mid + 1
        else: hi = mid - 1
    return -1                         // not found
Overflow pitfall: computing mid = (lo + hi) / 2 overflows when lo + hi > INT_MAX. The correct form is mid = lo + (hi - lo) / 2, which keeps intermediate values bounded by hi.
Loop invariant: at every iteration, if the target exists in A, it lies within A[lo..hi]. Establishing this invariant and proving it is maintained by each branch is the cleanest way to reason about binary search correctness.
Each iteration halves the search space. Starting from $n$ elements, after $k$ iterations the range has $\lceil n/2^k \rceil$ elements. When $n/2^k < 1$, i.e., $k > \log_2 n$, the loop terminates. Total comparisons: $O(\log n)$.
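The pseudocode above translates directly to Python. Python integers never overflow, but the `lo + (hi - lo) // 2` form is kept to mirror the idiom required in fixed-width languages:

```python
def binary_search(a, target):
    """Classic binary search over a sorted list; returns an index or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        # Invariant: if target is in a, it lies within a[lo..hi].
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```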
lower_bound and upper_bound
Two essential variants find boundary positions rather than exact matches:
lower_bound: the first index where A[i] >= target.
upper_bound: the first index where A[i] > target.
LowerBound(A, target):
    lo = 0, hi = n                    // hi is one past the end
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if A[mid] < target: lo = mid + 1
        else: hi = mid
    return lo                         // lo == hi, the answer
Note that hi is initialized to n (not n-1) so the answer can be “insert at the end.” The invariant is: all elements before lo are < target, all elements from hi onward are >= target.
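Both variants in Python; upper_bound differs only in making the comparison strict. These match the behavior of the standard library's `bisect.bisect_left` and `bisect.bisect_right`:

```python
def lower_bound(a, target):
    """First index i with a[i] >= target (returns len(a) if none)."""
    lo, hi = 0, len(a)            # hi is one past the end
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(a, target):
    """First index i with a[i] > target; same loop, strict comparison."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

The count of elements equal to `target` is `upper_bound(a, t) - lower_bound(a, t)`.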
Binary Search on the Answer
The most powerful generalisation: instead of searching for a value in an array, search over the answer space using a monotone predicate.
Formal setup: Let $f : \mathbb{Z} \to \{\text{false}, \text{true}\}$ be a predicate that is monotone - once it becomes true, it stays true. Binary search finds the smallest value where $f$ is true:
BinarySearchAnswer(lo, hi, predicate):
    while lo < hi:
        mid = lo + (hi - lo) / 2
        if predicate(mid): hi = mid
        else: lo = mid + 1
    return lo
This template works for any problem that can be phrased as: “find the minimum $x$ such that $f(x)$ holds,” where $f$ is monotone. Time: $O(\log(\text{range}) \cdot T_f)$ where $T_f$ is the cost of evaluating the predicate.
Example: minimum capacity to ship packages in $D$ days. Let $f(c)$ = “capacity $c$ is sufficient.” As $c$ increases, $f$ eventually flips from false to true. Binary search over $c \in [\max(\text{weights}), \sum \text{weights}]$.
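A sketch of the shipping example in Python. The predicate greedily counts how many days a given capacity needs; the outer loop is the BinarySearchAnswer template verbatim:

```python
def min_ship_capacity(weights, days):
    """Smallest capacity that ships all packages, in order, within `days`."""
    def feasible(cap):
        # Greedy: fill each day until the next package would overflow.
        used, load = 1, 0
        for w in weights:
            if load + w > cap:
                used, load = used + 1, 0
            load += w
        return used <= days

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```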
Interpolation Search
For uniformly distributed sorted integers, interpolation search estimates the position of the target proportionally:
$$\text{mid} = lo + \left\lfloor \frac{(\text{target} - A[lo]) \cdot (hi - lo)}{A[hi] - A[lo]} \right\rfloor$$
This probe lands near the target for uniform data. Expected comparisons: $O(\log \log n)$. Worst case (non-uniform data): $O(n)$. Useful for large sorted tables of integers or timestamps where the distribution is approximately uniform.
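A Python sketch; the guard `a[lo] <= target <= a[hi]` doubles as the termination check, and the equal-endpoints case avoids division by zero:

```python
def interpolation_search(a, target):
    """Proportional probing; assumes a is sorted and roughly uniform."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:          # all remaining values equal
            mid = lo
        else:
            # Probe proportionally to where target falls in [a[lo], a[hi]].
            mid = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```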
Exponential Search
For unbounded (infinite or very large) sorted arrays where $n$ is unknown:
ExponentialSearch(A, target):
    if A[0] == target: return 0
    i = 1
    while A[i] < target: i *= 2   // double until overshoot
    return BinarySearch(A[i/2 .. i], target)
Exponential search first finds a range $[2^{k-1}, 2^k]$ containing the target in $O(\log n)$ steps, then binary searches within that range in another $O(\log n)$. Total: $O(\log n)$. It is optimal when $n$ is unknown and the target is near the beginning of the sequence.
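In Python, working over an ordinary finite list means the doubling phase must also stop at the array boundary; the bracketing-then-binary-search structure is otherwise unchanged:

```python
def exponential_search(a, target):
    """Double to bracket the target, then binary search within [i/2, i]."""
    if not a:
        return -1
    if a[0] == target:
        return 0
    i = 1
    while i < len(a) and a[i] < target:
        i *= 2                          # double until overshoot or end
    lo, hi = i // 2, min(i, len(a) - 1)
    while lo <= hi:                     # standard binary search
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```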
Ternary Search for Unimodal Functions
Ternary search finds the maximum of a unimodal function $f$ (one that increases then decreases) on a real interval $[lo, hi]$:
TernarySearch(lo, hi, f, eps):
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2): lo = m1
        else: hi = m2
    return (lo + hi) / 2
Each iteration reduces the interval by $1/3$. After $k$ iterations, width is $(2/3)^k \cdot (hi - lo)$. To reach precision $\varepsilon$: $k = O(\log_{3/2}(1/\varepsilon)) = O(\log(1/\varepsilon))$ iterations. Note: ternary search on integers (discrete unimodal) is subtly different - binary search on the derivative is often cleaner.
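A direct Python rendering, shown here maximizing a simple quadratic (an illustrative unimodal function, not from the text):

```python
def ternary_search(lo, hi, f, eps=1e-9):
    """Return the argmax of a unimodal f on the real interval [lo, hi]."""
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1        # the maximum cannot lie in [lo, m1)
        else:
            hi = m2        # the maximum cannot lie in (m2, hi]
    return (lo + hi) / 2
```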
Binary Search in a Rotated Sorted Array
A sorted array rotated at some pivot: [4, 5, 6, 7, 0, 1, 2]. Standard binary search fails, but the structure is preserved: at least one half is always sorted.
SearchRotated(A, target):
    lo = 0, hi = n - 1
    while lo <= hi:
        mid = lo + (hi - lo) / 2
        if A[mid] == target: return mid
        if A[lo] <= A[mid]:           // left half is sorted
            if A[lo] <= target < A[mid]: hi = mid - 1
            else: lo = mid + 1
        else:                         // right half is sorted
            if A[mid] < target <= A[hi]: lo = mid + 1
            else: hi = mid - 1
    return -1
At each step, determine which half is sorted. If the target lies within the sorted half, recurse there; otherwise recurse into the unsorted half. Still $O(\log n)$.
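In Python, where the chained comparisons from the pseudocode are valid syntax:

```python
def search_rotated(a, target):
    """Binary search in a sorted array rotated at an unknown pivot."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == target:
            return mid
        if a[lo] <= a[mid]:                  # left half is sorted
            if a[lo] <= target < a[mid]:
                hi = mid - 1                 # target inside sorted left half
            else:
                lo = mid + 1
        else:                                # right half is sorted
            if a[mid] < target <= a[hi]:
                lo = mid + 1                 # target inside sorted right half
            else:
                hi = mid - 1
    return -1
```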
Examples
Example 1 - Square root via binary search on reals: To compute $\lfloor \sqrt{n} \rfloor$ without floating-point, binary search over integers $[0, n]$ with predicate $f(x) = (x^2 \leq n)$. This runs in $O(\log n)$ comparisons with integer arithmetic only.
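A sketch of Example 1, phrased with the LowerBound-style half-open loop: find the first $x$ with $x^2 > n$, then return one less.

```python
def isqrt_binary(n):
    """Floor of sqrt(n) for n >= 0, integer arithmetic only."""
    lo, hi = 0, n + 1                 # half-open candidate range [0, n+1)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if mid * mid <= n:
            lo = mid + 1              # mid is still a valid root candidate
        else:
            hi = mid                  # mid overshoots
    return lo - 1                     # largest x with x*x <= n
```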
Example 2 - Peak-finding in $O(\log n)$: Given an array where no two adjacent elements are equal, find any local peak (element greater than both neighbors). At the midpoint, if A[mid] < A[mid+1], the right side has a peak; otherwise the left side does. Binary search gives $O(\log n)$.
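Example 2 as Python; the comparison against `a[mid+1]` plays the role of the monotone predicate:

```python
def find_peak(a):
    """Index of any local peak in a nonempty list where no two adjacent
    elements are equal (boundaries count as negative infinity)."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < a[mid + 1]:
            lo = mid + 1      # ascending here: a peak exists to the right
        else:
            hi = mid          # descending here: a peak is at mid or left
    return lo
```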
Example 3 - git bisect: git bisect performs binary search over commit history. Given a “good” and “bad” commit, it checks the midpoint commit - $O(\log n)$ builds to isolate a regression among $n$ commits. This is binary search on the answer with predicate $f(\text{commit}) = $ “bug is present.”
Find-in-BST vs. Array Binary Search
Both run in $O(\log n)$ comparisons, but they differ in:
- Memory layout: array binary search benefits from cache prefetching; BST pointer-chasing causes cache misses.
- Flexibility: BSTs support dynamic insert/delete in $O(\log n)$; arrays require $O(n)$ to insert.
- Locality: for read-heavy workloads on static data, a sorted array with binary search typically outperforms a BST by $2$–$5\times$ in wall-clock time.
The choice depends on whether the data changes frequently. Static data: sorted array. Dynamic data: balanced BST or skip list.