PageRank & Power Iteration
Prerequisite: Graph Algorithms & Traversal
Overview
PageRank assigns an importance score to each node in a directed graph by modelling a random surfer: at each step the surfer follows a random outgoing link with probability $\alpha$ (the damping factor, typically $0.85$), or teleports to a uniformly random node with probability $1 - \alpha$. The PageRank vector is the stationary distribution of this Markov chain. Pages that are linked to by many high-ranked pages receive high scores - a self-reinforcing definition that an eigenvector equation captures precisely.
Naive Rank: The Eigenvector Formulation
Let $G$ have $n$ nodes. Define the column-stochastic transition matrix $M$ where
$$M_{ji} = \begin{cases} 1/d_i & \text{if } (i \to j) \in E \\ 0 & \text{otherwise} \end{cases}$$
and $d_i$ is the out-degree of node $i$. The rank vector satisfies the fixed-point equation
$$r = Mr$$
so $r$ is the eigenvector of $M$ corresponding to eigenvalue 1. By the Perron-Frobenius theorem, if $M$ is irreducible and aperiodic (the chain is ergodic), the dominant eigenvector is unique and positive.
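The eigenvector formulation can be checked directly on a small graph. The following sketch builds the column-stochastic $M$ from an edge list and verifies the fixed point $r = Mr$ numerically; the 4-node graph is an illustrative assumption, not one from the text.

```python
import numpy as np

# Hypothetical 4-node graph (illustrative assumption): edges i -> j
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
n = 4

# Column-stochastic transition matrix: M[j, i] = 1/d_i if i -> j
out_deg = np.zeros(n)
for i, _ in edges:
    out_deg[i] += 1
M = np.zeros((n, n))
for i, j in edges:
    M[j, i] = 1.0 / out_deg[i]

# The rank vector is the eigenvector of M with eigenvalue 1
vals, vecs = np.linalg.eig(M)
k = np.argmin(np.abs(vals - 1.0))
r = np.real(vecs[:, k])
r = r / r.sum()                 # normalise to a probability distribution
assert np.allclose(M @ r, r)    # fixed point: r = M r
```

Note that node 3 has no incoming edges, so its stationary mass is zero — a hint at the structural fragility discussed next.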
However, two structural problems break this ideal:
- Dangling nodes (out-degree zero) absorb probability mass without redistributing it, making columns of $M$ zero instead of stochastic.
- Spider traps (strongly connected components with no outgoing edges to the rest of the graph) accumulate all probability mass, starving the rest of the graph.
PageRank with Teleportation
Both problems are resolved by mixing $M$ with a uniform teleportation matrix:
$$r = \alpha M r + \frac{1-\alpha}{n} \mathbf{1}$$
where $\mathbf{1}$ is the all-ones vector. This is equivalent to defining the Google matrix
$$G = \alpha M + \frac{1-\alpha}{n} \mathbf{1}\mathbf{1}^T$$
and solving $r = Gr$. Since $G$ is column-stochastic, irreducible, and aperiodic (every state can reach every other state in one step via teleportation), Perron-Frobenius guarantees a unique, positive stationary distribution.
Dangling nodes are handled by replacing their column in $M$ with $\mathbf{1}/n$ before forming $G$ (equivalently, the surfer at a dangling node teleports uniformly).
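The dangling-node fix amounts to replacing every zero column of $M$ with $\mathbf{1}/n$ before mixing in teleportation. A minimal sketch, using a hypothetical 3-node graph in which node 2 is dangling:

```python
import numpy as np

alpha = 0.85
n = 3
# Node 2 is dangling (out-degree zero), so its column of M is all zeros.
M = np.array([
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])

# Replace each zero column with the uniform distribution 1/n
dangling = (M.sum(axis=0) == 0)
M[:, dangling] = 1.0 / n

# Every column now sums to 1, so the Google matrix is well defined
G = alpha * M + (1 - alpha) / n * np.ones((n, n))
assert np.allclose(G.sum(axis=0), 1.0)
```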
Power Iteration
Computing $r = Gr$ directly requires storing $G$, which is dense ($n^2$ entries). Instead, we exploit the structure of $G$ to apply it implicitly:
$$r^{(t+1)} = \alpha M r^{(t)} + \frac{1-\alpha}{n} \mathbf{1}$$
(assuming dangling-node mass has been redistributed into the teleportation term). This requires only the sparse matrix $M$ and a length-$n$ vector.
PowerIteration(M, alpha, epsilon):
    r = (1/n) * ones(n)                       // uniform initialisation
    while true:
        r_new = alpha * M @ r + ((1 - alpha) / n) * ones(n)
        if ||r_new - r||_1 < epsilon:
            return r_new
        r = r_new
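The pseudocode above translates to Python almost line for line. This sketch uses a dense NumPy matrix for brevity; in practice $M$ would be stored sparse (e.g. as a `scipy.sparse.csr_matrix`) so that `M @ r` costs $O(E)$. The 3-node example matrix is an illustrative assumption.

```python
import numpy as np

def power_iteration(M, alpha=0.85, eps=1e-10):
    """PageRank by power iteration; M must be column-stochastic."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)                   # uniform initialisation
    while True:
        r_new = alpha * (M @ r) + (1 - alpha) / n
        if np.abs(r_new - r).sum() < eps:     # L1 convergence test
            return r_new
        r = r_new

# Hypothetical 3-node column-stochastic example (illustrative assumption)
M = np.array([
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])
r = power_iteration(M)
```

Because each iterate remains a probability distribution, the returned vector sums to 1 without explicit renormalisation.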
Complexity per iteration. $M$ has $E$ non-zeros (one per directed edge), so the sparse matrix-vector product costs $O(E)$. The teleportation term costs $O(n)$. Total per iteration: $O(n + E)$.
Convergence Analysis
Let $\lambda_1 = 1 > |\lambda_2| \geq \cdots$ be the eigenvalues of $G$ sorted by magnitude. Power iteration converges at rate $|\lambda_2 / \lambda_1| = |\lambda_2|$.
For the Google matrix, the second eigenvalue satisfies $|\lambda_2| \leq \alpha$: $G$ is a rank-one perturbation of $\alpha M$ whose perturbation pins the dominant eigenvalue at 1 while leaving every other eigenvalue of $G$ equal to $\alpha$ times an eigenvalue of $M$, all of which have magnitude at most 1. Concretely:
$$\|r^{(t)} - r^\ast\|_1 \leq \alpha^t \|r^{(0)} - r^\ast\|_1$$
where $r^\ast$ is the true PageRank vector. To achieve $\|r^{(t)} - r^\ast\|_1 < \epsilon$:
$$\alpha^t < \epsilon \implies t > \frac{\log(1/\epsilon)}{\log(1/\alpha)} = O\!\left(\log_{1/\alpha} \frac{1}{\epsilon}\right)$$
For $\alpha = 0.85$ and $\epsilon = 10^{-6}$: $t \approx 6/(-\log_{10} 0.85) \approx 85$ iterations - fast in practice. The damping factor $\alpha$ thus controls a fundamental trade-off: higher $\alpha$ gives more “link-based” ranking but slower convergence; lower $\alpha$ converges faster but dilutes the graph structure with teleportation.
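The iteration count above is a one-line computation:

```python
import math

alpha, eps = 0.85, 1e-6
t = math.log(1 / eps) / math.log(1 / alpha)
# log(10^6) / log(1/0.85) ≈ 13.82 / 0.1625 ≈ 85 iterations
```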
HITS: Hubs and Authorities
The Hyperlink-Induced Topic Search (HITS) algorithm maintains two scores per node: a hub score $h_i$ (quality of the node as a pointer to good content) and an authority score $a_i$ (quality of the node as a content source). The co-evolving update rules are:
$$a_i^{(t+1)} = \sum_{j : j \to i} h_j^{(t)}, \qquad h_i^{(t+1)} = \sum_{j : i \to j} a_j^{(t+1)}$$
followed by normalisation. In matrix form: $a = A^T h$ and $h = A a$, so $a$ is the leading eigenvector of $A^T A$ and $h$ of $A A^T$, where $A$ is the adjacency matrix. HITS is query-dependent (run on a subgraph of relevant pages), while PageRank is query-independent and computed once for the whole web.
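The co-evolving updates can be sketched as alternating matrix-vector products with normalisation after each step. The small adjacency matrix is an illustrative assumption; after enough iterations, `a` approaches the leading eigenvector of $A^T A$.

```python
import numpy as np

def hits(A, iters=50):
    """HITS on adjacency matrix A, where A[i, j] = 1 iff edge i -> j."""
    n = A.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(iters):
        a = A.T @ h              # authorities gather from incoming hubs
        a /= np.linalg.norm(a)
        h = A @ a                # hubs gather from the authorities they point to
        h /= np.linalg.norm(h)
    return h, a

# Hypothetical 3-node graph (illustrative assumption)
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
])
h, a = hits(A)
```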
Personalised PageRank
Replace the uniform teleportation vector $\mathbf{1}/n$ with a preference vector $v$ (a probability distribution over nodes):
$$r_v = \alpha M r_v + (1-\alpha) v$$
The resulting $r_v$ is the stationary distribution of a surfer who teleports according to $v$ rather than uniformly. Setting $v = e_s$ (the unit vector at node $s$) gives the random walk with restart from $s$, widely used for local community detection and recommendation systems (e.g., Pinterest’s Pixie system computes personalised PageRank from a user’s pin history to recommend related content).
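The only change from standard power iteration is the teleportation target. A minimal sketch of random walk with restart, reusing a hypothetical 3-node matrix and restarting from node 0 (both illustrative assumptions):

```python
import numpy as np

def personalised_pagerank(M, v, alpha=0.85, eps=1e-10):
    """Power iteration with teleportation to the preference vector v."""
    r = v.copy()
    while True:
        r_new = alpha * (M @ r) + (1 - alpha) * v
        if np.abs(r_new - r).sum() < eps:
            return r_new
        r = r_new

M = np.array([
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])
v = np.zeros(3)
v[0] = 1.0                       # restart at node s = 0
r = personalised_pagerank(M, v)
```

As expected, mass concentrates near the restart node: scores decay with distance from $s$, which is what makes the technique useful for local community detection.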
Examples
Search engine ranking. The original Google search engine (circa 1998) ranked results by PageRank: pages with many high-PageRank inbound links scored higher. Power iteration was run offline on a crawled snapshot of the web; the resulting rank vector was stored and used to break ties in relevance scoring at query time.
Academic citation networks. PageRank applied to citation graphs (nodes = papers, edges = citations) ranks papers by influence. A paper cited by highly cited papers receives a high score. The approach generalises to author networks (where edges are weighted by co-authorship) and is used by tools like Semantic Scholar and Google Scholar.
Protein interaction importance. In protein-protein interaction (PPI) networks, nodes are proteins and edges denote interactions. PageRank identifies hub proteins - those that are central to the interaction network and therefore likely to be essential (knockouts are lethal). Studies have shown PageRank scores correlate with protein essentiality across multiple organisms, outperforming degree centrality because they account for the global structure of the network.
Read Next: