PageRank & Power Iteration - Importance as a Fixed Point of a Random Walk // Megha Bose

Helpful context:

In 1998, two Stanford PhD students figured out that the web’s link structure could rank pages by importance. They named their algorithm after one of them: Larry Page. It made them billionaires and changed the internet. The math is a random walk on a graph.

The idea is embarrassingly simple to state: a page is important if important pages link to it. Yes, that’s circular. And yes, it works.

The Random Surfer

Imagine a user who browses the web in the most mindless way possible. They land on some page. With probability $d = 0.85$ (the damping factor), they click a random outgoing link. With probability $1 - d = 0.15$, they get bored and teleport to a completely random page.

If you run this random surfer forever and ask “what fraction of time does the surfer spend on each page?” - that fraction is the PageRank.

Pages linked to by many highly-visited pages get visited more. Pages that point to them get credit for sending the surfer there. The fractions add up to 1, so it’s a proper probability distribution. This is the stationary distribution of a Markov chain.

PageRank as an Eigenvector Problem

Let $n$ be the number of pages. Build the transition matrix $M$: entry $M_{ji} = 1/d_i$ if there’s a link from page $i$ to page $j$ (where $d_i$ is the out-degree of page $i$), and $0$ otherwise. Each column sums to 1.

The PageRank vector $r$ satisfies:

$$r = Mr$$

This says $r$ is an eigenvector of $M$ with eigenvalue 1. The Perron-Frobenius theorem guarantees that a column-stochastic matrix with the right connectivity properties has a unique positive eigenvector with eigenvalue 1. That’s our PageRank.

But two problems break the pure version.

Dangling nodes. Pages with no outgoing links (like PDFs) absorb probability mass - the surfer gets stuck. Their column in $M$ is all zeros, so $M$ is no longer column-stochastic.

Spider traps. Clusters of pages that only link to each other suck in all the probability mass and starve the rest of the web.

Both problems vanish with the teleportation term. Mix in a uniform jump:

$$r = \alpha M r + \frac{1-\alpha}{n} \mathbf{1}$$

This is the full PageRank equation. For dangling nodes, replace their zero column with $\mathbf{1}/n$ (the surfer teleports uniformly). Now every page can reach every other page in one teleportation step - the chain is ergodic, and a unique stationary distribution exists.

Power Iteration

We could solve $r = Gr$ directly, but the Google matrix $G = \alpha M + \frac{1-\alpha}{n}\mathbf{1}\mathbf{1}^T$ is dense - storing it for a billion-page web is impossible. Instead, we exploit the structure:

$$r^{(t+1)} = \alpha M r^{(t)} + \frac{1-\alpha}{n} \mathbf{1}$$

$M$ is sparse (each page has maybe 10-100 links out of a billion), so this matrix-vector product costs $O(E)$ where $E$ is the number of edges. Add $O(n)$ for the teleportation term. Each iteration is $O(n + E)$.

PowerIteration(M, alpha, n, epsilon):
    r = (1/n) * ones(n)
    while true:
        r_new = alpha * M @ r + ((1 - alpha) / n) * ones(n)
        if L1_norm(r_new - r) < epsilon:
            return r_new
        r = r_new

Start from uniform, iterate, converge. Why does this work? The matrix $G$ has a dominant eigenvalue of 1 (corresponding to the stationary distribution) and all other eigenvalues have magnitude at most $\alpha < 1$. Repeated multiplication amplifies the dominant eigenvector and suppresses everything else:

$$|r^{(t)} - r^|_1 \leq \alpha^t |r^{(0)} - r^|_1$$

The error shrinks by a factor of $\alpha$ each iteration. For $\alpha = 0.85$ and tolerance $10^{-6}$, convergence takes roughly 85 iterations. Damping factor $\alpha$ is a dial: higher means more link-structure influence but slower convergence; lower means faster convergence but rankings diluted by random teleportation.

Discomfort check. Isn’t this just circular reasoning? A page is important because important pages link to it - but how do you know which pages are important in the first place?

Yes, it’s self-referential. But power iteration resolves the circularity. Start from any distribution (say, uniform). Multiply by $G$. Multiply again. The sequence converges to the unique fixed point - the stationary distribution. You don’t need to know the answer in advance; you iterate toward it. The circularity is broken by the dynamics, not the definition.

Personalized PageRank

The standard algorithm teleports to a uniform distribution over all pages. Personalized PageRank replaces that with a preference distribution $v$ over specific pages you care about:

$$r_v = \alpha M r_v + (1-\alpha) v$$

Set $v = e_s$ (the unit vector at a single node $s$) and you get the random walk with restart from $s$: the surfer teleports back to $s$ instead of anywhere. High $r_v$ scores mean “close to $s$ in the link structure.” Pinterest uses personalized PageRank from a user’s pin history to recommend related content.

Applications Beyond the Web

The algorithm doesn’t care that nodes are web pages. Any directed graph works.

Citation networks. Apply PageRank to academic papers (edges = citations). A paper cited by highly-cited papers scores highly. Google Scholar and Semantic Scholar both use variants of this for influence ranking.

Drug-protein interaction networks. Nodes are proteins, edges are interactions. PageRank identifies hub proteins - centrally connected ones likely to be essential to cellular function. Studies show PageRank correlates better with protein essentiality than raw degree, because it accounts for global network structure.

Recommendation systems. Build a bipartite graph of users and items. Personalized PageRank from a user node finds items “nearby” in the interaction graph - used in Pinterest, Twitter’s “Who to Follow,” and Spotify’s related-artist recommendations.

Summary

Concept	Detail
Random surfer model	Follow link with prob $\alpha$, teleport with prob $1 - \alpha$
PageRank equation	$r = \alpha M r + \frac{1-\alpha}{n}\mathbf{1}$
Damping factor	Typically $\alpha = 0.85$
Power iteration cost	$O(n + E)$ per step, ~85 steps to converge
Convergence rate	Error shrinks by factor $\alpha$ each iteration
Dangling node fix	Replace zero column with uniform teleportation
Personalized PageRank	Teleport to preference distribution $v$ instead of uniform
Why it works	Perron-Frobenius: unique positive stationary distribution for ergodic chain

Read next:

String Algorithms & Pattern Matching - Finding Needles in Haystacks, Fast