Google’s PageRank and Beyond: The Science of Search Engine Rankings, Amy N. Langville and Carl D. Meyer, Princeton U. Press, Princeton, NJ, 2006. $35.00 (224 pp.). ISBN 0-691-12202-4

Suppose you have some time to kill and decide to spend it surfing the Web. You point your cursor at whatever page comes up in your browser and pick a link to click at random. You click again on the page you’ve reached, and again, several more times. What is the probability that you will end up looking at the Web page of Physics Today? Obviously, the chances can’t be too high, because about 10 billion pages exist on the open Web, and you’ve just been clicking at random. One would think that the chances would be about one in 10 billion. It’s obvious, right?

No, not really. Think about it. To get to http://www.physicstoday.org you must be on a page that points to it. If that page points to only a small number of pages, you will be more likely to get to Physics Today. Furthermore, your chances of having gotten to that page are higher if more pages point to it. So the distribution is not uniform at all. Your chances of landing on a particular page are influenced by the graph of the network, that is, the graph created by the hypertext links on the Web. In fact, currently the graph of the network puts Physics Today somewhere around the 90th percentile of likelihood, meaning your chances of landing on its page are closer to one in a billion: still not very high, but an order of magnitude better than uniform chance.
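A toy simulation makes the point concrete. The Python sketch below is not from the book; the four-page graph `LINKS`, the function name `surf`, and the fixed random seed are all invented here for illustration. It simply follows random links and records how often each page is visited.

```python
import random

# Hypothetical four-page web: each key links to the pages in its list.
# Every page has at least one out-link, so the surfer never gets stuck.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["D"],
    "C": ["D"],
    "D": ["A"],
}


def surf(links, steps, seed=0):
    """Click random links for `steps` clicks; return the fraction of visits per page."""
    rng = random.Random(seed)           # seeded so repeated runs agree
    counts = {page: 0 for page in links}
    page = rng.choice(sorted(links))    # start on an arbitrary page
    for _ in range(steps):
        page = rng.choice(links[page])  # follow one out-link, chosen uniformly
        counts[page] += 1
    return {p: c / steps for p, c in counts.items()}


visits = surf(LINKS, 100_000)
for page in sorted(visits):
    print(page, round(visits[page], 3))
```

On this graph the exact long-run shares work out to 3/8 each for A and D and 1/8 each for B and C, so the simulated fractions should land near those values rather than near the 1/4 that a uniform guess would suggest.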

How the Web graph affects the likelihood that a user will find a particular page and the mathematics of computing the probabilities for Web pages combine to form the basis of one of the most successful algorithms in the history of computing: PageRank. The algorithm is at the heart of the Google search engine. When you type a search query into Google, it retrieves the list of pages that contain your search words (a computing feat in itself) but, more important, it sorts those pages according to their “page rank,” the outcome of the algorithm applied to the many billions of Web pages. So understanding the algorithm is the key to understanding how search engines work and how the many derivative industries built on them operate. In fact, developing the first practical commercial version of the algorithm made Google’s founders Sergey Brin and Larry Page, who developed PageRank at Stanford University, two of the richest men in America.

In Google’s PageRank and Beyond: The Science of Search Engine Rankings, Amy Langville and Carl Meyer use the PageRank algorithm as the unifying theme to discuss the mathematics underlying search engines. Langville is an assistant professor of mathematics at the College of Charleston in South Carolina, and Meyer is a professor of mathematics at North Carolina State University in Raleigh. As easy as it is to explain the behavior of randomly surfing the Web, the actual mathematics of the Web graph involves Markov chains and the eigenvectors of extremely large matrices. Langville and Meyer present the mathematics in all its detail, which would make for a dry book if that were all they presented. But they vary the math with discussions of the many issues involved in building search engines, the “wars” between search engine developers and those trying to artificially inflate the position of their pages, and the future of search-engine development. The authors also include a number of asides and boxes that relate amusing anecdotes and discuss interesting issues that arise in the search-engine world.
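For readers who want a feel for what the Markov-chain formulation amounts to, here is a rough sketch in Python; it is not taken from the book. It assumes the commonly described damped random-surfer model, in which the surfer follows a random link with probability `alpha` and otherwise jumps to a page chosen uniformly at random. The graph `WEB`, the function name `pagerank`, and the value `alpha=0.85` are illustrative assumptions, not anything the review or the authors specify.

```python
# Hypothetical four-page web; page "D" links out but nothing links to it.
WEB = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary vector of the damped surfer chain."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page receives an equal share of the random-jump probability.
        new = {p: (1.0 - alpha) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Split this page's rank evenly among the pages it links to.
                share = alpha * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                for q in pages:
                    new[q] += alpha * rank[p] / n
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


ranks = pagerank(WEB)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The vector this loop settles on is the dominant left eigenvector of the chain's transition matrix, which is where the eigenvector language comes from. A production system would work with sparse matrices at a vastly larger scale, but the iteration has the same shape.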

Last year, I taught a course on Web architecture and programming in which one of the primary topics was PageRank and search-engine algorithms. While looking for a text to use for the course, I found several books on trying to improve the page rank of websites, and Google for Dummies (Wiley, 2003) by Brad Hill. But I found nothing that was appropriate for a graduate or advanced undergraduate course. Langville and Meyer’s book neatly fits that bill.

Although not a textbook, Google’s PageRank and Beyond makes good reading for anyone, student or professional, who wants to understand the details of search engines. For those interested in trying to implement search engines, the book can be invaluable. The authors, however, could have covered some additional search-related topics. For example, Langville and Meyer do provide Matlab code for many of their algorithms, but they could also have introduced Lucene, the open-source search engine. Their chapter on the future of search engines already seems a bit dated because they leave out such topics as Web 2.0, the Semantic Web, social networks, and multimedia search engines, all now starting to loom large on the ever-expanding search-engine horizon. But it seems unfair to criticize the authors for omissions in such a rapidly changing field as the science of the search engine, in which any book is doomed to need updating before it hits the market.