Mathematics of Genome Analysis, Jerome K. Percus. Cambridge U. Press, New York, 2002. $59.95, $19.95 paper (139 pp.). ISBN 0-521-58517-1, ISBN 0-521-58526-0 paper
The sequencing of the human genome alerted researchers to the importance of sequence data for modern molecular biology. Acquiring and interpreting that data requires powerful quantitative methods, and the rapidly growing field of computational biology develops such methods. Computational biology draws heavily on several disciplines (such as computer science, mathematics, statistics, and statistical physics), and in turn stimulates new research in those areas by posing new kinds of problems.
Mathematicians have led the way in computational biology. Many of their contributions are summarized in the textbook, Introduction to Computational Biology: Maps, Sequences, and Genomes (Chapman and Hall, 1995) by Michael S. Waterman, a leader in the field. Waterman’s book introduces many of the fundamental techniques of computational biology and focuses on real-world applications while maintaining mathematical rigor.
In Mathematics of Genome Analysis (in the Cambridge Studies in Mathematical Biology series) Jerome K. Percus takes a very different approach. As the book’s title suggests, Percus’s focus is mathematics rather than biological or computational application. His theme is the DNA molecule and its sequence, and indeed the book discusses many aspects of DNA, including sequencing and statistical properties of genomes, comparison of DNA sequences, and such physical properties of the DNA molecule as its melting behavior. Percus uses such practical questions about DNA and its sequence to showcase a variety of mathematical problems triggered by the biological questions, and to offer techniques for solving them. Many of those techniques (including stochastic processes described by the Fokker-Planck equation, correlation functions, power spectra, transfer matrices, and the WKB approximation) are rooted in physics. Others involve more mathematics, reflecting the breadth of Percus’s own research.
The book, based on a mathematics course that Percus taught at New York University, features a variety of assignments that exemplify the techniques and can be used for problem sets. Its moderate length is well suited to a textbook for a one-semester course, and its witty language makes it easy for the mathematically inclined reader to join the author in his obvious excitement. However, the dense technical detail and mathematical symbols demand very careful reading and at times obscure the bigger picture. One should definitely work through the text; it is not bedtime reading.
Because of the book’s focus on mathematics, I would not recommend it as a source for learning biology. Although the book gives biological background, it does so only to the extent needed to understand the mathematical problems. That limitation often leaves the reader with wrong impressions. One example is the chapter on determining DNA sequences. That chapter elaborates the statistics needed to sequence a randomly generated genome, but fails to mention that the main challenge in determining real-life genomes is the repetition of subsequences that are far longer than would be expected at random. Another example is the chapter on sequence comparison. By artificially restricting himself to DNA sequences, the author implies that they are the topic’s main application. However, most real applications compare protein sequences rather than DNA sequences. Reducing the important protein case to a side remark is especially puzzling, since it can be treated in the same way as the comparison of DNA sequences.
The biggest downside of the book is its references. The author admits the reference list is “very incomplete,” and I can confirm that, at least for my own area of expertise. Such a subjective choice of references may be adequate in a book written for experts, but for a textbook, I would prefer a bit more diligence. Another reference-related problem is that it is sometimes difficult to tell which parts of the book present other people’s results and which are the author’s own ideas.
In summary, despite its shortcomings in biology, Mathematics of Genome Analysis is a suitable textbook for a mathematics course aimed at raising awareness of the challenges that are posed by computational biology. It is also good first reading for mathematics students and professionals who want to get an idea of the exciting mathematical problems in the analysis of biological sequences.