Introduction to Mathematical Methods in Bioinformatics, Alexander Isaev, Springer-Verlag, New York, 2004. $59.95 paper (294 pp.). ISBN 3-540-21973-0

How are biological-sequence data organized? What is the probability that two sequences in a database—such as GenBank, the NIH genetic-sequence database—are related? And if they are related, how long ago did they diverge? How can the evolutionary history of a family of related proteins be reconstructed from observations today? What is the protein-folding problem, and how can it be solved?

In Introduction to Mathematical Methods in Bioinformatics, Alexander Isaev describes the mathematical foundations of bioinformatics, the field that addresses these questions. The science started as a practical subfield of molecular biology, grew as a discipline in computer science, and now benefits from contributions from mathematics, statistics, engineering, and physics. At the same time, interest in bioinformatics among undergraduate and graduate students majoring in all of those fields is growing.

Going beyond algorithms and their implementation in software, Isaev discusses broadly the fundamental mathematical, statistical, and physical theories that are used today in bioinformatics and computational biology. As a mathematician, he is well positioned to detail the mathematical principles on which the analyses of DNA, RNA, and protein-sequence data are based.

The book succeeds in describing the fundamental mathematics behind computational sequence analysis, and in giving a taste of typical applied bioinformatics. For example, Isaev presents exact methods of dynamic programming for sequence alignment to motivate the description of heuristic alignment algorithms used in practice. The statistical significance of the resulting alignment is related to extreme-value statistics of random walks. The author details the theory of Markov models and hidden Markov models used to find genes; he then discusses the practical aspect of training such models. His discussion of the theory of continuous-time Markov processes and the convergence of a Markov chain to a unique probability distribution complements his treatment of the molecular-clock hypothesis, phylogenetic trees, evolutionary models, and construction of substitution matrices.
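To give a flavor of the exact dynamic-programming methods mentioned above, here is a minimal Python sketch of global alignment scoring in the style of the Needleman-Wunsch algorithm. It is an illustration only and is not taken from Isaev's text: the function name and the scoring parameters (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for simplicity, whereas practical tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b.

    Assumed scoring scheme (illustrative only): +1 per match,
    -1 per mismatch, -2 per gap position (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table is filled in time proportional to the product of the two sequence lengths, which is why heuristic methods are preferred for searches against large databases.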

The historical trend of bioinformatics to focus mostly on computational sequence analysis and less on protein structure and function is apparent in the book, although Isaev does devote a chapter to protein folding. The discussion of protein folding, and computational biology in general, is less complete than the rather thorough discussions of sequence analysis in the other eight chapters of the book. The author’s relatively light discussion of protein structure and function limits, to some extent, the apparent connection of bioinformatics with physics.

Isaev’s book may be difficult to use as a standalone text for an advanced undergraduate course in biological physics. The motivated instructor might reduce the emphasis on sequence analysis and increase the coverage of more general aspects of computational biology. For example, I would establish the connection between statistical physics and the calculation of protein structure and function, and then calculate some examples of partition functions. Much of the material in the second half of the text is, on the one hand, too formal for typical advanced undergraduate and graduate physics students and, on the other, too restrictive in assumptions. For instance, the book excludes Dirac delta functions from probability distributions and ignores the problem of slow convergence of distribution tails to the Gaussian central limit.

The book offers an excellent number of examples, and instructors may use a selected set of them to offset the rather formal discussions in the second half of the text. In addition, tools from mathematical physics might be used to simplify some of the topics covered. For example, one could use Fourier-transform techniques and Gaussian statistics to provide an alternative derivation for the significance of sequence-alignment scores from random-walk models. One could also use cumulant expansions or steepest-descent methods and the Cramér function in a discussion of the rare events that lead to nonuniform convergence of a sum of random variables to the central limit. The discussion of maximum-likelihood estimation might be simplified to introduce readers to estimation theory and its importance as a statistical tool.
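As a sketch of the rare-event argument suggested here, consider the standard large-deviation result for a sum of independent, identically distributed variables. The notation is my own and is not drawn from the book: X_i has mean \mu and variance \sigma^2, S_n = X_1 + ... + X_n, and the cumulant generating function \lambda(k) is assumed to be finite near k = 0.

```latex
\lambda(k) = \ln \left\langle e^{kX} \right\rangle , \qquad
I(x) = \sup_{k} \left[ kx - \lambda(k) \right] , \qquad
P\!\left( \frac{S_n}{n} \approx x \right) \asymp e^{-n I(x)} .

% Near the mean the Cramér function is quadratic, which recovers the Gaussian:
I(x) = \frac{(x-\mu)^2}{2\sigma^2} + O\!\left( (x-\mu)^3 \right) .
```

The higher-order terms in I(x) are negligible only when x is close enough to \mu, which is one way to see why the tails of the distribution approach the Gaussian form more slowly than the center does.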

Introduction to Mathematical Methods in Bioinformatics is a strong description of the theory behind the standard methods of computational sequence analysis. The book serves as a springboard for considering current bioinformatics research problems—such as the analysis of gene chip data—whose solutions entail a mixture of mathematics, statistics, engineering, and physics. With some additional work on the reader’s or instructor’s part, the text may also serve as an introduction to computational biology in general.