Statistical Mechanics of Learning, A. Engel and C. Van den Broeck, Cambridge U. Press, New York, 2001. $110.00, $39.95 paper (329 pp.). ISBN 0-521-77307-5, ISBN 0-521-77479-9 paper
In recent years physicists with an interest in statistical mechanics, in their search for interesting problems, have strayed increasingly far from their traditional home area of actual physical systems. One distant field in which they have had a significant impact is learning theory. Statistical Mechanics of Learning, by Andreas Engel of the University of Magdeburg and Chris Van den Broeck of the Limburg University Center, summarizes the results that have been achieved. The authors have themselves been in the thick of this action, and they give an exceptionally lucid account not only of what we have learned but also of how the calculations are done.
Learning theory has a long history, dominated in its development by statisticians, computer scientists, and mathematical psychologists. The field’s core problem is essentially one of statistical inference. The following simple example illustrates the main points: Suppose we have a function—a rule, or input–output relation—that is implemented by some machine. (I use “machine” in a very general sense; it could be an animal or some other natural phenomenon. All that is necessary is that its output depend on its input.) We do not know in detail how the machine actually works; all we can do is observe and measure its response to some set of inputs. The general question is, then: What can we infer about the function on the basis of this example set of input–output pairs? If we try to make a machine of our own based on these examples, how well can we expect it to imitate the original machine?
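In a common formalization (my notation here, not necessarily the book's), the examples are input–output pairs generated by the unknown rule f, and the quantity of interest is the chance that an imitation built from those examples disagrees with f on a fresh input:

$$
D = \{(\boldsymbol{\xi}^{\mu}, \sigma^{\mu})\}_{\mu=1}^{p}, \quad \sigma^{\mu} = f(\boldsymbol{\xi}^{\mu}); \qquad
\varepsilon_g(\hat f) = \Pr_{\boldsymbol{\xi}}\bigl[\hat f(\boldsymbol{\xi}) \neq f(\boldsymbol{\xi})\bigr],
$$

the generalization error of the learned rule $\hat f$.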
Like the mathematicians before them, statistical physicists naturally focus on some simple model systems for which one can hope to calculate something nontrivial. The ones most studied are simple computational networks having layered structures. One can prove that such a machine can approximate any continuous function of its input variables arbitrarily well with just one layer of simple computational units between input and output. So such studies have had quite general implications.
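Schematically, such a network with one intermediate layer of K units computes a sum of the form

$$
\sigma(\boldsymbol{\xi}) = \sum_{k=1}^{K} v_k\, g\bigl(\mathbf{w}_k \cdot \boldsymbol{\xi} + b_k\bigr),
$$

where g is a simple nonlinearity such as a sigmoid or threshold; the classic universal-approximation theorems say that, with enough units K, expressions of this form can come arbitrarily close to any continuous target function on a bounded domain. (The notation is mine, not the book's.)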
The unique contribution that statistical physicists were able to make to this work was, not surprisingly, the calculation of average properties in the “thermodynamic limit” in which both the size of the network and the number of examples are taken to infinity. This average-case analysis nicely complemented the many other analyses, which focused mostly on worst-case scenarios, often for finite networks.
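Concretely (again in common notation rather than the book's), with N the number of adjustable weights and p the number of examples, the limit is taken with their ratio held fixed,

$$
N \to \infty, \qquad p \to \infty, \qquad \alpha = p/N \ \text{fixed},
$$

and quantities such as the generalization error $\varepsilon_g(\alpha)$ are computed as averages over random example sets; in this limit they are typically self-averaging, so the average also describes the typical case.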
To obtain generic knowledge one has to consider random distributions of examples, which places the problem in the realm of disordered system theory (as originally developed for alloys and polymers). Elizabeth Gardner and her coworker Bernard Derrida pioneered the application to these networks of methods from spin glass theory and thus opened the door to hundreds of subsequent investigations that have provided much new insight into learning systems. The models, methods, and results are the focus of Engel and Van den Broeck’s book.
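The central object in Gardner's approach, sketched here in standard notation rather than the book's, is the volume of weight space consistent with all p examples; its quenched average over the random examples is handled with the replica identity familiar from spin glasses:

$$
V = \int d\mu(\mathbf{J}) \prod_{\mu=1}^{p} \theta\!\Bigl(\sigma^{\mu}\, \frac{\mathbf{J}\cdot\boldsymbol{\xi}^{\mu}}{\sqrt{N}}\Bigr),
\qquad
\langle\!\langle \ln V \rangle\!\rangle = \lim_{n\to 0} \frac{\langle\!\langle V^{n} \rangle\!\rangle - 1}{n},
$$

where $\theta$ is the step function and $d\mu(\mathbf{J})$ is the measure on the weight vectors, typically uniform on the sphere $|\mathbf{J}|^2 = N$.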
The book starts by orienting the nonexpert reader to the basic concepts in the field and then illustrates those concepts for the simplest kind of machine, the perceptron—a machine that simply computes a weighted sum of its inputs and gives a 1 or 0 output, depending on whether the sum is above or below a threshold. The authors then further develop the statistical mechanical framework, including the “replica” methods from spin glass theory, and give the reader simple yet nontrivial examples of phase transitions. Subsequent chapters treat topics such as data clustering, the statistical dynamics of learning, the multifractal structure of a problem's parameter space, and more complex networks. There is also a nice chapter relating the results obtained by these methods to those found by other techniques.
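As a concrete illustration of the simplest setting the book analyzes, the sketch below (my own illustration, not code from the book) sets up a teacher–student pair of perceptrons: the teacher plays the role of the unknown machine, the student is trained on p = αN random examples with the elementary Hebb rule, and the overlap R between the two weight vectors gives the generalization error ε = arccos(R)/π, the large-N probability that teacher and student disagree on a fresh random input.

```python
import numpy as np

# Minimal teacher-student perceptron sketch (illustrative notation, not the book's).
rng = np.random.default_rng(0)

N = 500                    # input dimension (number of weights)
alpha = 5.0                # examples per weight
p = int(alpha * N)

# Teacher: a random unit weight vector defining the unknown rule
teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)

# Random +-1 inputs, labelled by the teacher (zero threshold for simplicity)
xi = rng.choice([-1.0, 1.0], size=(p, N))
labels = np.sign(xi @ teacher)

# Hebbian student: sum of the inputs weighted by their labels
student = (labels[:, None] * xi).sum(axis=0)
student /= np.linalg.norm(student)

# Overlap R and the corresponding large-N generalization error arccos(R)/pi
R = float(student @ teacher)
eps = np.arccos(R) / np.pi
print(f"overlap R = {R:.3f}, generalization error ~ {eps:.3f}")
```

Increasing alpha drives the overlap toward 1 and the error toward zero; learning curves of exactly this kind are what the replica calculations in the book produce analytically.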
Given the highly technical nature of the calculations, the presentation is miraculously clear, even elegant. Although I have worked on these problems myself, I found, in reading the chapters, that I kept getting new insights. And for someone interested in applying these methods to other problems (perhaps joining in the current work on error-correcting codes and hard optimization problems, which is sketched in the final chapter), I can’t think of a better place to learn the techniques. In fact, for readers at all levels of interest, I highly recommend this book as a way to learn what statistical mechanics can say about an important basic problem.