Let me begin by saying that I greatly enjoyed reviewing this book, for the simple reason that it contains a lot of comments that are open invitations to go dig into the background; I will leave some of these diversions until later. The author calls this book a “technical narrative” as opposed to a text. My suspicion is that it might be more effective than many texts, particularly when used as “case studies” combined with a more standard text. As in most texts, some of the material presented can be, and usually is, misused; more on that later.
One of the things I like is the author's liberal use of quotes, partly, I suspect, because I find it difficult to get students to pay enough attention to comments such as George Box's (on the gas law, pV = nRT): “All models are wrong, but some are useful.” Students, unfortunately, seem more attuned to “Theorems.” There are lots of these too, many concerning the derivation of useful probability distributions, but presented in the informal manner that I prefer. Several are derived in more than one way. Neat!
One of the “rabbit holes” this book opened (p. 343) was the speculation on why Shannon used H for entropy and why he called it “entropy.” As I spent 18 years in what had been Shannon's department at Bell Labs (although he had moved to MIT before I joined it), this captured my interest. The first part of the question can be settled by looking at Boltzmann's Collected Works. In 1872 he was using an ordinary Roman E, but by his 1895 paper [L. Boltzmann, “On certain questions of the theory of gases,” Nature 51, 413–415 (1895)] he had switched to H. Exploring the more interesting speculation of whether Shannon discussed entropy with Von Neumann leads to contradictory accounts. The first is a 1964 book chapter by Tribus (then Dean of the Thayer School of Engineering at Dartmouth), expanded on in a 1971 Scientific American article, “Energy and Information,” in which he describes a 1961 interview with Shannon, who recalled meeting with Von Neumann. Then there is a 1984 interview with the equally distinguished Robert Price in IEEE Communications Magazine, where Shannon comments, “I'm quite sure that it did not happen between Von Neumann and me.” My suspicion is that the key is not in Shannon's recollection, which was getting shaky when I talked to him at the Brighton International Symposium on Information Theory (ISIT) in 1985, but in Von Neumann's personality. Would the man who wrote “if the discipline is under the influence of men with an exceptionally well-developed taste” miss an opportunity to get the measure of a hot new postdoc from MIT who was already destined for Bell Labs Research?
Parts of Chapters 3, 4, 6, 9, and 10 cover analyses of random processes and, in particular, power spectra and related problems. Estimation of power spectra is my specialty and, in my opinion, these parts have value. They are very nicely presented and convey the essence of the problems well. However, most of the methods covered are “classical” in the sense that they use periodograms essentially in the original form suggested by Stokes in 1879 and named by Schuster in 1898. Lord Rayleigh's 1903 paper “On the spectrum of an irregular disturbance” proved that the periodogram is inconsistent, that is, that its variance does not decrease with sample size. Now I share Silverman's admiration for Fermi, and more than once I have heard Fermi quoted as having said, “Most great scientists have one idea in their working life, except for Lord Rayleigh who had two.” So, given that Rayleigh proved the method has a serious problem, I find it particularly distressing that physicists are still using it more than a century later. The periodogram is not the spectrum, but only a particularly poor estimate of the spectrum. The spectrum was rigorously defined by Wiener, Cramér, and others in the 1930s, and is possibly best summarized in Doob's 1953 book.
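For readers who have not seen Rayleigh's result stated formally, the standard modern statement is a short sketch in my own notation (not Silverman's or Rayleigh's), for a stationary Gaussian series sampled at unit intervals:

```latex
% Periodogram of N samples x_0, ..., x_{N-1} at frequency |f| <= 1/2:
\hat{S}_{P}(f) \;=\; \frac{1}{N}\,\Bigl|\sum_{t=0}^{N-1} x_t\, e^{-i 2\pi f t}\Bigr|^{2}.
% For a stationary Gaussian process with spectrum S(f), at interior frequencies
% the periodogram is asymptotically distributed as
\hat{S}_{P}(f) \;\sim\; S(f)\,\frac{\chi^{2}_{2}}{2},
\qquad\text{so}\qquad
\operatorname{Var}\bigl\{\hat{S}_{P}(f)\bigr\} \;\longrightarrow\; S^{2}(f),
% independent of N: more data does not reduce the variance.
```

That is, doubling the record length refines the frequency grid but does nothing to reduce the scatter of the estimate at any one frequency.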
A second problem with the periodogram, compared to which inconsistency is minor, is bias. I have personally encountered real data where the periodogram is biased by factors exceeding 10⁷ over much of the frequency range [e.g., Fig. 18, D. J. Thomson, “Spectrum estimation techniques for characterization and development of the WT4 waveguide—II,” Bell Syst. Tech. J. 56, 1983–2005 (1977), and Fig. 1, D. J. Thomson and C. L. Haley, “Spacing and shape of random peaks in non-parametric spectrum estimates,” Proc. R. Soc. A 470, 20140101 (2014)]. A factor of 10⁷ would be large even for a politician discussing the national debt, but such errors have no place in physics. Let me stress that in Silverman's radioactive decay example, a departure of 1 dB from a white spectrum would be an epochal discovery, so bias will not be a problem there. However, if your work involves barometric pressure, space physics, geomagnetic, or seismic data, to name a few, you should expect to encounter data where the spectrum has a range (max/min) of at least 10⁸, and often much more, and in these cases the periodogram will simply give a wrong answer. It would be surprising if subsequent physical inferences based on such an estimate were better. Try a multitaper estimate.
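For readers who want to experiment, here is a minimal sketch of the comparison in Python with NumPy and SciPy. The resonant AR(4) test series, the parameter choices (NW = 4, seven tapers), and the floor frequency are my own illustration, not Silverman's data, and the unweighted eigenspectrum average below omits the adaptive weighting of the full multitaper method:

```python
import numpy as np
from scipy.signal import lfilter, periodogram, freqz
from scipy.signal.windows import dpss

rng = np.random.default_rng(0)
N = 4096

# Illustrative series with a large spectral dynamic range: unit-variance white
# noise driven through a sharply resonant AR(4) filter (a standard test case,
# not one of Silverman's examples).
ar = [1.0, -2.7607, 3.8106, -2.6535, 0.9238]
x = lfilter([1.0], ar, rng.standard_normal(N))

# Raw periodogram (boxcar taper): inconsistent and leakage-biased for this series.
f, S_per = periodogram(x, window="boxcar")

# Simple multitaper estimate: average the eigenspectra from K Slepian (DPSS)
# tapers with time-bandwidth product NW; no adaptive weighting in this sketch.
NW, K = 4.0, 7
tapers = dpss(N, NW, Kmax=K)                      # shape (K, N)
S_mt = (np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2).mean(axis=0)

# True spectrum of the AR(4) process, for reference.
_, h = freqz([1.0], ar, worN=f, fs=1.0)
S_true = np.abs(h) ** 2

def apparent_range_db(S, f, f_floor=0.35):
    """Peak level minus the median level above f_floor, in dB (shape only)."""
    return 10 * np.log10(S.max() / np.median(S[f > f_floor]))

for name, S in [("true", S_true), ("periodogram", S_per), ("multitaper", S_mt)]:
    print(f"{name:12s} apparent dynamic range: {apparent_range_db(S, f):5.1f} dB")
```

The median of the high-frequency bins is used rather than the minimum so that the comparison reflects the leakage floor of each estimate rather than the downward excursions produced by the periodogram's variance.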
With the climate examples, Silverman is correct that there is a definite warming trend (it is at least an 8σ effect), but a periodogram would be unable to detect that the year measured in temperature is offset from the calendar year by precession. (Unfortunately, none of the 24 IPCC climate models tested in 2009 reproduced this; see my News & Views column [D. J. Thomson, “Climate change: Shifts in season,” Nature 457, 391–392 (2009)] on that study.) Subtracting a periodic term with a slightly wrong length of year gives a nasty bias, but filtering the precession effects out of the data allows one to see that the solar sensitivity in global temperature data agrees closely with Stefan–Boltzmann: 0.0529 °C/(W/m²) from the data versus 0.0527 from theory. The same analysis gives a sensitivity of 2.94 °C for doubling atmospheric CO₂.
A very minor quibble is that the Wiener–Khintchine theorem (proving that the spectrum and the autocovariance sequence are a Fourier transform pair) should properly be known as the Einstein–Wiener–Khintchine theorem. Einstein gave the first proof of it in 1914, some 20 years before either Wiener or Khintchine. The great Russian probabilist Yaglom noted [A. M. Yaglom, “Einstein's 1914 paper on the theory of irregularly fluctuating series of observations,” IEEE ASSP Mag. 4, 7–11 (1987)] that Einstein's derivation is more satisfactory than either of the later two. The zero-padded periodogram and the Bartlett estimate of the autocovariances are also a Fourier transform pair. Consequently, the bias and variance of the periodogram (recall Rayleigh's proof that it is inconsistent) imply that the autocovariance estimates are also unreliable. McWhorter and Scharf [L. T. McWhorter and L. L. Scharf, “Multiwindow estimators of correlation,” IEEE Trans. Signal Process. 46, 440–448 (1998)] showed that the definition of the autocovariances implies a multitaper estimate. This is relevant because the Akaike information criterion (AIC), used to compare parametric models in Chapter 9, depends on the prediction error. Wiener and Kolmogorov showed that the prediction variance is the geometric mean of the spectrum, so if one uses a bad spectrum estimate, here the periodogram, one gets a similarly biased AIC. Before leaving time series analysis, I note another recent book on the subject, Mills, The Foundations of Modern Time Series Analysis (Palgrave Macmillan, Houndmills, Basingstoke, Hampshire, England, 2011), written in a similar style but from a different viewpoint. (Here, I have wondered if “Foundations” should be “History.”) Parts of both books will be required reading for my time series course next winter.
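For readers who want the two results at issue here in symbols, stated in my own discrete-time notation (unit sampling, spectrum S(f) on |f| ≤ 1/2, autocovariance R(τ)), not the book's:

```latex
% Einstein--Wiener--Khintchine: the spectrum and the autocovariance sequence
% are a Fourier transform pair.
S(f) \;=\; \sum_{\tau=-\infty}^{\infty} R(\tau)\, e^{-i 2\pi f \tau},
\qquad
R(\tau) \;=\; \int_{-1/2}^{1/2} S(f)\, e^{\,i 2\pi f \tau}\, df .

% Kolmogorov--Szego (the Wiener--Kolmogorov prediction result cited above):
% the one-step prediction variance is the geometric mean of the spectrum.
\sigma_{\infty}^{2} \;=\; \exp\!\left\{ \int_{-1/2}^{1/2} \ln S(f)\, df \right\}.
```

Because the spectrum enters through its logarithm, a multiplicative bias in the spectrum estimate feeds straight into the estimated prediction variance, and hence into the AIC comparison.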
Chapter 8, “The Guesses of Groups,” was particularly impressive. It corrects a common misinterpretation of Galton's observation that the average estimate of the weight of an ox at a fair was better than the individual estimates. This, however, was not the “wisdom of crowds,” as commonly represented, but the wisdom of people whose living depended on being able to estimate accurately the yield of meat from a live animal. The fun part of this chapter, though, is the description of Silverman's experiments with estimates made by his physics class and elsewhere. Future classes, beware!
In summary, this is a great book. It has a few defects but these are minor. I particularly like the way statistics and physics are interwoven. The notation is clean and comprehensible, much like that in Kendall and Stuart's “Advanced Theory of Statistics,” and not obfuscated by a lot of the pretentious notation that has contaminated many more recent books. Although it is not designed as a text, it is definitely going on my recommended reading list.
David J. Thomson, FRSC, is a Professor and Canada Research Chair in Statistics and Signal Processing in the Department of Mathematics and Statistics at Queen's University, Kingston, Ontario. He holds a Ph.D. in Electrical Engineering and is both a Chartered Statistician and a Licensed Professional Engineer. His research is on statistical spectrum estimation, a field in which he is best known for inventing the multitaper method [D. J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. IEEE 70, 1055–1096 (1982)]. He also works on climate, space physics, seismology, and other applications that intrigue him.