In this era of big data, most researchers in the physical sciences will encounter statistics and data analytics at some point in their careers. And data science skills are not just relevant to the physical sciences—they are applicable to a wide array of modern problems in areas ranging from health care to marketing. Data Analysis Techniques for Physical Scientists by Claude Pruneau is thus of potential interest not only for physical scientists but also for those interested in other fields that deal with large data sets and their challenges.

Pruneau draws on his extensive research experience at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory and at the Large Hadron Collider at CERN, and his book contains much of the fundamental knowledge required for graduate students studying nuclear and high-energy physics. As such, it lies in a sparsely populated middle ground in the literature. Books specifically about data analysis techniques, such as The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition, 2009) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, usually assume some prior knowledge of basic statistics and contain little introductory material. The few books about data analysis methods for physical scientists, including Roger Barlow’s Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (1989) and Brian Martin’s Statistics for Physical Sciences: An Introduction (2012), are aimed primarily at undergraduates.

Data Analysis Techniques for Physical Scientists, in contrast, presents a comprehensive, high-level treatment of topics specific to nuclear- and particle-physics experiments. It is also accessible to researchers who work with multidimensional data sets and are interested in learning about the data analysis methods used in large-scale particle experiments, a topic that’s typically not covered in general statistics texts.

The introductory chapter takes the reader on a philosophical journey that sets the tone for the rest of the book. Using anthropological and historical arguments, it explains the meaning and purpose of the scientific method. The chapter illustrates that statistical methods are essential for extracting significant scientific results. It is a delightful read for scientists and nonscientists alike.

The 13 chapters that follow are divided into three parts. Each chapter closes with problems designed to deepen students’ understanding of the concepts discussed. Those exercises allow the reader to derive relevant formulas by means of creative and realistic examples often taken from actual experiments.

The first part of the book, “Foundation in Probability and Statistics,” gives a thorough and mathematically rigorous tour of its topic that could in fact be a standalone introduction to advanced statistics for a broad range of readers. The section features a modern account of the frequentist and Bayesian interpretations of probability. The in-depth catalog of the language of statistics and probability is a welcome resource that the reader will be able to reference as needed.

Part 1 also contains three chapters devoted to classical inference, a formal introduction to confidence intervals and hypothesis testing, and an excellent review of Kalman filtering. A study of Bayesian inference methodology brings part 1 to a conclusion.

Part 2, “Measurement Techniques,” is equally outstanding. Discussions of particle decays, cross sections, and corresponding observables are followed by thorough treatments of particle identification, event reconstruction, and correlation functions. Instrumental effects, detection efficiency, and unfolding methods also receive extensive consideration. The brief part 3, “Simulation Techniques,” highlights Monte Carlo methods.

Some topics are missing from this otherwise thorough 716-page book. As he admits in his preface, Pruneau does not go into the details of detector technologies. In my view, that gap is not detrimental to a reader interested in learning specifically about data analysis. However, some missing topics deserved at least a brief mention. A word on minimizing experimenters’ bias by performing blind analyses would be beneficial, especially to graduate students entering the field. Overviews of some commonly used advanced algorithms, such as neural networks and decision trees, would complete this modern data analysis practitioner’s toolbox. Fortunately, those topics are covered elsewhere—for example, interested physical-sciences students may find a good supplemental read in Adrian Bevan’s Statistical Data Analysis for the Physical Sciences (2013).

Data Analysis Techniques for Physical Scientists offers an accessible but rigorous and comprehensive presentation of data analysis techniques in modern large-scale experiments. Furthermore, much of the book is applicable beyond the physical sciences; it is a useful resource on probability and statistics that would benefit anyone who works with large data sets. Taken as a whole, it is an exceptional general reference for graduate students and seasoned experimental researchers alike.

Emilie Martin-Hein earned a PhD in high-energy physics at the University of California, Irvine, in 2009 and a data mining and applications graduate certificate at Stanford University in 2017. She currently works at Skyline College and City College of San Francisco in California.