Paul Steinhardt has criticized1 multiverse inflationary cosmology as unfalsifiable because it was flexible enough to explain both negative and positive results of the BICEP2 experiment (see the Commentary by Mario Livio and Marc Kamionkowski, Physics Today, December 2014, page 8). That criticism has resulted this past year in a renewed interest in the old debate: What defines the scientific method?2 What makes a good physical theory? While the underlying inflationary theory is mathematically sophisticated and modern, the current debate itself has been surprisingly qualitative, much as it might have been five decades ago, when Karl Popper brought falsifiability into the spotlight. Such data-less arguments, often binary to the extreme (for example, over whether falsifiability should be retired altogether3), seem out of place in the data-driven, nuanced scientific world.

In fact, we scientists already have a mathematical framework to deal with falsifiability quantitatively. It is based on statistical principles that have long been a part of science. In particular, falsifiability is not an independent concept: Its graded, real-valued generalization emerges automatically from the empirical nature of science, much like the way Occam’s razor transformed itself from a qualitative philosophical principle into a statistical result.4,5 

The emergence of falsifiability from statistical inference is easiest to see in the language of Bayesian statistics. Suppose we want to decide which of two theories, T1 and T2, explains the world better. Our a priori knowledge of that is summarized in Bayesian priors, P1 and P2. After experimental data x are collected, the ratio of posterior probabilities of the theories is given by Bayes’s theorem, P(T1|x)/P(T2|x) = [P(x|T1)P1]/[P(x|T2)P2], where P(x|T1) and P(x|T2) are the likelihood terms—the probabilities of obtaining the observed data within each theory. The likelihood increases when the theory “fits” the data. However, because probabilities must be normalized, the likelihood scales inversely with the total number of typical data sets that could have been generated within the theory. The tradeoff between the quality of fit and the statistical complexity is known as Bayesian model selection, and it is used routinely in modern statistics. It provides an automatic Occam’s razor against statistically complex theories, one that depends only weakly on the specifics of the priors.
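To make that tradeoff concrete, here is a minimal numerical sketch of the posterior-odds calculation; the counts of allowed data sets and the prior odds below are invented solely for illustration.

```python
# A minimal sketch of Bayesian model selection; all numbers are invented.
# T1 and T2 stand in for any two theories being compared.

def posterior_odds(lik1, lik2, prior1, prior2):
    """Bayes's theorem: P(T1|x) / P(T2|x) = [P(x|T1) P1] / [P(x|T2) P2]."""
    return (lik1 * prior1) / (lik2 * prior2)

# Suppose T1 is compatible with only 10 experimentally distinct data sets,
# while the more flexible T2 is compatible with 1000. If each theory spreads
# its probability uniformly over its allowed data sets, then observing a
# data set allowed by both gives the likelihoods:
p_x_given_T1 = 1 / 10
p_x_given_T2 = 1 / 1000

# Even with prior odds of 10 to 1 in favor of the flexible theory T2,
# the normalization of the likelihood (the automatic Occam's razor)
# overturns them:
print(posterior_odds(p_x_given_T1, p_x_given_T2, prior1=1 / 11, prior2=10 / 11))
# -> ~10: the posterior odds now favor the more constrained theory T1
```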

At an extreme, any data set is equally compatible with an unfalsifiable theory and hence can come from it with the same probability. Thus the likelihood is the inverse of the total number of possible, experimentally distinct data sets. In contrast, a falsifiable theory is incompatible with some data and hence has a higher probability of generating other, compatible data. The difference between the theories grows with the number of conducted experiments. Within Bayesian model selection, then, any falsifiable theory that fits the data well wins eventually, unless the unfalsifiable theory started with astronomically higher a priori odds. For example, as biologist J. B. S. Haldane pointed out, evolution cannot generate “fossil rabbits in the Precambrian.” Bayesian model selection therefore leads to an immediate empirical, quantitative choice of evolutionary theory over creationism as the best explanation of the fossil record, without the need to reject creationism a priori as unscientific.
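A toy calculation along the same lines, again with invented numbers of experimental outcomes, shows how quickly the evidence accumulates in favor of a falsifiable theory whose predictions keep coming true.

```python
# A toy model, with made-up numbers, of how the posterior odds between a
# falsifiable and an unfalsifiable theory grow with the number of experiments.
# Each experiment has M distinguishable outcomes. The unfalsifiable theory is
# compatible with all of them (likelihood 1/M per experiment); the falsifiable
# theory is compatible with only K of them (likelihood 1/K), and its likelihood
# drops to zero the moment an incompatible outcome is observed.

M = 100  # experimentally distinguishable outcomes per experiment
K = 10   # outcomes compatible with the falsifiable theory

def odds_after(n_experiments, prior_odds=1.0):
    """Posterior odds (falsifiable : unfalsifiable) after n compatible outcomes."""
    return prior_odds * (M / K) ** n_experiments

for n in (1, 5, 10):
    print(n, odds_after(n))
# 1 10.0
# 5 100000.0
# 10 10000000000.0
# The falsifiable theory wins exponentially fast as long as its predictions
# hold; a single "fossil rabbit in the Precambrian" would instead set its
# likelihood, and hence its posterior odds, to zero.
```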

In other words, there is no need to require falsifiability of scientific theories: The requirement emerges automatically from statistical principles, on which empirical science is built. Its statistical version is more nuanced, as has been recognized by philosophers.5 The practical applications are hard and require computing probabilities of arbitrary experimental outcomes. In fact, it was an error in such a computation that rekindled the current debate. In addition, there is an uncomfortable possibility that statistics can reject a true theory that just happens to be unfalsifiable. Yet, crucially, statistical model selection is quantitative and evidence driven; it potentially moves the inflationary multiverse debate and similar discussions from the realm of philosophy to that of empirical, physical science. Although inflation predicts many different worlds, it is incompatible with others—the theory is not completely unfalsifiable. One can hope to end the long-running arguments about its scientific merits by calculating the relevant likelihood terms.

2. G. Ellis, J. Silk, Nature 516, 321 (2014).
3. S. Carroll, “2014: What scientific idea is ready for retirement?” https://edge.org/response-detail/25322 (2014).
4. D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge U. Press (2003).
5. E. Sober, Ockham’s Razors: A User’s Manual, Cambridge U. Press (2015).