My December 2021 editorial elicited an unusually high number of emails sent directly to me: three. The first to arrive came from Samantha Holland, who is the audio–video archivist at the American Institute of Physics’s Niels Bohr Library and Archives. (AIP publishes Physics Today.) Holland asked me if the editorial’s title, “It’s all too much,” was an allusion to the Beatles’ song of the same name. Yes, I confirmed.
The editorial was inspired by a paper by Johan Chu of Northwestern University and James Evans of the University of Chicago.1 Having analyzed 1.8 billion citations of 90 million papers in 10 scientific fields, the pair concluded that as the number of papers in a field increases, researchers find it harder to recognize innovative work and scientific progress slows.
My second email correspondent, retired particle physicist Dick Land, told me about a past instance of innovative work that failed to achieve recognition: John James Waterston’s 1845 paper “On the physics of media that are composed of free and perfectly elastic molecules in a state of motion.” Rejected by the Philosophical Transactions of the Royal Society, the paper languished in the society’s archives until Lord Rayleigh, having encountered a reference to it, retrieved it. He grasped its significance. In his view, the failure to publish it promptly retarded the development of the kinetic theory of gases by 10–15 years. An engaging account of the rediscovery of Waterston’s paper appeared in John Howard’s From the Editor column in the May 1969 issue of Applied Optics.
Judy Lamana’s email to me acknowledged that Chu and Evans’s predictions “seem inevitable.” Nevertheless, she went on to propose a way to forestall them: Each paper should come with a concise table that identifies whether the paper describes a method, furthers existing ideas, or intends to be disruptive. She would also like papers to include a declaration about how they were reviewed, for example, “blind as to an author’s gender, and affiliations,” as she put it.
I like Lamana’s idea of an at-a-glance way to evaluate a paper’s novelty. Some journals already offer something similar. Papers in the Proceedings of the National Academy of Sciences include a distinctive blue box on the first page that outlines a paper’s significance in more or less plain English. Papers in Geophysical Research Letters include not just a lay-language summary but also a bulleted list of key points.
But such an approach, however helpful, has two drawbacks. First, the summary and key points are generated by authors and are therefore not impartial. Second, although they make it easier to decide whether to read the whole paper, you still have to read each summary. An ideal system for identifying innovative research would be unbiased and automatic. Is such a system possible?
A portent of a truly automatic method came my way recently in the form of a paper by Brian Thomas and others.2 They evaluated the feasibility of using machine learning to identify research priorities in astronomy. Specifically, they applied natural language processing to evaluate the prevalence of topics in two sets of bibliographic data: the abstracts of papers published in 1998–2010 in 10 top astronomy journals and the chapters of the 2010 decadal survey of astronomy and astrophysics that were devoted to the frontiers of astronomical science.
Thomas and company found a significant but modest correlation between the prevalence of topics in the two data sets. Evidently, the priorities identified by the survey for the upcoming decade reflected the topics that astronomers most actively published on in the previous decade.
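For readers who want to see the shape of such a comparison, here is a minimal sketch in Python. It is not Thomas and company's method: simple keyword matching stands in for their natural language processing, and the topic list, the sample texts, and the function names are all hypothetical. It measures how prevalent each topic is in two sets of documents and then computes a rank correlation between the two sets of prevalences.

```python
from typing import Dict, List, Sequence


def topic_prevalence(docs: Sequence[str], topics: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of documents that mention at least one keyword of each topic."""
    prevalence = {}
    for name, keywords in topics.items():
        hits = sum(1 for doc in docs if any(k in doc.lower() for k in keywords))
        prevalence[name] = hits / len(docs)
    return prevalence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical topics and toy texts, for illustration only.
TOPICS = {
    "exoplanets": ["exoplanet", "transit"],
    "dark energy": ["dark energy", "supernova"],
    "galaxies": ["galaxy", "galaxies"],
}
ABSTRACTS = [
    "Transit photometry of a hot exoplanet",
    "Exoplanet atmospheres from transit spectra",
    "Supernova distances and dark energy",
    "Morphology of a nearby spiral galaxy",
]
SURVEY_CHAPTERS = [
    "Frontier: characterizing exoplanet atmospheres",
    "Frontier: the nature of dark energy",
    "Frontier: exoplanet demographics",
]

in_papers = topic_prevalence(ABSTRACTS, TOPICS)
in_survey = topic_prevalence(SURVEY_CHAPTERS, TOPICS)
names = list(TOPICS)
rho = spearman([in_papers[n] for n in names], [in_survey[n] for n in names])
print(f"Rank correlation of topic prevalence: {rho:.2f}")
```

A rank correlation is used here only because it is easy to compute by hand and is insensitive to the scale of the prevalence values; whether it matches the statistic in the paper is not something this sketch assumes.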
But are those topics of lasting impact or are they merely fashionable? For each paper in their data set, Thomas and company estimated its mean lifetime citation rate. The rate was modestly correlated with the prevalence of topics, as you might expect. But it did not correlate with topics in the decadal survey, from which Thomas and company conclude: “This result suggests that the Decadal Survey places significant emphasis on established research and may under-emphasize new, growing research topic areas.”
Because machine learning works on existing data, the approach could indeed struggle to identify truly revolutionary science. But what if that’s a feature, not a bug? Maybe the value of algorithms like Thomas and company’s lies in identifying research that, as Lamana put it, furthers existing ideas. What’s left could be the game-changing new work.