A newly proposed citation-based metric assesses the veracity of scientific claims by evaluating the outcomes of subsequent replication attempts. Introduced in an August bioRxiv preprint by researchers at the for-profit firm Verum Analytics, the R-factor was developed in response to long-standing concerns about the lack of reproducibility in biomedicine and the social sciences. Yet the measure, which its creators also plan to apply to the physics literature, has already raised concerns among researchers over its relatively simple approach to a complex problem.
R-factors are calculated from what Verum director of research Sean Rife and his colleagues call golden citations: citations made in manuscripts that directly attempt to replicate the study being cited. (The vast majority of citations, about 95%, merely mention other papers.) A paper’s R-factor is the number of confirming golden citations divided by the sum of confirming and refuting golden citations. The more reproducible a study is, the closer its R-factor is to 1.
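Written out explicitly (with symbols chosen here for clarity; the preprint may use different notation), the definition is:

```latex
% R-factor of a paper, with c = number of confirming golden citations
% and r = number of refuting golden citations
% (symbols are illustrative; the preprint may notate this differently)
R = \frac{c}{c + r}, \qquad 0 \le R \le 1
```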
In their paper, Rife and colleagues calculated R-factors for three cancer papers that were recently evaluated by the Reproducibility Project: Cancer Biology, an initiative that aims to replicate the results of 53 high-profile cancer papers. The authors found that one study, which reported that certain antibodies inhibit tumors in mice and whose result had been refuted by a repeat study in the reproducibility project, nonetheless had a high R-factor of 0.88₁₈. (The subscript represents the total number of replication studies in the literature.) “A single failed replication doesn’t necessarily tell you that much,” Rife says. Evaluating all instances in which an attempt was made to replicate a finding, he says, is much more telling.
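For readers who want the bookkeeping spelled out, here is a minimal sketch of the calculation under that convention. The citation counts are invented for illustration and are not the data behind the 0.88₁₈ figure:

```python
def r_factor(confirming: int, refuting: int) -> tuple[float, int]:
    """Return a paper's R-factor and the total number of replication
    attempts, which the preprint reports as a subscript.

    confirming, refuting: counts of 'golden' citations, i.e. citations
    made by studies that directly attempted to replicate the paper.
    """
    total = confirming + refuting
    if total == 0:
        raise ValueError("no replication attempts recorded")
    return confirming / total, total


# Hypothetical counts, for illustration only:
r, n = r_factor(confirming=7, refuting=3)
print(f"R-factor = {r:.2f} (subscript {n})")  # R-factor = 0.70 (subscript 10)
```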
Additionally, the authors scanned more than 12 000 excerpts that cite other papers, judging each as confirming, refuting, mentioning, or unsure. Ultimately the aim is to build a database of around 200 000 classified excerpts, including ones drawn from physics and math preprints on the arXiv, that would be used to train an algorithm to do the classification autonomously.
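As an illustration of the kind of supervised text classification such a labeled corpus could support, here is a minimal sketch using scikit-learn. The example excerpts, the use of the four category labels as training targets, and the choice of model are assumptions made for the sketch, not details taken from the preprint:

```python
# Illustrative only: a simple supervised classifier of citation excerpts,
# of the general kind a labeled corpus could be used to train.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled citation excerpts (hypothetical examples).
excerpts = [
    "We replicated the effect reported in [12] with a larger sample.",
    "Our experiments failed to reproduce the tumor inhibition described in [7].",
    "Prior work has examined related antibodies [3, 8].",
    "It is unclear whether these data support the original conclusion [5].",
]
labels = ["confirming", "refuting", "mentioning", "unsure"]

# Bag-of-words features plus a linear classifier: a common text-classification
# baseline, chosen here purely for illustration.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(excerpts, labels)

# Classify a new citing excerpt.
print(model.predict(["We were unable to reproduce the reported result."]))
```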
Although it takes on a critical flaw in modern science, the new metric has drawn plenty of criticism. Pseudonymous science blogger Neuroskeptic, who was one of the first to report on R-factors, writes that the metric fails to account for publication bias: positive results are submitted and selected for publication more often than negative ones.
Another caveat is the tool’s simplicity, says Adam Russell, an anthropologist and program manager at the Defense Advanced Research Projects Agency who has called for solutions to improve the credibility of social and behavioral sciences research. “History suggests that simple metrics are unlikely to address the multifaceted problems that have given rise to these crises of reproducibility, in part because simple metrics are easier to game,” Russell says. Verum’s Rife, however, says R-factors are less susceptible to gaming than existing metrics are.
Marcel van Assen, a statistician at Tilburg University in the Netherlands, says the R-factor approach is similar to a procedure in meta-analyses called vote counting, which has “long been discarded because it is suboptimal and misleading.” He concludes that the R-factor “is more like two steps backward rather than one forward.”
Sabine Hossenfelder, a theoretical physicist at the Frankfurt Institute for Advanced Studies in Germany, says R-factors will be of limited use in much of the physical sciences. “In theoretical physics, results don’t usually get reproduced in an organized way,” she says. “If a result is being used, that means the user thinks it’s correct, but it might be correct in an entirely different context than originally envisioned.” Although the intentions behind the R-factor are good, Hossenfelder says, she doesn’t expect it to work in practice.
Thumbnail photo credit: Amitchell125, CC BY-SA 3.0