
The case for tracking self-citations

19 September 2017

As the h-index becomes the standard for measuring researcher impact, the risk of gaming the system grows.

In 2005 physicist Jorge Hirsch introduced the h-index, a citation-based measure of scientists’ research output and impact. A scientist has an index of h if he or she has published h articles, each of which has received at least h citations. This simple formula has taken the research community by storm, and it seems to have firmly established citation counting as a way to rank researchers in today’s fast-moving world of scientific publishing.
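As a concrete illustration (ours, not Hirsch’s), a minimal sketch in Python shows how the h-index follows from a list of per-paper citation counts:

```python
def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers with 25, 8, 5, 3, and 3 citations yield h = 3:
# three papers have at least 3 citations each.
print(h_index([25, 8, 5, 3, 3]))  # 3
```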

The h-index is now widely used to guide decision making on who gets hired, funded, and promoted in academia. The European Research Council and the global publisher Elsevier, among other groups, promote the use of the h-index for evaluating research performance. Hirsch suggested in his paper that for faculty at major research universities, an h-index of 12 might be typical for advancement to tenure, whereas 18 could be enough to achieve full professorship.

With the stakes so high, it’s no wonder that some scientists obsess over their scores, carefully checking for their name in the little treasure chest of references at the end of every new paper that comes out in their field. But a problem could arise if scientists decide to go a step further and cross an ethical line. There is growing concern that some researchers may resort to superfluously citing themselves in their papers to gain an edge.

For an individual, self-citation is tempting because it can quickly boost scholarly impact and visibility with little effort and no negative consequences. Yet it can also dramatically alter the direction of science by influencing the flow of ideas and making it increasingly difficult to recognize and reward good research. Consider the young, aspiring researcher who, in an effort to gain recognition, chooses a project based not on interest or value but on how quickly the work will get cited. Consider that men have self-cited 70% more than women have in scholarly publications over the past two decades, according to a recent study. If left unchecked, excessive self-citing may jeopardize the very ideals of fairness that quantitative measures such as Hirsch’s h-index were developed to uphold.

One approach to dealing with the problem is to penalize those who abuse the practice, perhaps by weighting self-citations differently in citation-based metrics. But that would likely lead to high levels of self-censorship and provoke endless debate about what qualifies as bad behavior. After all, there are times when it is appropriate to self-cite, such as when a paper is the result of a coordinated, sustained, leading-edge research effort. Another option is to exclude self-citations from citation tracking. Databases like Web of Science and Scopus have made it possible to nix them from performance reports, which helps curb deliberate attempts to boost citation counts. But that also unfairly punishes those who use self-citations appropriately, and it does nothing to address the fact that once you cite yourself, others follow. Overall, attempts to exclude or penalize provide only modest improvements to how we measure productivity, ultimately falling short of promoting good citation habits.

The best solution is to be transparent with self-citation reporting. Doing so is easy: we simply turn the h-index in on itself to create what Alessandro Blasimme, Effy Vayena, and I call the self-citation index, or s-index. A scientist has an index of s if he or she has published s articles, each of which has received at least s self-citations.
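By way of illustration (again a sketch of ours, not code from the paper), the s-index uses exactly the h-index computation, only applied to per-paper self-citation counts:

```python
def s_index(self_citation_counts):
    """Largest s such that s papers each have at least s self-citations.
    Identical logic to the h-index, applied to self-citation counts."""
    counts = sorted(self_citation_counts, reverse=True)
    s = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            s = rank
        else:
            break
    return s

# Four papers with 10, 4, 2, and 1 self-citations yield s = 2.
print(s_index([10, 4, 2, 1]))  # 2
```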

In a recent paper in Publications, we demonstrated what the transparent reporting of self-citing behavior might look like. For a given researcher, we examined all the papers that cited each of that person’s publications and tallied how many included the person as an author. We translated those data into a figure that displayed the h and s scores for each researcher and broke down the frequency of self-citation. Here we apply that method to three physicists from the same area of research (see figure). It’s clear that self-citation behavior differs: notably, the majority of the third physicist’s citations are from his or her own papers. A hedged sketch of the tallying step appears after the figure caption below.

Figure: Self-citation examples. A snapshot of the citation habits of three physicists in the same field reveals the propensity of some scientists to cite themselves. Although the third researcher has the highest h-index, he or she also has more self-citations (red) than citations from other groups (gray). Including the s-index as an additional metric would provide important context.
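To make the tallying step concrete, here is a hedged sketch; the data structures are hypothetical stand-ins for whatever a citation database would return, not the actual pipeline used in the paper:

```python
def tally_citations(researcher, papers):
    """For each of a researcher's papers, count total citations and how
    many of the citing papers list the researcher as an author.

    papers: list of dicts, each with a 'citing_papers' list whose
    entries carry an 'authors' list (hypothetical schema).
    Returns a list of (total_citations, self_citations) pairs.
    """
    tallies = []
    for paper in papers:
        citing = paper["citing_papers"]
        self_cites = sum(1 for c in citing if researcher in c["authors"])
        tallies.append((len(citing), self_cites))
    return tallies

# One paper cited twice, once by the researcher's own group:
papers = [
    {"citing_papers": [{"authors": ["A. Researcher", "B. Colleague"]},
                       {"authors": ["C. Other"]}]},
]
print(tally_citations("A. Researcher", papers))  # [(2, 1)]
```

Feeding the per-paper self-citation counts from such a tally into the s-index calculation above yields a researcher’s s score.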

Pairing the h and s indices would highlight the degree of self-promotion and help dampen the incentive to excessively self-cite. For the first time, we would be able to see clearly how much different scientific fields resort to self-citing, thereby making excessive behavior more identifiable, explainable, and accountable. That’s something that the practice of excluding self-citations can’t accomplish. As a result, the s-index would make an important contribution to the fair and objective assessment of scientific impact and success.

At the moment, the data needed to generate s-index scores are locked away. Citation information is not easy to access, especially in a machine-readable fashion that would allow us to look deeply at self-citing patterns. The Initiative for Open Citations is doing the critical work of trying to provide unrestricted access to scholarly citation data in a way that is structured, separable, and open. Let’s get behind that initiative so we can generate more dependable metrics for research impact.
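As one example of what open, machine-readable citation data would make possible, the sketch below queries the OpenCitations COCI index for works citing a given DOI. The endpoint and response format are assumptions based on the public REST API and should be checked against the current documentation at opencitations.net:

```python
import json
import urllib.request

def citations_for(doi):
    """Fetch records of works citing the given DOI from the COCI index
    (assumed endpoint; see opencitations.net for the current API)."""
    url = f"https://opencitations.net/index/coci/api/v1/citations/{doi}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Example: citations_for("10.1234/example.doi")  # hypothetical DOI
# Each record pairs a citing DOI with the cited one; combined with
# author metadata (e.g., from Crossref), such data would let anyone
# tally self-citations as sketched above.
```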

Justin Flatt is a Marie Curie Actions Fellow at the University of Helsinki in Finland.
