Observing time on the Hubble Space Telescope is a scarce resource. Each year only one-fifth of the more than 1000 submitted proposals survive the rigorous system of peer review conducted by the Space Telescope Science Institute (STScI) in Baltimore. With such a competitive process, it’s essential to minimize bias based on factors such as gender, race, career stage, institutional size, and geographic origin.
Five years ago, we learned that some degree of bias had long been inadvertently baked into the Hubble time-allocation process. STScI associate director for science Neill Reid found that from 2001 to 2012, the success rates for proposals led by female principal investigators (19%, on average) were lower than those for proposals led by male PIs (23%). Although the difference in success rates was not statistically significant for any given observing cycle, the imbalance was systemic, with female PIs falling short of male PIs year after year. The disparity became statistically significant when data from just three cycles were cumulatively analyzed. The analysis pointed to a potential accumulation of disadvantage along gender lines.
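To see how a gap that is not significant in any single cycle can become significant once cycles are pooled, consider a minimal sketch using a standard pooled two-proportion z-test. The per-cycle counts below are hypothetical, chosen only to reproduce the 19% and 23% average rates quoted above; they are not the actual Hubble figures, and the choice of test is an assumption rather than the method Reid used.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL per-cycle counts (not real Hubble data):
# 38 of 200 female-led proposals accepted (19%),
# 184 of 800 male-led proposals accepted (23%).
FEMALE_ACCEPTED, FEMALE_TOTAL = 38, 200
MALE_ACCEPTED, MALE_TOTAL = 184, 800


def pooled_p_value(n_cycles):
    """p-value after pooling n_cycles identical hypothetical cycles."""
    _, p_value = two_proportion_z_test(
        FEMALE_ACCEPTED * n_cycles, FEMALE_TOTAL * n_cycles,
        MALE_ACCEPTED * n_cycles, MALE_TOTAL * n_cycles,
    )
    return p_value


p_one_cycle = pooled_p_value(1)
p_three_cycles = pooled_p_value(3)
p_twelve_cycles = pooled_p_value(12)

for label, p in [("1 cycle", p_one_cycle),
                 ("3 cycles", p_three_cycles),
                 ("12 cycles", p_twelve_cycles)]:
    print(f"{label}: p = {p:.4f}")
```

With these made-up counts, a single cycle falls short of the conventional 0.05 threshold, while three pooled cycles cross it. The underlying rates are unchanged; the larger sample simply shrinks the uncertainty.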
To combat that bias, last year the STScI adopted a system of dual-anonymous review, in which reviewers and investigators learn each other’s identities only after the review is complete. Such a double-blind review is among the first of its kind for resource allocation in the sciences. And the early returns are impressive: In the most recent allocation cycle, for the first time in the 18 years of relevant record-keeping, proposals with women PIs had a higher success rate than those led by men. The results suggest that double-blind review has the potential to level the playing field, not just for women but for other marginalized and disadvantaged groups.
What over who
In response to Reid’s study, in 2017 the STScI hired an independent consultant, Stefanie Johnson of the University of Colorado, to personally observe the Cycle 25 peer-review process. Johnson has worked with businesses and science groups in dealing with conscious and unconscious bias. As in previous years, the STScI invited approximately 150 astronomers from the international community to review the roughly 1100 to 1200 proposals and make recommendations to the STScI directorate. In a first round of sifting, essentially a triage stage, the reviewers, working remotely, assigned grades to the various proposals. Then STScI staff generated a list of the highly ranked ones, which were discussed in Baltimore by panels of 8 to 10 members meeting in person.
Johnson and her graduate student, Jessica Kirk, found no evidence of gender bias in the preliminary grading that determined which proposals made it to the discussion stage. It was only in the in-person discussions that bias reared its head, and Johnson and Kirk noted a potential reason for it: Much of the in-person discussion on a given proposal focused on the track record of the applicant and colleagues, rather than on the science he or she was proposing to do. Johnson and Kirk recommended an anonymous process at the discussion stage to help refocus the dialogue.
Acting on that recommendation, STScI director Kenneth Sembach commissioned a working group, led by one of us (Strolger), to assess the state of anonymous reviews in related fields and put forth a recommended design. Although there are examples of double-blind refereeing of scientific papers, we could not find precedents for dual-anonymous reviews for allotting scientific resources. A Hubble dual-anonymous review would likely be the first of its kind in the physical sciences.
Community buy-in for such a bold step was obviously going to be key. Proposers would have to make the effort to anonymize their submissions, which would involve removing obviously identifying information from the text and figures and crafting prose in a voice that did not express identity. Reviewers, in turn, would have to resist the urge to spend time guessing who the applicants were, a tricky task given the small size of the field and the intricate connections within it. Instead, reviewers would need to identify the important underlying intellectual questions and evaluate them, without the help of knowing the prior record of the proposer.
To make sure its recommendations met the needs of the thousands of astronomers who participate in the Hubble time-allocation process, the working group solicited feedback from the community. A common concern was that anonymization would prevent proposers from leveraging their expertise and reputations for further success. But those who have been successful in the past have been so for a reason, and studies support the idea that those applicants would be able to communicate the importance of their work effectively. Moreover, most agreed it was worth the trade-off to allow new scientists with radical and compelling ideas to access this important public resource.
Another common concern was that it would be more difficult to catch an unqualified or unscrupulous party who submitted a proposal that sounded good but was, in reality, infeasible, or who overpromised what could be delivered. In response, the working group added an option: a final, unblinded post-evaluation round to cross-check the veracity of the team. The panel could then decide if any proposal should be disqualified on the basis of a lack of expertise or an inability to carry forth the program.
Within the working group, there was some concern that the reviewers might not buy into the idea of anonymous review. They could, perhaps even unintentionally, undermine the process by inferring who team members were or where they were from, or instigating a discussion that could end up being subversive. As a solution, the working group recommended the use of witnesses, known as levelers, in each panel. Unlike panel chairs, who would be focused on the scientific discussion, levelers would monitor conversations and ensure they stayed focused on the science content. Somewhat like field umpires, they would have the power to “stop play” (halt a conversation) and refocus discussion on the science, not the scientists.
An impressive first run
The working group presented its recommendations to the STScI directorate and the Space Telescope Users Committee in the spring of 2018. Sembach then decided to implement a dual-anonymous review for Cycle 26, which commenced with a call for proposals in May 2018. The call included detailed information on the process, guidelines for proposers on how to sufficiently anonymize their submissions, and instructions for reviewers on the discussion process.
As chair of the set of panels that conducted the in-person review, one of us (Natarajan) roved around and attended discussions. There was a totally different timbre to the discussions compared with previous cycles. Not once did I witness any mention of who the proposers might be. When it reportedly did happen, the panelists worked to reframe the discussion. For example, one reviewer wondered aloud who a particular team was for the sake of determining its access to additional data sets and telescopes. Almost immediately, the panelists realized the question was poorly phrased. The reviewer really wanted to know what supporting data would be brought to bear, something that the proposers had failed to mention. It was not a problem with the lack of names on the proposal, but rather a deficiency in the proposal that easily could have been addressed.
There was a noticeable shift in the depth of discussions as well. It was clear that reviewers had read the proposals very diligently, and that without the distraction of names and institutions, there was no recourse but to focus on the proposed science. After the panels ranked proposals, they were given the option of conducting a post-evaluation review and assessing expertise by unblinding the names of the team members. Most panels declined: The accepted proposals themselves were sufficiently persuasive to convince the panel of the ability of the team to execute the proposed project.
Despite being the debut run, the Cycle 26 review process unfolded very smoothly. And satisfyingly, the results were impressive. For the first time since the STScI has kept track, women PIs had a higher success rate than men (8.7% versus 8.0%). And that comes in a cycle in which proposals led by women accounted for 28% of the total, up from 19% in 2001.
Even before the Cycle 26 review began, Sembach emphasized at a debriefing that this was not a pilot: Dual-anonymous review is going to be the standard process going forward. Given the outcome in the very first round, we are very optimistic that the process will reward originality and intellectual heft as markers of successful proposers while helping mitigate biases.
Building on our success
Several easy improvements would vastly help in future iterations. First, it is important for the STScI to train astronomers to write effective proposals for double-blind reviews. Such training could come in the form of clear guidelines on a website and as recorded interactive webinars with an opportunity for attendees to ask questions. The instruction should focus on how applicants can still effectively disseminate their expertise and experience and make a convincing case for their science. We can also improve the process by not disclosing the grades assigned during triage to the reviewers meeting in person.
Perhaps the most important lesson from Cycle 26 is that levelers are crucial to the success of the process. The levelers, who for Cycle 26 were chosen from the STScI scientific staff, were extremely valuable both as knowledgeable resources regarding the process and as witnesses to ensure that panelists abided by the rules. Their presence in the room is crucial to ensuring the integrity of the process in the early rounds of the double-blind review. In the future, the levelers could also collect data on the effectiveness of individual panels’ reviewers, chairs, and vice chairs.
We are thrilled that the STScI has pioneered a review process to level the playing field in astronomy. Going forward, we plan to analyze data from future convenings to ferret out other biases and track how they manifest in the final outcomes of resource allocation. Gender is not the only vector along which biases dog us; race, age, career stage, and the institution of the proposer (private versus public university, research versus teaching colleges) are also axes along which the data need to be parsed.
It behooves us to look at all the barriers that keep talented members of our community from realizing their full creative and intellectual potential. We hope that the very encouraging first results of this process will spur the other bodies that allocate scarce NASA and NSF resources to follow suit and institute double-blind grant proposal evaluations.
Lou Strolger is an observatory scientist in the science policies group at the STScI, where he was the chair of the working group on anonymous proposing and a chief architect in its implementation. Priyamvada Natarajan, an astrophysicist and professor in the departments of astronomy and physics at Yale University and an avid advocate for equity and access, chaired the Cycle 26 time-allocation committee that implemented the double-blind review of Hubble proposals.
The implementation of the double-blind review for Cycle 26 required commitment, effort, and leadership from the top brass at the STScI. Ken Sembach, Neill Reid, Nancy Levenson, Claus Leitherer, and many other staff members deserve acknowledgment for setting up the logistics and arranging a smooth and seamless review process. The reviewers from the astronomical community who generously gave their time and participated with utmost sincerity also deserve thanks. Special mention should be made of the levelers who aided the reviewers, overseers from NASA headquarters and Goddard Space Flight Center, and other institutional representatives who participated in and facilitated this round of evaluations.