Universities were happy to partake of the more than $21 billion in federal research grants that were doled out as part of last year’s American Recovery and Reinvestment Act (ARRA). But there was an unwanted side effect, as Susan Sedwick, associate vice president for research at the University of Texas (UT) at Austin, soon learned. Because of its emphasis on job creation, ARRA requires that funding recipients carefully document the jobs created or saved by the stimulus funding. According to Sedwick, it was taking her and one assistant two and a half hours to complete the paperwork that had to be filed for each of the 100 or so ARRA grants that UT was awarded. “With limited resources, it has been a struggle to sustain the reporting requirement,” she says.

Sedwick’s experience could become the norm as lawmakers and public officials increasingly demand to know what the government is getting from its spending on basic research. No mechanism exists today that can systematically couple basic research funding with outcomes. But a multiagency initiative is attempting to build one and, remarkably, do so without imposing more red tape on academia.

With endorsements from presidential science adviser John Holdren and National Institutes of Health (NIH) director Francis Collins, STAR METRICS—shorthand for Science and Technology for America’s Reinvestment: Measuring the Effect of Research on Innovation, Competitiveness, and Science—was given the go-ahead in late May to scale up a 2009 pilot project involving seven universities into a five-year program to build a set of yardsticks for gauging the impacts of federally funded research. Starting from the modest initial objective of counting the jobs created or saved by ARRA-funded research grants, STAR program managers hope over time to develop and incorporate additional metrics that encompass the full range of economic, scientific, and societal benefits of research.

“It is essential to document with solid evidence the returns our nation is obtaining from its investment in research and development,” says Holdren, who also is director of the White House Office of Science and Technology Policy (OSTP). “STAR METRICS is an important element of doing just that.” NIH and the National Science Foundation (NSF) have committed to providing a total of $1 million for STAR.

ARRA’s reporting requirement provided the impetus for STAR. Universities do not have a consistent mechanism for identifying the income streams that pay the salaries of research staff, says Tobin Smith, vice president for policy at the Association of American Universities. The universities that participated in the 2009 pilot phase simply had to provide internal data on the number of employees, the rates used to calculate indirect-cost charges, and the like; STAR then calculated the number of jobs supported by ARRA grants.
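As a rough illustration of the arithmetic involved (a sketch only, not STAR’s published method), such a job count can be treated as a full-time-equivalent tally: for each person paid from an ARRA award, sum the fraction of a full-time appointment charged to that award, drawing on payroll records the university already keeps. The record layout and field names below are hypothetical.

```python
# Illustrative full-time-equivalent (FTE) job count from existing payroll data.
# The record structure and field names are hypothetical; STAR's actual
# computation may differ.

from dataclasses import dataclass

@dataclass
class PayrollRecord:
    employee_id: str
    award_id: str           # e.g., an ARRA grant number
    hours_charged: float    # hours charged to the award in the reporting quarter
    full_time_hours: float  # hours in a full-time schedule for that quarter

def fte_jobs_supported(records: list[PayrollRecord], arra_awards: set[str]) -> float:
    """Sum the fraction of a full-time appointment charged to ARRA awards."""
    return sum(
        r.hours_charged / r.full_time_hours
        for r in records
        if r.award_id in arra_awards
    )

# Example: two researchers each half-funded by an ARRA grant count as one FTE job.
records = [
    PayrollRecord("a1", "ARRA-0001", hours_charged=260, full_time_hours=520),
    PayrollRecord("a2", "ARRA-0001", hours_charged=260, full_time_hours=520),
    PayrollRecord("a3", "NSF-1234", hours_charged=520, full_time_hours=520),
]
print(fte_jobs_supported(records, {"ARRA-0001"}))  # -> 1.0
```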

The strict accounting mandates of ARRA are illustrative of a broad and pervasive demand from Congress, the Obama administration, and state and local governments for meaningful and objective measures of the outcomes of research—ARRA funded or not. “There has been a high demand for these types of data,” says Arden Bement, who, as NSF director until the end of May, oversaw STAR’s development. “Members of Congress are always asking the question, and it’s important that we satisfy that question to the best of our ability.” The Office of Management and Budget (OMB) “is pushing pretty hard” for agencies to provide better measures, notes NSF’s Julia Lane, codeveloper of STAR.

For academic institutions, which often complain of growing and burdensome record-keeping and reporting requirements for federal research grants, a key attraction of STAR is its promise to derive input exclusively from data that are already being collected.

The project is operating within the framework of the Federal Demonstration Partnership, an OMB-sanctioned university-government collaboration formed 22 years ago to find innovative ways to ease that burden. More than 90 universities and 10 federal agencies are partnership members, and some 80 of the universities have expressed at least some interest in STAR, says Lane. The goal is a set of empirical metrics that can be used to respond quickly when government officials and the public ask what benefits basic research delivers. The program is voluntary and will remain so, says Lane. “Anything that becomes mandatory loses its excitement and interest, and the data quality goes down.”

Job creation is one measure of return from research spending. But longitudinal metrics capable of following the career tracks of individuals hired with research funds would be useful for gauging the long-term benefits. Smith notes that “a breakdown in the system” has kept universities from tracing the flow of research dollars through the careers of faculty, graduate students, and postdocs. He hopes STAR will fix that. Lacking such data, research advocacy groups such as the Science Coalition can produce only retrospective studies, which examine the history of success stories such as Google or Cisco Systems, both of which originated from NSF-sponsored research at Stanford University. Such anecdotal analyses, of course, fail to account for the unsuccessful case studies.

While it is one thing to count new jobs, it is quite another to create a set of indicators covering the range of economic, scientific, and societal benefits of research. Initially, STAR plans to track economic impacts by using available measures such as the numbers of startup companies formed and patents awarded. Scientific articles produced and the frequency of their citation in other scholarly papers will provide data on the quantity and quality of the scientific outputs, while societal benefits will be gauged by improvements in health (mainly for NIH-sponsored work) and the environment. Benefits to society, however, may not begin to show up until years or even decades after the basic research was funded. Predicting those long-term outcomes is often impossible; that was one factor that doomed the Superconducting Super Collider in the early 1990s.

Lane acknowledges that new measurements are needed, and she stresses that the scientific community, not bureaucrats, should be the source of improved metrics. “Right now we are at zero in describing the impacts of science investments in an analytical fashion,” she says. “It’s true that we’d like to get to 100, but you have to do it one step at a time.” For the time being, STAR can measure only a fraction of the scientific outputs, given that much of the peer-reviewed literature and the bibliographic databases that include acknowledgments of project funding sources are held by the commercial and nonprofit publishers of journals. Those publishers charge for access to their services, and licensees aren’t allowed to turn around and provide free public access to the raw data.

Some freely available searchable databases, such as RePEc (Research Papers in Economics), have sprung up in specific disciplines. But Lane argues that outputs are showing up in other places besides scholarly papers—even on YouTube and Second Life.

For all those reasons, it’s important that the scientific community develop better ways of measuring itself, in a “bottom-up” fashion, says STAR codeveloper Stefano Bertuzzi, a neuroscientist at NIH. “If we don’t tackle the issue as scientists, I think someone else will do it for us. And it’s likely that they are not going to be scientists, but people who do not understand deeply the scientific process.”

Not all outcomes of research can be quantified, cautions Bement. “In some cases, it’s a matter of expert judgment, especially in areas where there are intangible benefits. It’s the sort of thing where we’ll never be fully satisfied by quantitative approaches.”

Caroline Wagner sees big challenges ahead for STAR, starting with the lack of a central database for tracking research inputs—the tens of thousands of grants that constitute federal spending for basic research—let alone a way to measure outputs. As a staff member at the Critical Technologies Institute, a support contractor to the OSTP in the 1990s, Wagner helped build a database dubbed RaDiUS (Research and Development in the US) that could enumerate the individual projects and their funding levels. Developed with NSF and Rand Corp funds at the OSTP’s request, RaDiUS was expensive to maintain, she says, and was closed down several years ago after it failed to attract a sufficient number of paying customers.

Building the database of inputs that became RaDiUS proved to be complex, she explains, and required the creation of a searchable architecture capable of merging data kept in a variety of formats at each federal agency. “One of the things we learned from RaDiUS is that you can’t simply dump all this information into an open database,” says Wagner, who is now the chief executive for the US arm of Science-Metrix, a Montreal-based company that specializes in providing quantitative data about science for Canadian agencies and other clients. “US government research is a very detailed and multilayered endeavor, and the data exist in many different formats, including paper,” she says. “To make a fully accessible and readable database that could track both inputs and outputs, one would have to know how the US government budgeting system works and then seek to tie inputs to outputs and outcomes traceable to those investments in a relational database. It is a complex job.”

Making sense of a jumble of data on jobs, startups, patents, and other statistics will require some doing, Bertuzzi acknowledges. But he says that NIH has “done a reasonably good job of documenting its investments” and has been putting considerable resources, including its formidable information-technology assets, into improving the associations between research funding and outcomes. And NIH director Collins has shown particular interest in analyzing and understanding the impact of NIH research. House and Senate appropriators annually press NIH for progress reports on the fight against various diseases.

But Bertuzzi cautions that tying improvements in human health to basic research will always be fraught with confounding factors. Basic research, he notes, doesn’t follow a linear path to commercial technology; development, financing, regulatory approval, and other factors all complicate the process. Nor do new technologies emerge in a vacuum. “When we make statements like, ‘In the past century the number of cases for disease X have decreased by Y percent,’ we think of attributing it all to biomedical research, but in reality, there are many other determinants of health,” he says. “How do we disambiguate the proportion that is attributable to medical research?” he asks. “It is difficult, and it requires good analytical tools. This is precisely the goal of STAR METRICS, a tool to perform better analyses.”

Former White House science adviser John Marburger, whom Lane and others credit for starting the process that led to STAR’s development, says the program is “credible and well structured, and it’s unique.” He also observes that ARRA “offers an extraordinary opportunity to trace impacts on a variety of measures,” in part because of its limited duration; the law stipulates that all ARRA spending be obligated by 30 September. The resulting funding spike acts “like the proverbial ‘impulse function’ that engineers use to analyze complex systems,” he notes. “Its consequences may ripple through the time series of the wide variety of metrics already tracked by federal agencies and policy scholars.”

The magnitude of the science and technology stimulus spending ensures that a large number of institutions will constitute the experimental sample, says Marburger, now Stony Brook University’s vice president for research. Stony Brook has received 97 ARRA grants, which will generate some 400 new reports per year. “The only comparably abrupt change in science funding is the exponential increase that occurred following Sputnik in 1957. But we didn’t have the reporting requirements then that we do today that can provide useful data,” he notes.

New and better metrics could also produce unintended consequences. Smith cautions that the types of data that STAR will generate could also be used to grade faculty on the number of jobs their research produces—not necessarily the best measure of scientific quality. Agrees Lane, “Any time you put information out, there is a danger it can be misused. In a free society, that’s going to happen.”

Got a shovel? Academic institutions already shuffle a mountain of federal paperwork related to research funding. An expanding interagency initiative claims some success in creating measurements of basic research outcomes to lighten the load.
