Note: The information below is an overview. The full text of the report is available
for download. We hope that you will carefully read and consider
both of these documents; we welcome your comments, which can be sent to DDER@NIH.GOV.
An Update including frequently made comments and frequently asked questions is also available for your perusal.
As part of reinvention activities and the ongoing effort to maintain high
standards for peer review at the NIH, a subcommittee of the NIH Committee on
Improving Peer Review was formed in the fall. This Rating of Grant Applications
(RGA) subcommittee was tasked with examining the process by which scientific
review groups rate grant applications and with recommending improvements to
that process in light of current knowledge of measurement and decision making.
Changes to so critical an element of peer review as the system of rating
grant applications should not be implemented without the participation and
contributions of the scientific community that they will affect. Therefore, we
are broadcasting this overview of the Report On Rating of Grant Applications, in
the hope that we may benefit from the close scrutiny that we anticipate it will
be given. In your reading of it, you may want to consider separately the
various aspects of the recommendations that might be implemented; as the
report itself points out, the recommendations can be implemented independently
of one another.
The report has been scrutinized by staff of the NIH, and is the product of a
careful and conscientious working group. The full report has been sent to the
directors of the Institutes and Centers and has been an item of discussion for all
of the relevant NIH-wide standing committees: the Extramural Program Management
Committee (EPMC), the Review Policy Committee (RPC), and the Program Officers
and Project Officers Forum (POPOF).
Current Rating System Works Reasonably Well
The current system for rating grant applications works reasonably well. No
one appears to believe that poor-quality science is consistently being given
good scores or that exceptionally good science is being given poor scores, and
this is the gold standard for a reasonable system of rating science. Thus, we
recognize that there is considerable commitment to the present method of "doing
business" and a disinclination to change it.
Why Change a System That Works?
In today's funding environment, it is increasingly important to ensure
that scores are as reliable as they can be, and that NIH staff have the maximum
amount of useful information on which to base funding decisions. So, while
worthy science is being given good scores, there is still a range within "good"
that is distinguished by reviewers but that fails to be conveyed via the scores
assigned under the current system. This loss of information is due in part to a
tendency of initial review groups to cluster priority scores in the "outstanding"
region, which could be attributed to qualitative differences in the science
being reviewed by different groups or to differences in the scoring behavior of
different groups. Percentiling was an attempt to counter, by statistical means,
these differences in scoring behavior among review groups, but the subcommittee
felt that the very arithmetic of priority scores and percentiles conveys an
impression of greater precision of discernment than is really the case.
Besides the compression of scores within a particular
range, other information that tends to be lost under the present rating system
is the initial review group's assessment of the scientific significance of a
grant application as distinguished from its assessment of an application's
methods and feasibility.
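To make concrete how percentiling puts differently calibrated review groups on a common footing, here is an illustrative sketch. The group names and scores are hypothetical, and this is a simplified stand-in rather than NIH's actual percentiling procedure:

```python
# Illustrative sketch of percentiling (not NIH's actual algorithm).
# Under the current system, lower priority scores are better, so an
# application's percentile is the share of its group's scores that are
# as good as or better than (i.e., less than or equal to) its own.

def percentile(score, group_scores):
    """Percent of the group's scores that are <= this score."""
    better_or_equal = sum(1 for s in group_scores if s <= score)
    return 100.0 * better_or_equal / len(group_scores)

# Two hypothetical review groups rating science of comparable quality:
lenient = [1.2, 1.3, 1.4, 1.5, 1.6]    # scores cluster as "outstanding"
stringent = [1.8, 2.2, 2.6, 3.0, 3.4]  # same quality, harsher habits

# A raw score of 1.4 in the lenient group and 2.6 in the stringent group
# both fall at the 60th percentile of their own groups, so the two
# applications are treated comparably despite very different raw scores.
```

The sketch also shows the compression problem the text describes: the lenient group's entire range spans only 0.4 priority-score units, so small rating differences carry the full weight of the group's discrimination.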
Committee Task and Method
The work on rating of grant applications grew out of the larger context of
the reinvention of NIH extramural activities. The Rating of Grant Applications
(RGA) subcommittee was established by the Extramural Reinvention Committee to
examine the grants review rating process with an eye toward fine-tuning the
current system. The subcommittee was composed of staff representing different
ICDs. Several outside experts in the behavioral aspects of decision making and
in psychometrics served as consultants.
In defining the scope of its activities, the subcommittee viewed the initial
review of applications as serving two functions: to assess the scientific and
technical merit of an application through a narrative and one or more
quantitative indices, and to comment on other aspects of the application that
should be clearly separated from the assessment of scientific merit, e.g.,
biosafety, animal and human subject welfare. During the course of its
discussions, the subcommittee developed a set of defining characteristics of
initial scientific peer review, which served as points of departure in its
subsequent discussions and in the development of its recommendations.
Defining Characteristics of Peer Review:
a. The rating assigned to an application should be a quantitative
representation of scientific merit alone and not represent any other property of
the application.
b. The criteria used to review applications should include all aspects of
scientific merit, should be as salient as possible to reviewers, and should form
the only basis for both the quantitative ratings and the narrative critique of
each application.
c. The ratings of all the reviewers should be equally able to influence the
final score of scientific merit.
d. The potential for "gaming" the system (i.e., consciously or
unconsciously introducing inequities based on factors other than scientific
merit or otherwise distorting the rating of scientific merit) should be
minimized.
e. The manner in which review results are reported should summarize the
totality of information contained in the review group's ratings.
f. The form in which review results are reported should be useful to those
making funding decisions and informative to advisory councils and applicants.
g. The rating system should encourage reviewers to make discriminations as
fine as they can reliably make.
h. Procedures should minimize the burden for reviewers both before and at
the review meeting.
i. Federal policy issues (e.g., gender/minority representation, protection
of animal and human subjects) must be addressed appropriately.
The Committee agreed that it should not be constrained by current practice
but should be prepared to propose any workable system, even if radically
different, should such a system be superior. Psychometric experts and the
literature on decision making and evaluative processes were consulted, available
data were analyzed, and simulations were made. On the basis of all of this
information, the recommendations below have been developed. These
recommendations, which will be the basis for discussion about possible changes
in the scoring system, are now made available for your scrutiny and comment.
It is not the case that these recommendations must be considered as a packet; we
are interested in opinions regarding the merit and feasibility of piloting each
of them individually.
Recommendation 1: The three proposed review criteria listed below should
be adopted for unsolicited research project grant applications.
Significance: The extent to which the project, if successfully carried out,
will make an original and important contribution to biomedical and/or behavioral
science.
Approach: The extent to which the conceptual framework, design (including,
as applicable, the selection of appropriate subject populations or animal
models), methods, and analyses are properly developed, well-integrated, and
appropriate to the aims of the project.
Feasibility: The likelihood that the proposed work can be accomplished by
the investigators, given their documented experience and expertise, past
progress, preliminary data, requested and available resources, institutional
commitment, and (if appropriate) documented access to special reagents or
technologies and adequacy of plans for the recruitment and retention of
subjects.
Recommendation 2: Reviews should be conducted criterion by criterion, and
the reviewers' written critiques should address each criterion separately.
Recommendation 3: Applications should receive a separate numerical rating
on each criterion.
Recommendation 4: Reviewers should not make global ratings of scientific merit.
Recommendation 5: The rating scale should be defined so that larger
numerical values represent greater degrees of the characteristic being rated and
the smaller values represent smaller degrees.
Recommendation 6: The number of scale positions should be commensurate with
the number of discriminations that reviewers can reliably make in the
characteristic being rated. An eight-step scale (0-7) is recommended on the
basis of the psychometric literature; however, a maximum of 11 steps (0-10) is
acceptable.
Recommendation 7: The rating scale should be anchored only at the ends.
The performance of end-anchors should be evaluated and other approaches to
anchoring should be investigated as needed.
Calculation, Standardization, and Reporting of Scores
Recommendation 8: Scores should be standardized on each criterion within
reviewer and then averaged across reviewers. The exact parameters for this
standardization should be defined by an appropriately constituted group.
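As an illustration of the kind of standardization Recommendation 8 describes, the sketch below z-scores each reviewer's ratings against that reviewer's own mean and spread before averaging across reviewers. The exact parameters, as the recommendation notes, remain to be defined by an appropriately constituted group, so this is one plausible reading rather than a settled procedure:

```python
# Hypothetical sketch of within-reviewer standardization (Recommendation 8).
# Each reviewer's ratings on one criterion are converted to z-scores
# relative to that reviewer's own mean and spread, which removes
# differences in leniency and range; the standardized values are then
# averaged across reviewers, application by application.
from statistics import mean, pstdev

def standardize_within_reviewer(ratings):
    """z-score each rating against this reviewer's own mean and SD."""
    m, sd = mean(ratings), pstdev(ratings)
    if sd == 0:                        # reviewer gave identical ratings
        return [0.0 for _ in ratings]
    return [(r - m) / sd for r in ratings]

def combined_scores(ratings_by_reviewer):
    """ratings_by_reviewer: one list of ratings per reviewer, all over
    the same applications and the same criterion, in the same order."""
    standardized = [standardize_within_reviewer(r) for r in ratings_by_reviewer]
    # Average across reviewers, application by application.
    return [mean(app_ratings) for app_ratings in zip(*standardized)]
```

For example, a lenient reviewer who rates three applications [7, 5, 3] and a compressed one who rates them [5, 4, 3] produce identical z-score profiles, so the combined scores reflect their shared ranking rather than their differing scales.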
Recommendation 9: Scores should be reported on the scale used by reviewers
in making the original ratings. Scores should be reported with an implied
precision commensurate with the information contained in the scores. Two
significant digits are recommended.
Recommendation 10: If a single score is required that represents overall
merit, it should be computed from the three criterion scores using an algorithm
that is common to all applications. The Committee favors the arithmetic average
of the three scores; however, an appropriately constituted group should test and
choose the algorithm to be used.
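A minimal sketch of the Committee's stated preference under Recommendations 9 and 10, assuming a simple arithmetic average of the three criterion scores reported with two significant digits. The report leaves the final choice of algorithm to a later group, so this is illustrative only:

```python
# Illustrative sketch of Recommendations 9 and 10: a single overall merit
# score computed as the arithmetic average of the three criterion scores,
# reported with two significant digits (the Committee's preferred
# algorithm, pending testing by an appropriately constituted group).

def overall_merit(significance, approach, feasibility):
    score = (significance + approach + feasibility) / 3
    return float(f"{score:.2g}")       # round to two significant digits
```

Note that applying a common algorithm to all applications, as the recommendation requires, means no reviewer or group can weight one criterion more heavily than another for a particular application.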
Any or all of these recommendations could conceivably be implemented as part
of the peer review process. We are currently considering the pros and cons of
each recommendation, and the positive and negative impacts that each could have
on the peer review system and on other aspects of the awarding of research
grants at NIH. You are invited to read the Report of the Committee on
Rating of Grant Applications and to offer your comments. Decisions on implementation
of any of these recommendations would need to be made by January of 1997 if they
were to be in place for the review of grant applications to be funded in fiscal year 1998.
Downloading This Document
The full text of the report is available in Text and PDF format.
To download the appropriate file, please select your preferred format:
Note: For help accessing PDF, RTF, MS Word, Excel, PowerPoint, Audio or Video files, see Help Downloading Files.