An index to quantify an individual’s scientific
research output
J. E. Hirsch*
Department of Physics, University of California at San Diego, La Jolla, CA 92093-0319
Communicated by Manuel Cardona, Max Planck Institute for Solid State Research, Stuttgart, Germany, September 1, 2005 (received for review
August 15, 2005)
I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher.

citations | impact | unbiased

For the few scientists who earn a Nobel prize, the impact and relevance of their research is unquestionable. Among the rest of us, how does one quantify the cumulative impact and relevance of an individual’s scientific research output? In a world of limited resources, such quantification (even if potentially distasteful) is often needed for evaluation and comparison purposes (e.g., for university faculty recruitment and advancement, award of grants, etc.).

The publication record of an individual and the citation record clearly are data that contain useful information. That information includes the number (Np) of papers published over n years, the number of citations (N_c^j) for each paper (j), the journals where the papers were published, their impact parameter, etc. This large amount of information will be evaluated with different criteria by different people. Here, I would like to propose a single number, the “h index,” as a particularly simple and useful way to characterize the scientific output of a researcher.

A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have ≤h citations each.

The research reported here concentrated on physicists; however, I suggest that the h index should be useful for other scientific disciplines as well. (At the end of the paper I discuss some observations for the h index in biological sciences.) The highest h among physicists appears to be E. Witten’s h, which is 110. That is, Witten has written 110 papers with at least 110 citations each. That gives a lower bound on the total number of citations to Witten’s papers at h² = 12,100. Of course, the total number of citations (Nc,tot) will usually be much larger than h², because h² both underestimates the total number of citations of the h most-cited papers and ignores the papers with <h citations. The relation between Nc,tot and h will depend on the detailed form of the particular distribution (1), and it is useful to define the proportionality constant a as

\[ N_{c,\mathrm{tot}} = a h^{2}. \qquad [1] \]

I find empirically that a ranges between 3 and 5.

Other prominent physicists with high hs are A. J. Heeger (h = 107), M. L. Cohen (h = 94), A. C. Gossard (h = 94), P. W. Anderson (h = 91), S. Weinberg (h = 88), M. E. Fisher (h = 88), M. Cardona (h = 86), P. G. deGennes (h = 79), J. N. Bahcall (h = 77), Z. Fisk (h = 75), D. J. Scalapino (h = 75), G. Parisi (h = 73), S. G. Louie (h = 70), R. Jackiw (h = 69), F. Wilczek (h = 68), C. Vafa (h = 66), M. B. Maple (h = 66), D. J. Gross (h = 66), M. S. Dresselhaus (h = 62), and S. W. Hawking (h = 62). I argue that h is preferable to other single-number criteria commonly used to evaluate scientific output of a researcher, as follows:

(i) Total number of papers (Np). Advantage: measures productivity. Disadvantage: does not measure importance or impact of papers.
(ii) Total number of citations (Nc,tot). Advantage: measures total impact. Disadvantage: hard to find and may be inflated by a small number of “big hits,” which may not be representative of the individual if he or she is a coauthor with many others on those papers. In such cases, the relation in Eq. 1 will imply a very atypical value of a, >5. Another disadvantage is that Nc,tot gives undue weight to highly cited review articles versus original research contributions.
(iii) Citations per paper (i.e., ratio of Nc,tot to Np). Advantage: allows comparison of scientists of different ages. Disadvantage: hard to find, rewards low productivity, and penalizes high productivity.
(iv) Number of “significant papers,” defined as the number of papers with >y citations (for example, y = 50). Advantage: eliminates the disadvantages of criteria i, ii, and iii and gives an idea of broad and sustained impact. Disadvantage: y is arbitrary and will randomly favor or disfavor individuals, and y needs to be adjusted for different levels of seniority.
(v) Number of citations to each of the q most-cited papers (for example, q = 5). Advantage: overcomes many of the disadvantages of the criteria above. Disadvantage: It is not a single number, making it more difficult to obtain and compare. Also, q is arbitrary and will randomly favor and disfavor individuals.

Instead, the proposed h index measures the broad impact of an individual’s work, avoids all of the disadvantages of the criteria listed above, usually can be found very easily by ordering papers by “times cited” in the Thomson ISI Web of Science database (http://isiknowledge.com),† and gives a ballpark estimate of the total number of citations (Eq. 1).

Thus, I argue that two individuals with similar hs are comparable in terms of their overall scientific impact, even if their total number of papers or their total number of citations is very different. Conversely, comparing two individuals (of the same scientific age) with a similar number of total papers or of total citation count and very different h values, the one with the higher h is likely to be the more accomplished scientist.

For a given individual, one expects that h should increase approximately linearly with time. In the simplest possible model, assume that the researcher publishes p papers per year and that each published paper earns c new citations per year every subsequent year. The total number of citations after n + 1 years is then

\[ N_{c,\mathrm{tot}} = \sum_{j=1}^{n} p c j = \frac{p c n (n+1)}{2}. \qquad [2] \]

*E-mail: [email protected]
†Of course, the database used must be complete enough to cover the full period spanned by the individual’s publications.
© 2005 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0507655102 PNAS November 15, 2005 vol. 102 no. 46 16569–16572
Assuming all papers up to year y contribute to the index h, we have

\[ (n - y)\,c = h \qquad [3a] \]
\[ p\,y = h. \qquad [3b] \]

The left side of Eq. 3a is the number of citations to the most recent of the papers contributing to h; the left side of Eq. 3b is the total number of papers contributing to h. Hence, from Eq. 3,

\[ h = \frac{c}{1 + c/p}\, n. \qquad [4] \]
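As a rough numerical check on Eq. 4, the linear model can be simulated directly. The sketch below is illustrative only: the function names (h_index, simulate_linear_model, h_from_eq4) and the year-by-year discretization are assumptions made for this example, not part of the original analysis. It assumes that p papers appear in each year y = 1, ..., n and that a paper from year y holds c(n − y) citations at the end of year n, as on the left side of Eq. 3a.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def simulate_linear_model(p, c, n):
    """Citation counts at the end of year n in the simple linear model.

    Assumption of this sketch: p papers are published in each year
    y = 1..n, and each paper gains c citations per year afterwards,
    so a paper from year y has c * (n - y) citations.
    """
    citations = []
    for year in range(1, n + 1):
        citations.extend([c * (n - year)] * p)
    return citations


def h_from_eq4(p, c, n):
    """Continuous estimate of h from Eq. 4: h = c n / (1 + c/p)."""
    return c * n / (1 + c / p)


if __name__ == "__main__":
    p, c, n = 2, 3, 20
    print(h_index(simulate_linear_model(p, c, n)), h_from_eq4(p, c, n))
```

With p = 2, c = 3, and n = 20, both the simulated citation list and Eq. 4 give h = 24. For parameter values where Eq. 4 does not yield an integer, the discrete count can differ slightly from the continuous estimate.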
The total number of citations (for not-too-small n) is then approximately

\[ N_{c,\mathrm{tot}} \approx \frac{(1 + c/p)^{2}}{2c/p}\, h^{2} \qquad [5] \]

of the form Eq. 1. The coefficient a depends on the number of papers and the number of citations per paper earned per year as given by Eq. 5. As stated earlier, we find empirically that a ~ 3–5 is a typical value. The linear relation

\[ h \sim m n \qquad [6] \]

should hold quite generally for scientists who produce papers of similar quality at a steady rate over the course of their careers; of course, m will vary widely among different researchers. In the simple linear model, m is related to c and p as given by Eq. 4. Quite generally, the slope of h versus n, the parameter m, should provide a useful yardstick to compare scientists of different seniority.

In the linear model, the minimum value of a in Eq. 1 is a = 2, for the case c = p, where the papers with >h citations and those with <h citations contribute equally to the total Nc,tot. The value of a will be larger for both c > p and c < p. For c > p, most contributions to the total number of citations arise from the “highly cited papers” (the h papers that have Nc > h), whereas for c < p, it is the sparsely cited papers (the Np − h papers that have <h citations each) that give the largest contribution to Nc,tot. We find that the first situation holds in the vast majority of, if not all, cases. For the linear model defined in this example, a = 4 corresponds to c/p = 5.83 (the other value that yields a = 4, c/p = 0.17, is unrealistic).

The linear model defined above corresponds to the distribution

\[ N_c(y) = N_0 - \left( \frac{N_0}{h} - 1 \right) y, \qquad [7] \]

where Nc(y) is the number of citations to the yth paper (ordered from most cited to least cited) and N0 is the number of citations of the most highly cited paper (N0 = cn in the example above). The total number of papers ym is given by Nc(ym) = 0; hence,

\[ y_m = \frac{N_0 h}{N_0 - h}. \qquad [8] \]

We can write N0 and ym in terms of a defined in Eq. 1 as

\[ N_0 = h \left[ a \pm \sqrt{a^{2} - 2a} \right] \qquad [9a] \]
\[ y_m = h \left[ a \mp \sqrt{a^{2} - 2a} \right]. \qquad [9b] \]

For a = 2, N0 = ym = 2h. For larger a, the upper sign in Eq. 9 corresponds to the case where the highly cited papers dominate (the more realistic case), and the lower sign corresponds to the case where the less frequently cited papers dominate the total citation count.

In a more realistic model, Nc(y) will not be a linear function of y. Note that a = 2 can safely be assumed to be a lower bound quite generally, because a smaller value of a would require the second derivative ∂²Nc/∂y² to be negative over large regions of y, which is not realistic. The total number of citations is given by the area under the Nc(y) curve that passes through the point Nc(h) = h. In the linear model, the lowest a = 2 corresponds to the line of slope −1, as shown in Fig. 1.

Fig. 1. Schematic curve of number of citations versus paper number, with papers numbered in order of decreasing citations. The intersection of the 45° line with the curve gives h. The total number of citations is the area under the curve. Assuming the second derivative is nonnegative everywhere, the minimum area is given by the distribution indicated by the dotted line, yielding a = 2 in Eq. 1.

A more realistic model would be a stretched exponential of the form

\[ N_c(y) = N_0\, e^{-(y/y_0)^{\beta}}. \qquad [10] \]

Note that for β ≤ 1, N″c(y) > 0 for all y; hence, a > 2 is true. We can write the distribution in terms of h and a as

\[ N_c(y) = \frac{a}{\alpha I(\beta)}\, h\, e^{-\left( y/(h\alpha) \right)^{\beta}} \qquad [11] \]

with I(β) the integral

\[ I(\beta) = \int_{0}^{\infty} dz\, e^{-z^{\beta}} \qquad [12] \]

and α determined by the equation

\[ \alpha\, e^{\alpha^{-\beta}} = \frac{a}{I(\beta)}. \qquad [13] \]

The maximally cited paper has citations

\[ N_0 = \frac{a}{\alpha I(\beta)}\, h, \qquad [14] \]

and the total number of papers (with at least one citation) is determined by N(ym) = 1 as

\[ y_m = h \left[ 1 + \alpha^{\beta} \ln(h) \right]^{1/\beta}. \qquad [15] \]

A given researcher’s distribution can be modeled by choosing the most appropriate β and a for that case. For example, for β = 1, if a = 3, α = 0.661, N0 = 4.54h, and ym = h[1 + 0.66 ln(h)]. With a = 4, α = 0.4644, N0 = 8.61h, and ym = h[1 + 0.46 ln(h)]. For
β = 0.5, the lowest possible value of a is 3.70; for that case, N0 = 7.4h and ym = h[1 + 0.5 ln(h)]². Larger a values will increase N0 and reduce ym. For β = 2/3, the smallest possible a is a = 3.24, for which case N0 = 4.5h and ym = h[1 + 0.66 ln(h)]^(3/2).

The linear relation between h and n (Eq. 6) will of course break down when the researcher slows down in paper production or stops publishing altogether. There is a time lag between the two events. In the linear model, assuming the researcher stops publishing after nstop years, h continues to increase at the same rate for a time

\[ n_{\mathrm{lag}} = \frac{h}{c} = \frac{1}{1 + c/p}\, n_{\mathrm{stop}} \qquad [16] \]

and then stays constant, because now all published papers contribute to h. In a more realistic model, h will smoothly level off as n increases rather than with a discontinuous change in slope. Still, quite generally, the time lag will be larger for scientists who have published for many years, as Eq. 16 indicates.

Furthermore, in reality, of course, not all papers will eventually contribute to h. Some papers with low citations will never contribute to a researcher’s h, especially if written late in the career, when h is already appreciable. As discussed by Redner (3), most papers earn their citations over a limited period of popularity and then they are no longer cited. Hence, it will be the case that papers that contributed to a researcher’s h early in his or her career will no longer contribute to h later in the individual’s career. Nevertheless, it is of course always true that h cannot decrease with time. The paper or papers that at any given time have exactly h citations are at risk of being eliminated from the individual’s h count as they are superseded by other papers that are being cited at a higher rate. It is also possible that papers “drop out” and then later come back into the h count, as would occur for the kind of papers termed “sleeping beauties” (4).

For the individual researchers mentioned earlier, I find n from the time elapsed since their first published paper till the present and find the following values for the slope m defined in Eq. 6: Witten, m = 3.89; Heeger, m = 2.38; Cohen, m = 2.24; Gossard, m = 2.09; Anderson, m = 1.88; Weinberg, m = 1.76; Fisher, m = 1.91; Cardona, m = 1.87; deGennes, m = 1.75; Bahcall, m = 1.75; Fisk, m = 2.14; Scalapino, m = 1.88; Parisi, m = 2.15; Louie, m = 2.33; Jackiw, m = 1.92; Wilczek, m = 2.19; Vafa, m = 3.30; Maple, m = 1.94; Gross, m = 1.69; Dresselhaus, m = 1.41; and Hawking, m = 1.59. From inspection of the citation records of many physicists, I conclude the following:

(i) A value of m ≈ 1 (i.e., an h index of 20 after 20 years of scientific activity), characterizes a successful scientist.
(ii) A value of m ≈ 2 (i.e., an h index of 40 after 20 years of scientific activity), characterizes outstanding scientists, likely to be found only at the top universities or major research laboratories.
(iii) A value of m ≈ 3 or higher (i.e., an h index of 60 after 20 years, or 90 after 30 years), characterizes truly unique individuals.

The m parameter ceases to be useful if a scientist does not maintain his or her level of productivity, whereas the h parameter remains useful as a measure of cumulative achievement that may continue to increase over time even long after the scientist has stopped publishing.

Based on typical h and m values found, I suggest (with large error bars) that for faculty at major research universities, h ≈ 12 might be a typical value for advancement to tenure (associate professor) and that h ≈ 18 might be a typical value for advancement to full professor. Fellowship in the American Physical Society might occur typically for h ≈ 15–20. Membership in the National Academy of Sciences of the United States of America may typically be associated with h ≈ 45 and higher, except in exceptional circumstances. Note that these estimates correspond roughly to the typical number of years of sustained research production assuming an m ≈ 1 value; the time scales of course will be shorter for scientists with higher m values. Note that the time estimates are taken from the publication of the first paper, which typically occurs some years before the Ph.D. is earned.

There are, however, a number of caveats that should be kept in mind. Obviously, a single number can never give more than a rough approximation to an individual’s multifaceted profile, and many other factors should be considered in combination in evaluating an individual. Furthermore, the fact that there can always be exceptions to rules should be kept in mind, especially in life-changing decisions such as the granting or denying of tenure. There will be differences in typical h values in different fields, determined in part by the average number of references in a paper in the field, the average number of papers produced by each scientist in the field, and the size (number of scientists) of the field (although, to a first approximation in a larger field, there are more scientists to share a larger number of citations, so typical h values should not necessarily be larger). Scientists working in nonmainstream areas will not achieve the same very high h values as the top echelon of those working in highly topical areas. Although I argue that a high h is a reliable indicator of high accomplishment, the converse is not necessarily always true. There is considerable variation in the skewness of citation distributions even within a given subfield, and for an author with a relatively low h that has a few seminal papers with extraordinarily high citation counts, the h index will not fully reflect that scientist’s accomplishments. Conversely, a scientist with a high h achieved mostly through papers with many coauthors would be treated overly kindly by his or her h. Subfields with typically large collaborations (e.g., high-energy experiment) will exhibit larger h values, and I suggest that in cases of large differences in the number of coauthors, it may be useful in comparing different individuals to normalize h by a factor that reflects the average number of coauthors. For determining the scientific “age” in the computation of m, the very first paper may sometimes not be the appropriate starting point if it represents a relatively minor early contribution well before sustained productivity ensued.

Finally, in any measure of citations, ideally one would like to eliminate the self-citations. Although self-citations can obviously increase a scientist’s h, their effect on h is much smaller than on the total citation count. First, all self-citations to papers with <h citations are irrelevant, as are the self-citations to papers with many more than h citations. To correct h for self-citations, one would consider the papers with number of citations just >h and count the number of self-citations in each. If a paper with h + n citations has >n self-citations, it would be dropped from the h count, and h would drop by 1. Usually, this procedure would involve very few if any papers. As the other face of this coin, scientists intent on increasing their h index by self-citations would naturally target those papers with citations just <h.

As an interesting sample population, I computed h and m for the physicists who obtained Nobel prizes in the last 20 years (for calculating m, I used the later of the first published paper year or 1955, the first year in the ISI database). However, the set was further restricted by including only the names that uniquely identified the scientist in the ISI citation index, which restricted our set to 76% of the total. It is, however, still an unbiased estimator, because the commonality of the name should be uncorrelated with h and m. h indices range from 22 to 79, and m indices range from 0.47 to 2.19. Averages and standard deviations are ⟨h⟩ = 41, σh = 15 and ⟨m⟩ = 1.14, σm = 0.47. The distribution of h indices is shown in Fig. 2; the median is at hm
= 35, lower than the mean due to the tail for high h values. It is interesting that Nobel prize winners have substantial h indices (84% had an h of at least 30), indicating that Nobel prizes do not originate in one stroke of luck but in a body of scientific work. Notably, the values of m found are often not high compared with other successful scientists (49% of our sample had m < 1), clearly because Nobel prizes are often awarded long after the period of maximum productivity of the researchers.

Fig. 2. Histogram giving the number of Nobel prize recipients in physics in the last 20 years versus their h index. The peak is at the h index between 35 and 39.

As another example, among newly elected members of the National Academy of Sciences in physics and astronomy in 2005, I find ⟨h⟩ = 44, σh = 14, highest h = 71, lowest h = 20, and median hm = 46. Among the total membership in the National Academy of Sciences in physics, the subgroup of last names starting with “A” and “B” has ⟨h⟩ = 38, σh = 10, and hm = 37. These examples further indicate that the index h is a stable and consistent estimator of scientific achievement.

An intriguing idea is the extension of the h-index concept to groups of individuals.‡ The SPIRES high-energy physics literature database (www.slac.stanford.edu/spires/hep) recently implemented the h index in their citation summaries, and it also allows the computation of h for groups of scientists. The overall h index of a group will generally be larger than that of each of the members of the group but smaller than the sum of the individual h indices, because some of the papers that contribute to each individual’s h will no longer contribute to the group’s h. For example, the overall h index of the condensed matter group at the University of California at San Diego physics department is h = 118, of which the largest individual contribution is 25; the highest individual h is 66, and the sum of individual hs is 300. The contribution of each individual to the group’s h is not necessarily proportional to the individual’s h, and the highest contributor to the group’s h will not necessarily be the individual with highest h. In fact, in principle (although rarely in practice), the lowest-h individual in a group could be the largest contributor to the group’s h. For a prospective graduate student considering different graduate programs, a ranking of groups or departments in his or her chosen area according to their overall h index would likely be of interest, and for administrators concerned with these issues, the ranking of their departments or entire institution according to the overall h could also be of interest.

To conclude, I discuss some observations in the fields of biological and biomedical sciences. From the list compiled by Christopher King of Thomson ISI of the most highly cited scientists in the period 1983–2002 (5), I found the h indices for the top 10 on that list, all in the life sciences, which are, in order of decreasing h: S. H. Snyder, h = 191; D. Baltimore, h = 160; R. C. Gallo, h = 154; P. Chambon, h = 153; B. Vogelstein, h = 151; S. Moncada, h = 143; C. A. Dinarello, h = 138; T. Kishimoto, h = 134; R. Evans, h = 127; and A. Ullrich, h = 120. It can be seen that, not surprisingly, all of these highly cited researchers also have high h indices and that high h indices in the life sciences are much higher than in physics. Among 36 new inductees in the National Academy of Sciences in biological and biomedical sciences in 2005, I find ⟨h⟩ = 57, σh = 22, highest h = 135, lowest h = 18, and median hm = 57. These latter results confirm that h indices in biological sciences tend to be higher than in physics; however, they also indicate that the difference appears to be much higher at the high end than on average. Clearly, more research in understanding similarities and differences of h index distributions in different fields of science would be of interest.

In summary, I have proposed an easily computable index, h, which gives an estimate of the importance, significance, and broad impact of a scientist’s cumulative research contributions. I suggest that this index may provide a useful yardstick with which to compare, in an unbiased way, different individuals competing for the same resource when an important evaluation criterion is scientific achievement.

I am grateful to many colleagues in the University of California at San Diego Condensed Matter group and especially Ivan Schuller for stimulating discussions on these topics and encouragement to publish these ideas. I also thank the many readers who wrote with interesting comments since this paper was first posted at arXiv.org (6); the referees who made constructive suggestions, all of which led to improvements in the paper; and Travis Brooks and the SPIRES database administration for rapidly implementing the h index in their database.

‡This was first introduced in the SPIRES database.
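The group h index discussed in the text can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SPIRES implementation: the data layout (each member maps hypothetical paper identifiers to citation counts), the function names, and the example numbers are all invented here. Coauthored papers are merged by identifier so that each paper is counted once, and a member’s “contribution” is assumed to mean the number of that member’s papers among the group’s h most-cited papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def merge_papers(members):
    """Union of all members' papers; a shared identifier is kept once."""
    merged = {}
    for papers in members.values():
        merged.update(papers)
    return merged


def group_h_index(members):
    """h index of the pooled, de-duplicated paper list of the group."""
    return h_index(merge_papers(members).values())


def contributions(members):
    """Papers of each member among the group's h most-cited papers."""
    merged = merge_papers(members)
    h = h_index(merged.values())
    # Ties in citation count are broken by identifier (an arbitrary choice).
    core = sorted(merged, key=lambda pid: (-merged[pid], pid))[:h]
    return {name: sum(1 for pid in core if pid in papers)
            for name, papers in members.items()}


# Hypothetical two-person group; "ab" is a paper they coauthored.
EXAMPLE = {
    "A": {"a1": 10, "a2": 9, "a3": 8, "ab": 7},
    "B": {"b1": 6, "b2": 5, "b3": 1, "ab": 7},
}

if __name__ == "__main__":
    print({name: h_index(p.values()) for name, p in EXAMPLE.items()})
    print(group_h_index(EXAMPLE), contributions(EXAMPLE))
```

In this made-up example the individual indices are 4 and 3, while the group index is 5: larger than either member’s h but smaller than their sum of 7, in line with the general statement in the text.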
1. Laherrere, J. & Sornette, D. (1998) Eur. Phys. J. E Soft Matter B2, 525–539.
2. Redner, S. (1998) Eur. Phys. J. E Soft Matter B4, 131–134.
3. Redner, S. (2005) Phys. Today 58, 49–54.
4. van Raan, A. F. J. (2004) Scientometrics 59, 467–472.
5. King, C. (2003) Sci. Watch 14, no. 5, 1.
6. Hirsch, J. E. (2005) arXiv.org E-Print Archive (Aug. 3, 2005). Available at http://arxiv.org/abs/physics/0508025.