Problems with using scientometrics in allocating research funding

2010-02-22

A critique of publication metrics and the assessment of quality in academic research

Scientometric measures have been developed as a way to summarize the quality of scientific research in a single number. The major metric of a journal’s prestige is its Impact Factor (IF), the average rate of citation of its articles. It is calculated as the number of times the journal’s articles are cited in a given year divided by the number of articles the journal published within a specified preceding time period.1 Most commonly, a window of two years is used, although this has been criticized as too short a time for a paper to reach its greatest impact.2
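
As a rough illustration of that calculation, the sketch below computes a two-year IF for a hypothetical journal; the function name and the citation and article counts are invented for the example.

```python
def impact_factor(citations, citable_articles):
    """Two-year impact factor for year Y: citations received in Y to
    articles published in Y-1 and Y-2, divided by the number of
    citable articles published in Y-1 and Y-2."""
    return citations / citable_articles

# Hypothetical journal: 240 citable articles published in 2007-2008,
# cited 612 times during 2009 -> IF(2009) = 2.55
print(impact_factor(612, 240))
```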

Although the IF is increasingly used to quantify the quality of an individual scientist’s research output,3 it strictly applies only to journals. The h-index is a more appropriate measure of a scientist’s citation rate: h is the number of his or her publications that have been cited h or more times each. For example, a scientist with an h-index of 10 has authored ten papers with ten or more citations each.4
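
A minimal sketch of how h is found from an author’s citation record, assuming nothing beyond the definition above (the citation counts are invented):

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with at least
    h citations each (Hirsch 2005)."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Twelve papers with invented citation counts; ten of them have
# ten or more citations, so the h-index is 10.
print(h_index([50, 41, 33, 30, 25, 22, 17, 14, 12, 10, 4, 3]))
```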


In recent years, a number of objections have been raised against the use of impact factors and the attitude they instil in researchers. Although the IF was originally intended to evaluate the quality of a journal, journals in certain fields have been found to differ systematically in their IFs, though presumably not in quality, with, e.g., medical journals consistently rated higher than mathematical ones.5 As journals seek to maximize their IF, their content becomes biased towards fashionable topics, and fields outside those trends risk becoming underfunded if funding patterns follow the high-impact journals.

Publication bias has also been demonstrated: studies reporting significant results are much more likely to be published in high-IF journals than those with negative results.6 Few single articles manage to transform their field; theoretical progress is made rather through the gradual accumulation of results. In this context a large collection of negative results is no less valuable than positive ones, although journals appear more reluctant to publish them.

If researchers’ funding depends on producing positive results to publish in high-IF journals, they are more likely to make Type I errors (false positives), whether deliberately or inadvertently.7 They may also overstate the significance or importance of their results so that the paper is accepted by a more prestigious journal. This occurs especially in medical research, which is funded to a large extent by profit-seeking pharmaceutical companies, and which journals are eager to publish because it raises their IF in turn.3


The impact factor is not only problematic for rating journals; there are additional reasons not to use it to rate the contributing scientists, not least the fact that an important article with many citations does not necessarily appear in a high-IF journal. For example, Kingman (1982)8 has had a substantial influence on population-genetic studies, with 828 citations so far, but it was published in Stochastic Processes and their Applications, a journal with an IF of 1.07 (cf. Molecular Phylogenetics and Evolution with an IF of 3.87).

The h-index overcomes this problem, but credit will still be given to authors who were only peripherally involved in a study. As scientific merit is held to depend on the number of citations, supervisors may be tempted to take credit for the work of their students, and author lists swell to include scientists who made no substantial contribution to the study.9 If credit does not go to the workers who contribute most, no metric will be an accurate reflection of the competence of a researcher or of the quality of their work.

By aiming to publish highly cited work, prestigious journals ensure that controversial ideas are given a wide audience, which is beneficial in encouraging public debate. When deciding who should receive funding, however, citation-based metrics are inappropriate, as they are biased towards senior scientists, those working in fashionable fields, and members of research teams in which other members have made famous discoveries. They are thus not a good predictor of whether a researcher will produce important work in the future.


A bigger problem with using these scientometrics to allocate funding is that doing so fails to support the work that most scientists do, because it is based on an incorrect model of the scientific method. Karl Popper had an idealistic image of the scientist as one who proposes bold hypotheses, with the best science being that which takes big risks and has great implications for our view of the world.10 This is the kind of work that is celebrated with publication in prestigious journals and high citation rates. Since, on this view, hypotheses need not be well-informed or probable, a negative result is uninformative, which makes the publication bias appear reasonable and even desirable.

An alternative view was proposed by Thomas Kuhn, in which normal science is characterized instead as puzzle-solving, carried out with accepted methods and within the current theoretical framework.11 It is only rarely that paradigm shifts become necessary, after the evidence against a particular world view has mounted to the point where it is no longer tenable. If this is closer to the reality of how science is practised, then it is misguided to focus funding on work that promises to be paradigm-busting while neglecting research that seeks understanding within the current framework and that can provide evidence in favour of one grand theory or another. It should also be remembered that it does not cost much to propose ambitious ideas; it is testing those ideas methodically that requires funding.


When funding goes only to highly cited authors, idiographic work is also sidelined. Whether or not it is considered “science” (which depends on one’s definition), studies in taxonomy and natural history, the publication of floras and monographs, and the maintenance of museums and herbaria all support the entirety of organismal biology and have long-term usefulness despite not producing papers with high citation rates.12 Indeed, Charles Darwin’s theory of natural selection was only formulated after decades of idiographic work.13

Idiographic research may be regarded, along with technological innovation, as essential to the scientific enterprise, even though neither involves hypothesis-testing. While some equipment and software is commercially viable (e.g. infrared gas analysers and cycle sequencers), the designers of freely distributed software such as BEAST14 and BioEdit15 have started requesting citation in order to gain recognition for their contribution to the many studies that use their programs. Funding bodies must acknowledge that any nomothetic attempt at generalization depends both on basic knowledge of the systems under study and on the technology that allows them to be observed.


As much of science is publicly funded, it is worth asking which kind of research society benefits from more: sophisticated theories about the deep structure of life and the universe, or the gradual accumulation of knowledge and understanding about our immediate surroundings? Although the importance and utility of future theoretical advances cannot be predicted, it is quite reasonable for certain societies to be more concerned with, e.g., the life cycle of tropical disease vectors than with the existence of the Higgs boson. If publicly funded scientists are to do work that has local interest or application, the approval of their grant applications cannot be contingent upon the number of papers they have published in Science or Nature.


A number of alternative metrics have recently been proposed, e.g. the number of downloads of a paper from an online database.16 In my opinion, the more traditional measure of scientific productivity, the number of publications, is adequate to show that a scientist is competent to produce results of enough interest to achieve publication. However, all metrics suffer from the flaw that they are a means of quantifying and objectifying what is a fundamentally subjective process. Brischoux & Cook (2009)17 consider this an advantage, but I believe it may be favoured by members of funding boards because it allows them to evade personal responsibility for their decisions. By looking at an applicant’s number of publications and the theoretical importance and breadth of their work, and most importantly by reading their papers, an informed assessment can be made, in light of the priorities of the funding organization, of whether or not to fund the proposed project.


  1. Garfield E. 1972. Citation analysis as a tool in journal evaluation. Science 178: 471–479.

  2. Kurmis AP. 2003. Understanding the limitations of the journal impact factor. The Journal of Bone and Joint Surgery (American) 85: 2449–2454.

  3. Lawrence PA. 2007. The mismeasurement of science. Current Biology 17: R583–R585.

  4. Hirsch JE. 2005. An index to quantify an individual’s scientific research output. PNAS 102: 16569–16572.

  5. Dong P, Loh M, Mondry A. 2005. The “impact factor” revisited. Biomedical Digital Libraries 2: 7.

  6. Easterbrook PJ, Berlin JA. 1991. Publication bias in clinical research. Lancet 337: 867–872.

  7. Ioannidis JPA. 2005. Why most published research findings are false. PLoS Medicine 2: e124.

  8. Kingman JFC. 1982. The coalescent. Stochastic Processes and Their Applications 13: 235–248.

  9. Lawrence PA. 2002. Rank injustice. Nature 415: 835–836.

  10. Popper K. 1963. Conjectures and Refutations. London, UK: Routledge and Kegan Paul.

  11. Kuhn TS. 1970. Logic of Discovery or Psychology of Research? In: Lakatos I, Musgrave A, eds. Criticism and the Growth of Knowledge. Cambridge, UK: Cambridge University Press, 4–10.

  12. Cotterill FPD, Foissner W. 2010. A pervasive denigration of natural history misconstrues how biodiversity inventories and taxonomy underpin scientific knowledge. Biodiversity and Conservation 19: 291–303.

  13. Dawkins R. 2009. The Greatest Show on Earth: The Evidence for Evolution. London, UK: Bantam Press.

  14. Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 214.

  15. Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41: 95–98.

  16. Kaplan NR, Nelson ML. 2000. Determining the publication impact of a digital library. Journal of the American Society for Information Science 51: 324–339.

  17. Brischoux F, Cook TR. 2009. Juniors seek an end to the impact factor race. BioScience 59: 638–639.