Citation analysis and bibliometrics

Bibliometrics - a practical guide

Bibliometrics is the quantitative analysis of scholarly publications, intended to provide an indication of their impact on academic and public discourse. Traditionally, bibliometrics takes account of the number of times a research paper is cited, in order to compare it against other papers in the same field. Metrics for other forms of research output and other measures of impact are slowly becoming established.
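
As a rough illustration of what comparing a paper against others in its field can involve, the sketch below (in Python) computes a simple field-relative citation ratio from invented data. The paper titles, fields and citation counts are hypothetical; real services such as Web of Science, Scopus and SciVal apply much more sophisticated normalisation for field, publication year and document type.

    from statistics import mean

    # Hypothetical citation counts; in practice these would come from a
    # citation database such as Web of Science or Scopus.
    papers = [
        {"title": "Paper A", "field": "Ecology",   "citations": 42},
        {"title": "Paper B", "field": "Ecology",   "citations": 7},
        {"title": "Paper C", "field": "Ecology",   "citations": 11},
        {"title": "Paper D", "field": "Sociology", "citations": 9},
        {"title": "Paper E", "field": "Sociology", "citations": 3},
    ]

    def field_relative_ratio(paper, all_papers):
        """Citations of one paper divided by the mean for its field.

        A ratio above 1.0 means the paper is cited more often than the
        average paper in the same (hypothetical) field baseline.
        """
        baseline = mean(p["citations"] for p in all_papers
                        if p["field"] == paper["field"])
        return paper["citations"] / baseline

    for p in papers:
        print(f'{p["title"]}: {field_relative_ratio(p, papers):.2f}')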

Uses

What are bibliometrics used for?

To indicate the impact of your own research or that of your research group:

  • On a CV
  • When applying for funding
  • When reporting project outcomes

To identify the most highly-cited researchers in a field, in order to:

  • Get an overview of an area which is new to you
  • Locate potential collaborators or competitors
  • Inform a selection process

To identify the most highly-cited journals in a field, in order to:

  • Decide where to publish
  • Undertake a literature review
  • Manage library collections

To identify department-, faculty- or institution-level research strengths and weaknesses, in order to support strategic decision-making.

(Adapted from Bibliometrics Explained, University of Leeds, 2017.)

Sources

Sources of bibliometrics

Three main providers currently dominate the market for citation data:

  • Web of Science (Clarivate Analytics; formerly Thomson Reuters) - covers scholarly journals across all academic disciplines, though coverage is predominantly Anglophone. Web of Science data is also used to compile Clarivate’s Journal Citation Reports, Essential Science Indicators, and Highly Cited Researchers.
  • Scopus (Elsevier) - historically strongest in the sciences and social sciences, although humanities coverage is increasing. Scopus data is used in Elsevier's SciVal tool for institution-level analysis.
  • Google Scholar - the most comprehensive source of bibliographic records for scholarly publications with a digital footprint. Citation data is harvested automatically and without moderation, so mismatches and double-counting may occur. Professor Anne-Wil Harzing of the University of Melbourne has created Publish or Perish, free software for presenting and analysing an author's Google Scholar data.

In addition, Altmetric is a well-established provider of non-traditional metrics, tracking data about publications mentioned in policy documents, mainstream media and social media.

Access to citation analysis databases:

E-resources Guide »

Limitations

Limitations of bibliometrics

Be aware that bibliometric data is not a measure of research quality. It is an indicator of the level of interest in a piece of research.

When evaluating the quality of a body of work, it is important to take account of additional sources of data about the research, such as funding received, awards granted, and any patents claimed. Peer review and other subjective indicators of esteem are another important source of evidence.

Publication cycle

  • A highly-cited work is not necessarily high-quality research: other authors may be challenging or refuting its conclusions.
  • Only a small percentage of articles are highly cited, and they are found in a small subset of journals.
  • Citations take time to accrue, although on average a paper will reach its citation ‘peak’ within 2 years of publication. The cut-off date of any citation-based metric will affect the score.
  • Review papers typically attract the most citations.
  • Editorials, letters, news items and meeting abstracts are generally "non-citable".

Disciplinary variation

  • Citation patterns differ greatly between disciplines, so direct comparisons between fields cannot be made.
  • Very little citation data is available for books or conference papers.
  • Contributions to multi-authored papers (particularly common in the sciences and engineering) may skew author citation rates; one common mitigation, fractional counting, is sketched after this list.
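
One common mitigation for the skew introduced by multi-authored papers is fractional counting, in which each paper's citations are divided among its co-authors rather than credited in full to every author. The sketch below illustrates the idea on invented data; whether to use full or fractional counting is a methodological choice, not a feature of any particular database.

    from collections import defaultdict

    # Hypothetical publication records: author lists and citation counts.
    publications = [
        {"authors": ["Khan", "Osei", "Li"], "citations": 30},
        {"authors": ["Khan"],               "citations": 10},
        {"authors": ["Li", "Osei"],         "citations": 8},
    ]

    full = defaultdict(int)          # every co-author gets full credit
    fractional = defaultdict(float)  # credit is split between co-authors

    for pub in publications:
        share = pub["citations"] / len(pub["authors"])
        for author in pub["authors"]:
            full[author] += pub["citations"]
            fractional[author] += share

    for author in sorted(full):
        print(f"{author}: full={full[author]}, fractional={fractional[author]:.1f}")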

Database differences

  • The proprietary bibliographic databases are selective in their coverage of publications, so citation scores will differ depending on the source of the data.
  • Indexes may not reliably differentiate between researchers who share the same surname and initials, meaning that citation counts may be inflated. Persistent author identifiers such as ORCID help to disambiguate researchers (see the sketch after this list).
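
As a minimal illustration of why identifiers matter, the sketch below aggregates the same invented records first by name and initials, then by ORCID iD. The names, iDs and counts are made up, but the pattern, in which two different researchers collapse into a single name-based total, is exactly the kind of error that persistent identifiers help to prevent.

    from collections import defaultdict

    # Hypothetical records for two different researchers who share the
    # same surname and initials; the ORCID iDs below are invented.
    records = [
        {"name": "Smith, J.", "orcid": "0000-0000-0000-0001", "citations": 25},
        {"name": "Smith, J.", "orcid": "0000-0000-0000-0001", "citations": 12},
        {"name": "Smith, J.", "orcid": "0000-0000-0000-0002", "citations": 40},
    ]

    by_name = defaultdict(int)
    by_orcid = defaultdict(int)

    for r in records:
        by_name[r["name"]] += r["citations"]    # conflates the two researchers
        by_orcid[r["orcid"]] += r["citations"]  # keeps them separate

    print(dict(by_name))   # {'Smith, J.': 77} - a single, inflated count
    print(dict(by_orcid))  # one count per ORCID iD: 37 and 40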

Bias and discrepancies

  • Authors may show a bias towards citing their own work, their immediate research networks, or the titles they publish in. Some bibliometric tools allow you to exclude self-citations where appropriate (see the sketch at the end of this section).
  • An eminent author may attract more citations than a comparatively unknown researcher, even when their work is similar (the so-called Matthew Effect).
  • Researchers publishing in languages other than English tend to get fewer readers, hence fewer citations.
  • A 2006 study (Symonds et al., PLoS ONE) found that women receive fewer citations than men.

(Adapted from Bibliometrics Explained, University of Leeds, 2017.)
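
For illustration, the sketch below shows the basic idea behind excluding self-citations, treating a citation as a self-citation when the citing and cited papers share an author. All names and data are invented; real tools use more careful matching, ideally supported by author identifiers.

    # Hypothetical data: each citing paper lists its authors; a citation is
    # counted as a self-citation if any author also wrote the cited paper.
    cited_paper_authors = {"Garcia", "Nkemelu"}

    citing_papers = [
        {"title": "Follow-up study",  "authors": {"Garcia", "Tanaka"}},
        {"title": "Independent work", "authors": {"O'Brien"}},
        {"title": "Review article",   "authors": {"Nkemelu"}},
    ]

    total = len(citing_papers)
    excluding_self = sum(
        1 for p in citing_papers
        if not (p["authors"] & cited_paper_authors)  # no shared authors
    )

    print(f"Citations: {total}, excluding self-citations: {excluding_self}")
    # Citations: 3, excluding self-citations: 1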

Responsible use

Responsible use of bibliometrics

Responsible bibliometrics may be understood in terms of the following dimensions:

  • Robustness – base any bibliometric analysis on the best possible data in terms of accuracy and scope. Don't draw comparisons between researchers or research outputs using data from different sources.
  • Humility – recognise that quantitative evaluation should support, but not supplant, qualitative, expert assessment.
  • Transparency – ensure that data collection and analytical methods are open and auditable, so that researchers can test and verify the results.
  • Diversity – account for variation by field, and use a range of indicators to reflect and support a plurality of research outputs and researcher career paths across the system.
  • Reflexivity – recognise and anticipate the systemic and potential effects of indicators, and update them in response.

Bibliometric analysis is mainstream within the Anglophone academic community. Using bibliometric data to inform an assessment of research performance may be more transparent and less vulnerable to bias than peer review, as well as more cost-effective.

It is increasingly recognised, however, that no single bibliometric measure is sufficient to assess research quality, and that an element of expert, subjective judgement adds depth.

The San Francisco Declaration on Research Assessment (DORA) originates from a 2012 scholarly conference at which participants recognised the "need to improve the ways in which the output of scientific research is evaluated by funding agencies, academic institutions, and other parties". It is particularly critical of the use of journal-level metrics, such as the Journal Impact Factor, as a surrogate measure of author impact. DORA has been signed by representatives of 859 institutions worldwide, including several UK Russell Group universities.

In 2015, a team of five academics released the Leiden Manifesto for Research Metrics: ten principles "distilling best practice in metrics-based research assessment, so that researchers can hold evaluators to account, and evaluators can hold their indicators to account". It had immediate impact and is highly-regarded as a framework for developing an institutional position.

UK context

Bibliometrics in the UK context

In April 2014, HEFCE set up an Independent Review of the Role of Metrics in Research Assessment and Management, chaired by Professor James Wilsdon (Sussex), to investigate "the current and potential future roles that quantitative indicators can play in the assessment and management of research".

The Review's report, The Metric Tide, was published in July 2015, calling for the research community to "develop a more sophisticated and nuanced approach to the contribution and limitations of quantitative indicators". Analysis of REF2014 results concluded that author-level metrics "cannot provide a like-for-like replacement for REF peer review".

Universities UK has convened the Forum for Responsible Research Metrics: research funders, sector bodies and infrastructure experts working in partnership to consider "how quantitative indicators might be used in assessing research outputs and environments" in the context of the next REF, and "working to improve the data infrastructure that underpins metric use".