
Patient data vital in understanding Covid-19 and its mutations

Posted on 16 November 2020

A new study has found that 95.5 per cent of current entries in the world’s largest novel coronavirus genome database do not contain relevant patient information, a critical piece of the puzzle for understanding how the virus is evolving.

Report author Professor Vasan says it's critical to gather patient data

Researchers, led by a York virologist, have used this finding to develop a standardised data collection template that can be implemented on repositories like GISAID. The template does not identify the patient and makes it easier for clinical teams treating patients to share more of their knowledge.

This enables the scientific community to access important information, including symptoms, vaccine status and travel history, and in doing so build a more complete picture of the impact of Covid-19 on each patient.

SARS-CoV-2, the virus that causes Covid-19, is one of the most sequenced viruses in history, with over 200,000 sequences on GISAID as of 16 November 2020. The last 100,000 sequences of the virus were uploaded in the past two months, a global record.

Vital information

The study, led by Australia’s national science agency, the Commonwealth Scientific and Industrial Research Organisation (CSIRO), in collaboration with GISAID and other academic partners, proposes a standardised data collection method to help scientists and clinicians around the world gather and share vital information in the fight against Covid-19.

CSIRO researcher and senior author of the paper Professor S.S. Vasan, who is also Honorary Professor at the University of York, UK, said it is critical to collect the ‘patient journey’ in as much detail as possible to understand the impact of virus evolution on the disease and its consequences.

Professor Vasan added: “We urgently need de-identified patient data associated with these virus genome sequences in order to decipher whether disease outcomes are due to a mutation, or multiple mutations, in the virus or host factors such as age, gender and co-morbidities.

“It’s very likely this information is known to the clinical teams who treated the patient but does not make its way to public repositories such as GISAID, due to the number of steps involved.”

Recognising this need for clinical data, GISAID has made ‘patient status’ a compulsory field for uploading virus sequences since 27 April 2020.

However, the study showed that a lack of digital infrastructure for collecting clinical information has hampered progress.

Health systems

It also identified the need for a standardised vocabulary and mechanism for linking in with health systems as key factors for capturing the necessary information.

Lead author and CSIRO researcher Dr Denis Bauer, who is also Honorary Associate Professor at Macquarie University, Sydney, said with the adoption of the study’s proposed data collection template, future sequences shared through the GISAID initiative could contain more meaningful de-identified patient information.

Dr Bauer added: “We have identified steps in the clinical health data acquisition cycle and workflows that likely have the biggest impact in the data-driven understanding of this virus.

“Following the ‘Fast Healthcare Interoperable Resource’ implementation guide, we have introduced an ontology-based standard questionnaire consistent with the World Health Organization’s recommendations.”

Genome sequences

Barwon Health’s Director of Infectious Diseases Professor Eugene Athan welcomed the new data collection template.

Professor Athan said: “Barwon Health is leading a study on the long-term biological, physiological and psychological effects of Covid-19, in partnership with CSIRO and Deakin University, and we intend to implement this mechanism for our data collection and reporting.

“Having a simplified and standardised approach to sharing relevant patient information alongside genome sequences will enable critical research into Covid-19 and comparisons between different studies and population sets.

“I encourage clinicians and scientists around the world to share, wherever possible, de-identified patient information and clinical outcomes using this template to support ongoing research efforts.”


Media enquiries

Julie Gatenby
Deputy Head Media Relations (maternity cover)

Tel: +44 (0)1904 322029

About this research

The paper ‘Interoperable medical data: the missing link for understanding COVID‐19’ was published in the journal Transboundary and Emerging Diseases.

