Sharing, preserving and depositing your data

These pages focus on activities which often take place at the end of a research project including:

  • Choosing which data to retain and archive for long-term preservation

  • Disposing of data appropriately

  • Sharing data for future reuse

  • Depositing data with a data centre or repository.

Remember that your research funder may have specific requirements around data retention and archiving.

Data retention

The University's Research Data Management Policy states:

"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained.

3.6 In the absence of the other provisions described in 3.5, the default period for research data retention is 10 years from date of last requested access."

Selecting data for long-term preservation

Challenges arise when selecting data to retain and archive, and in turn identifying data that can be disposed of.

As each research project is unique, it’s impossible to provide a one-size-fits-all approach. However, careful consideration, meeting funder and institutional requirements, and documenting all the decisions made (and why) should mitigate many unforeseen issues that may arise later.

The University's Preserving Information (PDF, 282kb) Records Management Guide offers advice on the preservation challenges facing digital and manual records: why preserve information, preservation measures, how long different media last, and confidence in document formats for long‐term preservation.

Why you shouldn't keep everything

It’s important to remember that not everything should be retained for long-term preservation. Just because you can preserve something doesn't mean you should. There are costs to preserving data (time, technology, space, maintenance) and risks in keeping things (storing massive amounts of data makes it difficult to find meaningful data easily), just as there are in not keeping them. It’s also important to note that under the Freedom of Information Act, what we keep must be disclosed if requested.

Deciding what data to keep

The DCC’s How to appraise and select research data for curation is a useful guide to the process of appraising and selecting data for long-term preservation.

Questions to consider when making decisions on what data to keep:

  • What data am I required to keep by my research funder, my institution or legally?
  • Does the data underpin a research publication?
    This data should be kept to allow others to validate and build on your research results. Some publishers now require data to be made available as a condition of publication.

  • Can the data be reused? Do I hold the intellectual property and legal rights to keep and reuse this data or can I negotiate these rights?

  • Is the data effectively documented to allow it to be found wherever it is to be stored, and for reuse?

  • Can the data be replicated, easily and cost-effectively?

  • What are the costs associated with keeping the data? Do I have the funds available to do this?

It's also worth considering if software and/or computer code needs to be retained. For example, if you have produced computer code or software to visualise or interrogate your research data you may wish to preserve your code (with full documentation on any dependencies) in order to enable others to verify your findings or reproduce your methodology. See Digital preservation and curation - the danger of overlooking software from the Software Sustainability Institute.
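As a minimal sketch of what documenting dependencies can look like, the snippet below lists the packages installed in the current environment with their exact versions, so that the list can be archived alongside the code. It assumes your code is written in Python; the function name `snapshot_dependencies` is hypothetical, and other languages have their own equivalent tools.

```python
from importlib.metadata import distributions


def snapshot_dependencies() -> list[str]:
    """Return 'name==version' lines for every installed package, sorted by name."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no package name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Save this output next to the code you deposit, e.g. as requirements.txt
    print("\n".join(snapshot_dependencies()))
```

A version list of this kind records what the code was run against; it does not replace written documentation of how to install and run it.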

Amongst the outputs from the Jisc-funded PrePARe Project is a useful checklist, Selecting what to keep and what to bin. The DCC has a guide to help researchers select data for long-term storage, Five steps to decide what data to keep: a checklist for appraising research data.

How long should the data be kept?

University policy is to preserve research data and records for a minimum of 10 years, unless otherwise required. This may be longer where the data is actively used. Legislative and regulatory needs, including any stipulated by your funder, may also change this retention period.

It is best practice to define the retention period before you create or receive data. Knowing how long the data is needed, and what your preservation requirements are, will also help with other choices such as medium and format. Many research funders now require a Data Management Plan where you will be expected to define your strategies for deposit and the long-term preservation of your data, including how preservation will be funded.
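The default rule in policy paragraph 3.6 (10 years from the date of last requested access) can be expressed as a small date calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that no funder, legal or contractual requirement overrides the default period.

```python
from datetime import date

DEFAULT_RETENTION_YEARS = 10  # default period from policy paragraph 3.6


def earliest_disposal_date(last_access: date,
                           years: int = DEFAULT_RETENTION_YEARS) -> date:
    """Return the first date on which the default retention period has elapsed."""
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # last_access was 29 February and the target year is not a leap year,
        # so roll forward to 1 March
        return date(last_access.year + years, 3, 1)


if __name__ == "__main__":
    print(earliest_disposal_date(date(2015, 6, 1)))
```

Note that each new access request moves the date forward, so the calculation needs repeating whenever the data is requested again.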

Data disposal

Data which has fulfilled its purpose and does not need to be kept for long-term preservation needs to be disposed of securely. Remember, you have a legal responsibility for the information you store and must ensure information security.

The University's Research Data Management Policy states:

“3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained.”

The University's Disposing of Information (PDF, 276kb) Records Management Guide (login required) provides advice about:

  • reasons for disposing of records

  • what needs to be disposed of securely

  • disposal methods

  • keeping a record of what you destroy

  • remote working

  • out‐sourcing the storage and disposal of information

  • making disposal easier: act at the point of creation

  • specific requirements for the disposal of certain information.

Funder and contractual obligations

As well as specifying how and where data are to be stored and accessed, contracts governing the provision of access to research data and the funding of research often specify how data are to be disposed of. For example, users who obtain access to Special Licence data from the UK Data Archive must follow the advice in the document Microdata Handling and Security: Guide to Good Practice [PDF], which includes guidance on how to permanently destroy copies of data files.

Some projects, agreements and research contracts may specify disposal of data to a particular standard. In some cases this standard for destruction may differ from or exceed that recommended in university guidance and therefore special attention should be paid to such obligations.

Further advice and support on the disposal of digital data is available from IT Services, email itsupport@york.ac.uk.

Sharing data

Benefits of sharing

Research data is a valuable resource and can often be put to significant use beyond its original purpose. There are benefits to you as a researcher and to the wider community of sharing data.

The UK Data Service's guide Why share data? lists the following benefits.

Sharing data:

  • encourages scientific enquiry and debate

  • promotes innovation and potential new data uses

  • leads to new collaborations between data users and data creators

  • maximises transparency and accountability

  • enables scrutiny of research findings

  • encourages the improvement and validation of research methods

  • reduces the cost of duplicating data collection

  • increases the impact and visibility of research

  • provides credit to the researcher as a research output in its own right

  • provides great resources for education and training.

Sharing data also helps you:

When not to share data

It is not always possible or desirable to share data.

  • Legal requirements - Your data cannot be shared under the Data Protection Act.
  • Ethical concerns - Your data includes sensitive or confidential data where no consent for data sharing has been given.
  • Licence restrictions - You are using data owned by others, such as commercial entities or authors, and don't have the rights to share the data.

  • Commercial value - Your data has financial value or a patent is pending. Contact the Intellectual Property & Legal Team if you need help in determining the value of your research data.

In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied. The UK Data Service provide guidance on legal and ethical issues, including sections on consent for data sharing (with example consent forms) and anonymisation of quantitative data and qualitative data. The Information Commissioner's Office provide useful information around anonymisation [PDF], removing hidden personal data from datasets [PDF] and big data [PDF] (note: this guidance has not been updated since the Data Protection Act 2018 became law but will be updated soon to reflect the changes). For more information see our Ethical and legal issues web page.

Options for sharing

There are a number of different ways that data can be shared:

  • You can deposit your data with a specialist data centre or repository. See the Depositing your data section on this page for more information.

  • It may be possible to submit your data alongside the associated publication, for example, when publishing a journal article.

  • You could make some data available via your website so that it is easily accessible to anyone who wishes to view it. Care would need to be taken in how this is set up and maintained.

It is easier to meet data sharing requests if your data is organised and effectively documented, and if you plan for data sharing at the beginning of your project by deciding how your data will be shared. An entertaining and informative Data sharing and management video (created by NYU Health Sciences Library) highlights what can go wrong if you don’t manage your data for sharing.

Data can also be requested under Freedom of Information legislation. For more information see the University's guidance on FOI and research data.

Rights and licensing

The rights relating to and ownership of research data should be established at the start of a project to avoid later confusion (see the Intellectual Property Rights information on the Ethical and legal issues web page).

In most cases, licensing your data can help clarify the terms of its use. The DCC guide on How to license research data provides a range of information on why and how to use licences. The University of Glasgow's Information Guide provides guidance in choosing a licence for research data. 

Depositing your data

Why deposit your data?

The University's Research Data Management Policy states:

“3.8 Retained data must be deposited in an appropriate national or international data service, or as mandated by the funder. Data should be transferred to the University Research Data York service when suitable data services are not available.”

There are a number of advantages to deposit:

  • Depositing your data with a digital repository will ensure that the data is maintained in a readable format and remains usable over a longer period of time.
  • Your funder may require you to deposit. For more information see the Funder data policies page.

  • Digital repositories make your data available to more people helping you make a contribution to the development of your research area.

  • Making your data available could raise the impact of your research and your research profile.

In some cases it is possible to consider depositing subsets of your data or embargoing release for a given period of time.

Where to deposit

Two options are available to you for the deposit of selected data with long-term value. To deposit/transfer data:

  1. with external services, i.e. a funder/subject/publisher repository
    This option should be chosen if a suitable data archive or repository for your data exists. Specialised services dealing with discipline-specific data are best placed to manage and provide appropriate access to your data for the long-term.

  2. with University Research Data York service
    Choose this option where no suitable external repository can be found. Research Data York will store the data (physical or digital) for the longer term, manage requests to access those data and ensure that data remain unchanged. We will also allocate a Digital Object Identifier (DOI) where appropriate. Data will be securely destroyed after the agreed retention period has passed.
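One common way to confirm that stored data remain unchanged is to record a checksum at the time of deposit and recompute it later. The following is a minimal sketch of that idea using SHA-256. It is not a description of how Research Data York or any particular repository carries out its checks, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the checksum computed after transfer matches the one recorded before transfer, the file content is the same byte for byte.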

Note: All University of York researchers must record the data they have selected for long-term retention in PURE, irrespective of where the data is deposited. See our guidance on Recording datasets in PURE. If no suitable external service can be found to home your data, recording your dataset in PURE will trigger the University to discuss transfer to Research Data York.

Some funders will expect data with long-term value to be deposited in specific data centres to ensure that it is preserved and remains accessible for future use.  For example:

You can check your funder's data archiving policy (and Open Access requirements) using SHERPA/JULIET.

A range of other data repositories are also available in various subject disciplines. Examples include the Archaeology Data Service and Dryad, which specialises in biological data. Zenodo, which integrates with GitHub, can be a good place to make software available.

You can search for a suitable repository for your data using tools such as re3data.org.

Some work is being undertaken on defining criteria for the accreditation of repositories and what constitutes a Trustworthy repository. To assess whether a repository is a suitable home for your data, you should consider:

  • Does the repository have a good reputation in your field? Have you seen favourable references to it? Is it recommended by your funder or journal?
  • What metadata requirements are there? Will others be able to find and cite your data?
  • Will a persistent identifier (e.g. a Digital Object Identifier (DOI)) be assigned to your data, that you can include in your data access statement?
  • Can you apply access restrictions or an embargo period if you need to?
  • Will the repository ensure that confidential or personal data are secured if that is required?
  • Under what licence terms are datasets made available for reuse? Will the licence terms fit with your funder requirements and with the University's Research Data Management Policy?
  • Are you required to assign any copyright in the data to the archive? Note: We recommend avoiding using repositories that require transfer of rights. See the University's Policy on Intellectual Property.
  • Can you rely on it to preserve your data in 10 years' time? Is it established and well funded?

Preparing for deposit

Thinking about depositing your data as part of your data management planning will help ensure that your data is ready for deposit at the appropriate time. For example, data centres may ask you to meet minimum quality standards to make sure that your data can be understood and reused by other researchers.

If your data is to be transferred to Research Data York and your data cannot be shared openly (if restrictions on access will need to be applied) or if you wish to transfer large volumes of data, you should contact the Library's Research Support Team (email: lib-research-support@york.ac.uk) to discuss your options as early as possible.

The UK Data Service's guide Depositing shareable survey data was specifically developed to support new depositors of large-scale surveys but the principles apply to a wide range of significant deposits.

Data citation

How to get a DOI

All University of York researchers must record the data they have selected for long-term retention in PURE. The 'datasets' record created will be checked by University Library staff and a DOI minted where appropriate, which you can then cite within your published paper. PURE datasets records are discoverable through the York Research Database, providing a permanent and public record of the dataset along with a description of the data, how it may be accessed and any constraints that may apply. See Recording datasets in PURE for further guidance.

If depositing your data with an external service (funder/subject/publisher repository), you should ask the repository for a persistent identifier (such as a DOI) to cite within your published paper.

Data citation

There are two elements to data citation. If you are publishing a dataset as part of your research output you will be expected to provide an accessibility statement, often referred to as a data access statement, in your published paper. If you are using an existing or third-party dataset as part of your research you will be expected to cite the dataset in the text or in your references.

Data access statements

Data access statements are required for most publicly funded publications. They are a requirement of many funders' data policies and of the RCUK Policy on Open Access, which states:

"[3.3] (ii) As part of supporting the drive for openness and transparency in research, and to ensure that researchers think about data access issues, the policy requires all research papers, if applicable, to include a statement on how underlying research materials, such as data, samples or models, can be accessed."

Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. The objective of the statement is to aid data discovery. Accordingly, data access statements need to include a persistent URL (e.g. a Digital Object Identifier (DOI)) which links directly to the dataset or to supporting documentation that describes the data in detail, how it may be accessed and any constraints that may apply.

What to include in the statement:

  • If data are openly available the name(s) of the data repositories should be provided, as well as any persistent URLs/DOIs or accession numbers for the dataset.
  • If there are justifiable legal or ethical reasons why your data cannot be made available, these should be noted in the statement.
  • If the data themselves are not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
  • If you did not collect the research data yourself but instead used existing data obtained from another source, this source should be credited.

A simple 'contact the author' instruction is not sufficient.

The data access statement should be included in submitted papers, even if a persistent URL or DOI has not been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.

Example data access statements:

  • All data created during this research are available by request from the University of York's York Research Database http://dx.doi.org/10.15124/12345
  • Expression data are openly available from ArrayExpress (Accession E-MTAB-01234 at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-01234/). Crystal structures are available from the Cambridge Crystallographic Data Centre (Identifier BATHRS) at http://dx.doi.org/10.15124/12345. Microscopy images are openly available from Dryad at http://dx.doi.org/10.15124/12346.
  • All data supporting this study are provided as supplementary information accompanying this paper.
  • All data are provided in full in the results section of this paper.
  • Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration, at http://dx.doi.org/10.15124/12345.
  • Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available on the University of York's York Research Database http://dx.doi.org/10.15124/12345
  • Due to the (commercially, politically, ethically) sensitive nature of the research, no participants consented to their data being retained or shared. Additional details relating to other aspects of the data are available from the University of York's York Research Database http://dx.doi.org/10.15124/12345
  • Supporting data will be available by request from the University of York's York Research Database at http://dx.doi.org/10.15124/12345 after a 6 month embargo from the date of publication to allow for commercialisation of research findings.
  • No new data were created during this study.
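Because a data access statement should carry a persistent identifier where one exists, it can be useful to check a draft statement before submission. The sketch below pulls DOIs out of a statement with a regular expression. The pattern is a commonly used approximation rather than a formal definition of DOI syntax, and the function name is hypothetical. A statement with no DOI is not necessarily wrong (for example, "No new data were created during this study").

```python
import re

# Approximate DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_dois(statement: str) -> list[str]:
    """Return every DOI-like string found in a data access statement."""
    # Strip punctuation that ends the sentence rather than the DOI
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(statement)]


if __name__ == "__main__":
    example = ("Supporting data will be available by request from the "
               "University of York's York Research Database at "
               "http://dx.doi.org/10.15124/12345 after a 6 month embargo.")
    print(find_dois(example))
```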

Citing datasets

The same principles of attribution and credit apply to research data as they do to other research outputs such as journal articles or books. Good data citation will acknowledge the original author/producer and will provide the information necessary to identify and locate the data.

The elements that make up a data citation are still the subject of debate and may vary across subject disciplines. The core elements are:

  • Author(s) - the creator(s) of the data
  • Title - title by which the resource is known
  • Publisher
  • Publication date
  • Persistent identifier - usually a DOI

The following elements are also commonly recommended:

  • Edition and/or version
  • Resource type
  • Location - information on where the resource can be accessed

The order of elements and presentation of the citation will be defined by the referencing style that is being used. This may vary by journal or publisher.
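To show how the elements listed above combine, the sketch below assembles a citation string from them. The ordering used here is only one possibility, the class and method names are hypothetical, and the authors and title in the example are invented placeholders; your referencing style determines the actual layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    # Core elements
    authors: list[str]
    title: str
    publisher: str
    year: int
    doi: str
    # Commonly recommended optional elements
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def render(self) -> str:
        """Join the elements in one illustrative order, separated by full stops."""
        parts = [f"{'; '.join(self.authors)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(f"[{self.resource_type}]")
        parts.append(f"https://doi.org/{self.doi}")
        return ". ".join(parts)


if __name__ == "__main__":
    citation = DatasetCitation(
        authors=["Smith, J.", "Jones, A."],  # placeholder names
        title="Example survey data",         # placeholder title
        publisher="University of York",
        year=2019,
        doi="10.15124/12345",                # example DOI used on this page
    )
    print(citation.render())
```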

More information on citing data can be found in the DCC guide to How to cite datasets and link to publications. DataCite also provide more information on how and Why to cite data. The ESRC guidelines on Data citation: what you need to know are also useful.

Software citation

It is important to give credit to other researchers for software they have developed which you have used in your research. Moreover, some software publishers may require the use of software to be acknowledged or cited in published research outputs. Citing software that has played an integral part in your research can help others to understand, reproduce or reuse your research data.

In general, software should be cited in a similar manner to research data and research papers. If guidelines from your publisher or citation style exist, follow them, or check with your editor if you are writing for publication. Many software packages also give guidance on how they want to be cited. If no guidance exists, various organisations have been working to develop guidelines for software citation; examples include:

You should also ensure that it is possible for other people to cite your code/software. GitHub and other version control platforms are increasingly offering tools for making your software easier to cite, including assigning DOIs to your code. For further guidance see DataCite's software citation workflows.

Training

University of Edinburgh, Research Data MANTRA

Sharing, preservation, and licensing: a training module from the MANTRA research data management online course. This module covers a range of issues relating to the long-term preservation, licensing and sharing of data with other researchers.