These pages focus on activities which often take place at the end of a research project, including:
Choosing which data to retain and archive for long-term preservation
Disposing of data appropriately
Sharing data for future reuse
Depositing data with a data centre or repository.
Remember that your research funder may have specific requirements around data retention and archiving.
The University's Research Data Management Policy states:
"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained.
3.6 In the absence of the other provisions described in 3.5, the default period for research data retention is 10 years from date of last requested access."
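Because clause 3.6 counts the default period from the date of last requested access, the clock effectively restarts whenever the data is requested. The Python sketch below is a rough illustration of that arithmetic and is not an official University tool. The helper names `retention_end` and `retention_period_elapsed` are hypothetical, and the choice to move a 29 February start date to 28 February in non-leap years is an assumption made for this example. Any longer period set by a funder, contract or legislation would take precedence.

```python
from datetime import date

# Default retention period from clause 3.6 of the policy (in years).
DEFAULT_RETENTION_YEARS = 10


def retention_end(last_access: date, years: int = DEFAULT_RETENTION_YEARS) -> date:
    """Return the date on which the retention period ends (hypothetical helper).

    The period runs from the date of last requested access, so this should
    be recalculated whenever a new access request is made.
    """
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # Assumption: a 29 February start date ends on 28 February
        # when the target year is not a leap year.
        return last_access.replace(year=last_access.year + years, day=28)


def retention_period_elapsed(last_access: date, today: date,
                             years: int = DEFAULT_RETENTION_YEARS) -> bool:
    """Return True once the retention period has passed (hypothetical helper).

    Passing this check does not by itself authorise disposal: legal, ethical,
    funder and collaborator requirements must still be checked.
    """
    return today >= retention_end(last_access, years)


if __name__ == "__main__":
    print(retention_end(date(2020, 3, 15)))  # 2030-03-15
    print(retention_end(date(2020, 2, 29)))  # 2030-02-28
    print(retention_period_elapsed(date(2014, 6, 1), date(2024, 6, 1)))  # True
```

To model a longer funder or contractual requirement, pass it as `years`, for example `retention_end(date(2020, 3, 15), years=15)`.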
Challenges arise when selecting data to retain and archive and, in turn, when identifying data that can be disposed of.
As each research project is unique, no one-size-fits-all approach is possible. However, careful consideration, meeting funder and institutional requirements, and documenting every decision and the reasons for it should mitigate many issues that might otherwise arise later.
The University's Preserving Information (PDF, 282 KB) Records Management Guide offers advice on the preservation challenges facing digital and manual records: why preserve information, preservation measures, how long different media last, and confidence in document formats for long-term preservation.
It’s important to remember that not everything should be retained for long-term preservation: just because you can preserve something does not mean you should. Preserving data has costs (time, technology, space, maintenance), and keeping things carries risks (storing massive amounts of data makes meaningful data harder to find), just as not keeping them does. It’s also important to note that, under the Freedom of Information Act, what we keep may have to be disclosed if requested.
The DCC’s How to appraise and select research data for curation is a useful guide to the process of appraising and selecting data for long-term preservation.
Questions to consider when making decisions on what data to keep:
Does the data underpin a research publication?
This data should be kept to allow others to validate and build on your research results. Some publishers now require data to be made available as a condition of publication.
Can the data be reused? Do I hold the intellectual property and legal rights to keep and reuse this data or can I negotiate these rights?
Is the data documented well enough for it to be found, wherever it is stored, and reused?
Can the data be replicated easily and cost-effectively?
What are the costs associated with keeping the data? Do I have the funds available to do this?
It's also worth considering whether software and/or computer code needs to be retained. For example, if you have produced computer code or software to visualise or interrogate your research data, you may wish to preserve your code (with full documentation of any dependencies) to enable others to verify your findings or reproduce your methodology. See Digital preservation and curation - the danger of overlooking software from the Software Sustainability Institute.
Amongst the outputs from the Jisc-funded PrePARe Project is a useful checklist, Selecting what to keep and what to bin. The DCC has a guide to help researchers select data for long-term storage, Five steps to decide what data to keep: a checklist for appraising research data.
University policy is to preserve research data and records for a minimum of 10 years, unless otherwise required. This may be longer where the data is actively used. Legislative and regulatory needs, including any stipulated by your funder, may also change this retention period.
It is best practice to define the retention period before you create or receive data. Knowing how long the data is needed, and what your preservation requirements are, will also help with other choices such as medium and format. Many research funders now require a Data Management Plan where you will be expected to define your strategies for deposit and the long-term preservation of your data, including how preservation will be funded.
Data which has fulfilled its purpose and does not need to be kept for long-term preservation needs to be disposed of securely. Remember, you have a legal responsibility for the information you store and must ensure information security.
The University's Disposing of Information (PDF, 276 KB) Records Management Guide (login required) provides advice about:
reasons for disposing of records
what needs to be disposed of securely
keeping a record of what you destroy
outsourcing the storage and disposal of information
making disposal easier: act at the point of creation
specific requirements for the disposal of certain information.
As well as specifying how and where data is to be stored and accessed, contracts governing access to research data and the funding of research often specify how data is to be disposed of. For example, users who obtain access to Special Licence data from the UK Data Archive must follow the advice in the document Microdata Handling and Security: Guide to Good Practice [PDF], which includes guidance on how to permanently destroy copies of data files.
Some projects, agreements and research contracts may specify disposal of data to a particular standard. In some cases this standard may differ from, or exceed, that recommended in University guidance, so pay particular attention to such obligations.
Further advice and support on the disposal of digital data is available from IT Services, email firstname.lastname@example.org.
Research data is a valuable resource and can often be put to significant use beyond its original purpose. Sharing data benefits both you as a researcher and the wider community.
The UK Data Service's guide Why share data? lists the following benefits:
encourages scientific enquiry and debate
promotes innovation and potential new data uses
leads to new collaborations between data users and data creators
maximises transparency and accountability
enables scrutiny of research findings
encourages the improvement and validation of research methods
reduces the cost of duplicating data collection
increases the impact and visibility of research
provides credit to the researcher as a research output in its own right
provides great resources for education and training.
Sharing data also helps you:
Meet your funder’s requirements
Meet publisher requirements. Some publishers require data to be made available as a condition of publishing.
Potentially extend your impact (see Sharing detailed research data is associated with increased citation rate: Piwowar et al. (2007) and Data reuse and the open data citation advantage: Piwowar and Vision (2013)).
It is not always possible or desirable to share data. Common reasons include:
Licence restrictions - You are using data owned by others, such as commercial entities or authors, and don't have the rights to share the data.
Commercial value - Your data has financial value or a patent is pending. Contact the Intellectual Property & Legal Team if you need help in determining the value of your research data.
In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied. The UK Data Service provide guidance on legal and ethical issues, including sections on consent for data sharing (with example consent forms) and anonymisation of quantitative data and qualitative data. The Information Commissioner's Office provide useful information around anonymisation [PDF], removing hidden personal data from datasets [PDF] and big data [PDF] (note: this guidance has not been updated since the Data Protection Act 2018 became law but will be updated soon to reflect the changes). For more information see our Ethical and legal issues web page.
There are a number of different ways that data can be shared:
You can deposit your data with a specialist data centre or repository. See the Depositing your data section on this page for more information.
It may be possible to submit your data alongside the associated publication, for example, when publishing a journal article.
You could make some data available via your website so that it is easily accessible to anyone who wishes to view it. Care would need to be taken in how this is set up and maintained.
It is easier to meet data sharing requests if your data is well organised and documented, and if you plan at the beginning of your project how your data will be shared. An entertaining and informative Data sharing and management video (created by NYU Health Sciences Library) highlights what can go wrong if you don’t manage your data for sharing.
Data can also be requested under Freedom of Information legislation. For more information see the University's guidance on FOI and research data.
The rights relating to and ownership of research data should be established at the start of a project to avoid later confusion (see the Intellectual Property Rights information on the Ethical and legal issues web page).
In most cases, licensing your data can help clarify the terms of its use. The DCC guide on How to license research data provides a range of information on why and how to use licences. The University of Glasgow's Information Guide provides guidance in choosing a licence for research data.
The University's Research Data Management Policy states:
“3.8 Retained data must be deposited in an appropriate national or international data service, or as mandated by the funder. Data should be transferred to the University Research Data York service when suitable data services are not available.”
There are a number of advantages to depositing your data:
Your funder may require you to deposit. For more information see the Funder data policies page.
Digital repositories make your data available to more people, helping you contribute to the development of your research area.
Making your data available could raise the impact of your research and your research profile.
In some cases it is possible to consider depositing subsets of your data or embargoing release for a given period of time.
Two options are available to you for the deposit of selected data with long-term value. To deposit/transfer data:
Note: All University of York researchers must record the data they have selected for long-term retention in PURE, irrespective of where the data is deposited. See our guidance on Recording datasets in PURE. If no suitable external service can be found to host your data, recording your dataset in PURE will trigger the University to discuss transfer to Research Data York.
Some funders will expect data with long-term value to be deposited in specific data centres to ensure that it is preserved and remains accessible for future use. For example:
A range of other data repositories are also available in various subject disciplines, for example the Archaeology Data Service, and Dryad, which specialises in biological data. Zenodo, which integrates with GitHub, can be a good place to make software available.
You can find a suitable repository for your data using registries such as re3data.org.
Some work is being undertaken on defining criteria for the accreditation of repositories and what constitutes a Trustworthy repository. To assess whether a repository is a suitable home for your data, you should consider:
Thinking about depositing your data as part of your data management planning will help ensure that your data is ready for deposit at the appropriate time. For example, data centres may ask you to meet minimum quality standards to make sure that your data can be understood and reused by other researchers.
If your data is to be transferred to Research Data York and cannot be shared openly (that is, access restrictions will need to be applied), or if you wish to transfer large volumes of data, contact the Library's Research Support Team (email: email@example.com) as early as possible to discuss your options.
The UK Data Service's guide Depositing shareable survey data was specifically developed to support new depositors of large-scale surveys but the principles apply to a wide range of significant deposits.
All University of York researchers must record the data they have selected for long-term retention in PURE. The 'datasets' record created will be checked by University Library staff and a DOI minted where appropriate, which you can then cite within your published paper. PURE datasets records are discoverable through the York Research Database, providing a permanent and public record of the dataset along with a description of the data, how it may be accessed and any constraints that may apply. See Recording datasets in PURE for further guidance.
If depositing your data with an external service (funder/subject/publisher repository), you should ask the repository for a persistent identifier (such as a DOI) to cite within your published paper.
There are two elements to data citation. If you are publishing a dataset as part of your research output, you will be expected to provide an accessibility statement, often referred to as a data access statement, in your published paper. If you are using an existing or third-party dataset as part of your research, you will be expected to cite the dataset in the text or in your references.
Data access statements are required for most publicly funded publications. They are required by many funders' data policies and by the RCUK Policy on Open Access, which states:
"[3.3] (ii) As part of supporting the drive for openness and transparency in research, and to ensure that researchers think about data access issues, the policy requires all research papers, if applicable, to include a statement on how underlying research materials, such as data, samples or models, can be accessed."
Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. The objective of the statement is to aid data discovery. Accordingly, data access statements need to include a persistent URL (e.g. a Digital Object Identifier (DOI)) which links directly to the dataset or to supporting documentation that describes the data in detail, how it may be accessed and any constraints that may apply.
A simple 'contact the author' instruction is not sufficient.
The data access statement should be included in submitted papers, even if a persistent URL or DOI has not been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.
The same principles of attribution and credit apply to research data as they do to other research outputs such as journal articles or books. Good data citation will acknowledge the original author/producer and will provide the information necessary to identify and locate the data.
The elements that make up a data citation are still the subject of debate and may vary across subject disciplines. The core elements are:
The following elements are also commonly recommended:
The order of elements and presentation of the citation will be defined by the referencing style that is being used. This may vary by journal or publisher.
More information on citing data can be found in the DCC guide to How to cite datasets and link to publications. DataCite also provide more information on how and Why to cite data. The ESRC guidelines on Data citation: what you need to know are also useful.
It is important to give credit to other researchers for software they have developed which you have used in your research. Moreover, some software publishers may require the use of their software to be acknowledged or cited in published research outputs. Citing software that has played an integral part in your research can help others to understand, reproduce or reuse your research data.
In general, software should be cited in a similar manner to research data and research papers. If your publisher or citation style provides guidelines, follow them, or check with your editor if you are writing for publication. Many software packages also give guidance on how they want to be cited. If no guidance exists, various organisations have been working to develop guidelines for software citation; examples include:
You should also ensure that it is possible for other people to cite your code/software. GitHub and other version control platforms are increasingly offering tools for making your software easier to cite, including assigning DOIs to your code. For further guidance see DataCite's software citation workflows.