These pages focus on activities which often take place at the end of a research project.
Remember that your research funder may have specific requirements around data retention and archiving.
How to share, discover and reuse COVID-19 related data. This guidance is aimed at helping researchers to share their COVID-19 related data in a timely and responsible manner.
The University's Research Data Management Policy states:
"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained.
3.6 In the absence of the other provisions described in 3.5, the default period for research data retention is 10 years from date of last requested access."
Challenges arise when selecting data to retain and archive, and in turn identifying data that can be disposed of.
As each research project is unique, it’s impossible to provide a one-size-fits-all approach. However, careful consideration, meeting funder and institutional requirements, and documenting all the decisions made and why, should mitigate many unforeseen issues that may arise later.
The University's Preserving Information (PDF, 282kb) Records Management Guide offers advice on the preservation challenges facing digital and manual records: why preserve information, preservation measures, how long different media last, and confidence in document formats for long-term preservation.
It’s important to remember that not everything should be retained for long-term preservation. Just because you can preserve something doesn't always mean you should. There are costs to preserving data (time, technology, space, maintenance) and risks in keeping things (storing massive amounts of data makes it difficult to find meaningful data easily), just as there are in not keeping them. It’s also important to note that under the Freedom of Information Act, what we keep must be disclosed if requested.
The DCC’s How to appraise and select research data for curation is a useful guide to the process of appraising and selecting data for long-term preservation.
Questions to consider when making decisions on what data to keep:
Does the data underpin a research publication?
This data should be kept to allow others to validate and build on your research results. Some publishers now require data to be made available as a condition of publication.
Can the data be reused? Do I hold the intellectual property and legal rights to keep and reuse this data or can I negotiate these rights?
Is the data effectively documented to allow it to be found wherever it is to be stored, and for reuse?
Can the data be replicated, easily and cost-effectively?
What are the costs associated with keeping the data? Do I have the funds available to do this?
It's also worth considering if software and/or computer code needs to be retained. For example, if you have produced computer code or software to visualise or interrogate your research data you may wish to preserve your code (with full documentation on any dependencies) in order to enable others to verify your findings or reproduce your methodology. See Digital preservation and curation - the danger of overlooking software from the Software Sustainability Institute.
Among the outputs from the Jisc-funded PrePARe Project is a useful checklist: Selecting what to keep and what to bin. The DCC has a guide to help researchers select data for long-term storage, Five steps to decide what data to keep: a checklist for appraising research data.
University policy is to preserve research data and records for a minimum of 10 years, unless otherwise required. This may be longer where the data is actively used. Legislative and regulatory needs, including any stipulated by your funder, may also change this retention period.
It is best practice to define the retention period before you create or receive data. Knowing how long the data is needed, and what your preservation requirements are, will also help with other choices such as medium and format. Many research funders now require a Data Management Plan where you will be expected to define your strategies for deposit and the long-term preservation of your data, including how preservation will be funded.
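As an illustration of the default retention rule, the sketch below works out the earliest date on which data could be considered for disposal, assuming the 10-year default from the date of last requested access applies and no funder, legal or contractual requirement sets a different period. The function name and the choice to roll a 29 February date forward to 1 March are assumptions made for this example, not part of University policy.

```python
from datetime import date

# Default retention period in years, taken from policy clause 3.6.
DEFAULT_RETENTION_YEARS = 10


def earliest_disposal_date(last_access: date,
                           years: int = DEFAULT_RETENTION_YEARS) -> date:
    """Return the date on which the retention period has elapsed.

    Hypothetical helper: counts whole calendar years from the date
    the data was last requested.
    """
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # last_access is 29 February and the target year is not a
        # leap year; this sketch rolls forward to 1 March (assumption).
        return date(last_access.year + years, 3, 1)


if __name__ == "__main__":
    print(earliest_disposal_date(date(2020, 5, 14)))
```

A longer period can be passed in where a funder or regulator requires it, for example `earliest_disposal_date(date(2020, 5, 14), years=20)`. Each new access request resets the starting date, so the calculation would need to be repeated whenever the data is requested again.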
Data which has fulfilled its purpose and does not need to be kept for long-term preservation needs to be disposed of securely. Remember, you have a legal responsibility for the information you store and must ensure information security.
The University's Research Data Management Policy states:
"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained."
The University's Records Management Guide (login required) provides advice about:
reasons for disposing of records
what needs to be disposed of securely
keeping a record of what you destroy
outsourcing the storage and disposal of information
making disposal easier: act at the point of creation
specific requirements for the disposal of certain information.
As well as specifying how and where data is to be stored and accessed, contracts governing the provision of access to research data and the funding of research often specify how data is to be disposed of. For example, users who obtain access to Special Licence data from the UK Data Archive must follow the advice in the document Microdata Handling and Security: Guide to Good Practice [PDF], which includes guidance on how to permanently destroy copies of data files.
Some projects, agreements and research contracts may specify disposal of data to a particular standard. In some cases this standard for destruction may differ from or exceed that recommended in university guidance and therefore special attention should be paid to such obligations.
Further advice and support on the disposal of digital data is available from IT Services, email firstname.lastname@example.org.
Research data is a valuable resource and can often be put to significant use beyond its original purpose. Sharing data benefits both you as a researcher and the wider community.
The UK Data Service's guide Why share data? lists the following benefits. Sharing data:
encourages scientific enquiry and debate
promotes innovation and potential new data uses
leads to new collaborations between data users and data creators
maximises transparency and accountability
enables scrutiny of research findings
encourages the improvement and validation of research methods
reduces the cost of duplicating data collection
increases the impact and visibility of research
provides credit to the researcher as a research output in its own right
provides great resources for education and training.
Sharing data also helps you:
Meet your funder’s requirements
Meet publisher requirements. Some publishers require data to be made available as a condition of publishing
Potentially extend your impact (see Sharing detailed research data is associated with increased citation rate: Piwowar et al. (2007) and Data reuse and the open data citation advantage: Piwowar and Vision (2013))
It is not always possible or desirable to share data.
Licence restrictions - You are using data owned by others, such as commercial entities or authors, and don't have the rights to share the data.
Commercial value - Your data has financial value or a patent is pending. Contact the Intellectual Property & Legal Team if you need help in determining the value of your research data.
In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied.
The UK Data Service provide guidance on legal and ethical issues, including sections on consent for data sharing (with example consent forms) and anonymisation of quantitative data and qualitative data.
The Information Commissioner's Office provide useful information around anonymisation [PDF], removing hidden personal data from datasets [PDF] and big data [PDF] (note: this guidance has not been updated since the Data Protection Act 2018 became law but will be updated soon to reflect the changes).
For more information see our Ethical and legal issues web page.
It is easier to meet data sharing requests if your data is organised and effectively documented, and if you plan for data sharing at the beginning of your project by deciding how your data will be shared. An entertaining and informative Data sharing and management video (created by NYU Health Sciences Library) highlights what can go wrong if you don’t manage your data for sharing.
The rights relating to and ownership of research data should be established at the start of a project to avoid later confusion (see the Intellectual Property Rights information on the Ethical and legal issues web page).
In most cases, licensing your data can help clarify the terms of its use. The DCC guide on How to license research data provides a range of information on why and how to use licences. The University of Glasgow's Information Guide provides guidance in choosing a licence for research data.
Researchers are strongly encouraged to deposit their data in a subject-specific repository or with the University of York Research Data York service.
Research data repositories provide the best option for storing and publishing research data in the long term:
Your funder may have expectations about where your data is deposited, along with expectations on how open it should be and a timescale for it to be made available. For more information see the Funder data policies page.
Digital repositories make your data available to more people helping you make a contribution to the development of your research area.
Making your data available could raise the impact of your research and your research profile.
You have a number of options available to you. Your decision on which repository to use will be informed by who your funder is and what they require, and the reputation of the repository.
Some funders will require data deposit in specific data repositories to ensure that it is preserved and remains accessible for future use.
The University has its own research data service, Research Data York, which researchers from any discipline may wish to use. Research Data York can provide ongoing access to research data for extended periods of time and can issue unique Digital Object Identifiers (DOIs) for deposited datasets.
Research Data York is a good option for publishing your datasets, unless there is a subject-specific repository commonly used in your research field.
You should choose a recognised data repository for your subject or discipline if one exists, unless your funder requires otherwise. Specialised services dealing with discipline- and subject-specific data are best placed to manage and provide appropriate access to your data for the long term.
You can check re3data.org, an international registry that lists repositories and their characteristics, to see if there is an appropriate repository for your data.
Note: All University of York researchers must record the data they have selected for long-term retention in PURE, irrespective of where the data is deposited. See our guidance on Recording datasets in PURE.
Some work is being undertaken on defining criteria for the accreditation of repositories and what constitutes a Trustworthy repository. To assess whether a repository is a suitable home for your data, you should consider:
Thinking about depositing your data as part of your data management planning will help ensure that your data is ready for deposit at the appropriate time. For example, data repositories may ask you to meet minimum quality standards to make sure that your data can be understood and reused by other researchers.
The UK Data Service's guide Depositing shareable survey data was specifically developed to support new depositors of large-scale surveys but the principles apply to a wide range of significant deposits.
Data citation aids data discovery, enables data reuse, recognises and can reward data creators, and allows the impact of data to be tracked.
There are two elements to data citation. If you are publishing a dataset as part of your research output you will be expected to provide an accessibility statement, often referred to as a data access statement, in your published paper. If you are using an existing or third-party dataset as part of your research you will be expected to cite the dataset.
Data access statements are required for most publicly funded publications. They are a requirement of many funders' data policies and of the UKRI Open Access Policy, which states:
"14. UKRI requires in-scope research articles to include a Data Access Statement, even where there are no data associated with the article or the data are inaccessible."
Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. One objective of the statement is to aid data discovery. Accordingly, data access statements should include a persistent URL (e.g. a Digital Object Identifier (DOI)) which links directly to the dataset or to supporting documentation that describes the data in detail.
A simple 'contact the author' instruction is not typically considered sufficient.
A data access statement should be included in submitted papers, even if a persistent URL or DOI has not yet been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.
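The guidance above can be sketched as a small helper that assembles a data access statement, with or without a DOI. This is only an illustration: the function names, the statement wording, the loose DOI pattern and the DOI used in the example (10.1234/example-dataset) are all hypothetical, and your funder or publisher may expect different wording.

```python
import re
from typing import Optional

# Loose check for the usual DOI shape ("10.", registrant code, "/", suffix).
# An assumption for illustration, not a complete DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a persistent https://doi.org/ link."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def data_access_statement(repository: str,
                          doi: Optional[str] = None,
                          conditions: Optional[str] = None) -> str:
    """Build a statement saying where the data is and how to access it."""
    if doi:
        text = (f"The data supporting this article are available from "
                f"{repository} at {doi_to_url(doi)}.")
    else:
        # No identifier issued yet: still name the repository, and
        # update the statement once the DOI is available.
        text = (f"The data supporting this article will be available from "
                f"{repository}; a persistent identifier will be added "
                f"when issued.")
    if conditions:
        text += f" {conditions}"
    return text


if __name__ == "__main__":
    print(data_access_statement("Research Data York",
                                "10.1234/example-dataset"))
```

Calling the function without a DOI gives a placeholder statement suitable for a submitted manuscript, which can then be regenerated with the DOI when the paper is accepted.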
The same principles of attribution and credit apply to research data as they do to other research outputs such as journal articles or books. Good data citation will acknowledge the original author/producer and will provide the information necessary to identify and locate the data.
The format and placement of a data citation may vary by journal or publisher. The core elements to include are:
Where appropriate you may also wish to include information about the Version and Resource Type.
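To show how such elements might be combined, the sketch below formats a dataset citation. The field names and their order (creator, publication year, title, version, publisher, resource type, identifier) are an assumption modelled on DataCite's recommended citation format rather than taken from this page, and the example values are invented. Always check your journal's or publisher's own style first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a data citation."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str                      # ideally a DOI link
    version: Optional[str] = None        # optional element
    resource_type: Optional[str] = None  # optional element

    def render(self) -> str:
        """Join the elements into a single citation string."""
        parts = [f"{self.creator} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        return ". ".join(parts) + ". " + self.identifier


if __name__ == "__main__":
    citation = DatasetCitation(
        creator="Smith, J.",
        year=2021,
        title="Example survey data",
        publisher="University of York",
        identifier="https://doi.org/10.1234/example-dataset",
        version="2",
        resource_type="Dataset",
    )
    print(citation.render())
```

Leaving `version` and `resource_type` unset simply omits them from the rendered citation.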
More information on citing data can be found in the DCC guide to How to cite datasets and link to publications. DataCite also provide more information on how and Why to cite data. The ESRC guidelines on Data citation: what you need to know are also useful.
It is important to give credit to other researchers for software they have developed which you have used in your research. Moreover, some software publishers may require the use of software to be acknowledged or cited in published research outputs. Citing software that has played an integral part in your research can help others to understand, reproduce or reuse your research data.
In general, software should be cited in a similar manner to research data and research papers. If guidelines from your publisher or citation style exist, follow them, or check with your editor if you are writing for publication. Many software packages also give guidance on how they want to be cited. If no guidance exists, various organisations have been working to develop guidelines for software citation.
You should also ensure that it is possible for other people to cite your code/software. GitHub and other version control platforms are increasingly offering tools for making your software easier to cite, including assigning DOIs to your code. For further guidance see DataCite's software citation workflows.
A DOI is a persistent, unique identifier for digital objects such as journal articles or datasets. Using a DOI in a data access statement or data citation enables users to find and cite the data, even if its online location is moved.
If depositing your data with an external service (funder/subject/publisher repository), you should ask the repository for a persistent identifier (such as a DOI) to cite within your published papers.
If depositing your data with Research Data York, a DOI is minted when a PURE dataset record is validated by library staff.