Sharing, preserving and depositing your data
These pages focus on activities which often take place at the end of a research project, including:
- choosing which data to retain and archive for long-term preservation
- disposing of data appropriately
- sharing data for future reuse
- depositing data with a data centre or repository
- citing data
- further training
Remember that your research funder may have specific requirements around data retention and archiving.
Data retention
The University's Research Data Management Policy states:
"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained.
3.6 In the absence of the other provisions described in 3.5, the default period for research data retention is 10 years from date of last requested access."
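As a rough illustration of the default in clause 3.6, the sketch below works out when a 10-year retention period counted from the date of last requested access would end. The function name and the handling of 29 February are assumptions made for illustration only; legal, funder or collaborator requirements under clause 3.5 take precedence over this default.

```python
from datetime import date

# Default retention period from clause 3.6 of the policy quoted above.
DEFAULT_RETENTION_YEARS = 10


def end_of_default_retention(last_requested_access: date,
                             years: int = DEFAULT_RETENTION_YEARS) -> date:
    """Return the date the default retention period ends (hypothetical helper)."""
    try:
        return last_requested_access.replace(year=last_requested_access.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        # Assumption: roll forward to 1 March so the period is never shortened.
        return date(last_requested_access.year + years, 3, 1)


if __name__ == "__main__":
    print(end_of_default_retention(date(2015, 6, 30)))
```

Each new access request moves the end date, so a date calculated this way would need to be recalculated whenever the data is requested again.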
Selecting data for long-term preservation
Challenges arise when selecting data to retain and archive, and in turn identifying data that can be disposed of.
As each research project is unique, it’s impossible to provide a one-size-fits-all approach. However, careful consideration, meeting funder and institutional requirements, and documenting all the decisions made and why, should mitigate many unforeseen issues that may arise later.
The University's Preserving Information (PDF, 282kb) Records Management Guide offers advice on the preservation challenges facing digital and manual records: why preserve information, preservation measures, how long different media last, and confidence in document formats for long-term preservation.
Why you shouldn't keep everything
It’s important to remember that not everything should be retained for long-term preservation. Just because you can preserve something doesn't mean you should. There are costs to preserving data (time, technology, space, maintenance) and risks in keeping things (storing massive amounts of data makes it difficult to find meaningful data easily), just as there are in not keeping them. It’s also important to note that under the Freedom of Information Act, what we keep must be disclosed if requested.
Deciding what data to keep
The DCC’s How to appraise and select research data for curation is a useful guide to the process of appraising and selecting data for long-term preservation.
Questions to consider when making decisions on what data to keep:
- What data am I required to keep by my research funder, my institution or legally?
- Does the data underpin a research publication? This data should be kept to allow others to validate and build on your research results. Some publishers now require data to be made available as a condition of publication.
- Can the data be reused? Do I hold the intellectual property and legal rights to keep and reuse this data or can I negotiate these rights?
- Is the data effectively documented to allow it to be found wherever it is to be stored, and for reuse?
- Can the data be replicated, easily and cost-effectively?
- What are the costs associated with keeping the data? Do I have the funds available to do this?
It's also worth considering if software and/or computer code needs to be retained. For example, if you have produced computer code or software to visualise or interrogate your research data you may wish to preserve your code (with full documentation on any dependencies) in order to enable others to verify your findings or reproduce your methodology. See Digital preservation and curation - the danger of overlooking software from the Software Sustainability Institute.
Among the outputs from the Jisc-funded PrePARe Project is a useful checklist: Selecting what to keep and what to bin. The DCC has a guide to help researchers select data for long-term storage, Five steps to decide what data to keep: a checklist for appraising research data.
How long should the data be kept?
University policy is to make available research data selected for preservation and sharing for a minimum of 10 years, unless otherwise required. This may be longer where the data is actively used. Legislative and regulatory needs, including any stipulated by your funder, may also change this retention period.
It is best practice to define the retention period before you create or receive data. Knowing how long the data is needed, and what your preservation requirements are, will also help with other choices such as medium and format. Many research funders now require a Data Management Plan where you will be expected to define your strategies for deposit and the long-term preservation of your data, including how preservation will be funded.
Data disposal
Data which has fulfilled its purpose and does not need to be kept for long-term preservation needs to be disposed of securely. Remember, you have a legal responsibility for the information you store and must ensure information security.
The University's Research Data Management Policy states:
"3.5 Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical, research funder and collaborator requirements and with particular concern for the confidentiality and security of the data. Research data that underpins published results or is considered to have long-term value should be retained."
The University's Records Management Guide (login required) provides advice about:
- reasons for disposing of records
- what needs to be disposed of securely
- disposal methods
- keeping a record of what you destroy
- remote working
- outsourcing the storage and disposal of information
- making disposal easier: act at the point of creation
- specific requirements for the disposal of certain information.
Funder and contractual obligations
As well as specifying how and where data are to be stored and accessed, contracts governing the provision of access to research data and the funding of research often specify how data is to be disposed of. For example, users who obtain access to Special Licence data from the UK Data Archive must follow the advice in the document Microdata Handling and Security: Guide to Good Practice [PDF], which includes guidance on how to permanently destroy copies of data files.
Some projects, agreements and research contracts may specify disposal of data to a particular standard. In some cases this standard for destruction may differ from or exceed that recommended in university guidance and therefore special attention should be paid to such obligations.
Further advice and support on the disposal of digital data is available from IT Services, email itsupport@york.ac.uk.
Sharing data
Benefits of sharing
Research data is a valuable resource and can often be put to significant use beyond its original purpose. There are benefits to you as a researcher and to the wider community of sharing data.
The UK Data Service's guide Why share data? lists the following benefits. Sharing data:
- encourages scientific enquiry and debate
- promotes innovation and potential new data uses
- leads to new collaborations between data users and data creators
- maximises transparency and accountability
- enables scrutiny of research findings
- encourages the improvement and validation of research methods
- reduces the cost of duplicating data collection
- increases the impact and visibility of research
- provides credit to the researcher as a research output in its own right
- provides great resources for education and training.
Sharing data also helps you:
- Meet your funder’s requirements (see What is research data management?: Funder data policies)
- Meet publisher requirements. Some publishers require data to be made available as a condition of publishing
- Potentially extend your impact (see Sharing detailed research data is associated with increased citation rate: Piwowar et al. (2007) and Data reuse and the open data citation advantage: Piwowar and Vision (2013))
When not to share data
It is not always possible or desirable to share data.
- Legal requirements - Your data cannot be shared under the Data Protection Act.
- Ethical concerns - Your data includes sensitive or confidential data where no consent for data sharing has been given.
- Licence restrictions - You are using data owned by others, such as commercial entities or authors, and don't have the rights to share the data.
- Commercial value - Your data has financial value or a patent is pending. Contact the Intellectual Property & Legal Team if you need help in determining the value of your research data.
In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied.
The UK Data Service provide guidance on legal and ethical issues, including sections on consent for data sharing (with example consent forms) and anonymisation of quantitative data and qualitative data.
The Information Commissioner's Office provide useful information around anonymisation [PDF], removing hidden personal data from datasets [PDF] and big data [PDF] (note: this guidance has not been updated since the Data Protection Act 2018 became law but will be updated soon to reflect the changes).
For more information see our Ethical and legal issues web page.
Plan for data sharing
It is easier to meet data sharing requests if your data is organised and effectively documented, and if you plan for data sharing at the beginning of your project and decide then how your data will be shared. An entertaining and informative Data sharing and management video (created by NYU Health Sciences Library) highlights what can go wrong if you don’t manage your data for sharing.
Rights and licensing
The rights relating to and ownership of research data should be established at the start of a project to avoid later confusion (see the Intellectual Property Rights information on the Ethical and legal issues web page).
In most cases, licensing your data can help clarify the terms of its use. The DCC guide on How to license research data provides a range of information on why and how to use licences. The University of Glasgow's Information Guide provides guidance in choosing a licence for research data.
Depositing your data
Researchers are strongly encouraged to deposit their data in a subject-specific repository or with the University of York Research Data York service.
Research data repositories provide the best option for storing and publishing research data in the long term:
- Depositing your data with a digital repository will ensure that the data is maintained in a readable format and remains usable over a longer period of time.
- Your funder may have expectations about where your data is deposited, along with expectations on how open it should be and a timescale for it to be made available. For more information see What is research data management?: Funder data policies.
- Digital repositories make your data available to more people, helping you make a contribution to the development of your research area.
- Making your data available could raise the impact of your research and your research profile.
Where to deposit your data?
You have a number of options available to you. Your decision on which repository to use will be informed by who your funder is and what they require, and the reputation of the repository.
Funder recommended data repository
Some funders will require data deposit in specific data repositories to ensure that it is preserved and remains accessible for future use. For example:
- Several funding bodies recommend the Archaeology Data Service for archaeological data
- ESRC funds the UK Data Service
- NERC has a network of environmental data centres
- BBSRC lists data sharing resources
- Wellcome Open Research maintains a curated list of approved repositories suitable for Wellcome-funded research
You can check your funder's data archiving policy (and Open Access requirements) using the Sherpa Juliet database.
University Research Data York service
The University has its own research data service, Research Data York, which researchers from any discipline may wish to use. Research Data York can provide ongoing access to research data for extended periods of time and can issue unique Digital Object Identifiers (DOIs) for deposited datasets.
Research Data York is a good option for publishing your datasets, unless there is a subject-specific repository commonly used in your research field.
Subject- or discipline-specific data repositories
You should choose a recognised data repository for your subject or discipline if one exists, unless your funder requires otherwise. Specialised services dealing with discipline- and subject-specific data are best placed to manage and provide appropriate access to your data for the long term.
You can check re3data.org, an international registry that lists repositories and their characteristics, to see if there is an appropriate repository for your data.
Note: All University of York researchers must record the data they have selected for long-term retention in PURE, irrespective of where the data is deposited. See our guidance on Recording datasets in PURE.
Is a repository suitable?
Some work is being undertaken on defining criteria for the accreditation of repositories and what constitutes a Trustworthy repository. To assess whether a repository is a suitable home for your data, you should consider:
- Does the repository have a good reputation in your field? Have you seen favourable references to it? Is it recommended by your funder or journal?
- What metadata requirements are there? Will others be able to find and cite your data?
- Will a persistent identifier (e.g. a Digital Object Identifier (DOI)) be assigned to your data that you can include in your data access statement?
- Can you apply access restrictions or an embargo period if you need to?
- Will the repository ensure that confidential or personal data are secured if that is required?
- Under what licence terms are datasets made available for reuse? Will the licence terms fit with your funder requirements and with the University's Research Data Management Policy?
- Are you required to assign any copyright in the data to the archive? Note: We recommend avoiding using repositories that require transfer of rights. See the University's Policy on Intellectual Property.
- Can you rely on it to preserve your data in 10 years' time? Is it established and well funded?
Preparing for deposit
Thinking about depositing your data as part of your data management planning will help ensure that your data is ready for deposit at the appropriate time. For example, data repositories may ask you to meet minimum quality standards to make sure that your data can be understood and reused by other researchers.
The UK Data Service's guide Depositing shareable survey data was specifically developed to support new depositors of large-scale surveys but the principles apply to a wide range of significant deposits.
Data citation
Data citation aids data discovery, enables data reuse, recognises and can reward data creators, and allows the impact of data to be tracked.
There are two elements to data citation. If you are publishing a dataset as part of your research output, you will be expected to provide an accessibility statement, often referred to as a data access statement, in your published paper. If you are using an existing or third-party dataset as part of your research, you will be expected to cite the dataset.
Data access statements
Data access statements are required for most publicly funded publications. They are a requirement of many funders' data policies and of the UKRI Open Access Policy, which states:
"14. UKRI requires in-scope research articles to include a Data Access Statement, even where there are no data associated with the article or the data are inaccessible."
Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. One objective of the statement is to aid data discovery. Accordingly, data access statements should include a persistent URL (e.g. a Digital Object Identifier (DOI)) which links directly to the dataset or to supporting documentation that describes the data in detail.
What to include in the statement:
- If data is openly available, the name of the data repository should be provided along with any persistent identifier.
- If there are legal or ethical reasons why data cannot be made available, describe them.
- If the data are not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
- If you did not collect the research data yourself but instead used existing data obtained from another source, this source should be credited.
A simple 'contact the author' instruction is not typically considered sufficient.
A data access statement should be included in submitted papers, even if a persistent URL or DOI has not yet been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.
Example data access statements:
- The data supporting this research is openly available from the [insert repository name] repository at [insert DOI].
- All data supporting this study are provided as supplementary information accompanying this paper [insert DOI].
- All data are provided in full in the results section of this paper.
- Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration, at [insert DOI].
- Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the [insert repository name] repository at [insert DOI].
- Due to the (commercially, politically, ethically) sensitive nature of the research, no participants consented to their data being retained or shared. Additional details relating to other aspects of the data are available from the [insert repository name] repository at [insert DOI].
- Supporting data will be available from the [insert repository name] repository at [insert DOI] after a 6-month embargo from the date of publication to allow for commercialisation of research findings.
- This is a review article, therefore all data underlying this study is cited in the references.
- No new data were created during this study.
Citing datasets
The same principles of attribution and credit apply to research data as they do to other research outputs such as journal articles or books. Good data citation will acknowledge the original author/producer and will provide the information necessary to identify and locate the data.
The format and placement of a data citation may vary by journal or publisher. The core elements to include are:
- Creator(s) of the Data (Publication Date): Title. Publisher. Persistent Identifier.
Where appropriate you may also wish to include information about the Version and Resource Type.
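As a minimal sketch, the core elements above can be assembled into a citation string as follows. The function name, the "; " separator between creators, and the positions chosen for the optional Version and Resource Type are assumptions for illustration; always follow the format required by your journal or publisher. The values in the usage example are placeholders, not a real dataset.

```python
from typing import Optional, Sequence


def format_data_citation(creators: Sequence[str],
                         publication_date: str,
                         title: str,
                         publisher: str,
                         identifier: str,
                         version: Optional[str] = None,
                         resource_type: Optional[str] = None) -> str:
    """Build 'Creator(s) (Publication Date): Title. Publisher. Persistent Identifier'."""
    parts = [f"{'; '.join(creators)} ({publication_date}): {title}."]
    if version:
        # Assumed placement: version directly after the title.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        # Assumed placement: resource type directly after the publisher.
        parts.append(f"{resource_type}.")
    parts.append(identifier)
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder values only.
    print(format_data_citation(
        creators=["Smith, J.", "Jones, A."],
        publication_date="2020",
        title="Example survey dataset",
        publisher="Example Repository",
        identifier="https://doi.org/10.1234/example",
    ))
```

With the placeholder values shown, this prints: Smith, J.; Jones, A. (2020): Example survey dataset. Example Repository. https://doi.org/10.1234/example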
More information on citing data can be found in the DCC guide to How to cite datasets and link to publications. DataCite also provide more information on how and Why to cite data. The ESRC guidelines on Data citation: what you need to know are also useful.
Software citation
It is important to give credit to other researchers for software they have developed which you have used in your research. Moreover, some software publishers may require the use of software to be acknowledged or cited in published research outputs. Citing software that has played an integral part in your research can help others to understand, reproduce or reuse your research data.
In general, software should be cited in a similar manner to research data and research papers. If guidelines from your publisher or citation styles exist follow them, or check with your editor if you are writing for publication. Many software packages also give guidance on how they want to be cited. If no guidance exists, various organisations have been working to develop guidelines for software citation; examples include:
- Force11: Software Citation Principles
- DataCite: DataCite Metadata Schema 4.1, with new additions to describe software and example software citations
- Software Sustainability Institute: How to cite and describe software.
You should also ensure that it is possible for other people to cite your code/software. GitHub and other version control platforms are increasingly offering tools for making your software easier to cite, including assigning DOIs to your code. For further guidance see DataCite's software citation workflows.
DOIs
A DOI is a persistent, unique identifier for digital objects such as journal articles or datasets. Using a DOI in a data access statement or data citation enables users to find and cite the data, even if its online location is moved.
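To include a DOI in a data access statement or citation as a persistent URL, it can be expressed as a link through the doi.org resolver. The sketch below is illustrative only: the function name is hypothetical, the validity check is deliberately loose (it only confirms the "10." prefix and the "/" separator), and the DOI in the usage example is a placeholder.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI written in a common form into an https://doi.org/ link."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    # Strip the resolver or "doi:" prefix if the DOI was pasted with one.
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Loose sanity check only: DOIs start with "10." and contain a "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # Placeholder DOI, not a real dataset.
    print(doi_to_url("doi:10.1234/example"))
```

Because the link points at the resolver and not at the repository's own web address, it should continue to work if the dataset's online location changes.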
How to get a DOI
If depositing your data with an external service (funder/subject/publisher repository), you should ask the repository for a persistent identifier (such as a DOI) to cite within your published papers.
If depositing your data with Research Data York, a DOI is minted when a PURE dataset record is validated by library staff.
Training
- University of Edinburgh, Research Data MANTRA
- Sharing, preservation, and licensing: a training module from the MANTRA research data management online course. This module covers a range of issues relating to the long-term preservation, licensing and sharing of data with other researchers.