Watch Ben Goldacre (author of Bad Science) explain the importance of making data open and publicly accessible.
Increasingly, research data are coming to be seen as research outputs in their own right, with reuse value beyond the research project for which they were created or collected. Because of this, many funders, journal publishers and academics now argue for the importance of data access and sharing.
The benefits of data sharing include:
Data sharing will not prevent you from getting recognition for the data you create or from maximising its value for your research. While most funders expect researchers to make their data available in a "timely manner", they also allow researchers a period of exclusive use of their data prior to publication.
Furthermore, while most funders are keen for researchers to make their data as open and accessible as possible, they also acknowledge that not all data can be shared, and that it may sometimes be necessary to set conditions restricting how data can be accessed and by whom (see "Do I have to share my data?" below).
Options for publishing your data include:
Depositing your data in a data centre or repository increases the discoverability of your data and satisfies funder and journal publisher data policy requirements. Some repositories will also issue a DOI for your data, which makes it easier to cite and to link to related resources such as the published research that the data supports. Help with choosing a repository is available here. If there is no suitable external repository for your data, you might be able to deposit it in the King's RDM System.
If you publish your data with an external data centre or repository, send us the DOI or URL and we will create a record for the dataset in our forthcoming research data catalogue.
Sometimes it is possible to include supporting data as supplementary material accompanying published research (e.g. in the form of graphs or tables). However, there are likely to be restrictions on the types of material that can be submitted (e.g. limits on file size or acceptable file formats). See, for example, the guidance provided by the journal Science on preparing supplementary materials.
A growing number of journals now have data sharing policies which stipulate that data which supports published research should be made publicly available and/or that the published article should provide information about how and under what conditions the data which supports the paper's findings can be accessed. Most journal data sharing policies also specify that the data supporting published research papers be deposited in a data centre or repository.
Journals with data sharing policies include:
A provisional list of journals with data sharing policies is available from the Open Access Directory.
By itself this is unlikely to comply with funder expectations regarding long term preservation of data. Ideally you should deposit copies of the data in a data repository or data centre as well.
Again, this might not comply with funder mandates or enable long-term preservation of, or access to, the data. For example, the EPSRC does not consider it sufficient to direct potential future users to a personal email address in your data access statement. Guidance on how to comply with the EPSRC's expectations, and on what to include in a data access statement, is available on our webpage Meeting the EPSRC expectations.
Most funders expect data to be made available no later than the publication of the research findings but will usually also allow researchers a period of exclusive use to enable them to benefit from the data they have generated. You should include details of any proposed embargo period in your data management plan. Details of funder timescales for publishing data can be found on our webpage Funder policies on data management and sharing.
Most publishers also recognise that there are occasions when data cannot be shared, or when access to the data has to be restricted, e.g. for reasons of confidentiality, commercial sensitivity, or copyright and IPR. However, they also expect researchers to demonstrate that they have taken reasonable measures to make the data available, even if access is controlled or otherwise mediated. This could be done by making the data available on request only, or via a data sharing agreement. For help with creating a data sharing agreement contact the Information Compliance team: email firstname.lastname@example.org.
You can also set terms and conditions governing reuse if you apply a licence to your data.
Again, check your funder's policy for details of their data sharing guidelines.
Additional guidance can be found on our webpages: License your data, Managing IPR and copyright, Ethics and confidentiality
See also Wellcome Open Research: How to Publish - Data Guidelines
Great care has to be taken when working with data that contains personal or sensitive information, but it is still possible to share even highly sensitive data by following the strategies outlined below:
Informed consent is an ethical requirement for most research and must be considered and implemented throughout the research lifecycle, from planning to publication. Gaining consent must include making provision for sharing data and take into account any immediate or future uses of data.
– (UK Data Service - Consent for Data Sharing)
Before any data can be collected you must obtain "informed consent" from research participants. Informed consent means that participants must understand what they are consenting to. Consent will typically be obtained in writing using a consent form, though it can also be obtained verbally. Where possible, consent should extend beyond the use of data during the lifetime of the research project to include long-term preservation and future re-use. It is not necessary to obtain consent to use anonymised data, provided that the data are 'robustly anonymised' and re-identification is not possible (see Anonymisation below). However, if you are planning to anonymise and share personal data collected during your research, you will need to tell participants that you intend to do this when you obtain their informed consent.
Anonymisation is the process of removing identifying information so that data can be more widely used. It requires that identifiers are "removed, obscured, aggregated and/or altered in some way" (UKAN - UK Anonymisation Network). Anonymised data are exempt from the General Data Protection Regulation (GDPR), but it is important to keep in mind that personally identifiable information includes both direct identifiers (e.g. a participant's name, address or National Insurance number) and indirect identifiers (e.g. information about workplace, occupation or salary) which, in combination with other information, could make it possible to identify an individual, organisation or business (UK Data Service - Anonymisation). If your data contain information that could lead to the re-identification of participants when combined with other information, the data are not considered to be robustly anonymised and should not be shared without consent or appropriate safeguards.
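To make the difference between direct and indirect identifiers more concrete, here is a minimal Python sketch, assuming tabular records held as dictionaries with hypothetical field names (`name`, `address`, `ni_number`, `occupation`, `salary`, `score`). It removes the direct identifiers, aggregates exact salaries into bands, and then counts how many records share each combination of the remaining indirect identifiers. That final check is a simple k-anonymity measure, a technique brought in here for illustration rather than one named in the guidance above: a smallest group of 1 means at least one person is still unique in the shared data. Passing a check like this does not by itself make data robustly anonymised, so treat it as an aid alongside the guidance listed below.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

# Hypothetical field names for direct identifiers; adapt these to your own data.
DIRECT_IDENTIFIERS = {"name", "address", "ni_number"}


def salary_band(salary: int, width: int = 10_000) -> str:
    """Aggregate an exact salary into a band such as '30000-39999'."""
    low = (salary // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(records: Iterable[Dict]) -> List[Dict]:
    """Drop direct identifiers and band salaries; the input records are not modified."""
    result = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        if "salary" in kept:
            kept["salary"] = salary_band(kept["salary"])
        result.append(kept)
    return result


def smallest_group(records: Sequence[Dict], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same indirect-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


# Hypothetical example data for illustration only.
participants = [
    {"name": "A. Smith", "address": "1 High St", "occupation": "nurse", "salary": 31_000, "score": 4},
    {"name": "B. Jones", "address": "2 Low Rd", "occupation": "nurse", "salary": 34_500, "score": 2},
    {"name": "C. Brown", "address": "3 Mill Ln", "occupation": "surgeon", "salary": 98_000, "score": 5},
]
shared = anonymise(participants)
print(shared[0])
print(smallest_group(shared, ["occupation", "salary"]))  # 1: the only surgeon is still unique
```

In this toy data the names and addresses are gone, yet the surgeon can still be singled out by occupation and salary band together. That is the situation described above, where indirect identifiers in combination allow re-identification, and it would call for further aggregation or controlled access rather than open sharing.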
The Information Compliance team have produced guidance on Anonymisation under GDPR.
The UK Data Archive's web pages offer detailed information about anonymisation techniques for both quantitative and qualitative data.
The National Centre for Research Methods (NCRM) has produced three video tutorials outlining the Anonymisation Decision-Making Framework.
There may be cases where data cannot be anonymised. However, this does not necessarily mean that the data cannot be shared. A third option for making sensitive data available for reuse is to implement strict procedures and regulations that set conditions for how the data can be accessed and used. This can be achieved by asking users to sign an End User Agreement (for an example, see the UK Data Service End User Licence) or to enter into a data sharing agreement. For guidance on creating a data sharing agreement please contact the Information Compliance team: email@example.com. Any controlled or restricted access must be in line with the data sharing permissions agreed during the consent process.
Examples of access controls include:
Source: UK Data Service - Access Controls
Intellectual property rights (IPR) are exclusive rights recognised in intellectual works. They allow people to own the works they create and give them certain controls over how those works are exploited.
The four main types of intellectual property rights are patents, trademarks, designs and copyright.
For more guidance on copyright visit our Copyright webpages
For enquiries about copyright in general email firstname.lastname@example.org
A database may be protected by both copyright and database rights.
Before data can be shared it is important to be clear about who owns the intellectual property rights. Normally the creator of a work is automatically assigned copyright ownership, although ownership can subsequently be transferred or waived. For works created during employment, the employer is usually the IPR owner, but to enable the dissemination of scholarly outputs the university, like most HEIs, permits researchers to publish their data, e.g. in a repository or data journal.
Please see the university's Code of Practice for Intellectual Property, Commercial exploitation and financial benefits for more details.
If you are the rights owner, you can set conditions for how the data can be accessed and used by applying a licence to the data. Guidance on how to license your data is available on our webpages here.
If you are using data collected from other sources, you will need to be very clear about who owns the rights to the data and aware of any conditions that might place restrictions on the re-use of the data. This is particularly important if you are planning to deposit your data in a repository or data centre (or if your funder or journal publisher requires you to do this).
The University of Southampton have produced a very useful Copyright Flow Chart to guide researchers through the process of managing copyright.
It is important to clarify ownership of IPR and copyright for the data you will collect or otherwise access during your project as early as possible. You can use your data management plan to capture this information and help you develop strategies for managing rights ownership throughout the lifetime of your project.
Rights ownership for databases can be quite complex, as databases often have multiple authors and may draw on many sources. If in doubt contact the Research Data Team: email@example.com
Many research projects use secondary data to test their hypotheses or answer their research questions. Using secondary data allows you to save time and effort while making use of freely available material. However, secondary data may be limited in their relevance, scope and sufficiency for your research. It is therefore important to be able to find the right type of data for your project by searching in recognised repositories (also known as archives or data centres).
A good starting point to find data for reuse is the Registry of Research Data Repositories (re3data), a global registry allowing you to search by subject area or discipline.
Research Pipeline is a guide to the world’s free data, while the Open Access Directory maintains a smaller list of trustworthy repositories categorised by subject area.
If you don’t have a search strategy for secondary data, Michigan State University have published some useful guidance on how to search for research datasets and find data and statistics.
One place to start searching for datasets is a repository that focuses specifically on your discipline and the type of data you wish to work with. Below you can find a selected list of domain-specific repositories categorised by discipline:
Life Sciences (including Medical and Health Sciences)
Ecological and Environmental Sciences
Social and Political Sciences
For what to consider when choosing a domain- or discipline-specific repository or data centre, see our Deposit your data webpage.
If a discipline-specific repository does not meet your research needs, you can search for data in general purpose or multidisciplinary repositories. Below you can find a list of trusted generic data centres:
Funders and publishers have also recommended a number of established repositories, including:
Although more and more data are becoming available in open data archives, some data can only be accessed by subscription. Library Guides provide a list of databases that the College subscribes to.
What am I allowed to do with secondary data?
All data you decide to reuse for your research should have a licence – a legal document setting the terms and conditions governing what you can do with the data and how the data should be attributed.
Licence information is normally available in the metadata record of the dataset or can be obtained from the data repository when seeking access to the data. For more information on the different licences available, please see our License your data webpages.
Data are increasingly considered legitimate research outputs and should therefore be cited. It is best practice, and usually a condition of the licence under which data are shared, for the data to be referenced. If your department recommends a specific referencing style, follow the appropriate form for citing data. Alternatively, you can find further guidance on why and how to cite your data in our Citation tab.
If you include a recommendation on how to cite the data that supports your research in your publications it will make it easier for others to access, reuse and cite your data. It also benefits you as a researcher. Adding a data citation to your research papers can increase the visibility of both the data and the research it supports as well as making it easier to measure the impact of your data. It is also good practice to correctly cite any dataset that you refer to or use during your own research.
A data citation should provide the reader with enough information to locate and access a dataset.
When adding a data citation to a paper (or citing secondary datasets that you reference in your paper), you should first check whether your journal or publisher requires you to use a specific citation style. If not, follow a recommended standard data citation style such as:
DataCite recommend that data citations should include the following elements:
Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855
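The example citation above follows the pattern Creator (PublicationYear): Title. Publisher. Identifier. As a minimal sketch, assuming that pattern is the style you need, a small Python helper can assemble the elements in that order so that citations stay consistent across a paper. The function name `format_data_citation` and its parameters are hypothetical, and the optional version element, including where it is placed, is an assumption added for illustration rather than part of the DataCite example; always check the style your journal or publisher requires.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
) -> str:
    """Build a citation as 'Creator (Year): Title. Publisher. Identifier'.

    The optional version is a hypothetical addition: it is placed after the
    title here, which is an assumption rather than a DataCite rule.
    """
    # Creators are joined with '; ' as in the Irino and Tada example.
    parts = [f"{'; '.join(creators)} ({year}): {title.rstrip('.')}"]
    if version:
        parts.append(f"Version {version}")
    parts.append(publisher.rstrip("."))
    parts.append(identifier)
    return ". ".join(parts)


citation = format_data_citation(
    creators=["Irino, T", "Tada, R"],
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127‐797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
)
print(citation)
```

Run as written, this reproduces the Irino and Tada citation shown above. The identifier is passed through unchanged, so use the persistent identifier exactly as the repository displays it; if you cite a specific version of a dataset, supply it through the `version` argument rather than editing the title.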
Recommended optional elements:
The Digital Curation Centre (DCC) have published a "superset" of elements of a data citation derived from a number of published papers on the topic, of which the most important are:
They also recommend the use of a persistent identifier where possible.
Taken together these elements "give due credit, allow the reader to judge the relevance of the data, and permit access to the data..."
NB - If there are several versions of the dataset make sure that you cite the correct version used in your paper.
Again, this might depend on your publisher or journal. The most logical place is the bibliography, works cited or references section. If the publisher's format includes a section specifically dedicated to describing the datasets used, the data citation could be included there. Another option might be to add the citation to the acknowledgements section (source: DCC - How to Cite Datasets and Link to Publications).