Share | Research support | King’s College London

Watch Ben Goldacre (author of Bad Science) explain the importance of making data open and publicly accessible.

Increasingly, research data are seen as research outputs in their own right, with reuse value beyond the project for which they were created or collected. Because of this, many funders, journal publishers and academics now argue for the importance of data access and sharing.

The benefits of data sharing include:

Prevention of data duplication
Reducing the risk of data loss
Greater scrutiny of published research
New forms of collaborative research and data re-use
Increased visibility of research data & associated publications

Data sharing will not prevent you from getting recognition for the data you create or from maximising its value for your research. While most funders expect researchers to make their data available in a "timely manner", they also allow researchers a period of exclusive use of their data prior to publication.

Furthermore, while most funders are keen for researchers to make their data as open and accessible as possible, they also acknowledge that not all data can be shared, and that it may sometimes be necessary to set conditions restricting how, and by whom, data can be accessed (see "Do I have to share my data?" below).

How should I publish my data?

Options for publishing your data include:

In a repository or data centre

Depositing your data in a data centre or repository increases its discoverability and satisfies funder and journal publisher data policy requirements. Some repositories will also issue a DOI (Digital Object Identifier) for your data, making it easier to cite and to link to related resources such as the published research that the data supports. Help with choosing a repository, including information about the King's research data repository, KORDS, is available on these pages.

If you publish your data with an external data centre or repository, send us the DOI or URL and we will create a record for the dataset in our forthcoming research data catalogue. 

As supplementary materials attached to a journal

Sometimes it is possible to include supporting data as supplementary material accompanying published research (e.g. in the form of graphs or tables). However, there are likely to be restrictions on the types of material that can be submitted, such as limits on file size or acceptable file formats (see, for example, the guidance the journal Science provides on preparing supplementary materials).

A growing number of journals now have data sharing policies which stipulate that data supporting published research should be made publicly available and/or that the published article should state how, and under what conditions, the data supporting the paper's findings can be accessed. Most journal data sharing policies also specify that such data be deposited in a data centre or repository.

Journals with data sharing policies include:

Nature - authors are expected to make their supporting data publicly available at the point of publication, and the journal adds that the "preferred way to share large data sets is via public repositories".
PLOS ONE - "All data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository, unless already provided as part of the submitted article."
Science - "... appropriate data sets (including microarray data, protein or DNA sequences, atomic coordinates or electron microscopy maps for macromolecular structures, and climate data) must be deposited in an approved database."

A provisional list of journals with data sharing policies is available from the Open Access Directory.

Via a project, departmental, or personal web site

By itself this is unlikely to meet funder expectations regarding long-term preservation of data. Ideally you should also deposit copies of the data in a data repository or data centre.

Allow access via email request

Again, this might not comply with funder mandates or enable long-term preservation of, or access to, the data. For example, the EPSRC does not consider it sufficient to direct potential future users to a personal email address in your data access statement. Guidance on how to comply with the EPSRC's expectations, and on what to include in a data access statement, is available on our webpage Meeting the EPSRC expectations.

When should I publish my data?

Most funders expect data to be made available no later than the publication of the research findings but will usually also allow researchers a period of exclusive use to enable them to benefit from the data they have generated. You should include details of any proposed embargo period in your data management plan. Details of funder timescales for publishing data can be found on our webpage Funder policies on data management and sharing. 

Do I have to share my data?

Most publishers also recognise that there are occasions when data cannot be shared, or when access to the data has to be restricted (e.g. for reasons of confidentiality, commercial sensitivity, or copyright and IPR). However, they also expect researchers to demonstrate that they have taken reasonable measures to make the data available, even if access is controlled or otherwise mediated. This could be done by making the data available on request only or via a data sharing agreement. For help with creating a data sharing agreement, contact the Information Compliance team: email info-compliance@kcl.ac.uk.

You can also set terms and conditions governing reuse by applying a licence to your data; please see the IP, Copyright & Licensing tab at the top of this page for more guidance.

Again, check your funder's policy for details of their data sharing guidelines. 

Further guidance:

Additional guidance on confidentiality, ethics, IP, copyright and licensing can be found on the other tabs on this page.

Sharing personal data

Great care has to be taken when working with data that contains personal or sensitive information, but it is still possible to share such data by following the strategies outlined below:

Informed consent

Informed consent is an ethical requirement for most research and must be considered and implemented throughout the research lifecycle, from planning to publication. Gaining consent must include making provision for sharing data and take into account any immediate or future uses of data. – (UK Data Service - Consent for Data Sharing)

Before any data can be collected you must obtain "informed consent" from research participants. Informed consent means that participants understand what they are consenting to. Consent is typically obtained in writing using a consent form, though it can also be obtained verbally. Where possible, consent should extend beyond the use of data during the lifetime of the research project to include long-term preservation and future re-use. Consent is not needed to use anonymised data, provided the data are 'robustly anonymised' and re-identification is not possible (see Anonymisation below). However, if you plan to anonymise and share personal data collected during your research, you must tell participants this when you obtain their informed consent.

Anonymisation

Anonymisation is the process of removing identifying information so that data can be used more widely. It requires that identifiers are "removed, obscured, aggregated and/or altered in some way" (UKAN - UK Anonymisation Network). Anonymised data are exempt from the General Data Protection Regulation (GDPR), but keep in mind that personally identifiable information includes both direct identifiers (e.g. a participant's name, address or National Insurance number) and indirect identifiers (e.g. information about workplace, occupation or salary) which, in combination with other information, could make it possible to identify an individual, organisation or business (UK Data Service - Anonymisation). If your data contain information that could lead to the re-identification of participants when combined with other information, the data are not considered robustly anonymised and should not be shared without consent or appropriate safeguards.
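One common way to test for the combination risk described above is k-anonymity, a technique not named in this guidance: every combination of indirect identifiers should be shared by at least k records. The sketch below is a minimal illustration, assuming records held as Python dictionaries; the field names, the sample records and the threshold k = 3 are hypothetical. Passing such a check does not on its own make data robustly anonymised.

```python
from collections import Counter

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "address"}
INDIRECT_IDENTIFIERS = ("occupation", "workplace", "salary_band")


def drop_direct_identifiers(records, direct=DIRECT_IDENTIFIERS):
    """Return copies of the records without the direct identifier fields."""
    return [{k: v for k, v in r.items() if k not in direct} for r in records]


def risky_records(records, quasi=INDIRECT_IDENTIFIERS, k=3):
    """Return records whose combination of indirect identifiers is shared by
    fewer than k records, i.e. records that break k-anonymity."""
    def key(r):
        return tuple(r.get(f) for f in quasi)

    group_sizes = Counter(key(r) for r in records)
    return [r for r in records if group_sizes[key(r)] < k]


participants = [
    {"name": "P1", "address": "1 High St", "occupation": "nurse", "workplace": "Hospital A", "salary_band": "B"},
    {"name": "P2", "address": "2 High St", "occupation": "nurse", "workplace": "Hospital A", "salary_band": "B"},
    {"name": "P3", "address": "3 High St", "occupation": "nurse", "workplace": "Hospital A", "salary_band": "B"},
    {"name": "P4", "address": "4 High St", "occupation": "surgeon", "workplace": "Hospital A", "salary_band": "D"},
]

cleaned = drop_direct_identifiers(participants)
flagged = risky_records(cleaned, k=3)
print(flagged)  # only the single surgeon's record is flagged
```

Removing names and addresses is not enough here: the only surgeon in the data could still be recognised from occupation and salary band, so that record would need further aggregation or suppression before sharing.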

The Information Compliance team have produced guidance on Anonymisation under GDPR. 

The UK Data Archive's web pages offer detailed information about anonymisation techniques for both quantitative and qualitative data.

The National Centre for Research Methods (NCRM) has produced three video tutorials outlining the Anonymisation Decision-Making Framework.

Controlled access and data sharing agreements

There may be cases where data cannot be anonymised. However, this does not necessarily mean that the data cannot be shared. A third option for making sensitive data available for reuse is to implement strict procedures and regulations that set conditions for how the data can be accessed and used. This can be achieved by asking users to sign an End User Agreement (for an example see the UK Data Service End User License) or enter into a data sharing agreement. For guidance on creating a data sharing agreement please contact the Research Grants and Contracts team. Any controlled or restricted access must be in line with data sharing permissions agreed during the consent process. 

Examples of access controls include:

needing specific authorisation from the data owner to access data
placing confidential data under embargo for a given period of time until confidentiality is no longer pertinent
providing access to approved researchers only
providing secure access to data by enabling remote analysis of confidential data but excluding the ability to download data

Source: UK Data Service - Access Controls
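The access controls listed above can be thought of as a set of rules checked against each request. The sketch below is purely hypothetical: the `AccessPolicy` class, its fields and the `check_access` function are invented for illustration and do not describe how the UK Data Service or any other repository implements controlled access.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical access conditions attached to one sensitive dataset."""
    # Researchers approved, and authorised, by the data owner.
    approved_researchers: FrozenSet[str] = frozenset()
    # Access is closed until this date; None means no embargo.
    embargo_until: Optional[date] = None
    # False means secure remote analysis only, with no downloads.
    allow_download: bool = False


def check_access(policy: AccessPolicy, user: str, today: date) -> str:
    """Return a short decision string for one access request."""
    if policy.embargo_until is not None and today < policy.embargo_until:
        return "denied: under embargo"
    if user not in policy.approved_researchers:
        return "denied: not an approved researcher"
    return "download" if policy.allow_download else "remote analysis only"


policy = AccessPolicy(frozenset({"r.smith"}), embargo_until=date(2030, 1, 1))
print(check_access(policy, "r.smith", date(2029, 6, 1)))  # denied: under embargo
print(check_access(policy, "r.smith", date(2030, 1, 1)))  # remote analysis only
print(check_access(policy, "j.doe", date(2031, 1, 1)))    # denied: not an approved researcher
```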

What are intellectual property rights?

Intellectual property rights are exclusive rights recognised in intellectual works. They allow people to own the works they create and give them a degree of control over how those works are exploited.

The four main types of intellectual property rights are patents, trademarks, designs and copyright.

Copyright is an intellectual property right that is automatically assigned when a work is created. In the UK no formal registration process is required.
For something to be protected by copyright it must be "original". Here original does not mean "imaginative", only that it must have required effort and is not simply a copy of another work.
A work can only be subject to copyright if it is "fixed" in some medium. Ideas and facts cannot in themselves be copyrighted; they must be expressed in some tangible form.
Copyright only exists for limited time periods. For literary and artistic works (and, since 2014, sound recordings) the duration is usually the author's life plus 70 years; for typographic works it is 25 years from the date of publication; and for Crown copyright it is 50 years from the date of publication or 125 years from the date of creation. (For a full list, visit Gov UK - How Long Copyright Lasts.)
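The durations above come down to simple arithmetic once you know the starting year. The sketch below assumes the UK rule that these terms run to the end of the calendar year (so a literary work whose author died in 1950 stays in copyright until 31 December 2020). The function names are hypothetical, it covers only literary works and typographic works, and it is no substitute for the Gov UK guidance.

```python
from datetime import date


def literary_work_expiry(author_death_year: int) -> date:
    """Last day of protection: 70 years from the end of the year of death."""
    return date(author_death_year + 70, 12, 31)


def typographic_work_expiry(first_published_year: int) -> date:
    """Last day of protection: 25 years from the end of the year of publication."""
    return date(first_published_year + 25, 12, 31)


def in_copyright(expiry: date, on: date) -> bool:
    """True if the work is still protected on the given date."""
    return on <= expiry


print(literary_work_expiry(1950))                                  # 2020-12-31
print(in_copyright(literary_work_expiry(1950), date(2021, 1, 1)))  # False
print(typographic_work_expiry(2000))                               # 2025-12-31
```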

For more guidance on copyright, visit our Copyright webpages.

For enquiries about copyright in general, email copyright@kcl.ac.uk.

Copyright and databases

A database may be protected by both copyright and database rights.

Copyright

Copyright might exist in the content of a database, independent of the structure and arrangement, i.e. the database may contain image, text, audio, video or other files that are protected by copyright - including third party copyright if the database contains secondary data - providing they are original and not copies of other works.
A database might also be protected by copyright in its structure if the selection and arrangement of its contents are the result of the author's "intellectual creation". The Copyright, Designs and Patents Act 1988 (CDPA 1988) states that "a literary work consisting of a database is original if, and only if, by reason of the selection or arrangement of the contents of the database the database constitutes the author’s own intellectual creation".

Database rights

If the contents or structure of the database are not original the database may still be protected under the database right.
A database qualifies for the database right if there has been a "substantial investment in obtaining, verifying or presenting the contents of the database" (The Copyright and Rights in Databases Regulations 1997).
A database right is infringed if a person extracts or re-utilises all or a substantial part of the contents of the database without the consent of the rights owner.
The database right remains in force for 15 years from the creation of the database or from the date of publication. However, the term is renewed each time a "substantial change" is made, so for databases that are continually updated or renewed the right could, in principle, last indefinitely. As with copyright, the database right arises automatically.
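As a rough illustration of how the term and its renewal interact, the sketch below computes the last calendar year in which the right would subsist. It assumes, following regulation 17 of the 1997 Regulations cited above, that the 15 years run from the end of the calendar year in which the database was completed, or from the end of the year of first publication if it is made public within that period, and it simplifies by treating each substantial change as starting a fresh term. The function name and parameters are hypothetical.

```python
from typing import Iterable, Optional


def database_right_last_year(
    completed: int,
    first_public: Optional[int] = None,
    substantial_changes: Iterable[int] = (),
) -> int:
    """Return the last calendar year in which the database right subsists."""
    # Default term: 15 years from the end of the year of completion.
    last_year = completed + 15
    # If made public before that term ends, the 15 years restart from the
    # end of the year of first publication.
    if first_public is not None and first_public <= last_year:
        last_year = first_public + 15
    # Simplification: each substantial change starts a fresh 15-year term.
    for year in substantial_changes:
        last_year = max(last_year, year + 15)
    return last_year


print(database_right_last_year(2000))                # 2015
print(database_right_last_year(2000, 2005))          # 2020
print(database_right_last_year(2000, 2005, [2018]))  # 2033
```

The last call shows why a continually updated database may, in principle, stay protected indefinitely: every substantial change pushes the end of the term further out.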

Copyright and data sharing 

Before data can be shared it is important to be clear about who owns the intellectual property rights. Normally the creator of a work is automatically assigned copyright ownership, although ownership can subsequently be transferred or waived. For works created during employment, the employer is usually the IPR owner, but to enable the dissemination of scholarly outputs the university, like most HEIs, permits researchers to publish their data, e.g. in a repository or data journal.

Please see the university's Code of Practice for Intellectual Property, Commercial exploitation and financial benefits for more details.

Licensing your data

If you are the rights owner, you can set conditions for how the data can be accessed and used by applying a licence to the data. Guidance is available in our Licensing Your Data guide (pdf).

Copyright and secondary data 

If you are using data collected from other sources, you will need to be very clear about who owns the rights to the data and aware of any conditions that might place restrictions on the re-use of the data. This is particularly important if you are planning to deposit your data in a repository or data centre (or if your funder or journal publisher requires you to do this).

Are there any licensing conditions or end user agreements that place restrictions on how the data can be re-used? 
If you are planning to deposit the data in a repository or data centre have you obtained the necessary permissions from the rights owners?
Does the data contain information about human subjects? If so, do existing consent agreements allow the data to be used beyond the purposes the information was originally collected for?

The University of Southampton have produced a very useful Copyright Flow Chart to guide researchers through the process of managing copyright.

Plan ahead...

It is important to clarify ownership of IPR and copyright for the data you will collect or otherwise access during your project as early as possible. You can use your data management plan to capture this information and help you develop strategies for managing rights ownership throughout the lifetime of your project.

Rights ownership for databases can be quite complex, as databases often have multiple authors and draw on many sources. If in doubt, contact the Research Data Team: research.data@kcl.ac.uk

Further support and resources:

Data discovery and reuse

Many research projects use secondary data to test hypotheses or answer research questions. Using secondary data can save time and effort by making use of freely available material. However, secondary data may be limited in their relevance, scope and sufficiency for your research, so it is important to find the right type of data for your project by searching recognised repositories (also known as archives or data centres).

Searching for data

A good starting point to find data for reuse is the Registry of Research Data Repositories (re3data), a global registry allowing you to search by subject area or discipline.

Research Pipeline is a guide to the world’s free data, while the Open Access Directory maintains a smaller list of trustworthy repositories categorised by subject area.

If you don’t have a search strategy for secondary data, Michigan State University have published some useful guidance on how to search for research datasets and find data and statistics.

Types of Repositories and Data Centres

1. Domain or discipline specific repositories and data centres

One place to start searching for datasets is a repository that focuses specifically on your discipline and the type of data you wish to work with. Below you can find a selected list of domain specific repositories categorised by discipline:

Life Sciences (including Medical and Health Sciences)

National Center for Biotechnology Information (NCBI) - advances access to biomedical and genomic information
Protein Data Bank (PDB) - repository of information about the 3D structures of proteins, nucleic acids and complex assemblies
NIH Data Sharing Repositories - NIH-supported data repositories that make data accessible for reuse
Data Sharing for Demographic Research (DSDR) - datasets relevant to population studies
Global Health Data Exchange (GHDx) - a comprehensive catalogue of surveys, censuses, vital statistics and other health-related data
OpenfMRI - free and open sharing of raw magnetic resonance imaging (MRI) datasets
European Bioinformatics Institute (EMBL-EBI) - explore dozens of biological data resources
MEDMI - connecting health and environmental data

Ecological and Environmental Sciences

The Natural Environment Research Council (NERC) runs seven centres for environment-related data and also supports the Archaeology Data Service for science-based archaeology data
Knowledge Network for Biocomplexity (KNB) - international repository for ecological and environmental research
Data Observation Network for Earth (DataONE) - search and discovery of Earth observational data and metadata
MEDMI - connecting health and environmental data

Social and Political Sciences

The UK Data Archive - the largest collection of social and economic datasets in the UK, funded by the Economic and Social Research Council (ESRC)
CESSDA ERIC - the Consortium of European Social Science Data Archives (a European Research Infrastructure Consortium) brings together social science data archives across Europe
The Inter-university Consortium for Political and Social Research (ICPSR) - access to over 500,000 datasets in the social and behavioural sciences. You need to create an account in order to download data.

For what to consider when choosing a domain- or discipline-specific repository or data centre, see our Deposit your data webpage.

2. Generic or multi-disciplinary repositories and data centres

If a discipline specific repository does not meet your research needs, you can search for data in general purpose or multi-disciplinary repositories. Below you can find a list of trusted generic data centres:

Figshare and Zenodo - multi-purpose repositories holding datasets from a range of subject areas and disciplines
Dryad - international repository hosting data supporting scientific and medical publications. Dryad is affiliated with a number of journals and supports the Joint Data Archiving Policy (JDAP), which requires that data supporting publications be made publicly available
European Union Open Data Portal - access open data produced by European institutions and bodies
Research Data Australia - access data from over one hundred Australian research organisations, government agencies and cultural institutions across the sciences, social sciences, arts and humanities
FAIRsharing.org - a directory of databases and reporting standards across all disciplines
Data Citation Index (DCI) on Web of Science - access datasets in the sciences, social sciences and arts and humanities

3. Funder and publisher repositories

Funders and publishers also recommend established repositories, including:

Wellcome Trust- Data Guidelines
Nature.com - Recommended data repositories categorised by subject area
Public Library of Science (PLOS) - Recommended Repositories

Although more and more data are available in open data archives, some data can only be accessed through subscription databases. Library Guides provide a list of the databases to which the College subscribes.

4. Accessing sensitive and secure data

Some data, including government, population and health datasets, may be too sensitive or confidential to be made available for download. Secure (or controlled) access to analyse these data is provided to accredited researchers through a safe environment or trusted research environment. Secure access may be provided remotely from your organisational desktop, by visiting the relevant data centre's safe room, or in one of a network of SafePods located in universities across the UK.

The Office for National Statistics (ONS) - Secure Research Service (SRS) provides access to de-identified, unpublished ONS data. Guidance for King's researchers requesting IT help with an application to access the ONS Secure Research Service is provided by IT Assurance.
UK Data Service provides access to controlled data (social, economic and population data) through SecureLab
SAIL Databank holds anonymised person-level health and administrative data about the population of Wales (and the wider UK)
Administrative Data Research UK includes de-identified data 'created when people interact with public services, such as schools, the NHS, the courts or the benefits system, and collated by government'

A SafePod where these data may be accessed is located in central London, near to King's, in the Library at the London School of Economics.

What am I allowed to do with secondary data?

All data you decide to reuse for your research should have a licence – a legal document setting the terms and conditions governing what you can do with the data and how the data should be attributed.

Licence information is normally available in the dataset's metadata record or can be obtained from the data repository when you request access to the data. For more information on the different licences available, please see our License your data webpages.

Citing Data

Data are increasingly considered legitimate research outputs and should therefore be cited. Referencing data is best practice, and usually a condition of the licence under which the data are shared. If your department recommends a specific referencing style, follow its form for citing data. Alternatively, you can find further guidance on why and how to cite your data in our Citation tab.

Data Access Statements

A Data Access Statement, or Data Availability Statement, is a short statement added to a research publication to inform readers of the existence, location, and availability of underlying datasets. If access to datasets is restricted, the reasons for this should be explained and any terms and conditions of access to the data should be stated.

The statement should include a link to the dataset, where one exists, ideally a persistent link such as a DOI. For example:

The dataset supporting this article is openly available from the King's College London research data repository at http://doi.org/doi:10.18742/RDM01-ABC

Data Access Statements should be included in all research articles, even if there is no data supporting the paper, or if all data is included in the paper.

They are required by many research funders and detailed in policies for both research data and open access publications. They are also required by many publishers and their use is recommended in the KCL Research Data Management Policy.

Some publishers have a dedicated section in research articles for Data Access Statements. If such a section doesn’t exist, it is recommended to include the statement in or after the acknowledgements section.

For further information and support for data sharing, including details of funder policies and the King’s research data repository, KORDS, where you can deposit your data and reserve a DOI, please see information elsewhere on these pages, or contact us at research.data@kcl.ac.uk.

Template Data Access Statements

Provided here are some template statements for common scenarios of data availability. Where a publisher provides templates or guidance, we recommend using those. A list of links to publisher guidance can be found below this table.

Scenario	Template
Data openly available in a data repository	The data supporting this article is openly available from (repository name) at (DOI or other link/reference)
Data in a repository with access temporarily embargoed	The data supporting this article has been deposited in (repository name) at (DOI or other link/reference) with an access embargo until (date)
Data in a repository with controlled access due to legal, ethical, regulatory, contractual, intellectual property restrictions	The data supporting this article has been deposited in (repository name) at (DOI or other link/reference). It is not openly available due to (reason for restriction) and may be shared on request (where appropriate, include method and conditions of granting access – e.g. completing a data access agreement; only to academic or clinical researchers)
Data is not able to be shared due to legal, ethical, regulatory, contractual, intellectual property protection restrictions	Data supporting this article cannot be shared due to (reason for restriction). A descriptive record can be found in (repository name) at (DOI or other link/reference)
Data is available in the article’s supplementary material	Data supporting this article is available in the supplementary material
No data	There is no data associated with this article
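If you write statements for several outputs, the templates above can be filled programmatically so that the wording stays consistent. This is a minimal sketch, assuming Python: the scenario keys, the placeholder names and the `data_access_statement` function are hypothetical, while the sentence wording is taken from the table. Where a publisher provides its own template, that should still take precedence.

```python
TEMPLATES = {
    "open": "The data supporting this article is openly available from {repository} at {link}",
    "embargoed": (
        "The data supporting this article has been deposited in {repository} "
        "at {link} with an access embargo until {date}"
    ),
    "controlled": (
        "The data supporting this article has been deposited in {repository} "
        "at {link}. It is not openly available due to {reason} and may be "
        "shared on request {conditions}"
    ),
    "not_shareable": (
        "Data supporting this article cannot be shared due to {reason}. "
        "A descriptive record can be found in {repository} at {link}"
    ),
    "supplementary": "Data supporting this article is available in the supplementary material",
    "no_data": "There is no data associated with this article",
}


def data_access_statement(scenario: str, **details: str) -> str:
    """Fill the template for one scenario; a missing detail raises KeyError."""
    return TEMPLATES[scenario].format(**details) + "."


print(data_access_statement(
    "embargoed",
    repository="KORDS",
    link="http://doi.org/doi:10.18742/RDM01-ABC",
    date="31 December 2026",
))
```

Raising an error for a missing detail, rather than leaving a placeholder in the text, is a deliberate choice: an unfilled "(reason for restriction)" should never reach a published article.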

Links to publisher guidance and templates for Data Access Statements

Data citation

It is good practice to also cite any other dataset that you refer to or use during your own research.

What should I include in a data citation?

A data citation should provide the reader with enough information to locate and access a dataset.

When adding a data citation to a paper (or citing secondary datasets that you reference in your paper), first check whether your journal or publisher requires a specific citation style. If not, follow a recommended standard data citation style such as:

DataCite recommend that data citations should include the following elements:

Creator
Publication Year
Title
Publisher
Persistent Identifier (e.g. a DOI)

e.g.

Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855

Recommended optional elements:

Version
Resource type

The Digital Curation Centre (DCC) have published a "superset" of elements of a data citation derived from a number of published papers on the topic, of which the most important are:

author
title
date
location
publisher

They also recommend the use of a persistent identifier where possible.

Taken together these elements "give due credit, allow the reader to judge the relevance of the data, and permit access to the data..."

NB - If there are several versions of the dataset make sure that you cite the correct version used in your paper.

Where should I place my data citation?

Again, this might depend on your publisher or journal. The most logical place is the bibliography, works cited or references section. If the publisher's format includes a section dedicated to describing the datasets used, the data citation could be included there. Another option might be to add the citation to the acknowledgements section (source: DCC - How to Cite Datasets and Link to Publications).

research.data@kcl.ac.uk

To speak to a member of the team, please email us at the address above to arrange a Teams call.