While there is no single definition of data, research data can generally be defined as any representations or other objects that are created or gathered for the purposes of producing research or scholarship, and which can be used to validate or reproduce original research findings.
- Research data can cover a broad diversity of form and content, including numbers, text, still and/or moving images, audio, simulations, models, or interview recordings.
- Data objects can be physical or print as well as digital.
- Data can be created or collected through observation, experimentation, or simulation, or derived from existing datasets.
- Data can also exist in different states of readiness e.g. raw, processed, cleaned, summarised, or finalised and ready for archiving.
Definitions of research data do not usually include research records such as correspondence, grant applications, ethics applications, technical reports or signed consent forms, but these still need to be managed during the course of the project.
Additional guidance: Introduction to Research Data Management
Research data management covers all aspects of looking after data throughout the research lifecycle and beyond, from planning the investigation through to preparing the data for long term preservation and future access and reuse.
Most UK and overseas funders now have data management policies that require or expect researchers to take steps to ensure that data that supports published research findings or has long term value can be preserved beyond the lifetime of the research project and made available to future users.
At the same time, data management should be seen as good research practice that has benefits for the scholarly community and wider public.
Research data management:
- reduces the risk of data loss
- enables greater scrutiny of published research
- facilitates data sharing and reuse
- maximises the impact of data and published research findings
- promotes open research and encourages public interest in publicly funded research
- makes it easier for researchers to get credit for their data
Additional guidance: Introduction to Research Data Management
A data management plan is a document that describes how you will look after your data both during the research project and after it has ended. You will typically be asked to provide details of how much and what kind(s) of data you expect to generate; any legal or ethical issues that need to be addressed; how the data will be organised, documented and described; how the data will be securely stored and backed up during the project; what steps you will take to ensure the long-term preservation, accessibility and reuse of the data; and any extra costs and resources needed to meet your data management requirements.
Having a data management plan can:
- ensure compliance with funder and institutional policies*
- save time and resources in the long run
- make it easier for you to locate and use your data during your project
- manage or avoid risk (e.g. data loss or accidental or malicious disclosure)
- help you identify any tasks or responsibilities that need to be planned for in advance (e.g. obtaining consent or clarifying copyright and other rights permissions)
*It is a requirement of the King's Research Data Management Policy that all King's researchers should create and maintain a data management plan for all projects handling research data.
Data management planning is also good research practice - a data management plan should be thought of as a "living" document that can be revised and updated throughout the course of your research.
Additional guidance: How to Create a Data Management Plan.
Where possible, you should store your data in open rather than proprietary or closed file formats. Proprietary formats, and the software needed to read them, are often licensed or privately owned, and access may be restricted by copyright or patents. If a company ceases trading or the software is discontinued, the format may become obsolete, leaving the data inaccessible. Examples of open formats include ASCII, Open Document Format, CSV, HTML, FLAC and PDF.
However, some proprietary formats have become de facto standards and are likely to remain in use for the foreseeable future (e.g. Microsoft Office and SPSS file formats), or are widely used within a particular research domain or discipline.
Things to consider include:
- Which formats are best suited for data collection and analysis?
- Which formats have you and your colleagues used in the past?
- Is there a risk of file format obsolescence?
- Are there any discipline-specific requirements?
- Is the format suitable for conversion?
- Are you planning to deposit your data in a repository or data centre? If so, will there be restrictions on the types of file formats they will accept?
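If you want to export tabular data to an open format, the following is a minimal Python sketch using only the standard library `csv` module. It assumes the data is already loaded into Python as a list of rows. The function names, column names and values are hypothetical and serve only as an illustration. The sketch writes a UTF-8 CSV file and reads it back to confirm that no proprietary software is needed to open it.

```python
import csv
from pathlib import Path


def export_to_csv(header, rows, path):
    """Write tabular data to a plain-text CSV file, an open and widely readable format."""
    path = Path(path)
    # newline="" lets the csv module control line endings; UTF-8 keeps non-ASCII text intact
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def read_csv(path):
    """Read the CSV back to check that the export opens with generic tools."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical example data: participant ID, site, score
    header = ["participant_id", "site", "score"]
    rows = [["P001", "London", 42], ["P002", "Zürich", 37]]
    out = export_to_csv(header, rows, "example_dataset.csv")
    print(read_csv(out))
```

CSV stores every value as text, so the scores come back as strings. Units, data types and coding schemes therefore need to be recorded separately, in your documentation and metadata.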
Additional guidance: Formatting your data
Metadata is sometimes defined as "data about data" or "information about data". Although the terms documentation and metadata are sometimes used interchangeably, metadata is also used in the more restricted sense of structured information that is both human and machine readable.
Many funders expect you to publish metadata alongside your research data. For example, the EPSRC's Policy Framework on Research Data includes the expectation that researchers will provide sufficient metadata "to allow others to understand what research data exists, why, when and how it was generated, and how to access it."
The amount of metadata you need to add to your data will depend on the level of description and the type of data being described. When thinking about what metadata you might need to collect, ask yourself what information you would need to make sense of your data at a later date, and what information other researchers would need to understand and reuse it.
Things to consider include:
- Have you provided sufficient documentation and metadata to enable others to discover, understand, access and reuse your data?
- What metadata does your chosen repository or data centre expect you to provide before they will accept your data for deposit?
- Are there any metadata standards that are widely used within your discipline or research domain?
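Structured metadata differs from free-text documentation, and a small example shows how. The minimal Python sketch below builds a metadata record and serialises it as JSON, which people can read and software can parse. The field names and the `REQUIRED_FIELDS` list are hypothetical. They were chosen only to mirror the EPSRC expectation quoted above: what data exists, why, when and how it was generated, and how to access it. They do not follow any particular metadata standard, so check the schema that your chosen repository or discipline actually uses.

```python
import json
from datetime import date

# Hypothetical minimum set of fields; real repositories define their own required schema.
REQUIRED_FIELDS = ("title", "creators", "description", "date_created", "methodology", "access")


def make_metadata_record(**fields):
    """Build a simple metadata record, rejecting it if any required field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    return fields


def to_json(record):
    """Serialise the record as indented JSON; dates are converted to ISO strings via str()."""
    return json.dumps(record, indent=2, ensure_ascii=False, default=str)


if __name__ == "__main__":
    record = make_metadata_record(
        title="Example survey dataset",                        # what the data is
        creators=["A. Researcher"],
        description="Responses to a hypothetical wellbeing survey.",  # why it exists
        date_created=date(2023, 6, 1),                         # when it was generated
        methodology="Online questionnaire; instrument described in README",  # how
        access="Open access; persistent identifier assigned on deposit",     # how to access it
        keywords=["survey", "wellbeing"],
    )
    print(to_json(record))
```

Checking for required fields before deposit is one way to catch gaps early. The repository's own submission form remains the authoritative test of what it will accept.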
Additional guidance: Documentation and metadata; Metadata for research data (a short glossary)
All King’s research staff have 2GB of storage space in their personal file store, i.e. the ‘Documents’ folder accessed via the desktop or RemoteApp. University file servers are managed by IT and are backed up regularly.
All staff and students at King’s have access to 1TB of space on OneDrive for Business. Contact IT Services for assistance if you are planning to use OneDrive for Business to store data that contains personal information. King's employees can also apply for a SharePoint site with SharePoint Online, a file sharing service that provides storage space and allows collaborators to share files and information, including personally identifiable information. Click here for details of how to request a SharePoint site.
Some schools or departments may also provide local storage facilities for storing data during the lifetime of the research project.
If you require further help with managing your data storage requirements please contact IT Services: email 8888@kcl.ac.uk or telephone 020 7848 8888
Additional guidance: Store Your Data
There are a number of factors which can influence your decision whether to retain or dispose of your data once your research project has ended.
- Are there data you need to keep?
  - Does the data support published research findings?
  - Are the data needed to verify or reproduce those findings?
  - Do you need to keep the data to satisfy funder or publisher requirements?
  - Are there any university requirements regarding data retention? (Timescales for retention and disposal of research data are included in the College Records and Data Retention Schedule, pp31-34.)
- Which data would you like to keep?
  - Is the data likely to be reused by yourself or others?
  - Is the data unique or difficult to replace?
  - Does the data have scientific or historical value?
- Are there data that shouldn't be kept or aren't worth keeping?
  - Is there sufficient documentation and metadata to allow others to read and interpret the data?
  - Is future reuse of the data restricted or prohibited because of obsolete or proprietary software?
  - Does the cost of preserving the data outweigh the benefits?
  - Are there legal or regulatory obligations to take into account (e.g. the General Data Protection Regulation, the Freedom of Information Act)?
Additional guidance: Appraisal and selection
There are a number of factors which will determine how long you should keep your data for.
- What does your funder require?
Most research councils and many other funding bodies have published guidelines on how long data should be kept once a project has ended. These vary from funder to funder. For an overview of funder policies and links to relevant policy documents and guidance, see our web page Funder Policies on Managing and Sharing data.
- What does the university require?
Timescales for retention and disposal of research data are included in the university Records Retention Schedule. Periods of retention vary according to how the data is classified. The Corporate Records Management team can provide further assistance if you are unsure which category your data belongs to: records-management@kcl.ac.uk.
NOTE - where the university's retention schedule differs from funder policy requirements, the latter take precedence.
- Are there any legal or regulatory requirements?
The General Data Protection Regulation (GDPR) requires that personal data should not be kept for longer than is necessary or for purposes other than those for which it was collected. However, the regulation does include an exemption which permits the archiving of personal data collected for 'scientific or historical research purposes', provided that appropriate measures are in place to safeguard privacy (GDPR, Article 89(1)).
If you are planning to hold on to data that you are not otherwise required to retain, keep in mind that datasets held by public authorities, including universities, are subject to requests for access under the Freedom of Information Act 2000. Access can only be withheld where one of the Act's exemptions applies. Once an FOI request has been made, the data must not be deleted: destroying, erasing or concealing requested information with the intention of preventing its disclosure is a criminal offence.
Additional guidance: Appraisal and selection
Most funders expect you to deposit your data in a reputable repository or data centre.
Types of repository and data centre include:
- Domain- or discipline-specific
National data centres and domain-specific repositories have the specialist knowledge and resources to look after particular types of data. Some funders also support domain- or discipline-specific repositories or data centres and expect researchers to deposit their data in these once the project has ended (e.g. the UK Data Archive (ESRC) and NERC data centres).
- Generic or multi-disciplinary
If you are unable to find a discipline-specific repository, you might be able to deposit your data in a general-purpose or multi-disciplinary repository or data centre (e.g. Figshare, Zenodo).
- Institutional, e.g. the King's RDM System
If there are no suitable external repositories for your research data you can deposit your dataset with the university's data repository service (see the FAQ "Does King's have a data repository or archive?" and our web page Deposit your data with King's for more details).
re3data.org is a registry of research data repositories, which allows you to search by discipline.
Additional guidance: Deposit Your Data
The university provides a data repository service offering long term storage facilities for datasets that support published research and/or are ready for archiving. If you deposit your data with King's we will also issue a DOI and publish a metadata record for the dataset.
Please note, datasets containing sensitive or confidential information will be reviewed on a case-by-case basis.
Please email research.data@kcl.ac.uk for further assistance.
Additional guidance: Deposit your data with King's
There are a number of ways that you can share or publish your data:
- in a repository or data centre (see our web pages Deposit Your Data and Deposit your data with King's)
- as supplementary materials accompanying a journal article
- via a project or departmental web site
- by request via an email address
Be aware that publishing your data via a web site or by email request might not, on its own, satisfy funder requirements or guarantee long-term preservation and access.
If you deposit your data in a repository it will most likely be issued with a DOI or other persistent identifier. This makes it easier for others to cite your data. You can also include a citation for your data in your data access statement to encourage others to cite your data (see the FAQ "What is a data access statement?").
Additional guidance: Publish your data, Licence your data, Cite your data
If your data will be supporting published research findings it is a good idea to include a data access statement in your published research paper(s). Some funding bodies and journal publishers have made this a requirement of their data policies. A data access statement is a short statement added to your published paper giving details of how the data can be accessed, and any terms and conditions which might restrict or prevent access to your data. You should include a web link such as a DOI or URL where possible or direct interested parties to contact the Research Data Management team via our email address research.data@kcl.ac.uk. A personal email address is not usually considered sufficient.
Example:
“The data supporting this research are openly available from the King's College London research data archive at (insert DOI or URL).”
More examples of data access statements can be found on these web pages:
Additional guidance: Meeting the EPSRC expectations
The College encourages postgraduate researchers to make data which supports published findings openly available with as few restrictions as possible unless there are ethical, legal or contractual reasons not to.
Also, if you are in receipt of external funding, your funder's data policy may require you to make your data publicly available once the project has ended, as long as, again, there are no ethical or legal reasons for restricting or prohibiting access to the data. Details of funder policies on data sharing are available on our web page Funder policies on managing and sharing data.
Likewise, some journal publishers now have data policies which include an expectation or requirement that data which supports published research will be made publicly available. Additional information can be found on our web page Data Sharing, Access and Reuse.
Some funders include software and code within the scope of their data management and sharing policies. The EPSRC define data as “recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings”, so if your software or source code is needed to validate your research findings, you would be expected to archive and share it unless there are good reasons not to (e.g. IP or contractual obligations). The EPSRC also recognise that in some cases (e.g. simulated data or the outputs of models) it might be preferable to preserve the source code rather than the data itself.
Even if you are not required by your funder to preserve the software, it is still good practice to make the software and associated documentation available so that others can validate your research findings providing that, again, there are no IP or contractual reasons not to.
If you are sharing software or code, you should consider applying a licence to it. An open-source licence grants users of your software certain rights and determines whether future users can use, copy, modify or redistribute your code. It also asserts your authorship of the code.
GitHub is widely used for sharing software during a research project, but it does not meet the EPSRC's expectations for long-term preservation and accessibility. For long-term preservation, it is therefore recommended that you use a data repository such as Zenodo, which provides an integration that allows researchers to archive their GitHub repositories via Zenodo. Guidance is available here.
The UKRI funding bodies allow data management costs to be included in grant applications, but only if the costs are incurred or allocated before the award has ended. Information on how to support your research data management costs through UKRI grant funding can be found on UKRI's web page 'Supporting research data management costs through funding', which includes a link to a PDF document, 'Guidance on best practice in the management of research data' (see pp10-12 for guidance on including data management costs in your funding application). A blog entry on supporting research data management costs is also available here.
The UK Data Archive have created a useful Data Management Costing Tool to help researchers cost their data management needs in their grant applications.
If you are unsure what costs your funder will cover, either contact them directly for clarification or email the Research Data Management team at research.data@kcl.ac.uk.