You should think about how you are going to organise your files and folders from the outset. Files can very quickly become disorganised and unmanageable if file names and folder structures are not kept consistent and logical. Well organised files and folders make it easier to locate and retrieve your data, saving you time and frustration.
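For instance, you might agree a simple top-level hierarchy at the start of a project and reuse it everywhere. The sketch below uses Python's standard pathlib module to create one such layout; the folder names are purely illustrative, not a prescribed standard.

```python
from pathlib import Path

# Hypothetical top-level layout for a research project; the names are
# illustrative rather than a prescribed standard.
FOLDERS = [
    "data/raw",        # original data, ideally kept read-only
    "data/processed",  # cleaned or transformed data
    "docs",            # readme files, consent forms, protocols
    "scripts",         # analysis code
    "outputs",         # figures, tables, reports
]

project = Path("my_project")
for folder in FOLDERS:
    (project / folder).mkdir(parents=True, exist_ok=True)
```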
Naming files in a consistent and logical way will help you to distinguish between similar files and make finding your data easier. To ensure consistency and avoid confusion you should choose a system of naming conventions at the outset of your project and stick with it.
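A common convention is to encode a date, a short project code, a content description and a version number in every file name. As a sketch, assuming one such (hypothetical) convention, file names can be checked automatically with a regular expression:

```python
import re

# Hypothetical convention: YYYY-MM-DD_project_description_vNN.ext
# (ISO date, short project code, hyphenated description, two-digit version).
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_[a-z0-9]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")

def follows_convention(filename: str) -> bool:
    """Return True if a file name matches the agreed convention."""
    return PATTERN.fullmatch(filename) is not None

print(follows_convention("2024-03-01_survey_interview-notes_v01.docx"))  # True
print(follows_convention("final FINAL notes (2).docx"))                  # False
```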
Your files will probably go through various drafts and versions. If you are engaged in collaborative research your files may be revised by more than one person. How will you keep track of who made which changes or identify which is the current or final version? Version control allows you to manage and record the changes your documents go through as they are redrafted and amended.
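For documents, a simple approach is a version suffix in the file name (v01, v02, ...) together with a version table recording who changed what; for code and other plain-text files, a version control system such as Git is the standard tool. As a small sketch, assuming the hypothetical _vNN convention above, the next version suffix can be generated automatically:

```python
import re

def next_version(filename: str) -> str:
    """Increment a _vNN suffix, e.g. 'chapter1_v02.docx' -> 'chapter1_v03.docx'."""
    match = re.search(r"_v(\d{2})(?=\.[^.]+$)", filename)
    if match is None:
        raise ValueError(f"no _vNN suffix found in {filename!r}")
    bumped = int(match.group(1)) + 1
    return f"{filename[:match.start()]}_v{bumped:02d}{filename[match.end():]}"

print(next_version("chapter1_v02.docx"))  # chapter1_v03.docx
```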
Data documentation provides information about how and why data files were created, their content and structure, and what processes and transformations the data have undergone during the lifetime of the project. Data documentation also provides information that enables the data to be accessed and interpreted by future users.
A crucial part of making data user friendly, shareable and with long lasting usability is to ensure they can be understood and interpreted by any user. This requires clear data description, annotation, contextual information and documentation.
– Document Your Data, UK Data Service
Metadata is sometimes defined as "data about data" or "information about data". Although the terms documentation and metadata are sometimes used interchangeably, metadata is also used in the more restricted sense of structured information that is both human and machine readable.
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
– Understanding Metadata, NISO, 2004
Providing adequate documentation and metadata for your data is essential. Documentation and metadata add context to your data and provide information necessary for its discovery and reuse. Adding metadata makes it easier for you to find and understand your own data, as well as enabling your data to be accessed and shared, where appropriate, with others.
Exhaustive documentation and metadata compliant with your discipline's standards and schemas will make your data Findable, Accessible, Interoperable and Reusable – in a word, FAIR.
Without adequate documentation and metadata your data are potentially meaningless. Ideally, you should begin documenting your data as it is created or collected rather than leaving it until the end of the project.
The amount of documentation and metadata needed will vary according to the type of data being described and the level of description. For most datasets you will usually be required to provide at least some basic descriptive information. Ideally, you should provide enough contextual information to allow others to discover, understand, access and reuse your data.
... the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it.
– EPSRC, Clarifications on Research Data Management
Source: Archaeology Data Service, Guide to Good Practice.
Source: Van den Eynden, V. et al. (2012) Managing and Sharing Data, UK Data Archive, p. 9.
If you deposit your data in a repository you will almost certainly be expected to provide a minimal amount of project level metadata, and some repositories might ask you to add file level descriptions as well.
For example, researchers who wish to deposit their data with the King's RDM System are asked to fill in a Data Deposit Request form. Information collected in the form is used to create a public metadata record that makes it easier for others to discover and make sense of the data.
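As a rough illustration of the kind of descriptive information such a record might contain, the sketch below builds a minimal project-level metadata record and saves it as JSON. All field names and values are hypothetical; the actual form and any required schema will depend on the repository.

```python
import json

# A minimal, illustrative project-level metadata record. The fields loosely
# follow common descriptive elements; your repository may require a
# specific schema.
record = {
    "title": "Interview transcripts on urban commuting habits",
    "creator": "Jane Researcher",
    "date_created": "2024-03-01",
    "description": "Anonymised transcripts of 20 semi-structured interviews.",
    "methodology": "Semi-structured interviews, audio-recorded and transcribed.",
    "keywords": ["transport", "commuting", "qualitative"],
    "licence": "CC BY 4.0",
}

with open("metadata.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2)
```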
Documentation and metadata can be added to data in a variety of ways, from project level descriptions to file level annotations.
If you are depositing your data in a domain or disciplinary specific repository you might also be asked to provide information about your data that is specific to your domain or discipline. It is a good idea to make sure that you are familiar with any metadata standards that are widely used within your area of research.
See the Digital Curation Centre's web site for a more comprehensive list of disciplinary metadata standards as well as information about disciplinary metadata tools.
Examples of published metadata records for datasets and data files can be found by browsing the catalogues of data repositories and data centres (e.g. UK Data Archive, Dryad, Archaeology Data Service). Re3data.org is a registry of research data repositories.
Quality assurance and quality control are the measures which researchers can adopt to prevent errors from entering or remaining in a dataset. Ensuring the quality and integrity of research data is an integral part of good research data management across the whole research life cycle, from collecting data to preparing data for analysis and publication. In addition, many funders expect researchers to include details of the measures they will adopt to safeguard data quality and integrity in their data management plan.
Here are some examples of best practices for assuring data quality and integrity adapted from guidance provided by the UK Data Archive.
A range of procedures can be used during data collection to make sure that the data recorded reflect the actual facts, events, responses and observations.
Methods such as data-entry forms with built-in validation rules, controlled vocabularies and double entry can help to ensure accurate, standardised and consistent data transcription, digitisation or entry in a database or spreadsheet.
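As a small sketch, assuming a hypothetical survey spreadsheet exported as survey.csv with age and response columns, entries could be checked against simple validation rules on the way into the dataset:

```python
import csv

# Hypothetical rules: age must be a plausible number, and responses must
# come from a controlled vocabulary.
ALLOWED_RESPONSES = {"yes", "no", "unsure"}

def validate_row(row: dict, line: int) -> list[str]:
    errors = []
    try:
        age = int(row["age"])
        if not 18 <= age <= 110:
            errors.append(f"line {line}: age {age} out of range")
    except ValueError:
        errors.append(f"line {line}: age {row['age']!r} is not a number")
    if row["response"] not in ALLOWED_RESPONSES:
        errors.append(f"line {line}: unexpected response {row['response']!r}")
    return errors

with open("survey.csv", newline="", encoding="utf-8") as f:
    # start=2 because line 1 of the file is the header row
    for line, row in enumerate(csv.DictReader(f), start=2):
        for error in validate_row(row, line):
            print(error)
```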
Guidance on interview transcription methods and quality control can be found on the UK Data Archive website.
Checking your data is a vital quality assurance stage during which data are edited, cleaned, verified, cross-checked and validated. Typical checks at this stage include screening for duplicate records, missing values and out-of-range entries, as illustrated in the sketch below.
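Continuing the hypothetical survey.csv example (and assuming a participant_id column), simple post-entry checks might screen for duplicate identifiers and missing values:

```python
import csv
from collections import Counter

with open("survey.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

if rows:
    # Flag participant IDs that appear more than once.
    id_counts = Counter(row["participant_id"] for row in rows)
    duplicates = [pid for pid, count in id_counts.items() if count > 1]
    print("Duplicate IDs:", duplicates or "none")

    # Count empty cells in each column.
    for column in rows[0]:
        missing = sum(1 for row in rows if not row[column].strip())
        print(f"{column}: {missing} missing value(s)")
```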
The acronym FAIR stands for Findable, Accessible, Interoperable, Reusable. It is a set of principles for research data and metadata that aims to improve how data are discovered and accessed, how they interact with other datasets and systems, and ultimately how they can be reused by others.
The term ‘FAIR’ was launched at a Lorentz workshop in 2014, and the resulting FAIR principles were published in 2016. They have been widely accepted and promoted by researchers, institutions, funders, publishers, and political leaders. There are a number of initiatives committed to developing, understanding and meeting them. The principles are summarised by the GO FAIR initiative as:
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process (a machine-readable example is sketched after these summaries).
Once the user finds the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
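As an illustration of the machine-readable metadata mentioned under Findable, the sketch below builds a dataset description using the schema.org Dataset vocabulary, the kind of structured record that search engines and metadata harvesters can index. All of the values are placeholders.

```python
import json

# A machine-readable dataset description using schema.org's "Dataset" type.
# Every value here is a placeholder, including the DOI.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Urban commuting interview transcripts",
    "description": "Anonymised transcripts of 20 semi-structured interviews.",
    "identifier": "https://doi.org/10.0000/example",
    "creator": {"@type": "Person", "name": "Jane Researcher"},
    "datePublished": "2024-06-01",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["transport", "commuting", "qualitative"],
}

print(json.dumps(dataset, indent=2))
```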
Within each principle there are several steps to work towards – and work towards is a good way of looking at it, as achieving FAIR isn’t a binary state. Rather, it is a spectrum along which you can meet different aspects or degrees of making data FAIR. Realistically you might not meet every measure of FAIR, but each one that you do meet will help to enable data reuse. As nicely described in the Turing Way handbook for reproducible data science, FAIR applies not just to data files or datasets themselves, but to different entities in the storing and sharing infrastructure:
“The FAIR principles refer to three types of entities: data (as any digital object), metadata (information about that digital object), and infrastructure (i.e. software, repositories). For instance, the findability principle F4 defines that both metadata and data are registered or indexed in a searchable resource (e.g. a data repository).”
Furthermore, the responsibility and contribution towards making data FAIR are shared by researchers, institutions, technology providers, funders and publishers, with some examples being:
Individual researchers will strive to: Document data to agreed community standards that describe provenance and enable discovery, assessment of reliability, and reuse.
Funding agencies and organisations will strive to: Review data management plan requirements regularly to validate support of open and FAIR standards and promulgate leading practices.
Societies, communities, and institutions will strive to: Promote open and FAIR data activities as important criteria in promotion, awards, and honours.
Publishers will strive to: Adopt a shared set of author guidelines that support FAIR principles, providing a common set of expectations for authors.
Repositories will strive to: Ensure that research outputs curated by repositories are open and FAIR, have essential documentation, and include human-readable and machine-readable metadata (e.g. on landing pages) in standard formats that are exposed and publicly discoverable.
These are among the principles contained in the Commitment Statement in the Earth, Space, and Environmental Sciences, which is one of many initiatives by groups with a disciplinary or process driven approach.
The How FAIR are your data? checklist is a good place to start, to think in advance about what you might need to do to make your data FAIR, and to assess it before it is archived and shared.
If you’d like advice about making your data FAIR, please get in touch with us at email@example.com or call +44 (0)20 7848 1030.