Researchers from King’s College London have developed ExeTera, a fully open-source software package for analysing data sets from the ZOE COVID Study (ZCS) App that approach terabyte scale. The software is now a critical component of a data curation pipeline that enables reproducible research across an international research group working on the ZOE COVID Study.
ExeTera, released under the Apache 2.0 license, has recently been published in Nature: Scientific Data and is under active development by the School of Biomedical Engineering & Imaging Sciences at King’s College London.
The success of the ZOE COVID Study creates significant technical challenges around effective data curation.
Thanks to the software developed by our engineers at King’s we’ve been able to manage really huge time series datasets on millions of people logging on the app to make significant scientific contributions to understanding of COVID-19 and at speed. This shows the power of working across disciplines with big data to make big impact. – Dr Claire Steves, Lead researcher in the ZOE COVID Study
As a researcher with a clinical background, ExeTera allows me to directly access our large ZCS dataset myself. The syntax is understandable, so I am therefore confident that I am using our scripts appropriately and producing meaningful results, showing associations of key data within the app with participants who develop Long Covid. – Dr Marc Österdahl, Clinical Fellow in the Department of Twin Research and Genetic Epidemiology, King’s College London
As of 22 November 2021, over 5 million participants have collectively logged over 456 million self-assessments of symptoms since the ZOE COVID Study App’s introduction in March 2020.
The resulting dataset is at a scale where merely handling the data is one of the key challenges to carrying out research upon it.
For the ZOE COVID Study, the researchers developed ExeTeraCovid, a repository of scripts and notebooks that uses ExeTera to create reproducible end-to-end data curation workflows.
ExeTera and ExeTeraCovid together provide a data analytics and data curation solution that has underpinned over thirty publications and enables consistent, reproducible science to be carried out on the ZOE COVID Study dataset.
The software also makes it possible to work with tabular datasets approaching a billion rows on standard desktops and laptops, taking advantage of the hidden supercomputer-like capabilities of modern CPUs.
ExeTera and ExeTeraCovid have allowed me to build on analyses and models so that my research stays consistent with the wide body of research produced by KCL on the ZOE COVID Study dataset. – Dr Ronan Whiston, Postdoctoral Bioinformatician
ExeTera is a new useful tool that allows the extraction, curation and assessment of large datasets, which would be otherwise computationally expensive. ExeTera was deployed to study COVID-19, supporting the development of new analyses to understand this illness during crucial times of the pandemic. – Dr Liane Dos Santos Canas, researcher in the School of Biomedical Engineering & Imaging Sciences