When it comes to food, wine or art, the idea of provenance is widely understood. It’s a way of verifying that a painting really is by a famous artist or that Cornish pasties come from Cornwall. We know this because someone can trace the painting or the pasty back to its origins. For the person buying a pasty (or Jersey Royal potatoes, or French champagne), or the museum or collector acquiring a valuable painting, provenance is what allows them to trust that they are getting the genuine article.
Exactly the same principles apply in the digital world. Each year we rely on data to drive more and more areas of human activity. As processes become ever more sophisticated, it is important to be able to trace data back to its origins so that people can trust what the data is telling them.
Provenance is a fundamental building block of data governance. It tells us about the entities, activities and people involved in producing a piece of data, which can be used to assess its quality, reliability or trustworthiness. This information is typically obtained by tracing the relationships between agents, entities and activities, and the times at which those activities started and ended (who did what, when, where and how). Academic research on provenance has led to the development of a machine-readable standard called the PROV data model, or PROV-DM (https://www.w3.org/TR/prov-dm/). The PROV standard is published by the World Wide Web Consortium (W3C).
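To make the "who did what, when" idea concrete, here is a minimal sketch in plain Python (standard library only) of a provenance record built from PROV-DM's three core types: entities, activities and agents. It borrows three PROV-DM relation names (wasGeneratedBy, used, wasAssociatedWith) but is not an implementation of the standard or of any PROV library; the class names, the `trace` helper and the `ex:` identifiers are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Entity:
    id: str  # a thing whose origin we care about, e.g. a dataset or a report


@dataclass(frozen=True)
class Agent:
    id: str  # a person, organisation or piece of software bearing responsibility


@dataclass(frozen=True)
class Activity:
    id: str
    start: datetime  # when the activity started
    end: datetime    # when the activity ended


@dataclass
class ProvRecord:
    """Toy in-memory provenance graph using a few PROV-DM relation names (hypothetical)."""
    activities: dict[str, Activity] = field(default_factory=dict)
    was_generated_by: dict[str, str] = field(default_factory=dict)     # entity -> activity
    used: dict[str, list[str]] = field(default_factory=dict)           # activity -> entities
    was_associated_with: dict[str, str] = field(default_factory=dict)  # activity -> agent

    def record(self, activity, agent, inputs, outputs):
        """Record that `agent` ran `activity`, which used `inputs` and generated `outputs`."""
        self.activities[activity.id] = activity
        self.was_associated_with[activity.id] = agent.id
        self.used[activity.id] = [e.id for e in inputs]
        for e in outputs:
            self.was_generated_by[e.id] = activity.id

    def trace(self, entity_id):
        """Walk back from an entity to its origins: what was made, by which
        activity, by whom, and when. Entities with no recorded generating
        activity are treated as primary sources."""
        steps, frontier, seen = [], [entity_id], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            activity_id = self.was_generated_by.get(current)
            if activity_id is None:
                continue
            act = self.activities[activity_id]
            steps.append((current, activity_id,
                          self.was_associated_with[activity_id], act.start, act.end))
            frontier.extend(self.used[activity_id])
        return steps


# Hypothetical example: raw cohort data is cleaned, then analysed into a report.
raw = Entity("ex:raw-cohort-data")
clean = Entity("ex:clean-cohort-data")
report = Entity("ex:risk-report")
analyst = Agent("ex:analyst")

prov = ProvRecord()
prov.record(Activity("ex:cleaning", datetime(2020, 11, 2, 9, 0), datetime(2020, 11, 2, 9, 30)),
            analyst, inputs=[raw], outputs=[clean])
prov.record(Activity("ex:analysis", datetime(2020, 11, 2, 10, 0), datetime(2020, 11, 2, 11, 0)),
            analyst, inputs=[clean], outputs=[report])

for entity, activity, agent, start, end in prov.trace(report.id):
    print(f"{entity} was generated by {activity} ({agent}, {start:%H:%M}-{end:%H:%M})")
```

Tracing the report yields two steps: first the analysis that produced it, then the cleaning that produced the analysis's input. The raw data has no recorded origin, so the trail stops there, just as a painting's provenance stops at the earliest documented owner.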
King’s has worked in this area for more than a decade, laying the groundwork for the adoption of the PROV standard and conducting world-leading research that builds on PROV to deepen the understanding of provenance, develop software engineering methodologies and techniques for deploying provenance, and conceive provenance-based techniques that allow systems to explain their decisions. What started as conceptual work is now widely used in practice, in settings including health management systems, automated decision systems in finance, and command and control systems in the US Navy.
The need became more acute in the UK with the introduction of the General Data Protection Regulation (GDPR) in 2018. Organisations managing data have comprehensive responsibilities for ensuring its accuracy and protection. This is overseen by the Information Commissioner’s Office (ICO), the independent UK body responsible for upholding individuals’ information rights. The ICO has drawn on work led by Luc Moreau in setting out its guidance to organisations on explaining decisions made with AI, a fundamental step in improving trust in AI and the processes it drives.
One example comes from health care. Imosphere Ltd is a software company that develops tools for health, care and education organisations. Its flagship product, Atmolytics, produces interactive reports from patient cohort data. Atmolytics incorporates elements of provenance template technology developed by Professor Vasa Curcin, Reader in Health Informatics, to add new functionality to the product. He said:
“Provenance templates can be used to generate a model-driven service interface for domain software tools to routinely capture the provenance of their data and tasks. It is exciting to see templates being widely adopted to support health analytics.”
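The idea behind a provenance template can be sketched briefly. The template fixes the shape of the provenance to be recorded, with placeholders for the specifics; at run time the software tool supplies only the values (the "bindings"), and the template is expanded into concrete provenance. The sketch below is a simplified illustration in Python, not the template system used in Atmolytics: in the published template system placeholders live in a `var:` namespace, but the triple layout, the `expand` function and all identifiers here are hypothetical.

```python
# Hypothetical, simplified template: placeholders carry a "var:" prefix;
# the (relation, subject, object) triple layout is our simplification.
Statement = tuple[str, str, str]

REPORT_TEMPLATE: list[Statement] = [
    ("wasGeneratedBy", "var:report", "var:run"),   # entity generated by activity
    ("used", "var:run", "var:cohort"),             # activity used entity
    ("wasAssociatedWith", "var:run", "var:user"),  # activity associated with agent
]


def expand(template: list[Statement], bindings: dict[str, str]) -> list[Statement]:
    """Substitute every var: placeholder with its bound value.

    Simplification: each variable takes exactly one value and an unbound
    variable raises KeyError; the real template system has richer rules
    (for example, variables bound to several values).
    """
    def resolve(term: str) -> str:
        if term.startswith("var:"):
            name = term[len("var:"):]
            if name not in bindings:
                raise KeyError(f"unbound template variable: {name}")
            return bindings[name]
        return term

    return [(rel, resolve(subj), resolve(obj)) for rel, subj, obj in template]


# The analytics tool supplies only the bindings for each run.
statements = expand(REPORT_TEMPLATE, {
    "report": "ex:report-42",
    "run": "ex:run-7",
    "cohort": "ex:stroke-cohort",
    "user": "ex:clinician-3",
})
for statement in statements:
    print(statement)
```

The design point is separation of concerns: the provenance model is written once, while each tool only reports values, which is what lets provenance capture become routine rather than hand-coded.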
The new, provenance-enabled version of Atmolytics was launched in 2018 and by November 2020 had more than 4,000 users in the UK and USA, managing data for more than one million patients. The product allows care providers and services to use reliable, trusted data to identify patients at risk of certain conditions, track disease markers for patient cohorts, identify gaps in provision and assess the impact of new services. Among its users, the South London Stroke Registry uses the program as a front end for sophisticated data exploration, visualisation and analytics.
The CEO of Imosphere says:
“Using the PROV standard and provenance templates…. saved us years in design and development time and ensured we are standard compliant for any further extensions, reducing time to market by approximately one year…. It has also improved the software engineering aspect of our data analytics portal, as it promotes good practice in reusing and documenting analytical components across reports, avoiding duplication.”
The term “provenance” was first used in English in the late 18th century to describe the origins of historical artefacts. Now, more than two centuries later, the concept has been extended to protect and enable the newest data-driven innovations.