
State-of-the-art text analytics providing unprecedented access to health data

A unique information extraction and semantic search system developed by researchers at King’s College London Institute of Psychiatry, Psychology and Neuroscience is set to revolutionise access to information in electronic health records.

BioYODIE and SemEHR use state-of-the-art text mining techniques to access and analyse the ‘untapped goldmine’ of information held in the unstructured, free-text part of medical records (for example, notes describing patients’ experiences and histories).

Until now, most of this data has been inaccessible. By unlocking the entire electronic health record, these tools can aid clinical decision making, generate data for research, and speed up recruitment to clinical trials, ultimately improving patients’ health.

They were originally developed as part of the KConnect programme, funded by the European Union’s Horizon 2020 research and innovation programme, and have already been used in a wide range of research studies. Examples include finding and recruiting patients for the Genomics England 100,000 Genomes Project, and identifying and extracting information on a range of health conditions experienced by people with mental health disorders.

Importantly, BioYODIE and SemEHR have been adopted into CogStack, an information retrieval and extraction platform. They could also be used in other systems that create, process, retrieve, and aggregate health data and information, as well as in informatics research.

Several NHS Trusts (South London and Maudsley, King’s College Hospital, University College London Hospitals, and Norfolk and Norwich University Hospitals) are now using the tools in their electronic health record systems to support clinical and research applications. These include a large project analysing autoimmune conditions as risk factors for treatment-resistant depression, and a project investigating the prevalence, management, and prognosis of hepatitis C in people with substance abuse disorders.

Project status: Ongoing