About the Group

The Precision Medicine & Statistical Learning Group is a research group of statisticians within the Department of Biostatistics at the Institute of Psychiatry, Psychology and Neuroscience, headed by Daniel Stahl. Our research combines machine learning methods from computer science with statistical models (which make probabilistic assumptions about the underlying phenomena) to improve methods for gaining knowledge, making predictions or decisions, and constructing models from data. We are particularly interested in developing and applying new statistical learning methods to build prediction models of treatment success (personalised or stratified medicine) and future outcomes (prognosis).

Statistical learning

In statistics, the prime focus is usually on understanding the data and their relationships in terms of models, by estimating parameters and quantifying the uncertainty of these estimates. Machine learning uses computer-intensive learning algorithms and focuses on prediction and classification, and less on mechanisms. Statistical learning theory tries to unify the two approaches and thus studies, within a statistical framework, the properties of learning algorithms commonly used in machine learning.

Prediction modelling

We are mainly interested in developing statistical learning models for analysing high-dimensional data sets (a large number of variables relative to sample size) in order to build prediction models. Prediction modelling in psychiatry poses many methodological challenges, including unbalanced groups, population substructure, multi-centre trials, missing data, multicollinearity and the validation of predictive models. Furthermore, trial databases can contain different measures of the same underlying construct, so calibration methods need to be developed before the identification of predictors can proceed. Our aim is to develop prediction models that combine good predictive accuracy with the ability to understand the underlying process by which the data were generated.