Prediction Modelling (Winter School)

Key Details

Date: 10th December – 14th December 2018
Time: 09:30-17:30
Venue: Computer Room A & B, Main Building, Institute of Psychiatry, Psychology & Neuroscience (IoPPN).


Course Aim

This 5-day course provides a comprehensive introduction to the fundamentals of clinical prediction modelling using modern statistical modelling techniques for health research. It will cover all steps of developing and assessing a prediction model. Computer-based teaching introduces students to the theory and practical implementation of cutting-edge predictive statistical and machine learning modelling techniques using the R statistical software.

Course Overview

Clinical prediction research develops models that try to predict the chances of a clinical outcome (such as death, diagnosis, treatment success or other future outcomes) based on characteristics related to the patient. Such models can be used to help clinicians communicate the chances of clinical outcomes to their patients and to improve their management. It is therefore of crucial importance that such models are developed and tested appropriately. This 5-day course is aimed at PhD students and researchers in health research and will provide an introduction to key components of prognosis and stratified medicine research using cutting-edge statistical and machine learning modelling techniques.

The course covers all major steps of developing and assessing a clinical prediction model, including study design and data preparation, the problem of over-fitting in regression models, how to overcome over-fitting using penalized regression and cross-validation methods, how to deal with missing data, feature variable selection, performance assessment and clinical usefulness of a model. An introduction to other machine learning techniques for prediction modelling, such as random forests and support vector machines, will be provided.

Teaching will be through lectures and practical computer lab sessions, interspersed with short presentations by prediction modelling researchers on current work. Each day, a short presentation of an application in prediction modelling will be given. Practical sessions will involve the analysis and interpretation of practice datasets using the software R. Syntax of all procedures will be provided and explained, but some familiarity with a syntax-based software (R, Stata, SAS) is advised. A short 1.5 h introduction to R will be provided at the beginning of the course.
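As a rough illustration of the over-fitting, penalized regression and cross-validation ideas listed above, the following is a minimal sketch and not course material. The course practicals use R; plain Python is used here only to keep the example self-contained. The simulated data, the function names (`solve`, `fit_ridge`, `predict`, `mse`, `cv_mse`) and the penalty value of 5 are all assumptions made for illustration.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_ridge(X, y, lam):
    """Ridge coefficients without intercept: solve (X'X + lam * I) beta = X'y.
    lam = 0 gives ordinary least squares."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)

def predict(row, beta):
    """Linear prediction for one row of predictors."""
    return sum(b * x for b, x in zip(beta, row))

def mse(X, y, beta):
    """Mean squared error of the predictions on the given data."""
    return sum((yi - predict(row, beta)) ** 2 for row, yi in zip(X, y)) / len(y)

def cv_mse(X, y, lam, k):
    """k-fold cross-validated MSE; fold f holds out rows f, f + k, f + 2k, ..."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in held_out]
        y_train = [y[i] for i in range(n) if i not in held_out]
        beta = fit_ridge(X_train, y_train, lam)
        total += sum((y[i] - predict(X[i], beta)) ** 2 for i in held_out)
    return total / n

# Simulated data (an assumption for illustration): 40 patients, 10 predictors,
# of which only the first two are related to the outcome.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]
y = [1.0 * row[0] - 0.5 * row[1] + rng.gauss(0, 1) for row in X]

for lam in (0.0, 5.0):
    apparent = mse(X, y, fit_ridge(X, y, lam))
    validated = cv_mse(X, y, lam, k=5)
    print(f"lambda={lam}: apparent MSE={apparent:.3f}, 5-fold CV MSE={validated:.3f}")
```

The apparent error is computed on the same rows used for fitting, so it tends to be optimistic, while the cross-validated error is computed on held-out rows. Setting `lam` above zero shrinks the coefficients towards zero, which is the basic idea behind penalized regression.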


This workshop assumes that participants have a good knowledge of regression analysis (as can be obtained from the BHI Introduction to Statistical Modelling course in January) and some experience with R or another syntax-based statistical software package, such as Stata (an introduction to R can be obtained from the BHI Introduction to Programming course running in October or the Intro to R course running in February 2019). Participants will need to bring their own laptop computer with R installed ( We also recommend installing RStudio, a very handy user interface for R (free download from

Learning Outcomes

Subject-specific: Knowledge, Understanding and Skills

By the end of the course, students should be able to demonstrate subject-specific knowledge, understanding and skills. In particular, they should:

  • have a good understanding of core clinical prediction concepts, such as prognosis, prognostic factors, prognostic models, and stratified medicine, and be able to apply this understanding to the design, conduct, and interpretation of clinical prediction modelling research studies;
  • be able to describe how modern statistical concepts, regression and machine learning methods can be applied in medical prediction problems;
  • be familiar with the principles that play a role in internal validation, such as over-fitting, optimism and shrinkage, and understand key components of internal validation methods such as cross-validation or bootstrapping;
  • be able to develop simple prediction models, assess their quality and validate them using R software;
  • be able to critically assess the general applicability of a developed model to predict future outcomes;
  • be equipped with a range of statistical and machine learning skills, including problem-solving, project work and presentation, which will enable them to take prominent roles in a wide spectrum of employment and research.

General: Knowledge, Understanding and Skills

On successful completion of this course, students should be able to:

  • show initiative and work autonomously and independently with minimal guidance from others;
  • communicate effectively and critically assess their own work in discussion groups;
  • work successfully in a team during computer group lab sessions;
  • show confidence in the use of programming software to implement prediction models.

Recommended Literature

A good summary of building clinical prediction models is the paper "Towards better clinical prediction models: seven steps for development and an ABCD for validation" by E.W. Steyerberg and Y. Vergouwe (2014) Eur Heart J. 35(29):1925-31. doi: 10.1093/eurheartj/ehu207. We will explain most concepts in detail in the course.

An excellent introduction is provided by E.W. Steyerberg's 2009 book "Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating". New York: Springer. Another very useful textbook is the 2013 book "Applied Predictive Modeling" by Max Kuhn and Kjell Johnson. New York: Springer. Both books present examples using R.

Tentative Timetable

Monday, 10th December
Time | Session Title | Speaker
9:00-9:30 | Registration |
9:30-11:00 | 1. Introduction to R (1.5 h practical) |
11:00-11:15 | Coffee Break |
11:15-12:45 | 2. Welcome (15 min); Introduction to Prediction Modelling (1 h 15 min lecture) |
12:45-13:45 | Lunch |
13:45-15:15 | 3. Data collection and Pre-Processing (45 min lecture + 45 min practical) |
15:15-15:30 | Coffee Break |
15:30-16:30 | 4. GLM (I) (30 min lecture + 30 min practical) |
16:30-17:00 | Talk 1: The role of neuro-imaging in the prediction of recurrence in major depressive disorder | Dr. Roland Zahn

Tuesday, 11th December
Time | Session Title | Speaker
9:30-11:00 | 5. GLM (II) (45 min lecture + 45 min practical) |
11:00-11:15 | Coffee Break |
11:15-12:45 | 6. GLM III: Overfitting (45 min lecture + 45 min practical) |
12:45-13:45 | Lunch |
13:45-15:15 | 7. Regularized Regression I: Ridge and Lasso (45 min lecture + 45 min practical) |
15:15-15:30 | Coffee Break |
15:30-16:30 | 8. Regularized Regression II: Elastic net (30 min lecture + 30 min practical) |
16:30-17:00 | Talk 2: Predicting risk of psychoses | Dr. Paolo Fusar-Poli, IoPPN

Wednesday, 12th December
Time | Session Title | Speaker
9:30-11:00 | 9. Model assessment I: Validation of prediction models (45 min lecture + 45 min practical) |
11:00-11:15 | Coffee Break |
11:15-12:45 | 10. Regularized Regression III: Logistic regression (45 min lecture + 45 min practical) |
12:45-13:45 | Lunch |
13:45-15:15 | 11. 1 h: Model assessment II: Discrimination (45 min lecture + 45 min practical) |
15:15-15:30 | Coffee Break |
15:30-16:30 | 14. Model assessment: Calibration and clinical usefulness (45 min lecture + 45 min practical) |
16:30-17:00 | Talk 3: Text mining and NLP | Dr. Angus Roberts, IoPPN

Thursday, 13th December
Time | Session Title | Speaker
9:30-11:00 | Missing data I: Mechanism (30 min lecture); Missing data II: Prediction modelling (30 min lecture + 30 min practical) |
11:00-11:15 | Coffee Break |
11:15-12:45 | 15. Random Forest I: Classification (45 min lecture + 45 min practical) |
12:45-13:45 | Lunch |
13:45-15:15 | 16. Random Forest II: Regression (45 min lecture + 45 min practical) |
15:15-15:30 | Coffee Break |
15:30-16:30 | 17. Random Forest III: Feature variable selection (30 min lecture + 30 min practical) |
16:30-17:00 | Talk 4 | Sophie Smart, IoPPN

Friday, 14th December
Time | Session Title | Speaker
9:30-11:00 | 19. Support Vector Machines (Intro) (45 min lecture + 45 min practical) |
11:00-11:15 | Coffee Break |
11:15-12:45 | 45 min: More about SVMs; 45 min: Seminar/Lecture: Problems of prediction modelling in psychiatry |
12:45-13:45 | Lunch |
13:45-15:15 | Discussion groups: Open problems and Q&A with all lecturers (1.5 h seminar) |
15:15-15:30 | Coffee Break |
15:30-16:30 | Talk 5: Future directions (1 h lecture) | Dr. Daniel Stamate, Goldsmiths, University of London
16:30-17:00 | 21. Wrapping-up: Question time and final remarks |

Cost and Booking

  • External Early bird: £855 (until 18/10/18, price thereafter £950)
  • KCL Staff Early bird: £641.25 (until 18/10/18, price thereafter £712.50)
  • KCL Student Early bird: £427.50 (until 18/10/18, price thereafter £475)
  • Other student Early bird: £641.25 (until 18/10/18, price thereafter £712.50)

That is, a 50% discount for King's College London PhD students and a 25% discount for other students and for staff at King's College London and King's Health Partners.

Booking / Application

Booking for this course has now closed.

To apply please email with the following details:

Subject: Application for Prediction Modelling 2018


Email Address:

Contact Phone Number:

  1. Are you affiliated with KCL and/or King's Health Partners?
    1. If Yes, indicate how you are affiliated with KCL and/or King's Health Partners
    2. Indicate your education/employee status: KCL PhD, KCL student, KCL staff, King's Health Partners affiliate, External Student or External
  2. In 100 words, state why you wish to enrol/participate in this course:
  3. In 100 words, state which skills you hope to acquire:

Once your application has been approved, you will be sent a link to payment and a discount code if one is to be applied.

Course Team

Professor Daniel Stahl (Academic Lead)

Daniel is a Professor of Medical Statistics and Statistical Learning and lead of the Precision Medicine and Statistical Learning Group.

I started my academic career as a behavioural biologist at the German Primate Center in Germany. During my PhD, I became aware of the importance of statistics and data science. I completed an MSc in Biostatistics and have since worked as a statistician in academic research institutions in Germany, Scotland and, since 2006, at King’s College London. I am now lead of the "Precision Medicine and Statistical Learning Group". A primary focus of the group is to develop tools to aid clinical decision-making using predictors which can be easily, reliably and cost-effectively collected from mental health service users.

My interest is applying statistical and machine learning methods to identify predictors, mediators, and moderators of treatment success and using model-based cluster analysis methods to identify subgroups among psychiatric patients. My methodological research concerns the correct treatment of missing data in machine learning procedures and the assessment of subgrouping in prediction modelling. 

As a Lead Trial Statistician, I have been responsible for overseeing the statistical aspects of a number of clinical trials within the IoPPN. I am further interested in model selection problems, in improving the low reproducibility of medical studies and, as a blast from my past, in the evolution of social systems in primates.

Research profile

See Daniel's research profile here.

Dr Cedric Ginestet

Cedric received a PhD in Biostatistics from Imperial College London.

Cedric was affiliated with the Neuroimaging Department at King's College London, as well as with the Mathematics and Statistics Department at Boston University, before joining the Department of Biostatistics and Health Informatics at the Institute of Psychiatry, Psychology & Neuroscience in 2014.

Research interests

Causal inference; network analysis; object data analysis; statistical learning.

Research profile

See Cedric's research profile here.

Dr Raquel Iniesta

Raquel is a BRC Lecturer in statistical learning and precision medicine. Her main research is focused on identifying clinical and genetic predictors of risk of complex disorders and response to treatment.

She has been doing research in big data analysis and personalised medicine since 2003. After graduating in mathematics and statistics from the Autònoma University of Barcelona, she obtained a PhD in biomedical research at the Catalan Institute of Oncology in 2010. Her activity since then has also included consultancy and teaching.

Research interests

Computational statistics & machine learning; High-dimensional data modelling; Bioinformatics; Genetics and Pharmacogenetics of complex diseases (Cancer, Schizophrenia, Major Depression, Hypertension).

She has also designed a website for the Statistical Learning & Prediction Modelling Research Group.

Research profile

See Raquel's research profile here.

Dr Daniel Stamate, Goldsmiths, University of London

I am a Machine Learning scientist, Data Science team leader, Director of Data Science MSc Programme, and industry AI – Machine Learning expert speaker and consultant. I established and lead the Data Science & Soft Computing Lab which has collaborations with various research groups at King’s College London, University of Manchester, Imperial College London, Maastricht University, and National Research Tomsk State University, and with companies in the City of London such as Santander Bank, Mizuho Investment Bank, etc.

At Goldsmiths, I initiated, designed and run the MSc in Data Science, which inspired, and was largely replicated in, a similar forthcoming online programme at the University of London. I have a background in Computer Science and Mathematics, holding an MSc degree in Computer Science & Mathematics from University of Iasi - Faculty of Mathematics, and a PhD in Computer Science from University of Paris-Sud - LRI Computer Science Laboratory.

Research profile

See Daniel's research profile here.


Dr Mizanur Khondoker, University of East Anglia

Mizanur is a Senior Lecturer in Medical Statistics, Norwich Medical School, University of East Anglia (UEA).

Research profile

See Mizanur's research profile here.


Mr Dominic Stringer

Dominic is a statistician working primarily on the set up, conduct and analysis of clinical trials supported by the King’s Clinical Trials Unit.

He works on trials across several health domains, including psychosis and renal failure. He has a Bachelor’s degree in Mathematics from the University of Bath and a Master’s degree in Medical Statistics from the London School of Hygiene and Tropical Medicine.

He also has a background in data management in the clinical trials setting. Dominic's other research interests include predictive modelling using statistical learning methods.

Research profile

See Dominic's research profile here.

Dr Deborah Agbedjro

Deborah's project aims to develop a personalized medicine prediction model for people with schizophrenia treated with Cognitive Remediation Therapy (CRT) by combining statistical learning methods, missing data imputation techniques and model validation procedures.

The model is trained and validated on individual participant data from several randomised controlled trials of CRT.

Research profile

See Deborah's research profile here.


Your place will not be confirmed until payment has been made. Cancelling without sufficient notice will forfeit your course fee and access to future courses. If you would like to pay by internal transfer, please contact

