Combining classical and machine-learning methods in Survival Analysis to boost predictive performance and preserve interpretability

Please note: this event has passed

Abstract: Survival analysis deals with longitudinal data and estimates both the distribution of time-to-event in a population over the observation period and how the time-to-event depends on risk factors. Various statistical models have been developed for such analysis, including the Cox proportional hazards (Cox PH) model, the accelerated failure time model, parametric models and others. With the development of machine learning, the toolset for survival analysis has expanded further with models such as random survival forests, survival XGBoost and others. Although these newer models are effective with large and complex data, they can be prone to overfitting and hard to interpret.
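For reference, this is the standard form of the Cox PH model. A subject with covariate vector x has a hazard equal to an unspecified baseline hazard h_0(t) scaled by exp(beta^T x), so the subject's survival curve is the baseline survival S_0(t) raised to that power. The coefficients beta are estimated by maximising the partial likelihood over subjects with an observed event (delta_i = 1), written here assuming no tied event times:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^\top x),
\qquad
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)},
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(\beta^\top x_i)}{\sum_{j:\,t_j \ge t_i} \exp(\beta^\top x_j)}
```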

Here we explored two approaches that combine the classical Cox PH model with tree-based algorithms, similar to the ideas presented in (1–4). Our aim was not only to boost the predictive performance of Cox regression with ML techniques, but also to 1) preserve the interpretability of the results, 2) quantify the contributions of linear and non-linear dependencies, and 3) gain insight into non-linear relationships. The first approach ensembled an underlying Cox PH model with a random forest: the survival probabilities obtained from the Cox PH model were added as a new risk factor alongside the original factors. The second approach used a survival tree algorithm to cluster the data, and then trained a separate Cox PH model for each cluster.
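The second approach can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: the survival tree greedily splits on the largest two-sample log-rank statistic (a common splitting rule), each leaf's Cox PH model is fitted by plain gradient ascent on the Breslow partial likelihood, the data are simulated, and all helper names (`grow_tree`, `fit_cox`, `logrank`, `cox_survival`) are hypothetical. `cox_survival` computes the kind of Cox-based survival probability that the first approach adds as an extra risk factor; the random forest of that approach is not sketched here.

```python
import math
import random
from typing import Dict, List, Tuple

# One observation: (covariates, observed time, event flag: 1 = event, 0 = censored)
Row = Tuple[List[float], float, int]
# Path to a leaf: the (feature, threshold, went_left) conditions on the way down the tree
Rule = Tuple[Tuple[int, float, bool], ...]


def linpred(beta: List[float], x: List[float]) -> float:
    """Linear predictor beta^T x."""
    return sum(b * xj for b, xj in zip(beta, x))


def fit_cox(rows: List[Row], lr: float = 0.5, iters: int = 200) -> List[float]:
    """Cox PH coefficients via gradient ascent on the Breslow partial log-likelihood."""
    p, n = len(rows[0][0]), len(rows)
    # Pair every event with its risk set: all subjects still observed at that time
    events = [(x_i, [x for x, t, _ in rows if t >= t_i]) for x_i, t_i, d_i in rows if d_i]
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for x_i, risk in events:
            w = [math.exp(linpred(beta, x)) for x in risk]
            total = sum(w)
            for k in range(p):
                # Score: observed covariate minus its risk-weighted mean over the risk set
                grad[k] += x_i[k] - sum(wj * x[k] for wj, x in zip(w, risk)) / total
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def cox_survival(rows: List[Row], beta: List[float], x: List[float], t: float) -> float:
    """S(t | x) = exp(-H0(t) * exp(beta^T x)) with the Breslow baseline cumulative hazard H0."""
    h0 = 0.0
    for _, s, d in rows:
        if d and s <= t:
            h0 += 1.0 / sum(math.exp(linpred(beta, xj)) for xj, tj, _ in rows if tj >= s)
    return math.exp(-h0 * math.exp(linpred(beta, x)))


def logrank(rows: List[Row], left: List[bool]) -> float:
    """Two-sample log-rank chi-square statistic: `left` group versus the rest."""
    o_minus_e, var = 0.0, 0.0
    for s in sorted({t for _, t, d in rows if d}):
        at_risk = [(g, t, d) for (_, t, d), g in zip(rows, left) if t >= s]
        n, n1 = len(at_risk), sum(g for g, _, _ in at_risk)
        dead = sum(1 for _, t, d in at_risk if d and t == s)
        dead1 = sum(1 for g, t, d in at_risk if g and d and t == s)
        o_minus_e += dead1 - dead * n1 / n
        if n > 1:
            var += dead * (n1 / n) * (1 - n1 / n) * (n - dead) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def grow_tree(rows: List[Row], depth: int = 1, min_leaf: int = 10,
              rule: Rule = ()) -> List[Tuple[Rule, List[Row]]]:
    """Greedy survival tree splitting on the largest log-rank statistic; leaves = clusters."""
    best = None  # (statistic, feature index, threshold)
    if depth > 0:
        for k in range(len(rows[0][0])):
            for thr in sorted({x[k] for x, _, _ in rows})[:-1]:
                left = [x[k] <= thr for x, _, _ in rows]
                if min(sum(left), len(rows) - sum(left)) < min_leaf:
                    continue
                stat = logrank(rows, left)
                if best is None or stat > best[0]:
                    best = (stat, k, thr)
    if best is None:
        return [(rule, rows)]
    _, k, thr = best
    lo = [r for r in rows if r[0][k] <= thr]
    hi = [r for r in rows if r[0][k] > thr]
    return (grow_tree(lo, depth - 1, min_leaf, rule + ((k, thr, True),))
            + grow_tree(hi, depth - 1, min_leaf, rule + ((k, thr, False),)))


# Simulated (hypothetical) data: x0 is a binary group flag and the effect of x1 flips sign by group
rng = random.Random(0)
data: List[Row] = []
for i in range(120):
    x0, x1 = float(i % 2), rng.random()
    hazard = math.exp(5.0 - 2.5 * x1) if x0 else math.exp(2.5 * x1)
    t_event, t_cens = rng.expovariate(hazard), rng.expovariate(0.1)
    data.append(([x0, x1], min(t_event, t_cens), int(t_event <= t_cens)))

# Cluster with the survival tree, then fit one Cox PH model per cluster
clusters = grow_tree(data, depth=1)
models: Dict[Rule, List[float]] = {rule: fit_cox(rows) for rule, rows in clusters}
for rule, beta in models.items():
    print(rule, [round(b, 2) for b in beta])
```

Reading the per-cluster coefficients side by side is one simple way to see how a non-linear pattern (here, an interaction between x0 and x1 built into the simulation) is broken into locally linear Cox models. For real analyses, an established survival library would be a more reliable choice than this pure-Python sketch.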

In this work in progress, we found that classical models may outperform the combined methods when applied to data with predominantly linear relationships. However, the combined methods were effective in predicting survival outcomes with strong non-linear dependencies, and our second combined method was able to give some insight into the identified non-linearity.

Mini-bio: Diana Shamsutdinova is a PhD student in the Department of Biostatistics and Health Informatics at the IoPPN, KCL, focusing on prediction modelling and survival analysis in health research.

Diana Shamsutdinova graduated with Distinction from Moscow State University with a major in mathematics and applied mathematics, after which she worked for more than a decade in the financial industry. While still working in finance, she completed a part-time MSc in Neuroscience and Psychology of Mental Health at the IoPPN and then moved into this new field. She is currently pursuing her PhD research at the IoPPN under the supervision of Professor Daniel Stahl and Dr Angus Robert; her main project is on predicting diabetes onset among people with severe mental illness.

Event details

22 September 2021 15:00 to 16:00

King's College London
