Software to provide simulation-based minimum sample size calculations for prediction modelling
Background
NHS services use predictive tools to help decide how to treat patients and run hospitals. A predictive tool makes predictions about the future based on a statistical model (number-based calculator). It uses individual patient information like age or blood pressure.
Developing these models needs lots of data (information). Models built with too little data are often inaccurate. They might make incorrect predictions or recommend inappropriate treatments. They may also fail to represent diverse groups and be biased against underrepresented communities.
While tools exist to help researchers decide how much data they need, they only work for simple models. Many newer prediction models use complicated methods like “machine learning”. No tools exist to calculate how much data these complicated methods need.
Patient and public involvement
Feedback from patients and service users has shaped our proposal. We will continue to learn from patients throughout the study through discussions with the advisory group and wider engagement with charities and patient groups.
Aims
We will create a free tool for researchers that calculates the amount of data required for any model.
We will:
- Meet patients to gather their views on predictive tools, highlight the risks that small amounts of data pose to model fairness, and learn how best to communicate our work.
- Create free software to calculate data required by complicated models.
- Provide training to encourage researchers to use the software we create.
Our work will help prevent models from being developed with insufficient data. It will benefit patients by enabling more accurate predictions and safer, fairer predictive tools.
Methods
Our project involves no new data collection.
- In part 1, we will form an advisory group of patients and researchers. The group will meet regularly to share updates and get feedback (e.g., how to describe our software in an accessible way). We have already created a simple prototype tool.
- In part 2, we will conduct a study to understand how well this simple version works.
- In part 3, we will create the software tool and freely share it online for anyone to use. Importantly, our tool will only calculate how much data is needed; it cannot make predictions.
- In part 4, we will test the software to ensure it works as expected and gives reliable answers.
- In part 5, we will raise awareness among patients and the public and provide free researcher training to encourage uptake.
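For readers who want to see the general idea behind a simulation-based sample size calculation, here is a minimal sketch in Python. It is not the project's software. It assumes a deliberately simple setting: one predictor, a logistic regression model standing in for a more complicated method, and a target of at most 0.03 average error in predicted risk. All function names, parameter values, and the candidate sample sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, intercept, slope, rng):
    """Draw n patients: one standard-normal predictor and a binary outcome."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(intercept + slope * x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, iterations=25, ridge=1e-4):
    """Fit a one-predictor logistic regression with Newton's method.

    The tiny ridge term is an assumption of this sketch; it only keeps
    the 2x2 linear system solvable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0, g1 = -ridge * b0, -ridge * b1      # gradient
        h00, h01, h11 = ridge, 0.0, ridge      # negative Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    return b0, b1


def mean_error(n, reps, intercept, slope, x_val, rng):
    """Average over `reps` simulated studies of size n of the mean absolute
    gap between estimated and true risks on a fixed validation sample."""
    true_risk = [sigmoid(intercept + slope * x) for x in x_val]
    total = 0.0
    for _ in range(reps):
        xs, ys = simulate(n, intercept, slope, rng)
        b0, b1 = fit_logistic(xs, ys)
        gaps = [abs(sigmoid(b0 + b1 * x) - t) for x, t in zip(x_val, true_risk)]
        total += sum(gaps) / len(gaps)
    return total / reps


def minimum_sample_size(candidates, target, reps=20,
                        intercept=-1.0, slope=1.0, seed=1):
    """Return the smallest candidate n whose average error meets the target,
    or None if no candidate does."""
    rng = random.Random(seed)
    x_val = [rng.gauss(0.0, 1.0) for _ in range(500)]
    for n in sorted(candidates):
        if mean_error(n, reps, intercept, slope, x_val, rng) <= target:
            return n
    return None


if __name__ == "__main__":
    print(minimum_sample_size([50, 100, 200, 400, 800], target=0.03))
```

The sketch repeats three steps for each candidate sample size: simulate many datasets of that size, fit the model to each, and measure how far its predicted risks are from the true risks. The smallest size that meets the target is reported. Like the planned tool, it only estimates how much data is needed; it makes no predictions about real patients.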
Full details on this project's methods can be found on the NIHR website.
Summary of Findings
With our patient advisors, we will co-create an accessible project website and share it via social media. The website will provide clear guidance on the data required for prediction modelling and serve as a reference point. We will engage with researchers, particularly in the machine learning community, via conferences, seminars, and publications.
Funding
Funding Body: NIHR
Amount: £254,872.00
Period: January 2025 - July 2026