
Improving exam questions to decrease the impact of Large Language Models (e.g. ChatGPT)


The key objective of the proposed project is to explore how exam questions can be better designed to decrease the impact of Large Language Models (e.g. ChatGPT, Claude-2 and Google Bard).

The expected outcome of this project is a framework workflow for writing exam questions that allow students to demonstrate deep learning, originality and critical reading above and beyond what can normally be achieved just by using a generative AI tool.

Generative AI is just the latest in a sequence of improvements in tools for exploring the scientific literature, which started with Google searches, PubMed searches, Wikipedia articles etc.

The problem to be addressed is that essay-type written assignments, which are widely used for assessment, may be affected by the misuse of generative AI tools. By contrast, most other university assessments (such as in-person invigilated written exams, OSCE assessments and oral presentations) appear to be less affected. Research progress reports are also relatively safe because supervisors read the initial student-proposed abstracts, then the early and advanced drafts of the assignments. If the style and/or quality changes inexplicably between drafts, supervisors would be in a good position to explore the potential misuse of generative AI tools in their subsequent meetings with the students.

The people involved in this project are Dr Victor Turcanu (staff) and Amelia Johnston (year 3 MBBS student), who is currently conducting a systematic review on the use of Large Language Models (LLMs) for medical school assessments as part of her scholarly project module.


The proposed project will proceed in the following steps:

  1. Ask generative AI tools to answer past exam questions from the Introduction to Clinical Research MBBS year 2 module.
  2. Compare the ChatGPT and Claude-2 answers (and possibly also answers from Bard, Llama 2 etc.) with real assignments written by past cohorts of students and analyse the differences, either as a narrative analysis or using NVivo or other methodologies. Triangulate the commonalities and the differences, then write new exam questions that go beyond the answers provided by ChatGPT.
  3. Design a framework workflow in which, starting from an initial exam question, successive steps of refinement and rewriting exploit the strengths of LLM answers whilst allowing students to demonstrate their originality and critical reading of the scientific literature.
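
As a rough first pass ahead of the qualitative comparison in step 2, the overlap between an LLM answer and each student assignment could be screened with a simple lexical measure. The sketch below is an illustration only and is not part of the project's stated method: it swaps in Jaccard similarity over word sets, and the function names (`tokenize`, `jaccard`, `rank_by_overlap`) and the sample texts are hypothetical placeholders. A low score does not show originality and a high score does not show misuse; it only flags pairs worth reading side by side.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts' word sets (0 = disjoint, 1 = identical)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0  # two empty texts: treat as no measurable overlap
    return len(ta & tb) / len(ta | tb)


def rank_by_overlap(llm_answer, student_answers):
    """Return (student_id, similarity) pairs, most similar to the LLM answer first."""
    scores = [(sid, jaccard(llm_answer, text)) for sid, text in student_answers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder texts; real inputs would be anonymised assignments.
llm_answer = "Randomised controlled trials reduce bias through random allocation."
student_answers = {
    "student_a": "Random allocation in randomised controlled trials reduces bias.",
    "student_b": "In my placement, I saw how consent procedures shaped recruitment.",
}

for sid, score in rank_by_overlap(llm_answer, student_answers):
    print(f"{sid}: {score:.2f}")
```

Word-set overlap ignores word order and paraphrase, so it would at most complement the narrative or NVivo analysis named in step 2, for example by helping to choose which assignments to compare in depth.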

We will assess whether and how the work is meeting these objectives by using this workflow to design future exam questions for the Introduction to Clinical Research MBBS year 2 module led by Dr Victor Turcanu. Should the workflow prove successful, it will be presented to the wider faculty, for example at an annual teaching excellence conference, so that other modules that use similar assessment strategies can benefit from these findings.


Project status: Ongoing