


Data Mining

Lecturers: Dr Elizabeth Sklar and Dr Isabel Sassoon

Semester: 2

Credit level: 7

Credit value: 15


MSc Data Science

Learning aims & outcomes

The aim of this module is for students to gain a comprehensive understanding of the field of "Data Mining" and its application to real-world problems and data sets. Data mining methodologies include classification, clustering and modelling of data, as well as testing and prediction based on data. Students will learn how to make effective use of these methodologies, determining which algorithms are appropriate for given data sets and problems, pre-processing raw data for analysis and implementing algorithms using a high-level programming language (such as C/C++ or Python). Upon successful completion of the module, students will be prepared to perform data mining tasks using real-world data sets.

On successful completion of this module, students will be able to:

  • Analyse and formulate the objectives of a data mining problem.
  • Select appropriate data mining algorithm(s) to address given research questions, based on an understanding of the scope of the data set(s) to be analysed, the computational requirements and limitations of the data mining algorithms, and expectations about the performance of the algorithms with respect to the question(s) being asked.
  • Assess the available data critically and apply appropriate data preparation techniques to enable the application of data mining and machine learning techniques.
  • Implement at least one of each of the primary data mining tasks (classification, clustering, modelling, testing, and prediction), using a high-level programming language (e.g., C/C++ or Python).
  • Assess the success (or failure) of an algorithm to answer a given research question.
  • Draw appropriate and justifiable conclusions from the results of applying a data mining


Data mining is a field that applies a range of statistical and analytical techniques to data sets in order to understand their content and draw meaning from it. This module covers the fundamental techniques of the field, focussing on the analysis of numeric and text data. The topics covered include rule-based and linear algorithms, testing and prediction, classification and clustering, pattern recognition, scoring and optimisation. Aspects of machine learning that are relevant to data mining are also discussed. Students will apply these techniques to real-world, publicly available data sets in laboratory exercises, using a standard programming language (e.g., C/C++ or Python).

Weekly teaching arrangements

Lecture: 2 hours

Tutorial: 1 hour

Practical: 2 hours



Summative assessment

Details of the module's summative assessments

Type                                  Weighting   Marking model
Written examination (2 hours), May    80%         Model 2 - Double Marking
Coursework (see KEATS for details)    20%         Model 5 - Single Marking

Formative assessment


Module Pass Mark: 50%

Note: students must achieve an overall module mark of at least 50%. Students who obtain a lower mark must resit only the assessed components they have failed.

e-Learning: 7CCSMDM1 on KEATS 

Suggested reading/resources

01 December 2017



© 2018 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454