Lecturers: Dr Elizabeth Sklar and Dr Isabel Sassoon
Credit level: 7
Credit value: 15
MSc Data Science
Learning aims & outcomes
The aim of this module is for students to gain a comprehensive understanding of the field of "Data Mining" and its application to real-world problems and data sets. Data mining methodologies include classification, clustering and modelling of data, as well as testing and prediction based on data. Students will learn how to make effective use of these methodologies, determining which algorithms are appropriate for given data sets and problems, pre-processing raw data for analysis and implementing algorithms using a high-level programming language (such as C/C++ or Python). Upon successful completion of the module, students will be prepared to perform data mining tasks using real-world data sets.
On successful completion of this module, students will be able to:
- Analyse and formulate the objectives of a data mining problem.
- Select appropriate data mining algorithm(s) to address given research questions, based on an understanding of the scope of the data set(s) to be analysed, the computational requirements and limitations of the data mining algorithms, and expectations about the performance of the algorithms with respect to the question(s) being asked.
- Assess the available data critically and apply appropriate data preparation techniques to enable the application of data mining and machine learning techniques.
- Implement at least one of each of the primary data mining tasks (classification, clustering, modelling, testing, and prediction), using a high-level programming language (e.g., C/C++ or Python).
- Assess the success (or failure) of an algorithm to answer a given research question.
- Draw appropriate and justifiable conclusions from the results of applying a data mining
Data Mining is a field that applies a range of statistical and analytical techniques to sets of data in order to understand the content of the data and draw meaning from that content. This module covers the fundamental techniques of the field, focussing on the analysis of numeric and text data. The topics covered include rule-based and linear algorithms, testing and prediction, classification and clustering, pattern recognition, scoring and optimisation. Aspects of machine learning that are relevant to data mining are also discussed. Students will work with real-world, publicly available data sets to apply techniques in laboratory exercises, using a standard programming language (e.g., C/C++ or Python).
Weekly teaching arrangements
Lecture: 2 hours
Tutorial: 1 hour
Practical: 2 hours
Module Pass Mark: 50%
Note: students must achieve an overall module mark of at least 50%. If a mark lower than 50% is obtained, students need resit only the assessed components they have failed.
e-Learning: 7CCSMDM1 on KEATS
01 December 2017