Practical Data Science with R - MAT00058H
- Department: Mathematics
- Credit value: 10 credits
- Credit level: H
- Academic year of delivery: 2022-23
Module will run
Occurrence | Teaching period |
---|---|
A | Spring Term 2022-23 |
Module aims
This module builds on the concepts introduced in Statistical Pattern Recognition and provides students with the ability to implement statistical and machine learning methods. The aim is to enable students to carry out statistical analyses of real data, from the careful formulation of the question to be investigated through to the presentation of the results.
It is a practical and project-oriented module with an overview of a range of methods, followed by their use in the statistical software environment R.
Students will complete three assessed exercises during the term, then carry out a data analysis project as an open assessment over 2 weeks and summarise the analysis in a written report.
Module learning outcomes
By completion of this module, students should be able to:
- Perform independent statistical data analysis on a real data set with a particular research question
- Use various statistical tools to analyse real data sets in R
- Select appropriate machine learning and statistical approaches for specific applications
- Understand the basis of the statistical models and tools discussed
- Write up the results of statistical data analysis, employing tables and graphs as appropriate
General academic and graduate skills to be obtained:
- Problem solving skills
- Computational skills
- Presentation skills
Module content
- Data exploration and visualisation, including interpretation of Principal Component Analysis (PCA).
- Implementation of machine learning techniques covered in Statistical Pattern Recognition, such as Linear Discriminant Analysis (LDA), Decision trees, Neural networks, Learning Vector Quantisation (LVQ).
- Introduction and implementation of further supervised multivariate methods, such as Support Vector Machines (SVMs) and Partial Least Squares Regression (PLSR).
- Statistical report writing.
- Practical statistical analysis of real data sets, reporting and presentation of the obtained results.
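As an illustration of the kind of workflow listed above, the following is a minimal sketch in R, not taken from the module materials. It assumes the built-in `iris` data as a stand-in for a real data set, uses `prcomp()` from base R for PCA and `lda()` from the `MASS` package for LDA, and picks a 70/30 train/test split and a random seed purely for the example.

```r
# Illustrative sketch only: iris stands in for a "real data set";
# the split ratio and seed are arbitrary choices made for this example.
library(MASS)  # provides lda(); MASS is distributed with standard R installations

data(iris)
features <- iris[, 1:4]

# PCA on centred and scaled variables, so that each variable contributes equally
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Reproducible 70/30 train/test split
set.seed(1)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Fit LDA on the training rows and predict the class of the held-out rows
fit <- lda(Species ~ ., data = train)
pred <- predict(fit, newdata = test)$class

# Confusion matrix and overall accuracy on the held-out rows
confusion <- table(predicted = pred, actual = test$Species)
print(confusion)
accuracy <- mean(pred == test$Species)
print(accuracy)
```

The proportions in `var_explained` are what would typically be inspected when interpreting a PCA, and the confusion matrix is the sort of table a written report might include alongside a plot of the first two components.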
Indicative assessment
Task | % of module mark |
---|---|
Essay/coursework: Exercise 1 | 10.0 |
Essay/coursework: Exercise 2 | 10.0 |
Essay/coursework: Exercise 3 | 10.0 |
Essay/coursework: Data Analysis Project | 70.0 |
Special assessment rules
None
Additional assessment information
Coursework: Three exercises, each of which counts towards the module mark and requires 6 hours of work.
Data Analysis Project: Open assessment, requiring 16 hours of work. It is made available at the end of the Spring term and is to be handed in at the end of week 1 of the Summer term.
Indicative reassessment
Task | % of module mark |
---|---|
Essay/coursework | 100.0 |
Module feedback
Current Department policy on feedback is available in the undergraduate student handbook. Coursework and examinations will be marked and returned in accordance with this policy.
Indicative reading
James G, Witten D, Hastie T and Tibshirani R (2013). An Introduction to Statistical Learning with Applications in R. Springer.
Everitt B and Hothorn T (2011). An Introduction to Applied Multivariate Analysis with R. Springer.