Practical Data Science with R - MAT00058H

  • Department: Mathematics
  • Module co-ordinator: Prof. Julie Wilson
  • Credit value: 10 credits
  • Credit level: H
  • Academic year of delivery: 2021-22
    • See module specification for other years: 2022-23

Related modules

Co-requisite modules

  • None

Prohibited combinations

  • None

Module will run

Occurrence | Teaching period
A | Spring Term 2021-22

Module aims

This module builds on the concepts introduced in Statistical Pattern Recognition and gives students the ability to implement statistical and machine learning methods. The aim is to enable students to carry out statistical analyses of real data, from the careful formulation of the question to be investigated through to the presentation of the results.

It is a practical and project-oriented module with an overview of a range of methods, followed by their use in the statistical software environment R.

Students will complete three assessed exercises during the term. They will then carry out a data analysis project as an open assessment over two weeks and summarise the analysis in a written report.

Module learning outcomes

On completion of this module, students should be able to:

  • Perform independent statistical data analysis on a real data set with a particular research question
  • Use various statistical tools to analyse real data sets in R
  • Select appropriate machine learning and statistical approaches for specific applications
  • Understand the basis of the statistical models and tools discussed
  • Write up the results of statistical data analysis, employing tables and graphs as appropriate

General academic and graduate skills to be obtained:

  • Problem solving skills
  • Computational skills
  • Presentation skills

Module content

  • Data exploration and visualisation, including interpretation of Principal Component Analysis (PCA).

  • Implementation of machine learning techniques covered in Statistical Pattern Recognition, such as Linear Discriminant Analysis (LDA), decision trees, neural networks and Learning Vector Quantisation (LVQ).

  • Introduction and implementation of further supervised multivariate methods, such as Support Vector Machines (SVMs) and Partial Least Squares Regression (PLSR).

  • Statistical report writing.

  • Practical statistical analysis of real data sets, reporting and presentation of the obtained results.

Assessment

Task | Length | % of module mark
Essay/coursework: Coursework | N/A | 30
Essay/coursework: Data Analysis Project | N/A | 70

Special assessment rules

None

Additional assessment information

Coursework: Three exercises that count towards the module mark. Each requires 6 hours of work.

Data Analysis Project: Open assessment, requiring 16 hours of work. It is made available at the end of the Spring term and is to be handed in at the end of week 1 of the Summer term.

Reassessment

Task | Length | % of module mark
Essay/coursework: Reassessment Data Analysis Project | N/A | 100

Module feedback

Current Department policy on feedback is available in the undergraduate student handbook. Coursework and examinations will be marked and returned in accordance with this policy.

Indicative reading

James G, Witten D, Hastie T and Tibshirani R (2013). An Introduction to Statistical Learning: with Applications in R. Springer.

Everitt B and Hothorn T (2011). An Introduction to Applied Multivariate Analysis with R. Springer.



The information on this page is indicative of the module that is currently on offer. The University is constantly exploring ways to enhance and improve its degree programmes and therefore reserves the right to make variations to the content and method of delivery of modules, and to discontinue modules, if such action is reasonably considered to be necessary by the University. Where appropriate, the University will notify and consult with affected students in advance about any changes that are required in line with the University's policy on the Approval of Modifications to Existing Taught Programmes of Study.