# Practical Data Science with R - MAT00058H

• Department: Mathematics
• Module co-ordinator: Prof. Julie Wilson
• Credit value: 10 credits
• Credit level: H
• Academic year of delivery: 2022-23

## Module will run

| Occurrence | Teaching cycle |
|---|---|
| A | Spring Term 2022-23 |

## Module aims

This module builds on the concepts introduced in Statistical Pattern Recognition and gives students the ability to implement statistical and machine learning methods. The aim is to enable students to perform statistical analyses of real data, from the thorough formulation of the question to be investigated through to the presentation of the results.

It is a practical, project-oriented module that gives an overview of a range of methods, followed by their implementation in the statistical software environment R.

Students will complete three assessed exercises during the term. They will also carry out a data analysis project as an open assessment over two weeks and summarise the analysis in a written report.

## Module learning outcomes

By completion of this module, students should be able to:

• Perform independent statistical data analysis on a real data set with a particular research question
• Use various statistical tools to analyse real data sets in R
• Select appropriate machine learning and statistical approaches for specific applications
• Understand the basis of the statistical models and tools discussed
• Write up the results of statistical data analysis, employing tables and graphs as appropriate

General academic and graduate skills to be obtained:

• Problem solving skills
• Computational skills
• Presentation skills

## Module content

• Data exploration and visualisation, including interpretation of Principal Component Analysis (PCA).

• Implementation of machine learning techniques covered in Statistical Pattern Recognition, such as Linear Discriminant Analysis (LDA), decision trees, neural networks and Learning Vector Quantisation (LVQ).

• Introduction and implementation of further supervised multivariate methods, such as Support Vector Machines (SVMs) and Partial Least Squares Regression (PLSR).

• Statistical report writing.

• Practical statistical analysis of real data sets, reporting and presentation of the obtained results.

## Assessment

| Task | Length | % of module mark |
|---|---|---|
| Essay/coursework: Coursework | N/A | 30 |
| Essay/coursework: Data Analysis Project | N/A | 70 |

### Additional assessment information

Coursework: three assessed exercises that count towards the module mark, each requiring 6 hours of work.

Data Analysis Project: an open assessment requiring 16 hours of work. It is made available at the end of the Spring Term and must be handed in at the end of week 1 of the Summer Term.

### Reassessment

| Task | Length | % of module mark |
|---|---|---|
| Essay/coursework: Reassessment Data Analysis Project | N/A | 100 |

## Module feedback

Current Department policy on feedback is available in the undergraduate student handbook. Coursework and examinations will be marked and returned in accordance with this policy.