
Programming for data science - CHE00044M


  • Department: Chemistry
  • Module co-ordinator: Prof. Kevin Cowtan
  • Credit value: 20 credits
  • Credit level: M
  • Academic year of delivery: 2024-25
    • See module specification for other years: 2023-24

Module summary

Computer programming is a key skill for data science, although it takes a slightly different form from general-purpose programming. Programming allows us to perform complex analyses quickly, to automate the more routine analyses, and to manage massive datasets with minimal effort. You will learn to use the Python programming language and the Pandas library. You will then apply this knowledge to create computer programs that read, process, interpret and present complex data in different ways. You will learn how to work collaboratively on computational problems, and how to record and report what you have done.
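The following is a rough illustration of the kind of read, process and interpret task described in the summary. It was written for this page and is not taken from the module materials. The Python code uses Pandas to read a small CSV dataset, discard incomplete rows and summarise a value by group. The inline dataset, its column names and the `summarise` function are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline dataset standing in for a real data file.
# The last row has a missing 'value' field.
CSV_TEXT = """sample,group,value
s1,control,1.2
s2,control,1.4
s3,treated,2.1
s4,treated,2.5
s5,treated,
"""


def summarise(csv_text: str) -> pd.DataFrame:
    """Read CSV text, drop rows with a missing value, and summarise by group."""
    df = pd.read_csv(io.StringIO(csv_text))  # read: parse CSV into a DataFrame
    df = df.dropna(subset=["value"])         # process: discard incomplete rows
    # interpret: mean and number of observations of 'value' in each group
    return df.groupby("group")["value"].agg(["mean", "count"])


if __name__ == "__main__":
    print(summarise(CSV_TEXT))
```

In real coursework the `io.StringIO` wrapper would normally be replaced by a file path passed to `pd.read_csv`.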

Module will run

  • Occurrence A: Semester 1, 2024-25

Module aims

Computer programming is a key skill for data science, although it takes a slightly different form from general-purpose programming. Programming allows us to perform complex analyses quickly, to automate the more routine analyses, and to manage massive datasets with minimal effort. We will learn to perform data analysis using the Python language and the Pandas library.

The teaching of computer programming is often done in a way which maintains it as an elite activity. A key focus of this course is to teach programming in an inclusive way which makes it accessible to groups who have traditionally been marginalised in the computational sciences. We achieve this by closely linking programming concepts with familiar problems from different fields at every stage, and by delaying the introduction of more complex concepts until they are obviously required to address real world challenges.

Module learning outcomes

Students will be able to:

  • Implement Python code to read, manipulate and analyse datasets

  • Apply features of programming languages, including data structures, loops, conditional statements and functions

  • Apply software engineering principles including documentation, testing and collaboration tools

  • Develop Python notebooks for data analysis and data management

  • Create and organise Git version control repositories

  • Use shell scripting and high performance computing

  • Evaluate different programming languages for a given data science application
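Several of the language features and practices listed above can be sketched in a few lines of plain Python. The example is illustrative only. The `classify_ph` function, the sample data and the 6.5/7.5 pH thresholds are hypothetical choices and are not part of the module specification. The sketch combines the following elements:

  • a dictionary as the data structure
  • a `for` loop and `if`/`elif`/`else` conditions
  • a documented function
  • a `test_` function written with plain `assert` statements, which is the style that pytest collects automatically

```python
def classify_ph(readings):
    """Return a dict mapping each sample label to 'acidic', 'neutral' or 'basic'.

    readings: dict of sample label -> pH value (hypothetical example data).
    pH values from 6.5 to 7.5 inclusive are treated as neutral
    (an arbitrary threshold chosen for illustration).
    """
    result = {}
    for label, ph in readings.items():  # loop over the dictionary entries
        if ph < 6.5:                     # conditions select a category
            result[label] = "acidic"
        elif ph <= 7.5:
            result[label] = "neutral"
        else:
            result[label] = "basic"
    return result


def test_classify_ph():
    """Minimal test using plain assert statements (pytest-style)."""
    assert classify_ph({"a": 3.0, "b": 7.0, "c": 9.2}) == {
        "a": "acidic",
        "b": "neutral",
        "c": "basic",
    }
    assert classify_ph({}) == {}  # empty input gives empty output


if __name__ == "__main__":
    test_classify_ph()
    print(classify_ph({"lemon juice": 2.2, "water": 7.0, "soap": 10.0}))
```

Running the file directly runs the test and then prints the classification of three example samples. Running `pytest` on the file would discover and run `test_classify_ph` instead.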

Module content

Module Content Detail

  • Python programming for data science problems

  • Collaborative software engineering

  • Managing complex data

  • Data visualisation

  • Shell scripting and high performance computing

  • How to learn other programming languages

Assessment

  • Essay/coursework: Freeform programming exercise (length: N/A), 50% of module mark

  • Essay/coursework: Individual programming exercise (length: N/A), 50% of module mark

Special assessment rules

None

Additional assessment information

  • Structured programming exercise (50%): computer program

  • Freeform programming exercise (50%, of which 35% for code and 15% for viva): computer program, documentation and oral presentation

Reassessment

  • Essay/coursework: Individual programming exercise (length: N/A), 50% of module mark

  • Essay/coursework: Research, programming and documentation exercise (length: N/A), 50% of module mark

Module feedback

Feedback will be provided through workshops, online exercises and a formative assessment. Feedback on summative work will be provided within 25 working days of the assessment.

Indicative reading

  • Introduction to data science: a Python approach to concepts, techniques and applications
    Laura Igual, Santi Seguí. Springer 2017

  • Python for data analysis: data wrangling with Pandas, NumPy, and IPython
    Wes McKinney. O'Reilly 2017

  • Pro Git
    Scott Chacon, Ben Straub. Apress 2014

  • Python and Matplotlib essentials for scientists and engineers
    Matt A. Wood. Claypool Publishers 2015

  • Visualization for the Physical Sciences
    Lipsa et al. Computer Graphics Forum, 2012, Vol. 31 (8), pp. 2317-2347

  • Introduction to scientific visualization
    Helen Wright. Springer 2007

  • Data Modeling Essentials
    Graeme Simsion, Graham Witt. Morgan Kaufmann 2004

  • Database design
    Adrienne Watt, Nelson Eng. BC Open Textbook Project 2014



The information on this page is indicative of the module that is currently on offer. The University is constantly exploring ways to enhance and improve its degree programmes and therefore reserves the right to make variations to the content and method of delivery of modules, and to discontinue modules, if such action is reasonably considered to be necessary by the University. Where appropriate, the University will notify and consult with affected students in advance about any changes that are required in line with the University's policy on the Approval of Modifications to Existing Taught Programmes of Study.