
Data Science in Chemistry - CHE00047M


  • Department: Chemistry
  • Module co-ordinator: Dr. Jon Agirre
  • Credit value: 20 credits
  • Credit level: M
  • Academic year of delivery: 2024-25
    • See module specification for other years: 2023-24

Module summary

This module will provide an overview of the application of data science and machine learning in chemistry and beyond by looking at four specific problem areas: the analysis of atomic structures, machine learning for atomistic simulations, the handling of atmospheric data, and the application of neural networks to molecular property prediction and design. You will learn about different types of chemical data and the software methods used to validate, analyse and draw conclusions from them. You will apply this knowledge to practical problems using real data and industry-standard tools. The diversity of data types, sources and approaches will give you enough experience to approach new problem domains with confidence.

Module will run

  • Occurrence A: Semester 2, 2024-25

Module aims

While data analysis methodology is common to all disciplines, particular methods are better suited to certain kinds and volumes of data. This module aims to provide relevant experience in the use of data analysis and machine learning techniques in four distinct areas of chemistry: atomic structure ('Molecular Structure Data'), atomistic simulations ('Machine Learning in Computational Chemistry'), atmospheric chemistry ('Atmospheric Data') and molecular property prediction and design ('Applications of Neural Networks in Chemistry').

Module learning outcomes

Students will be able to:

  • Analyse and evaluate large datasets from different sources.

  • Develop suitable validation criteria for different data types.

  • Create software that extracts chemical knowledge from computational representations of molecules.

  • Appreciate applications of supervised and unsupervised machine learning models in computational chemistry.

  • Implement feedforward, graph (GNN) and recurrent (RNN) neural networks for molecular property prediction and generative molecular design.

Module content

Content separated by sub-module:

  • Macromolecular Structure: accessing and retrieving data from a molecular structure database; performing data validation; gathering statistical information about bond lengths, angles, and torsions.

  • Atmospheric Data: accessing and working with atmospheric data, e.g. air pollution data; counterfactual analysis, used for example to assess interventions intended to improve air quality; parameterizations to support atmospheric chemistry modelling.

  • Machine Learning in Computational Chemistry: bypassing expensive computational chemical calculations using machine learning (ML); representing structures of molecules in computers; using these representations for unsupervised classification and clustering of molecular structures; using neural networks for rapid prediction of potential energies; and using kernel regression to predict molecular properties.

  • Applications of Neural Networks in Chemistry: working with molecules in computers [Atomic Simulation Environment (ASE); RDKit]; molecular representations; molecular property prediction using feedforward neural networks (handcrafted features) and graph neural networks (GNNs, learned features); generative molecular design using recurrent neural networks (RNNs).
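The geometry statistics listed under Macromolecular Structure (bond lengths, angles and torsions) reduce to a few vector operations on atomic coordinates. The following is a minimal illustrative sketch in Python with NumPy, not material from the module itself: the function names and the toy coordinates are hypothetical, and the torsion uses one common atan2-based formulation of the dihedral angle. In practice the coordinates would be read from structure files retrieved from a database rather than typed in by hand.

```python
import numpy as np


def bond_length(a, b):
    """Distance between atoms a and b, in the units of the input coordinates."""
    return float(np.linalg.norm(np.asarray(b, float) - np.asarray(a, float)))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    a, b, c = (np.asarray(x, float) for x in (a, b, c))
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def torsion(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees, in the range [-180, 180]."""
    pa, pb, pc, pd = (np.asarray(x, float) for x in (a, b, c, d))
    b0 = pa - pb                      # first bond, pointing away from the axis
    axis = pc - pb                    # central b-c bond
    axis /= np.linalg.norm(axis)      # unit vector along the central bond
    b2 = pd - pc                      # last bond, pointing away from the axis
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, axis) * axis
    w = b2 - np.dot(b2, axis) * axis
    x = np.dot(v, w)
    y = np.dot(np.cross(axis, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def summarise(values):
    """Mean and sample standard deviation of a set of measurements."""
    arr = np.asarray(values, float)
    return float(arr.mean()), float(arr.std(ddof=1))


if __name__ == "__main__":
    # Hypothetical toy coordinates for a four-atom chain A-B-C-D.
    A, B, C, D = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (0.0, 1.0, 1.5)
    print("B-C length:", bond_length(B, C))
    print("A-B-C angle:", bond_angle(A, B, C))
    print("A-B-C-D torsion:", torsion(A, B, C, D))
    lengths = [bond_length(A, B), bond_length(B, C), bond_length(C, D)]
    print("mean, sd of bond lengths:", summarise(lengths))
```

Collecting such values over many structures and passing them to `summarise` gives per-bond-type means and spreads, which could then be compared against reference values when developing validation criteria.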

Assessment

  • Essay/coursework (data analysis): Machine learning & neural networks for chemistry applications; code + approx. 2000 words. Length: N/A. 50% of module mark.
  • Essay/coursework (report, presentation): Data analysis & presentation for chemistry problem domains; code + approx. 2000 words. Length: N/A. 50% of module mark.

Special assessment rules

None

Additional assessment information

Assessment 1

Essay/coursework (project report, data presentation including code): Data analysis and presentation for chemistry problem domains. Students submit their code as a compressed file and use up to four sides of A4 to describe the results of their data analysis of one chemistry problem domain.

Assessment 2

Essay/coursework (project report, data presentation including code): Machine learning and neural networks for chemistry applications. Students submit their code as a compressed file and use up to four sides of A4 to describe the results of applying machine learning and neural networks to one chemistry problem domain.

Reassessment

  • Essay/coursework (data analysis): Machine learning & neural networks for chemistry applications; code + approx. 2000 words. Length: N/A. 50% of module mark.
  • Essay/coursework (report, presentation): Data analysis & presentation for chemistry problem domains; code + approx. 2000 words. Length: N/A. 50% of module mark.

Module feedback

Feedback will be provided through workshops and online exercises. Feedback on summative work will be provided within 25 working days of the assessment.

Indicative reading

  • Introduction to Data Science : A Python Approach to Concepts, Techniques, and Applications
    Laura Igual, Santi Seguí. Springer 2017

  • Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython
    Wes McKinney. O'Reilly 2017

  • Pro Git
    Scott Chacon, Ben Straub. Apress 2014

  • Python and Matplotlib Essentials for Scientists and Engineers
    Matt A. Wood. Claypool Publishers 2015

  • Visualization for the Physical Sciences
    Lipsa et al. Computer Graphics Forum, 2012, Vol.31 (8), p.2317-2347

  • Introduction to Scientific Visualization
    Helen Wright. Springer 2007

  • Data Modeling Essentials
    Graeme Simsion, Graham Witt. Morgan Kaufmann 2004

  • Database Design
    Adrienne Watt, Nelson Eng. BC Open Textbook Project 2014



The information on this page is indicative of the module that is currently on offer. The University is constantly exploring ways to enhance and improve its degree programmes and therefore reserves the right to make variations to the content and method of delivery of modules, and to discontinue modules, if such action is reasonably considered to be necessary by the University. Where appropriate, the University will notify and consult with affected students in advance about any changes that are required in line with the University's policy on the Approval of Modifications to Existing Taught Programmes of Study.