Programming and Machine Learning for Chemistry - CHE00040I

  • Department: Chemistry
  • Credit value: 20 credits
  • Credit level: I
  • Academic year of delivery: 2026-27

Module summary

Modern chemists use digital tools to solve complex chemical problems, from simulating reaction kinetics and predicting molecular properties from real-world data to designing synthetic reactions. This module equips students with essential computational skills by introducing Python programming and key Machine Learning (ML) algorithms, and by illustrating how these transferable skills can be applied to a wide range of chemical problems.

Related modules

Prohibited combinations

Module will run

Occurrence Teaching period
A Semester 2 2026-27

Module aims

The primary aim of this module is to integrate computational methods with core chemical principles, allowing students to move beyond traditional analysis and tackle complex, real-world chemical challenges. The module illustrates how research in almost every area of chemistry is being accelerated by routine incorporation of both classic programming and machine learning workflows. The module begins by looking at how traditional programming methods can be used to solve the complex systems of equations used in chemical kinetics and quantum mechanics. It then moves on to explore how widely used machine learning methods can be applied to predict a range of chemical properties, from reaction yields to toxicity. Finally, the module will introduce chemistry-specific machine learning techniques, focussing on how to encode chemical structures, and how to build interpretable models to gain chemical insights from the models trained. Through a blend of lectures and hands-on workshops, students will master fundamental programming concepts using Python and gain proficiency in implementing key Machine Learning (ML) algorithms using libraries like Scikit-Learn, providing a highly transferable skill set in the evolving landscape of chemistry, materials science, and data science.

Module learning outcomes

Students will be able to:

  • Apply features of programming languages, including data structures, loops, conditionals and functions.

  • Interpret documentation of common Python libraries and implement new functions from those libraries.

  • Write programs to solve a variety of chemical problems, including simulations and data analysis.

  • Implement supervised machine learning algorithms using Scikit-Learn.

  • Evaluate the performance of a machine learning algorithm and implement techniques to improve it.

  • Describe how chemical structure and data can be encoded for use in machine learning models.

  • Recognise which chemical problems can be addressed using machine learning, and evaluate the effectiveness of machine learning in solving these problems.

Module content

Programming for Chemistry

Lectures:

  • Functions and control flow

  • Solving complex equations

  • Hypothesis Tests

  • Mathematical and Statistical Libraries

  • Compiled Languages

Workshops:

  • Functions and control flow

    • Programming LO: Implement and apply functions within a loop

    • Chemistry LO: Use the Nernst equation to calculate the lifetime of a battery.

  • Solving complex equations

    • Programming LO: Convert mathematical equations into code

    • Chemistry LO: Solving systems of kinetics equations

  • Mathematical and Statistical Libraries

    • Programming LO: Learn to read and apply the documentation of advanced Python libraries such as statsmodels and SciPy.

    • Chemistry LO: Quantifying the effectiveness of candidate drug molecules.

  • Compiled Languages

    • Programming LO: Python is easy to use but slow; some chemical applications need the computational efficiency that compiled languages provide.

    • Chemistry LO: Calculating and analysing crystal field splitting patterns and ligand field splitting parameters.
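
As an illustration of the first workshop theme (a function applied within a loop), here is a minimal sketch of the Nernst equation, E = E° - (RT/nF) ln Q, in plain Python. The function name `nernst_potential`, the standard potential of 1.10 V, the electron count of 2 and the list of reaction quotients are all assumptions chosen for illustration; they are not taken from the workshop materials.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_potential(e_standard, n_electrons, reaction_quotient, temperature=298.15):
    """Return the cell potential in volts from the Nernst equation."""
    if reaction_quotient <= 0:
        raise ValueError("reaction quotient must be positive")
    thermal_term = (R * temperature) / (n_electrons * F)
    return e_standard - thermal_term * math.log(reaction_quotient)


# Illustrative values only: a two-electron cell with E0 = 1.10 V.
# The reaction quotient Q grows as the cell discharges.
quotients = [0.01, 1.0, 100.0, 1.0e4]
potentials = []
for q in quotients:
    potentials.append(nernst_potential(1.10, 2, q))

for q, e in zip(quotients, potentials):
    print(f"Q = {q:g}: E = {e:.3f} V")
```

The cell potential falls as Q rises, which is the behaviour a battery-lifetime estimate would track over the course of discharge.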

Introduction to Machine Learning

Lectures:

  • Introduction to Machine Learning

  • Perceptrons

  • Logistic Regression

  • SVMs and Decision Trees

  • Model Improvement

Workshops:

  • Perceptrons and Logistic Regression

    • Programming LO: Implement Perceptron and Logistic Regression classifiers

    • Chemistry LO: Analyse the reaction conditions which affect the yields of cross-coupling reactions, and propose chemical reasons for the observed trends.

  • SVMs and Decision Trees

    • Programming LO: Implement SVM and decision tree (DT) regression models

    • Chemistry LO: ML can be used to predict hard-to-measure properties from easily measured ones (e.g. the fatty acid content of olive oil).

  • Model Improvement

    • Programming LO: Identify and optimise hyperparameters, implement cross-validation

    • Chemistry LO: Estimate the uncertainties in machine-learned reaction yield predictions, and compare them to uncertainties in experimental data.
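
To make the perceptron workshop theme concrete, the following is a from-scratch sketch of the classic perceptron learning rule in plain Python. It is not the Scikit-Learn implementation used in the module. The function names, the learning rate and the tiny dataset (two scaled reaction conditions per sample, labelled 1 for high yield and 0 for low yield) are invented for illustration.

```python
def train_perceptron(samples, labels, learning_rate=0.1, epochs=100):
    """Fit weights and a bias with the perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # 0 when correct, +1 or -1 when wrong
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Classify one sample with the trained perceptron."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


# Hypothetical, linearly separable data: (scaled temperature, scaled catalyst loading).
samples = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

Because the toy data are linearly separable, the update rule settles on a separating line; real yield data are rarely this clean, which is one motivation for the logistic regression and model improvement sessions.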

Machine Learning in Chemistry

Lectures:

  • Encoding chemical information

  • Feature selection and improvement

  • Machine-readable descriptors of molecules and materials

  • Interpreting ML models to gain chemical insights

Workshops:

  • Feature selection and improvement

    • Programming LO: Identify and encode chemically relevant features

    • Chemistry LO: Identify patterns within large atmospheric chemistry datasets, and explain them using chemical reaction schemes.

  • Machine-readable descriptors of molecules and materials

    • Programming LO: Encode 2D and 3D chemical structures in machine-readable formats

    • Chemistry LO: Predict chemical toxicity based on chemical structures, and identify responsible patterns within those structures

  • Interpreting ML models to gain chemical insights

    • Programming LO: Use SHAP analysis to identify the most important features in a dataset.

    • Chemistry LO: Identify molecular properties which best predict antimicrobial behaviour.
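
As a deliberately simple example of encoding chemical information for a model, the sketch below turns a molecular formula into a fixed-length vector of element counts using only the Python standard library. This is an assumption-laden toy descriptor, not the encoding taught in the module: the function name `encode_formula` and the element vocabulary are hypothetical, formulas with brackets are not handled, and the vector ignores connectivity, so structure-aware descriptors would be needed for tasks such as toxicity prediction.

```python
import re

# Hypothetical element vocabulary that fixes the order of the feature vector.
ELEMENTS = ["C", "H", "N", "O"]


def encode_formula(formula, elements=ELEMENTS):
    """Return element counts for a simple formula such as 'C2H6O'."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    # Elements outside the vocabulary are ignored in this sketch.
    return [counts.get(element, 0) for element in elements]


for formula in ["C2H6O", "CH3COOH", "C6H5NO2"]:
    print(formula, encode_formula(formula))
```

Ethanol and dimethyl ether share the formula C2H6O and therefore the same vector here, which shows why richer machine-readable descriptors of 2D and 3D structure are worth learning.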

Indicative assessment

Task % of module mark
Essay/coursework 100.0

Special assessment rules

None

Additional assessment information

Students will be given a choice of datasets and asked to select one and analyse it, including training at least one machine learning model. They should summarise their key findings in a short presentation slide deck, which will be submitted alongside the code written to perform the analysis.

Indicative reassessment

Task % of module mark
Essay/coursework 100.0

Module feedback

Students will receive feedback on their performance in their coursework within 4 weeks. Oral feedback for the formative workshops will be given during the sessions.

Indicative reading

Texts will be recommended by lecturers for the component lecture courses.



The information on this page is indicative of the module that is currently on offer. The University constantly explores ways to enhance and improve its degree programmes and therefore reserves the right to make variations to the content and method of delivery of modules, and to discontinue modules, if such action is reasonably considered to be necessary. In some instances it may be appropriate for the University to notify and consult with affected students about module changes in accordance with the University's policy on the Approval of Modifications to Existing Taught Programmes of Study.