Programming and Machine Learning for Chemistry - CHE00040I
- Department: Chemistry
- Credit value: 20 credits
- Credit level: I
- Academic year of delivery: 2026-27
Module summary
Modern chemists use digital tools to solve complex chemical problems, from simulating reaction kinetics and predicting molecular properties from real-world data to designing synthetic reactions. This module equips students with essential computational skills by introducing Python programming and key Machine Learning (ML) algorithms, and by illustrating how these transferable skills can be applied to a wide range of chemical problems.
Related modules
Prohibited combinations
- Genes to Proteins (CHE00021I)
- The Material World: Chemistry & Applications (CHE00023I)
- Green & Sustainable Chemistry (CHE00030I)
Module will run
| Occurrence | Teaching period |
|---|---|
| A | Semester 2 2026-27 |
Module aims
The primary aim of this module is to integrate computational methods with core chemical principles, allowing students to move beyond traditional analysis and tackle complex, real-world chemical challenges. The module illustrates how research in almost every area of chemistry is being accelerated by the routine incorporation of both classical programming and machine learning workflows. It begins by showing how traditional programming methods can be used to solve the complex systems of equations that arise in chemical kinetics and quantum mechanics. It then explores how widely used machine learning methods can be applied to predict a range of chemical properties, from reaction yields to toxicity. Finally, the module introduces chemistry-specific machine learning techniques, focussing on how to encode chemical structures and how to build interpretable models from which chemical insight can be drawn. Through a blend of lectures and hands-on workshops, students will master fundamental programming concepts in Python and gain proficiency in implementing key ML algorithms with libraries such as Scikit-Learn, providing a highly transferable skill set for the evolving landscape of chemistry, materials science, and data science.
Module learning outcomes
Students will be able to:
- Apply features of programming languages, including data structures, loops, conditional statements and functions.
- Interpret the documentation of common Python libraries and use unfamiliar functions from those libraries.
- Write programs to solve a variety of chemical problems, including simulations and data analysis.
- Implement supervised machine learning algorithms using Scikit-Learn.
- Evaluate the performance of a machine learning algorithm and implement techniques to improve it.
- Describe how chemical structures and data can be encoded for use in machine learning models.
- Recognise which chemical problems can be addressed using machine learning, and evaluate how effectively machine learning solves them.
Module content
Programming for Chemistry
Lectures:
- Functions and control flow
- Solving complex equations
- Hypothesis tests
- Mathematical and Statistical Libraries
- Compiled Languages
Workshops:
- Functions and control flow
  - Programming LO: Implement functions and apply them within a loop.
  - Chemistry LO: Use the Nernst equation to calculate the lifetime of a battery.
- Solving complex equations
  - Programming LO: Convert mathematical equations into code.
  - Chemistry LO: Solve systems of kinetic equations.
- Mathematical and Statistical Libraries
  - Programming LO: Learn to read and apply the documentation of advanced Python libraries such as statsmodels and SciPy.
  - Chemistry LO: Quantify the effectiveness of candidate drug molecules.
- Compiled Languages
  - Programming LO: Recognise that Python is easy to use but comparatively slow, and that some chemical applications need computational efficiency best obtained from compiled languages.
  - Chemistry LO: Calculate and analyse crystal field splitting patterns and ligand field splitting parameters.
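The first workshop pairs a function with a loop. A minimal sketch of that pattern, assuming a Daniell cell (Zn + Cu2+ -> Zn2+ + Cu, E° = 1.10 V, n = 2) at 298.15 K with equal starting concentrations and an arbitrary 1.0 V cutoff, might look like the following. The cell, cutoff, step size and function names are illustrative assumptions, not taken from the workshop materials.

```python
import math

R = 8.314462618   # gas constant / J mol^-1 K^-1
F = 96485.33212   # Faraday constant / C mol^-1

def nernst_potential(e_standard, n, q, temperature=298.15):
    """Cell potential (V) from E = E0 - (RT / nF) ln Q."""
    return e_standard - (R * temperature) / (n * F) * math.log(q)

def fraction_discharged(e_standard=1.10, n=2, c0=1.0, cutoff=1.0, step=1e-5):
    """Hypothetical helper: step Zn + Cu2+ -> Zn2+ + Cu forward and return the
    fraction of the initial Cu2+ consumed when E first falls below `cutoff`."""
    extent = 0.0
    while extent + step < c0:
        extent += step
        q = (c0 + extent) / (c0 - extent)  # Q = [Zn2+] / [Cu2+]
        if nernst_potential(e_standard, n, q) < cutoff:
            return extent / c0
    return 1.0  # Cu2+ ran out before the cutoff was reached

print(nernst_potential(1.10, 2, 10.0))
print(fraction_discharged())
```

Because the potential depends only logarithmically on Q, the cell stays close to 1.10 V until almost all of the Cu2+ is consumed. Turning the consumed fraction into a lifetime would mean converting it to charge (n F multiplied by the moles of Cu2+ reacted) and dividing by an assumed discharge current.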
Introduction to Machine Learning
Lectures:
- Introduction to Machine Learning
- Perceptrons
- Logistic Regression
- SVMs and Decision Trees
- Model Improvement
Workshops:
- Perceptrons and Logistic Regression
  - Programming LO: Implement Perceptron and Logistic Regression classifiers.
  - Chemistry LO: Analyse the reaction conditions that affect the yields of cross-coupling reactions, and propose chemical reasons for the observed trends.
- SVMs and Decision Trees
  - Programming LO: Implement SVM and decision tree (DT) regression models.
  - Chemistry LO: Use ML to predict hard-to-measure properties from easily measured properties (e.g. fatty acid content of olive oil).
- Model Improvement
  - Programming LO: Identify and optimise hyperparameters, and implement cross-validation.
  - Chemistry LO: Estimate the uncertainties in machine-learned reaction yield predictions, and compare them with the uncertainties in experimental data.
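The workshops implement these models with Scikit-Learn (for example `sklearn.linear_model.Perceptron` and `sklearn.model_selection.cross_val_score`). To show roughly what such calls do underneath, here is a minimal plain-Python sketch of the classic perceptron update rule combined with k-fold cross-validation. The data are synthetic (a grid of two scaled features labelled by which side of the line x0 + x1 = 1 they fall on) and merely stand in for real reaction-condition descriptors; all function names are illustrative, not part of any library.

```python
import random

def train_perceptron(X, y, epochs=50, lr=1.0):
    """Classic perceptron rule; labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or on the boundary
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # converged: every training point is correct
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

def k_fold_accuracy(X, y, k=5, epochs=50, seed=0):
    """Mean held-out accuracy of the perceptron over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = train_perceptron([X[i] for i in train], [y[i] for i in train], epochs)
        scores.append(accuracy(w, b, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k

# Synthetic, linearly separable toy data; points near the line x0 + x1 = 1
# are dropped so that the two classes are well separated.
X = [[0.1 * i, 0.1 * j] for i in range(11) for j in range(11)
     if abs(0.1 * i + 0.1 * j - 1.0) > 0.45]
y = [1 if x0 + x1 > 1.0 else -1 for x0, x1 in X]

w, b = train_perceptron(X, y)
print("training accuracy:", accuracy(w, b, X, y))
print("5-fold CV accuracy:", k_fold_accuracy(X, y))
```

Because the synthetic classes are linearly separable, the training accuracy reaches 1.0. The cross-validated score, computed only on points each model never saw, is the more honest estimate, which is why hyperparameter tuning is normally paired with cross-validation.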
Machine Learning in Chemistry
Lectures:
- Encoding chemical information
- Feature selection and improvement
- Machine-readable descriptors of molecules and materials
- Interpreting ML models to gain chemical insights
Workshops:
- Feature selection and improvement
  - Programming LO: Identify and encode chemically relevant features.
  - Chemistry LO: Identify patterns within large atmospheric chemistry datasets, and explain them using chemical reaction schemes.
- Machine-readable descriptors of molecules and materials
  - Programming LO: Encode 2D and 3D chemical structures in machine-readable formats.
  - Chemistry LO: Predict chemical toxicity from chemical structure, and identify the structural patterns responsible.
- Interpreting ML models to gain chemical insights
  - Programming LO: Use SHAP (SHapley Additive exPlanations) analysis to identify the features that contribute most to a model's predictions.
  - Chemistry LO: Identify the molecular properties that best predict antimicrobial behaviour.
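SHAP analysis itself relies on the third-party `shap` package. As a dependency-free illustration of the same goal (finding which inputs a trained model actually relies on), the sketch below implements permutation feature importance, a different but related model-agnostic technique; Scikit-Learn ships a version as `sklearn.inspection.permutation_importance`. The two descriptors and the threshold "model" are hypothetical stand-ins for real molecular descriptors and a trained classifier.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: two made-up descriptors per molecule, and a
# "model" that only looks at the first descriptor.
X = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):
    return 1 if row[0] > 0.5 else 0

print(permutation_importance(model, X, y))
```

Shuffling the descriptor the model ignores leaves every prediction unchanged, so its importance is exactly zero, while shuffling the descriptor it uses lowers the accuracy. SHAP goes further by attributing each individual prediction to the features, which is what makes it useful for the chemical interpretation targeted in this workshop.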
Indicative assessment
| Task | % of module mark |
|---|---|
| Essay/coursework | 100.0 |
Special assessment rules
None
Additional assessment information
Students will be given a choice of datasets, and asked to select one and carry out a data analysis that includes training at least one machine learning model. They should summarise their key findings in a short presentation slide deck, to be submitted alongside the code written to perform the analysis.
Indicative reassessment
| Task | % of module mark |
|---|---|
| Essay/coursework | 100.0 |
Module feedback
Students will receive feedback on their coursework within four weeks. Oral feedback on the formative workshops will be given during the sessions.
Indicative reading
Texts will be recommended by lecturers for the component lecture courses.