Artificial Intelligence and Machine Learning in Biomedical Research

Artificial Intelligence (AI) and Machine Learning (ML) are becoming commonplace in everyday life and are important tools for researchers. YBRI is no exception: a range of ML approaches is being applied across diverse biomedical applications.

Many AI and ML methods are available, and they are often grouped into three main types: supervised, unsupervised and reinforcement learning. The choice of algorithm depends on the nature of the data and the problem at hand, and is usually made after experimentally comparing the performance of candidate algorithms. Within YBRI, a number of projects have successfully used AI and ML to advance understanding of underlying biomedical mechanisms and characteristics.

Deep Neural Networks are a supervised ML approach that has been used within YBRI to investigate the importance of global symmetry in detecting the “gist of the abnormal” in mammographic images, which has been shown to be important for the early detection of abnormalities (Evans). Current work explores the effect of bilateral differences by developing and training a neural network model that can reliably detect whether a set of mammograms is composed of images taken from the same woman or from two different women. Detection of bilateral asymmetry persists even when mammograms are balanced by size, age, density and machine of acquisition, indicating that a “symmetry signal” exists and is relevant for breast cancer detection. Identifying this symmetry signal will allow researchers to characterise the perceptual signal of very early signs of cancer, supporting the development of early risk assessment measures and the improvement of computer-aided detection systems.

Deep Neural Networks are used to investigate the importance of global symmetry in detecting the “gist of the abnormal” in mammographic images

Another supervised approach, a Convolutional Neural Network (CNN) called You Only Look Once (YOLO), offers exciting opportunities for high-throughput data analysis in microscopy. Work within YBRI has used this network to gain insight into biological systems, particularly the way that cells move and interact, and the role of motility in infection (Wilson, Matthews). Holographic microscopy captures 3D information about a sample in a single image. Repeating this process across multiple images makes it possible, for example, to track cells or map the shape of organelles, giving insight into a cell's internal signalling state during processes such as chemotaxis and phototaxis. AI speeds up the analysis of holographic images 100-fold, so the technique can be run at the microscope instead of offline at a computer. This offers the chance to observe stimulus responses in real time and to exert greater control over experiments as they happen.

3D tracks of swimming cells obtained from a holographic microscope (published in Thornton et al.)

Clustering, an unsupervised ML technique, has been applied to many data types in both individual- and pan-cancer analyses to stratify patient tumours. Whilst DNA and RNA sequencing technologies provide data-rich, high-dimensional descriptions of tumour phenotype, they have yet to dramatically improve outcomes in the clinic, even as the health service moves towards a sequencing-based diagnostic future. Research within YBRI aims to combine multiple data types at the point of stratification, rather than stratifying on a single technology and then correlating with other metrics (Mason, Southgate, Ungureanu, Halliday, Smith). The aim of the work is to better stratify muscle-invasive bladder cancer using novel data generated in-house and public data from large international consortia.
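As a rough illustration of what combining data types at the point of stratification can mean in practice, the sketch below scales two feature blocks, joins them sample by sample, and clusters the joined matrix. It is not the YBRI pipeline: the data are synthetic, the block names `expression` and `methylation` are hypothetical, and plain k-means with a deterministic farthest-first start is assumed purely to keep the example short.

```python
import math
import random


def zscore_columns(rows):
    """Scale every column to zero mean and unit variance."""
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mean) / std for v in col])
    return [list(row) for row in zip(*scaled)]


def integrate(*blocks):
    """Z-score each data type, then concatenate features per sample."""
    scaled_blocks = [zscore_columns(block) for block in blocks]
    return [sum(parts, []) for parts in zip(*scaled_blocks)]


def kmeans(points, k, n_iter=20):
    """Plain k-means with farthest-first initial centres (an assumption)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    labels = []
    for _ in range(n_iter):
        # Assign each sample to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(0)


def synthetic_block(n_features, shift):
    """20 samples from group A followed by 20 shifted samples from group B."""
    group_a = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(20)]
    group_b = [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(20)]
    return group_a + group_b


# Two hypothetical data types measured on the same 40 synthetic samples.
expression = synthetic_block(5, shift=6.0)
methylation = synthetic_block(3, shift=6.0)

labels = kmeans(integrate(expression, methylation), k=2)
print(labels)
```

Scaling each block before joining stops a data type with larger numeric ranges from dominating the distance calculation. Real multi-omic integration typically also needs handling of missing values, feature selection and some weighting between data types, none of which is shown here.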

Urothelial gene co-regulatory network built using RNA sequencing data from healthy human tissues. This provides a baseline for identifying and understanding regulatory network perturbations in cancer.

Many machine learning approaches, including deep learning and convolutional neural networks, are termed “black box” because it can be difficult to understand precisely how the input data are manipulated to produce the result. Cartesian Genetic Programming, a so-called “white box” approach, can represent the evolved solution as a precise mathematical expression. It is being applied to better understand underlying biological systems and processes, for example by characterising cell cultures from time-lapse microscopy and movements associated with Parkinson’s disease in zebrafish and humans (Southgate, Pownall, Smith).
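To make the “white box” idea concrete, the following minimal sketch shows how a Cartesian Genetic Programming genotype can be decoded into a readable expression. The function set, the genotype and the input names are all invented for illustration, and the evolutionary search that would normally produce the genotype is omitted.

```python
import operator

# Hypothetical function set: each entry pairs a printable symbol with a callable.
FUNCTIONS = [
    ("+", operator.add),
    ("-", operator.sub),
    ("*", operator.mul),
]


def decode(genotype, address, input_names):
    """Return the expression computed at `address` as a string."""
    n_inputs = len(input_names)
    if address < n_inputs:
        return input_names[address]
    fn, a, b = genotype[address - n_inputs]
    left = decode(genotype, a, input_names)
    right = decode(genotype, b, input_names)
    return f"({left} {FUNCTIONS[fn][0]} {right})"


def evaluate(genotype, address, inputs):
    """Numerically evaluate the node at `address` for the given inputs."""
    n_inputs = len(inputs)
    if address < n_inputs:
        return inputs[address]
    fn, a, b = genotype[address - n_inputs]
    return FUNCTIONS[fn][1](
        evaluate(genotype, a, inputs), evaluate(genotype, b, inputs)
    )


# Program inputs occupy addresses 0 and 1; nodes follow at addresses 2, 3, 4.
# Each node is (function index, first input address, second input address).
GENOTYPE = [
    (2, 0, 0),  # address 2: x * x
    (0, 2, 1),  # address 3: (x * x) + y
    (1, 1, 0),  # address 4: y - x, not connected to the output (inactive)
]
OUTPUT_ADDRESS = 3

expression = decode(GENOTYPE, OUTPUT_ADDRESS, ["x", "y"])
value = evaluate(GENOTYPE, OUTPUT_ADDRESS, [3.0, 1.0])
print(expression)  # ((x * x) + y)
print(value)       # 10.0
```

Because only nodes reachable from the output contribute to the result, the decoded expression stays compact and can be inspected directly, which is the property that distinguishes this representation from a trained neural network.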

Characterising cell cultures from time-lapse microscopy, and movements associated with Parkinson’s disease in zebrafish and humans (Southgate, Pownall, Smith).