Alexander Kalian: Comparison of Optimised Geometric Deep Learning Architectures, over Varying Toxicological Assay Data Environments

Geometric deep learning is an emerging technique in Artificial Intelligence (AI)-driven cheminformatics; however, the implications of different Graph Neural Network (GNN) architectures remain poorly explored in this space.

This study compared the performance of Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs) and Graph Isomorphism Networks (GINs), applied to 7 toxicological assay datasets of varying data abundance and endpoint, for binary classification of assay activation. Following pre-processing of molecular graphs, enforcement of class balance and stratification of all datasets across 5 folds, a Bayesian optimisation was carried out for each GNN applied to each assay dataset (resulting in 21 unique Bayesian optimisations).
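The class-balancing and stratification steps can be sketched in plain Python. This is a minimal illustration only: the study does not state here how class balance was enforced, so random undersampling of the majority class is an assumption, and the function names `balance_classes` and `stratified_folds` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def balance_classes(labels, seed=0):
    """Return indices of a class-balanced subset (assumed: random undersampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Every class is cut down to the size of the smallest class.
    n_min = min(len(ids) for ids in by_class.values())
    kept = []
    for ids in by_class.values():
        kept.extend(rng.sample(ids, n_min))
    return sorted(kept)


def stratified_folds(indices, labels, n_folds=5, seed=0):
    """Deal each class's shuffled indices round-robin into n_folds folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(n_folds)]
    for ids in by_class.values():
        rng.shuffle(ids)
        for position, idx in enumerate(ids):
            folds[position % n_folds].append(idx)
    return folds


# Toy labels (made up): 1 = assay active, 0 = inactive, deliberately imbalanced.
labels = [1] * 20 + [0] * 50
balanced = balance_classes(labels)
folds = stratified_folds(balanced, labels)
```

Each fold then serves once as the held-out test set while the other four are used for training, so every fold preserves the balanced class ratio.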

Optimised GNNs achieved Area Under the Curve (AUC) scores ranging from 0.728 to 0.849 (averaged across all folds), varying between specific assays and GNNs. GINs consistently outperformed GCNs and GATs on the 5 most data-abundant of the 7 toxicological assays. GATs, however, significantly outperformed the other architectures on the remaining 2, most data-scarce, assays. This indicates that GINs are better suited to data-abundant environments, whereas GATs are better suited to data-scarce environments. Subsequent analysis of the explored higher-dimensional hyperparameter spaces, as well as the optimised hyperparameter states, found that GCNs and GATs reached optimised states measurably closer to each other than to those of GINs, further indicating the distinct nature of GINs as a GNN algorithm.
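As a rough illustration of what a fold-averaged AUC means, the sketch below computes AUC per fold as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one, then averages over folds. The helper names and the numbers in `fold_results` are invented for the example and are not results from the study.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def mean_auc(fold_results):
    """Average the per-fold AUC over all folds."""
    scores = [roc_auc(y_true, y_score) for y_true, y_score in fold_results]
    return sum(scores) / len(scores)


# Made-up (labels, predicted scores) for two folds, purely illustrative.
fold_results = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
]
average = mean_auc(fold_results)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported range should be read.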

1.1 – Geometric Deep Learning in the Toxicological Sciences

Geometric deep learning entails deep learning on non-Euclidean data structures, such as graphs and manifolds [1,2]. This differs from conventional deep learning algorithms, which primarily process grid-structured matrices of data (e.g. image pixel data, or matrices representing sequences of text) [1,2]. Geometric deep learning is emerging in application to data science in computational toxicology (as well as the wider field of cheminformatics), as approaches such as Graph Neural Networks (GNNs) may directly process molecules as molecular graphs which encode their native structures of bonded atoms, enriched with further physicochemical information about constituent atoms and bonds, via node and edge feature vectors [1-5]. GNNs hence enable seamless training, testing and predicting across molecular datasets, with algorithms that conserve the inherently graph-structured form of molecules [1-5].
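A molecular graph of this kind can be sketched as a small data structure. The example below encodes the heavy atoms of ethanol; the `MolecularGraph` class and the choice of features (atomic number and attached hydrogen count per atom, bond order per bond) are illustrative assumptions, not the featurisation used in the study.

```python
from dataclasses import dataclass


@dataclass
class MolecularGraph:
    node_features: list  # one feature vector per atom
    bonds: list          # (i, j) atom-index pairs, one per bond
    edge_features: list  # one feature vector per bond, same order as bonds

    def neighbours(self, atom):
        """Indices of the atoms bonded to the given atom."""
        result = []
        for i, j in self.bonds:
            if i == atom:
                result.append(j)
            elif j == atom:
                result.append(i)
        return result


# Ethanol, heavy atoms only: C(0) - C(1) - O(2).
# Assumed node features: [atomic number, attached hydrogens].
# Assumed edge features: [bond order].
ethanol = MolecularGraph(
    node_features=[[6, 3], [6, 2], [8, 1]],
    bonds=[(0, 1), (1, 2)],
    edge_features=[[1.0], [1.0]],
)
```

Here `ethanol.neighbours(1)` lists the two atoms bonded to the central carbon, which is the connectivity a GNN uses in place of a fixed grid.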

GNNs have emerged as an especially effective technique in Quantitative Structure-Activity Relationship (QSAR) modelling [3-5], which may be used to improve human, animal and environmental health, via safer and more effective chemicals, while simultaneously helping reduce the need for animal testing [4]. Beyond molecular graphs, GNNs are also prominently used for prediction over knowledge graphs, which are general relational data structures that encode entities and their relations [1,2]. Knowledge graphs are also used in cheminformatics studies [6,19]; however, molecular graph-based approaches remain more typical and more intuitively suited to chemical data representation [1-5].

A variety of GNN architectures have been invented by foundational researchers in Artificial Intelligence (AI), typically designed for broad applicability across different domains [7,8,10]. A common theme across GNN variants is the paradigm of message-passing, in which node (and potentially edge) features are updated to contain information from neighbours, encoding information about the surrounding connected environment and hence enabling a model to discern, de facto, the wider graph topology [1,2,7,8,10].
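The message-passing idea can be sketched without any learned parameters. In the minimal example below, each node adds the sum of its neighbours' feature vectors to its own; this is a simplification, since real GCN, GAT and GIN layers additionally apply learned transformations and differ in how neighbour messages are weighted and combined. The function name and the toy graph are hypothetical.

```python
def message_passing_step(node_features, edges):
    """One round: every node adds the sum of its neighbours' features to its own."""
    n_nodes = len(node_features)
    dim = len(node_features[0])
    aggregated = [[0.0] * dim for _ in range(n_nodes)]
    for i, j in edges:  # edges are undirected, so messages flow both ways
        for k in range(dim):
            aggregated[i][k] += node_features[j][k]
            aggregated[j][k] += node_features[i][k]
    return [
        [own[k] + agg[k] for k in range(dim)]
        for own, agg in zip(node_features, aggregated)
    ]


# Path graph 0 - 1 - 2 with one-hot features, so each channel tracks one node.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
edges = [(0, 1), (1, 2)]

after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)
```

After one round, node 0 holds nothing from node 2 (its third channel is still zero); after two rounds that channel is non-zero, showing how stacked rounds let a node see progressively more of the graph topology.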
