Using Machine Learning to Discover Highly Selective Catalysts
AI has recently made headlines thanks to ChatGPT’s language processing capabilities. Creating a similarly powerful tool for chemical reaction design remains a significant challenge, especially for complex catalytic reactions, according to a study reported in the journal Angewandte Chemie International Edition.
To overcome this challenge, the researchers behind the study created a machine learning method that uses advanced and efficient 2D chemical descriptors to accurately predict highly selective asymmetric catalysts without requiring quantum chemical computations.
“There have been several advanced technologies which can predict catalyst structures, but those methods often required large investments of calculation resources and time, yet their accuracy was still limited,” said Nobuya Tsuji, a joint first author of the study. “In this project, we have developed a predictive model which you can run even with an everyday laptop PC.”
For a computer to learn chemical information, molecules are usually represented as a collection of descriptors, which often consist of small parts, or fragments, of those molecules. These are easier for AI to process and can be arranged and rearranged to construct different molecules, much like Lego pieces can be placed and connected in various ways to build other structures.
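As a rough illustration of the fragment idea, the sketch below counts small atom-centred fragments in a toy molecule. It is not the descriptor scheme used in the study: the molecule encoding (`ATOMS`, `BONDS`), the label format, and the function names are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy input: the heavy-atom skeleton of ethanol, C-C-O.
# Atoms are element symbols; bonds are pairs of atom indices.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]


def neighbours(n_atoms, bonds):
    """Build an adjacency list from bonded atom-index pairs."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def fragment_label(atoms, adj, centre, radius):
    """Label the environment of `centre` out to `radius` bonds.

    The label is the centre element followed by the sorted elements
    found in each successive shell, joined by "|".
    """
    seen = {centre}
    shell = [centre]
    parts = [atoms[centre]]
    for _ in range(radius):
        next_shell = []
        for i in shell:
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    next_shell.append(j)
        parts.append("".join(sorted(atoms[j] for j in next_shell)))
        shell = next_shell
    return "|".join(parts)


def descriptor(atoms, bonds, max_radius=1):
    """Count every atom-centred fragment up to `max_radius` bonds."""
    adj = neighbours(len(atoms), bonds)
    counts = Counter()
    for centre in range(len(atoms)):
        for radius in range(max_radius + 1):
            counts[fragment_label(atoms, adj, centre, radius)] += 1
    return counts


ethanol = descriptor(ATOMS, BONDS)
```

Each distinct fragment label becomes one column of a numeric vector, and the counts are the values a model would learn from.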
However, computationally cheaper 2D descriptors have struggled to accurately represent complex catalyst structures, leading to inaccurate predictions. To address this issue, the researchers developed new ‘Circular Substructure’ (CircuS) 2D descriptors that explicitly represent cyclic and branched hydrocarbon structures, which are common in catalysts.
Training data for the AI was obtained experimentally through a streamlined, semi-automatic process that used a synthesis robot. This experimental data was then converted into descriptors and used to train the AI model.
Researchers used the fully trained model to virtually test 190 catalysts that were not part of the training data. In this set, the AI model could predict highly selective catalysts after only having been trained on the data of catalysts with moderate selectivity, showing an ability to extrapolate beyond the training data.
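The screening step can be pictured with a deliberately simple sketch: fit a model on measured examples, score unseen candidates, and rank them. The article does not say which model type the researchers used, so the one-variable least-squares line below is an assumption, and every number and catalyst name is invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def screen(candidates, slope, intercept):
    """Score each candidate and return them sorted, best first."""
    scored = {name: slope * x + intercept for name, x in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Invented training set: the count of one hypothetical fragment per
# catalyst, paired with a moderate selectivity score (arbitrary units).
TRAIN_COUNTS = [1, 2, 3, 4]
TRAIN_SELECTIVITY = [0.2, 0.4, 0.6, 0.8]

# Invented candidates that were not part of the training set.
CANDIDATES = {"cat_A": 2, "cat_B": 6, "cat_C": 5}

slope, intercept = fit_line(TRAIN_COUNTS, TRAIN_SELECTIVITY)
ranking = screen(CANDIDATES, slope, intercept)
best_name, best_score = ranking[0]
```

In this toy case the top-ranked candidate receives a predicted score above anything in the training set, which is the kind of extrapolation described above. Whether such a prediction holds up is something only an experiment can confirm.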
The catalyst predicted to have the highest selectivity was then tested experimentally, exhibiting a selectivity nearly identical to that predicted by the AI model. Obtaining high selectivity is especially crucial for designing new medicines. This technique provides chemists with a robust framework for optimising selectivity that is efficient in both computational and labour costs.
“Often, chemists would use models based on quantum chemical calculations to predict new selective catalysts. However, such models are computationally costly, and when the number of compounds and the size of molecules increases, their application becomes limited,” commented Pavel Sidorov, another joint first author of the study.
Sidorov concluded: “Models based on 2D structures are much cheaper and, therefore, can process hundreds and thousands of molecules in seconds. This allows chemists to filter out the compounds they may not be interested in much more quickly.”