Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Philippe Schwaller; Benjamin Hoover; Jean-Louis Reymond; Hendrik Strobelt; Teodoro Laino

doi:10.1126/sciadv.abe4166

Data-Driven Chemical Reaction Classification with Attention-Based Neural Networks

10.26434/chemrxiv.9897365.v1; 2019; Cited By ~ 1

Author(s): Philippe Schwaller; Alain C. Vaucher; Vishnu H Nair; Teodoro Laino

Keyword(s): Neural Networks; Chemical Reactions; Chemical Reaction; Structural Rearrangement; Data Driven; Classification Task; Data Set; Atom Mapping; Multi Class Classification; Insight Into

Organic reactions are usually clustered in classes that collect entities undergoing similar structural rearrangement. The classification process is a tedious task, requiring first an accurate mapping of the rearrangement (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present a transformer-based model that infers reaction classes from the SMILES representation of chemical reactions. The model reaches an accuracy of 93.8% for a multi-class classification task involving several hundred different classes. The attention weights provided by the model give an insight into what parts of the SMILES strings are taken into account for classification, based solely on data. We study the incorrect predictions of our model and show that it uncovers different biases and mistakes in the underlying data set.

Unsupervised Attention-Guided Atom-Mapping

10.26434/chemrxiv.12298559.v1; 2020; Cited By ~ 4

Author(s): Philippe Schwaller; Benjamin Hoover; Jean-Louis Reymond; Hendrik Strobelt; Teodoro Laino

Keyword(s): Chemical Reactions; Relevant Information; Data Driven; Rule Based; Specific Data; Domain Specific; Textual Representations; Theoretical Approaches; Atom Mapping; Np Hard Problem

Knowing how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This labelling is known as atom-mapping and is an NP-hard problem. Current solutions use a combination of graph-theoretical approaches, heuristics, and rule-based systems. Unfortunately, the existing mappings and algorithms are often prone to errors and quality issues, which limit the effectiveness of supervised approaches. Self-supervised neural networks called Transformers, on the other hand, have recently shown tremendous potential when applied to textual representations of different domain-specific data, such as chemical reactions. Here we demonstrate that attention weights learned by a Transformer, without supervision or human labelling, encode atom rearrangement information between products and reactants. We build a chemically agnostic attention-guided reaction mapper that shows a remarkable performance in terms of accuracy and speed, even for strongly imbalanced reactions. Our work suggests that unannotated collections of chemical reactions contain all the relevant information to construct coherent sets of reaction rules. This finding provides the missing link between data-driven and rule-based approaches and will stimulate machine-assisted discovery in the chemical domain.

Code is available at: https://github.com/rxn4chemistry/rxnmapper
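
The idea that product-to-reactant attention weights can carry atom-mapping information may be easier to picture with a toy example. The sketch below is not the authors' algorithm and does not use the rxnmapper package; it assumes a small hand-written attention matrix (hypothetical values) and a hypothetical helper, `greedy_attention_map`, that assigns each product atom to the unused reactant atom of the same element with the highest attention weight.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_attention_map(
    product_atoms: Sequence[str],
    reactant_atoms: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> Dict[int, int]:
    """Map product atom indices to reactant atom indices (illustrative only).

    attention[i][j] is an assumed attention weight from product atom i
    to reactant atom j. Only atoms of the same element may be paired.
    """
    # Collect every same-element (weight, product index, reactant index) pair.
    candidates: List[Tuple[float, int, int]] = []
    for i, p_elem in enumerate(product_atoms):
        for j, r_elem in enumerate(reactant_atoms):
            if p_elem == r_elem:
                candidates.append((attention[i][j], i, j))

    # Visit pairs from the highest weight down and keep the first
    # assignment found for each product atom and each reactant atom.
    candidates.sort(reverse=True)
    mapping: Dict[int, int] = {}
    used_reactant_atoms = set()
    for _weight, i, j in candidates:
        if i in mapping or j in used_reactant_atoms:
            continue
        mapping[i] = j
        used_reactant_atoms.add(j)
    return mapping


# Hypothetical toy input: three product atoms, four reactant atoms.
product_atoms = ["C", "O", "C"]
reactant_atoms = ["C", "C", "O", "Br"]
attention = [
    [0.70, 0.20, 0.05, 0.05],  # product C (index 0)
    [0.10, 0.10, 0.70, 0.10],  # product O (index 1)
    [0.60, 0.30, 0.05, 0.05],  # product C (index 2)
]
mapping = greedy_attention_map(product_atoms, reactant_atoms, attention)
print(mapping)
```

In this toy input both product carbons attend most strongly to the first reactant carbon; the greedy pass gives that atom to the stronger match and assigns the other product carbon to its next-best candidate. A real mapper would take the weights from a trained model rather than from a hand-written table.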

Data-Driven Chemical Reaction Classification, Fingerprinting and Clustering using Attention-Based Neural Networks

10.26434/chemrxiv.9897365.v2; 2019; Cited By ~ 1

Author(s): Philippe Schwaller; Daniel Probst; Alain C. Vaucher; Vishnu H Nair; Teodoro Laino; ...

Keyword(s): Neural Networks; Reaction Center; Chemical Reactions; Chemical Reaction; Classification Accuracy; Similarity Search; Classification Model; Data Driven; Data Set; Atom Mapping

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. The classification process is a tedious task, requiring first an accurate mapping of the reaction (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present two transformer-based models that infer reaction classes from the SMILES representation of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We study the incorrect predictions of the models and show that they reveal different biases and mistakes in the underlying data set. Using the embeddings of our classification model, we introduce reaction fingerprints that do not require knowing the reaction center or distinguishing between reactants and reagents. This conversion from chemical reactions to feature vectors enables efficient clustering and similarity search in the reaction space. We compare the reaction clustering for combinations of self-supervised, supervised, and molecular shingle-based reaction representations.

Unsupervised Attention-Guided Atom-Mapping

10.26434/chemrxiv.12298559; 2020

Author(s): Philippe Schwaller; Benjamin Hoover; Jean-Louis Reymond; Hendrik Strobelt; Teodoro Laino

Keyword(s): Chemical Reactions; Relevant Information; Data Driven; Rule Based; Specific Data; Domain Specific; Textual Representations; Theoretical Approaches; Atom Mapping; Np Hard Problem

Discussion on catalytic reactions at high pressures

Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character; 10.1098/rspa.1930.0054; 1930; Vol 127 (805); pp. 240-267; Cited By ~ 2

Keyword(s): Organic Chemistry; Liquid Phase; Chemical Synthesis; Chemical Reactions; High Pressures; Catalytic Reactions; Atmospheric Conditions; Chemical Reagents; Boiling Points; Systematic Development

The study of catalytic reactions under high pressure began with the systematic development of organic chemistry and more especially in connection with the preparation of intermediates required in the production of synthetic colouring matters. The General Use of Pressure in Chemical Synthesis. Pressure is employed as an aid to chemical reactions for one or both of the following reasons: 1. Pressure diminishes the volatility of chemical reagents, thus retaining them in the liquid phase even when the chemical reactions involved take place at temperatures above the boiling points of these reagents under atmospheric conditions.

Machine Learning Based Interpretation of Microkinetic Data: A Fischer-Tropsch Synthesis Case Study

Reaction Chemistry & Engineering; 10.1039/d1re00351h; 2021

Author(s): Anoop Chakkingal; Pieter Janssens; Jeroen Poissonnier; Alan J Barrios; Mirella Virginie; ...

Keyword(s): Machine Learning; Neural Networks; Artificial Neural Networks; Chemical Reactions; Tropsch Synthesis; Data Driven; Fischer Tropsch; Fischer Tropsch Synthesis; Artificial Neural

Machine-Learning (ML) methods, such as Artificial Neural Networks (ANN), bring the data-driven design of chemical reactions within reach. Simultaneously with the verification of the absence of any bias in the...

An Online Data-Driven Fault Diagnosis Method for Air Handling Units by Rule and Convolutional Neural Networks

Sensors; 10.3390/s21134358; 2021; Vol 21 (13); pp. 4358

Author(s): Huanyue Liao; Wenjian Cai; Fanyong Cheng; Swapnil Dubey; Pudupadi Balachander Rajesh

Keyword(s): Neural Networks; Fault Diagnosis; Convolutional Neural Networks; Hvac Systems; Data Driven; Stable Operation; Online Data; Rule Based; Air Handling Units; Diagnosis Method

The stable operation of air handling units (AHU) is critical to ensure high efficiency and to extend the lifetime of the heating, ventilation, and air conditioning (HVAC) systems of buildings. In this paper, an online data-driven diagnosis method for AHU in an HVAC system is proposed and elaborated. The rule-based method can roughly detect the sensor condition by setting threshold values according to prior experience. Then, an efficient feature selection method using 1D convolutional neural networks (CNNs) is proposed for fault diagnosis of AHU in HVAC systems according to the system’s historical data obtained from the building management system. The new framework combines the rule-based method and CNNs-based method (RACNN) for sensor fault and complicated fault. The fault type of AHU can be accurately identified via the offline test results with an accuracy of 99.15% and fast online detection within 2 min. In the lab, the proposed RACNN method was validated on a real AHU system. The experimental results show that the proposed RACNN improves the performance of fault diagnosis.

Transformer-based artificial neural networks for the conversion between chemical notations

Scientific Reports; 10.1038/s41598-021-94082-y; 2021; Vol 11 (1)

Author(s): Lev Krasnov; Ivan Khokhlov; Maxim V. Fedorov; Sergey Sosnin

Keyword(s): Neural Networks; Artificial Neural Networks; Rapid Development; Performance Level; Rule Based; Development Costs; Overall Performance; Artificial Neural

We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance level of our model is comparable to the rule-based solutions. We proved that the accuracy and speed of computations as well as the robustness of the model allow it to be used in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development while keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.

Data augmentation and transfer learning strategies for reaction prediction in low chemical data regimes

Organic Chemistry Frontiers; 10.1039/d0qo01636e; 2021

Author(s): Yun Zhang; Ling Wang; Xinqiao Wang; Chengyun Zhang; Jiamin Ge; ...

Keyword(s): Organic Chemistry; Deep Learning; Drug Discovery; Research And Development; Learning Strategies; Transfer Learning; Chemical Reactions; Data Augmentation; Learning Method; Reaction Prediction

An effective and rapid deep learning method to predict chemical reactions contributes to the research and development of organic chemistry and drug discovery.

High accuracy data-driven heliostat calibration and state prediction with pretrained deep neural networks

Solar Energy; 10.1016/j.solener.2021.01.046; 2021; Vol 218; pp. 48-56

Author(s): Max Pargmann; Daniel Maldonado Quinto; Peter Schwarzbözl; Robert Pitz-Paal

Keyword(s): Neural Networks; Deep Neural Networks; High Accuracy; Data Driven; State Prediction; Accuracy Data
