Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

2021 ◽  
Vol 7 (15) ◽  
pp. eabe4166
Author(s):  
Philippe Schwaller ◽  
Benjamin Hoover ◽  
Jean-Louis Reymond ◽  
Hendrik Strobelt ◽  
Teodoro Laino

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
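As a rough illustration of how attention weights could be turned into an atom map, the toy sketch below pairs each product atom with an unused reactant atom of the same element, taking pairs in order of decreasing attention. The function name, atom lists, and attention values are hypothetical, and this greedy scheme is a simplification for illustration, not the authors' implementation.

```python
def greedy_atom_map(product_atoms, reactant_atoms, attention):
    """Map each product atom index to one unused reactant atom index.

    attention[i][j] is a hypothetical attention weight from product atom i
    to reactant atom j. Only same-element pairs are considered, and pairs
    are accepted in order of decreasing weight.
    """
    candidates = sorted(
        (
            (attention[i][j], i, j)
            for i, p in enumerate(product_atoms)
            for j, r in enumerate(reactant_atoms)
            if p == r
        ),
        key=lambda t: -t[0],
    )
    mapping, used = {}, set()
    for _, i, j in candidates:
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


# Made-up example: three product atoms, four reactant atoms.
product_atoms = ["C", "C", "O"]
reactant_atoms = ["O", "C", "C", "Br"]
attention = [
    [0.05, 0.80, 0.10, 0.05],
    [0.05, 0.15, 0.70, 0.10],
    [0.90, 0.03, 0.03, 0.04],
]
atom_map = greedy_atom_map(product_atoms, reactant_atoms, attention)
```

Here `atom_map` links product atom indices to reactant atom indices; the bromine atom stays unmapped because it has no counterpart in the product.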

Author(s):  
Philippe Schwaller ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
Teodoro Laino

Organic reactions are usually clustered in classes that collect entities undergoing similar structural rearrangement. The classification process is a tedious task, requiring first an accurate mapping of the rearrangement (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present a transformer-based model that infers reaction classes from the SMILES representation of chemical reactions. The model reaches an accuracy of 93.8% for a multi-class classification task involving several hundred different classes. The attention weights provided by the model give an insight into what parts of the SMILES strings are taken into account for classification, based solely on data. We study the incorrect predictions of our model and show that it uncovers different biases and mistakes in the underlying data set.


Author(s):  
Philippe Schwaller ◽  
Daniel Probst ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
Teodoro Laino ◽  
...  

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. The classification process is a tedious task, requiring first an accurate mapping of the reaction (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present two transformer-based models that infer reaction classes from the SMILES representation of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We study the incorrect predictions of the models and show that they reveal different biases and mistakes in the underlying data set. Using the embeddings of our classification model, we introduce reaction fingerprints that do not require knowing the reaction center or distinguishing between reactants and reagents. This conversion from chemical reactions to feature vectors enables efficient clustering and similarity search in the reaction space. We compare the reaction clustering for combinations of self-supervised, supervised, and molecular shingle-based reaction representations.
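To make the idea of similarity search over reaction feature vectors concrete, here is a minimal sketch that ranks stored reactions by cosine similarity to a query vector. The three-dimensional vectors, reaction labels, and function names are hypothetical placeholders; real fingerprints derived from a model's embeddings would have many more dimensions, and the abstract does not state which similarity measure the authors use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_reactions(query, fingerprints, k=2):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(
        fingerprints,
        key=lambda name: cosine_similarity(query, fingerprints[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up fingerprints for illustration only.
fingerprints = {
    "amide_coupling_a": [0.9, 0.1, 0.0],
    "amide_coupling_b": [0.8, 0.2, 0.1],
    "suzuki_coupling": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]
hits = nearest_reactions(query, fingerprints, k=2)
```

With these placeholder values, the two amide-coupling entries rank ahead of the Suzuki entry, which is the behaviour one would hope for from a fingerprint that groups similar reactions.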


2020 ◽  
Author(s):  
Philippe Schwaller ◽  
Benjamin Hoover ◽  
Jean-Louis Reymond ◽  
Hendrik Strobelt ◽  
Teodoro Laino

Knowing how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This labelling is known as atom-mapping and is an NP-hard problem. Current solutions use a combination of graph-theoretical approaches, heuristics, and rule-based systems. Unfortunately, the existing mappings and algorithms are often prone to errors and quality issues, which limit the effectiveness of supervised approaches. Self-supervised neural networks called Transformers, on the other hand, have recently shown tremendous potential when applied to textual representations of different domain-specific data, such as chemical reactions. Here we demonstrate that attention weights learned by a Transformer, without supervision or human labelling, encode atom rearrangement information between products and reactants. We build a chemically agnostic attention-guided reaction mapper that shows a remarkable performance in terms of accuracy and speed, even for strongly imbalanced reactions. Our work suggests that unannotated collections of chemical reactions contain all the relevant information to construct coherent sets of reaction rules. This finding provides the missing link between data-driven and rule-based approaches and will stimulate machine-assisted discovery in the chemical domain.

Code is available at: https://github.com/rxn4chemistry/rxnmapper


The study of catalytic reactions under high pressure began with the systematic development of organic chemistry and more especially in connection with the preparation of intermediates required in the production of synthetic colouring matters. The General Use of Pressure in Chemical Synthesis. Pressure is employed as an aid to chemical reactions for one or both of the following reasons: 1. Pressure diminishes the volatility of chemical reagents, thus retaining them in the liquid phase even when the chemical reactions involved take place at temperatures above the boiling points of these reagents under atmospheric conditions.


Author(s):  
Anoop Chakkingal ◽  
Pieter Janssens ◽  
Jeroen Poissonnier ◽  
Alan J Barrios ◽  
Mirella Virginie ◽  
...  

Machine-Learning (ML) methods, such as Artificial Neural Networks (ANN), bring the data-driven design of chemical reactions within reach. Simultaneously with the verification of the absence of any bias in the...


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4358
Author(s):  
Huanyue Liao ◽  
Wenjian Cai ◽  
Fanyong Cheng ◽  
Swapnil Dubey ◽  
Pudupadi Balachander Rajesh

The stable operation of air handling units (AHU) is critical to ensure high efficiency and to extend the lifetime of the heating, ventilation, and air conditioning (HVAC) systems of buildings. In this paper, an online data-driven diagnosis method for AHU in an HVAC system is proposed and elaborated. The rule-based method can roughly detect the sensor condition by setting threshold values according to prior experience. Then, an efficient feature selection method using 1D convolutional neural networks (CNNs) is proposed for fault diagnosis of AHU in HVAC systems according to the system’s historical data obtained from the building management system. The new framework (RACNN) combines the rule-based method with the CNN-based method to diagnose both sensor faults and complicated faults. In offline tests, the fault type of the AHU is identified with an accuracy of 99.15%, and online detection completes within 2 min. In the lab, the proposed RACNN method was validated on a real AHU system. The experimental results show that the proposed RACNN improves the performance of fault diagnosis.
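The rule-based stage described above, where thresholds set from prior experience flag suspicious sensor readings, might look roughly like the following sketch. The sensor names, limit values, and function name are hypothetical and chosen only for illustration; the abstract does not specify the actual rules or thresholds.

```python
def find_out_of_range(readings, limits):
    """Return the names of sensors whose reading lies outside its limits.

    limits maps each sensor name to a (low, high) pair; both bounds are
    treated as inclusive.
    """
    flagged = []
    for name, value in readings.items():
        low, high = limits[name]
        if value < low or value > high:
            flagged.append(name)
    return flagged


# Hypothetical limits that an operator might set from experience.
limits = {
    "supply_air_temp_c": (10.0, 20.0),
    "return_air_temp_c": (18.0, 30.0),
    "supply_fan_speed_pct": (0.0, 100.0),
}
# Hypothetical snapshot of sensor readings.
readings = {
    "supply_air_temp_c": 14.2,
    "return_air_temp_c": 41.5,
    "supply_fan_speed_pct": 63.0,
}
flagged = find_out_of_range(readings, limits)
```

A check like this is cheap and easy to audit, which is presumably why it serves as a coarse first pass before a learned model handles the more complicated faults.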


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Lev Krasnov ◽  
Ivan Khokhlov ◽  
Maxim V. Fedorov ◽  
Sergey Sosnin

We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance level of our model is comparable to the rule-based solutions. We proved that the accuracy and speed of computations as well as the robustness of the model allow it to be used in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development while keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.


Author(s):  
Yun Zhang ◽  
Ling Wang ◽  
Xinqiao Wang ◽  
Chengyun Zhang ◽  
Jiamin Ge ◽  
...  

An effective and rapid deep learning method to predict chemical reactions contributes to the research and development of organic chemistry and drug discovery.


Solar Energy ◽  
2021 ◽  
Vol 218 ◽  
pp. 48-56
Author(s):  
Max Pargmann ◽  
Daniel Maldonado Quinto ◽  
Peter Schwarzbözl ◽  
Robert Pitz-Paal
