Data-Driven Chemical Reaction Classification, Fingerprinting and Clustering using Attention-Based Neural Networks

Author(s):  
Philippe Schwaller ◽  
Daniel Probst ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
Teodoro Laino ◽  
...  

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. The classification process is a tedious task, requiring first an accurate mapping of the reaction (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present two transformer-based models that infer reaction classes from the SMILES representation of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We study the incorrect predictions of the models and show that they reveal different biases and mistakes in the underlying data set. Using the embeddings of our classification model, we introduce reaction fingerprints that do not require knowing the reaction center or distinguishing between reactants and reagents. This conversion from chemical reactions to feature vectors enables efficient clustering and similarity search in the reaction space. We compare the reaction clustering for combinations of self-supervised, supervised, and molecular shingle-based reaction representations.
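As a rough illustration of the similarity search that fixed-length reaction fingerprints make possible, the sketch below ranks stored reactions by cosine similarity to a query vector. The three-dimensional vectors and the reaction labels are made-up placeholders, not output of the authors' model; real fingerprints would be taken from the model embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, fingerprints, top_k=2):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in fingerprints.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for learned reaction fingerprints (assumed values, illustration only).
TOY_FINGERPRINTS = {
    "amide_coupling_1": [0.9, 0.1, 0.0],
    "amide_coupling_2": [0.8, 0.2, 0.1],
    "suzuki_coupling_1": [0.0, 0.1, 0.95],
}

if __name__ == "__main__":
    for name, score in most_similar([1.0, 0.0, 0.0], TOY_FINGERPRINTS):
        print(f"{name}: {score:.3f}")
```

Because every reaction becomes a plain feature vector, the same distance function could also feed a standard clustering routine; the brute-force ranking here is only meant for small collections.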

Author(s):  
Philippe Schwaller ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
Teodoro Laino

Organic reactions are usually clustered in classes that collect entities undergoing similar structural rearrangement. The classification process is a tedious task, requiring first an accurate mapping of the rearrangement (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present a transformer-based model that infers reaction classes from the SMILES representation of chemical reactions. The model reaches an accuracy of 93.8% for a multi-class classification task involving several hundred different classes. The attention weights provided by the model give an insight into what parts of the SMILES strings are taken into account for classification, based solely on data. We study the incorrect predictions of our model and show that it uncovers different biases and mistakes in the underlying data set.


2020 ◽  
Author(s):  
Philippe Schwaller ◽  
Daniel Probst ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
David Kreutter ◽  
...  

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints which capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.
Code: https://github.com/rxn4chemistry/rxnfp
Tutorials: https://rxn4chemistry.github.io/rxnfp/
Interactive reaction atlas: https://rxn4chemistry.github.io/rxnfp//tmaps/tmap_ft_10k.html


2021 ◽  
Vol 7 (15) ◽  
pp. eabe4166
Author(s):  
Philippe Schwaller ◽  
Benjamin Hoover ◽  
Jean-Louis Reymond ◽  
Hendrik Strobelt ◽  
Teodoro Laino

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
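To make the idea of an attention-guided mapper more concrete, here is a deliberately simplified sketch: given a matrix of attention weights from product atoms to reactant atoms, it greedily pairs each product atom with the unused reactant atom that receives the highest weight. This is not the authors' algorithm, only a toy greedy assignment under the assumption that such a matrix is already available; the function name and the numbers are hypothetical.

```python
def map_atoms_from_attention(attention):
    """Greedy one-to-one atom mapping from an attention matrix.

    attention[i][j] is the (assumed) weight that product atom i places on
    reactant atom j. Returns a dict {product_index: reactant_index} in which
    each reactant atom is used at most once.
    """
    # Flatten to (weight, product_index, reactant_index) and visit the
    # strongest attention links first.
    candidates = [
        (weight, i, j)
        for i, row in enumerate(attention)
        for j, weight in enumerate(row)
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)

    mapping = {}
    used_reactant_atoms = set()
    for _, i, j in candidates:
        if i in mapping or j in used_reactant_atoms:
            continue  # this product atom or reactant atom is already assigned
        mapping[i] = j
        used_reactant_atoms.add(j)
    return mapping


# Toy attention weights (illustrative values, not model output).
TOY_ATTENTION = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

if __name__ == "__main__":
    print(map_atoms_from_attention(TOY_ATTENTION))
```

In the toy matrix, product atoms 0 and 1 both attend most strongly to reactant atom 0; the greedy pass gives that atom to the stronger link and falls back to the next-best reactant atom for the other.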


2021 ◽  
Vol 65 (1) ◽  
pp. 11-22
Author(s):  
Mengyao Lu ◽  
Shuwen Jiang ◽  
Cong Wang ◽  
Dong Chen ◽  
Tian’en Chen

Highlights: A classification model for the front and back sides of tobacco leaves was developed for application in industry. A tobacco leaf grading method that combines a CNN with double-branch integration was proposed. The A-ResNet network was proposed and compared with other classic CNN networks. The grading accuracy of eight different grades was 91.30% and the testing time was 82.180 ms, showing a relatively high classification accuracy and efficiency.
Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades in south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks, such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms. The proposed method achieved a relatively high grading accuracy and efficiency. It provides a method for industrial implementation of the tobacco leaf grading and offers a new approach for the quality grading of other agricultural products.
Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading
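The abstract does not say how the outputs of the global-image and local-patch models are integrated. One common option, shown here purely as an assumption, is a weighted average of the two per-grade probability vectors followed by picking the highest-scoring grade. The function name, the weight, and the example probabilities are all hypothetical.

```python
def integrate_branches(global_probs, local_probs, global_weight=0.5):
    """Combine two per-grade probability vectors by weighted averaging.

    Returns (index of the winning grade, combined probability vector).
    Weighted averaging is an assumed integration rule, not the paper's.
    """
    if len(global_probs) != len(local_probs):
        raise ValueError("both branches must score the same number of grades")
    if not global_probs:
        raise ValueError("probability vectors must not be empty")
    local_weight = 1.0 - global_weight
    combined = [
        global_weight * g + local_weight * l
        for g, l in zip(global_probs, local_probs)
    ]
    best_grade = max(range(len(combined)), key=combined.__getitem__)
    return best_grade, combined


if __name__ == "__main__":
    # Three grades only, to keep the toy example short.
    grade, scores = integrate_branches([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
    print(grade, [round(s, 2) for s in scores])
```

With equal weights, a confident prediction from one branch can outvote a weaker, conflicting prediction from the other, which is the usual motivation for combining a global view with local patches.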


2021 ◽  
pp. 36-43
Author(s):  
L. A. Demidova ◽  
A. V. Filatov

The article considers an approach to the problem of monitoring and classifying the states of hard disks, a problem that is solved on a regular basis within the framework of the concept of non-destructive testing. It is proposed to solve this problem by developing a classification model using machine learning algorithms, in particular recurrent neural networks with Simple RNN, LSTM and GRU architectures. To develop the classification model, a data set based on the values of SMART sensors installed on hard disks is used. It represents a group of multidimensional time series. The structure of the classification model contains two layers of a neural network with one of the recurrent architectures, as well as a Dropout layer and a Dense layer. The results of experimental studies confirming the advantages of LSTM and GRU architectures as part of hard disk state classification models are presented.


2021 ◽  
Author(s):  
Pavlos Karagiannidis ◽  
Nikolaos Themelis

The paper examines data-driven techniques for modeling ship propulsion that could support an emissions-reduction strategy and be used to optimize a fleet’s operations. A large, high-frequency, automatically collected data set is exploited to produce models that estimate the required shaft power or the main engine’s fuel consumption of a container ship sailing under arbitrary conditions. A variety of statistical calculations and data-processing algorithms are implemented, and state-of-the-art techniques for training and optimizing Feed-Forward Neural Networks (FNNs) are applied. Emphasis is placed on the pre-processing of the data, and the results indicate that a proper filtering and preparation stage can significantly increase the model’s accuracy. This, in turn, improves our prediction ability and our awareness of the actual condition of the ship's hull and propeller.


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8453
Author(s):  
Rafia Nishat Toma ◽  
Farzin Piltan ◽  
Jong-Myon Kim

Fault diagnosis and classification for machines are integral to condition monitoring in the industrial sector. However, in recent times, as sensor technology and artificial intelligence have developed, data-driven fault diagnosis and classification have been more widely investigated. The data-driven approach requires good-quality features to attain good fault classification accuracy, yet domain expertise and a fair amount of labeled data are important for better features. This paper proposes a deep auto-encoder (DAE) and convolutional neural network (CNN)-based bearing fault classification model using motor current signals of an induction motor (IM). Motor current signals can be easily and non-invasively collected from the motor. However, the current signal collected from industrial sources is highly contaminated with noise; feature calculation thus becomes very challenging. The DAE is utilized for estimating the nonlinear function of the system with the normal state data, and later, the residual signal is obtained. The subsequent CNN model then successfully classified the types of faults from the residual signals. Our proposed semi-supervised approach achieved very high classification accuracy (more than 99%). The inclusion of DAE was found to not only improve the accuracy significantly but also to be potentially useful when the amount of labeled data is small. The experimental outcomes are compared with some existing works on the same dataset, and the performance of this proposed combined approach is found to be comparable with them. In terms of the classification accuracy and other evaluation parameters, the overall method can be considered as an effective approach for bearing fault classification using the motor current signal.


2019 ◽  
Vol 16 (1) ◽  
pp. 93-113 ◽  
Author(s):  
Shengye Pang ◽  
Guobing Zou ◽  
Yanglan Gan ◽  
Sen Niu ◽  
Bofeng Zhang

Web service classification has become an urgent demand in service-oriented applications. Most existing classification algorithms rely mainly on the original service descriptions. This leads to low classification accuracy, since those descriptions cannot fully reflect the semantic features specific to a service category. To solve the issue, this article proposes a novel approach for web service classification, including service topic feature extraction, service functionality augmentation, and service classification model learning. Its distinguishing characteristic is that the original service descriptions are semantically augmented and then used to derive a service classifier via a labeled probabilistic topic model. A benefit of this approach is that it can be applied to an online service management platform, where it helps service providers with the registration process. Extensive experiments have been conducted on a large-scale real-world data set crawled from ProgrammableWeb. The results demonstrate that it outperforms state-of-the-art methods in terms of service classification accuracy and convergence speed.

