Data-Driven Chemical Reaction Classification, Fingerprinting and Clustering using Attention-Based Neural Networks

Organic reactions are usually clustered in classes that collect entities undergoing similar structural rearrangement. The classification process is a tedious task, requiring first an accurate mapping of the rearrangement (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present a transformer-based model that infers reaction classes from the SMILES representation of chemical reactions. The model reaches an accuracy of 93.8% for a multi-class classification task involving several hundred different classes. The attention weights provided by the model give an insight into what parts of the SMILES strings are taken into account for classification, based solely on data. We study the incorrect predictions of our model and show that it uncovers different biases and mistakes in the underlying data set.
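
A transformer that reads reaction SMILES first needs the string split into tokens. The abstract does not say how this is done, so the following is only a minimal sketch under assumptions: the regular expression is a pattern commonly used for SMILES tokenization, and the function name `tokenize_reaction` is hypothetical.

```python
import re
from typing import List

# Assumed tokenization pattern (not stated in the abstract): bracket atoms,
# two-letter halogens, single-letter atoms, bonds, ring closures, and the
# ">" / ">>" reaction separators each become one token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_reaction(rxn_smiles: str) -> List[str]:
    """Split a reaction SMILES string into tokens for a sequence model."""
    tokens = SMILES_TOKEN.findall(rxn_smiles)
    # If joining the tokens does not reproduce the input, some character
    # was not covered by the pattern and was silently skipped.
    if "".join(tokens) != rxn_smiles:
        raise ValueError(f"Could not tokenize: {rxn_smiles!r}")
    return tokens


if __name__ == "__main__":
    # Esterification of acetic acid with ethanol, written as reactants>>product.
    print(tokenize_reaction("CC(=O)O.OCC>>CC(=O)OCC"))
```

The round-trip check is a deliberate choice: it turns unsupported characters into an explicit error rather than a silently shortened input.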

Mapping the Space of Chemical Reactions using Attention-Based Neural Networks

10.26434/chemrxiv.9897365.v3 ◽

2020 ◽

Author(s):

Philippe Schwaller ◽

Daniel Probst ◽

Alain C. Vaucher ◽

Vishnu H Nair ◽

David Kreutter ◽

...

Keyword(s):

Neural Networks ◽

Reaction Center ◽

Chemical Reactions ◽

Chemical Reaction ◽

Classification Accuracy ◽

Similarity Searching ◽

Reaction Space ◽

Fine Grained ◽

Visual Clustering ◽

Better Than

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints which capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.

Code: https://github.com/rxn4chemistry/rxnfp
Tutorials: https://rxn4chemistry.github.io/rxnfp/
Interactive reaction atlas: https://rxn4chemistry.github.io/rxnfp//tmaps/tmap_ft_10k.html
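
Similarity searching over reaction fingerprints can be illustrated with a small nearest-neighbour lookup. This is a sketch under assumptions, not the rxnfp implementation: the three-dimensional vectors are toy values standing in for learned fingerprints, cosine similarity is an assumed metric (the abstract does not name one), and the names `cosine_similarity` and `nearest_reactions` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for all-zero vectors
    return dot / (norm_a * norm_b)


def nearest_reactions(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query fingerprint."""
    scored = [(name, cosine_similarity(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy fingerprints (assumption): real ones would come from the trained model.
    library = {
        "amide_coupling_a": [0.9, 0.1, 0.0],
        "amide_coupling_b": [0.8, 0.2, 0.1],
        "suzuki_coupling": [0.0, 0.1, 0.95],
    }
    for name, score in nearest_reactions([1.0, 0.1, 0.0], library, k=2):
        print(f"{name}: {score:.3f}")
```

A linear scan like this is only practical for small libraries; an approximate nearest-neighbour index would be the usual choice at scale.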

Mapping the Space of Chemical Reactions using Attention-Based Neural Networks

10.26434/chemrxiv.9897365.v4 ◽

2020 ◽

Author(s):

Philippe Schwaller ◽

Daniel Probst ◽

Alain C. Vaucher ◽

Vishnu H Nair ◽

David Kreutter ◽

...

Keyword(s):

Neural Networks ◽

Reaction Center ◽

Chemical Reactions ◽

Chemical Reaction ◽

Classification Accuracy ◽

Similarity Searching ◽

Reaction Space ◽

Fine Grained ◽

Visual Clustering ◽

Better Than

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints which capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.

Code: https://github.com/rxn4chemistry/rxnfp
Tutorials: https://rxn4chemistry.github.io/rxnfp/
Interactive reaction atlas: https://rxn4chemistry.github.io/rxnfp//tmaps/tmap_ft_10k.html

Mapping the Space of Chemical Reactions using Attention-Based Neural Networks

10.26434/chemrxiv.9897365 ◽

2020 ◽

Cited By ~ 1

Author(s):

Philippe Schwaller ◽

Daniel Probst ◽

Alain C. Vaucher ◽

Vishnu H Nair ◽

David Kreutter ◽

...

Keyword(s):

Neural Networks ◽

Reaction Center ◽

Chemical Reactions ◽

Chemical Reaction ◽

Classification Accuracy ◽

Similarity Searching ◽

Reaction Space ◽

Fine Grained ◽

Visual Clustering ◽

Better Than

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints which capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.

Code: https://github.com/rxn4chemistry/rxnfp
Tutorials: https://rxn4chemistry.github.io/rxnfp/
Interactive reaction atlas: https://rxn4chemistry.github.io/rxnfp//tmaps/tmap_ft_10k.html

Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Science Advances ◽

10.1126/sciadv.abe4166 ◽

2021 ◽

Vol 7 (15) ◽

pp. eabe4166

Author(s):

Philippe Schwaller ◽

Benjamin Hoover ◽

Jean-Louis Reymond ◽

Hendrik Strobelt ◽

Teodoro Laino

Keyword(s):

Organic Chemistry ◽

Neural Networks ◽

Chemical Synthesis ◽

Unsupervised Learning ◽

Chemical Reactions ◽

Data Driven ◽

Experimental Task ◽

Rule Based ◽

Atom Mapping ◽

Mapping Information

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
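
The idea of reading an atom mapping off attention weights can be sketched as follows. This is a simplified illustration under assumptions, not the published algorithm: the attention matrix is made up (rows are product atoms, columns are reactant atoms), and the greedy one-to-one assignment in the hypothetical function `greedy_atom_map` is just one plausible way to turn weights into a mapping.

```python
from typing import Dict, List


def greedy_atom_map(attention: List[List[float]]) -> Dict[int, int]:
    """Map each product atom to a distinct reactant atom.

    Pairs are visited from highest to lowest attention weight; a pair is
    accepted only if neither atom has already been assigned.
    """
    pairs = [
        (weight, product_idx, reactant_idx)
        for product_idx, row in enumerate(attention)
        for reactant_idx, weight in enumerate(row)
    ]
    pairs.sort(key=lambda item: -item[0])  # stable sort, strongest first

    mapping: Dict[int, int] = {}
    used_reactants = set()
    for _, product_idx, reactant_idx in pairs:
        if product_idx in mapping or reactant_idx in used_reactants:
            continue
        mapping[product_idx] = reactant_idx
        used_reactants.add(reactant_idx)
    return mapping


if __name__ == "__main__":
    # Made-up attention weights for 3 product atoms over 3 reactant atoms.
    attention = [
        [0.1, 0.7, 0.2],
        [0.6, 0.3, 0.1],
        [0.2, 0.6, 0.2],
    ]
    print(greedy_atom_map(attention))
```

In the toy matrix, product atoms 0 and 2 both attend most strongly to reactant atom 1; the greedy pass gives atom 1 to the stronger claim and assigns the other product atom to the best remaining reactant atom.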

Tobacco Leaf Grading Based on Deep Convolutional Neural Networks and Machine Vision

Journal of the ASABE ◽

10.13031/ja.14537 ◽

2021 ◽

Vol 65 (1) ◽

pp. 11-22

Author(s):

Mengyao Lu ◽

Shuwen Jiang ◽

Cong Wang ◽

Dong Chen ◽

Tian’en Chen

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Transfer Learning ◽

Convolutional Neural Networks ◽

Classification Accuracy ◽

Classification Model ◽

List Type ◽

Tobacco Leaves ◽

Tobacco Leaf ◽

Grading Model

Highlights: A classification model for the front and back sides of tobacco leaves was developed for application in industry. A tobacco leaf grading method that combines a CNN with double-branch integration was proposed. The A-ResNet network was proposed and compared with other classic CNN networks. The grading accuracy of eight different grades was 91.30% and the testing time was 82.180 ms, showing a relatively high classification accuracy and efficiency.

Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades in south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks, such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms. The proposed method achieved a relatively high grading accuracy and efficiency. It provides a method for industrial implementation of the tobacco leaf grading and offers a new approach for the quality grading of other agricultural products.

Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading
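
The final integration step, where the outputs of the global-image and local-patch grading models are combined, might look like the sketch below. The abstract does not describe the integration rule, so the weighted average of class probabilities used here is an assumption, as are the function name `fuse_predictions`, the `global_weight` parameter, and the example probabilities.

```python
from typing import List, Sequence, Tuple


def fuse_predictions(
    global_probs: Sequence[float],
    local_probs: Sequence[float],
    global_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Combine per-grade probabilities from the two branches.

    Returns the index of the winning grade and the fused probabilities.
    """
    if len(global_probs) != len(local_probs):
        raise ValueError("Both branches must score the same set of grades")
    if not 0.0 <= global_weight <= 1.0:
        raise ValueError("global_weight must lie in [0, 1]")

    local_weight = 1.0 - global_weight
    fused = [
        global_weight * g + local_weight * l
        for g, l in zip(global_probs, local_probs)
    ]
    best_grade = max(range(len(fused)), key=fused.__getitem__)
    return best_grade, fused


if __name__ == "__main__":
    # Made-up outputs for three grades (the study itself uses eight).
    grade, fused = fuse_predictions([0.5, 0.3, 0.2], [0.2, 0.6, 0.2])
    print(grade, fused)
```

With equal weights, the local branch's strong vote for the second grade outweighs the global branch's weaker preference for the first, which is the kind of disagreement an integration step is meant to resolve.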

MONITORING AND CLASSIFYING THE STATE OF HARD DISKS USING RECURRENT NEURAL NETWORKS

Kontrol Diagnostika ◽

10.14489/td.2021.10.pp.036-043 ◽

2021 ◽

pp. 36-43

Author(s):

L. A. Demidova ◽

A. V. Filatov

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Experimental Studies ◽

Machine Learning Algorithms ◽

Classification Model ◽

Dense Layer ◽

Hard Disks ◽

Destructive Testing ◽

Data Set ◽

State Classification

The article considers an approach to the recurring problem of monitoring and classifying the states of hard disks within the framework of non-destructive testing. It is proposed to solve this problem by developing a classification model using machine learning algorithms, in particular recurrent neural networks with Simple RNN, LSTM and GRU architectures. To develop the classification model, a data set based on the values of SMART sensors installed on hard disks is used; it represents a group of multidimensional time series. The classification model contains two neural network layers with one of the recurrent architectures, followed by a Dropout layer and a Dense layer. The results of experimental studies confirming the advantages of LSTM and GRU architectures as part of hard disk state classification models are presented.

Data-Driven Ship Propulsion Modeling with Artificial Neural Networks

10.5957/some-2021-011 ◽

2021 ◽

Author(s):

Pavlos Karagiannidis ◽

Nikolaos Themelis

Keyword(s):

Neural Networks ◽

Data Driven ◽

Actual Condition ◽

Data Set ◽

Prediction Ability ◽

Ship Propulsion ◽

Preparation Stage ◽

Shaft Power ◽

Feed Forward Neural Networks ◽

Reduction Of Emissions

The paper examines data-driven techniques for the modeling of ship propulsion that could support a strategy for the reduction of emissions and be utilized for the optimization of a fleet’s operations. A large, high-frequency, automatically collected data set is exploited to produce models that estimate the required shaft power or the main engine’s fuel consumption for a container ship sailing under arbitrary conditions. A variety of statistical calculations and algorithms for data processing are implemented, and state-of-the-art techniques for training and optimizing Feed-Forward Neural Networks (FNNs) are applied. Emphasis is placed on the pre-processing of the data, and the results indicate that a proper filtering and preparation stage can significantly increase the model’s accuracy, and thus improve our prediction ability and our awareness of the actual condition of the ship's hull and propeller.

A Deep Autoencoder-Based Convolution Neural Network Framework for Bearing Fault Classification in Induction Motors

Sensors ◽

10.3390/s21248453 ◽

2021 ◽

Vol 21 (24) ◽

pp. 8453

Author(s):

Rafia Nishat Toma ◽

Farzin Piltan ◽

Jong-Myon Kim

Keyword(s):

Neural Network ◽

Fault Diagnosis ◽

Classification Accuracy ◽

Classification Model ◽

Data Driven ◽

Fault Classification ◽

Current Signal ◽

Bearing Fault ◽

Motor Current ◽

Diagnosis And Classification

Fault diagnosis and classification for machines are integral to condition monitoring in the industrial sector. In recent times, as sensor technology and artificial intelligence have developed, data-driven fault diagnosis and classification have been more widely investigated. The data-driven approach requires good-quality features to attain good fault classification accuracy, yet domain expertise and a fair amount of labeled data are important for better features. This paper proposes a deep auto-encoder (DAE) and convolutional neural network (CNN)-based bearing fault classification model using motor current signals of an induction motor (IM). Motor current signals can be easily and non-invasively collected from the motor. However, the current signal collected from industrial sources is highly contaminated with noise; feature calculation thus becomes very challenging. The DAE is utilized for estimating the nonlinear function of the system with the normal state data, and later, the residual signal is obtained. The subsequent CNN model then classifies the types of faults from the residual signals. Our proposed semi-supervised approach achieved very high classification accuracy (more than 99%). The inclusion of the DAE was found not only to improve the accuracy significantly but also to be potentially useful when the amount of labeled data is small. The experimental outcomes are compared with some existing works on the same dataset, and the performance of the proposed combined approach is found to be comparable with them. In terms of the classification accuracy and other evaluation parameters, the overall method can be considered an effective approach for bearing fault classification using the motor current signal.

Augmenting Labeled Probabilistic Topic Model for Web Service Classification

International Journal of Web Services Research ◽

10.4018/ijwsr.2019010105 ◽

2019 ◽

Vol 16 (1) ◽

pp. 93-113 ◽

Cited By ~ 3

Author(s):

Shengye Pang ◽

Guobing Zou ◽

Yanglan Gan ◽

Sen Niu ◽

Bofeng Zhang

Keyword(s):

Web Service ◽

Classification Accuracy ◽

Topic Model ◽

Service Providers ◽

Classification Model ◽

Model Learning ◽

Data Set ◽

Management Platform ◽

Probabilistic Topic Model ◽

Web Service Classification

Web service classification has become an urgent demand in service-oriented applications. Most existing classification algorithms rely mainly on the original service descriptions. This leads to low classification accuracy, since the descriptions cannot fully reflect the semantic features specific to a service category. To solve this issue, this article proposes a novel approach for web service classification, including service topic feature extraction, service functionality augmentation, and service classification model learning. Its distinguishing characteristic is that the original service descriptions are semantically augmented and then used to derive a service classifier via a labeled probabilistic topic model. A benefit of this approach is that it can be applied to an online service management platform, where it assists service providers during the registration process. Extensive experiments have been conducted on a large-scale real-world data set crawled from ProgrammableWeb. The results demonstrate that it outperforms state-of-the-art methods in terms of service classification accuracy and convergence speed.
