STOUT: SMILES to IUPAC names using neural machine translation

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. The International Union of Pure and Applied Chemistry (IUPAC) established a universally accepted, rule-based naming scheme for chemistry. Due to the complexity of this rule set, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e., predicting the SMILES string from the IUPAC name. The open system achieves a test accuracy of about 90%, and even incorrect predictions show a remarkable similarity between true and predicted compounds.
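The abstract does not say how the similarity between true and predicted compounds is measured. One common choice in cheminformatics is the Jaccard–Tanimoto coefficient on binary fingerprints, and the following is a minimal sketch under that assumption. The function name `tanimoto` and the 8-bit fingerprints are hypothetical, chosen only for illustration; they are not taken from STOUT.

```python
def tanimoto(bits_a: str, bits_b: str) -> float:
    """Jaccard-Tanimoto coefficient of two equal-length binary strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    # Collect the positions of the bits that are switched on in each string.
    on_a = {i for i, bit in enumerate(bits_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(bits_b) if bit == "1"}
    union = on_a | on_b
    if not union:
        # Convention assumed here: two all-zero fingerprints count as identical.
        return 1.0
    # Shared on-bits divided by on-bits present in either fingerprint.
    return len(on_a & on_b) / len(union)


# Hypothetical fingerprints for a "true" and a "predicted" compound.
true_fp = "11010010"
pred_fp = "11000011"
print(tanimoto(true_fp, pred_fp))
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no on-bits, so a wrong prediction that still scores close to 1.0 is structurally near the true compound.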


STOUT: SMILES to IUPAC Names Using Neural Machine Translation

10.26434/chemrxiv.13469202.v1 ◽

2020 ◽

Author(s):

Kohulan Rajan ◽

Achim Zielesny ◽

Christoph Steinbeck

Keyword(s):

Machine Translation ◽

Chemical Compounds ◽

Test Accuracy ◽

Human Beings ◽

International Union ◽

Rule Based ◽

Neural Machine Translation ◽

Reverse Translation ◽

Rule Set ◽

String Representation



A Modification of the Jaccard–Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings

Technometrics ◽

10.1198/004017002317375064 ◽

2002 ◽

Vol 44 (2) ◽

pp. 110-119 ◽

Cited By ~ 87

Author(s):

Michael A Fligner ◽

Joseph S Verducci ◽

Paul E Blower

Keyword(s):

Similarity Index ◽

Chemical Compounds ◽

Tanimoto Similarity ◽

Binary Strings ◽

Selection Of


Improving Neural Machine Translation Using Rule-Based Machine Translation

2019 7th International Conference on Smart Computing & Communications (ICSCC) ◽

10.1109/icscc.2019.8843685 ◽

2019 ◽

Author(s):

Muskaan Singh ◽

Ravinder Kumar ◽

Inderveer Chana

Keyword(s):

Machine Translation ◽

Rule Based ◽

Neural Machine Translation


POS-Tagging based Neural Machine Translation System for European Languages using Transformers

WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS ◽

10.37394/23209.2021.18.5 ◽

2021 ◽

Vol 18 ◽

pp. 26-33

Author(s):

Preetham Ganesh ◽

Bharat S. Rawal ◽

Alexander Peter ◽

Andi Giri

Keyword(s):

Machine Translation ◽

Native Speaker ◽

Translation System ◽

Human Beings ◽

Multiple Sources ◽

Neural Machine Translation ◽

Pos Tagging ◽

European Languages ◽

Short Period ◽

Testing Stage

The interaction between human beings has always faced different kinds of difficulties. One of those difficulties is the language barrier. It would be a tedious task for someone to learn all the syllables in a new language in a short period and converse with a native speaker without grammatical errors. Moreover, having a language translator at all times would be intrusive and expensive. We propose a novel approach to a Neural Machine Translation (NMT) system using interlanguage word similarity-based model training and Part-Of-Speech (POS) tagging-based model testing. We compare these approaches using two classical architectures: a Luong attention-based Sequence-to-Sequence architecture and a Transformer-based model. The sentences for the Luong attention-based Sequence-to-Sequence model were tokenized using the SentencePiece tokenizer. The sentences for the Transformer model were tokenized using the Subword Text Encoder. Three European languages were selected for modeling, namely Spanish, French, and German. The datasets were downloaded from multiple sources such as the Europarl Corpus, the Paracrawl Corpus, and the Tatoeba Project Corpus. Sparse Categorical Cross-Entropy was the evaluation metric during the training stage; during the testing stage, the evaluation metrics were the Bilingual Evaluation Understudy (BLEU) score, the Precision score, and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score.
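As a rough illustration of the training-stage metric named in the abstract, sparse categorical cross-entropy is the mean negative log-probability that a model assigns to the correct token, with targets given as integer ids rather than one-hot vectors. The sketch below is a plain-Python version under that standard definition; the function name, the clipping constant `eps`, and the toy probabilities are assumptions for illustration and do not come from the paper.

```python
import math


def sparse_categorical_cross_entropy(probs, targets, eps=1e-12):
    """Mean negative log-probability of the correct integer token ids.

    probs:   one probability distribution over the vocabulary per position
    targets: the correct token id for each position
    """
    if not targets or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")
    total = 0.0
    for dist, target in zip(probs, targets):
        # Clip to eps so a predicted probability of zero does not break log().
        total -= math.log(max(dist[target], eps))
    return total / len(targets)


# Toy example: a vocabulary of three tokens and two output positions.
predicted = [
    [0.5, 0.25, 0.25],  # the correct token (id 0) gets probability 0.5
    [0.1, 0.8, 0.1],    # the correct token (id 1) gets probability 0.8
]
correct_ids = [0, 1]
print(sparse_categorical_cross_entropy(predicted, correct_ids))
```

The loss is zero only when every correct token receives probability 1, and it grows as the model shifts probability mass onto wrong tokens.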


Application of Translation Technologies in the Translation of IMTFE Transcripts

English Language and Literature Studies ◽

10.5539/ells.v11n1p51 ◽

2021 ◽

Vol 11 (1) ◽

pp. 51

Author(s):

Danhua Huang

Keyword(s):

Artificial Intelligence ◽

Machine Translation ◽

First Generation ◽

Service Industry ◽

Fourth Generation ◽

Rule Based ◽

Neural Machine Translation ◽

Translation Model ◽

Modern Language ◽

Text Features

Machine translation has grown very rapidly in recent times due to developments in big data, artificial intelligence, and cloud computing software and techniques. The first generation of Rule-Based Machine Translation has been replaced by the fourth generation of Neural Machine Translation, based on complex deep learning and networking models. Translation models have also undergone tremendous changes. The traditional translation model fails to meet the needs of the modern language service industry. Doubt still exists as to whether translation technologies can enhance quality in the translation of non-technical texts. So far no one has discussed the application of these technologies to the translation of IMTFE Transcripts. This paper aims to show that translation technologies can be applied to the translation of the IMTFE Transcripts on which the author works. By analyzing the text features of IMTFE Transcripts and applying different translation technologies in the translation, the author finds that the quality and efficiency of translation have been greatly enhanced. It is concluded that translation technologies can be used to facilitate the translation of texts of different kinds. Despite the drawbacks of these technologies, translators can still benefit a lot from adopting them.


Various Approaches of Machine Translation for Marathi to English Language

ITM Web of Conferences ◽

10.1051/itmconf/20214003026 ◽

2021 ◽

Vol 40 ◽

pp. 03026

Author(s):

Nilesh Shirsath ◽

Aniruddha Velankar ◽

Ranjeet Patil ◽

Shilpa Shinde

Keyword(s):

Natural Language ◽

Machine Translation ◽

English Language ◽

Human Intervention ◽

Rule Based ◽

Neural Machine Translation ◽

Comprehensive Review ◽

Generic Term ◽

Rule Based Approach

Machine Translation (MT) is a generic term for computerised systems that generate translations from one natural language to another, with or without human intervention. Text may be used to examine knowledge, and turning that information into pictures helps people to communicate and acquire information. There seems to be a lot of work conducted on translating English to Hindi, Tamil, Bangla and other languages. The important parts of translation are to provide translated sentences with correct words and proper grammar. There has been a comprehensive review of 10 primary publications used in research. Two separate approaches are proposed to translate basic Marathi phrases to English: one uses a rule-based approach and the other uses a neural machine translation approach. While designed primarily for Marathi-English language pairs, the design can be applied to other language pairs with a similar structure.
