STOUT: SMILES to IUPAC names using neural machine translation

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

Abstract: Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this rule set, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system predicts with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show a remarkable similarity between the true and predicted compounds.
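The Tanimoto similarity index mentioned in the abstract can be illustrated with a minimal sketch. It assumes each molecule has already been reduced to a fingerprint, represented here as a set of "on" bit indices; the abstract does not say which fingerprint was used, and the function name and bit values below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices.

    Defined as |A & B| / |A | B|: 1.0 for identical sets, 0.0 for disjoint sets.
    """
    if not bits_a and not bits_b:
        # Two empty fingerprints are treated as identical (a convention chosen here).
        return 1.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical fingerprints for a true and a predicted structure.
true_bits = {1, 4, 7, 9, 12}
pred_bits = {1, 4, 7, 9, 15}
score = tanimoto(true_bits, pred_bits)
```

A score close to 1 means the predicted structure shares almost all fingerprint features with the true one, which is how a wrong prediction can still be described as chemically similar.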

2021 ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this rule set, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e., predicting the SMILES string from the IUPAC name. The open system demonstrates a test accuracy of about 90% correct predictions; incorrect predictions also show a remarkable similarity between the true and predicted compounds.


2020 ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this rule set, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e., predicting the SMILES string from the IUPAC name. The open system demonstrates a test accuracy of about 90% correct predictions; incorrect predictions also show a remarkable similarity between the true and predicted compounds.


Author(s):  
Preetham Ganesh ◽  
Bharat S. Rawal ◽  
Alexander Peter ◽  
Andi Giri

Interaction between human beings has always faced different kinds of difficulties. One of those difficulties is the language barrier. It would be a tedious task for someone to learn all the syllables of a new language in a short period and converse with a native speaker without grammatical errors. Moreover, having a human translator present at all times would be intrusive and expensive. We propose a novel approach to a Neural Machine Translation (NMT) system using interlanguage word similarity-based model training and Part-Of-Speech (POS) tagging-based model testing. We compare these approaches using two classical architectures: a Luong attention-based Sequence-to-Sequence architecture and a Transformer-based model. The sentences for the Luong attention-based Sequence-to-Sequence model were tokenized using the SentencePiece tokenizer. The sentences for the Transformer model were tokenized using the Subword Text Encoder. Three European languages were selected for modeling, namely Spanish, French, and German. The datasets were downloaded from multiple sources such as the Europarl Corpus, the Paracrawl Corpus, and the Tatoeba Project Corpus. Sparse Categorical Cross-Entropy was the evaluation metric during the training stage, and during the testing stage, the Bilingual Evaluation Understudy (BLEU) score, Precision score, and Metric for Evaluation of Translation with Explicit Ordering (METEOR) score were the evaluation metrics.
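As an illustration of the BLEU score named above, the following is a minimal sketch of sentence-level BLEU against a single reference: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is not the authors' evaluation code; the function names and example sentences are hypothetical, and no smoothing is applied, so any n-gram order with zero matches gives a score of 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU for one reference and one candidate."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical tokenized sentences.
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = bleu(reference, candidate)
```

Corpus-level BLEU, as usually reported in papers, pools the n-gram counts over all sentences before taking the precisions, so averaged sentence scores from this sketch would not match it exactly.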


2021 ◽  
Vol 11 (1) ◽  
pp. 51
Author(s):  
Danhua Huang

Machine translation has grown very rapidly in recent times due to developments in big data, artificial intelligence, and cloud computing software and techniques. The first generation, Rule-Based Machine Translation, has been replaced by the fourth generation, Neural Machine Translation, which is based on complex deep learning and network models. Translation models have also undergone tremendous changes. The traditional translation model fails to meet the needs of the modern language service industry. Doubt remains as to whether translation technologies can enhance the quality of translations of non-technical texts. So far no one has discussed the application of these technologies to the translation of IMTFE Transcripts. This paper aims to show that translation technologies can be applied to the translation of the IMTFE Transcripts on which the author works. By analyzing the text features of the IMTFE Transcripts and applying different translation technologies in the translation, the author finds that the quality and efficiency of translation are greatly enhanced. It is concluded that translation technologies can be used to facilitate the translation of texts of different kinds. In spite of the drawbacks of these technologies, translators can still benefit a lot from adopting them.


2021 ◽  
Vol 40 ◽  
pp. 03026
Author(s):  
Nilesh Shirsath ◽  
Aniruddha Velankar ◽  
Ranjeet Patil ◽  
Shilpa Shinde

Machine Translation (MT) is a generic term for computerised systems that generate translations from one natural language to another, with or without human intervention. Text may be used to examine knowledge, and turning that information into pictures helps people to communicate and acquire information. A lot of work has been conducted on translating English to Hindi, Tamil, Bangla and other languages. The important parts of translation are to provide translated sentences with correct words and proper grammar. A comprehensive review of 10 primary publications used in research is presented. Two separate approaches are proposed to translate basic Marathi phrases to English: one uses a rule-based approach and the other uses a neural machine translation approach. While designed primarily for the Marathi-English language pair, the design can be applied to other language pairs with a similar structure.


Author(s):  
Caio César Moraes de Oliveira ◽  
Thaís Gaudencio do Rêgo ◽  
Manuella Aschoff Cavalcanti Brandão Lima ◽  
Tiago Maritan Ugulino de Araújo
