Neuraldecipher - Reverse-Engineering ECFP Fingerprints to Their Molecular Structures

Author(s):  
Tuan Le ◽  
Robin Winter ◽  
Frank Noé ◽  
Djork-Arné Clevert

Protecting molecular structures from disclosure to external parties is highly relevant for industrial and private organizations, such as pharmaceutical companies. In external collaborations, it is common to exchange datasets by encoding the molecular structures into descriptors. Molecular fingerprints such as extended-connectivity fingerprints (ECFPs) are frequently used for such exchanges because they typically perform well on quantitative structure-activity relationship (QSAR) tasks.

ECFPs are often considered to be non-invertible because of the way they are computed.

In this paper, we present a reverse-engineering method that deduces the molecular structure from revealed ECFPs. Our method includes the Neuraldecipher, a neural network model that predicts a compact vector representation of a compound from its ECFP. We then use a separate pre-trained model to retrieve the molecular structure as a SMILES representation. We demonstrate that our method can reconstruct molecular structures to some extent and that its performance improves when ECFPs with larger fingerprint sizes are revealed. For example, given ECFP count vectors of length 4096, our method correctly deduces around 60% of the molecular structures in a validation set of 112K unique samples.
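To illustrate why ECFPs are often considered non-invertible, the Python sketch below mimics the general idea behind ECFP generation: each atom's identifier is repeatedly hashed together with its neighbors' identifiers, and all identifiers are then folded into a fixed-length count vector. This is a simplified illustration, not RDKit's implementation and not the authors' code. The atom invariant (atomic number and degree only), the SHA-1-based hash, the omission of bond orders and duplicate-substructure removal, and the function names are all assumptions. Folding with a modulo maps many possible identifiers into each bin, which plausibly explains why larger fingerprint sizes such as 4096 retain more recoverable information.

```python
import hashlib


def _hash_ints(values):
    """Deterministic 32-bit hash of a tuple of ints (a stand-in for the real ECFP hash)."""
    digest = hashlib.sha1(repr(tuple(values)).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def ecfp_like_counts(atoms, bonds, radius=2, n_bits=4096):
    """Toy ECFP-style count vector (hypothetical helper, simplified algorithm).

    atoms: list of atomic numbers, one per heavy atom
    bonds: list of undirected (i, j) atom-index pairs
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: initial identifier from an assumed atom invariant (atomic number, degree).
    ids = [_hash_ints((z, len(neighbors[i]))) for i, z in enumerate(atoms)]
    all_ids = list(ids)

    # Each further iteration hashes an atom's own id with the sorted ids of its neighbors,
    # so the identifier describes a circular environment of growing radius.
    for _ in range(radius):
        ids = [
            _hash_ints((ids[i],) + tuple(sorted(ids[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ]
        all_ids.extend(ids)

    # Fold all identifiers into a fixed-length count vector; different substructures
    # can land in the same bin, which discards information.
    counts = [0] * n_bits
    for identifier in all_ids:
        counts[identifier % n_bits] += 1
    return counts


# Heavy-atom graph of ethanol: C-C-O.
fp = ecfp_like_counts([6, 6, 8], [(0, 1), (1, 2)])
print(len(fp), sum(fp))
```

For ethanol the vector has 4096 entries whose counts sum to 9 (3 atoms times 3 iterations); shrinking `n_bits` keeps that total but increases the chance that distinct environments share a bin.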


2020 ◽  
Vol 11 (38) ◽  
pp. 10378-10389
Author(s):  
Tuan Le ◽  
Robin Winter ◽  
Frank Noé ◽  
Djork-Arné Clevert

Protecting molecular structures from disclosure to external parties is highly relevant for industrial and private organizations, such as pharmaceutical companies.


Molecules ◽  
2019 ◽  
Vol 25 (1) ◽  
pp. 24 ◽  
Author(s):  
Edgar Márquez ◽  
José R. Mora ◽  
Virginia Flores-Morales ◽  
Daniel Insuasty ◽  
Luis Calle

The antileukemia activity of organic compounds analogous to ellipticine represents a critical endpoint in understanding this disease. A molecular modeling study on a dataset of 23 compounds, all of which comply with Lipinski's rules and are structurally analogous to ellipticine, was performed using the quantitative structure-activity relationship (QSAR) technique, followed by a detailed docking study on three proteins significantly involved in this disease (SYK, PI3K, and BTK). As a result, a model with only four descriptors (HOMO, softness, AC1RABAMBID, and TS1KFABMID) was found to be robust enough to predict the antileukemia activity of the compounds studied in this work, with R² = 0.899 and Q² = 0.730. A favorable interaction between the compounds and their target proteins was found in all cases; in particular, compounds 9 and 22 showed high activity and binding free energies of around −10 kcal/mol. These compounds were evaluated in detail based on their molecular structure, and modifications are suggested to enhance their biological activity. In particular, compounds 22_1, 22_2, 9_1, and 9_2 are proposed as potential new, potent ellipticine derivatives to be synthesized and biologically tested.
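R² and Q² are standard QSAR fit statistics: R² measures how well the model fits its training data, while Q² is commonly computed by leave-one-out cross-validation as 1 − PRESS/SS_tot and estimates internal predictivity. The abstract does not state which validation scheme produced its Q², so leave-one-out is an assumption here, and for brevity this Python sketch fits a single hypothetical descriptor rather than the four used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2_and_q2_loo(xs, ys):
    """Return (R², leave-one-out Q²) for a one-descriptor linear model."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)

    # R²: residuals of the model fitted on all compounds.
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_res / ss_tot

    # PRESS: each compound is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    q2 = 1 - press / ss_tot
    return r2, q2


# Hypothetical descriptor values and activities, for illustration only.
r2, q2 = r2_and_q2_loo([1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
print(r2, q2)
```

For least-squares models, each leave-one-out residual is at least as large as the corresponding fitted residual, so Q² never exceeds R²; a large gap between the two is a common warning sign of overfitting.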


Author(s):  
Ranita Pal ◽  
Pratim Kumar Chattaraj

In the current pandemic-stricken world, quantitative structure-activity relationship (QSAR) analysis has become a necessity in molecular biology and drug design, since it helps estimate the properties and activities of a compound without spending the time and resources needed to synthesize it in the laboratory. Correlating the molecular structure of a compound with its activity depends on the choice of descriptors, which becomes a difficult and confusing task when there are so many to choose from. In this mini-review, the authors delineate the importance of very simple, easy-to-compute descriptors in estimating various molecular properties and toxicity.


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Abaid ur Rehman Virk ◽  
M. A. Rehman ◽  
Ce Shi ◽  
Waqas Nazeer

Topological indices give us a mathematical language for studying molecular structures. They convert a chemical compound into a single number that helps predict properties such as boiling point, viscosity, and radius of gyration. Drugs and other chemical compounds are often modeled as polygonal shapes, trees, and graphs. In this paper, we compute several irregularity indices for the bismuth tri-iodide chain and sheet that are useful in quantitative structure-activity relationship (QSAR) studies.
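Irregularity indices measure how far a molecular graph is from being regular, that is, from every vertex having the same degree. As one well-known example, not necessarily among the indices computed in the paper, the Albertson index sums |d_u − d_v| over all edges uv. The Python sketch below computes it for small hypothetical graphs given as edge lists; it does not reproduce the bismuth tri-iodide chain or sheet.

```python
from collections import defaultdict


def degrees(edges):
    """Vertex degrees of an undirected simple graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def albertson_index(edges):
    """Albertson irregularity: sum over edges uv of |deg(u) - deg(v)|."""
    deg = degrees(edges)
    return sum(abs(deg[u] - deg[v]) for u, v in edges)


# Path on 4 vertices: degrees 1, 2, 2, 1.
path4 = [(0, 1), (1, 2), (2, 3)]
# Star with centre 0 and three leaves: degrees 3, 1, 1, 1.
star = [(0, 1), (0, 2), (0, 3)]
print(albertson_index(path4))  # 2
print(albertson_index(star))   # 6
```

A regular graph, such as a cycle, has an Albertson index of 0, so larger values indicate a more irregular degree distribution along the bonds.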


Author(s):  
Mabrouk Hamadache ◽  
Abdeltif Amrane ◽  
Salah Hanini ◽  
Othmane Benkortbi

Quantitative structure-activity relationship (QSAR) models are expected to play an important role in assessing the risk that chemicals pose to humans and the environment. In this study, a QSAR model based on 10 molecular descriptors was developed and validated to predict the acute oral toxicity of 91 fungicides to rats. Good results (PRESS/SSY = 0.085 and VIF < 5) were obtained, confirming the validity of the descriptors in the model. The best results were obtained with a 10/11/1 artificial neural network trained with the Levenberg-Marquardt algorithm. The prediction accuracy for the external validation set, estimated by Q²ext, was 0.960. Accordingly, the model developed in this study provided excellent predictions and can be used to predict the acute oral toxicity of fungicides, particularly those that have not yet been tested and newly developed ones.
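The "10/11/1" notation describes a feed-forward network with 10 inputs (one per descriptor), 11 hidden neurons, and 1 output (the predicted toxicity). The Python sketch below shows only the forward pass of such a network; the tanh hidden activation, the linear output, the random initialization, and the helper names are assumptions, and Levenberg-Marquardt training is not shown.

```python
import math
import random


def init_mlp(n_in=10, n_hidden=11, n_out=1, seed=0):
    """Randomly initialized weights and zero biases for an n_in/n_hidden/n_out network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def forward(params, x):
    """Forward pass: tanh hidden layer (assumed), linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def n_parameters(params):
    """Total number of trainable weights and biases."""
    w1, b1, w2, b2 = params
    return sum(len(r) for r in w1) + len(b1) + sum(len(r) for r in w2) + len(b2)


params = init_mlp()
print(n_parameters(params))  # 133
print(forward(params, [0.1] * 10))
```

Counting weights and biases gives 10 × 11 + 11 + 11 × 1 + 1 = 133 trainable parameters for this architecture.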


2015 ◽  
Vol 17 (8) ◽  
pp. 1370-1376 ◽  
Author(s):  
S. Jammer ◽  
D. Rizkov ◽  
F. Gelman ◽  
O. Lev

The enantiomeric enrichment caused by enzymatic enantioselective hydrolysis is studied for a homologous series, revealing a correlation between substrate molecular features and the Rayleigh enantiomeric enrichment factor, εER.
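Rayleigh-type enrichment factors are usually obtained as the slope of a log-log plot. As a minimal sketch assuming the generic Rayleigh form ln(R_t/R_0) = ε ln f, where R is the ratio of the two enantiomers and f is the fraction of substrate remaining, the slope ε can be estimated by least squares through the origin. The paper's exact definitions of R and f, and any scaling of εER, may differ, and the function name and data below are hypothetical.

```python
import math


def rayleigh_epsilon(fractions, ratios, r0):
    """Least-squares slope through the origin of ln(R_t / R_0) versus ln(f)."""
    xs = [math.log(f) for f in fractions]
    ys = [math.log(r / r0) for r in ratios]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic, noise-free data generated with epsilon = -0.5 (illustrative only).
r0 = 1.0
fractions = [0.9, 0.7, 0.5, 0.3, 0.1]
ratios = [r0 * f ** -0.5 for f in fractions]
print(round(rayleigh_epsilon(fractions, ratios, r0), 6))  # -0.5
```

Forcing the fit through the origin reflects the fact that ln(R_t/R_0) = 0 when no substrate has reacted (f = 1).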