Identifying Structure-Property Relationships through SMILES Syntax Analysis with Self-Attention Mechanism

2018 ◽  
Author(s):  
Shuangjia Zheng ◽  
Xin Yan ◽  
Yuedong Yang ◽  
Jun Xu

Recognizing substructures and their relations embedded in a molecular structure representation is a key process for structure-activity or structure-property relationship (SAR/SPR) studies. A molecular structure can be represented either explicitly as a connection table (CT) or as a linear notation such as SMILES, a language describing the connectivity of atoms in the molecular structure. Conventional SAR/SPR approaches rely on partitioning the CT into a set of predefined substructures that serve as structural descriptors. In this work, we propose a new method for identifying SAR/SPR through syntax analysis of a linear notation (for example, SMILES) with a self-attention mechanism, an interpretable deep learning architecture. The method has been evaluated by predicting chemical properties, toxicity, and bioactivity from experimental data sets. Our results demonstrate that the method yields superior performance compared with state-of-the-art methods. Moreover, the method produces chemically interpretable results, which chemists can use to design and synthesize compounds with improved activity or properties.
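To make the idea of treating SMILES as a language concrete, the following is a minimal sketch, not the authors' architecture. It splits a SMILES string into tokens and computes untrained scaled dot-product self-attention weights between them. The tokenization pattern, the one-hot embedding, and the use of the same vectors as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math
import re

# Assumed tokenization pattern (not taken from the paper): bracket atoms,
# two-letter halogens, organic-subset atoms, ring closures, bonds, branches.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-+\\/()@.:~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is unmatched."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def one_hot_embed(tokens):
    """Toy embedding: one dimension per distinct token (hypothetical choice)."""
    vocab = sorted(set(tokens))
    return [[1.0 if tok == v else 0.0 for v in vocab] for tok in tokens]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(vectors):
    """Scaled dot-product attention weights, with queries = keys = vectors."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
weights = self_attention_weights(one_hot_embed(tokens))
```

Each row of `weights` is a probability distribution over all tokens of the molecule. In a trained model, rows of this kind are what would be inspected to see which parts of the SMILES string drive a prediction.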


Nanomaterials ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. 2188
Author(s):  
Pingping Jiang ◽  
Pascal Boulet ◽  
Marie-Christine Record

Two-dimensional MX (M = Ga, In; X = S, Se, Te) homo- and heterostructures are of interest in electronics and optoelectronics. The structural, electronic and optical properties of bulk and layered MX and of GaX/InX heterostructures have been investigated comprehensively using density functional theory (DFT) calculations. Based on the quantum theory of atoms in molecules, topological analyses of bond degree (BD), bond length (BL) and bond angle (BA) are detailed to interpret interatomic interactions and, hence, the structure–property relationship. The X–X BD correlates linearly with the ratio of local potential to kinetic energy, and decreases as X goes from S to Te. For van der Waals (vdW) homo- and heterostructures of GaX and InX, a cubic relationship between microscopic interatomic interaction and macroscopic electromagnetic behavior has been established for the first time, relating the weighted absolute BD summation to the static dielectric constant. A decisive role of the vdW interaction in layer-dependent properties has been identified. The GaX/InX heterostructures have bandgaps in the range 0.23–1.49 eV, absorption coefficients over 10⁻⁵ cm⁻¹ and maximum conversion efficiency over 27%. Under strain, discordant BD evolutions are responsible for the exclusively distributed electrons and holes in sublayers of GaX/InX. Meanwhile, the interlayer BA adjustment with lattice mismatch explains the constraint-free lattice of the vdW heterostructure.
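The abstract does not define the bond degree or the energy ratio it mentions. The sketch below assumes the commonly used QTAIM conventions: the total energy density at a bond critical point is H = G + V, the bond degree is BD = H/ρ, and the ratio is |V|/G. The classification thresholds (1 and 2) are likewise a widely used convention rather than something stated in the abstract, and the input values are hypothetical.

```python
def bond_descriptors(g, v, rho):
    """Return the assumed QTAIM bond degree and |V|/G ratio.

    g   : kinetic energy density at the bond critical point (positive)
    v   : potential energy density at the bond critical point (negative)
    rho : electron density at the bond critical point (positive)
    """
    h = g + v  # total energy density
    return {"bond_degree": h / rho, "v_over_g": abs(v) / g}


def classify_interaction(v_over_g):
    """Assumed convention: <1 closed-shell, 1 to 2 transit, >2 shared-shell."""
    if v_over_g < 1.0:
        return "closed-shell"
    if v_over_g <= 2.0:
        return "transit"
    return "shared-shell"


# Hypothetical values in atomic units, chosen only to exercise the functions.
example = bond_descriptors(g=0.5, v=-0.75, rho=0.25)
label = classify_interaction(example["v_over_g"])
```

Under these definitions BD = G(1 − |V|/G)/ρ whenever V is negative, so BD depends linearly on |V|/G when G/ρ is held fixed. This is consistent with the linear correlation reported in the abstract, though it is not a derivation of that result.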


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
J. Jesús Naveja ◽  
B. Angélica Pilón-Jiménez ◽  
Jürgen Bajorath ◽  
José L. Medina-Franco

Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for the analysis of chemical space and structure–activity relationships. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension to ASBS, we herein introduce a general conceptual framework that considers all putative cores of molecules in a compound data set, thus softening the often applied “single molecule–single scaffold” correspondence. A putative core is here defined as any substructure of a molecule complying with two basic rules: (a) the size of the core is a significant proportion of the whole molecule size and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. Thereafter, a bipartite network consisting of molecules and cores can be constructed for a database of chemical structures. Compounds linked to the same cores are considered analogs. We present case studies illustrating the potential of the general framework. The applications include inter- and intra-core diversity analysis of compound data sets, structure–property relationships, and identification of analog series and ASBS. The molecule–core network presented here is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. The code to use the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationship analyses.
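The molecule–core network described above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the code distributed with the paper. The putative cores of each molecule are taken as precomputed input, since the retrosynthetic fragmentation of rule (b) is outside the scope of the sketch. The size threshold in `passes_size_rule` is a hypothetical value for rule (a), and all identifiers (`m1`, `coreA`, and so on) are made up.

```python
from collections import defaultdict
from itertools import combinations


def passes_size_rule(core_atoms, molecule_atoms, min_fraction=2 / 3):
    """Rule (a): the core must cover a significant share of the molecule.

    min_fraction is a hypothetical threshold chosen for illustration.
    """
    return core_atoms >= min_fraction * molecule_atoms


def build_core_index(molecule_cores):
    """Invert molecule -> cores into core -> molecules (the bipartite edges)."""
    index = defaultdict(set)
    for molecule, cores in molecule_cores.items():
        for core in cores:
            index[core].add(molecule)
    return index


def analog_pairs(core_index):
    """Molecules linked to the same core are considered analogs."""
    pairs = set()
    for molecules in core_index.values():
        for first, second in combinations(sorted(molecules), 2):
            pairs.add((first, second))
    return pairs


# Hypothetical data set: each molecule maps to its precomputed putative cores.
molecule_cores = {
    "m1": {"coreA", "coreB"},
    "m2": {"coreA"},
    "m3": {"coreC"},
}
core_index = build_core_index(molecule_cores)
pairs = analog_pairs(core_index)
```

Because a molecule may map to several cores, it can belong to more than one analog series, which is the softening of the "single molecule–single scaffold" correspondence that the abstract refers to.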


2012 ◽  
Vol 90 (8) ◽  
pp. 640-651
Author(s):  
Jing Song ◽  
Ying Zhang ◽  
Hui Hu ◽  
Hui Zhang ◽  
Lin Lin ◽  
...  

Quantitative structure–property relationship (QSPR) studies were performed for the prediction of gas-phase reduced ion mobility constants (K₀) of diverse compounds based on three-dimensional (3D) molecular structure representation. The entire set of 159 compounds was divided into a training set of 120 compounds and a test set of 39 compounds according to the Kennard–Stone algorithm. Multiple linear regression (MLR) analysis was employed to select the best subset of descriptors and to build linear models, whereas nonlinear models were developed by means of an artificial neural network (ANN). The resulting five-descriptor models show good predictive power for the test set: the MLR model achieved a squared correlation coefficient (R²) of 0.9029 and a standard error of estimation (s) of 0.0549, whereas for the ANN model R² and s were 0.9292 and 0.496, respectively. The results of this study compare favorably to previously reported prediction methods for the ion mobility constants. In addition, the descriptors used in the models are discussed with respect to the structural features governing the mobility of the compounds.
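The training/test split mentioned above can be illustrated with a minimal sketch of Kennard–Stone selection in its usual formulation. It starts from the two most distant samples, then repeatedly adds the sample whose nearest already-selected neighbor is farthest away. The Euclidean distance, the function name, and the toy data are assumptions; the abstract does not say which distance or descriptor space the authors used.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select samples chosen by Kennard-Stone selection."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Seed with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Add the sample whose closest selected neighbor is the most distant.
    while len(selected) < n_select:
        chosen = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(chosen)
        remaining.remove(chosen)
    return selected


# Toy one-dimensional descriptor space, for illustration only.
toy_points = [(0.0,), (1.0,), (2.0,), (10.0,)]
training_indices = kennard_stone(toy_points, 3)
```

For a split like the one in the abstract, `n_select` would be 120 for the training set, with the 39 unselected compounds forming the test set.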

