QSPR modelling of the octanol/water partition coefficient of organometallic substances by optimal SMILES-based descriptors

2009 ◽  
Vol 7 (4) ◽  
pp. 846-856 ◽  
Author(s):  
Andrey Toropov ◽  
Alla Toropova ◽  
Emilio Benfenati

Abstract: Usually, QSPR is not used to model organometallic compounds. We have modeled the octanol/water partition coefficient for organometallic compounds of Na, K, Ca, Cu, Fe, Zn, Ni, As, and Hg by optimal descriptors calculated with simplified molecular input line entry system (SMILES) notations. The best model is characterized by the following statistics: n=54, r²=0.9807, s=0.677, F=2636 (training set); n=26, r²=0.9693, s=0.969, F=759 (test set). Empirical criteria for the definition of the applicability domain for these models are discussed.
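The reported statistics can be cross-checked with a short sketch. It assumes the model is a least-squares regression on a single optimal descriptor, which the abstract does not state explicitly; under that assumption F = r²(n − 2)/(1 − r²). The function name `f_statistic` is hypothetical.

```python
def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """Fisher F for a least-squares model with k descriptors and n compounds.

    Assumption: k = 1 (one optimal descriptor); the abstract does not say so.
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values taken from the abstract.
training_f = f_statistic(0.9807, 54)
test_f = f_statistic(0.9693, 26)

print(f"training set F ~ {training_f:.0f} (reported 2636)")
print(f"test set F ~ {test_f:.0f} (reported 759)")
```

Both recomputed values land within about one percent of the reported ones, and the remaining gap is compatible with r² having been rounded to four decimals before publication.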

2010 ◽  
Vol 8 (5) ◽  
pp. 1047-1052 ◽  
Author(s):  
A.A. Toropov ◽  
A.P. Toropova ◽  
E. Benfenati

Abstract: Predictive quantitative structure-property relationships (QSPR) have been established for the normal boiling points and octanol/water partition coefficients of acyclic and cyclic hydrocarbons using optimal descriptors calculated with the simplified molecular input line entry system (SMILES). The probabilistic criteria for a rational definition of the domain of applicability of these models are discussed.


2011 ◽  
Vol 9 (1) ◽  
pp. 165-174 ◽  
Author(s):  
Alla Toropova ◽  
Andrey Toropov ◽  
Rodolfo Diaza ◽  
Emilio Benfenati ◽  
Giuseppina Gini

Abstract: To validate QSAR models, an external test set is increasingly used. However, the choice of compounds for the test set is still debated. We studied co-evolutions of correlations between optimal descriptors and carcinogenicity (pTD50) for the sub-training, calibration, and test sets. Weak correlations for the sub-training set are sometimes accompanied by quite good correlations for the external test set. This can be explained in terms of probability theory and can help define a suitable test set. The simplified molecular input line entry system (SMILES) was used to represent the molecular structure. Correlation weights for calculating the optimal descriptors are related to fragments of the SMILES. The statistical quality of the model is: n=170, r²=0.6638, q²=0.6554, s=0.828, F=331 (sub-training set); n=170, r²=0.6609, r²pred=0.6520, s=0.825, F=331 (calibration set); and n=61, r²=0.7796, r²pred=0.7658, Rm²=0.7448, s=0.563, F=221 (test set). The calculations were done with CORAL software (http://www.insilico.eu/coral/).
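The idea that correlation weights attach to SMILES fragments can be illustrated with a minimal sketch. It assumes the descriptor is the plain sum of the weights of single SMILES tokens and that the endpoint is a linear function of that sum. The weight values, the tokenizer, and all function names are hypothetical; the actual fragment definitions and the weight optimisation used by CORAL are not reproduced here.

```python
# Hypothetical correlation weights; in practice they are fitted by optimisation.
CORRELATION_WEIGHTS = {
    "C": 0.5, "c": 0.3, "N": -0.2, "O": -0.4, "Cl": 0.8,
    "=": 0.1, "(": 0.0, ")": 0.0, "1": 0.05,
}

# Two-letter element symbols that must not be split (illustrative subset).
TWO_CHAR_TOKENS = ("Cl", "Br")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into one- or two-character fragments."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_CHAR_TOKENS:
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def descriptor(smiles: str, weights: dict[str, float], default: float = 0.0) -> float:
    """Sum the correlation weights of all fragments; unknown fragments get `default`."""
    return sum(weights.get(token, default) for token in tokenize(smiles))


def predict(smiles: str, weights: dict[str, float], intercept: float, slope: float) -> float:
    """Linear one-descriptor model: endpoint = intercept + slope * descriptor."""
    return intercept + slope * descriptor(smiles, weights)


# Acetic acid, CC(=O)O, with the hypothetical weights above.
print(descriptor("CC(=O)O", CORRELATION_WEIGHTS))
```

Giving unknown fragments a default weight of zero is only one possible choice; fragments that never occur in the sub-training set are a natural place to define an applicability-domain check instead.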


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Mohammad Haekal ◽  
Henki Bayu Seta ◽  
Mayanda Mega Santoni

To predict the water quality of the Ciliwung River, data from online monitoring were processed using a data mining method. In this method, the monitoring data were first tabulated in Microsoft Excel and then processed into a decision tree using the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and highly accurate. A total of 5,476 Ciliwung River water quality monitoring records were processed. Classification with the decision tree labeled 1,059 of these records (19.3242%) as Not Polluted and 4,417 (80.6758%) as Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split at 66%. All four test options showed very high accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated as polluted with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001, and that using WEKA with the Decision Tree algorithm to process monitoring data on three parameters (pH, DO, and nitrate) is highly accurate and appropriate. Keywords: river water quality, data mining, decision tree algorithm, WEKA application.


2021 ◽  
Vol 11 (5) ◽  
pp. 2039
Author(s):  
Hyunseok Shin ◽  
Sejong Oh

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Random sampling is traditionally used to divide datasets. The problem, however, is that the measured performance of the model depends on how the training and test sets are divided. Therefore, in this study, we proposed an improved sampling method for the accurate evaluation of a classification model. We first generated numerous candidate train/test sets using the R-value-based sampling method. We then evaluated how similar the distribution of each candidate was to that of the whole dataset, and the candidate with the smallest distribution difference was selected as the final train/test set. Histograms and feature importance were used to evaluate the similarity of distributions. The proposed method produces more suitable training and test sets than previous sampling methods, including random and non-random sampling.
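The selection step can be sketched as follows. This is not the authors' method: candidate splits are produced here by plain random shuffling instead of R-value-based sampling, only one numeric feature is compared, and feature importance is ignored. All function names and parameters are hypothetical.

```python
import random


def histogram(values, bins, lo, hi):
    """Normalised histogram of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def distribution_gap(subset, whole, bins=10):
    """Sum of absolute differences between the subset and whole-data histograms."""
    lo, hi = min(whole), max(whole)
    h_sub = histogram(subset, bins, lo, hi)
    h_all = histogram(whole, bins, lo, hi)
    return sum(abs(a - b) for a, b in zip(h_sub, h_all))


def best_split(values, test_fraction=0.3, candidates=200, seed=0):
    """Return (gap, train_idx, test_idx) for the candidate closest to the whole dataset."""
    rng = random.Random(seed)
    n_test = max(1, int(len(values) * test_fraction))
    best = None
    for _ in range(candidates):
        idx = list(range(len(values)))
        rng.shuffle(idx)  # assumption: random candidates, not R-value sampling
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        gap = (distribution_gap([values[i] for i in train_idx], values)
               + distribution_gap([values[i] for i in test_idx], values))
        if best is None or gap < best[0]:
            best = (gap, train_idx, test_idx)
    return best


data_rng = random.Random(42)
feature = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
gap, train_idx, test_idx = best_split(feature)
print(f"train={len(train_idx)} test={len(test_idx)} gap={gap:.3f}")
```

Extending the gap to several features, for example by summing per-feature gaps, is one straightforward option, but how the paper combines histograms with feature importance is not specified in the abstract.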


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Nadin Ulrich ◽  
Kai-Uwe Goss ◽  
Andrea Ebert

Abstract: Today, more and more data are freely available. Based on these big datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good with an rmse of 0.47 log units in the test dataset and an rmse of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself.


2008 ◽  
Vol 68 (5-6) ◽  
pp. 415-419 ◽  
Author(s):  
Wan Aini Wan Ibrahim ◽  
Dadan Hermawan ◽  
Mohamed Noor Hasan ◽  
Hassan Y. Aboul Enein ◽  
M. Marsin Sanagi

2010 ◽  
Vol 121-122 ◽  
pp. 574-578
Author(s):  
Hui Yu Jiang ◽  
Min Dong ◽  
Wei Li

The octanol/water partition coefficient (Kow) is an important physical parameter for describing the behavior of organic compounds in the environment. However, for various reasons, it is difficult to determine the octanol/water partition coefficient of each compound accurately. In this paper, we introduce an RBF neural network and the molecular bond connectivity index to forecast the solubility of organic compounds in water. Compared against prediction with the BP network, the result is better: the correlation coefficient reaches 0.998, and the prediction error is within the permitted range.


Author(s):  
Rui Guo ◽  
Xiaobin Hu ◽  
Haoming Song ◽  
Pengpeng Xu ◽  
Haoping Xu ◽  
...  

Abstract: Purpose: To develop a weakly supervised deep learning (WSDL) method that could utilize incomplete/missing survival data to predict the prognosis of extranodal natural killer/T cell lymphoma, nasal type (ENKTL) based on pretreatment 18F-FDG PET/CT results. Methods: One hundred and sixty-seven patients with ENKTL who underwent pretreatment 18F-FDG PET/CT were retrospectively collected. Eighty-four patients were followed up for at least 2 years (training set = 64, test set = 20). A WSDL method was developed to enable the integration of the remaining 83 patients with incomplete/missing follow-up information in the training set. To test generalization, these data were derived from three types of scanners. Prediction similarity index (PSI) was derived from deep learning features of images. Its discriminative ability was calculated and compared with that of a conventional deep learning (CDL) method. Univariate and multivariate analyses helped explore the significance of PSI and clinical features. Results: PSI achieved area under the curve scores of 0.9858 and 0.9946 (training set) and 0.8750 and 0.7344 (test set) in the prediction of progression-free survival (PFS) with the WSDL and CDL methods, respectively. PSI threshold of 1.0 could significantly differentiate the prognosis. In the test set, WSDL and CDL achieved prediction sensitivity, specificity, and accuracy of 87.50% and 62.50%, 83.33% and 83.33%, and 85.00% and 75.00%, respectively. Multivariate analysis confirmed PSI to be an independent significant predictor of PFS in both the methods. Conclusion: The WSDL-based framework was more effective for extracting 18F-FDG PET/CT features and predicting the prognosis of ENKTL than the CDL method.
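The test-set percentages can be reproduced from a confusion matrix. The abstract gives only the test-set size (20) and the percentages; the counts below (8 patients with the event, 12 without) are an assumption chosen because they are consistent with all six reported values, and the function name `rates` is hypothetical.

```python
def rates(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts, not stated in the abstract: 8 positives and 12 negatives.
wsdl = rates(tp=7, fn=1, tn=10, fp=2)
cdl = rates(tp=5, fn=3, tn=10, fp=2)

for name, (sens, spec, acc) in (("WSDL", wsdl), ("CDL", cdl)):
    print(f"{name}: sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

With so few test patients, one reclassified case moves sensitivity by 12.5 percentage points, which is worth keeping in mind when comparing the two methods.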

