Exploratory and machine learning analysis of the stability constants of HgII-triazene ligands complexes

2021 ◽  
pp. 1-13
Author(s):  
Ahmadreza Hajihosseinloo ◽  
Maryam Salahinejad ◽  
Mohammad Kazem Rofouei ◽  
Jahan B. Ghasemi

Knowing the stability constants of HgII complexes with extracting ligands is very important from environmental and therapeutic standpoints. Since the selectivity of ligands can be expressed by the stability constants of cation–ligand complexes, quantitative structure–property relationship (QSPR) investigations of the binding constants of HgII complexes were carried out. Experimental stability constants for the ML2 complexation of HgII with synthesized triazene ligands were used to construct and develop QSPR models. Support vector machine (SVM) and multiple linear regression (MLR) were employed to create the QSPR models. The final model showed a squared correlation coefficient of 0.917 and a standard error of calibration (SEC) of 0.141 log K units. The proposed model gave accurate predictions under leave-one-out cross-validation (Q2LOO = 0.756) and was validated using Y-randomization and an external test set. Statistical results demonstrated that the proposed models had suitable goodness of fit, predictive ability, and robustness. The results revealed the importance of charge effects and topological properties of the ligand in HgII–triazene complexation.
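The leave-one-out statistic quoted above can be illustrated with a minimal sketch. It assumes the common definition Q2LOO = 1 - PRESS / SS_tot and, for brevity, a one-descriptor linear model; the descriptor and log K values are hypothetical and are not taken from the paper, and the function names `fit_line` and `q2_loo` are illustrative only.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def q2_loo(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot (assumed definition)."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model with sample i held out, then predict sample i.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    y_mean = mean(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot

# Hypothetical descriptor and log K values, for illustration only.
descriptor = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
log_k = [4.1, 4.6, 4.9, 5.5, 5.8, 6.4]
print(round(q2_loo(descriptor, log_k), 3))
```

A Q2LOO well below the fitted R2, as in the abstract (0.756 versus 0.917), is the usual sign that part of the fit does not carry over to held-out samples.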

2011 ◽  
Vol 233-235 ◽  
pp. 2536-2540
Author(s):  
Xuan Chen ◽  
Chang Ming Nie ◽  
Song Nian Wen

A new molecular quantum topological index, QT, was constructed by molecular topological methods and quantum mechanics (QM), together with the Gibbs free energy (G) and the constant-volume molar heat capacity (CV), which were calculated by density functional theory (DFT) at the B3LYP/6-31G(d) level of theory for mercaptans. The index QT can not only efficiently distinguish the molecular structures of mercaptans but also performs well in QSPR/QSAR (quantitative structure-property/activity relationship) applications. Most of the correlation coefficients of the models were over 0.99. The leave-one-out cross-validation (LOO CV) method was used to test the stability and predictive ability of the models. The validation results, expressed through the cross-validation parameters RCV, SCV and FCV, confirmed the good stability and predictive ability of the models and demonstrated the wide potential of the index QT for QSPR/QSAR applications.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Li Wen ◽  
Qing Li ◽  
Wei Li ◽  
Qiao Cai ◽  
Yong-Ming Cai

Hydroxyl benzoic esters are preservatives widely used in food, medicine, and cosmetics. To explore the relationship between the molecular structure and antibacterial activity of these compounds and to make predictions for compounds with similar structures, Quantitative Structure-Activity Relationship (QSAR) models of 25 kinds of hydroxyl benzoic esters were built from quantum chemical parameters and molecular connectivity indexes using a support vector machine (SVM) in the R language. The External Standard Deviation Error of Prediction (SDEPext), fitting correlation coefficient (R2), and leave-one-out cross-validation coefficient (Q2LOO) are used to evaluate the reliability, stability, and predictive ability of the models. The results show that R2 and Q2LOO of the 4 kinds of nonlinear models are greater than 0.6 and that SDEPext is 0.213, 0.222, 0.189, and 0.218, respectively. Both the correlation coefficient and the standard deviation are better than those of the multiple linear regression (MLR) model (R2 = 0.421, RSD = 0.260). The reliability, stability, robustness, and external predictive ability of the models are good, particularly for the model with a linear kernel function and eps-regression type. This model can predict the antimicrobial activity of compounds with similar structures within the applicability domain.


2016 ◽  
Vol 15 (02) ◽  
pp. 1650011 ◽  
Author(s):  
Xinliang Yu ◽  
Xianwei Huang

The glass transition temperature (Tg) is the most important parameter of an amorphous polymer. A quantitative structure-property relationship (QSPR) was developed for the Tg values of 82 polyacrylates by applying stepwise multiple linear regression (MLR) analysis. Molecular descriptors used to describe polymer structures were, for the first time, calculated from the motion units of polymer backbones, which are chain segments 20 carbons in length (10 repeating units). After internal validation with the leave-one-out (LOO) method, external validation was carried out to test the stability of the MLR model of Tg. Compared to the models already published in the literature, the MLR model in this paper was accurate and acceptable, although our model was based on bigger data sets. The feasibility of calculating molecular descriptors from the motion units of polymer backbones for developing Tg models of polyacrylates has been demonstrated.


2021 ◽  
Author(s):  
Ming Cai Zhang ◽  
Hong Lin Zhai ◽  
Ke Xin Bi ◽  
Bin Qiang Zhao ◽  
Hai Ping Shao

The biomagnification factor (BMF) is an important index of pollutants in food chains, but its experimental determination is quite tedious. In this contribution, Tchebichef moments (TMs) were calculated from the structural images of the molecules and used as feature descriptors of molecular information. Stepwise regression was then employed to establish the prediction model for the logBMF of organochlorine pollutants. The correlation coefficient with leave-one-out cross-validation (Rcv) was 0.9570 and the correlation coefficient of prediction (Rp) for the external independent test set was 0.9594. Compared with traditional two-dimensional (2D) quantitative structure-property relationship (QSPR) and the reported augmented multivariate image analysis applied to QSPR (aug-MIA-QSPR), the proposed approach is simpler, more accurate and more reliable. This study not only obtained a model with better stability and predictive ability for the BMF of organochlorine pollutants, but also provided another effective approach to QSPR research.


Author(s):  
Khalid Bouhedjar ◽  
Abdelmalek Khorief Nacereddine ◽  
Hamida Ghorab ◽  
Abdelhafid Djerourou

The simplified molecular input line entry system (SMILES) is particularly suitable for high-speed machine processing, based on the Monte Carlo method using CORAL software. Quantitative structure-property relationships (QSPR) of critical temperatures have been established using a dataset of 165 diverse organic compounds employing hybrid optimal descriptors defined by graph and SMILES notation. External validation is one of the most important parts of evaluating model performance. However, previous models on the same dataset either had poor predictive power on the external test set or were not checked in this way. In the present work, the predictive ability of the model has been tested using external validation. The statistical quality of the three splits is similar and good. The r2 values for the best model are: r2 = 0.98 for the training set, r2 = 0.95 for the calibration set, and r2 = 0.94 for the validation set.


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Yuantian Sun ◽  
Guichen Li ◽  
Junfei Zhang ◽  
Junbo Sun ◽  
Jiahui Xu

Cemented paste backfill (CPB) is an eco-friendly composite containing mine waste or tailings and has been widely used as a construction material in underground stopes. In the field, the uniaxial compressive strength (UCS) of CPB is critical as it is closely related to the stability of stopes. Predicting the UCS of CPB using traditional mathematical models is far from satisfactory due to the highly nonlinear relationships between the UCS and a large number of influencing variables. To solve this problem, this study uses a support vector machine (SVM) to predict the UCS of CPB. The hyperparameters of the SVM model are tuned using the beetle antennae search (BAS) algorithm; the resulting model is called BSVM. The BSVM is then trained on a dataset collected from the experimental results. To explain the importance of each input variable on the UCS of CPB, the variable importance is obtained using a sensitivity study with the BSVM as the objective function. The results show that the proposed BSVM has high prediction accuracy on the test set with a high correlation coefficient (0.97) and low root-mean-square error (0.27 MPa). The proposed model can guide the design of CPB during mining.


Molecules ◽  
2020 ◽  
Vol 25 (17) ◽  
pp. 3772
Author(s):  
Meade E. Erickson ◽  
Marvellous Ngongang ◽  
Bakhtiyor Rasulev

Predicting the activities and properties of materials via in silico methods has been shown to be a cost- and time-effective way of aiding chemists in synthesizing materials with desired properties. Refractive index (n) is one of the most important defining characteristics of an optical material. Presented in this work is a quantitative structure–property relationship (QSPR) model that was developed to predict the refractive index for a diverse set of polymers. A number of models were created, of which a four-variable model showed the best predictive performance with R2 = 0.904 and Q2LOO = 0.897. The robustness and predictability of the best model were validated using the leave-one-out technique, an external set and y-scrambling. The predictive ability of the model was confirmed with the external set, which gave R2ext = 0.880. For the refractive index, the ionization potential, polarizability, and 2D and 3D geometrical descriptors were the most influential properties. The developed model was transparent and mechanistically explainable and can be used in the prediction of the refractive index for new and untested polymers.
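The y-scrambling check mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a single hypothetical descriptor, takes R2 as the squared Pearson correlation of a one-descriptor linear fit, and the data, function names, round count and seed are all made up for the example.

```python
import random
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation, equal to R2 of a one-descriptor linear fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

def y_scrambling(x, y, n_rounds=200, seed=0):
    """Return (R2 on the real data, mean R2 over randomly shuffled responses)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]          # copy so the original response order is kept
        rng.shuffle(shuffled)    # break the descriptor-response pairing
        scores.append(r_squared(x, shuffled))
    return r_squared(x, y), mean(scores)

# Hypothetical descriptor and refractive index values, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
refractive_index = [1.46, 1.48, 1.49, 1.52, 1.53, 1.56, 1.57, 1.60]
r2_true, r2_scrambled = y_scrambling(descriptor, refractive_index)
print(r2_true > r2_scrambled)
```

If the scrambled models scored close to the real one, the original correlation would be suspect as a chance result.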


2020 ◽  
Vol 12 (3) ◽  
pp. 1063 ◽  
Author(s):  
Dieu Tien Bui ◽  
Ataollah Shirzadi ◽  
Ata Amini ◽  
Himan Shahabi ◽  
Nadhir Al-Ansari ◽  
...  

Local scour depth at complex piers (LSCP) causes high costs when constructing bridges. In this study, a hybrid artificial intelligence approach combining a random subspace (RS) meta classifier with the reduced error pruning tree (REPTree) base classifier, namely RS-REPTree, was proposed to predict the LSCP. A total of 122 laboratory datasets were used and partitioned into training (70%: 85 cases) and validation (30%: 37 cases) datasets for the modeling and validation processes, respectively. Statistical metrics such as mean absolute error (MAE), root mean squared error (RMSE), correlation coefficient (R), and the Taylor diagram were used to check the goodness-of-fit and performance of the proposed model. The capability of this model was assessed and compared with four state-of-the-art soft-computing benchmark algorithms, including artificial neural network (ANN), support vector machine (SVM), M5P, and REPTree, along with two empirical models, including the Florida Department of Transportation (FDOT) and Hydraulic Engineering Circular No. 18 (HEC-18). The findings showed that the machine learning algorithms had the highest goodness-of-fit and prediction accuracy (0.885 < R < 0.945) in comparison to the other models. The sensitivity analysis with the proposed model indicated that pile cap location (Y) was a more sensitive factor for LSCP than the other factors. The results also showed that the RS-REPTree ensemble model (R = 0.945) could clearly enhance the prediction power of the REPTree base classifier (R = 0.885). Therefore, the proposed model is a promising technique for predicting the LSCP.
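The three scalar metrics named in this abstract (MAE, RMSE and the correlation coefficient R) have standard definitions, sketched below. The observed and predicted scour depths are hypothetical values chosen only to make the example run; the function names are illustrative.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical observed and predicted scour depths (m), for illustration only.
observed = [0.12, 0.18, 0.25, 0.31, 0.40]
predicted = [0.14, 0.17, 0.27, 0.29, 0.43]
print(mae(observed, predicted), rmse(observed, predicted), pearson_r(observed, predicted))
```

MAE and RMSE are in the units of the predicted quantity, while R is dimensionless, which is why the abstract reports the comparison between models in terms of R.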


2020 ◽  
Vol 69 (11-12) ◽  
pp. 611-630
Author(s):  
Mohammed Moussaoui ◽  
Maamar Laidi ◽  
Salah Hanini ◽  
Mohamed Hentabli

In this study, the solubility of 145 solid solutes in supercritical CO2 (scCO2) was correlated using computational intelligence techniques based on Quantitative Structure-Property Relationship (QSPR) models. A database of 3637 solubility values was collected from previously published papers. Dragon software was used to calculate the molecular descriptors of the 145 solid systems. A genetic algorithm (GA) was implemented to optimise the subset of significantly contributing descriptors. The support vector regression (SVR-QSPR) model predicted the solubility of the 145 solid solutes in supercritical CO2 with an overall mean absolute relative deviation (MAARD) of about 1.345 % between experimental and calculated values, which is better than the 2.772 % obtained using the ANN-QSPR model. The results show that the developed SVR-QSPR model is more accurate and can be used as an alternative powerful modelling tool for QSAR studies of the solubility of solid solutes in supercritical carbon dioxide (scCO2). The accuracy of the proposed model was evaluated using statistical analysis by comparing the results with other models reported in the literature.
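The deviation measure reported above can be sketched as follows, assuming the usual definition of the average absolute relative deviation, AARD % = (100 / N) Σ |(calculated − experimental) / experimental|. The abstract does not spell out its exact formula, so this definition, the function name and the solubility values are assumptions for illustration.

```python
def aard_percent(experimental, calculated):
    """Average absolute relative deviation in percent (assumed definition)."""
    n = len(experimental)
    total = sum(abs((c - e) / e) for e, c in zip(experimental, calculated))
    return 100.0 * total / n

# Hypothetical mole-fraction solubilities, for illustration only.
exp_solubility = [1.0e-4, 2.5e-4, 4.0e-4]
calc_solubility = [1.02e-4, 2.45e-4, 4.04e-4]
print(aard_percent(exp_solubility, calc_solubility))
```

Because each deviation is divided by the experimental value, the measure weights small and large solubilities equally, which matters when the data span several orders of magnitude.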


2020 ◽  
Vol 10 (11) ◽  
pp. 3772 ◽  
Author(s):  
Sunil Saha ◽  
Anik Saha ◽  
Tusar Kanti Hembram ◽  
Biswajeet Pradhan ◽  
Abdullah M. Alamri

Landslides are known as the world’s most dangerous threat in mountainous regions and pose a critical obstacle to both economic and infrastructural progress. It is, therefore, quite relevant to discuss the pattern of spatial incidence of this phenomenon. The current research applies a set of individual and ensemble machine learning and probabilistic approaches, namely an artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LR), and their ensembles ANN-RF, ANN-SVM, SVM-RF, SVM-LR, LR-RF, LR-ANN, ANN-LR-RF, ANN-RF-SVM, ANN-SVM-LR, RF-SVM-LR, and ANN-RF-SVM-LR, for mapping landslide susceptibility in the Rudraprayag district of Garhwal Himalaya, India. A landslide inventory map along with sixteen landslide conditioning factors (LCFs) was used. Randomly partitioned sets of 70%:30% were used to ascertain the goodness of fit and predictive ability of the models. The contribution of the LCFs was analyzed using the RF model. According to the RF model, altitude and drainage density were the factors responsible for causing landslides in the study area. The robustness of the models was assessed through three threshold dependent measures, i.e., receiver operating characteristic (ROC), precision and accuracy, and two threshold independent measures, i.e., mean-absolute-error (MAE) and root-mean-square-error (RMSE). Finally, using the compound factor (CF) method, the models were prioritized based on the results of the validation methods to choose the best model. Results show that ANN-RF-LR indicated a realistic finding, identifying only 17.74% of the study area as highly susceptible to landslides. The ANN-RF-LR ensemble demonstrated the highest goodness of fit and predictive capacity, with respective values of 87.83% (area under the success rate curve) and 93.98% (area under the prediction rate curve), and correspondingly the highest robustness. These attempts will play a significant role in ensemble modeling and in building reliable and comprehensive models. The proposed ANN-RF-LR ensemble model may be used in other geographic areas with similar geo-environmental conditions. It may also be used in other types of geo-hazard modeling.

