QSPR Modeling For Critical Temperatures Of Organic Compounds Using Hybrid Optimal Descriptors

Author(s):  
Khalid Bouhedjar ◽  
Abdelmalek Khorief Nacereddine ◽  
Hamida Ghorab ◽  
Abdelhafid Djerourou

The simplified molecular input line entry system (SMILES) is particularly suitable for high-speed machine processing, here carried out with the Monte Carlo method in CORAL software. Quantitative structure-property relationships (QSPR) for critical temperatures have been established on a dataset of 165 diverse organic compounds, employing hybrid optimal descriptors defined by graph and SMILES notation. External validation is one of the most important parts of evaluating model performance. However, previous models built on the same dataset either show poor predictive power on the external test set or were not checked in this way. In the present work, the predictive ability of the model has been tested using external validation. The statistical quality of the three splits is similar and good. The r² values for the best model are: r² = 0.98 for the training set, r² = 0.95 for the calibration set, and r² = 0.94 for the validation set.
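As a rough illustration of how the per-split r² values above could be computed, the sketch below assumes that r² denotes the squared Pearson correlation between observed and predicted critical temperatures. The function name `r_squared` and the numbers in `splits` are hypothetical and are not taken from the paper or from CORAL.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical (observed, predicted) critical temperatures in K for each split.
splits = {
    "training": ([500.0, 550.0, 600.0, 650.0], [505.0, 545.0, 602.0, 648.0]),
    "calibration": ([520.0, 580.0, 640.0], [530.0, 575.0, 630.0]),
    "validation": ([510.0, 590.0, 660.0], [525.0, 580.0, 645.0]),
}

for name, (obs, pred) in splits.items():
    print(name, round(r_squared(obs, pred), 3))
```

Reporting the same statistic on all three splits is what makes them comparable: a training r² far above the validation r² would suggest overfitting.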

2021 ◽  
pp. 1-13
Author(s):  
Ahmadreza Hajihosseinloo ◽  
Maryam Salahinejad ◽  
Mohammad Kazem Rofouei ◽  
Jahan B. Ghasemi

Knowing the stability constants of complexes of HgII with extracting ligands is very important from environmental and therapeutic standpoints. Since the selectivity of ligands can be expressed through the stability constants of cation–ligand complexes, quantitative structure–property relationship (QSPR) investigations on the binding constants of HgII complexes were carried out. Experimental data on the stability constants for ML2 complexation of HgII with synthesized triazene ligands were used to construct and develop QSPR models. Support vector machine (SVM) and multiple linear regression (MLR) were employed to create the QSPR models. The final model showed a squared correlation coefficient of 0.917 and a standard error of calibration (SEC) of 0.141 log K units. The proposed model gave accurate predictions under leave-one-out cross-validation (Q²_LOO = 0.756) and was validated using Y-randomization and an external test set. Statistical results demonstrated that the proposed models had suitable goodness of fit, predictive ability, and robustness. The results revealed the importance of charge effects and topological properties of the ligand in HgII–triazene complexation.
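The leave-one-out statistic mentioned above can be sketched as follows. This is a minimal illustration that assumes the common definition Q² = 1 − PRESS / SS_tot and uses a single-descriptor least-squares line in place of the paper's MLR and SVM models. The helper names `fit_line` and `q2_loo` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q^2: refit without each point, then predict that point."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and log K values, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
log_k = [1.0, 2.5, 2.9, 4.2, 4.8]
print(round(q2_loo(descriptor, log_k), 3))
```

Because each point is predicted by a model that never saw it, Q²_LOO is normally lower than the fitted squared correlation coefficient, which matches the pattern reported in the abstract (0.756 versus 0.917).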


2022 ◽  
Author(s):  
Carson Lam ◽  
Rahul Thapa ◽  
Jenish Maharjan ◽  
Keyvan Rahmani ◽  
Chak Foon Tso ◽  
...  

BACKGROUND Acute Respiratory Distress Syndrome (ARDS) is a condition that is often considered to have broad and subjective diagnostic criteria and is associated with significant mortality and morbidity. Early and accurate prediction of ARDS and related conditions such as hypoxemia and sepsis could allow timely administration of therapies, leading to improved patient outcomes.
OBJECTIVE To explore how multi-label classification in the clinical setting can take advantage of the underlying dependencies between ARDS and related conditions to improve early prediction of ARDS.
METHODS The electronic health record dataset included 40,073 patient encounters from 7 hospitals from 4/20/2018 to 3/17/2021. A recurrent neural network (RNN) was trained using data from 5 hospitals, and external validation was conducted on data from 2 hospitals. In addition to ARDS, 12 target labels for related conditions such as sepsis, hypoxemia and COVID-19 were used to train the model to classify a total of 13 outputs. As a comparator, XGBoost models were developed for each of the 13 target labels. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Heatmaps visualizing attention scores were generated to provide interpretability to the NNs. Finally, cluster analysis was performed to identify potential phenotypic subgroups of ARDS patients.
RESULTS The single RNN model trained to classify 13 outputs outperformed the XGBoost model for ARDS prediction, achieving an AUROC of 0.842 on the external test sets. Models trained on an increasing number of tasks showed increasing performance. Earlier diagnosis of ARDS nearly doubled the rate of in-hospital survival. Cluster analysis revealed distinct ARDS subgroups, some of which had similar mortality rates but different clinical presentations.
CONCLUSIONS The RNN model presented in this paper can be used as an early warning system to stratify patients who are at risk of developing one of the multiple risk outcomes, hence providing practitioners with means to take early action.
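For readers unfamiliar with the AUROC metric used above, the following minimal sketch computes it as the fraction of positive/negative pairs that the model ranks correctly, with ties counted as half. The function name `auroc` and the labels and scores are hypothetical; this is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = condition present) and model risk scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

In a multi-label setting such as the 13 outputs described above, one such value would be computed per target label.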


Molecules ◽  
2020 ◽  
Vol 26 (1) ◽  
pp. 8
Author(s):  
Natalia Sizochenko ◽  
Markus Hofmann

In this study, we have investigated quantitative relationships between critical temperatures of superconductive inorganic materials and the basic physicochemical attributes of these materials (also called quantitative structure-property relationships). We demonstrated that one of the most recent studies (titled "A data-driven statistical model for predicting the critical temperature of a superconductor" and published in Computational Materials Science by K. Hamidieh in 2018) reports on models based on a dataset that contains 27% duplicate entries. We aimed to deliver stable models for a properly cleaned dataset using the same modeling techniques (multiple linear regression, MLR, and gradient boosting decision trees, XGBoost). The predictive ability of our best XGBoost model (R2 = 0.924, RMSE = 9.336 using 10-fold cross-validation) is comparable to the XGBoost model by the author of the initial dataset (R2 = 0.920 and RMSE = 9.5 K in ten-fold cross-validation). At the same time, our best model is based on less sophisticated parameters, which allows one to make more accurate interpretations while maintaining a generalizable model. In particular, we found that the highest relative influence is attributed to variables that represent the thermal conductivity of materials. In addition to MLR and XGBoost, we explored the potential of other machine learning techniques (neural networks, NN, and random forests, RF).
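The data-cleaning and scoring steps described above can be sketched as follows, assuming that duplicates are exact repeated rows and that R2 is the coefficient of determination. The helper names, the `material_*` identifiers, and the numbers are hypothetical and are not drawn from the superconductor dataset.

```python
from math import sqrt


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical (material id, critical temperature in K) rows with one exact duplicate.
rows = [("material_a", 39.0), ("material_b", 92.0), ("material_a", 39.0), ("material_c", 9.2)]
clean = drop_duplicates(rows)
duplicate_fraction = 1 - len(clean) / len(rows)
print(len(clean), duplicate_fraction)

# Hypothetical predictions for the cleaned rows.
observed = [tc for _, tc in clean]
predicted = [41.0, 88.0, 12.0]
print(round(rmse(observed, predicted), 3), round(r2_score(observed, predicted), 3))
```

Removing duplicates before splitting into folds matters because an entry that appears in both a training fold and a test fold can make cross-validated scores look better than they are.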

