The effect of noise on the predictive limit of QSAR models

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Scott S. Kolmar ◽  
Christopher M. Grulke

A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used 8 datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error were added to the datasets, and models were built on the error-laden datasets using five different algorithms. The models were trained on the error-laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error-free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error-laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology.
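The evaluation design described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' code or data: it assumes a hypothetical one-descriptor endpoint (`true_activity`), a plain least-squares line in place of the five algorithms, and arbitrary noise levels. It only shows how the same predictions can score differently against error-laden and error-free test values.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def true_activity(x):
    """Hypothetical error-free endpoint; an assumption made for illustration."""
    return 2.0 * x + 1.0


def simulate(noise_sd, n_train=200, n_test=200, seed=0):
    """Train on noisy labels, then score against noisy and error-free test labels."""
    rng = random.Random(seed)
    x_train = [rng.uniform(0, 10) for _ in range(n_train)]
    x_test = [rng.uniform(0, 10) for _ in range(n_test)]
    y_test_clean = [true_activity(x) for x in x_test]

    # Simulated Gaussian experimental error on both training and test labels.
    y_train_noisy = [true_activity(x) + rng.gauss(0, noise_sd) for x in x_train]
    y_test_noisy = [y + rng.gauss(0, noise_sd) for y in y_test_clean]

    slope, intercept = fit_line(x_train, y_train_noisy)
    predictions = [slope * x + intercept for x in x_test]
    return rmse(predictions, y_test_noisy), rmse(predictions, y_test_clean)


if __name__ == "__main__":
    for sd in (0.5, 1.0, 2.0):
        noisy, clean = simulate(sd)
        print(f"sd={sd}: RMSE vs error-laden test={noisy:.3f}, vs error-free test={clean:.3f}")
```

In this toy setting the RMSE against the error-laden test labels stays close to the added noise level, while the RMSE against the error-free labels reflects only how far the fitted line is from the underlying relationship.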

2008 ◽  
Vol 13 (8) ◽  
pp. 785-794 ◽  
Author(s):  
Alfredo Meneses-Marcel ◽  
Oscar M. Rivera-Borroto ◽  
Yovani Marrero-Ponce ◽  
Alina Montero ◽  
Yanetsy Machado Tugores ◽  
...  

Bond-based quadratic indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis (LDA) were used to discover novel lead trichomonacidals. The obtained LDA-based quantitative structure-activity relationships (QSAR) models, using nonstochastic and stochastic indices, were able to classify correctly 87.91% (87.50%) and 89.01% (84.38%) of the chemicals in training (test) sets, respectively. They showed large Matthews correlation coefficients of 0.75 (0.71) and 0.78 (0.65) for the training (test) sets, correspondingly. Later, both models were applied to the virtual screening of 21 chemicals to find new lead antitrichomonal agents. Predictions agreed with experimental results to a great extent because a correct classification for both models of 95.24% (20 of 21) of the chemicals was obtained. Of the 21 compounds that were screened and synthesized, 2 molecules (chemicals G-1, UC-245) showed high to moderate cytocidal activity at the concentration of 10 μg/ml, another 2 compounds (G-0 and CRIS-148) showed high cytocidal activity only at the concentration of 100 μg/ml, and the remaining chemicals (from CRIS-105 to CRIS-153, except CRIS-148) were inactive at these assayed concentrations. Finally, the best candidate, G-1 (cytocidal activity of 100% at 10 μg/ml) was in vivo assayed in ovariectomized Wistar rats achieving promising results as a trichomonacidal drug-like compound. (Journal of Biomolecular Screening 2008:785-794).
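For readers unfamiliar with the two statistics quoted above, the following sketch computes classification accuracy and the Matthews correlation coefficient from confusion-matrix counts. The abstract does not report the confusion matrices, so the function names and any counts passed in are illustrative assumptions; only the 20-of-21 accuracy figure comes from the text.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # 20 of 21 screened chemicals classified correctly, as stated in the abstract.
    print(f"{100 * 20 / 21:.2f}%")
```

Unlike accuracy, the coefficient ranges from -1 to 1 and stays informative when the active and inactive classes are of unequal size.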


2008 ◽  
Vol 36 (1) ◽  
pp. 15-24 ◽  
Author(s):  
Enrico Mombelli

According to the REACH chemicals legislation, formally adopted by the EU in 2006, Quantitative Structure–Activity Relationships (QSARs) can be used as alternatives to animal testing, which itself poses specific ethical and economic concerns. A critical assessment of the performance of the QSAR models is therefore the first step toward the reliable use of such computational techniques. This article reports the performance of the skin irritation module of three commercially available software packages: DEREK, HAZARDEXPERT and TOPKAT. Their performances were tested on the basis of data published in the literature, for 116 chemicals. The results of this study show that only TOPKAT was able to predict the irritative potential for the majority of chemicals, whereas DEREK and HAZARDEXPERT could correctly identify only a few irritant substances.


Nanoscale ◽  
2016 ◽  
Vol 8 (13) ◽  
pp. 7203-7208 ◽  
Author(s):  
Natalia Sizochenko ◽  
Agnieszka Gajewicz ◽  
Jerzy Leszczynski ◽  
Tomasz Puzyn

In this paper, we suggest that causal inference methods could be efficiently used in Quantitative Structure–Activity Relationships (QSAR) modeling as additional validation criteria within quality evaluation of the model.


2009 ◽  
Vol 2 (3) ◽  
pp. 184-186 ◽  
Author(s):  
Miloň Tichý ◽  
Marián Rucki

Validation of QSAR models for legislative purposes

OECD principles for the validation of Quantitative Structure-Activity Relationships (QSAR) models for legislative purposes are presented and explained. The reasons for their origin and development, such as the REACH system, are described. The initial impetus came from some OECD countries, followed by almost all other countries of the world.


2017 ◽  
Vol 16 (05) ◽  
pp. 1750038 ◽  
Author(s):  
Abolfazl Barzegar ◽  
Hossein Hamidi

Human immunodeficiency virus-1 (HIV-1) integrase appears to be a crucial target for developing new anti-HIV-1 therapeutic agents. Different quantitative structure–activity relationships (QSARs) algorithms have been used in order to develop efficient model(s) to predict the activity of new pyridinone derivatives against HIV-1 integrase. Multiple linear regression (MLR) and principal component analysis (PCA) combined with MLR have been applied to build QSAR models for a set of new pyridinone derivatives as potent anti-HIV-1 therapeutic agents. Four different approaches based on the MLR method were used for this aim: concrete-MLR, stepwise-MLR, concrete PCA–MLR and stepwise PCA–MLR. Twenty-two different sets of descriptors containing 1613 descriptors were constructed for each optimized molecule. Comparison between the predictability of the “concrete” and “stepwise” procedures in the MLR and PCA–MLR models indicated the advantage of the stepwise procedure over the simple concrete method. Although PCA was employed for dimension reduction, the stepwise PCA–MLR model showed a higher ability to predict the compounds’ activity. The stepwise PCA–MLR model showed highly validated statistical results in both the fitting and prediction processes ([Formula: see text] and [Formula: see text]). Therefore, the stepwise PCA approach is suitable for removing ineffective descriptors, leaving efficient descriptors for building a stepwise PCA–MLR model with good predictability. The stepwise hybrid approach of PCA–MLR may be useful in the derivation of highly predictive and interpretable QSAR models.


2020 ◽  
Author(s):  
Vijay Masand ◽  
Ajaykumar Gandhi ◽  
Vesna Rastija ◽  
Meghshyam K. Patil

In the present work, an extensive QSAR (Quantitative Structure Activity Relationships) analysis of a series of peptide-type SARS-CoV main protease (MPro) inhibitors following the OECD guidelines has been accomplished. The analysis aimed to identify salient and concealed structural features that govern the MPro inhibitory activity of peptide-type compounds. The QSAR analysis is based on a dataset of sixty-two peptide-type compounds, which resulted in the generation of multiple statistically robust and highly predictive models. All the developed models were validated extensively and satisfy the threshold values for many statistical parameters (e.g. R2 = 0.80–0.82, Q2loo = 0.74–0.77). The developed models identified interrelations of atom pairs as important molecular descriptors. Therefore, the present QSAR models have a good balance of Qualitative and Quantitative approaches and are useful for future modifications of peptide-type compounds for anti-SARS-CoV activity.
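The two statistics quoted in this abstract, R2 and Q2loo, can be sketched as follows. This is an illustrative one-descriptor example, not the authors' models or data. It assumes the common convention in which the leave-one-out Q2 divides the predictive residual sum of squares (PRESS) by the total sum of squares around the mean of all training responses; other conventions exist.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    """Sum of squared deviations of the responses from their mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Coefficient of determination of the model fitted on all compounds."""
    slope, intercept = fit_line(xs, ys)
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - rss / total_sum_of_squares(ys)


def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q2: each compound is predicted by a
    model refitted without it, and the squared errors are summed as PRESS."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1 - press / total_sum_of_squares(ys)
```

For least-squares models under this convention, Q2loo cannot exceed R2, which is why a small gap between the two is usually read as a sign of a stable model.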


Author(s):  
Domenico Gadaleta ◽  
Giuseppe Felice Mangiatordi ◽  
Marco Catto ◽  
Angelo Carotti ◽  
Orazio Nicolotti

Quantitative Structure-Activity Relationships are widely acknowledged predictive methods that have been employed for years in organic and medicinal chemistry. More recently, they have also assumed a central role in explorative toxicology for the protection of the environment and human health. However, their real-life application has not always been enthusiastically welcomed, as they are often used retrospectively and are thus of limited importance for prospective goals. The need for more trustworthy predictions has thus prompted studies on the so-called Applicability Domain, which represents the chemical space from which a model is derived and where a prediction is considered to be reliable. In the present study, the authors survey a number of approaches used to build the Applicability Domain. In particular, they focus on strategies based on: a) physico-chemical, b) structural and c) response domains. Moreover, some examples integrating different strategies are also discussed to meet the needs of both model developers and downstream users.
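One of the simplest physico-chemical strategies for defining an Applicability Domain is a descriptor-range (bounding box) check. The sketch below is a generic illustration and is not taken from the surveyed chapter; the function names and the two-descriptor layout are assumptions.

```python
def fit_descriptor_ranges(training_descriptors):
    """Return one (min, max) pair per descriptor column of the training set."""
    columns = zip(*training_descriptors)
    return [(min(column), max(column)) for column in columns]


def in_domain(query, ranges):
    """True if every descriptor of the query lies within the training range."""
    return all(low <= value <= high for value, (low, high) in zip(query, ranges))


if __name__ == "__main__":
    # Hypothetical training compounds described by two descriptors each.
    training = [[1.0, 120.0], [2.2, 250.0], [3.5, 310.0]]
    ranges = fit_descriptor_ranges(training)
    print(in_domain([2.0, 200.0], ranges))  # inside both ranges
    print(in_domain([5.0, 200.0], ranges))  # first descriptor out of range
```

A range check like this is easy to interpret but ignores correlations between descriptors and empty regions inside the box, which is one reason structural and response-based strategies are also used.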


Author(s):  
Alla P. Toropova ◽  
Andrey A. Toropov ◽  
Emilio Benfenati

Three kinds of drug toxicities are examined in this modeling analysis. These are: (i) toxicity of psychotropic drugs; (ii) cardiac toxicity; and (iii) drug carcinogenicity. Predictive models for the toxicity data are built up by the Monte Carlo technique. The simplified molecular input-line entry system (SMILES) is used for the representation of the molecular structure. Quantitative structure – activity relationships (QSAR) developed here are mathematical functions of corresponding SMILES. The index of ideality of correlation was tested as a tool to improve predictive potential of these models.


Author(s):  
Benjamin Stone ◽  
Erik Sapper

Biofilms are congregations of bacteria on a surface, and they grow into obstacles to the functioning of any device or machinery that involves anything biological. Biofilms develop through a biochemical system known as ‘Quorum Sensing’, which accounts for the chemical signaling that directs either biofilm formation or inhibition. Computational models that relate chemical and structural features of compounds to their performance properties have been used to aid in the discovery of active small molecules for many decades. These quantitative structure-activity relationship (QSAR) models are also important for predicting the activity of molecules that can have a range of effectiveness in biological systems. This study uses QSAR methodologies combined with different machine learning algorithms to predict and assess the performance of several different compounds acting in Quorum Sensing. Through computational probing of the quorum sensing molecular interaction, new design rules can be elucidated for countering biofilms.



