Evaluation of QSAR Equations for Virtual Screening

2020 ◽  
Vol 21 (21) ◽  
pp. 7828
Author(s):  
Jacob Spiegel ◽  
Hanoch Senderowitz

Quantitative Structure Activity Relationship (QSAR) models can provide information on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Because of the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example R² and Q²CV. Similar metrics calculated on an external set of data (e.g., Q²F1, Q²F2 and Q²F3) are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-“ignorant”. In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by “classical” metrics, e.g., R² and Q²F1/F2/F3, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R² and/or Q²F1/F2/F3 values were unable to pick a single active compound from within the pool, whereas in other cases models with poor R² and/or Q²F1/F2/F3 values performed well in the context of virtual screening.
We also found no significant correlation between the numbers of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across training/validation/test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
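To make the contrast concrete, the sketch below places the commonly used Q²F1/F2/F3 definitions next to an enrichment factor (EF) and a greedy descriptor search scored by EF. This is not the authors' Enrichment Optimizer Algorithm: the EF definition, the greedy forward search, the use of least-squares coefficients, and all function names (`enrichment_factor`, `greedy_ef_selection`, etc.) are illustrative assumptions.

```python
import numpy as np

# --- "Classical", context-ignorant external metrics (common Q2F1/F2/F3 definitions) ---

def q2_f1(y_ext, y_pred, y_train_mean):
    """Q2F1: external PRESS scaled by deviations from the training-set mean."""
    y_ext, y_pred = np.asarray(y_ext, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_ext - y_pred) ** 2) / np.sum((y_ext - y_train_mean) ** 2)

def q2_f2(y_ext, y_pred):
    """Q2F2: as Q2F1, but deviations are taken from the external-set mean."""
    y_ext, y_pred = np.asarray(y_ext, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_ext - y_pred) ** 2) / np.sum((y_ext - y_ext.mean()) ** 2)

def q2_f3(y_ext, y_pred, y_train):
    """Q2F3: mean squared external error relative to the training-set variance."""
    y_ext, y_pred, y_train = (np.asarray(a, float) for a in (y_ext, y_pred, y_train))
    return 1.0 - np.mean((y_ext - y_pred) ** 2) / np.mean((y_train - y_train.mean()) ** 2)

# --- A virtual-screening-aware alternative: enrichment factor (assumed definition) ---

def enrichment_factor(scores, is_active, top_fraction=0.1):
    """EF at top_fraction: hit rate among the top-ranked compounds / overall hit rate."""
    scores = np.asarray(scores, float)
    is_active = np.asarray(is_active, bool)
    n_top = max(1, int(round(top_fraction * len(scores))))
    top = np.argsort(-scores, kind="stable")[:n_top]  # highest predicted activity first
    base_rate = is_active.mean()
    return 0.0 if base_rate == 0 else float(is_active[top].mean() / base_rate)

def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def greedy_ef_selection(X, y, is_active, max_descriptors=3, top_fraction=0.1):
    """Hypothetical forward selection: add the descriptor that most increases EF."""
    selected, best_ef = [], -np.inf
    for _ in range(max_descriptors):
        best_j = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef = fit_mlr(X[:, cols], y)
            ef = enrichment_factor(coef[0] + X[:, cols] @ coef[1:], is_active, top_fraction)
            if ef > best_ef:
                best_j, best_ef = j, ef
        if best_j is None:  # no remaining descriptor improves enrichment: stop
            break
        selected.append(best_j)
    return selected, best_ef
```

The Q² functions only look at residuals, so a model can score well on them while ranking actives poorly; EF looks only at the top of the ranked list, which is the part a VS campaign actually consumes.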


2019 ◽  
Vol 33 (9) ◽  
pp. 831-844
Author(s):  
Jonathan Cardoso-Silva ◽  
Lazaros G. Papageorgiou ◽  
Sophia Tsoka

Quantitative Structure-Activity Relationship (QSAR) models are critical in various areas of drug discovery, for example in lead optimisation and virtual screening. Recently, the need for models that are not only predictive but also interpretable has been highlighted. In this paper, a new methodology is proposed to build interpretable QSAR models by combining elements of network analysis and piecewise linear regression. The algorithm presented, modSAR, splits data using a two-step procedure. First, compounds associated with a common target are represented as a network in terms of their structural similarity, revealing modules of similar chemical properties. Second, each module is subdivided into subsets (regions), each of which is modelled by an independent linear equation. A comparative analysis of QSAR models across five data sets of protein inhibitors obtained from ChEMBL is reported, showing that modSAR offers predictive accuracy similar to that of popular algorithms such as Random Forest and Support Vector Machine. Moreover, we show that models built by modSAR are interpretable, can evaluate the applicability domain of the compounds, and are well suited to tasks such as virtual screening and the development of new drug leads.
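The two-step splitting described above can be illustrated with a deliberately simplified sketch. It assumes fingerprints stored as sets of on-bit indices, modules taken as connected components of a thresholded Tanimoto similarity graph, and a single least-squares equation per module. modSAR's actual module detection and its subdivision of modules into regions are more elaborate; the threshold and function names here are hypothetical.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # treat two empty fingerprints as unrelated
    return len(a & b) / len(a | b)

def similarity_modules(fps, threshold=0.5):
    """Connected components of the graph linking compounds with similarity >= threshold."""
    parent = list(range(len(fps)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if tanimoto(fps[i], fps[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components
    modules = {}
    for i in range(len(fps)):
        modules.setdefault(find(i), []).append(i)
    return list(modules.values())

def fit_module_models(X, y, modules):
    """One least-squares linear equation (with intercept) per module."""
    models = []
    for members in modules:
        A = np.column_stack([np.ones(len(members)), X[members]])
        coef, *_ = np.linalg.lstsq(A, y[members], rcond=None)
        models.append((members, coef))
    return models
```

Keeping one small equation per structurally coherent module is what makes such models readable: each coefficient applies only to a chemical series, not to the whole dataset.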



2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Li Wen ◽  
Qing Li ◽  
Wei Li ◽  
Qiao Cai ◽  
Yong-Ming Cai

Hydroxybenzoic esters are preservatives widely used in food, medicine, and cosmetics. To explore the relationship between the molecular structure and antibacterial activity of these compounds, and to predict the activity of structurally similar compounds, Quantitative Structure-Activity Relationship (QSAR) models of 25 hydroxybenzoic esters were built with support vector machines (SVM) in the R language, using quantum chemical parameters and molecular connectivity indexes as descriptors. The External Standard Deviation Error of Prediction (SDEPext), the fitting correlation coefficient (R²), and the leave-one-out cross-validation coefficient (Q²LOO) are used to assess the reliability, stability, and predictive ability of the models. The results show that R² and Q²LOO of the four nonlinear models exceed 0.6, with SDEPext values of 0.213, 0.222, 0.189, and 0.218, respectively. Both the correlation coefficient and the standard deviation are better than those of the multiple linear regression (MLR) model (R² = 0.421, RSD = 0.260). The models show good reliability, stability, robustness, and external predictive ability, particularly the model with a linear kernel function and eps-regression type. This model can predict the antimicrobial activity of structurally similar compounds within the applicability domain.
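The two validation statistics can be computed as below. The sketch assumes SDEPext is the root-mean-square prediction error on the external set and Q²LOO = 1 − PRESS/SS_tot from leave-one-out refits. A linear least-squares model stands in for the SVM used in the study; the `fit`/`predict` hooks (hypothetical names) are where a wrapper around an SVR implementation could be plugged in.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in regressor: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def q2_loo(X, y, fit=fit_linear, predict=predict_linear):
    """Leave-one-out Q2: refit without each compound, predict it, accumulate PRESS."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        model = fit(X[mask], y[mask])
        press += (y[i] - predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def sdep_ext(y_ext, y_ext_pred):
    """SDEP on an external set: root mean square of the prediction residuals."""
    r = np.asarray(y_ext, float) - np.asarray(y_ext_pred, float)
    return float(np.sqrt(np.mean(r ** 2)))
```

Q²LOO measures internal robustness (every compound is predicted by a model that never saw it), while SDEPext is an absolute error on compounds held out from model building entirely, so the two answer different questions.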



Molecules ◽  
2018 ◽  
Vol 23 (9) ◽  
pp. 2348 ◽  
Author(s):  
Letícia Santos-Garcia ◽  
Marco de Mecenas Filho ◽  
Kamil Musilek ◽  
Kamil Kuca ◽  
Teodorico Ramalho ◽  
...  

Malaria is a disease caused by protozoan parasites of the genus Plasmodium that affects millions of people worldwide. In recent years, parasite resistance has emerged to several drugs, including the first-line antimalarial treatment. With the aim of proposing new drug candidates for the treatment of the disease, Quantitative Structure–Activity Relationship (QSAR) methodology was applied to 83 N-myristoyltransferase inhibitors synthesized by Leatherbarrow et al. The QSAR models were developed using 63 compounds (the training set) and externally validated using 20 compounds (the test set). Ten different alignments for the two test sets were tested, and the models were generated by a technique that combines genetic algorithms and partial least squares. The best model shows r² = 0.757, q²adjusted = 0.634, R²pred = 0.746, R²m = 0.716, ΔR²m = 0.133, R²p = 0.609, and R²r = 0.110. This work suggests a good correlation with the experimental results and allows the design of new potent N-myristoyltransferase inhibitors.
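Among the statistics quoted above, the r²m family penalises predictions that are well correlated with observations but systematically offset. The sketch below follows one common, unscaled formulation, r²m = r²(1 − √|r² − r₀²|), where r₀² is the determination coefficient of a regression through the origin. Published implementations differ in details such as response scaling and which axis is regressed on which, so this is an assumption-laden illustration and is not expected to reproduce the values reported in the abstract.

```python
import numpy as np

def _r0_squared(y, x):
    """R2 of y regressed on x through the origin (y ~ k * x)."""
    k = np.sum(x * y) / np.sum(x * x)
    return 1.0 - np.sum((y - k * x) ** 2) / np.sum((y - y.mean()) ** 2)

def rm2_metrics(y_obs, y_pred):
    """Unscaled r2m (observed on predicted), its reverse, their mean and difference."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    rm2 = r2 * (1.0 - np.sqrt(abs(r2 - _r0_squared(y_obs, y_pred))))
    rm2_rev = r2 * (1.0 - np.sqrt(abs(r2 - _r0_squared(y_pred, y_obs))))
    return {
        "rm2": rm2,
        "rm2_reverse": rm2_rev,
        "mean_rm2": (rm2 + rm2_rev) / 2.0,
        "delta_rm2": abs(rm2 - rm2_rev),
    }
```

For example, predictions that are all shifted by a constant keep r² = 1 but drop r²m well below 1, which is exactly the kind of bias that plain r² does not detect.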



Molecules ◽  
2019 ◽  
Vol 24 (21) ◽  
pp. 3909 ◽  
Author(s):  
Amit Kumar Halder ◽  
Amal Kanta Giri ◽  
Maria Natália Dias Soeiro Cordeiro

Two isoforms of extracellular signal-regulated kinase (ERK), namely ERK-1 and ERK-2, are associated with several cellular processes whose aberration leads to cancer. ERK-1/2 inhibitors are thus considered potential agents for cancer therapy. Multitarget quantitative structure–activity relationship (mt-QSAR) models based on the Box–Jenkins approach were developed with a dataset containing 6400 ERK inhibitors assayed under different experimental conditions. The first mt-QSAR model, a linear model built with linear discriminant analysis (LDA), provided information regarding the structural requirements for better activity. This linear model was also used for a fragment analysis to estimate the contributions of ring fragments towards ERK inhibition. Then, the random forest (RF) technique was employed to produce highly predictive non-linear mt-QSAR models, which were used to screen the Asinex kinase library and identify the most promising virtual hits. The fragment analysis results supported the selection of the hits retrieved through this virtual screening. These hits were subsequently subjected to molecular docking and molecular dynamics simulations to understand their possible interactions with ERK enzymes. The present work, which combines in-silico techniques such as multitarget chemometric modelling, fragment analysis, virtual screening, molecular docking and dynamics, may provide important guidelines to facilitate the discovery of novel ERK inhibitors.
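In Box–Jenkins-style mt-QSAR, each descriptor is re-expressed as a deviation from a reference average computed under each experimental condition, so that a single model can cover many assay conditions. The sketch below assumes, as one common variant, that the reference average for condition c is taken over the active training compounds measured under c; other studies use different reference sets, and the function name is hypothetical. The resulting deviation descriptors would then be passed to a classifier such as LDA or RF.

```python
import numpy as np

def box_jenkins_deviations(D, conditions, is_active, train_mask):
    """Return D_i - avg(D_i | condition c), averaging over active training compounds in c.

    D: (n_compounds, n_descriptors) raw descriptors
    conditions: one condition label per compound (a single condition element)
    """
    D = np.asarray(D, float)
    conditions = np.asarray(conditions)
    is_active = np.asarray(is_active, bool)
    train_mask = np.asarray(train_mask, bool)
    out = np.empty_like(D)
    for c in np.unique(conditions):
        in_c = conditions == c
        ref = in_c & train_mask & is_active  # reference set: training actives under c
        if not ref.any():
            raise ValueError(f"no active training compounds for condition {c!r}")
        out[in_c] = D[in_c] - D[ref].mean(axis=0)
    return out
```

Computing the averages only from training compounds keeps test-set information out of the descriptors. A dataset with several condition elements (for example target, assay type and measure of activity) would apply this once per element.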



2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Ying-Hsin Chang ◽  
Jun-Yan Chen ◽  
Chiou-Yi Hor ◽  
Yu-Chung Chuang ◽  
Chang-Biau Yang ◽  
...  

Human estrogen receptor (ER) isoforms, ERα and ERβ, have long been an important focus in biology. To better understand the structural features associated with the binding of ERα ligands to ERα and the modulation of its function, several QSAR methods, including CoMFA, CoMSIA, SVR, and LR, were employed to predict the inhibitory activity of 68 raloxifene derivatives. In the SVR and LR modeling, 11 descriptors were selected through feature ranking and sequential feature addition/deletion to generate equations that predict the inhibitory activity toward ERα. Of the four descriptors that consistently appear in the various generated equations, two agree with the CoMFA and CoMSIA steric fields and the other two can be correlated with a calculated electrostatic potential of ERα.
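Feature ranking followed by sequential addition and deletion can be sketched as follows. The ranking criterion (absolute Pearson correlation with activity), the acceptance score (adjusted R² of a least-squares fit) and the function names are assumptions for illustration; the study's own ranking and acceptance criteria may differ.

```python
import numpy as np

def rank_features(X, y):
    """Order descriptor indices by |Pearson r| with the activity, strongest first."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return [int(j) for j in np.argsort(corr)[::-1]]

def adjusted_r2(X, y):
    """Adjusted R2 of an ordinary least-squares fit with intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def add_delete_selection(X, y, score=adjusted_r2):
    """Walk the ranked list, keep additions that raise the score, then try deletions."""
    selected, best = [], -np.inf
    for j in rank_features(X, y):
        trial = selected + [j]
        s = score(X[:, trial], y)
        if s <= best:
            continue  # addition rejected
        selected, best = trial, s
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for k in selected:
                reduced = [f for f in selected if f != k]
                s = score(X[:, reduced], y)
                if s > best:
                    selected, best, improved = reduced, s, True
                    break  # restart the deletion scan on the smaller set
    return selected, best
```

The deletion pass matters because a descriptor accepted early can become redundant once a later, correlated descriptor enters the equation.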





2013 ◽  
Vol 13 (6) ◽  
pp. 1543-1552 ◽  
Author(s):  
Juan Zhang ◽  
Ron Hofmann

The adsorption of 115 emerging contaminants, mainly organic chemicals identified on the US Environmental Protection Agency's 2009 Contaminant Candidate List 3, was ranked using two published classical quantitative structure-activity relationship (QSAR) models and a newly developed quantum QSAR model. Approximately 75% of the investigated contaminants were predicted to be cost-effectively treatable, with an activated carbon usage rate below 10 mg/L. A limited experimental validation campaign was carried out by rapid small-scale column testing (RSSCT) using Lake Ontario water for eight selected compounds: 17β-estradiol, ibuprofen, diazinon, sulfamethoxazole, carbamazepine, 4-nonylphenol diethoxylate, azithromycin and tylosin; activated carbon adsorption of the last three had not previously been reported. The experimental results were consistent with the quantum chemistry model rankings.



2011 ◽  
Vol 20 (02) ◽  
pp. 253-270 ◽  
Author(s):  
FRANK HOONAKKER ◽  
NICOLAS LACHICHE ◽  
ALEXANDRE VARNEK ◽  
ALAIN WAGNER

Chemical reactions always involve several molecules of two types, reactants and products. Existing data mining techniques, e.g., Quantitative Structure Activity Relationship (QSAR) methods, deal with individual molecules only. In this article, we propose to use a Condensed Graph of Reaction (CGR) to merge all molecules involved in a reaction into one molecular graph. This allows reactions to be treated as pseudo-molecules and QSAR models to be developed based on fragment descriptors. ISIDA (In SIlico Design and Analysis) fragment descriptors built from CGRs are then used to generate models for the rate constant of SN2 reactions in water, using three common attribute-value regression algorithms (linear regression, support vector machine, and regression trees). This approach compares favorably with two state-of-the-art relational data mining techniques.
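The core CGR idea, a single graph in which every bond carries its state before and after the reaction, can be shown on atom-mapped bond tables. This sketch assumes the atom mapping is already available, omits hydrogens, and counts only single atom-bond-atom fragments, a much simpler set than the ISIDA sequence fragments used in the paper; all names are hypothetical.

```python
from collections import Counter

def condensed_graph(reactant_bonds, product_bonds):
    """Merge atom-mapped reactant and product bond tables into one CGR.

    Each key is frozenset({map_a, map_b}); each value is a bond order.
    The CGR labels every bond with (order_before, order_after); 0 means no bond.
    """
    cgr = {}
    for key in set(reactant_bonds) | set(product_bonds):
        cgr[key] = (reactant_bonds.get(key, 0), product_bonds.get(key, 0))
    return cgr

def bond_fragment_counts(cgr, elements):
    """Count atom-bond-atom fragments, e.g. 'Br-(1>0)-C' for a breaking single bond."""
    counts = Counter()
    for key, (before, after) in cgr.items():
        a, b = sorted(elements[m] for m in key)  # canonical element order
        counts[f"{a}-({before}>{after})-{b}"] += 1
    return counts
```

For the SN2 reaction CH3Br + OH⁻ → CH3OH + Br⁻ with atom maps 1 = C, 2 = Br, 3 = O, the reactant table is `{frozenset({1, 2}): 1}` and the product table is `{frozenset({1, 3}): 1}`; the CGR then holds a breaking C–Br bond labelled (1, 0) and a forming C–O bond labelled (0, 1), and the fragment counts become ordinary descriptors for an attribute-value regression.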



2019 ◽  
Vol 15 (3) ◽  
pp. 243-251 ◽  
Author(s):  
Erol Eroglu

Objective: We present three robust, validated and statistically significant quantitative structure-activity relationship (QSAR) models relating calculated molecular descriptors to the experimental inhibition constants (Ki) of 42 coumarin and sulfocoumarin derivatives measured against the CA I and II isoforms.

Methods: The compounds were subjected to DFT calculations in order to obtain quantum chemical molecular descriptors. Multiple linear regression algorithms were applied to construct QSAR models. The compounds were separated into training and test sets using the Kennard-Stone algorithm. The leverage approach was applied to determine the Applicability Domain (AD) of the obtained models.

Results: Three models were developed. The first model, CAI_model1, comprises 30/11 training/test compounds with R² = 0.85, Q² = 0.77, F = 27.57 and R²(test) = 0.72. The second, CAII_model2, comprises 30/12 training/test compounds with R² = 0.86, Q² = 0.78, F = 30.27 and R²(test) = 0.85. The final model, ΔpKi_model3, consists of 25/3 training/test compounds with R² = 0.78, Q² = 0.62, F = 13.80 and R²(test) = 0.99.

Conclusion: Interpretation of reactivity-related descriptors such as the HOMO-1 and LUMO energies, together with visual inspection of their orbital electron density maps, leads to the conclusion that the binding free energy of the entire binding process may be modulated by the kinetics of the coumarin hydrolysis step.
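The two data-handling steps named in the Methods can be sketched as follows: Kennard-Stone selection (start from the two most distant compounds, then repeatedly add the compound farthest from its nearest already-selected neighbour) and a leverage-based applicability domain, h = x(XᵀX)⁻¹xᵀ, with the commonly used warning value h* = 3(p + 1)/n. Euclidean distance on the raw descriptor matrix and an intercept column in X are assumptions here, and the function names are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (training indices, remaining indices) chosen by Kennard-Stone."""
    X = np.asarray(X, float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean
    i, j = np.unravel_index(np.argmax(d), d.shape)               # two most distant points
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train and remaining:
        nearest = d[np.ix_(remaining, selected)].min(axis=1)  # distance to closest selected
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

def leverages(X_train, X_query):
    """Leverage of each query row w.r.t. the training matrix (intercept column added)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", B, xtx_inv, B)

def warning_leverage(n_train, n_descriptors):
    """Common warning threshold h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train
```

A query compound with h above h* lies outside the descriptor space spanned by the training set, so its predicted Ki is an extrapolation and should be treated with caution.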


