Network-based piecewise linear regression for QSAR modelling

2019 ◽  
Vol 33 (9) ◽  
pp. 831-844
Author(s):  
Jonathan Cardoso-Silva ◽  
Lazaros G. Papageorgiou ◽  
Sophia Tsoka

Abstract Quantitative Structure-Activity Relationship (QSAR) models are critical in various areas of drug discovery, for example in lead optimisation and virtual screening. Recently, the need for models that are not only predictive but also interpretable has been highlighted. In this paper, a new methodology is proposed to build interpretable QSAR models by combining elements of network analysis and piecewise linear regression. The algorithm presented, modSAR, splits data using a two-step procedure. First, compounds associated with a common target are represented as a network in terms of their structural similarity, revealing modules of similar chemical properties. Second, each module is subdivided into subsets (regions), each of which is modelled by an independent linear equation. Comparative analysis of QSAR models across five data sets of protein inhibitors obtained from ChEMBL is reported, and it is shown that modSAR offers similar predictive accuracy to popular algorithms such as Random Forest and Support Vector Machine. Moreover, we show that models built by modSAR are interpretable, can evaluate the applicability domain of compounds, and are well suited to tasks such as virtual screening and the development of new drug leads.
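The first step described above (grouping compounds into modules of a structural-similarity network) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, the edge threshold, or the module-detection algorithm, so Tanimoto similarity on made-up fingerprint bit sets, a threshold of 0.5, and plain connected components are used here as stand-ins. The names `tanimoto`, `similarity_modules`, and the compound labels are hypothetical, and the second step (a piecewise linear model per module) is not shown.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def similarity_modules(fingerprints, threshold):
    """Return connected components of the thresholded similarity graph.

    Assumption: connected components stand in for the module detection
    used by modSAR, which the abstract does not specify.
    """
    names = list(fingerprints)
    neighbours = {name: set() for name in names}
    for u, v in combinations(names, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)

    modules, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


# Hypothetical fingerprints (sets of "on" bits), not taken from ChEMBL.
fingerprints = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {2, 3, 4, 5},
    "cpd_D": {10, 11, 12},
    "cpd_E": {10, 11, 13},
}
modules = similarity_modules(fingerprints, threshold=0.5)
print(modules)
```

In a fuller pipeline each returned module would then be passed to a regression step that splits it into regions and fits one linear equation per region.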


2020 ◽  
Vol 21 (21) ◽  
pp. 7828
Author(s):  
Jacob Spiegel ◽  
Hanoch Senderowitz

Quantitative Structure Activity Relationship (QSAR) models can inform on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Because of the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, R² and Q²CV. Similar metrics, calculated on an external set of data (e.g., Q²F1/F2/F3), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-“ignorant”. In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by “classical” metrics, e.g., R² and Q²F1/F2/F3, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R² and/or Q²F1/F2/F3 values were unable to pick a single active compound from within the pool, whereas in other cases models with poor R² and/or Q²F1/F2/F3 values performed well in the context of virtual screening. 
We also found no significant correlation between the number of active compounds correctly identified by the models in the training, validation and test sets. Next, we have developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performances on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across training/validation/test sets, in most cases. We attribute the better performances of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
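As an illustration of what an enrichment-based metric can look like, the sketch below computes a conventional enrichment factor: the hit rate among the top-ranked fraction of a screened pool divided by the hit rate in the whole pool. This is an assumption made for illustration; the abstract does not define the exact metric optimized by the Enrichment Optimizer Algorithm, and the function name `enrichment_factor`, the 20% cutoff, and all scores below are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Enrichment factor of a ranked list.

    EF = (actives in top / size of top) / (all actives / pool size).
    Higher scores are assumed to mean "predicted more active".
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty pool with at least one active")
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    return hits_top * n / (n_top * total_actives)


# Hypothetical pool: 4 actives followed by 16 inactive "random" compounds.
is_active = [1, 1, 1, 1] + [0] * 16
inactive_scores = [8.2] + [5.0 - 0.1 * k for k in range(15)]

# Model A ranks three of the four actives in the top 20% of the pool.
scores_a = [9.0, 8.5, 8.0, 3.0] + inactive_scores
# Model B scores every active below the top-ranked inactives.
scores_b = [2.0, 1.9, 1.8, 1.7] + inactive_scores

ef_a = enrichment_factor(scores_a, is_active, top_fraction=0.2)
ef_b = enrichment_factor(scores_b, is_active, top_fraction=0.2)
print(ef_a, ef_b)
```

A value above 1 means the ranking concentrates actives in the selected fraction better than random picking would, which is the property a virtual screening campaign relies on and which R² does not measure directly.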



2019 ◽  
Vol 3 (4) ◽  
pp. 250-252 ◽  
Author(s):  
David M Hille

Objective To identify changes in the linear trend of the age-standardized incidence of melanoma in Australia for all persons, males, and females. Methods A two-piece piecewise linear regression was fitted to the data. The piecewise breakpoint was varied through an iterative process to determine the model that best fits the data. Results Statistically significant changes in the trend of the age-standardized incidence of melanoma in Australia were found for all persons, males, and females. The optimal breakpoint for all persons and males was at 1998. For females, the optimal breakpoint was at 2005. The trend after these breakpoints was flatter than prior to the breakpoints, but still positive. Conclusion Melanoma is a significant public health issue in Australia. Overall incidence continues to increase. However, the rate at which the incidence is increasing appears to be decreasing.
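The iterative breakpoint search described in the Methods can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes a continuous two-piece model y = a + b·t + c·max(0, x − knot) fitted by ordinary least squares, with the best knot chosen by the smallest sum of squared errors. The function names and the yearly rates are invented for the example and are not the Australian incidence data.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_two_piece(xs, ys, knot):
    """Least-squares fit of a continuous two-piece line with a fixed knot.

    x is shifted by min(xs) to keep the normal equations well conditioned.
    Returns ([intercept, slope_before, slope_change], sum of squared errors).
    """
    origin = min(xs)
    rows = [[1.0, x - origin, max(0.0, x - knot)] for x in xs]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    moments = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve_linear_system(gram, moments)
    sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return coef, sse


def scan_knots(xs, ys, candidates):
    """Try each candidate knot and keep the fit with the smallest SSE."""
    best = None
    for knot in candidates:
        coef, sse = fit_two_piece(xs, ys, knot)
        if best is None or sse < best["sse"]:
            best = {"knot": knot, "coef": coef, "sse": sse}
    return best


# Synthetic series: slope 1.0 up to 1998, slope 0.3 afterwards.
years = list(range(1982, 2015))
rates = [20.0 + 1.0 * (y - 1982) if y <= 1998 else 36.0 + 0.3 * (y - 1998)
         for y in years]

best = scan_knots(years, rates, candidates=range(1985, 2012))
slope_before = best["coef"][1]
slope_after = best["coef"][1] + best["coef"][2]
print(best["knot"], round(slope_before, 3), round(slope_after, 3))
```

Candidate knots are kept strictly inside the observed range so that both segments contain data; a knot at either end would make the hinge term redundant and the fit singular.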



2020 ◽  
Author(s):  
Xiuping Xuan ◽  
Masahide Hamaguchi ◽  
Qiuli Cao ◽  
Okamura Takuro ◽  
Yoshitaka Hashimoto ◽  
...  

Abstract Background Although the triglycerides-glucose (TyG) index was thought to be a practical predictor of incident diabetes, the association between them has not been well characterized. The study aimed to further examine the association between the TyG index and incident diabetes in Japanese adults. Methods Participants were drawn from the NAGALA (NAfld in the Gifu Area, Longitudinal Analysis) study at Murakami Memorial Hospital from 2004 to 2015, and 14297 individuals who were apparently healthy at baseline were included in the study. Cox proportional hazards models were used to evaluate the associations between baseline TyG levels and incident T2DM, and a two-piecewise linear regression model was used to examine the threshold effect of the baseline TyG on incident diabetes using a smoothing function. The threshold level (i.e., turning point) was determined using trial and error. A log likelihood ratio test was also conducted to compare the one-line linear regression model with a two-piecewise linear model. Results During a median follow-up period of 5.26 (women) and 5.88 (men) years, 47 women and 182 men developed Type 2 diabetes. The risk of diabetes was strongly associated with the baseline TyG index in the fully adjusted model in men but not in women, and no dose-dependent positive relationship between incident diabetes and TyG was observed across TyG tertiles. Intriguingly, two-piecewise linear regression analysis showed a U-shaped association between the TyG index and incident T2DM. The risk of incident diabetes decreased by around 90% in women with TyG < 7.27 (HR: 0.09; P = 0.0435) and 80% in men with TyG < 7.97 (HR 0.21, P = 0.002) with each increment of the TyG index after adjusting for confounders. In contrast, the risk of incident T2DM rose significantly with increasing TyG index in men with TyG > 7.97 (HR: 2.42, P < 0.001) and women with TyG > 7.29 (HR 2.76, P = 0.0166). 
Conclusions A U-shaped association was observed between the TyG index and incident T2DM among healthy individuals, with the TyG threshold of 7.97 in men and 7.27 in women. This information may be useful for reducing incident diabetes by maintaining the TyG index near these thresholds.
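The comparison of a one-line model with a two-piecewise model by a log likelihood ratio test can be illustrated with the simplified sketch below. It is not the study's analysis: the study fitted Cox proportional hazards models, whereas this example assumes ordinary least squares with Gaussian errors, for which the statistic reduces to n·ln(SSE_null / SSE_alt). The turning point is treated as fixed (one extra slope parameter); when the turning point is itself estimated, the chi-square reference is only approximate. All numbers and function names are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


def likelihood_ratio_test(sse_null, sse_alt, n_obs, extra_params):
    """Likelihood ratio test for nested Gaussian least-squares models.

    sse_null: residual sum of squares of the one-line model.
    sse_alt:  residual sum of squares of the two-piecewise model.
    """
    stat = max(0.0, n_obs * math.log(sse_null / sse_alt))
    return stat, chi2_sf(stat, extra_params)


# Hypothetical residual sums of squares from the two fits on 30 observations.
stat, p_value = likelihood_ratio_test(
    sse_null=120.0, sse_alt=60.0, n_obs=30, extra_params=1
)
print(round(stat, 3), p_value < 0.05)
```

A small p-value indicates that the extra slope after the turning point improves the fit by more than chance alone would explain, which is the evidence used to prefer the two-piecewise description.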



IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 29845-29855 ◽  
Author(s):  
Xubing Yang ◽  
Hongxin Yang ◽  
Fuquan Zhang ◽  
Li Zhang ◽  
Xijian Fan ◽  
...  


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Li Wen ◽  
Qing Li ◽  
Wei Li ◽  
Qiao Cai ◽  
Yong-Ming Cai

Hydroxyl benzoic esters are preservatives widely used in food, medicine, and cosmetics. To explore the relationship between the molecular structure and antibacterial activity of these compounds and to predict the activity of compounds with similar structures, Quantitative Structure-Activity Relationship (QSAR) models of 25 hydroxyl benzoic esters were built from quantum chemical parameters and molecular connectivity indexes using support vector machines (SVM) in the R language. The external standard deviation error of prediction (SDEPext), the fitting correlation coefficient (R²), and the leave-one-out cross-validation coefficient (Q²LOO) were used to evaluate the reliability, stability, and predictive ability of the models. The results show that R² and Q²LOO of the 4 nonlinear models exceed 0.6 and that SDEPext is 0.213, 0.222, 0.189, and 0.218, respectively. Both the correlation coefficient and the standard deviation are better than those of the multiple linear regression (MLR) model (R² = 0.421, RSD = 0.260). The reliability, stability, robustness, and external predictive ability of the models are good, particularly for the model with the linear kernel function and eps-regression type. This model can predict the antimicrobial activity of compounds with similar structures within the applicability domain.
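To make the two internal validation statistics concrete, the sketch below computes a fitted R² and a leave-one-out Q² for a deliberately simple model. It is an illustration under assumptions: a one-descriptor least-squares line replaces the SVM models of the study, Q²LOO is taken as 1 − PRESS / SS_total with the overall mean of the responses in the denominator (one common convention), and the descriptor and activity values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Coefficient of determination of the model fitted on all data."""
    intercept, slope = fit_line(xs, ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out Q2: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hypothetical descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
print(round(r2, 3), round(q2, 3))
```

For a least-squares model Q²LOO cannot exceed R², because each held-out residual is at least as large in magnitude as the corresponding fitted residual; a large gap between the two is a common sign of overfitting.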



Molecules ◽  
2019 ◽  
Vol 24 (21) ◽  
pp. 3909 ◽  
Author(s):  
Amit Kumar Halder ◽  
Amal Kanta Giri ◽  
Maria Natália Dias Soeiro Cordeiro

Two isoforms of extracellular regulated kinase (ERK), namely ERK-1 and ERK-2, are associated with several cellular processes, the aberration of which leads to cancer. The ERK-1/2 inhibitors are thus considered potential agents for cancer therapy. Multitarget quantitative structure–activity relationship (mt-QSAR) models based on the Box–Jenkins approach were developed with a dataset containing 6400 ERK inhibitors assayed under different experimental conditions. The first mt-QSAR linear model was built with linear discriminant analysis (LDA) and provided information regarding the structural requirements for better activity. This linear model was also utilised for a fragment analysis to estimate the contributions of ring fragments towards ERK inhibition. Then, the random forest (RF) technique was employed to produce highly predictive non-linear mt-QSAR models, which were used to screen the Asinex kinase library and identify the most promising virtual hits. The fragment analysis results justified the selection of the hits retrieved through such virtual screening. The latter were subsequently subjected to molecular docking and molecular dynamics simulations to understand their possible interactions with ERK enzymes. The present work, which utilises in-silico techniques such as multitarget chemometric modelling, fragment analysis, virtual screening, molecular docking and dynamics, may provide important guidelines to facilitate the discovery of novel ERK inhibitors.



1994 ◽  
Vol 21 (4) ◽  
pp. 221-233 ◽  
Author(s):  
John Cologne ◽  
Richard Sposto


Author(s):  
Eduardo Perez-Pellitero ◽  
Jordi Salvador ◽  
Javier Ruiz-Hidalgo ◽  
Bodo Rosenhahn

