QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: The poly(ADP-ribose) polymerases (PARPs) are a superfamily of nuclear enzymes present in eukaryotes. Methods: In the present report, linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used to divide the whole data set rationally into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors contributing most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, and in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), as well as R2 and the F-statistic for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better using the GA-ANN model.
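
The leave-one-out cross-validation coefficient (Q2 LOO) mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single descriptor and an ordinary least-squares line, which is far simpler than the multi-descriptor GA-MLR, GA-SVM and GA-ANN models of the study; the function names `fit_line` and `q2_loo` are hypothetical, and the denominator uses the mean of the full training set, which is one common convention among several.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot (illustrative convention)."""
    n = len(ys)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

A Q2 LOO close to R2 suggests that the fit does not depend heavily on any single compound. Leave-group-out (Q2 LGO) follows the same pattern with several compounds held out per round.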

2016 ◽  
Vol 27 (3) ◽  
pp. 299-312
Author(s):  
Nadia Ziani ◽  
Khadidja Amirat ◽  
Djelloul Messadi

Purpose – The purpose of this paper is to predict the aquatic toxicity (LC50) of 92 substituted benzene derivatives in Pimephales promelas. Design/methodology/approach – Quantitative structure-activity relationship analysis was performed on a series of 92 substituted benzene derivatives using multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM) methods, which correlate the aquatic toxicity (LC50) values of these chemicals with their structural descriptors. First, the entire data set was split according to the Kennard and Stone algorithm into a training set (74 chemicals) and a test set (18 chemicals) for statistical external validation. Findings – Models with six descriptors were developed, using as independent variables theoretical descriptors derived from Dragon software and selected with a genetic algorithm – variable subset selection procedure. Originality/value – The values of Q2 and RMSE in internal validation for the MLR, SVM and ANN models were (0.8829; 0.225), (0.8882; 0.222) and (0.8980; 0.214), respectively, and in external validation were (0.9538; 0.141), (0.947; 0.146) and (0.9564; 0.146). The statistical parameters obtained for the three approaches are very similar, which confirms that the six-parameter model is stable, robust and significant.
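
The Kennard and Stone split referred to in this abstract can be sketched as follows. This is a minimal illustration of the usual max-min distance formulation, assuming Euclidean distance in descriptor space; the function name `kennard_stone` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def kennard_stone(points, n_train):
    """Return (train_indices, test_indices) chosen by max-min distance."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_train:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy one-descriptor data set: five compounds, three go to the training set.
train, test = kennard_stone([(0.0,), (10.0,), (5.0,), (1.0,), (9.0,)], 3)
```

Because the training set is built from the extremes inward, the test compounds tend to fall inside the descriptor range covered by the training set.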


2020 ◽  
Vol 85 (4) ◽  
pp. 467-480 ◽  
Author(s):  
Rana Amiri ◽  
Djelloul Messadi ◽  
Amel Bouakkadia

This study aimed at predicting the n-octanol/water partition coefficient (Kow) of 43 organophosphorus insecticides. Quantitative structure–property relationship analysis was performed on the series of 43 insecticides using two different methods, linear (multiple linear regression, MLR) and non-linear (artificial neural network, ANN), which relate the Kow values of these chemicals to their structural descriptors. First, the data set was separated with a duplex algorithm into a training set (28 chemicals) and a test set (15 chemicals) for statistical external validation. A model with four descriptors was developed, using as independent variables theoretical descriptors derived from Dragon software and selected with a genetic algorithm (GA)–variable subset selection (VSS) procedure. The values of the statistical parameters R2, Q2 ext, SDEPext and SDEC for the MLR model (94.09 %, 92.43 %, 0.533 and 0.471, respectively) and the ANN model (97.24 %, 92.17 %, 0.466 and 0.332, respectively) are very similar, which confirms that the four-parameter model is stable, robust and significant.
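
For readers unfamiliar with the four statistics quoted above, the following gives commonly used definitions. The abstract does not state which variants the authors used, so these are assumptions; in particular, several definitions of the external Q2 exist. Here "tr" denotes the training set, "ext" the external test set, and \hat{y}_i the model prediction for compound i.

```latex
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i \in \mathrm{tr}} (y_i - \hat{y}_i)^2}{\sum_{i \in \mathrm{tr}} (y_i - \bar{y}_{\mathrm{tr}})^2}, &
\mathrm{SDEC} &= \sqrt{\frac{1}{n_{\mathrm{tr}}} \sum_{i \in \mathrm{tr}} (y_i - \hat{y}_i)^2}, \\
Q^2_{\mathrm{ext}} &= 1 - \frac{\sum_{i \in \mathrm{ext}} (y_i - \hat{y}_i)^2}{\sum_{i \in \mathrm{ext}} (y_i - \bar{y}_{\mathrm{tr}})^2}, &
\mathrm{SDEP}_{\mathrm{ext}} &= \sqrt{\frac{1}{n_{\mathrm{ext}}} \sum_{i \in \mathrm{ext}} (y_i - \hat{y}_i)^2}.
\end{aligned}
```

Under these definitions SDEC measures the fitting error and SDEPext the external prediction error, both in the units of the modelled property.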


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruolan Zeng ◽  
Jiyong Deng ◽  
Limin Dang ◽  
Xinliang Yu

A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, the SVM model shows better statistical performance while dealing with more samples in the test set. Therefore, an SVM algorithm was successfully applied to develop a nonlinear QSAR model for skin permeability.


2021 ◽  
Vol 10 (2) ◽  
pp. 1-13
Author(s):  
Geetha Vaithianathan ◽  
Rajkumar E.

Medical image processing is a complex exercise and involves a number of stages to identify the disease in the arena of medical imaging. Irritable bowel syndrome is an acute disorder that causes intense abdominal pain and leads to changes in the bowel system. It gives rise to various indications like bleeding, bloating, celiac disease, gastric cancer, ulcer, etc. The system proposed here seeks to segment and classify each symptom of the irritable bowel syndrome individually with the aid of a supervoxel segmentation algorithm. Features are extracted depending on the color, shape, and texture of the object. The extracted features are fed into a multi-support vector machine to identify the specific region in the medical image. The experiment reports results on a test set of 100 images stored in the data set, with improved accuracy that refines the final output.


Author(s):  
Zhixian Chen ◽  
Jialin Tang ◽  
Xueyuan Gong ◽  
Qinglang Su

To improve the low accuracy of face recognition methods in e-health applications, this paper proposes a novel face recognition approach based on a convolutional neural network (CNN). In detail, by resolving the convolutional kernel and using the rectified linear unit (ReLU) activation function, dropout, and batch normalization, the approach reduces the number of parameters of the CNN model, improves its non-linearity, and alleviates overfitting. In these ways, the accuracy of face recognition is increased. In the experiments, the proposed approach is compared with principal component analysis (PCA) and support vector machine (SVM) on the ORL, Cohn-Kanade, and extended Yale-B face recognition data sets, and the results show that the approach is promising.
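
The three building blocks named in this abstract (ReLU, dropout and batch normalization) can be sketched on plain Python lists. This is a simplified illustration and not the authors' network: the batch normalization below omits the learnable scale and shift parameters, the dropout is the "inverted" variant applied only at training time, and all function names are hypothetical.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize a batch of activations to zero mean and roughly unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Example pipeline on one small batch of activations.
rng = random.Random(0)
activations = dropout(relu(batch_norm([-2.0, 0.5, 1.0, 3.0])), 0.5, rng)
```

Rescaling the surviving units by 1 / (1 - rate) keeps the expected activation unchanged, so no correction is needed at inference time.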


Symmetry ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 380 ◽  
Author(s):  
Kai Ye

When the key features of a network intrusion signal are identified with the GA-RBF algorithm (which uses a genetic algorithm to optimize the radial basis function), the pre-processing of the network intrusion signal data is neglected, which increases the noise in the network signal data and reduces the accuracy of key feature recognition. Therefore, a key feature recognition algorithm for network intrusion signals based on a neural network and a support vector machine is proposed. A principal component neural network (PCNN) is used to extract the characteristics of the network intrusion signal, and a support vector machine (SVM) multi-classifier is constructed. The feature extraction result is input into the SVM classifier. By combining the PCNN and SVM algorithms, the key features of network intrusion signals are identified. The experimental results show that the algorithm has the advantages of high precision and a low false positive rate, and that the recognition time for key features of the R2L data set (R2L is a common type of network intrusion attack) is only 3.18 ms.


2003 ◽  
Vol 11 (1) ◽  
pp. 55-70 ◽  
Author(s):  
Laila Stordrange ◽  
Olav M. Kvalheim ◽  
Per A. Hassel ◽  
Dick Malthe-Sørenssen ◽  
Fred Olav Libnau

Partial least squares (PLS) is a powerful tool for multivariate linear regression. But what if the data show a non-linear structure? Near infrared spectra from a pharmaceutical process were used as a case study. An ANOVA test revealed that the data are well described by a 2nd order polynomial. This work investigates the application of regression techniques that account for slightly non-linear data. The regression techniques investigated are: linearising data by applying transformations, local PLS, i.e. splitting of data, and quadratic PLS. These models were compared with ordinary PLS and principal component regression (PCR). The predictive ability of the models was tested on an independent data set acquired a year later. Using the knowledge of non-linear pattern and important spectral regions, simpler models with better predictive ability can be obtained.
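
The idea that slightly non-linear data may be well described by a 2nd order polynomial can be illustrated with a small sketch. It uses ordinary least squares on one predictor, solved through the normal equations; it is not PLS, local PLS or quadratic PLS as studied in the paper, and the function names and synthetic data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ...] of y = b0 + b1*x + b2*x^2 + ..."""
    size = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def residual_sum_of_squares(xs, ys, coeffs):
    """Sum of squared differences between observations and the fitted polynomial."""
    predict = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Synthetic, exactly quadratic data: y = 1 + 2x + 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x ** 2 for x in xs]
linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
```

Comparing the residual sums of squares of the two fits shows how much of the curvature a purely linear model leaves unexplained.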


2013 ◽  
Vol 14 (1) ◽  
pp. 10-17

Artificial neural networks (ANNs) are being used increasingly to predict water variables. This study offers an alternative approach to quantify the relationship between the time of chlorination in potable water (due to the conventional treatment procedure) and the concentration of chlorination by-products (expressed as carbon and bromine) with an ANN model, i.e., capturing non-linear relationships among the water quality variables. Thus, carbon and bromine concentrations in potable water (the latter chosen because of the toxicity of brominated trihalomethanes, THMs) were predicted using ANNs based mainly on the multi-layer perceptron (MLP) architecture. The chlorination (detention) time, as much as 58 hours in the Athens distribution network, constituted the input to the ANN models. Moreover, to develop an ANN model for estimating carbon and bromine, the available data set was partitioned into training, validation and test sets. To reach an optimum number of hidden layers or nodes, different architectures were tested. The quality of the ANN simulations was evaluated in terms of the error in the validation sample set for the proper interpretation of the results. The calculated sum-squared errors for the training, validation and test sets were 0.056, 0.039 and 0.060, respectively, for the best model selected. Comparison of the results showed that a two-layer feed-forward back-propagation ANN model could be used as an acceptable model for predicting the carbon and bromine contained in potable water THMs.
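
The partition into training, validation and test sets and the sum-squared error used to compare them can be sketched as below. The 60/20/20 proportions, the random shuffling, and the function names are assumptions made for illustration; the abstract does not state how the data were partitioned.

```python
import random


def three_way_split(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle records and split them into training, validation and test sets."""
    if not 0 < train_frac < 1 or not 0 < val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def sum_squared_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Ten hypothetical sample identifiers split into the three subsets.
train_set, val_set, test_set = three_way_split(list(range(10)))
```

The validation set guides the choice of architecture, while the test set is held back for a final, unbiased error estimate.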


Author(s):  
B. Elidrissi ◽  
A. Ousaa ◽  
M. Ghamali ◽  
S. Chtita ◽  
M. A. Ajana ◽  
...  

A Quantitative Structure–Activity Relationship (QSAR) study was performed to predict the HIV-1 integrase inhibition activity (pIC50) of thirty-five 5-hydroxy-6-oxo-1,6-dihydropyrimidine-4-carboxamide compounds using electronic and physico-chemical descriptors computed with the Gaussian 03W and ACD/ChemSketch programs, respectively. The structures of all compounds were optimized using hybrid Density Functional Theory (DFT) at the B3LYP/6-31G(d) level of theory. In both approaches, 28 compounds were assigned to the training set and the rest to the test set. These compounds were analyzed by the principal components analysis (PCA) method, descendant Multiple Linear Regression (MLR) analysis and an Artificial Neural Network (ANN). The robustness of the obtained models was assessed by leave-many-out cross-validation and by external validation through a test set. This study shows that MLR performed marginally better in predicting pIC50 activity than a (4-3-1) ANN model.
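
The (4-3-1) notation describes a network with four input descriptors, three hidden neurons and one output. A minimal forward pass might look like the sketch below; the sigmoid hidden activation, the linear output, and all weights are assumptions chosen for illustration, since the abstract gives neither the activation functions nor the trained parameters.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward_4_3_1(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 4 inputs -> 3 sigmoid hidden units -> 1 linear output."""
    if len(x) != 4 or len(w_hidden) != 3 or len(w_out) != 3:
        raise ValueError("expected 4 inputs and 3 hidden units")
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights: with all-zero hidden weights every hidden unit outputs 0.5.
prediction = forward_4_3_1(
    x=[0.2, 1.5, -0.7, 3.0],
    w_hidden=[[0.0] * 4 for _ in range(3)],
    b_hidden=[0.0, 0.0, 0.0],
    w_out=[1.0, 1.0, 1.0],
    b_out=0.25,
)
```

Such a network has 4 × 3 + 3 + 3 + 1 = 19 adjustable parameters, which is worth keeping in mind when the training set contains only 28 compounds.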


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14138-e14138
Author(s):  
Beung-Chul AHN ◽  
Kyoung Ho Pyo ◽  
Dongmin Jung ◽  
Chun-Feng Xin ◽  
Chang Gon Kim ◽  
...  

e14138 Background: Immune checkpoint inhibitors have become breakthrough therapy for various types of cancers. However, given their overall response rate of around 20% in clinical trials, accurate prediction of the aPD-1 response for an individual patient has not been established. The presence of PD-L1 expression or tumor infiltrating lymphocytes may be used as indicators of response, but these are limited. We developed models using machine learning methods to predict the aPD-1 response. Methods: A total of 126 advanced NSCLC patients treated with aPD-1 were enrolled. Their clinical characteristics, treatment outcomes, and adverse events were collected. The full clinical data set (n = 126), consisting of 15 variables, was divided into two subsets, a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machine and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated according to the ROC curve and a cross-validation method. The same methods were applied to the subset that had additional flow cytometry data (n = 40). Results: The median age was 64 and 69.8% were male. Adenocarcinoma was predominant (69.8%) and twenty patients (15.1%) were driver mutation positive. The clinical data set (n = 126) demonstrated that ridge regression (AUC: 0.79) was the best model for prediction. Of the 15 clinical variables, tumor burden, age, ECOG PS and PD-L1 were the most important based on the random forest algorithm. When we merged the clinical and flow cytometry data, the ridge regression model (AUC: 0.82) showed better performance than with clinical data only. Among the 52 variables of the merged set, the most important immune markers were as follows: CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor-specific T cell subset. Conclusions: Our machine learning based model is useful for predicting aPD-1 responses. After further validation in an independent patient cohort, a supervised learning based non-invasive predictive score can be established to predict aPD-1 response.
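
The AUC values quoted in this abstract summarize the ROC curve in a single number. One way to compute it, sketched below, uses the equivalent pairwise-ranking interpretation: the AUC is the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = responder, 0 = non-responder, with model scores.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders.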

