Modeling and prediction of octanol/water partition coefficient of pesticides using QSPR methods

2017 ◽  
Vol 28 (4) ◽  
pp. 579-592 ◽  
Author(s):  
Amel Bouakkadia ◽  
Leila Lourici ◽  
Djelloul Messadi

Purpose The purpose of this paper is to predict the octanol/water partition coefficient (Kow) of 43 organophosphorus compounds. Design/methodology/approach A quantitative structure-property relationship analysis was performed on a series of 43 pesticides using multiple linear regression and support vector machine methods, which correlate the octanol-water partition coefficient (Kow) values of these chemicals to their structural descriptors. At first, the data set was randomly separated into a training set (34 chemicals) and a test set (nine chemicals) for statistical external validation. Findings Models with three descriptors were developed using theoretical descriptors derived from Dragon software as independent variables, selected with the genetic algorithm-variable subset selection procedure. Originality/value The robustness and the predictive performance of the proposed linear model were verified using both internal and external statistical validation. One influential point, which reinforces the model, and one outlier were highlighted.

2020 ◽  
Vol 85 (4) ◽  
pp. 467-480 ◽  
Author(s):  
Rana Amiri ◽  
Djelloul Messadi ◽  
Amel Bouakkadia

This study aimed to predict the n-octanol/water partition coefficient (Kow) of 43 organophosphorus insecticides. Quantitative structure-property relationship analysis was performed on the series of 43 insecticides using two different methods, linear (multiple linear regression, MLR) and non-linear (artificial neural network, ANN), which correlate the Kow values of these chemicals to their structural descriptors. First, the data set was separated with a duplex algorithm into a training set (28 chemicals) and a test set (15 chemicals) for statistical external validation. A model with four descriptors was developed using theoretical descriptors derived from Dragon software as independent variables, selected with the genetic algorithm (GA)-variable subset selection (VSS) procedure. The values of the statistical parameters R2, Q2ext, SDEPext and SDEC for the MLR model (94.09 %, 92.43 %, 0.533 and 0.471, respectively) and the ANN model (97.24 %, 92.17 %, 0.466 and 0.332, respectively) are very similar, which confirms that the four-parameter model is stable, robust and significant.
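The validation statistics quoted above can be illustrated with a short sketch. The abstract does not give the formulas, so the definitions below are assumptions: R2 and SDEC are computed on the training set, SDEPext on the test set, and Q2ext uses the form that references the training-set mean. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def _sum_sq_err(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def r_squared(y_train, y_fitted):
    """Coefficient of determination on the training set."""
    mean_y = sum(y_train) / len(y_train)
    total = sum((o - mean_y) ** 2 for o in y_train)
    return 1.0 - _sum_sq_err(y_train, y_fitted) / total

def sdec(y_train, y_fitted):
    """Standard deviation error in calculation (training set)."""
    return math.sqrt(_sum_sq_err(y_train, y_fitted) / len(y_train))

def q2_ext(y_test, y_predicted, y_train_mean):
    """External Q2, assuming the variant referenced to the training-set mean."""
    total = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1.0 - _sum_sq_err(y_test, y_predicted) / total

def sdep_ext(y_test, y_predicted):
    """Standard deviation error in prediction (external test set)."""
    return math.sqrt(_sum_sq_err(y_test, y_predicted) / len(y_test))

# Toy values, purely illustrative.
y_train, y_fitted = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
y_test, y_predicted = [2.0, 5.0], [2.5, 4.5]
train_mean = sum(y_train) / len(y_train)
print(r_squared(y_train, y_fitted), sdec(y_train, y_fitted))
print(q2_ext(y_test, y_predicted, train_mean), sdep_ext(y_test, y_predicted))
```

Other definitions of external Q2 exist (for example, ones referenced to the test-set mean), so reported values are only comparable when the same variant is used.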


2016 ◽  
Vol 27 (3) ◽  
pp. 299-312
Author(s):  
Nadia Ziani ◽  
Khadidja Amirat ◽  
Djelloul Messadi

Purpose – The purpose of this paper is to predict the aquatic toxicity (LC50) of 92 substituted benzene derivatives in Pimephales promelas. Design/methodology/approach – Quantitative structure-activity relationship analysis was performed on a series of 92 substituted benzene derivatives using multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM) methods, which correlate aquatic toxicity (LC50) values of these chemicals to their structural descriptors. At first, the entire data set was split according to the Kennard and Stone algorithm into a training set (74 chemicals) and a test set (18 chemicals) for statistical external validation. Findings – Models with six descriptors were developed using theoretical descriptors derived from Dragon software as independent variables, selected with the genetic algorithm – variable subset selection procedure. Originality/value – The values of Q2 and RMSE in internal validation for the MLR, SVM and ANN models were (0.8829; 0.225), (0.8882; 0.222) and (0.8980; 0.214), respectively, and in external validation were (0.9538; 0.141), (0.947; 0.146) and (0.9564; 0.146). The statistical parameters obtained for the three approaches are very similar, which confirms that the six-parameter model is stable, robust and significant.
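The Kennard and Stone split mentioned above can be sketched as follows. This is the commonly described form of the algorithm (seed with the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away), using Euclidean distance in descriptor space. The abstract does not state the distance metric or any preprocessing, so those choices and the function name are assumptions.

```python
import math

def kennard_stone(points, n_train):
    """Return (train_indices, test_indices) for a Kennard-Stone style split.

    points  : list of equal-length tuples (descriptor vectors)
    n_train : number of samples to place in the training set
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the training set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Add the sample whose closest selected sample is farthest away.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)

    return selected, remaining

# Toy one-dimensional example, purely illustrative.
train_idx, test_idx = kennard_stone([(0.0,), (1.0,), (2.0,), (9.0,), (10.0,)], 3)
print(train_idx, test_idx)
```

Because the training set is chosen to span the descriptor space, the remaining test samples tend to fall inside the region covered by the training data.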


2020 ◽  
Vol 42 (3) ◽  
pp. 447-447
Author(s):  
Mounia Zine ◽  
Amel Bouakkadia ◽  
Leila Lourici ◽  
Djelloul Messadi

This paper aims to predict the relative retention times of 122 volatile organic compounds (VOCs). QSRR analysis was carried out on a series of 122 VOCs. Multiple linear regression (MLR) and support vector machine (SVM) methods were used to build linear and nonlinear QSRR models, respectively, which correlate the relative retention time (RTT) values of these chemical substances to their structural descriptors. At first, the data set was separated using the Kennard and Stone algorithm into a training set (92 chemicals) and a test set (30 chemicals) for statistical external validation. Five-descriptor models were developed using theoretical descriptors derived from the DRAGON software as independent variables, selected with the GA (genetic algorithm) - VSS (variable subset selection) procedure. The robustness and the predictive performance of the MLR model were demonstrated by internal and external statistical validation. The non-linear technique leads to the best QSRR model, with good internal and external predictive abilities. It is based on a support vector machine with a radial basis function (RBF) kernel at the optimal parameter values. The values of and were 17, 0.2, 0.2, respectively.


2019 ◽  
Vol 15 (4) ◽  
pp. 328-340 ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Napat Songtawee ◽  
Suphakit Siriwong ◽  
Supaluk Prachayasittikul ◽  
Chanin Nantasenamat ◽  
...  

Background: Human immunodeficiency virus (HIV) is an infective agent that causes acquired immunodeficiency syndrome (AIDS). Therefore, the rational design of inhibitors for preventing the progression of the disease is required. Objective: This study aims to construct quantitative structure-activity relationship (QSAR) models, perform molecular docking and propose a rational design of new colchicine derivatives with anti-HIV activity. Methods: A data set of 24 colchicine and derivatives with anti-HIV activity was employed to develop the QSAR models using machine learning methods (e.g. multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM)), and to perform molecular docking. Results: The significant descriptors relating to the anti-HIV activity included the JGI2, Mor24u, Gm and R8p+ descriptors. The predictive performance of the models gave acceptable statistical qualities as observed by the correlation coefficient (Q2) and root mean square error (RMSE) of the leave-one-out cross-validation (LOO-CV) and external sets. In particular, the ANN method outperformed the MLR and SVM methods, giving a Q2 of 0.7548 and an RMSE of 0.5735 for the LOO-CV set, and a Q2 of 0.8553 and an RMSE of 0.6999 for external validation. In addition, the molecular docking of the virus-entry molecule (gp120 envelope glycoprotein) revealed the key interacting residues of the protein (cellular receptor, CD4) and the site-moiety preferences of colchicine derivatives as HIV entry inhibitors for binding to the HIV structure. Furthermore, a rational design of new colchicine derivatives using the informative QSAR and molecular docking results was proposed. Conclusion: These findings serve as a guideline for rational drug design as well as the potential development of novel anti-HIV agents.


2014 ◽  
Vol 79 (8) ◽  
pp. 965-975 ◽  
Author(s):  
Long Jiao ◽  
Xiaofei Wang ◽  
LI. Hua ◽  
Yunxia Wang

The quantitative structure property relationship (QSPR) for gas/particle partition coefficient, Kp, of polychlorinated biphenyls (PCBs) was investigated. Molecular distance-edge vector (MDEV) index was used as the structural descriptor of PCBs. The quantitative relationship between the MDEV index and log Kp was modeled by multivariate linear regression (MLR) and artificial neural network (ANN) respectively. Leave one out cross validation and external validation were carried out to assess the prediction ability of the developed models. When the MLR method is used, the root mean square relative error (RMSRE) of prediction for leave one out cross validation and external validation is 4.72 and 8.62 respectively. When the ANN method is employed, the prediction RMSRE of leave one out cross validation and external validation is 3.87 and 7.47 respectively. It is demonstrated that the developed models are practicable for predicting the Kp of PCBs. The MDEV index is shown to be quantitatively related to the Kp of PCBs.
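Leave-one-out cross validation and the RMSRE statistic described above can be sketched in a few lines. For brevity the example fits a one-descriptor least-squares line rather than a multivariate model, and it returns RMSRE as a fraction. Whether the paper reports it as a fraction or a percentage is not stated in the abstract, so that choice, the function names and the toy data are assumptions.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(x)):
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_rest, y_rest)
        predictions.append(slope * x[i] + intercept)
    return predictions

def rmsre(observed, predicted):
    """Root mean square relative error, as a fraction of the observed value."""
    rel_sq = [((p - o) / o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(rel_sq) / len(rel_sq))

# Toy data, purely illustrative (not PCB measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(rmsre(y, loo_predictions(x, y)))
```

The relative error divides by the observed value, so the statistic is undefined when any observed value is zero.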


2018 ◽  
Vol 6 (2) ◽  
pp. 69-92 ◽  
Author(s):  
Asanka G. Perera ◽  
Yee Wei Law ◽  
Ali Al-Naji ◽  
Javaan Chahl

Purpose The purpose of this paper is to present a preliminary solution to address the problem of estimating human pose and trajectory by an aerial robot with a monocular camera in near real time. Design/methodology/approach The distinguishing feature of the solution is a dynamic classifier selection architecture. Each video frame is corrected for perspective using projective transformation. Then, a silhouette is extracted as a Histogram of Oriented Gradients (HOG). The HOG is then classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. The dynamic classifier consists of a Support Vector Machine (SVM) classifier C64 that recognizes all 64 classes, and 64 SVM classifiers that recognize four classes each – these four classes are chosen based on the temporal relationship between them, dictated by the gait sequence. Findings The solution provides three main advantages: first, classification is efficient due to dynamic selection (4-class vs 64-class classification). Second, classification errors are confined to neighbors of the true viewpoints. This means a wrongly estimated viewpoint is at most an adjacent viewpoint of the true viewpoint, enabling fast recovery from incorrect estimations. Third, the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Originality/value Experiments conducted on both fronto-parallel videos and aerial videos confirm that the solution can achieve accurate pose and trajectory estimation for these different kinds of videos. For example, the “walking on an 8-shaped path” data set (1,652 frames) can achieve the following estimation accuracies: 85 percent for viewpoints and 98.14 percent for poses.


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling these documents. To achieve optimal text classification results, feature selection, an important stage, is used to reduce the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To assess the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored on a personal computer, and the benchmark data set was collected from the Reuters and 20 Newsgroups corpora. The results demonstrate the performance of the proposed feature selection algorithm through improved text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.
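The χ2 statistic named above as a baseline feature selection approach can be illustrated with a small sketch. It uses the usual 2x2 contingency form for one term and one class. This shows the baseline only, not the ABCFS algorithm, whose details the abstract does not give. The function names and the toy counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a : documents in the class that contain the term
    b : documents outside the class that contain the term
    c : documents in the class that do not contain the term
    d : documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a margin is empty, so the term carries no information
    return n * (a * d - b * c) ** 2 / denominator

def top_k_terms(term_counts, k):
    """Rank terms by chi-square score and keep the k highest."""
    ranked = sorted(term_counts, key=lambda t: chi_square(*term_counts[t]), reverse=True)
    return ranked[:k]

# Toy counts (a, b, c, d) per term, purely illustrative.
counts = {"market": (10, 0, 0, 10), "the": (5, 5, 5, 5), "goal": (2, 7, 8, 3)}
print(top_k_terms(counts, 2))
```

A term that occurs equally often inside and outside the class scores zero, which is why very common words are discarded by this filter.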


2019 ◽  
Vol 12 (4) ◽  
pp. 466-480
Author(s):  
Li Na ◽  
Xiong Zhiyong ◽  
Deng Tianqi ◽  
Ren Kai

Purpose The precise segmentation of brain tumors is a crucial step in their diagnosis and treatment. Due to the presence of noise, uneven gray levels, blurred boundaries and edema around the brain tumor region, the brain tumor image has indistinct features in the tumor region, which poses a problem for diagnostics. The paper aims to discuss these issues. Design/methodology/approach In this paper, the authors propose an original solution for segmentation using Tamura texture and an ensemble Support Vector Machine (SVM) structure. In the proposed technique, 124 features of each voxel are extracted, including Tamura texture features and grayscale features. Then, these features are ranked using the SVM-Recursive Feature Elimination method, which is also adopted to optimize the parameters of the Radial Basis Function kernel of the SVMs. Finally, the bagging random sampling method is utilized to construct the ensemble SVM classifier, which classifies the types of voxel through a weighted voting mechanism. Findings The experiments are conducted on the BraTS2015 data set. The experiments demonstrate that Tamura texture is very useful in the segmentation of brain tumors, especially the line-likeness feature. The superior performance of the proposed ensemble SVM classifier is demonstrated by comparison with single SVM classifiers as well as other methods. Originality/value The authors propose an original solution for segmentation using Tamura texture and an ensemble SVM structure.
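The bagging and weighted voting steps described above can be sketched independently of the SVMs themselves. The abstract does not say how the voting weights are obtained, so the weights below are hypothetical placeholders (for example, each member's validation accuracy), and the function names are illustrative only.

```python
import random
from collections import defaultdict

def bootstrap_sample(samples, rng):
    """Draw a bagging sample: same size as the input, with replacement."""
    return rng.choices(samples, k=len(samples))

def weighted_vote(predicted_labels, weights):
    """Combine one prediction per ensemble member into a single label.

    predicted_labels : label predicted by each member for one voxel
    weights          : one non-negative weight per member (assumed given)
    """
    totals = defaultdict(float)
    for label, weight in zip(predicted_labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three hypothetical members disagree on one voxel.
rng = random.Random(0)
print(len(bootstrap_sample(list(range(10)), rng)))
print(weighted_vote(["tumor", "edema", "edema"], [0.9, 0.4, 0.4]))
```

With equal weights this reduces to a plain majority vote. Unequal weights let a more reliable member outvote several weaker ones, as in the usage example.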


Kybernetes ◽  
2014 ◽  
Vol 43 (8) ◽  
pp. 1150-1164 ◽  
Author(s):  
Bilal M’hamed Abidine ◽  
Belkacem Fergani ◽  
Mourad Oussalah ◽  
Lamya Fergani

Purpose – The task of identifying activity classes from sensor information in a smart home is very challenging because of the imbalanced nature of such data sets, where some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues. Design/methodology/approach – In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost Sensitive Support Vector Machines (CS-SVM), with an adaptive tuning of the cost parameter, in order to handle the imbalanced data problem. Findings – The results have demonstrated the usefulness of the approach through comparison with state-of-the-art approaches including HMM, CRF, the traditional C-Support vector machines (C-SVM) and the Cost-Sensitive-SVM (CS-SVM) for classifying the activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiment/simulation include Accuracy, Precision/Recall and F measure.
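The core interpolation step of SMOTE can be sketched as follows: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration of the general technique, not the authors' pipeline. The cost-sensitive SVM and the adaptive cost tuning are not shown, and the function name, the value of k and the toy data are assumptions.

```python
import math
import random

def smote_sample(minority, k, rng):
    """Create one synthetic minority-class sample by interpolation.

    minority : list of equal-length tuples (minority-class feature vectors)
    k        : number of nearest minority neighbours to choose from
    rng      : random.Random instance, passed in for reproducibility
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    base = rng.choice(minority)
    # The k nearest minority samples to the chosen base sample.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))

# Toy minority class, purely illustrative.
rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print([smote_sample(minority, 2, rng) for _ in range(3)])
```

Because each synthetic point lies between two real minority samples, over-sampling enlarges the minority class without simply duplicating existing records.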

