Chemometric modeling to predict aquatic toxicity of benzene derivatives in Pimephales promelas

2016 ◽  
Vol 27 (3) ◽  
pp. 299-312
Author(s):  
Nadia Ziani ◽  
Khadidja Amirat ◽  
Djelloul Messadi

Purpose – The purpose of this paper is to predict the aquatic toxicity (LC50) of 92 substituted benzene derivatives in Pimephales promelas. Design/methodology/approach – Quantitative structure-activity relationship analysis was performed on a series of 92 substituted benzene derivatives using multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM) methods, which correlate the aquatic toxicity (LC50) values of these chemicals to their structural descriptors. First, the entire data set was split according to the Kennard-Stone algorithm into a training set (74 chemicals) and a test set (18 chemicals) for statistical external validation. Findings – Models with six descriptors were developed using as independent variables theoretical descriptors derived from the Dragon software, applying a genetic algorithm-variable subset selection procedure. Originality/value – The values of Q2 and RMSE in internal validation for the MLR, SVM and ANN models were (0.8829; 0.225), (0.8882; 0.222) and (0.8980; 0.214), respectively, and in external validation were (0.9538; 0.141), (0.947; 0.146) and (0.9564; 0.146). The statistical parameters obtained for the three approaches are very similar, which confirms that the six-parameter model is stable, robust and significant.
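The Kennard-Stone split used above has a simple max-min logic: seed the training set with the two most distant samples in descriptor space, then repeatedly add the sample whose nearest selected neighbour is farthest away. A minimal pure-Python sketch (illustrative, not the authors' implementation; Euclidean distance over descriptor vectors is assumed):

```python
import math

def kennard_stone_split(X, n_train):
    """Kennard-Stone selection: seed with the two most distant samples,
    then repeatedly add the sample whose nearest selected neighbour is
    farthest away. Returns (train_indices, test_indices)."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    n = len(X)
    # Seed with the pair of samples that are farthest apart.
    seed = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda pair: dist(X[pair[0]], X[pair[1]]))
    selected = list(seed)
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train:
        # Max-min criterion: the point farthest from the selected set.
        nxt = max(remaining,
                  key=lambda i: min(dist(X[i], X[j]) for j in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining
```

Because the criterion is purely geometric, the resulting training set covers the descriptor space evenly, which is why the method is popular for defining external test sets.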

2020 ◽  
Vol 85 (4) ◽  
pp. 467-480 ◽  
Author(s):  
Rana Amiri ◽  
Djelloul Messadi ◽  
Amel Bouakkadia

This study aimed at predicting the n-octanol/water partition coefficient (Kow) of 43 organophosphorous insecticides. Quantitative structure-property relationship analysis was performed on the series of 43 insecticides using two different methods, linear (multiple linear regression, MLR) and non-linear (artificial neural network, ANN), which correlate the Kow values of these chemicals to their structural descriptors. First, the data set was separated with a duplex algorithm into a training set (28 chemicals) and a test set (15 chemicals) for statistical external validation. A model with four descriptors was developed using as independent variables theoretical descriptors derived from the Dragon software, applying a genetic algorithm (GA)-variable subset selection (VSS) procedure. The values of the statistical parameters R2, Q2ext, SDEPext and SDEC for the MLR model (94.09%, 92.43%, 0.533 and 0.471, respectively) and the ANN model (97.24%, 92.17%, 0.466 and 0.332, respectively) are very similar, which confirms that the employed four-parameter model is stable, robust and significant.
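The external-validation statistics reported here follow a standard pattern: PRESS over the test set, referenced to the training-set mean. A hedged sketch of one common formulation (Q2ext and SDEP; an assumed convention, not taken from the paper):

```python
import math

def external_validation(y_train, y_test, y_pred):
    """Illustrative external-validation statistics: Q2_ext compares the
    prediction error sum of squares (PRESS) on the test set with the
    spread of the test responses about the training-set mean; SDEP is
    the standard deviation of the errors of prediction."""
    press = sum((yo - yp) ** 2 for yo, yp in zip(y_test, y_pred))
    y_bar = sum(y_train) / len(y_train)       # training-set mean
    tss = sum((yo - y_bar) ** 2 for yo in y_test)
    q2_ext = 1.0 - press / tss
    sdep = math.sqrt(press / len(y_test))
    return q2_ext, sdep
```

A Q2ext near 1 means the model predicts held-out compounds far better than the naive "always predict the training mean" baseline.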


2017 ◽  
Vol 28 (4) ◽  
pp. 579-592 ◽  
Author(s):  
Amel Bouakkadia ◽  
Leila Lourici ◽  
Djelloul Messadi

Purpose The purpose of this paper is to predict the octanol/water partition coefficient (Kow) of 43 organophosphorous compounds. Design/methodology/approach A quantitative structure-property relationship analysis was performed on a series of 43 pesticides using multiple linear regression and support vector machine methods, which correlate the octanol-water partition coefficient (Kow) values of these chemicals to their structural descriptors. First, the data set was randomly separated into a training set (34 chemicals) and a test set (nine chemicals) for statistical external validation. Findings Models with three descriptors were developed using as independent variables theoretical descriptors derived from the Dragon software, applying a genetic algorithm-variable subset selection procedure. Originality/value The robustness and the predictive performance of the proposed linear model were verified using both internal and external statistical validation. One influential point, which reinforces the model, and one outlier were highlighted.
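The genetic algorithm-variable subset selection procedure recurring in these studies treats each candidate descriptor subset as a bit mask and evolves a population of masks toward better model fitness. A minimal sketch under assumed settings (tournament selection, one-point crossover, per-bit mutation; not the authors' exact configuration):

```python
import random

def ga_variable_subset(fitness, n_vars, pop=20, gens=30, seed=0):
    """Genetic-algorithm variable subset selection sketch: chromosomes
    are bit masks over the descriptors; `fitness` scores a mask (e.g.
    cross-validated Q2 of the model built on the selected descriptors)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop)]
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_vars)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability 1/n_vars per descriptor.
            child = [(1 - b) if rng.random() < 1.0 / n_vars else b
                     for b in child]
            nxt.append(child)
        population = nxt
    return max(population, key=fitness)
```

In a real QSAR setting the fitness function would penalize subset size as well, so the search converges on small, stable models like the three-descriptor one reported here.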


2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: Poly(ADP-ribose) polymerases (PARPs) form a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, some efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were successfully used to develop and establish quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set and selection of the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors making the most significant contributions to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided much better predictive capability. Conclusion: Among the constructed models, in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2LOO and Q2LGO), as well as R2 and the F-statistic for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were more proper using the GA-ANN model.
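The Y-randomization check mentioned above guards against chance correlation: the responses are shuffled many times, the model is refit on each scrambled set, and the real model's fit must clearly exceed every scrambled fit. A self-contained sketch using a one-descriptor least-squares model as a stand-in (the `fit_line` helper is hypothetical, not from the paper):

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of a single descriptor (illustrative)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v

def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def y_randomization(x, y, fit, trials=50, seed=0):
    """Shuffle the responses, refit, and record each scrambled R2.
    A genuine structure-activity model should far exceed all of them."""
    rng = random.Random(seed)
    y_perm = list(y)
    scores = []
    for _ in range(trials):
        rng.shuffle(y_perm)
        predict = fit(x, y_perm)
        scores.append(r_squared(y_perm, [predict(v) for v in x]))
    return scores
```

If the distribution of scrambled R2 values overlaps the real R2, the apparent correlation is likely an artifact of descriptor redundancy rather than a real structure-activity relationship.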


2019 ◽  
Vol 15 (4) ◽  
pp. 328-340 ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Napat Songtawee ◽  
Suphakit Siriwong ◽  
Supaluk Prachayasittikul ◽  
Chanin Nantasenamat ◽  
...  

Background: Human immunodeficiency virus (HIV) is the infective agent that causes acquired immunodeficiency syndrome (AIDS). Therefore, the rational design of inhibitors for preventing the progression of the disease is required. Objective: This study aims to construct quantitative structure-activity relationship (QSAR) models, perform molecular docking and rationally design new colchicine derivatives with anti-HIV activity. Methods: A data set of 24 colchicine derivatives with anti-HIV activity was employed to develop the QSAR models using machine learning methods (e.g. multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM)) and to conduct a molecular docking study. Results: The significant descriptors relating to the anti-HIV activity included the JGI2, Mor24u, Gm and R8p+ descriptors. The predictive performance of the models gave acceptable statistical qualities as observed by the correlation coefficient (Q2) and root mean square error (RMSE) of leave-one-out cross-validation (LOO-CV) and external sets. Particularly, the ANN method outperformed the MLR and SVM methods, displaying Q2LOO-CV of 0.7548 and RMSELOO-CV of 0.5735 for the LOO-CV set, and Q2Ext of 0.8553 and RMSEExt of 0.6999 for external validation. In addition, the molecular docking of the virus-entry molecule (gp120 envelope glycoprotein) revealed the key interacting residues of the protein (cellular receptor, CD4) and the site-moiety preferences of colchicine derivatives as HIV entry inhibitors for binding to the HIV structure. Furthermore, a rational design of new colchicine derivatives using the informative QSAR and molecular docking results was proposed. Conclusion: These findings serve as a guideline for rational drug design as well as the potential development of novel anti-HIV agents.
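The leave-one-out Q2 reported above is computed by refitting the model with each compound held out in turn and predicting it from the remaining ones. A compact sketch, again with a one-descriptor least-squares model standing in for the real multi-descriptor fit (the helper is hypothetical):

```python
def fit_line(x, y):
    """Ordinary least-squares fit of a single descriptor (illustrative)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v

def q2_loo(x, y, fit):
    """Leave-one-out cross-validated Q2: hold out each sample in turn,
    refit, predict the held-out response, and reference the resulting
    PRESS to the total sum of squares about the mean."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(x_train, y_train)
        press += (y[i] - predict(x[i])) ** 2
    y_bar = sum(y) / len(y)
    tss = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / tss
```

With only 24 compounds, as here, LOO-CV is the natural internal-validation choice, since each fold sacrifices just one data point.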


Author(s):  
Tamiris Maria de Assis ◽  
Teodorico Castro Ramalho ◽  
Elaine Fontes Ferreira da Cunha

Background: The quantitative structure-activity relationship is an analysis method that can be applied for designing new molecules. In 1997, Hopfinger and coworkers developed the 4D-QSAR methodology, aiming to eliminate the question of which conformation to use in a QSAR study. In this work, the 4D-QSAR methodology was used to quantitatively determine the influence of structural descriptors on the activity of aryl pyrimidine derivatives as inhibitors of the TGF-β1 receptor. The members of the TGF-β subfamily are interesting molecular targets, since they play an important role in the growth and development of cells, including proliferation, apoptosis, differentiation, epithelial-mesenchymal transition (EMT) and migration. In late stages, TGF-β exerts tumor-promoting effects, increasing tumor invasiveness and metastasis. Therefore, TGF-β is an attractive target for cancer therapy. Objective: The major goal of the current research is to develop 4D-QSAR models aiming to propose new structures of aryl pyrimidine derivatives. Materials and Methods: Molecular dynamics simulation was carried out to generate the conformational ensemble profile of a data set of aryl pyrimidine derivatives. The conformations were overlaid into a three-dimensional cubic box, according to a three-ordered-atom alignment. The occupation of the grid cells by the interaction pharmacophore elements provides the grid cell occupancy descriptors (GCODs), the independent variables used to build the 4D-QSAR models. The best models were validated (internal and external validation) using several statistical parameters. Molecular docking studies were performed to better understand the binding mode of the pyrimidine derivatives inside the TGF-β active site.
Results: The 4D-QSAR model presented seven descriptors and acceptable statistical parameters (R2 = 0.89, q2 = 0.68, R2pred = 0.65, r2m = 0.55, R2P = 0.68 and R2rand = 0.21), as well as pharmacophore groups important for the activity of these compounds. The molecular docking studies helped to understand the pharmacophoric groups and to propose substituents that increase the potency of the aryl pyrimidine derivatives. Conclusion: The best QSAR model showed adequate statistical parameters that ensure its fitness, robustness and predictivity. Structural modifications were assessed, and five new structures were proposed as drug candidates for cancer treatment.
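The grid cell occupancy idea behind the GCODs can be sketched simply: each conformation's atoms are binned into cells of a cubic lattice, and a cell's descriptor value is how often it is occupied across the conformational ensemble. A heavily simplified illustration (every atom is treated as a pharmacophore element, which the real method does not do):

```python
def grid_cell_occupancy(conformations, cell=1.0):
    """Grid-cell-occupancy sketch: each conformation is a list of
    (x, y, z) atom coordinates; a cell's occupancy is the fraction of
    conformations in which at least one atom falls inside it."""
    counts = {}
    for atoms in conformations:
        # Bin each atom into an integer (i, j, k) cell index.
        occupied = {tuple(int(c // cell) for c in atom) for atom in atoms}
        for key in occupied:
            counts[key] = counts.get(key, 0) + 1
    n = len(conformations)
    return {key: c / n for key, c in counts.items()}
```

Cells whose occupancy correlates with activity across the compound series become the model's descriptors, which is how 4D-QSAR sidesteps choosing a single conformation.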


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Monty Sutrisna ◽  
Dewi Tjia ◽  
Peng Wu

Purpose This paper aims to identify and examine the factors that influence construction industry-university (IU) collaboration and to develop a model of the likelihood that a potential industry partner within the construction industry will collaborate with universities. Design/methodology/approach Mixed-method data collection, including a questionnaire survey and focus groups, was used. The collected data were analysed using descriptive and inferential statistical methods to identify and examine the factors. These findings were then used to develop the likelihood predictive model of IU collaboration: a well-known artificial neural network (ANN) model was trained and cross-validated to develop the predictive model. Findings The study identified that company size (number of employees and approximate annual turnover), length of experience in the construction industry, previous IU collaboration, the importance of innovation and the motivation of innovation for the short term showed a statistically significant influence on the likelihood of collaboration. The study also revealed an increase in interest amongst companies in engaging universities in collaborative research. The ANN model successfully predicted the likelihood of a potential construction partner collaborating with universities at an accuracy of 85.5%, which was considered reasonably good. Originality/value The study investigated the nature of collaboration and the factors that can have an impact on potential IU collaborations and, based on that, introduced a machine learning approach to examine the likelihood of IU collaboration.
While the developed model was derived from analysing a data set from the Western Australian construction industry, the methodology proposed here can be used as the basis for developing predictive models for the construction industry elsewhere, to help universities assess the likelihood of collaborating and partnering with targeted construction companies.
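The paper does not detail its ANN architecture; as a minimal stand-in for the idea of turning company features into a collaboration-likelihood score, a single logistic unit trained by gradient descent suffices (everything below is an illustrative assumption, not the authors' model):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=200):
    """Single logistic unit trained by stochastic gradient descent on
    numeric company features (e.g. size, experience, past collaboration);
    returns a scoring function mapping features to a likelihood in (0, 1)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                         # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    def score(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return score
```

A real ANN adds hidden layers on top of exactly this unit, which is what lets it capture interactions between the survey factors.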


2018 ◽  
Vol 6 (2) ◽  
pp. 69-92 ◽  
Author(s):  
Asanka G. Perera ◽  
Yee Wei Law ◽  
Ali Al-Naji ◽  
Javaan Chahl

Purpose The purpose of this paper is to present a preliminary solution to the problem of estimating human pose and trajectory by an aerial robot with a monocular camera in near real time. Design/methodology/approach The distinguishing feature of the solution is a dynamic classifier selection architecture. Each video frame is corrected for perspective using a projective transformation. Then, a silhouette is extracted as a Histogram of Oriented Gradients (HOG). The HOG is then classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. The dynamic classifier consists of a Support Vector Machine (SVM) classifier C64 that recognizes all 64 classes, and 64 SVM classifiers that recognize four classes each; these four classes are chosen based on the temporal relationship between them, dictated by the gait sequence. Findings The solution provides three main advantages. First, classification is efficient due to dynamic selection (4-class vs 64-class classification). Second, classification errors are confined to neighbors of the true viewpoint: a wrongly estimated viewpoint is at most an adjacent viewpoint of the true viewpoint, enabling fast recovery from incorrect estimations. Third, the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Originality/value Experiments conducted on both fronto-parallel videos and aerial videos confirm that the solution can achieve accurate pose and trajectory estimation for these different kinds of videos. For example, the "walking on an 8-shaped path" data set (1,652 frames) achieves estimation accuracies of 85 percent for viewpoints and 98.14 percent for poses.
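The control flow of dynamic classifier selection can be sketched independently of the SVMs themselves: with no history, run the full 64-class classifier; otherwise restrict scoring to the few classes that can follow the previous one in the gait cycle, with a fall-back when no candidate is confident. The neighbourhood structure and threshold below are assumptions for illustration, not the paper's exact definitions:

```python
def dynamic_classify(scores, prev_class=None, n_classes=64, threshold=0.5):
    """Dynamic classifier selection sketch. `scores` is a per-class
    confidence list (a stand-in for SVM outputs). With no previous
    estimate, use the full classifier; otherwise score only the four
    classes assumed to follow `prev_class` in the cyclic gait sequence,
    falling back to all classes when no candidate is confident."""
    if prev_class is None:
        return max(range(n_classes), key=lambda c: scores[c])
    candidates = [(prev_class + d) % n_classes for d in range(4)]
    best = max(candidates, key=lambda c: scores[c])
    if scores[best] < threshold:
        # Recovery path: re-score against every class.
        best = max(range(n_classes), key=lambda c: scores[c])
    return best
```

This is what makes per-frame classification cheap (a 4-way decision in the common case) while keeping errors confined to temporal neighbours of the true class.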


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task for handling them. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content. Design/methodology/approach This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is scrutinized on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper. Findings The experiments were conducted on real and benchmark data sets. The real data set was collected in the form of documents stored on a personal computer, and the benchmark data set was collected from the Reuters and 20 Newsgroups corpora. The results demonstrate the performance of the proposed feature selection algorithm in enhancing text document classification accuracy. Originality/value This paper proposes the new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here, the ABCFS algorithm is used to select features from (unstructured) text documents, whereas in existing work bee colony algorithms have been applied to selecting (structured) data features rather than text features. The proposed algorithm will classify documents automatically based on their content.
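An artificial-bee-colony search over feature subsets can be sketched compactly: food sources are binary feature masks, employed bees explore one-bit neighbours, and sources that repeatedly fail to improve are abandoned by scout bees. This is a simplified illustration, not the paper's ABCFS (onlooker-bee probabilistic recruitment is omitted for brevity):

```python
import random

def abc_feature_select(fitness, n_feats, n_bees=10, limit=5, iters=40, seed=1):
    """Simplified artificial-bee-colony feature selection: `fitness`
    scores a binary feature mask (e.g. classifier accuracy on the
    selected features). Returns the best mask found."""
    rng = random.Random(seed)
    sources = [[rng.randint(0, 1) for _ in range(n_feats)]
               for _ in range(n_bees)]
    stale = [0] * n_bees
    best = max(sources, key=fitness)
    for _ in range(iters):
        for i, src in enumerate(sources):
            neigh = src[:]
            neigh[rng.randrange(n_feats)] ^= 1   # flip one feature bit
            if fitness(neigh) > fitness(src):
                sources[i], stale[i] = neigh, 0
            else:
                stale[i] += 1
            if stale[i] > limit:                 # scout: abandon the source
                sources[i] = [rng.randint(0, 1) for _ in range(n_feats)]
                stale[i] = 0
        best = max(sources + [best], key=fitness)
    return best
```

For text classification the mask would range over the term vocabulary, with fitness given by SVM accuracy on the reduced term-document matrix.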


2019 ◽  
Vol 12 (4) ◽  
pp. 466-480
Author(s):  
Li Na ◽  
Xiong Zhiyong ◽  
Deng Tianqi ◽  
Ren Kai

Purpose The precise segmentation of brain tumors is the most important and crucial step in their diagnosis and treatment. Due to the presence of noise, uneven gray levels, blurred boundaries and edema around the brain tumor region, brain tumor images have indistinct features in the tumor region, which poses a problem for diagnostics. The paper aims to discuss these issues. Design/methodology/approach In this paper, the authors propose an original solution for segmentation using Tamura texture and an ensemble Support Vector Machine (SVM) structure. In the proposed technique, 124 features are extracted for each voxel, including Tamura texture features and grayscale features. These features are then ranked using the SVM-Recursive Feature Elimination method, which is also adopted to optimize the parameters of the Radial Basis Function kernel of the SVMs. Finally, the bagging random sampling method is utilized to construct the ensemble SVM classifier, based on a weighted voting mechanism, to classify the voxel types. Findings The experiments are conducted on the BraTS2015 data set. They demonstrate that Tamura texture is very useful in the segmentation of brain tumors, especially the line-likeness feature. The superior performance of the proposed ensemble SVM classifier is demonstrated by comparison with single SVM classifiers as well as other methods. Originality/value The authors propose an original solution for segmentation using Tamura texture and an ensemble SVM structure.
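The bagging-plus-weighted-voting step generalizes beyond SVMs: each base model is fit on a bootstrap resample and weighted by its accuracy, and prediction is a weighted vote. A hedged sketch in which `train` stands in for fitting one SVM (the weighting scheme is an assumption; the paper's exact weights are not given here):

```python
import random

def bagging_ensemble(train, data, labels, n_models=5, seed=0):
    """Bagging with weighted voting: each model is fit on a bootstrap
    resample and weighted by its accuracy on that resample; `train`
    maps (samples, labels) to a callable classifier."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(data)) for _ in range(len(data))]
        xs = [data[i] for i in idx]
        ys = [labels[i] for i in idx]
        model = train(xs, ys)
        acc = sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)
        models.append((model, acc))

    def predict(x):
        votes = {}
        for model, weight in models:
            c = model(x)
            votes[c] = votes.get(c, 0.0) + weight
        return max(votes, key=votes.get)
    return predict
```

Resampling decorrelates the base classifiers, so the weighted vote smooths out the errors any single SVM makes on noisy voxels.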


Kybernetes ◽  
2014 ◽  
Vol 43 (8) ◽  
pp. 1150-1164 ◽  
Author(s):  
Bilal M’hamed Abidine ◽  
Belkacem Fergani ◽  
Mourad Oussalah ◽  
Lamya Fergani

Purpose – The task of identifying activity classes from sensor information in a smart home is very challenging because of the imbalanced nature of such data sets, where some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues. Design/methodology/approach – In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost-Sensitive Support Vector Machines (CS-SVM), with an adaptive tuning of the cost parameter, in order to handle the imbalanced data problem. Findings – The results demonstrate the usefulness of the approach through comparison with state-of-the-art approaches, including HMM, CRF, the traditional C-Support Vector Machine (C-SVM) and CS-SVM, for classifying activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiments include accuracy, precision/recall and F-measure.
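The SMOTE half of the strategy is easy to sketch: each synthetic minority sample is a random interpolation between a minority sample and one of its k nearest minority neighbours, so rare activity classes grow without simple duplication. An illustrative pure-Python version (not the authors' implementation):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """SMOTE sketch: generate `n_new` synthetic minority samples, each a
    random interpolation between a minority sample and one of its k
    nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: d2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                 # position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

The oversampled training set is then passed to the cost-sensitive SVM, whose class-dependent cost parameter further penalizes misclassifying the (still harder) minority activities.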

