Principles of QSAR Modeling

At the end of her academic career, the author summarizes the main aspects of QSAR modeling, offering comments and suggestions drawn from 23 years of experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, with a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments may also be useful for other QSAR methods. The need for rigorous validation, including external validation, and for an applicability domain check to guarantee the predictivity and reliability of QSAR models is particularly highlighted. The approach discussed is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the points discussed and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.


Quantitative Structure–Activity Relationship and Molecular Docking Studies of Imidazolopyrimidine Amides as Potent Dipeptidyl Peptidase-4 (DPP4) Inhibitors

Journal of Pharmaceutical Research International ◽ 10.9734/jpri/2019/v27i630186 ◽ 2019 ◽ pp. 1-15 ◽ Cited By ~ 2

Author(s): Leila Emami ◽ Razieh Sabet ◽ Amirhossein Sakhteman ◽ Mehdi Khoshnevis Zade

Keyword(s): Genetic Algorithm ◽ Linear Regression ◽ Variable Selection ◽ Multiple Linear Regression ◽ Quantitative Structure Activity Relationship ◽ Dipeptidyl Peptidase ◽ Docking Studies ◽ Quantitative Structure ◽ Dipeptidyl Peptidase 4 ◽ Structure Activity

Type 2 diabetes (T2DM) is a metabolic disorder, and DPP-4 inhibitors are a class of oral hypoglycemic agents that block the dipeptidyl peptidase-4 (DPP-4) enzyme. DPP-4 inhibitors reduce glucagon and blood glucose levels and do not have side effects such as hypoglycemia or weight gain. In this paper, a series of imidazolopyrimidine amide analogues acting as DPP-4 inhibitors was selected for quantitative structure-activity relationship (QSAR) analysis and docking studies. A collection of chemometric methods, namely multiple linear regression (MLR), factor analysis-based multiple linear regression (FA-MLR), principal component regression (PCR), genetic algorithm for variable selection-MLR (GA-MLR), and partial least squares combined with genetic algorithm for variable selection (GA-PLS), was used to relate structural features to the DPP-4 inhibitory activity of a variety of imidazolopyrimidine amide derivatives. GA-PLS gave superior results with high statistical quality (R2 = 0.94 and Q2 = 0.80) for predicting the activity of the compounds. Docking studies confirm that compounds 15, 18, 25, 26, and 28 are good candidates as DPP-4 inhibitors.


QSAR Models for Predicting Aquatic Toxicity of Esters Using Genetic Algorithm-Multiple Linear Regression Methods

Combinatorial Chemistry & High Throughput Screening ◽ 10.2174/1386207322666190618150856 ◽ 2019 ◽ Vol 22 (5) ◽ pp. 317-325

Author(s): Mehdi Rajabi ◽ Fatemeh Shafiei

Keyword(s): Genetic Algorithm ◽ Linear Regression ◽ Multiple Linear Regression ◽ Molecular Descriptors ◽ Biological Activities ◽ Aquatic Toxicity ◽ Qsar Model ◽ Data Set ◽ Test Set ◽ Aliphatic Esters

Aim and Objective: Esters are of great importance in industry, medicine, and space studies, so studying their toxicity is very important. In this research, a Quantitative Structure–Activity Relationship (QSAR) model was proposed for predicting the aquatic toxicity (log 1/IGC50) of aliphatic esters towards Tetrahymena pyriformis using molecular descriptors. Materials and Methods: A data set of 48 aliphatic esters was split into a training set of 34 compounds and a test set of 14 compounds. A large number of molecular descriptors were calculated with Dragon software. The Genetic Algorithm (GA) and Multiple Linear Regression (MLR) methods were used to select suitable descriptors and to generate correlation models that relate the chemical structural features to the biological activities. Results: The predictive power of the MLR models is assessed using Leave-One-Out (LOO) cross-validation and an external test set. The best QSAR model is obtained with R2 = 0.899, Q2 LOO = 0.928, F = 137.73, and RMSE = 0.263. Conclusion: The predictive ability of the GA-MLR model with two selected molecular descriptors is satisfactory, and the model can be used for designing similar groups of compounds and predicting the toxicity (log 1/IGC50) of ester derivatives.
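The Leave-One-Out statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition Q2 LOO = 1 - PRESS/TSS, with TSS computed around the mean of the full data set; the abstract does not state which definition the authors used. The helper names (`solve`, `fit_ols`, `q2_loo`) and the toy data are hypothetical and are not taken from the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) from the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    """Predict one response from the intercept and descriptor coefficients."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r2(X, y):
    """Fitted R2 of the OLS model trained on all compounds."""
    coef = fit_ols(X, y)
    mean = sum(y) / len(y)
    rss = sum((yi - predict(coef, row)) ** 2 for row, yi in zip(X, y))
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1 - rss / tss


def q2_loo(X, y):
    """Q2 LOO = 1 - PRESS/TSS: refit without compound i, then predict compound i."""
    mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1 - press / tss


# Toy data (illustrative only): six "compounds", two descriptors each.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print("R2 =", round(r2(X, y), 3), "Q2 LOO =", round(q2_loo(X, y), 3))
```

The normal equations are solved directly only to keep the sketch dependency-free; with many or correlated descriptors, a numerically more stable least-squares routine from a linear algebra library would be preferable.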


Genetic algorithm as a variable selection procedure for the simulation of 13C nuclear magnetic resonance spectra of flavonoid derivatives using multiple linear regression

Journal of Molecular Graphics and Modelling ◽ 10.1016/j.jmgm.2008.03.004 ◽ 2008 ◽ Vol 27 (2) ◽ pp. 105-115 ◽ Cited By ~ 13

Author(s): Raoof Ghavami ◽ Amir Najafi ◽ Mohammad Sajadi ◽ Farhad Djannaty

Keyword(s): Genetic Algorithm ◽ Nuclear Magnetic Resonance ◽ Magnetic Resonance ◽ Linear Regression ◽ Variable Selection ◽ Multiple Linear Regression ◽ Selection Procedure ◽ Variable Selection Procedure ◽ Nuclear Magnetic Resonance Spectra ◽ 13C Nuclear Magnetic Resonance


Variable Selection in Multiple Linear Regression Using a Genetic Algorithm

Handbook of Research on Modern Optimization Algorithms and Applications in Engineering and Economics - Advances in Computational Intelligence and Robotics ◽ 10.4018/978-1-4666-9644-0.ch005 ◽ 2016 ◽ pp. 133-159 ◽ Cited By ~ 1

Author(s): Javier Trejos ◽ Mario A. Villalobos-Arias ◽ Jose Luis Espinoza

Keyword(s): Genetic Algorithm ◽ Linear Regression ◽ Variable Selection ◽ Multiple Linear Regression ◽ Least Squares ◽ Fitness Function ◽ Data Sets ◽ Genetic Operators ◽ Wheel Model ◽ Crossover And Mutation

This article studies the application of a genetic algorithm to the problem of variable selection for multiple linear regression, minimizing the least squares criterion. The algorithm is based on a chromosomal representation of the variables considered in the least squares model: a binary chromosome indicates the presence (1) or absence (0) of each variable in the model. The fitness function is based on the adjusted R-squared, and chromosomes are selected by roulette wheel selection with probability proportional to their fitness. The usual genetic operators, crossover and mutation, are implemented. Comparisons on benchmark data sets give satisfactory and promising results.
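The scheme described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-point crossover, the bit-flip mutation rate, the population size, the clamping of non-positive fitness values before roulette wheel selection, and all function names (`solve`, `adjusted_r2`, `ga_select`) are hypothetical choices made here.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(X, y, mask):
    """Fitness: adjusted R2 of the OLS model using only the variables with mask bit 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    n, k = len(y), len(cols)
    if k == 0 or n - k - 1 <= 0:
        return 0.0  # assumption: empty or over-parameterized models score zero
    A = [[1.0] + [row[j] for j in cols] for row in X]
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    try:
        coef = solve(xtx, xty)
    except ZeroDivisionError:
        return 0.0  # assumption: exactly singular models score zero
    mean = sum(y) / n
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(A, y))
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1 - (rss / tss) * (n - 1) / (n - k - 1)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Return the best binary chromosome evaluated and its adjusted R2."""
    rng = random.Random(seed)
    n_vars = len(X[0])  # this sketch needs at least two candidate variables
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [adjusted_r2(X, y, ch) for ch in pop]
        for ch, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ch[:], f
        # Roulette wheel: selection probability proportional to (clamped) fitness.
        weights = [max(f, 1e-6) for f in fits]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    return best, best_fit


# Synthetic data (illustrative only): the response depends on variables 0 and 3.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * row[0] - 3.0 * row[3] + data_rng.gauss(0, 0.1) for row in X]
best, best_fit = ga_select(X, y)
print(best, round(best_fit, 3))
```

Clamping the weights is needed because the adjusted R-squared can be negative, while roulette wheel selection requires non-negative weights. The best chromosome is tracked separately because this sketch does not carry elite individuals into the next generation.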


Determination of Ethanol and Methyl Tert-Butyl Ether (MTBE) in Gasoline by NIR–AOTF-based Spectroscopy and Multiple Linear Regression with Variables Selected by Genetic Algorithm

Journal of Near Infrared Spectroscopy ◽ 10.1255/jnirs.154 ◽ 1998 ◽ Vol 6 (1) ◽ pp. 333-339 ◽ Cited By ~ 12

Author(s): Renato Guchardi ◽ Paulo Augusto da Costa Filho ◽ Ronei J. Poppi ◽ Celio Pasquini

Keyword(s): Genetic Algorithm ◽ Linear Regression ◽ Variable Selection ◽ Multiple Linear Regression ◽ Near Infrared ◽ Methyl Tert Butyl Ether ◽ Special Importance ◽ Butyl Ether ◽ Tert Butyl

This paper describes a near infrared spectroscopic method developed for the determination of ethanol and methyl tert-butyl ether (MTBE) as additives in gasoline. The methodology employs data collected from a near infrared spectrophotometer whose monochromator is an Acousto-Optic Tunable Filter (AOTF) operating in the 1500–2400 nm range. Genetic Algorithm variable selection was used in the multiple linear regression (MLR) modelling. Seven wavelengths were selected by the algorithm, and MLR on these wavelengths gave better results than the PLS regression method, as confirmed by the lower RMSEP obtained for ethanol and MTBE determination. Besides improving the analytical results, the variable selection reduces the time necessary for data acquisition. This is of special importance when an AOTF is used as the monochromator element: the AOTF's capability of random access to the selected wavelengths can be employed to access the necessary information very rapidly, enabling the methodology to be used for in-line monitoring of fuel additives.
