Principles of QSAR Modeling

Author(s):  
Paola Gramatica

At the end of her academic career, the author summarizes the main aspects of QSAR modeling, offering comments and suggestions based on her 23 years of experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, using a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments may also be useful for other QSAR methods. The need for rigorous validation, including external validation, and for an applicability domain check to guarantee the predictivity and reliability of QSAR models is particularly highlighted. The approach discussed is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the points discussed and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.
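One common way to run an applicability domain check for an OLS model is the leverage approach, sketched below in plain Python. The abstract does not say which method is meant, so the leverage formula h = x'(X'X)^-1 x, the warning cutoff h* = 3p'/n (with p' the number of model parameters including the intercept), and the helper names `solve`, `leverages` and `warning_leverage` are all assumptions made for illustration, not a description of QSARINS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def leverages(X_train, X_query):
    """Leverage h = x' (X'X)^-1 x of each query compound (intercept column added)."""
    rows = [[1.0] + list(r) for r in X_train]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    result = []
    for q in X_query:
        x = [1.0] + list(q)
        z = solve(xtx, x)  # z = (X'X)^-1 x without forming the inverse
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Assumed cutoff h* = 3 p' / n, with p' = descriptors + intercept."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical one-descriptor training set and two query compounds.
X_train = [[float(v)] for v in range(1, 11)]
h_near, h_far = leverages(X_train, [[5.5], [30.0]])
h_star = warning_leverage(len(X_train), 1)
print(h_near <= h_star, h_far > h_star)
```

A query compound whose leverage exceeds the cutoff lies far from the descriptor space of the training set, so its prediction would be an extrapolation and should be treated with caution.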

Author(s):  
Leila Emami ◽  
Razieh Sabet ◽  
Amirhossein Sakhteman ◽  
Mehdi Khoshnevis Zade

Type 2 diabetes mellitus (T2DM) is a metabolic disorder, and DPP-4 inhibitors are a class of oral hypoglycemic agents that block the dipeptidyl peptidase-4 (DPP-4) enzyme. DPP-4 inhibitors reduce glucagon and blood glucose levels and do not have side effects such as hypoglycemia or weight gain. In this paper, a series of imidazolopyrimidine amide analogues acting as DPP-4 inhibitors was selected for quantitative structure-activity relationship (QSAR) analysis and docking studies. A collection of chemometric methods, such as multiple linear regression (MLR), factor analysis-based multiple linear regression (FA-MLR), principal component regression (PCR), genetic algorithm for variable selection-MLR (GA-MLR) and partial least squares combined with genetic algorithm for variable selection (GA-PLS), was applied to relate structural features to the DPP-4 inhibitory activity of a variety of imidazolopyrimidine amide derivatives. GA-PLS gave superior results with high statistical quality (R² = 0.94 and Q² = 0.80) for predicting the activity of the compounds. The docking studies confirm that compounds 15, 18, 25, 26, and 28 are good candidates as DPP-4 inhibitors.


2019 ◽  
Vol 22 (5) ◽  
pp. 317-325
Author(s):  
Mehdi Rajabi ◽  
Fatemeh Shafiei

Aim and Objective: Esters are of great importance in industry, medicine, and space studies, so studying their toxicity is very important. In this research, a Quantitative Structure–Activity Relationship (QSAR) model was proposed for predicting the aquatic toxicity (log 1/IGC50) of aliphatic esters towards Tetrahymena pyriformis using molecular descriptors. Materials and Methods: A data set of 48 aliphatic esters was split into a training set of 34 compounds and a test set of 14 compounds. A large number of molecular descriptors were calculated with Dragon software. The Genetic Algorithm (GA) and Multiple Linear Regression (MLR) methods were used to select suitable descriptors and to generate correlation models relating chemical structural features to biological activity. Results: The predictive power of the MLR models is assessed using Leave-One-Out (LOO) cross-validation and an external test set. The best QSAR model is obtained with R² = 0.899, Q² LOO = 0.928, F = 137.73, and RMSE = 0.263. Conclusion: The predictive ability of the GA-MLR model with two selected molecular descriptors is satisfactory, and the model can be used for designing similar groups of compounds and predicting the toxicity (log 1/IGC50) of ester derivatives.
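The R² and leave-one-out Q² statistics quoted above can be computed for an MLR model roughly as in the following plain-Python sketch. It assumes the common definitions R² = 1 - RSS/TSS and Q² LOO = 1 - PRESS/TSS, with TSS taken around the mean of the full training set; the abstract does not state which formula the authors used, and the helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(X, y):
    """OLS coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2(X, y):
    """Fitted R^2 = 1 - RSS / TSS on the training set."""
    coef = ols_fit(X, y)
    mean = sum(y) / len(y)
    rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS: refit without compound i, then predict it."""
    mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - press / tss


# Hypothetical toy data: one descriptor, six compounds.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
print(r2(X, y), q2_loo(X, y))
```

Under these definitions each leave-one-out residual of an OLS model is at least as large in magnitude as the corresponding fitted residual, so PRESS is never smaller than RSS and Q² LOO does not exceed R².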


Author(s):  
Javier Trejos ◽  
Mario A. Villalobos-Arias ◽  
Jose Luis Espinoza

This article studies the application of a genetic algorithm to the problem of variable selection for multiple linear regression, minimizing the least squares criterion. The algorithm is based on a chromosomal representation of the variables considered in the least squares model: a binary chromosome indicates the presence (1) or absence (0) of each variable in the model. The fitness function is based on the adjusted R², and chromosomes are selected by roulette wheel with probability proportional to their fitness. The usual genetic operators, such as crossover and mutation, are implemented. Comparisons are performed on benchmark data sets, with satisfactory and promising results.
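The scheme described above (binary chromosomes, adjusted R² fitness, roulette wheel selection, crossover and mutation) can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the single-point crossover, bit-flip mutation, population size, generation count, mutation rate, and the clamping of non-positive fitness to a small positive weight are all assumptions, as are the function names.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(X, y, mask):
    """Adjusted R^2 of the OLS model using only the variables where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    n, k = len(y), len(cols)
    if k == 0 or n - k - 1 <= 0:
        return 0.0
    rows = [[1.0] + [r[j] for j in cols] for r in X]
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    try:
        coef = solve(xtx, xty)
    except ZeroDivisionError:  # exactly singular design, e.g. duplicated variables
        return 0.0
    mean = sum(y) / n
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Return the best chromosome evaluated and its adjusted R^2 (needs >= 2 variables)."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [adjusted_r2(X, y, chrom) for chrom in pop]
        for chrom, fit in zip(pop, fits):
            if fit > best_fit:
                best, best_fit = chrom[:], fit
        # Roulette wheel: selection probability proportional to (clamped) fitness.
        weights = [max(fit, 1e-6) for fit in fits]
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randint(1, n_vars - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
    return best, best_fit
```

Calling `ga_select(X, y)` with `X` as a list of descriptor rows and `y` as the response values returns a 0/1 list marking the selected variables together with the adjusted R² of that subset. The adjusted R² is used as fitness, rather than the plain R², because it penalizes chromosomes that add variables without improving the fit.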


1998 ◽  
Vol 6 (1) ◽  
pp. 333-339 ◽  
Author(s):  
Renato Guchardi ◽  
Paulo Augusto da Costa Filho ◽  
Ronei J. Poppi ◽  
Celio Pasquini

This paper describes a near infrared spectroscopic method developed for the determination of ethanol and methyl tert-butyl ether (MTBE) as additives in gasoline. The methodology employs data collected from a near infrared spectrophotometer whose monochromator is an Acousto-Optic Tunable Filter (AOTF) operating in the 1500–2400 nm range. Genetic Algorithm variable selection was used in the multiple linear regression (MLR) modelling. Seven wavelengths were selected by the algorithm, and MLR on these wavelengths gave better results than PLS regression, as confirmed by the lower RMSEP obtained for ethanol and MTBE determination. Besides improving the analytical results, the variable selection reduces the time necessary for data acquisition. This is especially important when an AOTF is used as the monochromator element: its capability of random access to the selected wavelengths can be employed to acquire the necessary information very rapidly, enabling the methodology to be used for in-line monitoring of fuel additives.

