Influential data cases when the Cp criterion is used for variable selection in multiple linear regression

2006, Vol 50 (7), pp. 1840-1854
Author(s): S.J. Steel, D.W. Uys
Author(s): Leila Emami, Razieh Sabet, Amirhossein Sakhteman, Mehdi Khoshnevis Zade

Type 2 diabetes (T2DM) is a metabolic disorder, and DPP-4 inhibitors are a class of oral hypoglycemic agents that block the dipeptidyl peptidase-4 (DPP-4) enzyme. DPP-4 inhibitors reduce glucagon and blood glucose levels and do not cause side effects such as hypoglycemia or weight gain. In this paper, a series of imidazolopyrimidine amide analogues acting as DPP-4 inhibitors were selected for quantitative structure-activity relationship (QSAR) analysis and docking studies. Several chemometric methods, namely multiple linear regression (MLR), factor analysis-based multiple linear regression (FA-MLR), principal component regression (PCR), MLR with genetic-algorithm variable selection (GA-MLR), and partial least squares with genetic-algorithm variable selection (GA-PLS), were used to relate structural features to the DPP-4 inhibitory activity of a variety of imidazolopyrimidine amide derivatives. GA-PLS gave the best results, with high statistical quality (R2 = 0.94 and Q2 = 0.80) for predicting the activity of the compounds. Docking studies of these compounds confirm that compounds 15, 18, 25, 26, and 28 are good candidates as DPP-4 inhibitors.
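The abstract reports R2 and Q2 without defining them. As a rough illustration only, the sketch below computes a fitted R2 and a leave-one-out cross-validated Q2 = 1 - PRESS/TSS for a plain ordinary least squares model. It assumes that Q2 refers to leave-one-out cross-validation, which is a common QSAR convention but is not stated in the abstract. It also uses OLS rather than the GA-PLS model of the paper, and the function name `r2_and_q2_loo` is hypothetical.

```python
import numpy as np

def r2_and_q2_loo(X, y):
    """Return (R2, Q2) for an OLS model with intercept.

    Assumption: Q2 is the leave-one-out statistic 1 - PRESS/TSS,
    with TSS taken around the mean of the full response vector.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # fit on all cases
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum((y - A @ beta) ** 2) / tss

    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                   # drop case i
        b, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ b) ** 2            # squared prediction error for case i
    q2 = 1.0 - press / tss
    return float(r2), float(q2)
```

Because each left-out case is predicted by a model that never saw it, Q2 under this definition cannot exceed R2 for the same OLS model, which is consistent with the gap between the two values reported above.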


Author(s): Paola Gramatica

At the end of her academic career, the author summarizes the main aspects of QSAR modeling, offering comments and suggestions based on her 23 years of experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, using a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments may also be useful for other QSAR methods. The need for rigorous validation, including external validation, and for an applicability domain check to guarantee the predictivity and reliability of QSAR models is particularly highlighted. The approach discussed is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the points discussed and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.


Author(s): Javier Trejos, Mario A. Villalobos-Arias, Jose Luis Espinoza

This article studies the application of a genetic algorithm to the problem of variable selection for multiple linear regression, minimizing the least squares criterion. The algorithm is based on a chromosome representation of the variables included in the least squares model: a binary chromosome indicates the presence (1) or absence (0) of each variable in the model. The fitness function is based on the adjusted R-squared, and chromosomes are selected by a roulette wheel scheme with probability proportional to their fitness. The usual genetic operators, such as crossover and mutation, are implemented. Comparisons are performed on benchmark data sets, with satisfactory and promising results.
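The ingredients named in this abstract (binary chromosomes, adjusted R-squared fitness, roulette wheel selection, crossover and mutation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `adjusted_r2` and `ga_select`, the one-point crossover, the bit-flip mutation, the clipping of negative fitness to zero, and all default parameter values are choices made here for the example.

```python
import numpy as np

def adjusted_r2(X, y, mask):
    """Adjusted R^2 of the OLS fit (with intercept) on the columns flagged in mask."""
    n, k = len(y), int(mask.sum())
    if k == 0 or n - k - 1 <= 0:
        return 0.0                                  # empty or over-sized model
    A = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Return (best_mask, best_fitness) found by a simple genetic algorithm.

    Assumes an even pop_size and at least two candidate variables.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5           # binary chromosomes: True = variable present
    best_mask, best_fit = pop[0].copy(), -np.inf
    for _ in range(generations):
        fit = np.array([adjusted_r2(X, y, c) for c in pop])
        i = int(fit.argmax())
        if fit[i] > best_fit:                       # remember the best chromosome seen so far
            best_fit, best_mask = float(fit[i]), pop[i].copy()
        # Roulette wheel: selection probability proportional to non-negative fitness.
        w = np.clip(fit, 0.0, None) + 1e-12
        parents = pop[rng.choice(pop_size, size=pop_size, p=w / w.sum())]
        # One-point crossover between consecutive parents.
        children = parents.copy()
        for a in range(0, pop_size - 1, 2):
            cut = rng.integers(1, p)
            children[a, cut:] = parents[a + 1, cut:]
            children[a + 1, cut:] = parents[a, cut:]
        # Bit-flip mutation: each gene flips independently with probability p_mut.
        pop = children ^ (rng.random(children.shape) < p_mut)
    return best_mask, best_fit
```

On data where only a few columns of `X` drive `y`, the returned mask would be expected to flag those columns, possibly along with a few noise columns, since adjusted R-squared penalizes extra variables only mildly.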

