Variable-importance Measures

Abstract: The increased prevalence of childhood obesity is expected to translate in the near future into a concomitant rise in multiple cardio-metabolic diseases. Obesity has a complex, multifactorial etiology that includes multiple potential risk factors across several domains: genetics, dietary and physical activity habits, socio-economic environment, lifestyle, etc. In addition, all these factors are expected to exert their influence in a specific and especially convoluted way during childhood, given the fast growth during this period. Machine Learning methods are appropriate tools to model this complexity, given their ability to cope with high-dimensional, non-linear data. Here, we have analyzed by Machine Learning a sample of 221 children (6–9 years) from Madrid, Spain. Both Random Forest and Gradient Boosting Machine models have been derived to predict the body mass index from a wide set of 190 multidomain variables (including age, sex, genetic polymorphisms, lifestyle, socio-economic, diet, exercise, and gestation variables). A consensus relative importance of the predictors has been estimated through variable importance measures, implemented robustly through an iterative process that included permutation and multiple imputation. We expect this analysis will help to shed light on the most important variables associated with childhood obesity, in order to choose better treatments for its prevention.
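As a rough illustration of the permutation idea mentioned in this abstract, the sketch below measures how much a model's mean squared error grows when one predictor column is shuffled, then averages the normalised scores of two models into a consensus ranking. It is not the authors' pipeline: the two `model_*` functions are hypothetical stand-ins for fitted Random Forest and Gradient Boosting models, the data are synthetic, and the multiple-imputation step is omitted.

```python
import random

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def consensus(importance_lists):
    """Clip negatives, scale each model's scores to sum to 1, then average."""
    normalised = []
    for imp in importance_lists:
        clipped = [max(v, 0.0) for v in imp]
        total = sum(clipped) or 1.0
        normalised.append([v / total for v in clipped])
    return [sum(col) / len(col) for col in zip(*normalised)]

# Synthetic data (assumption): the response depends on columns 0 and 1 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3 * r[0] + 1 * r[1] for r in X]

# Hypothetical "fitted" models standing in for a random forest and a GBM.
model_a = lambda r: 3 * r[0] + r[1]
model_b = lambda r: 2.5 * r[0] + 1.2 * r[1]

imp_a = permutation_importance(model_a, X, y)
imp_b = permutation_importance(model_b, X, y)
ranking = consensus([imp_a, imp_b])
print(ranking)
```

Because neither stand-in model reads column 2, shuffling it leaves the predictions unchanged and its importance is zero.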

Sign-consistency based variable importance for machine learning in brain imaging

10.1101/124453 ◽ 2017

Author(s): Vanessa Gómez-Verdejo ◽ Emilio Parrado-Hernández ◽ Jussi Tohka

Keyword(s): Brain Imaging ◽ Hypothesis Test ◽ Variable Importance ◽ Support Vector ◽ Single Subject ◽ Imaging Data ◽ Magnetic Resonance Imaging Data ◽ Extra Step ◽ Sign Consistency ◽ Variable Importance Measures

Abstract: An important problem that hinders the use of supervised classification algorithms for brain imaging is that the number of variables per single subject far exceeds the number of training subjects available. Deriving multivariate measures of variable importance becomes a challenge in such scenarios. This paper proposes a new measure of variable importance termed sign-consistency bagging (SCB). The SCB captures variable importance by analyzing the sign consistency of the corresponding weights in an ensemble of linear support vector machine (SVM) classifiers. Further, the SCB variable importances are enhanced by means of transductive conformal analysis. This extra step is important when the data can be assumed to be heterogeneous. Finally, the proposal of these SCB variable importance measures is completed with the derivation of a parametric hypothesis test of variable importance. The new importance measures were compared with a t-test-based univariate variable importance measure and an SVM-based multivariate one, using anatomical and functional magnetic resonance imaging data. The obtained results demonstrated that the new SCB-based importance measures were superior to the compared methods in terms of reproducibility and classification accuracy.
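The core idea of scoring a variable by how consistently its weight keeps the same sign across an ensemble of linear SVMs can be sketched as follows. This is a simplified illustration, not the paper's SCB definition: the score used here (absolute difference between the fractions of positive and negative weights), the Pegasos-style trainer `train_linear_svm`, and the synthetic data are all assumptions, and the transductive conformal step and the hypothesis test are left out.

```python
import random

def train_linear_svm(X, y, epochs=30, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]      # regularisation shrink
            if margin < 1:                              # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def sign_consistency(weight_vectors):
    """Per variable: |#positive - #negative| / #models, a value in [0, 1]."""
    n_models = len(weight_vectors)
    scores = []
    for column in zip(*weight_vectors):
        pos = sum(1 for w in column if w > 0)
        neg = sum(1 for w in column if w < 0)
        scores.append(abs(pos - neg) / n_models)
    return scores

# Synthetic data (assumption): only variable 0 carries class information.
rng = random.Random(0)
labels = [1 if i % 2 == 0 else -1 for i in range(120)]
data = [[2.0 * yi + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        for yi in labels]

# Bagging: train one linear SVM per bootstrap resample of the subjects.
ensemble = []
for b in range(25):
    idx = [rng.randrange(len(data)) for _ in range(len(data))]
    ensemble.append(train_linear_svm([data[i] for i in idx],
                                     [labels[i] for i in idx], seed=b))

scores = sign_consistency(ensemble)
print(scores)
```

A score near 1 means the weight almost always has the same sign across resamples. Noise variables can still score fairly high when the bootstrap resamples share a spurious correlation, which is one reason a formal test of importance is useful.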

An experimental study of the intrinsic stability of random forest variable importance measures

BMC Bioinformatics ◽ 10.1186/s12859-016-0900-5 ◽ 2016 ◽ Vol 17 (1) ◽ Cited By ~ 20

Author(s): Huazhen Wang ◽ Fan Yang ◽ Zhiyuan Luo

Keyword(s): Experimental Study ◽ Random Forest ◽ Variable Importance ◽ Intrinsic Stability ◽ Variable Importance Measures

Using variable importance measures to identify a small set of SNPs to predict heading date in perennial ryegrass

Scientific Reports ◽ 10.1038/s41598-017-03232-8 ◽ 2017 ◽ Vol 7 (1) ◽ Cited By ~ 8

Author(s): Stephen L. Byrne ◽ Patrick Conaghan ◽ Susanne Barth ◽ Sai Krishna Arojju ◽ Michael Casler ◽ ...

Keyword(s): Perennial Ryegrass ◽ Heading Date ◽ Variable Importance ◽ Small Set ◽ Variable Importance Measures

Process Variable Importance Analysis by Use of Random Forests in a Shapley Regression Framework

Minerals ◽ 10.3390/min10050420 ◽ 2020 ◽ Vol 10 (5) ◽ pp. 420

Author(s): Chris Aldrich

Keyword(s): Random Forest ◽ Case Studies ◽ Random Forests ◽ Performance Indicator ◽ Variable Importance ◽ Predictor Variables ◽ Importance Measure ◽ Variable Importance Measure ◽ Operational Variables ◽ Variable Importance Measures

Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to some key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, and this can lead to significant complications in assessing the importance of these variables. Shapley regression is seen as the only axiomatic approach to deal with this problem but has almost exclusively been used with linear models to date. In this paper, the approach is extended to random forests, and the results are compared with some of the empirical variable importance measures widely used with these models, i.e., permutation and Gini variable importance measures. Four case studies are considered, of which two are based on simulated data and two on real-world data from the mineral process industries. These case studies suggest that the random forest Shapley variable importance measure may be a more reliable indicator of the influence of predictor variables than the other measures that were considered. Moreover, the results obtained with the Gini variable importance measure were as reliable as or better than those obtained with the permutation measure of the random forest.
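To make the Shapley regression idea concrete, the sketch below splits a model's R² among correlated predictors by averaging each predictor's marginal contribution over all subsets of the others. It uses ordinary least squares as the value function so that it stays self-contained; the paper applies the same decomposition with random forests, which is not reproduced here. The helper names (`solve`, `r_squared`, `shapley_importance`) and the synthetic data are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import factorial

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def r_squared(X, y, subset):
    """R^2 of an OLS fit (with intercept) on the chosen predictor columns."""
    if not subset:
        return 0.0
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def shapley_importance(X, y, value=r_squared):
    """Exact Shapley share of value(all predictors) for each predictor."""
    n = len(X[0])
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[j] += weight * (value(X, y, S + (j,)) - value(X, y, S))
    return phi

# Synthetic data (assumption): x1 is highly correlated with x0, x2 is noise.
rng = random.Random(0)
X = []
for _ in range(150):
    x0 = rng.gauss(0, 1)
    X.append([x0, x0 + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1)])
y = [2 * row[0] + rng.gauss(0, 1) for row in X]

phi = shapley_importance(X, y)
print(phi, sum(phi), r_squared(X, y, (0, 1, 2)))
```

The shares add up to the R² of the full model, and the correlated predictor x1 receives credit even though it does not enter the data-generating equation. The cost grows exponentially with the number of predictors, so exact enumeration is only practical for small sets.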