A new variable selection approach using Random Forests

2013 ◽  
Vol 60 ◽  
pp. 50-69 ◽  
Author(s):  
A. Hapfelmeier ◽  
K. Ulm
2021 ◽  
pp. 096228022110463
Author(s):  
Liangyuan Hu ◽  
Jung-Yi Joyce Lin ◽  
Jiayi Ji

Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic. Parametric regression models are susceptible to misspecification and, as a result, are sub-optimal for variable selection. Flexible machine learning methods reduce the reliance on parametric assumptions, but they do not provide a variable importance measure as naturally defined as the covariate effect native to parametric models. We investigate a general variable selection approach for settings in which both the covariates and outcomes can be missing at random and have general missing data patterns. This approach exploits the flexibility of machine learning models and bootstrap imputation, which is amenable to nonparametric methods in which the covariate effects are not directly available. We conduct extensive simulations investigating the practical operating characteristics of the proposed variable selection approach when combined with four tree-based machine learning methods (extreme gradient boosting, random forests, Bayesian additive regression trees, and conditional random forests) and two commonly used parametric methods (lasso and backward stepwise selection). Numerical results suggest that extreme gradient boosting and Bayesian additive regression trees have the best overall variable selection performance with respect to the [Formula: see text] score and Type I error, while the lasso and backward stepwise selection have subpar performance across various settings. There is no significant difference in variable selection performance across imputation methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome with data from the Study of Women’s Health Across the Nation.
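
The general idea of combining bootstrap resampling, imputation, and a per-sample selector can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mean imputation, the correlation-based selector (standing in for a machine learning importance measure), the `cutoff` and `min_share` thresholds, and all function names are assumptions made purely for illustration, and the demo data use values missing completely at random as a simple special case.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def impute_mean(rows):
    """Assumed stand-in imputer: replace None with the observed column mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def select_once(rows, cutoff):
    """Assumed stand-in selector: keep covariates whose absolute correlation
    with the outcome (last column) exceeds `cutoff`."""
    y = [r[-1] for r in rows]
    n_covariates = len(rows[0]) - 1
    return {j for j in range(n_covariates)
            if abs(pearson([r[j] for r in rows], y)) > cutoff}


def bootstrap_imputation_selection(rows, n_boot=100, cutoff=0.3,
                                   min_share=0.8, seed=0):
    """Resample rows with replacement, impute each resample, run the selector,
    and keep covariates chosen in at least `min_share` of the resamples."""
    rng = random.Random(seed)
    counts = [0] * (len(rows[0]) - 1)
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        completed = impute_mean(sample)
        for j in select_once(completed, cutoff):
            counts[j] += 1
    return [j for j, c in enumerate(counts) if c / n_boot >= min_share]


def make_demo_data(n=200, miss_prob=0.1, seed=1):
    """Hypothetical data: x0 drives the outcome, x1 and x2 are noise.
    Every cell, including the outcome, is set to None with probability miss_prob."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0, x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y = 2.0 * x0 + rng.gauss(0, 1)
        rows.append([None if rng.random() < miss_prob else v
                     for v in (x0, x1, x2, y)])
    return rows
```

Calling `bootstrap_imputation_selection(make_demo_data())` returns the indices of the covariates retained across resamples. In practice the stand-in selector would be replaced by a model-specific importance rule, and the mean imputer by a proper imputation method.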


2019 ◽  
Author(s):  
Sierra Bainter ◽  
Thomas Granville McCauley ◽  
Tor D Wager ◽  
Elizabeth Reynolds Losin

In this paper we address the problem of selecting important predictors from some larger set of candidate predictors. Standard techniques are limited by lack of power and high false positive rates. A Bayesian variable selection approach used widely in biostatistics, stochastic search variable selection, can be used instead to combat these issues by accounting for uncertainty in the other predictors of the model. In this paper we present Bayesian variable selection to aid researchers facing this common scenario, along with an online application (https://ssvsforpsych.shinyapps.io/ssvsforpsych/) to perform the analysis and visualize the results. Using an application to predict pain ratings, we demonstrate how this approach quickly identifies reliable predictors, even when the set of possible predictors is larger than the sample size. This technique is widely applicable to research questions that may be relatively data-rich, but with limited information or theory to guide variable selection.


2019 ◽  
Vol 158 (5) ◽  
pp. 210
Author(s):  
Bo Ning ◽  
Alexander Wise ◽  
Jessi Cisewski-Kehe ◽  
Sarah Dodson-Robinson ◽  
Debra Fischer

The Analyst ◽  
2014 ◽  
Vol 139 (19) ◽  
pp. 4836 ◽  
Author(s):  
Bai-chuan Deng ◽  
Yong-huan Yun ◽  
Yi-zeng Liang ◽  
Lun-zhao Yi

2021 ◽  
Author(s):  
Katrina L Kezios

In any research study, there is an underlying research process that should begin with a clear articulation of the study’s goal. The study’s goal drives this process; it determines many study features including the estimand of interest, the analytic approaches that can be used to estimate it, and which coefficients, if any, should be interpreted. “Misalignment” can occur in this process when analytic approaches and/or interpretations do not match the study’s goal; misalignment is potentially more likely to arise when study goals are ambiguously framed. This study documented misalignment in the observational epidemiologic literature and explored how the framing of study goals contributes to its occurrence. The following misalignments were examined: 1) use of an inappropriate variable selection approach for the goal (a “goal-methods” misalignment) and 2) interpretation of coefficients of variables for which causal considerations were not made (e.g., Table 2 Fallacy, a “goal-interpretation” misalignment). A random sample of 100 articles published 2014-2018 in the top 5 general epidemiology journals was reviewed. Most reviewed studies were causal, with either explicitly stated (13/103, 13%) or associationally-framed (71/103, 69%) aims. Full alignment of goal-methods-interpretations was infrequent (9/103, 9%), although clearly causal studies (5/13, 38%) were more often fully aligned than seemingly causal ones (3/71, 4%). Goal-methods misalignments were common (34/103, 33%), but most frequently, methods were insufficiently reported to draw conclusions (47/103, 46%). Goal-interpretation misalignments occurred in 31% (32/103) of studies and occurred less often when the methods were aligned (2/103, 2%) compared with when the methods were misaligned (13/103, 13%).


2003 ◽  
Vol 19 (1) ◽  
pp. 90-97 ◽  
Author(s):  
K. E. Lee ◽  
N. Sha ◽  
E. R. Dougherty ◽  
M. Vannucci ◽  
B. K. Mallick
