Random KNN feature selection - a fast and stable alternative to Random Forests

2011
Vol 12 (1)
Author(s):
Shengqiao Li
E James Harner
Donald A Adjeroh
2021
Author(s):
Zhuo Wang
Huan Li
Bin Nie
Jianqiang Du
Yuwen Du
...

Electronics
2020
Vol 9 (5)
pp. 761
Author(s):
Franc Drobnič
Andrej Kos
Matevž Pustišek

In machine learning, a considerable amount of research addresses the interpretability of models and their decisions. Interpretability is at odds with model quality. Random Forests are among the best-performing machine learning techniques, but they operate as a “black box”. Among the quantifiable approaches to model interpretation are measures of association between predictors and response. For Random Forests, this approach usually consists of calculating the model’s feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity of features. Therefore, we propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features, which can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the results of the proposed method on two known datasets, one with small feature multicollinearity and another with large feature multicollinearity. The proposed method also allows a domain expert to help select among equally important features, which is known as the human-in-the-loop approach.
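The greedy forward selection described in this abstract can be illustrated with a minimal sketch. The abstract names a least-trees-used criterion but does not spell it out, so the sketch leaves the criterion as a caller-supplied `cost` function. The names `greedy_forward_selection`, `toy_cost`, and `INFO` are hypothetical, and the toy cost is an invented stand-in for illustration, not the authors' criterion.

```python
from typing import Callable, List, Sequence


def greedy_forward_selection(
    features: Sequence[str],
    cost: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Add, one at a time, the feature that lowers the cost the most.

    Stops when max_features are selected or no candidate improves the cost.
    """
    selected: List[str] = []
    remaining = list(features)
    best_cost = float("inf")
    while remaining and len(selected) < max_features:
        # Score every candidate when added to the current selection.
        trials = [(cost(selected + [f]), f) for f in remaining]
        step_cost, step_feature = min(trials)
        if step_cost >= best_cost:
            break  # no candidate improves on the current selection
        selected.append(step_feature)
        remaining.remove(step_feature)
        best_cost = step_cost
    return selected


# Hypothetical toy data: each feature belongs to an information group.
# "a" and "a_copy" are perfectly collinear, so they share a group.
INFO = {"a": ("g1", 5), "a_copy": ("g1", 5), "b": ("g2", 3), "noise": ("g3", 0)}


def toy_cost(subset: List[str]) -> float:
    """Assumed stand-in cost: lower when more distinct information is covered."""
    groups = {INFO[f][0]: INFO[f][1] for f in subset}
    return 100.0 / (1 + sum(groups.values()))


chosen = greedy_forward_selection(list(INFO), toy_cost, max_features=4)
print(chosen)
```

In this toy setting, "a" and "a_copy" tie at the first step and the tie is broken alphabetically. A tie of this kind is where a domain expert could be asked to choose, in the human-in-the-loop spirit the abstract mentions.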


2017
Vol 2645 (1)
pp. 157-167
Author(s):
Jishun Ou
Jingxin Xia
Yao-Jan Wu
Wenming Rao

Urban traffic flow forecasting is essential to proactive traffic control and management. Most existing forecasting methods depend on proper and reliable input features, for example, weather conditions and spatiotemporal lagged variables of traffic flow. However, the feature selection process is often done manually without comprehensive evaluation, which leads to inaccurate results. To address that challenge, this paper presents an approach combining the bias-corrected random forests algorithm with a data-driven feature selection strategy for short-term urban traffic flow forecasting. First, several input features were extracted from traffic flow time series data. Then the importance of these features was quantified with the permutation importance measure. Next, a data-driven feature selection strategy was introduced to identify the most important features. Finally, the forecasting model was built on the bias-corrected random forests algorithm and the selected features. The proposed approach was validated with data collected from three types of urban roads (expressway, major arterial, and minor arterial) in Kunshan City, China. The proposed approach was also compared with 10 existing approaches to verify its effectiveness. The results of the validation and comparison show that, even without further model tuning, the proposed approach achieves the lowest average mean absolute error and root mean square error on six stations, while it achieves the second-best average performance in mean absolute percentage error. In addition, training efficiency is improved compared with the original random forests method, owing to the feature selection strategy.
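The permutation importance step mentioned in this abstract can be sketched in a model-agnostic way: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below is a minimal illustration under stated assumptions. It uses a fixed toy predictor in place of the bias-corrected random forests model, synthetic data in place of traffic measurements, and a simple "importance above zero" rule as an assumed stand-in for the paper's data-driven selection strategy, which the abstract does not detail. All function and variable names are hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: List[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MAE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_absolute_error(y, [predict(r) for r in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row: Row) -> float:
    """Assumed stand-in model: uses features 0 and 1, ignores feature 2."""
    return 2.0 * row[0] + 0.5 * row[1]


# Synthetic data (not from the paper): 50 rows, 3 features.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
# Assumed selection rule: keep features whose shuffling hurts the error.
selected = [j for j, imp in enumerate(importances) if imp > 0.0]
print(importances, selected)
```

Because the toy predictor ignores the third feature, shuffling it leaves the error unchanged and it is dropped. With a trained forest the importances would be noisy, so a threshold or ranking rule would need more care than the zero cutoff assumed here.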

