Improved Variable Importance Measure of Random Forest via Combining of Proximity Measure and Support Vector Machine for Stable Feature Selection

2015 ◽  
Vol 12 (8) ◽  
pp. 3241-3252 ◽  
Author(s):  
Huazhen Wang
2021 ◽  
Vol 128 (1) ◽  
pp. 65-85
Author(s):  
Shufang Song ◽  
Ruyang He ◽  
Zhaoyin Shi ◽  
Weiya Zhang

Minerals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 420
Author(s):  
Chris Aldrich

Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to some key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, and this can significantly complicate the assessment of their importance. Shapley regression is regarded as the only axiomatic approach to this problem but, to date, has been used almost exclusively with linear models. In this paper, the approach is extended to random forests, and the results are compared with two empirical variable importance measures widely used with these models, namely the permutation and Gini variable importance measures. Four case studies are considered, two based on simulated data and two on real-world data from the mineral process industries. These case studies suggest that the random forest Shapley variable importance measure may be a more reliable indicator of the influence of predictor variables than the other measures considered. Moreover, the results obtained with the Gini variable importance measure were as reliable as, or more reliable than, those obtained with the permutation measure.
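Shapley regression, as referenced above, attributes a model's goodness of fit to its predictors by averaging each predictor's marginal contribution over all subsets of the other predictors. The minimal Python sketch below assumes the classic linear-model setting, using ordinary least-squares R² as the payoff. It does not reproduce the random forest extension described in the abstract, and the function names `r_squared` and `shapley_r2` and the synthetic data are illustrative only. Because every subset is enumerated, the cost grows as 2^p, so this is only practical for a handful of predictors.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on the columns in `cols` (plus intercept)."""
    if not cols:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def shapley_r2(X, y):
    """Shapley value of each predictor, using model R^2 as the payoff of a coalition."""
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for every coalition S without j
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                phi[j] += w * (r_squared(X, y, S + (j,)) - r_squared(X, y, S))
    return phi


# Synthetic example: x1 and x2 are strongly correlated, y depends on x1 and x3 only.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + rng.normal(size=n)
phi = shapley_r2(X, y)
print(phi, phi.sum(), r_squared(X, y, (0, 1, 2)))
```

By the efficiency property of Shapley values, the attributions sum to the R² of the full model, so the variance shared by the correlated predictors `x1` and `x2` is split between them rather than double-counted or lost.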


2019 ◽  
Author(s):  
Jeiran Choupan ◽  
Yaniv Gal ◽  
Pamela K. Douglas ◽  
Mark S. Cohen ◽  
David C. Reutens ◽  
...  

Abstract
The importance of spatiotemporal feature selection in fMRI decoding studies has not been studied exhaustively. Temporal embedding of features allows the incorporation of brain activity dynamics into multivariate pattern classification, and may provide enriched information about stimulus-specific response patterns and potentially improve prediction accuracy. This study investigates whether classification performance can be enhanced by exploring the spatial and temporal (spatiotemporal) domains to identify the optimum combination of spatiotemporal features based on classification performance. We investigated the importance of spatiotemporal feature selection using a slow event-related design adapted from the classic Haxby et al. (2001) study. Data were collected using a multiband fMRI sequence with a temporal resolution of 0.568 seconds. A wide range of spatiotemporal observations was created from various combinations of spatiotemporal features. Using both random forest and support vector machine classifiers, prediction accuracies for these combinations were then compared with those of the single time-point spatial multivariate pattern approach, which uses only a single temporal observation. The results showed that, on average, spatiotemporal feature selection improved prediction accuracy. Moreover, the random forest algorithm outperformed the support vector machine and benefited from temporal information to a greater extent. As expected, the most influential temporal durations were found around the peak of the hemodynamic response function, from a few seconds after stimulus onset until ∼4 seconds after the peak of the hemodynamic response function.
The superiority of spatiotemporal feature selection over single time-point spatial approaches invites future work on systematic and optimal approaches for incorporating spatiotemporal dependencies into feature selection for decoding.
Highlights:
- The effect of spatiotemporal feature selection on multivariate pattern classification (MVPC) was assessed in slow event-related fMRI
- Spatiotemporal feature selection improved brain decoding accuracy
- The period from ∼2 to 11 seconds after stimulus onset was the most informative part of each trial
- Random forest outperformed support vector machines
- Random forest benefited more from temporal changes than support vector machines did
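Temporal embedding, as described in this abstract, amounts to concatenating the voxel pattern from several time points after each stimulus onset into one feature vector per trial, instead of using a single volume. A minimal NumPy sketch follows. The 0.568 s repetition time and the ∼2 to 11 s window come from the abstract; the function name `spatiotemporal_features`, rounding each onset to the nearest volume, and the synthetic data are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def spatiotemporal_features(bold, onsets, tr, window):
    """Stack voxel patterns from several post-onset volumes into one feature vector per trial.

    bold   : array of shape (n_volumes, n_voxels), one row per acquired volume
    onsets : stimulus onset times in seconds
    tr     : repetition time in seconds
    window : (start_s, end_s) interval after onset to include
    """
    start, end = window
    # volume offsets (relative to the onset volume) falling inside [start, end] seconds
    offsets = np.arange(int(np.ceil(start / tr)), int(np.floor(end / tr)) + 1)
    rows = []
    for t in onsets:
        onset_vol = int(round(t / tr))  # assumption: onset snapped to the nearest volume
        idx = onset_vol + offsets
        if idx[-1] >= bold.shape[0]:
            continue  # skip trials whose window runs past the end of the scan
        rows.append(bold[idx].ravel())  # time-major: [voxels at t0, voxels at t1, ...]
    return np.asarray(rows), offsets


rng = np.random.default_rng(0)
bold = rng.normal(size=(600, 50))  # synthetic data: 600 volumes x 50 voxels
X, offsets = spatiotemporal_features(bold, onsets=[10.0, 40.0, 70.0], tr=0.568, window=(2.0, 11.0))
print(X.shape)  # (3, 800): 16 volumes x 50 voxels per trial
```

The resulting matrix, one row per trial, can be passed to a classifier such as a random forest or a support vector machine; shrinking or shifting `window` is one way to compare different spatiotemporal combinations against a single time-point baseline.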


2019 ◽  
Vol 8 (2S3) ◽  
pp. 1630-1635

In the present century, classification problems increasingly involve large datasets, and the most commonly used machine learning algorithms often fail to produce accurate results on them. Data mining techniques such as ensembles combine individual classifiers for the classification process and can also be used to generate new data. Random forest is an ensemble supervised machine learning technique used in numerous applications, such as the classification of text and image data. It is popular because it provides useful by-products such as a variable importance measure and the out-of-bag error. For viable learning and classification with random forests, the number of decision trees in the forest needs to be reduced (pruning). In this paper, we present a systematic overview of the random forest algorithm and its application areas. In addition, we briefly review machine learning algorithms proposed in recent years. Animal classification is considered an important problem, and most recent studies classify animals using image datasets. However, very little work has been done on attribute-oriented animal classification, which poses many challenges in extracting accurate features. We use a real-time dataset from Kaggle to classify animals, selecting the most relevant features with the variable importance measure, and compare the results with other popular machine learning models.
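The out-of-bag (OOB) error mentioned above comes from the bootstrap step of random forests: each tree is trained on n rows drawn with replacement, and the rows never drawn for that tree (about 1/e ≈ 36.8% of them for large n) serve as a built-in validation set for it. The short NumPy sketch below shows only how the in-bag and OOB indices of a single tree could be obtained; the function name `bootstrap_with_oob` is hypothetical, and fitting the trees and aggregating OOB predictions are left out.

```python
import numpy as np


def bootstrap_with_oob(n_samples, rng):
    """Draw one bootstrap sample of row indices and return it with the out-of-bag indices."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # n draws with replacement
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False                             # rows drawn at least once are in-bag
    return in_bag, np.flatnonzero(oob_mask)


rng = np.random.default_rng(42)
in_bag, oob = bootstrap_with_oob(10_000, rng)
print(len(oob) / 10_000)  # roughly 0.37, i.e. about 1/e
```

Averaging each row's prediction over only the trees for which it was out-of-bag, and comparing it with the true label, gives the OOB error estimate without a separate hold-out set.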


2021 ◽  
Vol 11 (24) ◽  
pp. 11988
Author(s):  
Robin Singh Bhadoria ◽  
Naman Bhoj ◽  
Hatim G. Zaini ◽  
Vivek Bisht ◽  
Md. Manzar Nezami ◽  
...  

Advances in network technology have vastly increased Internet usage. Consequently, traffic volume and data sharing have risen. This has made securing networks against sophisticated intrusion attacks essential for preserving users’ information and privacy. Our research focuses on detecting and combating intrusion attacks and preserving the integrity of online systems. We first create a benchmark model for detecting intrusions and then employ various combinations of feature selection techniques based on ensemble machine learning algorithms to improve the performance of the intrusion detection system. The performance of our model was evaluated using three metrics, namely elimination time, accuracy, and F1-score. The results indicated that the random forest feature selection technique had the minimum elimination time, whereas the support vector machine model had the best accuracy and F1-score. Therefore, we conclude that the combination of random forest and support vector machine is suitable for low-latency, highly accurate intrusion detection systems.


2021 ◽  
Author(s):  
Shuhei Kimura ◽  
Yahiro Takeda ◽  
Masato Tokuhisa ◽  
Mariko Okada

Abstract
Background: Among the various methods proposed so far for genetic network inference, this study focuses on random-forest-based methods. In the random-forest-based approach, confidence values are assigned to all candidate regulations. To our knowledge, all random-forest-based methods make these assignments using the standard variable importance measure defined in tree-based machine learning techniques. However, we think this measure has drawbacks for the inference of genetic networks.
Results: In this study, we therefore propose an alternative measure, which we call “the random-input variable importance measure,” and design a new inference method that uses the proposed measure in place of the standard measure in the existing random-forest-based inference method. Through numerical experiments, we show that the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC).
Conclusion: This study proposed the random-input variable importance measure for the inference of genetic networks. Using our measure improved the performance of the random-forest-based inference method. We evaluated the proposed measure only on several genetic network inference problems; however, the experimental results suggest that it will work well in other applications of random forests.
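The AURPC reported above scores a ranked list of candidate regulations against a reference network. The abstract does not say how the curve is interpolated, so the Python sketch below assumes one common convention, step-wise average precision, where the precision at each correctly ranked regulation is averaged over all true regulations. The function name `aurpc` and the toy scores are illustrative only.

```python
import numpy as np


def aurpc(scores, labels):
    """Area under the recall-precision curve, computed as step-wise average precision.

    scores : confidence value for each candidate regulation (higher = more confident)
    labels : 1 if the regulation is in the reference network, else 0
    """
    # rank candidates by decreasing confidence; ties keep their input order
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels)[order]
    if y.sum() == 0:
        raise ValueError("at least one true regulation is required")
    tp = np.cumsum(y)                          # true positives among the top-k candidates
    precision = tp / np.arange(1, len(y) + 1)  # precision at each cut-off k
    # recall rises by 1/n_pos at each true regulation, so the area is the
    # mean of the precision values at those ranks
    return float(precision[y == 1].sum() / y.sum())


print(aurpc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # (1 + 2/3) / 2 ≈ 0.8333
```

Other conventions, such as trapezoidal interpolation between recall points, can give somewhat different values, so AURPC figures are only comparable when computed the same way.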

