Optimal Feature Set Size in Random Forest Regression

One of the most important hyperparameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of the trees. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies on many datasets, we first investigated whether RF regression predictions are affected by the feature set size. We then found a rule associated with the optimal size, based on the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as the default size specified in the randomForest R package and the common grid search method.
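
For orientation, the default mentioned in this abstract can be written down directly. The randomForest R package sets the feature set size (its mtry argument) to max(floor(p/3), 1) for regression and floor(sqrt(p)) for classification, where p is the number of features. The Python sketch below only reproduces that arithmetic and lists the sizes a full grid search would have to try; the function names default_mtry and candidate_sizes are illustrative and do not come from the paper.

```python
import math


def default_mtry(p, task="regression"):
    """Default feature set size used by the randomForest R package for p features."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if task == "regression":
        return max(p // 3, 1)      # max(floor(p / 3), 1)
    if task == "classification":
        return math.isqrt(p)       # floor(sqrt(p))
    raise ValueError("task must be 'regression' or 'classification'")


def candidate_sizes(p):
    """Every feature set size an exhaustive grid search would evaluate."""
    return list(range(1, p + 1))
```

With p = 10 features, the regression default is 3, while an exhaustive grid would fit one forest for each of the 10 candidate sizes.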

On the Optimal Size of Candidate Feature Set in Random forest

Applied Sciences ◽

10.3390/app9050898 ◽

2019 ◽

Vol 9 (5) ◽

pp. 898 ◽

Cited By ~ 3

Author(s):

Sunwoo Han ◽

Hyunjoong Kim

Keyword(s):

Random Forest ◽

Specific Pattern ◽

Search Method ◽

Optimal Size ◽

Grid Search ◽

Random Subset ◽

Typical Size ◽

Grid Search Method ◽

Candidate Feature ◽

Novel Algorithm

Random forest is an ensemble method that combines many decision trees. Each level of trees is determined by an optimal rule among a candidate feature set. The candidate feature set is a random subset of all features, and is different at each level of trees. In this article, we investigated whether the accuracy of Random forest is affected by the size of the candidate feature set. We found that the optimal size differs from dataset to dataset without any specific pattern. To estimate the optimal size of the feature set, we proposed a novel algorithm which uses the out-of-bag error and the ‘SearchSize’ exploration. The proposed method is significantly faster than the standard grid search method while giving almost the same accuracy. Finally, we demonstrated that Random forest with the proposed algorithm is significantly more accurate than Random forest with a typical feature set size.
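
The cost difference between an exhaustive grid and a cheaper search can be illustrated with a small sketch. This is not the paper's 'SearchSize' procedure, whose details are not given in the abstract; it is a generic coarse-to-fine search, shown only as one plausible way to reduce the number of forests that must be fitted. The callback oob_error(m) is a hypothetical function, supplied by the caller, that fits a forest with feature set size m and returns its out-of-bag error.

```python
def grid_search(p, oob_error):
    """Evaluate every size 1..p and return (best_size, number_of_evaluations)."""
    errors = {m: oob_error(m) for m in range(1, p + 1)}
    best = min(errors, key=errors.get)
    return best, len(errors)


def coarse_to_fine_search(p, oob_error, step=4):
    """Scan every `step`-th size, then refine around the best coarse size.

    Assumes the error curve is roughly unimodal; this is an illustrative
    heuristic, not the algorithm proposed in the paper.
    """
    cache = {}

    def err(m):
        # Each size is evaluated at most once.
        if m not in cache:
            cache[m] = oob_error(m)
        return cache[m]

    coarse = list(range(1, p + 1, step))
    if coarse[-1] != p:
        coarse.append(p)           # always include the largest size
    best = min(coarse, key=err)

    lo, hi = max(1, best - step + 1), min(p, best + step - 1)
    best = min(range(lo, hi + 1), key=err)
    return best, len(cache)
```

On a toy error curve such as (m - 7)**2 + 1 with p = 20, both searches return size 7, but the coarse-to-fine search needs fewer evaluations than the 20 required by the grid. On a noisy or multimodal curve the heuristic can miss the grid optimum.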

Optimized Hyperparameter Tuned Random Forest Regressor Algorithm in Predicting Resale Car Value based on Grid Search Method

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1217 ◽

2021 ◽

pp. 106-113

Author(s):

Aruna M ◽

M Anjana ◽

Harshita Chauhan ◽

Deepa R

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Trees ◽

Search Method ◽

Grid Search ◽

Random Forest Regression ◽

Prediction Rate ◽

Grid Search Method ◽

Randomized Search ◽

The Cost

The price of a car depreciates from the moment it is bought. The resale value of a car depends on many factors and matters to both buyers and sellers, which makes its prediction a prominent problem in machine learning. Machine learning methods can combine these varied factors and process large amounts of data to predict the cost. For our dataset, the Random Forest Regression algorithm shows a significant increase in the prediction rate. To optimise the Random Forest Regressor model, the best hyperparameters can be found with hyperparameter tuning strategies. Comparing Grid Search and Randomized Search, the former gives the better prediction rate. The selected hyperparameters are then passed to the algorithm, since tuning helps assemble the best set of decision trees in the random forest and thus the best prediction rate.
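
The two tuning strategies compared in this abstract differ only in how candidate settings are generated. The sketch below shows that difference in plain Python: a grid enumerates every combination, while a randomized search draws a fixed number of settings. The parameter names in the usage note, the score callback and the function names are hypothetical placeholders, not the settings used by the authors.

```python
import itertools
import random


def grid_candidates(space):
    """Every combination of the listed values (exhaustive grid)."""
    keys = list(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_candidates(space, n_iter, seed=0):
    """n_iter settings drawn independently; duplicates are possible."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]


def best_setting(candidates, score):
    """Return the candidate with the highest score (for example a validation R2)."""
    return max(candidates, key=score)
```

For a space such as {"n_trees": [100, 300], "max_depth": [5, 10, None]}, the grid yields 6 settings, whereas random_candidates(space, 4) yields 4. The grid is guaranteed to contain the best listed combination, which is consistent with it scoring at least as well as a smaller random sample drawn from the same values.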

A Random Forest Regression Model Predicting the Winners of Summer Olympic Events

Proceedings of the 2020 2nd International Conference on Big Data Engineering ◽

10.1145/3404512.3404513 ◽

2020 ◽

Author(s):

Mengjie Jia ◽

Yue Zhao ◽

Furong Chang ◽

Bofeng Zhang ◽

Kenji Yoshigoe

Keyword(s):

Random Forest ◽

Regression Model ◽

Random Forest Regression

Deterministic and probabilistic occupancy detection with a novel heuristic optimization and Back-Propagation (BP) based algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189748 ◽

2021 ◽

pp. 1-13

Author(s):

Nuzhat Fatema ◽

Saeid Gholami Farkoush ◽

Mashhood Hasan ◽

H Malik

Keyword(s):

Convergence Rate ◽

Search Algorithm ◽

Gravitational Search Algorithm ◽

Hybrid Approach ◽

Experimental Studies ◽

Back Propagation ◽

Heuristic Optimization ◽

Local Minima ◽

Occupancy Detection ◽

Trapping Problem

In this paper, a novel hybrid approach for deterministic and probabilistic occupancy detection is proposed, based on a novel heuristic optimization and Back-Propagation (BP) algorithms. In general, a BP-based neural network (BPNN) struggles to find optimal weight and bias values, gets trapped in local minima and converges slowly. In this paper, the GSA (Gravitational Search Algorithm) is implemented as a new training technique for the BPNN in order to enhance its performance by reducing trapping in local minima, improving the convergence rate and optimizing the weight and bias values to reduce the overall error. The experimental results of the BPNN with and without GSA are presented for fair comparison and adoptability. The results show that BPNNGSA outperforms the standard BPNN in the training and testing phases in terms of processing speed and convergence rate, and avoids the trapping problem of the standard BPNN. The whole study is analyzed and demonstrated using the open-access R language platform. The proposed approach is validated with different numbers of hidden-layer neurons for both experimental studies, based on BPNN and BPNNGSA.

A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller leveraging machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data was collected during ADL tasks of ten individuals with a wrist brace emulating the absence of wrist function. Using this data, the neural network classifies the movement and then random forest regression computes the desired velocity of the prosthetic wrist. The models were trained/tested with ADLs where their robustness was tested using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.
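
The two accuracy figures quoted in this abstract are standard quantities. As a reminder of what they measure, the sketch below computes an F-1 score from true-positive, false-positive and false-negative counts, and Pearson's correlation between predicted and measured values. It assumes a single-class (binary) F-1; how the authors aggregated the score over movement classes is not stated in the abstract, and the function names are illustrative.

```python
import math


def f1_score(tp, fp, fn):
    """F-1 score for one class: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

A correlation near 1 indicates that predicted velocities rise and fall with the measured ones; it does not by itself rule out a constant scale or offset error.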

Real-Time Moisture Ratio Study of Drying Date Fruit Chips Based on On-Line Image Attributes Using kNN and Random Forest Regression Methods

Measurement ◽

10.1016/j.measurement.2020.108899 ◽

2020 ◽

pp. 108899

Author(s):

Madi Keramat-Jahromi ◽

Seyed Saeid Mohtasebi ◽

Hossein Mousazadeh ◽

Mahdi Ghasemi-Varnamkhasri ◽

Maryam Rahimi-Movassagh

Keyword(s):

Random Forest ◽

Real Time ◽

Moisture Ratio ◽

Random Forest Regression ◽

Regression Methods ◽

Line Image ◽

Date Fruit ◽

On Line

Random forest regression results in accurate assessment of potato nitrogen status based on multispectral data from different platforms and the critical concentration approach

Field Crops Research ◽

10.1016/j.fcr.2021.108158 ◽

2021 ◽

Vol 268 ◽

pp. 108158

Author(s):

Junxiang Peng ◽

Kiril Manevski ◽

Kirsten Kørup ◽

René Larsen ◽

Mathias Neumann Andersen

Keyword(s):

Random Forest ◽

Critical Concentration ◽

Accurate Assessment ◽

Nitrogen Status ◽

Random Forest Regression ◽

Multispectral Data

Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10

Geoscientific Model Development ◽

10.5194/gmd-12-1209-2019 ◽

2019 ◽

Vol 12 (3) ◽

pp. 1209-1225 ◽

Cited By ~ 15

Author(s):

Christoph A. Keller ◽

Mat J. Evans

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gas Phase ◽

Atmospheric Chemistry ◽

Random Forest Regression ◽

Data Set ◽

Gas Phase Chemistry ◽

Chemical Conditions ◽

Phase Chemistry ◽

The Impact

Abstract. Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry transport models. Our training data consist of 1 month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model v10. From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For ozone (O3), errors from using the random forests (compared to the reference simulation) grow slowly and after 5 days the normalized mean bias (NMB), root mean square error (RMSE) and R2 are 4.2 %, 35 % and 0.9, respectively; after 30 days the errors increase to 13 %, 67 % and 0.75, respectively. The biases become largest in remote areas such as the tropical Pacific where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10 % and has significant fidelity in following the time series of the full model. 
Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R2 reaching >2100 %, >400 % and <0.1, respectively. This proof-of-concept implementation takes 1.8 times more time than the direct integration of the differential equations, but optimization and software engineering should allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations, and its applicability to operational air quality activities.
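
The three skill metrics reported in this abstract can be computed as in the sketch below. The exact definitions used by the authors are not given here, so two assumptions are made and marked in the code: RMSE is normalized by the mean of the reference simulation (which is what would make it a percentage), and R2 is taken as the coefficient of determination. The function names are illustrative.

```python
import math


def nmb(pred, ref):
    """Normalized mean bias: summed error divided by the summed reference."""
    return sum(p - r for p, r in zip(pred, ref)) / sum(ref)


def nrmse(pred, ref):
    """RMSE divided by the reference mean (normalization is an assumption)."""
    n = len(ref)
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / n)
    return rmse / (sum(ref) / n)


def r2(pred, ref):
    """Coefficient of determination, 1 - SS_res / SS_tot (definition assumed)."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot
```

For a prediction that is uniformly 10 % too high, for example ref = [1, 2, 3, 4] and pred = [1.1, 2.2, 3.3, 4.4], the NMB is 0.1, that is, 10 %.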
