Optimized Hyperparameter Tuned Random Forest Regressor Algorithm in Predicting Resale Car Value based on Grid Search Method

Author(s): Aruna M, M Anjana, Harshita Chauhan, Deepa R

The price of a car depreciates from the moment it is bought. The resale value of a car depends on many factors, matters to both buyers and sellers, and is therefore a prominent prediction problem in machine learning. Diverse machine learning methodologies can exploit these varied factors and process large amounts of data to predict the price. On our dataset, the Random Forest Regression algorithm shows a significant increase in the prediction rate. To optimise the Random Forest Regressor model, the best hyperparameters can be found using hyperparameter tuning strategies. Comparing Grid Search and Randomized Search, the former yields a better prediction rate. These parameters are then passed to the algorithm, since hyperparameter tuning helps assemble the best set of decision trees in the random forest for the most optimised prediction rate.
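A minimal sketch of this tuning setup in scikit-learn, comparing Grid Search against Randomized Search over the same parameter space; the synthetic data and the parameter grid are illustrative assumptions, not the paper's car dataset or search space:

```python
# Hedged sketch: grid vs. randomized hyperparameter search for a
# RandomForestRegressor. Data and grid values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}

# Grid search: exhaustively evaluates every combination via cross-validation.
grid = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                    cv=5, scoring="r2", n_jobs=-1)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_, "R2:", grid.best_score_)

# Randomized search: samples a fixed number of combinations from the same space.
rand = RandomizedSearchCV(RandomForestRegressor(random_state=0), param_grid,
                          n_iter=10, cv=5, scoring="r2", n_jobs=-1,
                          random_state=0)
rand.fit(X, y)
print("Randomized search best params:", rand.best_params_, "R2:", rand.best_score_)
```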

2019 · Vol 9 (5) · pp. 898
Author(s): Sunwoo Han, Hyunjoong Kim

Random forest is an ensemble method that combines many decision trees. Each node of a tree is split using an optimal rule chosen from a candidate feature set. The candidate feature set is a random subset of all features and differs at each node. In this article, we investigated whether the accuracy of Random forest is affected by the size of the candidate feature set. We found that the optimal size differs from data set to data set without any specific pattern. To estimate the optimal size of the feature set, we proposed a novel algorithm that uses the out-of-bag error and a ‘SearchSize’ exploration. The proposed method is significantly faster than the standard grid search method while giving almost the same accuracy. Finally, we demonstrated that the accuracy of Random forest using the proposed algorithm increased significantly compared to using a typical feature set size.
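A rough scikit-learn analogue of this idea: using the out-of-bag error to choose the candidate-feature-set size (`max_features`) with a coarse doubling pass followed by local refinement. The doubling schedule is an illustrative stand-in for the paper's 'SearchSize' exploration, not the authors' exact algorithm, and the data are synthetic:

```python
# Hedged sketch: OOB-error-guided selection of max_features (mtry).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=0)

def oob_error(m):
    # Out-of-bag error needs bootstrap sampling (the default) and oob_score=True.
    rf = RandomForestClassifier(n_estimators=200, max_features=m,
                                oob_score=True, random_state=0, n_jobs=-1)
    rf.fit(X, y)
    return 1.0 - rf.oob_score_

n_features = X.shape[1]

# Coarse pass: double the candidate size instead of trying every value.
coarse, m = [], 1
while m <= n_features:
    coarse.append((oob_error(m), m))
    m *= 2
best_err, best_m = min(coarse)

# Fine pass: probe the neighbourhood of the coarse winner.
for m in range(max(1, best_m // 2), min(n_features, best_m * 2) + 1):
    err = oob_error(m)
    if err < best_err:
        best_err, best_m = err, m

print(f"Selected max_features={best_m} with OOB error {best_err:.3f}")
```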


2021 · Vol 11 (1)
Author(s): Chinmay P. Swami, Nicholas Lenhard, Jiyeon Kang

Abstract. Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment remains high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller that leverages machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). Data were collected while ten individuals performed ADL tasks wearing a wrist brace that emulated the absence of wrist function. Using these data, a neural network classifies the movement, and random forest regression then computes the desired velocity of the prosthetic wrist. The models were trained and tested on ADLs, and their robustness was assessed using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework for developing intuitive control of multi-DoF prosthetic devices.
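A minimal sketch of the two-stage controller described above: a neural network classifies the intended movement, then a per-class random forest regressor maps the same features to 2-DoF wrist velocities. The synthetic arrays, shapes, and names are assumptions standing in for the study's ADL sensor data:

```python
# Hedged sketch: classify-then-regress control pipeline. Synthetic data only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))           # e.g. arm-kinematics features (assumed)
movement = rng.integers(0, 4, size=2000)  # movement class per sample
velocity = rng.normal(size=(2000, 2))     # 2-DoF wrist velocity targets

# Stage 1: neural network classifies the movement.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, movement)

# Stage 2: one velocity regressor per movement class.
regressors = {}
for c in np.unique(movement):
    mask = movement == c
    regressors[c] = RandomForestRegressor(n_estimators=200, random_state=0).fit(
        X[mask], velocity[mask])

def wrist_command(x):
    """Classify the movement, then predict the 2-DoF wrist velocity."""
    c = clf.predict(x.reshape(1, -1))[0]
    return regressors[c].predict(x.reshape(1, -1))[0]

print(wrist_command(X[0]))
```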


2019 · Vol 12 (3) · pp. 1209-1225
Author(s): Christoph A. Keller, Mat J. Evans

Abstract. Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry transport models. Our training data consist of 1 month (July 2013) of output of chemical conditions together with the model physical state, produced by the GEOS-Chem chemistry model v10. From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find the best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For ozone (O3), errors from using the random forests (compared to the reference simulation) grow slowly: after 5 days the normalized mean bias (NMB), root mean square error (RMSE) and R2 are 4.2 %, 35 % and 0.9, respectively; after 30 days the errors increase to 13 %, 67 % and 0.75, respectively. The biases become largest in remote areas such as the tropical Pacific, where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10 % and the model has significant fidelity in following the time series of the full model. Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species, such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R2 reaching >2100 %, >400 % and <0.1, respectively. This proof-of-concept implementation takes 1.8 times more time than the direct integration of the differential equations, but optimization and software engineering should allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations, and its applicability to operational air quality activities.
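A minimal sketch of the prediction-type choice described above: the forest learns the change in concentration over the chemistry step for long-lived species, and the absolute post-integrator concentration for short-lived species. The arrays and feature layout are synthetic assumptions, not GEOS-Chem's actual interface:

```python
# Hedged sketch: per-species target transform (tendency vs. absolute value).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
state = rng.normal(size=(n, 8))      # physical/chemical state before the integrator
conc_before = rng.lognormal(size=n)  # species concentration before the step
conc_after = conc_before * (1 + 0.01 * rng.normal(size=n))

def fit_species_model(long_lived):
    # Long-lived species: predict the change; short-lived: predict the value itself.
    target = (conc_after - conc_before) if long_lived else conc_after
    X = np.column_stack([state, conc_before])
    return RandomForestRegressor(n_estimators=100, random_state=0).fit(X, target)

model = fit_species_model(long_lived=True)
x0 = np.concatenate([state[0], [conc_before[0]]])
# For a long-lived species, add the predicted tendency back onto the input state.
pred_after = conc_before[0] + model.predict(x0.reshape(1, -1))[0]
print(pred_after)
```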


2021
Author(s): Chris J. Kennedy, Dustin G. Mark, Jie Huang, Mark J. van der Laan, Alan E. Hubbard, ...

Background: Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment. Objectives: We sought to assess machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients who are accurately predicted to have less than 0.5% MACE risk and may be eligible for reduced testing. Population Studied: 116,764 adult patients presenting with chest pain in the ED and evaluated for potential acute coronary syndrome (ACS); the 60-day MACE rate was 1.9%. Methods: We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling and imputed missing values with generalized low-rank models (GLRM). We benchmarked performance against key biomarkers, validated clinical risk scores, decision trees, and logistic regression. We explained the models through variable importance ranking and accumulated local effect visualization. Results: The best discrimination (area under the precision-recall [PR-AUC] and receiver operating characteristic [ROC-AUC] curves) was provided by SuperLearner ensembling (0.148, 0.867), followed by random forest (0.146, 0.862). Logistic regression (0.120, 0.842) and decision trees (0.094, 0.805) exhibited worse discrimination, as did risk scores [HEART (0.064, 0.765), EDACS (0.046, 0.733)] and biomarkers [serum troponin level (0.064, 0.708), electrocardiography (0.047, 0.686)]. The ensemble's risk estimates were miscalibrated by 0.2 percentage points. The ensemble accurately identified 50% of patients as being below the 0.5% 60-day MACE risk threshold. The most important predictors were age, peak troponin, HEART score, EDACS score, and electrocardiogram. GLRM imputation achieved a 90% reduction in root mean-squared error compared to median-mode imputation. Conclusion: Use of ML algorithms, combined with broad predictor sets, improved MACE risk prediction compared to simpler alternatives, while providing calibrated predictions and interpretability. Standard risk scores may neglect important health information that is available in other patient characteristics and can be combined in nuanced ways via ML.
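A hedged sketch of stacked ensembling in the spirit of SuperLearner, using scikit-learn's StackingClassifier rather than the authors' actual tooling; the synthetic cohort, base learners, and decision threshold are illustrative assumptions (only the roughly 2% event rate and the 0.5% low-risk cut-off mirror the abstract):

```python
# Hedged sketch: stacked ensemble risk model with a low-risk threshold rule.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# ~2% event rate, loosely mimicking the 60-day MACE rate in the cohort.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.98],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Base learners' cross-validated probabilities feed a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba", cv=5, n_jobs=-1)
stack.fit(X_tr, y_tr)

risk = stack.predict_proba(X_te)[:, 1]
low_risk = risk < 0.005  # the <0.5% threshold flagging reduced-testing candidates
print(f"Flagged {low_risk.mean():.1%} of held-out patients as low risk")
```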


Circulation · 2020 · Vol 142 (Suppl_3)
Author(s): Koichi Sughimoto, Jacob Levman, Fazleem Baig, Derek Berger, Yoshihiro Oshima, ...

Introduction: Despite improvements in management for children after cardiac surgery, a non-negligible proportion of patients suffer cardiac arrest and have a poor prognosis. Although serum lactate levels are widely accepted markers of hemodynamic instability, measuring lactate requires discrete blood sampling. An alternative method to evaluate hemodynamic stability/instability continuously and non-invasively may assist in improving the standard of patient care. Hypothesis: We hypothesize that blood lactate in PICU patients can be predicted using machine learning applied to arterial waveforms and perioperative characteristics. Methods: Forty-eight children who underwent heart surgery were included. Patient characteristics and physiological measurements were acquired and analyzed using specialized software/hardware, including heart rate, lactate level, arterial waveform sharpness, and area under the curve. A patient’s blood lactate level was predicted using regression-based supervised learning algorithms, including regression decision trees, tuned decision trees, a random forest regressor, a tuned random forest, an AdaBoost regressor, and hypertuned AdaBoost. All algorithms were compared with hold-out cross-validation. Two approaches were considered: one basing prediction on the currently acquired physiological measurements along with those acquired at admission, and one additionally using the most recent lactate measurement and the time since that measurement as prediction parameters. The second approach supports updating the learning system’s predictive capacity whenever a patient has a new ground-truth blood lactate reading acquired. Results: In both approaches, the best-performing machine learning method was the tuned random forest, which yielded a mean absolute error of 5.60 mg/dL in the first approach and 4.62 mg/dL when predicting blood lactate with updated ground truth. Conclusions: The tuned random forest is capable of predicting the level of serum lactate by analyzing perioperative variables, including the arterial pressure waveform. Machine learning can predict the patient’s hemodynamics non-invasively, continuously, and with accuracy that may demonstrate clinical utility.
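A minimal sketch of the study's second approach: augmenting the physiological features with the most recent measured lactate and the time since that measurement, then fitting a tuned random forest and scoring it on a hold-out set. All arrays, feature names, and grid values are synthetic assumptions, not the PICU data or the authors' tuning procedure:

```python
# Hedged sketch: lactate regression with "last lactate + elapsed time" features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 1500
physio = rng.normal(size=(n, 6))           # e.g. heart rate, waveform sharpness, AUC
last_lactate = rng.uniform(5, 60, size=n)  # mg/dL, most recent ground-truth reading
minutes_since = rng.uniform(0, 240, size=n)
lactate_now = last_lactate + 0.05 * minutes_since * rng.normal(size=n)

X = np.column_stack([physio, last_lactate, minutes_since])
X_tr, X_te, y_tr, y_te = train_test_split(X, lactate_now, random_state=0)

tuned = GridSearchCV(RandomForestRegressor(random_state=0),
                     {"n_estimators": [100, 300], "min_samples_leaf": [1, 5]},
                     cv=3, n_jobs=-1).fit(X_tr, y_tr)
print("hold-out MAE (mg/dL):", mean_absolute_error(y_te, tuned.predict(X_te)))
```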


2018 · pp. 1587-1599
Author(s): Hiroaki Koma, Taku Harada, Akira Yoshizawa, Hirotoshi Iwasaki

Detecting distracted states can be applied to various problems, such as danger prevention when driving a car. A cognitively distracted state is one example. Eye movements are known to express cognitive distraction, and they can be classified into several types. In this paper, the authors detect cognitive distraction from classified eye movement types using the Random Forest machine learning algorithm, an ensemble of decision trees. They show the effectiveness of considering eye movement types for detecting cognitive distraction with Random Forest, using visual experiments with still images.
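A minimal sketch of this feature idea: summarise each trial by how often each classified eye-movement type occurs, then let a Random Forest separate distracted from attentive trials. The movement type names, trial construction, and labels are synthetic stand-ins for the authors' experimental data:

```python
# Hedged sketch: eye-movement-type counts as features for distraction detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
TYPES = ["fixation", "saccade", "smooth_pursuit", "blink"]  # assumed categories

def trial_features(event_sequence):
    """Count each eye-movement type in one trial's classified event sequence."""
    return [event_sequence.count(t) for t in TYPES]

trials = [[rng.choice(TYPES) for _ in range(100)] for _ in range(200)]
X = np.array([trial_features(t) for t in trials])
y = rng.integers(0, 2, size=200)  # 1 = cognitively distracted, 0 = attentive

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```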

