A Hybrid Model Based on EMD-Feature Selection and Random Forest Method for Medical Data Forecasting

Author(s):  
Duen-Huang Huang ◽  
Chih-Hung Tsai ◽  
Hao-En Chueh ◽  
Liang-Ying Wei
2021 ◽  
pp. 1-15
Author(s):  
Zhaozhao Xu ◽  
Derong Shen ◽  
Yue Kou ◽  
Tiezheng Nie

Due to the high dimensionality and strong correlation of features, classification accuracy on medical data is often not as good as expected. Feature selection is a common technique for addressing this problem: it selects effective features by reducing the dimensionality of high-dimensional data. However, traditional feature selection algorithms suffer from blind threshold setting, and their search procedures are liable to fall into local optima. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization (PSO). The algorithm has three parts. First, ReliefF is used to calculate feature weights, and the features are ranked by weight. Then the ranked features are grouped by density equalization, so that the density of features in each group is the same. Finally, PSO is used to search the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest achieves the highest classification accuracy on the selected features while using the fewest features. In addition, experiments on two medical datasets show that the average accuracy of random forest reaches 90.20%, which demonstrates that the hybrid algorithm has practical application value.
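The ReliefF weighting and ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `relieff_weights` is hypothetical, the density-equalized grouping and PSO search are replaced here by simply taking the top-ranked features, and a public scikit-learn dataset stands in for the medical data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def relieff_weights(X, y, n_iters=100, seed=0):
    """Simplified ReliefF: reward features that separate nearest miss
    from nearest hit (1-NN variant, Manhattan distance)."""
    rng = np.random.default_rng(seed)
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)  # scale to [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        i = rng.integers(len(X))
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf                       # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))   # nearest same-class
        miss = np.argmin(np.where(y != y[i], d, np.inf))  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

X, y = load_breast_cancer(return_X_y=True)
w = relieff_weights(X, y)
ranked = np.argsort(w)[::-1]   # features ranked by ReliefF weight
top = ranked[:10]              # stand-in for the PSO-selected feature group
acc = cross_val_score(RandomForestClassifier(random_state=0),
                      X[:, top], y, cv=5).mean()
```

In the paper's full method, the PSO fitness function would score candidate feature groups like `top` rather than fixing a top-k cutoff.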


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

The medical diagnostic process works very similarly to the Case-Based Reasoning (CBR) cycle. CBR is a problem-solving approach based on reusing past experiences, called cases. To improve the performance of the retrieval phase, a Random Forest (RF) model is proposed; we used this algorithm in three different ways (three different algorithms): the classic Random Forest (CRF) algorithm; Random Forest with Feature Selection (RF_FS), where we selected the most important attributes and deleted the less important ones; and Weighted Random Forest (WRF), where we emphasized the most important attributes by giving them more weight, multiplying the entropy by the weight corresponding to each attribute. We tested our three algorithms, CRF, RF_FS, and WRF, with CBR on data from 11 medical databases and compared the results they produced. We found that WRF and RF_FS give better results than CRF. The experimental results show the performance and robustness of the proposed approach.
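The WRF idea of scaling the entropy-based split score by an attribute weight can be sketched in isolation. This is an illustrative sketch only: the helper names and the toy data are invented here, and a full WRF would apply this weighted gain inside every tree of the forest.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def weighted_gain(X, y, feature, threshold, weight):
    """Information gain of a binary split, scaled by the attribute's weight
    (the weighting step the WRF abstract describes)."""
    left = X[:, feature] <= threshold
    n = len(y)
    cond = (left.sum() / n) * entropy(y[left]) \
         + ((~left).sum() / n) * entropy(y[~left])
    return weight * (entropy(y) - cond)

# Toy data: feature 0 separates the classes perfectly.
X_demo = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])
y_demo = np.array([0, 0, 1, 1])
print(weighted_gain(X_demo, y_demo, feature=0, threshold=0.5, weight=1.0))  # prints 1.0
```

With `weight=0.5` the same split scores 0.5, so a less important attribute is less likely to be chosen at a node even when its raw gain is high.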


Author(s):  
Maria Irmina Prasetiyowati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Random Forest is a supervised classification method based on bagging (bootstrap aggregating, Breiman) and random selection of features. Because the features assigned to each tree are chosen at random, a selected feature is not necessarily informative, so it is worthwhile to perform feature selection before running Random Forest. The purpose of this feature selection is to select an optimal subset of features containing valuable information, in the hope of accelerating the Random Forest method, particularly on high-dimensional datasets such as Parkinson, CNAE-9, and Urban Land Cover. Feature selection is done using the Correlation-Based Feature Selection method with BestFirst search. Tests were carried out 30 times using 10-fold cross-validation and a 70% training / 30% testing split of the dataset. Experiments on the Parkinson dataset ran 0.27 and 0.28 seconds faster than the Random Forest method without feature selection; likewise, the Urban Land Cover trials were 0.04 and 0.03 seconds faster, and the CNAE-9 dataset was 2.23 and 2.81 seconds faster. These experiments show that Random Forest runs faster when feature selection is performed first. Accuracy also increased in the first two experiments, while only the CNAE-9 experiment yielded lower accuracy. The benefit of this research is that first performing feature selection with the Correlation-Based Feature Selection method can increase both the speed and the accuracy of the Random Forest method on high-dimensional data.
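The Correlation-Based Feature Selection step can be sketched with the standard CFS merit heuristic (subsets whose features correlate with the class but not with each other score highly). This is a simplified stand-in, not the paper's Weka setup: greedy forward search replaces BestFirst, Pearson correlation is used as a crude merit proxy, and a public scikit-learn dataset replaces the datasets named above.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def cfs_merit(corr_fy, corr_ff, subset):
    """CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = np.mean([corr_fy[f] for f in subset])          # feature-class corr.
    r_ff = (np.mean([corr_ff[i, j] for i in subset for j in subset if i != j])
            if k > 1 else 0.0)                            # feature-feature corr.
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

X, y = load_wine(return_X_y=True)
corr_fy = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in range(X.shape[1])])
corr_ff = np.abs(np.corrcoef(X.T))

# Greedy forward search: add the feature that most improves the merit.
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best_score, best_f = max((cfs_merit(corr_fy, corr_ff, selected + [f]), f)
                             for f in remaining)
    if selected and best_score <= cfs_merit(corr_fy, corr_ff, selected):
        break                      # merit no longer improves: stop
    selected.append(best_f)
    remaining.remove(best_f)

acc = cross_val_score(RandomForestClassifier(random_state=0),
                      X[:, selected], y, cv=5).mean()
```

Training Random Forest on `X[:, selected]` instead of the full matrix is what produces the speed-ups reported above, since each tree searches a smaller feature pool.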

