Improving an Ensemble Model Using Instance Selection Method

Author(s):  
Sung-Hwan Min ◽  
2020 ◽  
pp. 1-17
Author(s):  
Dongqi Yang ◽  
Wenyu Zhang ◽  
Xin Wu ◽  
Jose H. Ablanedo-Rosas ◽  
Lingxiao Yang ◽  
...  

With the rapid development of commercial credit mechanisms, credit funds have become fundamental in promoting the development of manufacturing corporations. However, large-scale, imbalanced credit application information poses a challenge to accurate bankruptcy prediction. A novel multi-stage ensemble model with fuzzy clustering and optimized classifier composition is proposed herein to improve the accuracy of corporate bankruptcy prediction. It combines a fuzzy clustering-based classifier selection method, a random subspace (RS)-based classifier composition method, and a genetic algorithm (GA)-based classifier compositional optimization method. To overcome the inherent inflexibility of traditional hard clustering methods, a new fuzzy clustering-based classifier selection method, built on the mini-batch k-means algorithm, is proposed to obtain the best-performing base classifiers for generating classifier compositions. The RS-based classifier composition method is applied to enhance the robustness of candidate classifier compositions by randomly selecting several subspaces of the original feature space. The GA-based classifier compositional optimization method is applied to optimize the parameters of the promising classifier composition through the iterative mechanism of the GA. Finally, six real-world datasets were tested with four evaluation indicators to assess the performance of the proposed model. The experimental results showed that the proposed model outperformed the benchmark models in both predictive accuracy and efficiency.
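The random subspace step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base classifier, the function names, the toy data, and the parameter values are all assumptions made for illustration, and the fuzzy clustering and GA stages are omitted entirely. The sketch only shows the core idea of training each ensemble member on a randomly chosen subset of features and combining members by majority vote.

```python
import random
from collections import Counter

def centroid_fit(X, y, features):
    """Compute one mean vector per class, restricted to the chosen features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def centroid_predict(centroids, features, row):
    """Return the class whose centroid is closest in the feature subspace."""
    def sq_dist(center):
        return sum((row[f] - center[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def fit_random_subspace(X, y, n_models=5, subspace_size=2, seed=0):
    """Train n_models base classifiers, each on a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    models = []
    for _ in range(n_models):
        features = sorted(rng.sample(range(n_features), subspace_size))
        models.append((features, centroid_fit(X, y, features)))
    return models

def predict(models, row):
    """Combine the base classifiers by simple majority vote."""
    votes = Counter(centroid_predict(c, f, row) for f, c in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4 features, two well-separated classes (0 and 1).
X = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1],
     [10, 9, 10, 9], [9, 10, 9, 10], [10, 10, 9, 9]]
y = [0, 0, 0, 1, 1, 1]

models = fit_random_subspace(X, y)
low = predict(models, [0.5, 0.2, 0.1, 0.3])
high = predict(models, [9.5, 10.2, 9.9, 10.1])
```

Because each member sees only part of the feature space, a noisy or uninformative feature affects only the members that happened to draw it, which is the robustness argument the abstract makes for the RS step.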


2020 ◽  
Vol 38 (4) ◽  
pp. 835-858
Author(s):  
Jiaming Liu ◽  
Liuan Wang ◽  
Linan Zhang ◽  
Zeming Zhang ◽  
Sicheng Zhang

Purpose: The primary objective of this study was to identify critical indicators for predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).
Design/methodology/approach: This study proposed a majority voting feature selection method that combines lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from an initial 38 indicators in 5,642 samples. The selected features were used to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.
Findings: The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are the five most important clinical/physical indicators in BG prediction. Furthermore, this study found that the GBDT ensemble model combined with the proposed majority voting feature selection method outperforms the other three models with respect to prediction performance and stability.
Practical implications: This study proposed a novel BG prediction framework for better predictive analytics in health care.
Social implications: This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.
Originality/value: The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.
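The majority voting idea in this abstract can be sketched in a few lines. This is an illustrative assumption about how the votes might be combined, not the authors' code: the function name, the strict-majority threshold, and the per-selector feature lists below are hypothetical (the abstract reports only the final five indicators, not what each selector chose).

```python
from collections import Counter

def majority_vote_features(selections, min_votes=None):
    """Keep features chosen by a strict majority of the selectors.

    selections: one iterable of feature names per selector.
    min_votes:  votes required; defaults to more than half of the selectors.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # set() ensures each selector contributes at most one vote per feature.
    votes = Counter(name for chosen in selections for name in set(chosen))
    return sorted(name for name, count in votes.items() if count >= min_votes)

# Hypothetical outputs of the three selectors (LR-AIC, LR-BIC, RF).
lr_aic = ["age", "CHC", "RBCVDW", "leucocyte_count"]
lr_bic = ["age", "CHC", "platelet_count"]
rf = ["age", "RBCVDW", "leucocyte_count", "uric_acid"]

selected = majority_vote_features([lr_aic, lr_bic, rf])
```

With three selectors the default threshold is two votes, so a feature survives only if at least two of the three methods agree on it.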


Author(s):  
Shayane de Oliveira Moura ◽  
Marcelo Bassani de Freitas ◽  
Halisson A. C. Cardoso ◽  
George D. C. Cavalcanti

2020 ◽  
Vol 20 (10) ◽  
pp. 2040039
Author(s):  
SANG-HONG LEE

In this study, a new instance selection method that combines the neural network with weighted fuzzy memberships (NEWFM) and the Takagi–Sugeno (T–S) fuzzy model was proposed to improve the accuracy of classifying healthy people and Parkinson’s disease (PD) patients. To evaluate the proposed method, foot pressure data were collected from healthy people and PD patients. Wavelet transforms (WTs) were used to remove noise from the foot pressure data in the preprocessing step. The proposed method selects instances using both weighted mean defuzzification (WMD) in the T–S fuzzy model and the confidence interval of a normal distribution. Classification accuracy was compared before and after instance selection to assess its benefit: accuracy was 77.33% before and 78.19% after, an improvement of 0.86 percentage points. Further, McNemar’s test was employed to test the difference in classification accuracy before and after instance selection. The resulting p-value was smaller than 0.05, indicating that the improvement in classification accuracy with instance selection was statistically significant. NEWFM includes the bounded sum of weighted fuzzy memberships (BSWFMs), which can graphically show the distinct characteristics that differ between healthy people and PD patients. This study proposes a new technique in which NEWFM detects PD patients from foot pressure data using BSWFMs that can be embedded in devices or systems.
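McNemar’s test, used in this abstract to compare a classifier before and after instance selection, depends only on the cases where the two classifiers disagree. The sketch below shows the exact (binomial) two-sided form of the test. The abstract does not say which variant the author used, and the disagreement counts here are hypothetical; they are not taken from the study.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases classified correctly before but incorrectly after.
    c: cases classified incorrectly before but correctly after.
    Under the null hypothesis the b + c disagreements split 50/50,
    so the smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical disagreement counts for illustration only.
p_value = mcnemar_exact(1, 9)
significant = p_value < 0.05
```

Cases that both classifiers get right, or both get wrong, do not enter the statistic, which is why a small gain in overall accuracy can still be significant when the disagreements are strongly one-sided.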

