Hyper Heuristic Evolutionary Approach for Constructing Decision Tree Classifiers

2021, Vol 20 (Number 2), pp. 249-276
Author(s): Sunil Kumar, Saroj Ratnoo, Jyoti Vashishtha

Decision tree models hold a special status in predictive modeling because they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the best-known decision tree induction algorithms and addresses both classification and regression problems. Finding optimal values for the hyper parameters of a decision tree construction algorithm is challenging. To build an effective decision tree classifier with high accuracy and comprehensibility, we must set suitable values for hyper parameters such as the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion, and the amount of pruning. The hyper parameter setting influences the performance of the decision tree model, and no single setting works equally well across datasets: a setting that gives an optimal decision tree for one dataset may produce a sub-optimal model for another. In this paper, we present a hyper heuristic approach for tuning the hyper parameters of Recursive and Partition Trees (rpart), a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper heuristic for tuning the hyper parameters of the decision tree classifier. The approach is named Hyper heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. Statistical tests show that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper heuristic algorithm constructs significantly more comprehensible models than WEKA’s J48, CART, and other similar decision tree construction strategies. The results also show that the accuracy achieved by the hyper heuristic approach is slightly lower than that of the other approaches compared.
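To make the idea of an evolutionary search over tree hyper parameters concrete, the following is a minimal sketch and not the HEARpart algorithm itself. The parameter names mirror options of R's rpart (`maxdepth`, `minsplit`, `cp`, and the split criterion), but the value ranges, the elitist selection scheme, the mutation rate, and the `toy_fitness` function are all assumptions made for illustration. In a real setting the fitness would be something like cross-validated accuracy, possibly penalised by tree size.

```python
import random

# Hypothetical search space; names follow rpart options, ranges are assumed.
SPACE = {
    "maxdepth": list(range(1, 31)),
    "minsplit": list(range(2, 61)),
    "split": ["gini", "information"],
    "cp": [i / 1000 for i in range(0, 101)],
}


def random_individual(rng):
    """Draw one hyper parameter setting uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}


def mutate(ind, rng, rate=0.3):
    """Resample each parameter with probability `rate`."""
    child = dict(ind)
    for name, values in SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Elitist evolutionary loop: keep the top quarter, breed the rest."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(ind):
    """Assumed stand-in for a real score such as cross-validated accuracy."""
    score = -abs(ind["maxdepth"] - 5) - abs(ind["minsplit"] - 20) / 10
    score -= abs(ind["cp"] - 0.01) * 100
    return score + (1.0 if ind["split"] == "gini" else 0.0)


if __name__ == "__main__":
    print(evolve(toy_fitness))
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.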

2020, Vol 6 (2), pp. 169-178
Author(s): Wahyu Setiady, Y.B. Adyapaka Apatya

This work designs and builds a device that classifies the temperature and humidity of a workspace using a decision tree model. According to the standard table in the technical planning procedures for energy conservation in buildings, the optimal comfortable temperature lies in the range 22.8 °C to 25.8 °C, with an upper threshold of 28 °C and a humidity of 70%. Using a decision tree classifier, the room temperature and humidity detected by a DHT11 sensor are classified according to the model that was built, using a Raspberry Pi 3 and Node-RED. The research was carried out in the computer laboratory of Politeknik Industri ATMI, which also serves as an applied research laboratory that collaborates with industry on the development of automation software. The research succeeded in building a workspace temperature and humidity classification device using a decision tree model that outputs the statuses cold, cool comfortable, optimal comfortable, warm comfortable, and hot, with a model prediction rate of 0.983.
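A trained decision tree of this kind reduces to a chain of threshold tests. The sketch below is a hand-written illustration, not the authors' trained model. Only the values 22.8 °C, 25.8 °C, and 28 °C come from the abstract. The lower boundary `COLD_BELOW_C` is a hypothetical placeholder, and humidity is left out because the abstract does not say how it enters the decision.

```python
OPTIMAL_MIN_C = 22.8   # from the abstract
OPTIMAL_MAX_C = 25.8   # from the abstract
UPPER_LIMIT_C = 28.0   # from the abstract
COLD_BELOW_C = 20.5    # hypothetical placeholder, not stated in the abstract


def classify_temperature(temp_c):
    """Map a temperature reading to one of the five statuses named above."""
    if temp_c < COLD_BELOW_C:
        return "cold"
    if temp_c < OPTIMAL_MIN_C:
        return "cool comfortable"
    if temp_c <= OPTIMAL_MAX_C:
        return "optimal comfortable"
    if temp_c <= UPPER_LIMIT_C:
        return "warm comfortable"
    return "hot"


if __name__ == "__main__":
    for reading in (18.0, 21.0, 24.0, 27.0, 30.0):
        print(reading, classify_temperature(reading))
```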


Author(s): Junhua Hu, Xiangzhu Ou, Pei Liang, Bo Li

Wart is a disease caused by human papillomavirus, with common and plantar warts as its general forms. Commonly used methods to treat warts are immunotherapy and cryotherapy, and selecting the proper treatment is vital to curing warts. This paper establishes a classification and regression tree (CART) model based on particle swarm optimisation (PSO) to help patients choose between immunotherapy and cryotherapy. The proposed model can accurately predict the response of patients to the two methods. By using an improved PSO algorithm to optimise the parameters of the model instead of the traditional pruning algorithm, a more concise and more accurate model is obtained. Two experiments are conducted to verify the feasibility of the proposed model. On the one hand, five benchmarks are used to verify the performance of the improved PSO algorithm. On the other hand, an experiment on two wart datasets is conducted. Results show that the proposed model is effective. The proposed method classifies better than k-nearest neighbour, C4.5, and logistic regression. It also performs better than the conventional optimisation method for the CART algorithm. Moreover, the decision tree model established in this study is interpretable and understandable. Therefore, the proposed model can help patients and doctors reduce medical costs and improve the quality of treatment.
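For readers unfamiliar with particle swarm optimisation, the following is a minimal sketch of the standard (global-best) PSO update, not the improved variant proposed in the paper. The sphere function stands in for a benchmark, and the inertia and acceleration coefficients, swarm size, and bounds are assumed values chosen for illustration. To tune a tree, the objective would instead score a candidate parameter vector, for example by validation error.

```python
import random


def sphere(x):
    """Simple benchmark with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, particles=20, iterations=100, bounds=(-5.0, 5.0),
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    print(pso(sphere))
```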


Loan default prediction for social lending is an emerging area of research in predictive analytics. The need for large amounts of data and the few available studies on current loan default prediction models for social lending suggest that other viable and easily implementable models should be investigated and developed. In view of this, this study developed a data mining model for predicting loan default among social lending patrons, specifically small business owners, using a boosted decision tree model. The United States Small Business Administration (USBA) publicly available loan administration dataset of 27 features and 899164 data instances was used in an 80:20 ratio for training and testing the model. After data cleaning and feature engineering, 16 data features were used as predictors. The gradient boosting decision tree classifier recorded 99% accuracy, compared with 98% for the basic decision tree classifier. The model is further evaluated with (a) receiver operating characteristics (ROC) and area under curve (AUC), (b) cumulative accuracy profile (CAP), and (c) cumulative accuracy profile (CAP) under AUC. Each of these model performance evaluation metrics, especially ROC-AUC, showed a relationship between the true positives and false positives that implies the model is a good fit.
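As a reminder of what ROC-AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the tiny example data are illustrative assumptions and are unrelated to the loan dataset.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 0 (negative) or 1 (positive); `scores` holds the
    model's predicted scores for the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of instances, so for a dataset of the size described above a sort-based computation would be the practical choice.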


Author(s): Umu Sa'adah, Masithoh Yessi Rochayani, Ani Budi Astuti

Classifying high-dimensional data is a challenging task in data mining. Gene expression data is a type of high-dimensional data that has thousands of features. This study proposes a method to extract knowledge from high-dimensional gene expression data by selecting features and then classifying. Lasso was used for feature selection, and the classification and regression tree (CART) algorithm was used to construct the decision tree model. To examine the stability of the lasso decision tree, we performed bootstrap aggregating (bagging) with 50 replications. The gene expression data used was an ovarian tumor dataset that has 1,545 observations, 10,935 gene features, and a binary class. The findings showed that the lasso decision tree could produce an interpretable model that is theoretically correct and had an accuracy of 89.32%. Meanwhile, the model obtained from the majority vote gave an accuracy of 90.29%, an increase of about 1 percentage point over the single lasso decision tree model. This small increase in accuracy shows that the lasso decision tree classifier is stable.
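The bagging step with a majority vote can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: `fit_stump` is a hypothetical one-feature threshold learner standing in for the lasso plus CART model, and the six training rows are made-up toy data. Only the 50 replications come from the abstract.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Sample len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]


def fit_stump(rows):
    """Hypothetical base learner: threshold halfway between class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    if not zeros or not ones:
        # A bootstrap sample may contain one class only; predict that class.
        only = 0 if zeros else 1
        return lambda x: only
    mean0, mean1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
    threshold = (mean0 + mean1) / 2
    if mean1 >= mean0:
        return lambda x: 1 if x > threshold else 0
    return lambda x: 1 if x < threshold else 0


def bagged_predict(train_rows, fit, x, replications=50, seed=0):
    """Fit one model per bootstrap replicate and combine by majority vote."""
    rng = random.Random(seed)
    votes = [fit(bootstrap_sample(train_rows, rng))(x)
             for _ in range(replications)]
    return majority_vote(votes)


if __name__ == "__main__":
    toy = [(1.0, 0), (1.5, 0), (2.0, 0), (4.0, 1), (4.5, 1), (5.0, 1)]
    print(bagged_predict(toy, fit_stump, 4.8), bagged_predict(toy, fit_stump, 1.2))
```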


2014, Vol 26 (05), pp. 1450059
Author(s): Kan Luo, Jianqing Li, Jianfeng Wu, Hua Yang, Gaozhi Xu

Unintentional falls cause serious health problems and high medical costs, particularly among the elderly. Efficient fall detection can provide fallen subjects with timely rescue, less pain, and lower health-care expenses. However, the accuracy of present fall detection systems with a single accelerometer does not meet the requirements of practical application. In this paper, a fall detection method using three wearable triaxial accelerometers and a decision-tree classifier is proposed. The three triaxial accelerometers are mounted on the head, the waist, and the ankle, respectively, to capture the acceleration signals of human movement. A Kalman filter is adopted to estimate the body tilt angle. After the features are extracted, the trained decision-tree model is used to predict the fall. The improvement is evidenced by scripted and unscripted lateral fall experiments involving five young healthy volunteers (three males and two females; age: 23.3 ± 1 years). The classification of falls and activities of daily living (ADL) achieves recall, precision, and F-value of 93.1%, 95.9%, and 94.5%, respectively, and the system detects all falls during the extended unscripted trials. The experimental results indicate that the complementary movement information coming from three accelerometers can enhance the performance of fall detection. The proposed method is efficient, and it shows remarkable improvements over methods that use one or two accelerometers.
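The three reported metrics are linked: the F-value is the harmonic mean of precision and recall. The short sketch below, with function names chosen for illustration, checks that the reported precision of 95.9% and recall of 93.1% are consistent with the reported F-value of 94.5%.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and
    false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Values reported in the abstract, expressed as fractions.
    print(round(100 * f_measure(0.959, 0.931), 1))  # 94.5
```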


Author(s): Avijit Kumar Chaudhuri, Deepankar Sinha, Dilip K. Banerjee, Anirban Das
